lc_watchdog_disable will not delete kernel timer, benefit of this
is no overhead of del_timer in each serivce loop, however, there
is a corner case that lc_watchdog_touch can race with lcw_cb when
service thread is actually idle:
1. service thread armed a timer before processing request, after
it handled request, it called lc_watchdog_disable() and set
lc_watchdog as disabled, but kernel timer is still outstanding.
2. service thread slept for many seconds because there is no
request to handle
3. it's waken up by incoming request
4. it called lc_watchdog_touch and set watchdog status to enabled.
now because timer is still alive, if timer is expired right
before cfs_timer_arm(), then lcw_cb will set a disabled watchdog
as "expired", which is wrong.
5. the next call of lc_watchdog_disable will complain for expired
watchdog.
There are two options to fix this issue:
1. always del_timer in lc_watchdog_disable, however, it may increase
system overhead when service threads are busy.
2. set watchdog as "enabled" after cfs_timer_arm, so there is no
window for lcw_cb to see false status of watchdog.
This patch chooses the second approach to fix the problem
Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Change-Id: Iac24082e2d63de8330285cf243ed585da6524ab9
Reviewed-on: http://review.whamcloud.com/11415
Tested-by: Jenkins
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
static void lcw_update_time(struct lc_watchdog *lcw, const char *message)
{
- cfs_time_t newtime = cfs_time_current();;
+ cfs_time_t newtime = cfs_time_current();
if (lcw->lcw_state == LC_WATCHDOG_EXPIRED) {
struct timeval timediff;
lc_watchdog_del_pending(lcw);
lcw_update_time(lcw, "resumed");
- lcw->lcw_state = LC_WATCHDOG_ENABLED;
cfs_timer_arm(&lcw->lcw_timer, cfs_time_current() +
cfs_time_seconds(timeout));
+ lcw->lcw_state = LC_WATCHDOG_ENABLED;
EXIT;
}