From 1d8cfa61e873ddcaecfe60a2a2c3e26bde5b4e34 Mon Sep 17 00:00:00 2001 From: Liang Zhen Date: Tue, 12 Aug 2014 22:36:42 +0800 Subject: [PATCH] LU-5428 libcfs: false alarm of libcfs watchdog lc_watchdog_disable will not delete kernel timer, benefit of this is no overhead of del_timer in each serivce loop, however, there is a corner case that lc_watchdog_touch can race with lcw_cb when service thread is actually idle: 1. service thread armed a timer before processing request, after it handled request, it called lc_watchdog_disable() and set lc_watchdog as disabled, but kernel timer is still outstanding. 2. service thread slept for many seconds because there is no request to handle 3. it's waken up by incoming request 4. it called lc_watchdog_touch and set watchdog status to enabled. now because timer is still alive, if timer is expired right before cfs_timer_arm(), then lcw_cb will set a disabled watchdog as "expired", which is wrong. 5. the next call of lc_watchdog_disable will complain for expired watchdog. There are two options to fix this issue: 1. always del_timer in lc_watchdog_disable, however, it may increase system overhead when service threads are busy. 2. set watchdog as "enabled" after cfs_timer_arm, so there is no window for lcw_cb to see false status of watchdog. This patch chooses the second approach to fix the problem Signed-off-by: Liang Zhen Change-Id: Iac24082e2d63de8330285cf243ed585da6524ab9 Reviewed-on: http://review.whamcloud.com/11415 Tested-by: Jenkins Reviewed-by: Niu Yawei Reviewed-by: Bobi Jam Tested-by: Maloo Reviewed-by: Oleg Drokin --- libcfs/libcfs/watchdog.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libcfs/libcfs/watchdog.c b/libcfs/libcfs/watchdog.c index ed1acf7..0bbe5c9 100644 --- a/libcfs/libcfs/watchdog.c +++ b/libcfs/libcfs/watchdog.c @@ -378,7 +378,7 @@ EXPORT_SYMBOL(lc_watchdog_add); static void lcw_update_time(struct lc_watchdog *lcw, const char *message) { - cfs_time_t newtime = cfs_time_current();; + cfs_time_t newtime = cfs_time_current(); if (lcw->lcw_state == LC_WATCHDOG_EXPIRED) { struct timeval timediff; @@ -419,10 +419,10 @@ void lc_watchdog_touch(struct lc_watchdog *lcw, int timeout) lc_watchdog_del_pending(lcw); lcw_update_time(lcw, "resumed"); - lcw->lcw_state = LC_WATCHDOG_ENABLED; cfs_timer_arm(&lcw->lcw_timer, cfs_time_current() + cfs_time_seconds(timeout)); + lcw->lcw_state = LC_WATCHDOG_ENABLED; EXIT; } -- 1.8.3.1