LU-5816 target: don't trigger watchdog waiting in recovery 72/12672/7
author Hongchao Zhang <hongchao.zhang@intel.com>
Thu, 9 Oct 2014 22:43:31 +0000 (06:43 +0800)
committer Oleg Drokin <oleg.drokin@intel.com>
Tue, 3 Feb 2015 17:54:01 +0000 (17:54 +0000)
In target_recovery_thread, the process should not be considered
to be in the "blocked" state while it is waiting for something to
happen; otherwise, the kernel watchdog will print:

task tgt_recov:19764 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
tgt_recov     D 0000000000000003     0 19764      2 0x00000000
Call Trace:
check_for_clients+0x0/0x70 [ptlrpc]
target_recovery_overseer+0x9d/0x230 [ptlrpc]
exp_connect_healthy+0x0/0x20 [ptlrpc]
autoremove_wake_function+0x0/0x40
target_recovery_thread+0x0/0x1920 [ptlrpc]

Change-Id: Ic1ad4dce1df974dd99e0b28cee211de173d178e5
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/12672
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
lustre/ldlm/ldlm_lib.c

index 8c5ad6b..d8825bb 100644
@@ -1768,7 +1768,11 @@ repeat:
                obd->obd_abort_recovery = 1;
        }
 
-       wait_event(obd->obd_next_transno_waitq, check_routine(obd));
+       while (wait_event_timeout(obd->obd_next_transno_waitq,
+                                 check_routine(obd),
+                                 msecs_to_jiffies(60 * MSEC_PER_SEC)) == 0)
+               /* wait indefinitely for event, but don't trigger watchdog */;
+
        if (obd->obd_abort_recovery) {
                CWARN("recovery is aborted, evict exports in recovery\n");
                /** evict exports which didn't finish recovery yet */