From ba0a1b36870807e8182189bcb08f7b105aff6c57 Mon Sep 17 00:00:00 2001
From: Hongchao Zhang
Date: Fri, 10 Oct 2014 06:43:31 +0800
Subject: [PATCH] LU-5816 target: don't trigger watchdog waiting in recovery

In target_recovery_thread, the process should not be considered to be
in a "blocked state" if it is waiting for something to happen;
otherwise, the kernel watchdog will print:

task tgt_recov:19764 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
tgt_recov D 0000000000000003 0 19764 2 0x00000000
Call Trace:
check_for_clients+0x0/0x70 [ptlrpc]
target_recovery_overseer+0x9d/0x230 [ptlrpc]
exp_connect_healthy+0x0/0x20 [ptlrpc]
autoremove_wake_function+0x0/0x40
target_recovery_thread+0x0/0x1920 [ptlrpc]

Change-Id: Ic1ad4dce1df974dd99e0b28cee211de173d178e5
Signed-off-by: Hongchao Zhang
Reviewed-on: http://review.whamcloud.com/12672
Tested-by: Jenkins
Tested-by: Maloo
Reviewed-by: Andreas Dilger
Reviewed-by: Fan Yong
---
 lustre/ldlm/ldlm_lib.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lustre/ldlm/ldlm_lib.c b/lustre/ldlm/ldlm_lib.c
index 8c5ad6b..d8825bb 100644
--- a/lustre/ldlm/ldlm_lib.c
+++ b/lustre/ldlm/ldlm_lib.c
@@ -1768,7 +1768,11 @@ repeat:
 		obd->obd_abort_recovery = 1;
 	}
 
-	wait_event(obd->obd_next_transno_waitq, check_routine(obd));
+	while (wait_event_timeout(obd->obd_next_transno_waitq,
+				  check_routine(obd),
+				  msecs_to_jiffies(60 * MSEC_PER_SEC)) == 0)
+		/* wait indefinitely for event, but don't trigger watchdog */;
+
 	if (obd->obd_abort_recovery) {
 		CWARN("recovery is aborted, evict exports in recovery\n");
 		/** evict exports which didn't finish recovery yet */
-- 
1.8.3.1