From: Patrick Farrell
Date: Mon, 20 Jun 2016 21:15:51 +0000 (-0500)
Subject: LU-8307 ldlm: cond_resched in ldlm_bl_thread_main
X-Git-Tag: 2.9.58~80
X-Git-Url: https://git.whamcloud.com/?p=fs%2Flustre-release.git;a=commitdiff_plain;h=c156613b29be6fcee13d0df7008f0cd7847a5263;hp=2ffbcc9f9ad930fee2df53238b3244b7c3e6bb91

LU-8307 ldlm: cond_resched in ldlm_bl_thread_main

When clearing all of the ldlm LRUs (as Cray does at the end of a
job), an ldlm_bl_work_item is generated for each namespace, and the
items are placed on a list for the ldlm_bl threads to iterate over.

If the number of namespaces greatly exceeds the number of ldlm_bl
threads, a given thread will iterate over many namespaces looking
for work without ever sleeping. This can go on for an extremely
long time and result in an RCU stall.

This patch adds a cond_resched() between completing one work item
and looking for the next. This is a fairly cheap operation, as it
only yields the CPU if a reschedule is pending, and it will not be
called too often: even the largest file systems currently have
< 100 namespaces per ldlm_bl_thread.

Signed-off-by: Patrick Farrell
Change-Id: Ic8022faf641ad6ab02462ab376a4bfd510dca14c
Reviewed-on: https://review.whamcloud.com/20888
Tested-by: Jenkins
Tested-by: Maloo
Reviewed-by: Ned Bass
Reviewed-by: Ann Koehler
Reviewed-by: Ben Evans
Reviewed-by: James Simmons
Reviewed-by: Oleg Drokin
---

diff --git a/lustre/ldlm/ldlm_lockd.c b/lustre/ldlm/ldlm_lockd.c
index 651eebe..b29b721 100644
--- a/lustre/ldlm/ldlm_lockd.c
+++ b/lustre/ldlm/ldlm_lockd.c
@@ -2767,6 +2767,11 @@ static int ldlm_bl_thread_main(void *arg)
 
 		if (rc == LDLM_ITER_STOP)
 			break;
+
+		/* If there are many namespaces, we will not sleep waiting for
+		 * work, and must do a cond_resched to avoid holding the CPU
+		 * for too long */
+		cond_resched();
 	}
 
 	atomic_dec(&blp->blp_num_threads);
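
For readers outside the kernel tree, the pattern in this patch can be sketched in userspace C. This is only an illustration under stated assumptions, not the Lustre code: struct work_item, handle_item() and drain_work_list() are hypothetical names, and sched_yield() stands in for cond_resched(). The two are not equivalent, since sched_yield() always offers the CPU to other runnable threads while cond_resched() only yields when a reschedule is pending.

```c
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued ldlm_bl_work_item. */
struct work_item {
	struct work_item *next;
	int done;	/* set once the item has been handled */
};

/* Hypothetical per-item handler; a nonzero return stops the loop,
 * loosely mirroring the LDLM_ITER_STOP check in the patch. */
static int handle_item(struct work_item *item)
{
	item->done = 1;
	return 0;
}

/* Process every item on the list without sleeping, but give the
 * scheduler a chance to run something else after each item.
 * Returns the number of items handled. */
static int drain_work_list(struct work_item *head)
{
	struct work_item *item;
	int processed = 0;

	for (item = head; item != NULL; item = item->next) {
		if (handle_item(item))
			break;
		processed++;

		/* Userspace analogue (assumption) of the cond_resched()
		 * call added by the patch: one yield point per item. */
		sched_yield();
	}

	return processed;
}
```

The yield sits between finishing one item and fetching the next, so its cost scales with the number of work items rather than with the work done inside each one.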