From: Serguei Smirnov
Date: Wed, 7 Oct 2020 22:51:06 +0000 (-0400)
Subject: LU-13892 lnet: lock-up during router check
X-Git-Tag: 2.12.6-RC1~25
X-Git-Url: https://git.whamcloud.com/?a=commitdiff_plain;h=refs%2Fchanges%2F72%2F40172%2F2;p=fs%2Flustre-release.git

LU-13892 lnet: lock-up during router check

This is a fix for the issue with LNet lock-up while waiting
for routers to become active with check_routers_before_use option.
Release ln_api_mutex while waiting to allow incoming connections
to be handled.

Signed-off-by: Serguei Smirnov
Change-Id: I63b1d1ce5ee2b27a3bd2cea78713fc6fc7502cf7
Reviewed-on: https://review.whamcloud.com/40172
Tested-by: jenkins
Reviewed-by: Olaf Faaland-LLNL
Tested-by: Maloo
Reviewed-by: Amir Shehata
Reviewed-by: Oleg Drokin
---

diff --git a/lnet/lnet/router.c b/lnet/lnet/router.c
index 806cf8c..e2966cf 100644
--- a/lnet/lnet/router.c
+++ b/lnet/lnet/router.c
@@ -842,6 +842,7 @@ lnet_wait_known_routerstate(void)
 
 	LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_RUNNING);
 
+	/* the_lnet.ln_api_mutex must be locked */
 	for (;;) {
 		int cpt = lnet_net_lock_current();
 
@@ -865,8 +866,10 @@ lnet_wait_known_routerstate(void)
 		if (all_known)
 			return;
 
+		mutex_unlock(&the_lnet.ln_api_mutex);
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		schedule_timeout(cfs_time_seconds(1));
+		mutex_lock(&the_lnet.ln_api_mutex);
 	}
 }
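
The locking pattern this change applies (drop the mutex for the duration of the sleep, retake it before re-checking state) can be illustrated outside the kernel. What follows is a minimal user-space sketch using POSIX threads, not LNet code: api_mutex stands in for the_lnet.ln_api_mutex, and router_known, incoming_handler(), wait_known_routerstate() and run_demo() are hypothetical names invented for the illustration. It assumes the handler for an incoming connection needs the same mutex the waiter holds, which is the situation the commit message describes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the_lnet.ln_api_mutex. */
static pthread_mutex_t api_mutex = PTHREAD_MUTEX_INITIALIZER;
/* Hypothetical "all routers known" flag, protected by api_mutex. */
static bool router_known;

/* Stands in for handling an incoming connection: it needs api_mutex
 * before it can record that the router state is known. */
static void *incoming_handler(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&api_mutex);
	router_known = true;
	pthread_mutex_unlock(&api_mutex);
	return NULL;
}

/* Caller must hold api_mutex; it is held again on return.
 * Returns how many times the loop slept. */
static int wait_known_routerstate(void)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	int sleeps = 0;

	for (;;) {
		if (router_known)
			return sleeps;

		/* Drop the mutex while sleeping so the handler can make
		 * progress. Sleeping with it held would block the handler
		 * forever and this loop would never exit. */
		pthread_mutex_unlock(&api_mutex);
		nanosleep(&delay, NULL);
		pthread_mutex_lock(&api_mutex);
		sleeps++;
	}
}

/* Takes api_mutex, starts the handler, then waits for it.
 * Returns the number of sleeps, or -1 if the thread could not start. */
static int run_demo(void)
{
	pthread_t handler;
	int sleeps;

	pthread_mutex_lock(&api_mutex);
	router_known = false;
	if (pthread_create(&handler, NULL, incoming_handler, NULL) != 0) {
		pthread_mutex_unlock(&api_mutex);
		return -1;
	}
	sleeps = wait_known_routerstate();
	assert(router_known);
	pthread_mutex_unlock(&api_mutex);
	pthread_join(handler, NULL);
	return sleeps;
}
```

Calling run_demo() from a main() returns a value of at least 1, because the handler cannot take api_mutex until the waiter releases it for the first sleep. Removing the unlock/lock pair around nanosleep() makes the sketch hang, which mirrors the lock-up described in the commit message.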