LU-12233 lnet: deadlock on LNet shutdown 71/40171/2
author Serguei Smirnov <ssmirnov@whamcloud.com>
Wed, 7 Oct 2020 22:13:31 +0000 (18:13 -0400)
committer Oleg Drokin <green@whamcloud.com>
Thu, 29 Oct 2020 07:49:44 +0000 (07:49 +0000)
Release ln_api_mutex during LNet shutdown while waiting
for a zombie LNI. This allows other threads to read the
LNet state updated by the shutdown and fall through,
avoiding the deadlock.

Lustre-change: https://review.whamcloud.com/39933
Lustre-commit: e0c445648a38fb72cc426ac0c16c33f5183cda08

Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: If0886f1bc4412dd9cacb08a0f06fa69aeeed1c5b
Reviewed-on: https://review.whamcloud.com/40171
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lnet/lnet/api-ni.c

index 57ec9c8..04a14be 100644
@@ -2002,7 +2002,13 @@ lnet_clear_zombies_nis_locked(struct lnet_net *net)
                }
 
                if (!list_empty(&ni->ni_netlist)) {
+                       /* Unlock mutex while waiting to allow other
+                        * threads to read the LNet state and fall through
+                        * to avoid deadlock
+                        */
                        lnet_net_unlock(LNET_LOCK_EX);
+                       mutex_unlock(&the_lnet.ln_api_mutex);
+
                        ++i;
                        if ((i & (-i)) == i) {
                                CDEBUG(D_WARNING,
@@ -2011,6 +2017,8 @@ lnet_clear_zombies_nis_locked(struct lnet_net *net)
                        }
                        set_current_state(TASK_UNINTERRUPTIBLE);
                        schedule_timeout(cfs_time_seconds(1));
+
+                       mutex_lock(&the_lnet.ln_api_mutex);
                        lnet_net_lock(LNET_LOCK_EX);
                        continue;
                }
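
The locking pattern in the hunk above can be illustrated outside the kernel. The following is a minimal userspace sketch using POSIX threads, not the LNet code itself: api_mutex, shutting_down, zombie_refs, worker(), shutdown_wait_locked() and run_shutdown_demo() are all hypothetical names standing in for the_lnet.ln_api_mutex, the LNet state, and the zombie NI reference. It assumes a worker thread that needs the mutex before it can drop its reference, which is the situation the commit message describes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins for the_lnet.ln_api_mutex and the LNet state. */
static pthread_mutex_t api_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool shutting_down;	/* protected by api_mutex */
static int zombie_refs;		/* protected by api_mutex */

/* The worker needs api_mutex before it can release its reference. */
static void *worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&api_mutex);
	/* The worker can only observe the shutdown state once it owns
	 * the mutex; it then falls through and drops its reference.
	 */
	assert(shutting_down);
	zombie_refs--;
	pthread_mutex_unlock(&api_mutex);
	return NULL;
}

/* Called with api_mutex held; returns with api_mutex held. */
static void shutdown_wait_locked(void)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 1000000 };

	while (zombie_refs > 0) {
		/* Drop the mutex while sleeping. If it stayed locked,
		 * the worker could never run and this loop would spin
		 * forever.
		 */
		pthread_mutex_unlock(&api_mutex);
		nanosleep(&delay, NULL);
		pthread_mutex_lock(&api_mutex);
	}
}

/* Returns 0 when the reference was released and shutdown completed. */
int run_shutdown_demo(void)
{
	pthread_t tid;
	int left;

	pthread_mutex_lock(&api_mutex);
	shutting_down = true;
	zombie_refs = 1;

	/* The worker starts while the mutex is held and blocks on it. */
	if (pthread_create(&tid, NULL, worker, NULL) != 0) {
		pthread_mutex_unlock(&api_mutex);
		return -1;
	}

	shutdown_wait_locked();
	left = zombie_refs;
	pthread_mutex_unlock(&api_mutex);

	pthread_join(tid, NULL);
	return left;
}
```

Calling run_shutdown_demo() from a main() and building with -pthread should return 0. If the unlock and relock inside shutdown_wait_locked() are removed, the worker never acquires the mutex and the wait loop does not terminate, which mirrors the deadlock this change addresses.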