LU-14810 lnet: Cancel discovery ping/push on shutdown 56/53356/2
author Chris Horn <chris.horn@hpe.com>
Tue, 5 Dec 2023 09:56:57 +0000 (03:56 -0600)
committer Oleg Drokin <green@whamcloud.com>
Wed, 20 Dec 2023 01:44:45 +0000 (01:44 +0000)
Discovery shutdown can race with ping and push events. In some cases
this can result in failing to unlink ping/push MDs on shutdown.
Protect against this by checking for PING/PUSH_FAILED state on peers
on the request queue.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-lnet env=ONLY=500,ONLY_REPEAT=50
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I84a1f5beb6508651bc62e1dd93271f9e72f5081c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53356
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lnet/lnet/peer.c

index 90b376b..4811fb0 100644
@@ -4143,6 +4143,14 @@ static int lnet_peer_discovery(void *arg)
        while (!list_empty(&the_lnet.ln_dc_request)) {
                lp = list_first_entry(&the_lnet.ln_dc_request,
                                      struct lnet_peer, lp_dc_list);
+               lnet_net_unlock(LNET_LOCK_EX);
+               spin_lock(&lp->lp_lock);
+               if (lp->lp_state & LNET_PEER_PING_FAILED)
+                       rc = lnet_peer_ping_failed(lp);
+               if (lp->lp_state & LNET_PEER_PUSH_FAILED)
+                       rc = lnet_peer_push_failed(lp);
+               spin_unlock(&lp->lp_lock);
+               lnet_net_lock(LNET_LOCK_EX);
                lnet_peer_discovery_complete(lp, -ESHUTDOWN);
        }
        lnet_net_unlock(LNET_LOCK_EX);
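
The following is a minimal user-space sketch of the pattern the hunk above applies: while draining the request queue at shutdown, check each peer for a pending ping/push failure and clean up its MD before completing discovery. It is not the LNet implementation. The struct, the flag values, ERR_SHUTDOWN, and all function names are hypothetical stand-ins, the "MD" is modelled as a plain integer flag, and the locking shown in the real patch (dropping the net lock, taking lp_lock) is omitted because the model is single-threaded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for LNET_PEER_PING_FAILED / LNET_PEER_PUSH_FAILED. */
#define PEER_PING_FAILED (1u << 0)
#define PEER_PUSH_FAILED (1u << 1)
/* Stand-in for -ESHUTDOWN; the numeric value is illustrative only. */
#define ERR_SHUTDOWN     (-108)

/* Simplified model of a peer sitting on the discovery request queue. */
struct peer {
	unsigned int state;      /* failure flags set by event handlers */
	int ping_md_linked;      /* 1 while a ping MD is still attached */
	int push_md_linked;      /* 1 while a push MD is still attached */
	int dc_result;           /* result recorded when discovery completes */
	struct peer *next;       /* singly linked request queue */
};

/* Model of the ping-failure handler: clear the flag, drop the MD. */
static void peer_ping_failed(struct peer *p)
{
	p->state &= ~PEER_PING_FAILED;
	p->ping_md_linked = 0;
}

/* Model of the push-failure handler: clear the flag, drop the MD. */
static void peer_push_failed(struct peer *p)
{
	p->state &= ~PEER_PUSH_FAILED;
	p->push_md_linked = 0;
}

/*
 * Drain the queue at shutdown. The two flag checks mirror the lines the
 * patch adds; without them a peer whose failure event raced with shutdown
 * would be completed with its MD still attached.
 */
static int drain_request_queue(struct peer **queue)
{
	int drained = 0;

	while (*queue != NULL) {
		struct peer *p = *queue;

		if (p->state & PEER_PING_FAILED)
			peer_ping_failed(p);
		if (p->state & PEER_PUSH_FAILED)
			peer_push_failed(p);

		/* Model of completing discovery: dequeue and record the error. */
		*queue = p->next;
		p->next = NULL;
		p->dc_result = ERR_SHUTDOWN;
		drained++;
	}
	return drained;
}

/* Build a three-peer queue, drain it, and count MDs left attached. */
int demo_leaked_mds_after_drain(void)
{
	struct peer peers[3] = {
		{ .state = PEER_PING_FAILED, .ping_md_linked = 1 },
		{ .state = PEER_PUSH_FAILED, .push_md_linked = 1 },
		{ .state = 0 },
	};
	struct peer *queue = &peers[0];
	int drained;
	int leaked = 0;
	size_t i;

	peers[0].next = &peers[1];
	peers[1].next = &peers[2];

	drained = drain_request_queue(&queue);
	assert(drained == 3);
	assert(queue == NULL);

	for (i = 0; i < 3; i++)
		leaked += peers[i].ping_md_linked + peers[i].push_md_linked;
	return leaked;
}
```

In this model, removing the two flag checks from drain_request_queue() leaves two MDs attached after the drain, which corresponds to the failure to unlink ping/push MDs described in the commit message.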