LU-14810 lnet: Cancel discovery ping/push on shutdown
author Chris Horn <chris.horn@hpe.com>
Tue, 5 Dec 2023 09:56:57 +0000 (03:56 -0600)
committer Andreas Dilger <adilger@whamcloud.com>
Fri, 2 Feb 2024 16:06:01 +0000 (16:06 +0000)
Discovery shutdown can race with ping and push events. In some cases
this leaves ping/push MDs linked after shutdown. Protect against this
by checking for the PING_FAILED and PUSH_FAILED states on peers still
on the discovery request queue, and handling those failures before
completing discovery with -ESHUTDOWN.

Lustre-change: https://review.whamcloud.com/53356
Lustre-commit: c3b9597742d5118a96f56129e7dd30d84468d2c8

Test-Parameters: trivial
Test-Parameters: testlist=sanity-lnet env=ONLY=500,ONLY_REPEAT=50
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I84a1f5beb6508651bc62e1dd93271f9e72f5081c
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53848
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
lnet/lnet/peer.c

index 7833576..7362e0c 100644
@@ -3873,6 +3873,14 @@ static int lnet_peer_discovery(void *arg)
        while (!list_empty(&the_lnet.ln_dc_request)) {
                lp = list_first_entry(&the_lnet.ln_dc_request,
                                      struct lnet_peer, lp_dc_list);
+               lnet_net_unlock(LNET_LOCK_EX);
+               spin_lock(&lp->lp_lock);
+               if (lp->lp_state & LNET_PEER_PING_FAILED)
+                       rc = lnet_peer_ping_failed(lp);
+               if (lp->lp_state & LNET_PEER_PUSH_FAILED)
+                       rc = lnet_peer_push_failed(lp);
+               spin_unlock(&lp->lp_lock);
+               lnet_net_lock(LNET_LOCK_EX);
                lnet_peer_discovery_complete(lp, -ESHUTDOWN);
        }
        lnet_net_unlock(LNET_LOCK_EX);
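The effect of the shutdown loop above can be modelled in a few lines of user-space C. This is a hypothetical, simplified sketch, not Lustre code: `struct peer`, the `PEER_*_FAILED` flags, the `*_md_linked` booleans and all helper functions are invented stand-ins. It assumes that lnet_peer_ping_failed() and lnet_peer_push_failed() clear their failure flag and unlink the corresponding MD, and that a queued peer only has a linked MD when the matching failure flag is set. It also omits the lnet_net_lock/lp_lock handoff that the diff performs around the check.

```c
/*
 * Hypothetical, single-threaded model of the shutdown drain in
 * lnet_peer_discovery(). struct peer, PEER_*_FAILED and the helpers
 * below are illustrative stand-ins, NOT the LNet API, and all
 * locking is omitted.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PEER_PING_FAILED 0x1u
#define PEER_PUSH_FAILED 0x2u

struct peer {
	unsigned int state;	/* PEER_*_FAILED flags */
	bool ping_md_linked;	/* models a still-valid ping MD handle */
	bool push_md_linked;	/* models a still-valid push MD handle */
	int status;		/* discovery completion status */
	struct peer *next;	/* request-queue link */
};

/* Models lnet_peer_ping_failed(): clear the flag, unlink the MD. */
static void peer_ping_failed(struct peer *p)
{
	p->state &= ~PEER_PING_FAILED;
	p->ping_md_linked = false;
}

/* Models lnet_peer_push_failed(): clear the flag, unlink the MD. */
static void peer_push_failed(struct peer *p)
{
	p->state &= ~PEER_PUSH_FAILED;
	p->push_md_linked = false;
}

/*
 * Models lnet_peer_discovery_complete(). The assert encodes this
 * model's assumption that no MD may still be linked at this point.
 */
static void peer_complete(struct peer *p, int status)
{
	assert(!p->ping_md_linked && !p->push_md_linked);
	p->status = status;
}

/*
 * Drain the queue, cleaning up failed ping/push MDs before completing
 * each peer. Returns the number of peers completed.
 */
static int drain_on_shutdown(struct peer **queue, int status)
{
	int n = 0;

	while (*queue != NULL) {
		struct peer *p = *queue;

		*queue = p->next;	/* dequeue the head */
		p->next = NULL;
		if (p->state & PEER_PING_FAILED)
			peer_ping_failed(p);
		if (p->state & PEER_PUSH_FAILED)
			peer_push_failed(p);
		peer_complete(p, status);
		n++;
	}
	return n;
}
```

Removing the two `if` checks from drain_on_shutdown() makes the assert in peer_complete() fire for any peer queued with a failed ping or push, which corresponds to the MD that the pre-patch loop never unlinked.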