Whamcloud - gitweb
LU-14206 lnet: Router ping timeout with discovery disabled
author Chris Horn <chris.horn@hpe.com>
Wed, 9 Dec 2020 20:38:57 +0000 (14:38 -0600)
committer Andreas Dilger <adilger@whamcloud.com>
Fri, 9 Sep 2022 01:39:20 +0000 (01:39 +0000)
Discovery pings are used to determine the health of gateways and
associated routes. Ping replies from gateways with dynamic discovery
(DD) disabled (or when DD is disabled locally) are handled by a
special routine, lnet_router_discovery_ping_reply(), but this
function and the related code do not handle the case where a
discovery ping hits the response tracker timeout and is unlinked by
the monitor thread. In that case an UNLINK event is generated and we
never call lnet_router_discovery_ping_reply(). For gateways with DD
enabled (and DD enabled locally), this case is handled in
lnet_router_discovery_complete(): if discovery failed, lp_dc_error
is set and we mark all routes for the gateway down. We can simply
extend this logic to gateways with DD disabled (or with DD disabled
locally).
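The failure mode described above can be sketched as a small, simplified C model. Everything in it is hypothetical: `struct gw`, `gw_ping_event()` and `gw_ping_reply()` are illustrative stand-ins rather than LNet code. `gw_ping_reply()` plays the role of lnet_router_discovery_ping_reply(), and recording `-ETIMEDOUT` as the discovery error on a timeout is an assumption of the model, not something the commit states. What it shows is that an UNLINK event never reaches the reply handler, so nothing on that path updates the gateway's routes.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; not the real LNet event or peer types. */
enum ping_event { PING_EV_REPLY, PING_EV_UNLINK };

struct gw {
	bool dd_disabled;	/* DD disabled on the gateway or locally */
	int dc_error;		/* stands in for lp_dc_error */
	bool routes_alive;	/* aggregate route state for this gateway */
	int reply_handled;	/* how many times the reply handler ran */
};

/* Stand-in for lnet_router_discovery_ping_reply(): in this model a
 * reply simply marks the gateway's routes alive. */
static void gw_ping_reply(struct gw *g)
{
	g->reply_handled++;
	g->routes_alive = true;
}

/* A REPLY reaches the special handler only when DD is disabled (the
 * DD-enabled path is not modelled). A response tracker timeout unlinks
 * the ping and yields UNLINK instead: the model records an error
 * (assumed -ETIMEDOUT) but never calls the reply handler. */
static void gw_ping_event(struct gw *g, enum ping_event ev)
{
	if (ev == PING_EV_UNLINK) {
		g->dc_error = -ETIMEDOUT;
		return;
	}
	if (g->dd_disabled)
		gw_ping_reply(g);
}
```

In the model, a timed-out ping leaves `routes_alive` at whatever it was before, which is the gap the commit message describes: nothing on the UNLINK path marks the routes down.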

Lustre-change: https://review.whamcloud.com/40923
Lustre-commit: 173d86c6e9a704a84de36ae57a337a3fdae7b1ed

Test-Parameters: trivial
Fixes: 9f337d94e7 ("LU-13029 lnet: fix asym routing with multi-hop")
HPE-bug-id: LUS-9612
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I009c69d4f8990b72d83d9426c782c0e55c1023a4
Reviewed-on: https://review.whamcloud.com/48382
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
lnet/lnet/router.c

index 26aa048..8ab85fa 100644
@@ -506,11 +506,11 @@ lnet_router_discovery_complete(struct lnet_peer *lp)
        lp->lp_alive = lp->lp_dc_error == 0;
        spin_unlock(&lp->lp_lock);
 
-       /* ping replies are being handled when discovery is disabled */
-       if (lnet_is_discovery_disabled_locked(lp))
-               return;
-
        if (!lp->lp_dc_error) {
+               /* ping replies are being handled when discovery is disabled */
+               if (lnet_is_discovery_disabled_locked(lp))
+                       return;
+
                /*
                * mark single-hop routes.  If the remote net is not configured on
                * the gateway we assume this is intentional and we mark the
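The hunk only moves the DD-disabled early return inside the success branch. A minimal sketch of the old and new ordering follows, under stated assumptions: `struct gw_peer`, `complete_old()` and `complete_new()` are hypothetical simplifications of lnet_router_discovery_complete(); locking (lp_lock) and the per-route walk are omitted, and the error path is reduced to a single `routes_up = false`, following the commit message's statement that all routes are marked down when discovery fails.

```c
#include <stdbool.h>

/* Hypothetical simplified gateway peer; the fields echo lp_dc_error and
 * lp_alive, but this is not the real struct lnet_peer. */
struct gw_peer {
	bool dd_disabled;	/* result of lnet_is_discovery_disabled_locked() */
	int dc_error;		/* lp_dc_error */
	bool alive;		/* lp_alive */
	bool routes_up;		/* aggregate state of this gateway's routes */
};

/* Old ordering: return early whenever DD is disabled, so a discovery
 * error (e.g. a ping that timed out and was unlinked) never reaches the
 * code that marks the routes down. */
static void complete_old(struct gw_peer *p)
{
	p->alive = p->dc_error == 0;
	if (p->dd_disabled)
		return;
	if (!p->dc_error) {
		/* route aliveness derived from the discovery reply (omitted) */
		return;
	}
	p->routes_up = false;	/* discovery failed: all routes down */
}

/* New ordering: the DD-disabled early return applies only on success,
 * where the ping reply handler has already updated the routes. */
static void complete_new(struct gw_peer *p)
{
	p->alive = p->dc_error == 0;
	if (!p->dc_error) {
		if (p->dd_disabled)
			return;
		/* route aliveness derived from the discovery reply (omitted) */
		return;
	}
	p->routes_up = false;	/* discovery failed: all routes down */
}
```

With DD disabled and a discovery error set, only the new ordering marks the routes down; on success, both orderings still leave route state to the ping reply handler.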