LU-14206 lnet: Router ping timeout with discovery disabled 23/40923/3
author Chris Horn <chris.horn@hpe.com>
Wed, 9 Dec 2020 20:38:57 +0000 (14:38 -0600)
committer Oleg Drokin <green@whamcloud.com>
Wed, 28 Apr 2021 02:10:46 +0000 (02:10 +0000)
Discovery pings are used to determine the health of gateways and
their associated routes. Ping replies from gateways with dynamic
discovery (DD) disabled (or when DD is disabled locally) are handled
in a special routine, lnet_router_discovery_ping_reply(). However,
this function and its related code do not handle the case where a
discovery ping hits the response tracker timeout and is unlinked by
the monitor thread. In that case an UNLINK event is generated and
lnet_router_discovery_ping_reply() is never called.

For gateways with DD enabled (and DD enabled locally), this case is
handled in lnet_router_discovery_complete(): if discovery failed,
lp_dc_error is set and all routes for the gateway are marked down.
Extend this logic to gateways with DD disabled (or DD disabled
locally) by taking the DD-disabled early return only when
lp_dc_error is not set.
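
The change in control flow can be sketched with a small standalone
model. This is an illustration only, not the LNet code: struct
peer_model, its fields, and both function names are hypothetical
stand-ins, and all routes of a gateway are collapsed into a single
routes_up flag. The success-path route marking is elided.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct lnet_peer; not the real layout. */
struct peer_model {
	int  dc_error;    /* models lp_dc_error: nonzero if discovery failed */
	bool dd_disabled; /* models lnet_is_discovery_disabled_locked(lp) */
	bool alive;       /* models lp_alive */
	bool routes_up;   /* assumption: one flag for all gateway routes */
};

/* Check order before the patch: DD-disabled peers always return early. */
void complete_before(struct peer_model *p)
{
	p->alive = p->dc_error == 0;

	if (p->dd_disabled)
		return;	/* a timed-out ping never reaches the code below */

	if (!p->dc_error)
		return;	/* success path elided in this model */

	p->routes_up = false;	/* discovery failed: mark routes down */
}

/* Check order after the patch: early return only when there is no error. */
void complete_after(struct peer_model *p)
{
	p->alive = p->dc_error == 0;

	if (!p->dc_error) {
		if (p->dd_disabled)
			return;	/* ping replies are handled elsewhere */
		return;		/* success path elided in this model */
	}

	p->routes_up = false;	/* now also reached with DD disabled */
}
```

In this model, a peer with dd_disabled set and a nonzero dc_error
keeps routes_up set under complete_before(), but has it cleared
under complete_after().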

Test-Parameters: trivial
Fixes: 2832478194 ("LU-13029 lnet: fix asym routing with multi-hop")
HPE-bug-id: LUS-9612
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I009c69d4f8990b72d83d9426c782c0e55c1023a4
Reviewed-on: https://review.whamcloud.com/40923
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lnet/lnet/router.c

index 9357dd0..e5dd77b 100644
@@ -507,11 +507,11 @@ lnet_router_discovery_complete(struct lnet_peer *lp)
        lp->lp_alive = lp->lp_dc_error == 0;
        spin_unlock(&lp->lp_lock);
 
-       /* ping replies are being handled when discovery is disabled */
-       if (lnet_is_discovery_disabled_locked(lp))
-               return;
-
        if (!lp->lp_dc_error) {
+               /* ping replies are being handled when discovery is disabled */
+               if (lnet_is_discovery_disabled_locked(lp))
+                       return;
+
                /*
                * mark single-hop routes.  If the remote net is not configured on
                * the gateway we assume this is intentional and we mark the