From 173d86c6e9a704a84de36ae57a337a3fdae7b1ed Mon Sep 17 00:00:00 2001
From: Chris Horn
Date: Wed, 9 Dec 2020 14:38:57 -0600
Subject: [PATCH] LU-14206 lnet: Router ping timeout with discovery disabled

Discovery pings are used to determine the health of gateways and
associated routes. Ping replies from gateways with dynamic discovery
(DD) disabled (or if DD is disabled locally) are handled in a special
routine, lnet_router_discovery_ping_reply(), but this function and
related code do not handle the case where a discovery ping hits the
response tracker timeout and is unlinked by the monitor thread. In
this case, an UNLINK event is generated and
lnet_router_discovery_ping_reply() is never called.

For gateways with DD enabled (and DD enabled locally), we handle this
case in lnet_router_discovery_complete(). If discovery failed then
lp_dc_error is set and we mark all routes down for the gateway. We can
simply extend this logic to the case of gateways with DD disabled (or
DD disabled locally).

Test-Parameters: trivial
Fixes: 2832478194 ("LU-13029 lnet: fix asym routing with multi-hop")
HPE-bug-id: LUS-9612
Signed-off-by: Chris Horn
Change-Id: I009c69d4f8990b72d83d9426c782c0e55c1023a4
Reviewed-on: https://review.whamcloud.com/40923
Tested-by: jenkins
Tested-by: Maloo
Reviewed-by: Cyril Bordage
Reviewed-by: James Simmons
Reviewed-by: Oleg Drokin
---
 lnet/lnet/router.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lnet/lnet/router.c b/lnet/lnet/router.c
index 9357dd0..e5dd77b 100644
--- a/lnet/lnet/router.c
+++ b/lnet/lnet/router.c
@@ -507,11 +507,11 @@ lnet_router_discovery_complete(struct lnet_peer *lp)
 	lp->lp_alive = lp->lp_dc_error == 0;
 	spin_unlock(&lp->lp_lock);
 
-	/* ping replies are being handled when discovery is disabled */
-	if (lnet_is_discovery_disabled_locked(lp))
-		return;
-
 	if (!lp->lp_dc_error) {
+		/* ping replies are being handled when discovery is disabled */
+		if (lnet_is_discovery_disabled_locked(lp))
+			return;
+
 		/*
 		 * mark single-hop routes. If the remote net is not configured on
 		 * the gateway we assume this is intentional and we mark the
-- 
1.8.3.1
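
The control-flow change can be illustrated with a small standalone model. This is only a sketch, not the LNet code: struct gw_model, enum route_action and both functions are hypothetical names invented for illustration, and the model assumes that a discovery ping which times out leaves lp_dc_error set to a nonzero value, as the commit message implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two peer properties that matter here. */
struct gw_model {
	int dc_error;            /* models lp->lp_dc_error (nonzero on failure) */
	bool discovery_disabled; /* models lnet_is_discovery_disabled_locked(lp) */
};

/* Hypothetical summary of what happens to the gateway's routes. */
enum route_action {
	ROUTES_LEFT_TO_PING_REPLY, /* early return; ping reply handler owns them */
	ROUTES_SET_FROM_DISCOVERY, /* success path: routes marked per discovery */
	ROUTES_MARKED_DOWN,        /* failure path: all routes marked down */
};

/* Before the patch: the discovery-disabled check runs first, so a
 * failed (for example timed-out) ping never reaches the failure path. */
enum route_action complete_before_patch(const struct gw_model *gw)
{
	if (gw->discovery_disabled)
		return ROUTES_LEFT_TO_PING_REPLY;
	if (!gw->dc_error)
		return ROUTES_SET_FROM_DISCOVERY;
	return ROUTES_MARKED_DOWN;
}

/* After the patch: the early return applies only when there is no
 * error, so a failure marks routes down regardless of the DD setting. */
enum route_action complete_after_patch(const struct gw_model *gw)
{
	if (!gw->dc_error) {
		if (gw->discovery_disabled)
			return ROUTES_LEFT_TO_PING_REPLY;
		return ROUTES_SET_FROM_DISCOVERY;
	}
	return ROUTES_MARKED_DOWN;
}

/* Usage example: a timed-out ping to a gateway with DD disabled. */
void sketch_self_check(void)
{
	struct gw_model timed_out = { .dc_error = 1, .discovery_disabled = true };

	assert(complete_before_patch(&timed_out) == ROUTES_LEFT_TO_PING_REPLY);
	assert(complete_after_patch(&timed_out) == ROUTES_MARKED_DOWN);
}
```

In this model the two versions differ only for the combination of a discovery error and discovery disabled, which is the case the patch targets.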