LU-14206 lnet: Router ping timeout with discovery disabled 23/40923/3
author Chris Horn <chris.horn@hpe.com>
Wed, 9 Dec 2020 20:38:57 +0000 (14:38 -0600)
committer Oleg Drokin <green@whamcloud.com>
Wed, 28 Apr 2021 02:10:46 +0000 (02:10 +0000)
commit 173d86c6e9a704a84de36ae57a337a3fdae7b1ed
tree 81de31deab720bca22f39a0c91f96c551cd440f6
parent 8ce1d41b9cacafe8d3833dc8578c6057fba2ef9a

Discovery pings are used to determine the health of gateways and
their associated routes. Ping replies from gateways with dynamic
discovery (DD) disabled, or received when DD is disabled locally,
are handled in a special routine, lnet_router_discovery_ping_reply().
However, neither this function nor the related code handles the case
where a discovery ping hits the response tracker timeout and is
unlinked by the monitor thread. In that case an UNLINK event is
generated and lnet_router_discovery_ping_reply() is never called.
For gateways with DD enabled (and DD enabled locally), this case is
handled in lnet_router_discovery_complete(): if discovery failed,
lp_dc_error is set and all routes for the gateway are marked down.
That logic can simply be extended to gateways with DD disabled (or
with DD disabled locally).
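
The intended behavior can be sketched with a small standalone model.
This is not the code in lnet/lnet/router.c: the model_* structures and
functions are hypothetical stand-ins, and only dc_error mirrors a name
from the description above (lp_dc_error). The sketch assumes the
completion path runs for every gateway, so a recorded discovery error
takes the routes down whether or not DD is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_ROUTES 4

/* Hypothetical, simplified stand-ins; not the real LNet structures. */
struct model_route {
	bool alive;
};

struct model_gateway {
	bool dd_enabled;	/* DD enabled on the gateway and locally */
	int dc_error;		/* models lp_dc_error: nonzero on failure */
	size_t nroutes;
	struct model_route routes[MODEL_MAX_ROUTES];
};

/* Mark every route through the gateway as down. */
static void model_mark_routes_down(struct model_gateway *gw)
{
	for (size_t i = 0; i < gw->nroutes; i++)
		gw->routes[i].alive = false;
}

/*
 * Models the completion path after the change. dd_enabled is
 * deliberately not consulted: a discovery error marks the routes
 * down in both the DD-enabled and DD-disabled cases.
 * Returns true if the routes were marked down.
 */
static bool model_discovery_complete(struct model_gateway *gw)
{
	assert(gw != NULL);

	if (gw->dc_error != 0) {
		model_mark_routes_down(gw);
		return true;
	}
	return false;
}

/* Count the routes that are still considered alive. */
static size_t model_live_routes(const struct model_gateway *gw)
{
	size_t live = 0;

	for (size_t i = 0; i < gw->nroutes; i++)
		if (gw->routes[i].alive)
			live++;
	return live;
}
```

In this model, a gateway with dd_enabled set to false and a nonzero
dc_error (standing in for a ping that timed out and was unlinked) ends
up with no live routes after model_discovery_complete(), which is the
outcome the DD-enabled path already produced.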

Test-Parameters: trivial
Fixes: 2832478194 ("LU-13029 lnet: fix asym routing with multi-hop")
HPE-bug-id: LUS-9612
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I009c69d4f8990b72d83d9426c782c0e55c1023a4
Reviewed-on: https://review.whamcloud.com/40923
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lnet/lnet/router.c