LU-17854 lnet: Router should not drop msg past deadline 31/55131/7
author Chris Horn <chris.horn@hpe.com>
Wed, 22 May 2024 19:34:25 +0000 (13:34 -0600)
committer Oleg Drokin <green@whamcloud.com>
Mon, 10 Jun 2024 06:13:23 +0000 (06:13 +0000)
commit 182b0101d3b803173040de2c69e41955fd6728b2
tree f94d04c9521ceac638ece0e02ca8ecbcd73a2e41
parent dccb85fcb80f9536cbb5d54030d3c1d141635603

Messages have been observed to sit queued in LNet on router nodes for
so long that they exceed their message deadlines. These messages are
currently dropped, even if the target peer is alive. PtlRPC adaptive
timeouts can grow dynamically to account for the increased network
latency, but if routers drop the RPCs then these operations still
fail. Routers should drop messages only when the router peer health
feature determines that the target is down. This gives Lustre the
best chance to complete operations during periods of increased
network latency.

A bug in sanity-lnet/do_route_del() is fixed. The lnetctl route show
output was stored in a variable named "output", but the variable
"lnetctl_text" was checked to determine if the route needed to be
deleted.
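
The variable mismatch described above can be illustrated with a minimal
sketch. This is not the code from sanity-lnet.sh: fake_route_show, the
two route_present_* helpers, the gateway NID, and the grep-based check
are all hypothetical stand-ins, assuming only that the output is
captured in "output" while "lnetctl_text" is the variable tested.

```shell
#!/bin/bash
# Hypothetical stand-in for "lnetctl route show"; prints one fixed route.
fake_route_show() {
	echo "route:"
	echo "    - net: tcp1"
	echo "      gateway: 10.0.0.1@tcp"
}

# Buggy pattern: the output is stored in "output", but a different
# variable ("lnetctl_text", never assigned here) is the one searched,
# so the gateway is never found and the delete would be skipped.
route_present_buggy() {
	local gw=$1 output
	output=$(fake_route_show)
	grep -qF -- "$gw" <<< "${lnetctl_text:-}"
}

# Fixed pattern: search the variable that was actually assigned.
route_present_fixed() {
	local gw=$1 output
	output=$(fake_route_show)
	grep -qF -- "$gw" <<< "$output"
}

route_present_buggy "10.0.0.1@tcp" || echo "buggy check: route not seen"
route_present_fixed "10.0.0.1@tcp" && echo "fixed check: route seen"
```

In this sketch the buggy helper fails without any error, because an
unset variable expands to an empty string. A mismatch of this kind can
therefore go unnoticed until leftover routes affect a later test.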

test_102() was also modified to call cleanup_router_test(). A
comment there indicated it was not needed because the routes were
already deleted, but cleanup_router_test() does more than just
delete the route entries: it also unloads modules on all nodes.

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-12153
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I1e6966d4a3a2b10dd7b99620774d5c32b7eccd1f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55131
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lnet/include/lnet/lib-lnet.h
lnet/lnet/lib-move.c
lustre/tests/sanity-lnet.sh