Whamcloud - gitweb
LU-16999 lnet: Restore lpni aliveness check 91/51791/2
author Chris Horn <chris.horn@hpe.com>
Tue, 25 Apr 2023 18:53:46 +0000 (13:53 -0500)
committer Oleg Drokin <green@whamcloud.com>
Thu, 24 Aug 2023 04:34:42 +0000 (04:34 +0000)
commit 993d27d9ecc86bd030ca788bf9249485b11cdf8a
tree ce0830bcf009b69d6ee1d2f61e04741f327a810f
parent e10e1c9b0ae20ee550efb32b220239d35a00d1ac
LU-16999 lnet: Restore lpni aliveness check

This is a revert of the following master change:

Lustre-change: https://review.whamcloud.com/46623/
Lustre-commit: caf6095ade66f70d4bad99ced7a918814a3af092

That patch restored the historic behavior of the LNet router peer
health feature, but it did not account for the fact that the old LNet
router checker behaved differently from the current implementation,
which uses LNet discovery to perform the router checker pings.
Because the pings now go through discovery, we can no longer guarantee
that each router endpoint will be pinged within the peer aliveness
window. As a result, the router may incorrectly determine that some
peer NIs are not alive.

Revert this change until a long-term solution can be found.

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-11604
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I77f4bd64b616693ab2c91c747bf327c6f71689c4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51791
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lnet/lnet/lib-move.c
lustre/tests/sanity-lnet.sh