LU-13569 lnet: Age peer NI out of recovery
author Chris Horn <chris.horn@hpe.com>
Sun, 23 Aug 2020 15:14:22 +0000 (10:14 -0500)
committer Andreas Dilger <adilger@whamcloud.com>
Sat, 23 Mar 2024 20:30:29 +0000 (20:30 +0000)
commit 8b5c3877fd694fe8513cca825624c79cfd90facf
tree fb79fa24f18f495ef90f0d0d36b061f5fa44b0bb
parent 5a52f4d8ad60b2fb7856cda4c35f0e322efbd416

No longer send recovery pings to a peer NI that has been in recovery
for the recovery time limit. A peer NI will become eligible for
recovery again once we receive a message from it.

The existing lpni_last_alive field is reused to track this.
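
The aging rule described above can be sketched roughly as follows. This is an illustrative model only, not the code from this patch: the struct, both function names, and the recovery_limit parameter are hypothetical, and it assumes that lpni_last_alive holds the time (in seconds) of the last message received from the peer and that the limit is measured from that timestamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a peer NI. Only the field name
 * lpni_last_alive is taken from the commit message. */
struct sketch_aging_ni {
	int64_t lpni_last_alive; /* assumed: seconds, time of last message */
};

/* Return true while the peer NI should still receive recovery pings.
 * Assumption: the NI ages out once more than recovery_limit seconds
 * have passed since lpni_last_alive. */
static bool sketch_recovery_eligible(const struct sketch_aging_ni *lpni,
				     int64_t now, int64_t recovery_limit)
{
	assert(lpni != NULL); /* callers are expected to pass a valid NI */
	return now <= lpni->lpni_last_alive + recovery_limit;
}

/* Receiving a message refreshes the timestamp, which makes an aged-out
 * peer NI eligible for recovery again. */
static void sketch_message_received(struct sketch_aging_ni *lpni,
				    int64_t now)
{
	assert(lpni != NULL);
	lpni->lpni_last_alive = now;
}
```

With a limit of 60 seconds and lpni_last_alive = 100, this sketch treats the NI as eligible at time 150 and aged out at time 161, until a new message resets the timestamp.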

A check for NULL lpni is removed from
lnet_handle_remote_failure_locked() because all callers of that
function already ensure the lpni is non-NULL.

lnet_peer_ni_add_to_recoveryq_locked() now takes the recovery queue
as an argument rather than using the_lnet.ln_mt_peerNIRecovq. This
allows the function to be used by lnet_recover_peer_nis().
lnet_peer_ni_add_to_recoveryq_locked() is also modified to take a ref
on the peer NI if it is added to the recovery queue. Previously, it
was the responsibility of callers to take this ref.
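
The interface change in the paragraph above might look something like the sketch below. It is a simplified model under stated assumptions, not the actual LNet implementation: the struct layouts, the plain integer refcount, the singly linked queue, and the name sketch_add_to_recoveryq are all hypothetical, and locking is omitted entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified peer NI with a plain reference count. */
struct sketch_queued_ni {
	int refcount;
	bool on_recovery_queue;
	struct sketch_queued_ni *next;
};

/* Hypothetical recovery queue: a singly linked list head. */
struct sketch_recovery_queue {
	struct sketch_queued_ni *head;
};

/* Add lpni to the queue supplied by the caller instead of a fixed
 * global queue. The reference is taken here, and only when the NI is
 * actually queued, so callers no longer take it themselves.
 * Returns true if the NI was added. */
static bool sketch_add_to_recoveryq(struct sketch_queued_ni *lpni,
				    struct sketch_recovery_queue *queue)
{
	assert(lpni != NULL && queue != NULL);

	if (lpni->on_recovery_queue)
		return false; /* already queued: no extra ref is taken */

	lpni->refcount++;
	lpni->on_recovery_queue = true;
	lpni->next = queue->head;
	queue->head = lpni;
	return true;
}
```

Passing the queue explicitly lets the same helper serve both a global queue and a caller-local one, which is the reuse the commit message describes for lnet_recover_peer_nis().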

Lustre-change: https://review.whamcloud.com/39718
Lustre-commit: cc27201a76574b51dc3ffb37f039b3364cab386d

Test-Parameters: trivial
HPE-bug-id: LUS-9109
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ib4676540ac4bb040690a4fb047236c54eea0e752
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54400
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
lnet/include/lnet/lib-lnet.h
lnet/lnet/lib-move.c
lnet/lnet/lib-msg.c
lnet/lnet/peer.c