LU-13569 lnet: Recover peer NI w/exponential backoff interval
author Chris Horn <chris.horn@hpe.com>
Sun, 23 Aug 2020 15:16:18 +0000 (10:16 -0500)
committer Andreas Dilger <adilger@whamcloud.com>
Sat, 23 Mar 2024 20:30:50 +0000 (20:30 +0000)
commit c82bd3b3d13b7618a3185a6c76127b4de92fc02d
tree fb9a8c51a6e9928cc9179d9535ae25f53f135f70
parent 07427ce089e33f7fd8d305c265e447b95b934477

Perform LNet recovery pings of peer NIs with an exponential backoff
interval.
 - The interval is 2^(number of failed pings) seconds, up to a maximum
   of 900 seconds (15 minutes).
 - When a message is received, the count of failed pings for the
   associated peer NI is reset to 0 so that recovery can happen more
   quickly.
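
The two rules above can be sketched in C as follows. This is a minimal
illustration only, not the code from this change: the struct, the
function names, and the macro are hypothetical, and the sketch assumes
the failure count is a plain unsigned counter kept per peer NI.

```c
#include <limits.h>

/* Cap from the commit message: 900 seconds (15 minutes).
 * The macro name is hypothetical. */
#define RECOVERY_INTERVAL_MAX_SEC 900U

/* Hypothetical per-peer-NI state; the real LNet structures differ. */
struct peer_ni_sketch {
	unsigned int ping_failures;	/* consecutive failed recovery pings */
};

/* Seconds to wait before the next recovery ping:
 * min(2^ping_failures, RECOVERY_INTERVAL_MAX_SEC). */
static unsigned int recovery_interval_sec(unsigned int ping_failures)
{
	/* 2^10 = 1024 already exceeds the cap, so clamp here; this also
	 * keeps the shift below within the width of unsigned int. */
	if (ping_failures >= 10)
		return RECOVERY_INTERVAL_MAX_SEC;

	return 1U << ping_failures;
}

/* A recovery ping failed: lengthen the next interval. */
static void on_ping_failed(struct peer_ni_sketch *p)
{
	if (p->ping_failures < UINT_MAX)	/* saturate, never wrap */
		p->ping_failures++;
}

/* Any message arrived from the peer NI: restart from the shortest
 * interval so that recovery can happen more quickly. */
static void on_message_received(struct peer_ni_sketch *p)
{
	p->ping_failures = 0;
}
```

Under these assumptions the interval grows 1, 2, 4, ... 512 seconds over
the first ten failed pings and stays at 900 seconds afterwards, until a
received message resets the count to 0.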

Lustre-change: https://review.whamcloud.com/39720
Lustre-commit: 917553c537a8860f57a50dc9752e3ac69d06c11c

Test-Parameters: trivial
HPE-bug-id: LUS-9109
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic7e60455015a0236a96010c07fc0ddd02078cf92
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54402
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
lnet/include/lnet/lib-lnet.h
lnet/include/lnet/lib-types.h
lnet/lnet/lib-move.c
lnet/lnet/lib-msg.c
lnet/lnet/peer.c