LU-13569 lnet: Recover peer NI w/exponential backoff interval
author: Chris Horn <chris.horn@hpe.com>
Sun, 23 Aug 2020 15:16:18 +0000 (10:16 -0500)
committer: Oleg Drokin <green@whamcloud.com>
Tue, 30 Mar 2021 04:16:05 +0000 (04:16 +0000)
commit: 917553c537a8860f57a50dc9752e3ac69d06c11c
tree: 1f8cc43ce2fd009b1bb6aacfb9d209bfce0b1b24
parent: 39a169cd02738a13866f3b88fbe3304dc20565d6
LU-13569 lnet: Recover peer NI w/exponential backoff interval

Perform LNet recovery pings of peer NIs with an exponential backoff
interval.
 - The interval is 2^(number of failed pings) seconds, up to a maximum
   of 900 seconds (15 minutes).
 - When a message is received, the count of failed pings for the
   associated peer NI is reset to 0 so that recovery can happen more
   quickly.
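
The backoff rule described above can be sketched roughly as follows. This
is an illustration only, not the code from this patch: the struct, field,
function, and macro names are hypothetical, and it assumes the failed-ping
count is kept per peer NI and that the interval is measured in seconds.

```c
#include <limits.h>

/* Cap from the commit message: 900 seconds (15 minutes). */
#define RECOVERY_INTERVAL_MAX 900U

/* Hypothetical per-peer-NI state; the real LNet structures differ. */
struct peer_ni_state {
	unsigned int ping_count;	/* consecutive failed recovery pings */
};

/* Seconds to wait before the next recovery ping: 2^ping_count, capped. */
static unsigned int next_ping_interval(unsigned int ping_count)
{
	/* 2^10 = 1024 already exceeds the cap, so never shift further. */
	if (ping_count >= 10)
		return RECOVERY_INTERVAL_MAX;
	return 1U << ping_count;
}

/* A recovery ping failed: back off further on the next attempt. */
static void on_ping_failed(struct peer_ni_state *s)
{
	if (s->ping_count < UINT_MAX)
		s->ping_count++;
}

/* Any message from the peer NI resets the backoff to its shortest value. */
static void on_message_received(struct peer_ni_state *s)
{
	s->ping_count = 0;
}
```

Under these assumptions the interval doubles from 1 second up to 512
seconds and then stays at 900 seconds until a message arrives from the
peer NI, at which point it drops back to 1 second.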

Test-Parameters: trivial
HPE-bug-id: LUS-9109
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic7e60455015a0236a96010c07fc0ddd02078cf92
Reviewed-on: https://review.whamcloud.com/39720
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lnet/include/lnet/lib-lnet.h
lnet/include/lnet/lib-types.h
lnet/lnet/lib-move.c
lnet/lnet/lib-msg.c
lnet/lnet/peer.c