LU-13571 lnet: Correct handling of NETWORK_TIMEOUT status
author Chris Horn <chris.horn@hpe.com>
Fri, 11 Sep 2020 18:41:39 +0000 (13:41 -0500)
committer Oleg Drokin <green@whamcloud.com>
Thu, 26 Nov 2020 09:25:38 +0000 (09:25 +0000)
commit ffd4523f2d50ef952112f44ffd524af991b4baed
tree 8ef92cb17de4cfd550d7b0a5987af9e2719836b2
parent 7ca495ec67f474e10352077fc40123e4818b8e69
LU-13571 lnet: Correct handling of NETWORK_TIMEOUT status

The original intent of the LNET_MSG_STATUS_NETWORK_TIMEOUT health
status was to handle cases where the LND cannot tell whether a
failure was caused by the local or the remote NI. In this case, we
want to decrement the health of both the local and the remote NI and
let recovery determine which interface is actually healthy.
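
The intended behaviour can be illustrated with a minimal, self-contained
sketch. This is not the code from lnet/lnet/lib-msg.c: every name prefixed
with sketch_ or SKETCH_ is hypothetical, and the health step of 100 is an
assumption made only for this example. It shows a timeout status lowering
the health of both interfaces, while the unambiguous local and remote error
statuses each lower only one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real LNet definitions. */
enum sketch_status {
	SKETCH_STATUS_OK,
	SKETCH_STATUS_LOCAL_ERROR,     /* failure attributed to the local NI */
	SKETCH_STATUS_REMOTE_ERROR,    /* failure attributed to the remote NI */
	SKETCH_STATUS_NETWORK_TIMEOUT, /* LND cannot tell which side failed */
};

struct sketch_ni {
	int health; /* larger means healthier; never drops below 0 */
};

/* Assumed decrement per failure; chosen for illustration only. */
#define SKETCH_HEALTH_STEP 100

/* Lower an interface's health by one step, clamping at zero. */
static void sketch_dec_health(struct sketch_ni *ni)
{
	assert(ni != NULL);
	if (ni->health > SKETCH_HEALTH_STEP)
		ni->health -= SKETCH_HEALTH_STEP;
	else
		ni->health = 0;
}

/* Apply a message status to the health of the two interfaces involved. */
static void sketch_handle_status(enum sketch_status status,
				 struct sketch_ni *local,
				 struct sketch_ni *remote)
{
	switch (status) {
	case SKETCH_STATUS_LOCAL_ERROR:
		sketch_dec_health(local);
		break;
	case SKETCH_STATUS_REMOTE_ERROR:
		sketch_dec_health(remote);
		break;
	case SKETCH_STATUS_NETWORK_TIMEOUT:
		/* Ambiguous failure: penalise both sides and leave it to
		 * recovery to work out which one is actually healthy. */
		sketch_dec_health(local);
		sketch_dec_health(remote);
		break;
	default:
		break;
	}
}
```

In this sketch, calling sketch_handle_status() with
SKETCH_STATUS_NETWORK_TIMEOUT on two interfaces that both start at a health
of 1000 leaves each of them at 900.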

Test-Parameters: trivial
HPE-bug-id: LUS-9342
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ib00ac260640100123e4e97e9c566289e92fb0b6e
Reviewed-on: https://review.whamcloud.com/39898
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lnet/lnet/lib-msg.c
lustre/tests/sanity-lnet.sh