LU-9120 lnet: handle remote errors in LNet 67/32767/15
author Amir Shehata <amir.shehata@intel.com>
Fri, 22 Jun 2018 17:42:23 +0000 (10:42 -0700)
committer Amir Shehata <ashehata@whamcloud.com>
Fri, 17 Aug 2018 19:55:19 +0000 (19:55 +0000)
commit 76fad19c2deaa72b5b70eff4bf9d84e20a42a74e
tree f6633601c90fee420874aab40ebd43670a48b192
parent 25c1cb2c4d6f4430c8e1be915f5e8742ba16a94c
LU-9120 lnet: handle remote errors in LNet

Add a health value to the peer NI structure. Decrement the
value whenever there is an error sending to the peer.
Modify the selection algorithm to consider the peer NI health
value when selecting the best peer NI to send to.
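
The health-based selection described above can be sketched roughly as follows. This is a minimal illustration, not the code from this patch: the struct, the function names, MAX_HEALTH, and HEALTH_STEP are all hypothetical stand-ins, and the real selection logic in lnet/lnet/lib-move.c may weigh other criteria alongside health.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a peer NI; not the LNet structure. */
struct peer_ni {
	int health;	/* higher means healthier */
	int id;		/* illustrative identifier */
};

#define MAX_HEALTH	1000	/* assumed ceiling for the health value */
#define HEALTH_STEP	100	/* assumed decrement per send error */

/* Lower the health value after a failed send, never going below zero. */
static void peer_ni_send_failed(struct peer_ni *ni)
{
	ni->health -= HEALTH_STEP;
	if (ni->health < 0)
		ni->health = 0;
}

/*
 * Return the candidate with the highest health value.
 * On a tie the earliest candidate wins; returns NULL if count is 0.
 */
static struct peer_ni *select_best_peer_ni(struct peer_ni *nis, size_t count)
{
	struct peer_ni *best = NULL;
	size_t i;

	for (i = 0; i < count; i++) {
		if (best == NULL || nis[i].health > best->health)
			best = &nis[i];
	}
	return best;
}
```

With two peer NIs starting at MAX_HEALTH, one failed send to the first makes the selector prefer the second until the health values are equal again.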

Put the peer NI on the recovery queue whenever there is
an error sending to it. Attempt a resend only on REMOTE
DROPPED, since in that case we're sure the message was never
received by the peer. For other errors, finalize the message.
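
The error-handling policy above might look roughly like the sketch below. The status codes, the action enum, and the peer_state struct are hypothetical names chosen for illustration. They are not the identifiers used in lnet/lnet/lib-msg.c, and the boolean flag merely stands in for placing the peer NI on a real recovery queue.

```c
#include <stdbool.h>

/* Hypothetical send outcomes, for illustration only. */
enum send_status {
	SEND_OK,
	SEND_REMOTE_DROPPED,	/* peer never received the message */
	SEND_REMOTE_ERROR,
	SEND_REMOTE_TIMEOUT
};

/* What the caller should do with the message next. */
enum send_action {
	ACTION_FINALIZE,
	ACTION_RESEND
};

/* Hypothetical per-peer-NI state; the flag stands in for a real queue. */
struct peer_state {
	int health;
	bool on_recovery_queue;
};

static enum send_action handle_send_result(struct peer_state *peer,
					   enum send_status status)
{
	if (status == SEND_OK)
		return ACTION_FINALIZE;

	/* Any send error: lower the health value and queue for recovery. */
	if (peer->health > 0)
		peer->health--;
	peer->on_recovery_queue = true;

	/* Resend only when the peer is known not to have seen the message. */
	return status == SEND_REMOTE_DROPPED ? ACTION_RESEND : ACTION_FINALIZE;
}
```

Restricting the resend to the dropped case avoids delivering a message twice when the peer may already have processed it.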

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ibcb41b3fb538e76b973bcb10fcd07638c118acb9
Reviewed-on: https://review.whamcloud.com/32767
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
lnet/include/lnet/lib-lnet.h
lnet/include/lnet/lib-types.h
lnet/lnet/api-ni.c
lnet/lnet/lib-move.c
lnet/lnet/lib-msg.c
lnet/lnet/peer.c