LU-17505 socklnd: return NETWORK_TIMEOUT to LNet on ETIMEOUT 30/53930/2
author Serguei Smirnov <ssmirnov@whamcloud.com>
Mon, 5 Feb 2024 23:27:15 +0000 (15:27 -0800)
committer Oleg Drokin <green@whamcloud.com>
Fri, 23 Feb 2024 07:16:04 +0000 (07:16 +0000)
commit 099350d6e30218eb68d31cbfc7e9252a112e591f
tree 6b8c457ccf407f61b2f9b6d36881eaab6f3a60fd
parent dba41355565397228f587f13a901b5d762521ed0
LU-17505 socklnd: return NETWORK_TIMEOUT to LNet on ETIMEOUT

Returning LNET_MSG_STATUS_LOCAL_TIMEOUT to LNet on ETIMEDOUT
causes LNet to only decrement the local NI health score,
while the issue may actually be with the remote NI.

Changing this to return LNET_MSG_STATUS_NETWORK_TIMEOUT
causes LNet to decrement both local NI and peer NI health.
If the local NI is OK, it will recover its health score quickly,
but the affected peer NI's health stays lowered until the peer NI recovers.
This helps LNet select healthy NIs of the same peer in the meantime.
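
The effect described above can be illustrated with a small user-space sketch.
This is not the socklnd or LNet code: every sketch_* and SKETCH_* name, the
maximum health value, the decrement step, and the "pick the highest health"
selection rule are assumptions made for illustration only. Health recovery
over time is not modeled.

```c
#include <assert.h>
#include <errno.h>

/* Stand-ins for the two status values named in the commit message.
 * Assumption: the real LNet enum has more members than shown here. */
enum sketch_hstatus {
	SKETCH_STATUS_OK,
	SKETCH_STATUS_LOCAL_TIMEOUT,
	SKETCH_STATUS_NETWORK_TIMEOUT,
	SKETCH_STATUS_OTHER,		/* anything this sketch does not model */
};

#define SKETCH_HEALTH_MAX	1000	/* assumed maximum health score */
#define SKETCH_HEALTH_STEP	100	/* assumed decrement per failure */

struct sketch_ni {
	int health;
};

/* Map a negative errno to a health status.
 * after_change == 0 models the old behavior, 1 the behavior after the patch. */
static enum sketch_hstatus sketch_errno_to_hstatus(int rc, int after_change)
{
	switch (rc) {
	case 0:
		return SKETCH_STATUS_OK;
	case -ETIMEDOUT:
		return after_change ? SKETCH_STATUS_NETWORK_TIMEOUT
				    : SKETCH_STATUS_LOCAL_TIMEOUT;
	default:
		return SKETCH_STATUS_OTHER;
	}
}

/* Lower one NI's health by a fixed step, never going below zero. */
static void sketch_dec_health(struct sketch_ni *ni)
{
	ni->health -= SKETCH_HEALTH_STEP;
	if (ni->health < 0)
		ni->health = 0;
}

/* LOCAL_TIMEOUT penalizes only the local NI;
 * NETWORK_TIMEOUT penalizes both the local NI and the peer NI. */
static void sketch_apply_status(enum sketch_hstatus st,
				struct sketch_ni *local,
				struct sketch_ni *peer)
{
	switch (st) {
	case SKETCH_STATUS_LOCAL_TIMEOUT:
		sketch_dec_health(local);
		break;
	case SKETCH_STATUS_NETWORK_TIMEOUT:
		sketch_dec_health(local);
		sketch_dec_health(peer);
		break;
	case SKETCH_STATUS_OK:
	case SKETCH_STATUS_OTHER:
		break;
	}
}

/* Return the index of the peer NI with the highest health score;
 * ties go to the lowest index. */
static int sketch_pick_peer_ni(const struct sketch_ni *nis, int count)
{
	int best = 0;

	assert(count > 0);
	for (int i = 1; i < count; i++)
		if (nis[i].health > nis[best].health)
			best = i;
	return best;
}
```

In this model, a timeout reported the old way leaves the unresponsive peer NI
at full health, so sketch_pick_peer_ni() has no reason to avoid it. Reported
the new way, the same timeout lowers that peer NI's score and the selection
moves to another NI of the same peer.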

Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I916772477d1fd63571447262880a33830746f002
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53930
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lnet/klnds/socklnd/socklnd_cb.c