LU-17505 socklnd: return NETWORK_TIMEOUT to LNet on ETIMEDOUT
author Serguei Smirnov <ssmirnov@whamcloud.com>
Mon, 5 Feb 2024 23:27:15 +0000 (15:27 -0800)
committer Andreas Dilger <adilger@whamcloud.com>
Sat, 24 Feb 2024 03:44:05 +0000 (03:44 +0000)
commit 2417dec0362fd54a80b83705e584c6f635749796
tree 118a804e1e8e072a87696de094abf6872dac5455
parent 36644f0e3fc8cce14b4b62ed82ea7f7b075789d6
LU-17505 socklnd: return NETWORK_TIMEOUT to LNet on ETIMEDOUT

Returning LNET_MSG_STATUS_LOCAL_TIMEOUT to LNet on ETIMEDOUT
causes LNet to only decrement the local NI health score,
while the issue may actually be with the remote NI.

Changing this to return LNET_MSG_STATUS_NETWORK_TIMEOUT
causes LNet to decrement both local NI and peer NI health.
If the local NI is ok, it will recover its health score quickly,
but the affected peer NI's health stays lowered until the peer NI recovers.
This helps LNet select healthy NIs of the same peer in the meantime.
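
The effect described above can be modelled with a small standalone sketch.
This is not the socklnd or LNet code: the enum, struct, function names and
the penalty argument are hypothetical stand-ins, and only the two behaviours
stated in this message are modelled (local-only decrement versus local plus
peer decrement). Health recovery is deliberately left out.

```c
#include <errno.h>

/* Hypothetical stand-ins for the statuses named in the commit message.
 * These are NOT the real LNet definitions. */
enum sketch_hstatus {
	SKETCH_OTHER,            /* any case this sketch does not model */
	SKETCH_LOCAL_TIMEOUT,    /* models LNET_MSG_STATUS_LOCAL_TIMEOUT */
	SKETCH_NETWORK_TIMEOUT,  /* models LNET_MSG_STATUS_NETWORK_TIMEOUT */
};

/* Hypothetical pair of health scores: one local NI, one peer NI. */
struct sketch_health {
	int local_ni;
	int peer_ni;
};

/* Mapping after the change: a kernel-style negative ETIMEDOUT return
 * code is reported as a network timeout. */
static enum sketch_hstatus sketch_status_for_rc(int rc)
{
	return rc == -ETIMEDOUT ? SKETCH_NETWORK_TIMEOUT : SKETCH_OTHER;
}

/* Apply the health decrement that the message attributes to each status. */
static void sketch_apply(struct sketch_health *h, enum sketch_hstatus st,
			 int penalty)
{
	switch (st) {
	case SKETCH_LOCAL_TIMEOUT:
		/* old behaviour: only the local NI is penalised */
		h->local_ni -= penalty;
		break;
	case SKETCH_NETWORK_TIMEOUT:
		/* new behaviour: local NI and peer NI are both penalised */
		h->local_ni -= penalty;
		h->peer_ni -= penalty;
		break;
	default:
		break;
	}
}
```

In this model, the old status leaves the peer NI score untouched, so the
peer NI still looks as healthy as its siblings. With the new status its
score drops as well, which is what lets a selection step prefer other NIs
of the same peer.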

Lustre-change: https://review.whamcloud.com/53930
Lustre-commit: 099350d6e30218eb68d31cbfc7e9252a112e591f

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I916772477d1fd63571447262880a33830746f002
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53964
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
lnet/klnds/socklnd/socklnd_cb.c