LU-16563 lnet: use discovered ni status to set initial health 27/50027/8
author     Serguei Smirnov <ssmirnov@whamcloud.com>
           Thu, 16 Feb 2023 18:34:03 +0000 (10:34 -0800)
committer  Oleg Drokin <green@whamcloud.com>
           Tue, 28 Mar 2023 22:15:33 +0000 (22:15 +0000)
commit     da230373bd14306cb97fb48748ebce205f09d468
tree       271edf1f0e9caf028d51963a12b3eec715e6fa76
parent     0366422cfd1e972978d2617d174240656cf07f77
LU-16563 lnet: use discovered ni status to set initial health

If not routing, track local NI status in the ping buffer so
that a locally detected "down" state (for example, due to a
downed network interface or link) is visible to any
discovering peer.
If an NI's 'fatal' status changes, push an update to peers.
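The local-side behaviour described above can be sketched in C. This is a minimal illustration, not the code from this commit. Every identifier prefixed with `sketch_` is hypothetical, and so are the status values; real LNet uses its own ping-buffer structures and constants. The sketch also assumes a push is only worth sending when the ping buffer actually changed. In this model that happens only in the non-routing case and only when the fatal state flips.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; the real LNet ping-buffer encoding differs. */
enum sketch_ni_status { SKETCH_NI_UP = 1, SKETCH_NI_DOWN = 2 };

/* Hypothetical per-NI entry published in the ping buffer. */
struct sketch_ping_entry {
	uint64_t nid;
	uint32_t status;
};

/* Hypothetical ping buffer; 'pushes' counts simulated pushes to peers. */
struct sketch_ping_buf {
	struct sketch_ping_entry entries[8];
	size_t nentries;
	unsigned int pushes;
};

/* Hypothetical local NI: 'fatal_error' is set when the link/interface is down. */
struct sketch_ni {
	uint64_t nid;
	bool fatal_error;
	size_t ping_idx;	/* index of this NI's entry in the ping buffer */
};

/* Stand-in for sending the updated ping buffer to known peers. */
static void sketch_push_to_peers(struct sketch_ping_buf *pb)
{
	pb->pushes++;
}

/*
 * Record a new fatal state for a local NI. When not routing, mirror it
 * into the ping buffer and push to peers, but only if the state changed.
 */
static void sketch_set_ni_fatal(struct sketch_ni *ni, struct sketch_ping_buf *pb,
				bool fatal, bool routing)
{
	assert(ni->ping_idx < pb->nentries);

	if (ni->fatal_error == fatal)
		return;		/* no change, nothing to publish */
	ni->fatal_error = fatal;

	if (routing)
		return;		/* this sketch leaves router behaviour untouched */

	pb->entries[ni->ping_idx].status = fatal ? SKETCH_NI_DOWN : SKETCH_NI_UP;
	sketch_push_to_peers(pb);
}
```

Gating the push on an actual state change keeps a flapping fatal flag from generating more traffic than the number of real transitions.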

On the active side of discovery, check the status of each peer
NI; if an NI is down, decrement its health score and queue it
for recovery.
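The active-side step can be sketched in the same way. This is an illustrative model only. The `sketch_` identifiers, the maximum health of 1000, and the decrement of 100 are assumptions, not values taken from this commit. A boolean flag stands in for membership in a recovery queue. Each down entry in a discovery reply is matched to a peer NI by NID. That peer NI then has its health lowered (clamped at zero) and is marked for recovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes and health limits, not LNet's actual values. */
enum sketch_ni_status { SKETCH_NI_UP = 1, SKETCH_NI_DOWN = 2 };
#define SKETCH_MAX_HEALTH 1000
#define SKETCH_HEALTH_SENSITIVITY 100

/* Hypothetical per-NI entry as reported in a discovery ping reply. */
struct sketch_ping_entry {
	uint64_t nid;
	uint32_t status;
};

/* Hypothetical peer NI with a health score and a recovery flag. */
struct sketch_peer_ni {
	uint64_t nid;
	int health;		/* 0..SKETCH_MAX_HEALTH */
	bool in_recovery;	/* stands in for recovery-queue membership */
};

/* Lower health by the sensitivity (clamped at 0) and queue for recovery. */
static void sketch_peer_ni_mark_down(struct sketch_peer_ni *lpni)
{
	assert(lpni->health >= 0 && lpni->health <= SKETCH_MAX_HEALTH);

	lpni->health -= SKETCH_HEALTH_SENSITIVITY;
	if (lpni->health < 0)
		lpni->health = 0;
	lpni->in_recovery = true;
}

/*
 * Apply the statuses from a discovery reply to the matching peer NIs.
 * Returns how many peer NIs were reported down and penalized.
 */
static size_t sketch_apply_ping_reply(const struct sketch_ping_entry *ents,
				      size_t nents,
				      struct sketch_peer_ni *lpnis,
				      size_t nlpnis)
{
	size_t ndown = 0;

	for (size_t i = 0; i < nents; i++) {
		if (ents[i].status != SKETCH_NI_DOWN)
			continue;
		for (size_t j = 0; j < nlpnis; j++) {
			if (lpnis[j].nid == ents[i].nid) {
				sketch_peer_ni_mark_down(&lpnis[j]);
				ndown++;
				break;
			}
		}
	}
	return ndown;
}
```

Because health starts below the maximum for an NI already known to be down, path selection can prefer healthy NIs from the outset instead of discovering the failure through timeouts.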

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I513c7942099c0da9088fa6d4460f76386ea91d3b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50027
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
lnet/include/lnet/lib-lnet.h
lnet/klnds/o2iblnd/o2iblnd.c
lnet/klnds/socklnd/socklnd.c
lnet/lnet/api-ni.c
lnet/lnet/peer.c