Whamcloud - gitweb
LU-17578 lnet: fix &the_lnet.ln_mt_peerNIRecovq race 63/54163/8
author Bruno Faccini <bfaccini@nvidia.com>
Fri, 23 Feb 2024 12:16:36 +0000 (13:16 +0100)
committer Oleg Drokin <green@whamcloud.com>
Wed, 13 Mar 2024 03:21:41 +0000 (03:21 +0000)
To avoid a race, &the_lnet.ln_mt_peerNIRecovq must always be
accessed under lnet_net_lock(0) protection.

Test-Parameters: trivial
Fixes: da23037 ("LU-16563 lnet: use discovered ni status to set initial health")
Change-Id: Ic5e0194020200afdecba4cbf5afed274b14da388
Signed-off-by: Bruno Faccini <bfaccini@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54163
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
lnet/lnet/peer.c

index 5c3f3d3..f890d3f 100644
@@ -3191,9 +3191,11 @@ int ping_info_count_entries(struct lnet_ping_buffer *pbuf)
 
 static inline void handle_disc_lpni_health(struct lnet_peer_ni *lpni)
 {
-       if (lpni->lpni_ns_status == LNET_NI_STATUS_DOWN)
+       if (lpni->lpni_ns_status == LNET_NI_STATUS_DOWN) {
+               lnet_net_lock(0);
                lnet_handle_remote_failure_locked(lpni);
-       else if (lpni->lpni_ns_status == LNET_NI_STATUS_UP &&
+               lnet_net_unlock(0);
+       } else if (lpni->lpni_ns_status == LNET_NI_STATUS_UP &&
                 !lpni->lpni_last_alive)
                atomic_set(&lpni->lpni_healthv, LNET_MAX_HEALTH_VALUE);
 }
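The fix follows the usual kernel convention that a `*_locked` helper must be called with the relevant lock already held, so the caller wraps `lnet_handle_remote_failure_locked()` in `lnet_net_lock(0)`/`lnet_net_unlock(0)`. The following is a minimal userspace sketch of that pattern, assuming a pthread mutex as a stand-in for `lnet_net_lock(0)` and a singly linked list as a stand-in for `the_lnet.ln_mt_peerNIRecovq`. Every name in it (`net_lock`, `recovery_queue`, `handle_remote_failure_locked`, `handle_disc_health`, `run_demo`) is hypothetical and not an LNet API, and the failure helper is a simplified model: it only lowers health and queues the peer once. Several threads report the same peers at the same time. Because the check-then-insert on the queue happens under the lock, each DOWN peer ends up queued exactly once.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model: none of these names are LNet APIs. */
#define MAX_HEALTH 1000
#define INIT_HEALTH 500
#define NPEERS 64
#define NTHREADS 8

enum ni_status { NI_STATUS_UP, NI_STATUS_DOWN };

struct peer_ni {
	enum ni_status ns_status;
	long last_alive;
	atomic_int healthv;
	bool queued;            /* protected by net_lock */
	struct peer_ni *next;   /* recovery-queue link, protected by net_lock */
};

/* Stand-ins for lnet_net_lock(0) and the_lnet.ln_mt_peerNIRecovq. */
static pthread_mutex_t net_lock = PTHREAD_MUTEX_INITIALIZER;
static struct peer_ni *recovery_queue;
static int recovery_len;

/* Caller must hold net_lock (the "_locked" suffix convention). */
static void handle_remote_failure_locked(struct peer_ni *p)
{
	/* Rough lockdep analog: trylock on an already-locked mutex returns EBUSY. */
	assert(pthread_mutex_trylock(&net_lock) == EBUSY);

	int h = atomic_load(&p->healthv);
	if (h > 0)
		atomic_store(&p->healthv, h - 1);

	if (!p->queued) {       /* check-then-insert is only safe under the lock */
		p->queued = true;
		p->next = recovery_queue;
		recovery_queue = p;
		recovery_len++;
	}
}

/* Mirrors the fixed handle_disc_lpni_health(): take the lock around the helper. */
static void handle_disc_health(struct peer_ni *p)
{
	if (p->ns_status == NI_STATUS_DOWN) {
		pthread_mutex_lock(&net_lock);
		handle_remote_failure_locked(p);
		pthread_mutex_unlock(&net_lock);
	} else if (p->ns_status == NI_STATUS_UP && !p->last_alive) {
		atomic_store(&p->healthv, MAX_HEALTH);
	}
}

static struct peer_ni peers[NPEERS];

/* Each worker reports the discovered status of every peer. */
static void *worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < NPEERS; i++)
		handle_disc_health(&peers[i]);
	return NULL;
}

/* Odd-indexed peers are DOWN; returns the recovery-queue length afterwards. */
static int run_demo(void)
{
	pthread_t tids[NTHREADS];

	for (int i = 0; i < NPEERS; i++) {
		peers[i].ns_status = (i % 2) ? NI_STATUS_DOWN : NI_STATUS_UP;
		peers[i].last_alive = 0;
		atomic_init(&peers[i].healthv, INIT_HEALTH);
		peers[i].queued = false;
		peers[i].next = NULL;
	}
	recovery_queue = NULL;
	recovery_len = 0;

	for (int t = 0; t < NTHREADS; t++)
		pthread_create(&tids[t], NULL, worker, NULL);
	for (int t = 0; t < NTHREADS; t++)
		pthread_join(tids[t], NULL);
	return recovery_len;
}
```

Without the lock in `handle_disc_health()`, two threads could both see `queued == false` and link the same peer twice, or lose an insertion. In the kernel, that kind of list corruption is what the commit guards against. In the model, `run_demo()` should return `NPEERS / 2`, and each DOWN peer's health should drop by one per reporting thread.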