LU-13569 lnet: Only recover known good peer NIs
author Chris Horn <chris.horn@hpe.com>
Thu, 16 Jul 2020 03:38:52 +0000 (22:38 -0500)
committer Oleg Drokin <green@whamcloud.com>
Tue, 30 Mar 2021 04:15:59 +0000 (04:15 +0000)
A peer NI should not be eligible for recovery if we've never
received a message from it.

Test-Parameters: trivial
HPE-bug-id: LUS-9109
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iec2fd015f6410ab91c6ef7c222cbed0204243106
Reviewed-on: https://review.whamcloud.com/39719
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
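
The eligibility rule described in the commit message can be sketched in isolation. This is a minimal illustration, not the LNet code: `struct sketch_peer_ni`, `SKETCH_MAX_HEALTH`, and `sketch_recovery_eligible()` are hypothetical stand-ins, timestamps are plain integers, and the sketch assumes the aged-out branch returns without queuing, as its debug message suggests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for LNET_MAX_HEALTH_VALUE; the value is illustrative. */
#define SKETCH_MAX_HEALTH 1000

/* Hypothetical, reduced model of a peer NI; not struct lnet_peer_ni. */
struct sketch_peer_ni {
	int     healthv;    /* current health value */
	int64_t last_alive; /* time a message was last received, 0 if never */
};

/*
 * Return true if the peer NI would be considered for the recovery queue.
 * The checks are applied in the same order as in the patched function.
 */
static bool
sketch_recovery_eligible(const struct sketch_peer_ni *lpni,
			 int64_t now, int64_t recovery_limit)
{
	/* A fully healthy peer NI needs no recovery. */
	if (lpni->healthv == SKETCH_MAX_HEALTH)
		return false;

	/* Check modeled on this change: no message was ever received. */
	if (!lpni->last_alive)
		return false;

	/* Pre-existing check: the peer NI has been silent for too long. */
	if (now > lpni->last_alive + recovery_limit)
		return false;

	return true;
}
```

Under these assumptions, a peer NI with `last_alive == 0` would pass the aged-out test whenever `now` does not exceed `recovery_limit`, so without the explicit zero check it could be queued for recovery despite never having been heard from.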
lnet/lnet/peer.c

index a1d2552..903bf78 100644
@@ -4017,6 +4017,14 @@ lnet_peer_ni_add_to_recoveryq_locked(struct lnet_peer_ni *lpni,
        if (atomic_read(&lpni->lpni_healthv) == LNET_MAX_HEALTH_VALUE)
                return;
 
+       if (!lpni->lpni_last_alive) {
+               CDEBUG(D_NET,
+                      "lpni %s(%p) not eligible for recovery last alive %lld\n",
+                      libcfs_nid2str(lpni->lpni_nid), lpni,
+                      lpni->lpni_last_alive);
+               return;
+       }
+
        if (now > lpni->lpni_last_alive + lnet_recovery_limit) {
                CDEBUG(D_NET, "lpni %s aged out last alive %lld\n",
                       libcfs_nid2str(lpni->lpni_nid),