LU-15102 lnet: Reset ni_ping_count only on receive 35/45235/3
author Chris Horn <chris.horn@hpe.com>
Wed, 13 Oct 2021 23:30:01 +0000 (18:30 -0500)
committer Oleg Drokin <green@whamcloud.com>
Sat, 20 Nov 2021 06:26:14 +0000 (06:26 +0000)
The ni_ping_count field of struct lnet_ni is currently reset on every
(healthy) tx. It should only be reset when a message is received over
the NI. The reset is done under net_lock 0, and taking that lock on
every tx results in a performance loss for certain workloads.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 8fdf2bc62a ("LU-13569 lnet: Recover local NI w/exponential backoff interval")
HPE-bug-id: LUS-10427
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I67ea3aa977cb5d67b04f7957120c29e9985c83e6
Reviewed-on: https://review.whamcloud.com/45235
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
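
The effect described in the commit message can be illustrated with a small standalone model. This is only a sketch, not the LNet code: every `sketch_*` name is hypothetical, the structures carry just the one field relevant here, and a plain counter stands in for net_lock 0 so that lock acquisitions can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct lnet_ni and struct lnet_peer_ni. */
struct sketch_ni   { unsigned int ping_count; };
struct sketch_lpni { unsigned int ping_count; };

/* Counts how often the simulated net_lock 0 is taken (assumption: the
 * real lock is replaced by a counter for illustration only). */
static unsigned long sketch_lock_acquisitions;

static void sketch_net_lock(void)   { sketch_lock_acquisitions++; }
static void sketch_net_unlock(void) { /* nothing to release in the model */ }

/* Models the healthy-completion branch after this change: the lock is
 * taken, and both ping counts are reset, only when a message was
 * received (rx_committed) and a peer NI is known. A plain healthy tx
 * leaves ni->ping_count alone and never touches the lock. */
static void sketch_health_ok(struct sketch_ni *ni,
                             struct sketch_lpni *lpni,
                             bool rx_committed)
{
    assert(ni != NULL);

    if (lpni && rx_committed) {
        sketch_net_lock();
        lpni->ping_count = 0;
        ni->ping_count = 0;
        sketch_net_unlock();
    }
}
```

Calling `sketch_health_ok()` with `rx_committed` set to false models a tx-only completion: the counter does not move and the ping counts keep their values, which is the contention that the change removes from the tx path.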
lnet/lnet/lib-msg.c

index 607001b..1ed75cd 100644
@@ -890,8 +890,6 @@ lnet_health_check(struct lnet_msg *msg)
                 * faster recovery.
                 */
                lnet_inc_healthv(&ni->ni_healthv, lnet_health_sensitivity);
-               lnet_net_lock(0);
-               ni->ni_ping_count = 0;
                /*
                 * It's possible msg_txpeer is NULL in the LOLND
                 * case. Only increment the peer's health if we're
@@ -901,7 +899,9 @@ lnet_health_check(struct lnet_msg *msg)
                 * as indication that the router is fully healthy.
                 */
                if (lpni && msg->msg_rx_committed) {
+                       lnet_net_lock(0);
                        lpni->lpni_ping_count = 0;
+                       ni->ni_ping_count = 0;
                        /*
                         * If we're receiving a message from the router or
                         * I'm a router, then set that lpni's health to
@@ -927,8 +927,8 @@ lnet_health_check(struct lnet_msg *msg)
                                                &the_lnet.ln_mt_peerNIRecovq,
                                                ktime_get_seconds());
                        }
+                       lnet_net_unlock(0);
                }
-               lnet_net_unlock(0);
 
                /* we can finalize this message */
                return -1;