LU-12163 lnet: fix cpt locking 07/34607/9
author Amir Shehata <ashehata@whamcloud.com>
Sat, 6 Apr 2019 00:38:38 +0000 (17:38 -0700)
committer Amir Shehata <ashehata@whamcloud.com>
Fri, 7 Jun 2019 18:06:44 +0000 (18:06 +0000)
In lnet_select_pathway() the call to lnet_handle_send_case_locked()
can change sd_cpt. If that function returns REPEAT_SEND, we go back
to the again label, where discovery may be initiated, which unlocks
the cpt. If the local cpt isn't updated first, we could be
manipulating the wrong cpt, resulting in corruption or deadlock.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ifd39b0d84f8cce859151f7cc900a082481dd7218
Reviewed-on: https://review.whamcloud.com/34607
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
diff --git a/lnet/lnet/lib-move.c b/lnet/lnet/lib-move.c
index 367de23..8e8734b 100644
--- a/lnet/lnet/lib-move.c
+++ b/lnet/lnet/lib-move.c
@@ -2625,10 +2625,16 @@ again:
 
        rc = lnet_handle_send_case_locked(&send_data);
 
+       /*
+        * Update the local cpt since send_data.sd_cpt might've been
+        * updated as a result of calling lnet_handle_send_case_locked().
+        */
+       cpt = send_data.sd_cpt;
+
        if (rc == REPEAT_SEND)
                goto again;
 
-       lnet_net_unlock(send_data.sd_cpt);
+       lnet_net_unlock(cpt);
 
        return rc;
 }
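
The hazard described in the commit message can be reproduced in a small toy model. The sketch below is not LNet code: every toy_* name, TOY_NCPT, and TOY_REPEAT_SEND is hypothetical, the per-partition locks are plain flags, and the way the stand-in handler moves the lock to another partition is an assumption made only so the stale-index effect is visible. With sync_cpt set, the local cpt is refreshed after the handler call, as in the patch; without it, the retry path acts on the old partition.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NCPT        4	/* hypothetical number of CPU partitions */
#define TOY_REPEAT_SEND 1	/* stand-in for REPEAT_SEND */

/* Toy per-partition lock state: true means "held". */
static bool toy_held[TOY_NCPT];
/* Number of unlock calls made on a partition that was not held. */
static int toy_bad_unlocks;

struct toy_send_data {
	int sd_cpt;		/* partition the send path currently holds */
};

static void toy_lock(int cpt)
{
	assert(cpt >= 0 && cpt < TOY_NCPT);
	toy_held[cpt] = true;
}

static void toy_unlock(int cpt)
{
	assert(cpt >= 0 && cpt < TOY_NCPT);
	if (!toy_held[cpt])
		toy_bad_unlocks++;
	toy_held[cpt] = false;
}

/*
 * Stand-in for lnet_handle_send_case_locked(). Assumption of this toy:
 * on the first attempt it moves the held lock to the next partition,
 * records that in sd_cpt, and asks the caller to retry.
 */
static int toy_handle_send_case_locked(struct toy_send_data *sd, int attempt)
{
	if (attempt == 0) {
		int new_cpt = (sd->sd_cpt + 1) % TOY_NCPT;

		toy_unlock(sd->sd_cpt);
		toy_lock(new_cpt);
		sd->sd_cpt = new_cpt;
		return TOY_REPEAT_SEND;
	}
	return 0;
}

/*
 * Toy version of the retry loop. Returns the number of mismatched
 * unlocks plus the number of partitions still locked on exit, so 0
 * means the lock/unlock calls were balanced.
 */
static int toy_select_pathway(bool sync_cpt)
{
	struct toy_send_data sd = { .sd_cpt = 0 };
	int cpt = sd.sd_cpt;
	int attempt = 0;
	int leaked = 0;
	int rc, i;

	for (i = 0; i < TOY_NCPT; i++)
		toy_held[i] = false;
	toy_bad_unlocks = 0;

	toy_lock(cpt);
again:
	if (attempt > 0) {
		/* Stand-in for discovery: drops and retakes the lock
		 * through the local cpt. */
		toy_unlock(cpt);
		toy_lock(cpt);
	}

	rc = toy_handle_send_case_locked(&sd, attempt++);

	/* The fix: refresh the local cpt before any retry. */
	if (sync_cpt)
		cpt = sd.sd_cpt;

	if (rc == TOY_REPEAT_SEND)
		goto again;

	/* Patched code unlocks via cpt; the old code used sd_cpt. */
	toy_unlock(sync_cpt ? cpt : sd.sd_cpt);

	for (i = 0; i < TOY_NCPT; i++)
		if (toy_held[i])
			leaked++;

	return toy_bad_unlocks + leaked;
}
```

In this model, toy_select_pathway(true) reports balanced locking, while toy_select_pathway(false) reports both an unlock of a partition that was not held and a partition left locked on exit.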