LU-12163 lnet: fix cpt locking
author Amir Shehata <ashehata@whamcloud.com>
Sat, 6 Apr 2019 00:38:38 +0000 (17:38 -0700)
committer Oleg Drokin <green@whamcloud.com>
Thu, 12 Sep 2019 03:50:50 +0000 (03:50 +0000)
In lnet_select_pathway(), the call to lnet_handle_send_case_locked()
can change send_data.sd_cpt. If that function returns REPEAT_SEND,
execution jumps back to the again label, where discovery may be
initiated, and discovery unlocks the cpt. If the local cpt variable
has not been updated, the wrong cpt may be unlocked or otherwise
manipulated, which can result in corruption or deadlock.

Lustre-change: https://review.whamcloud.com/34607
Lustre-commit: f6d63067e1ec00009b9da5cdb263fe14e7e503e1

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ifd39b0d84f8cce859151f7cc900a082481dd7218
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36032
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lnet/lnet/lib-move.c

index 2576fc0..ad86638 100644
@@ -2622,10 +2622,16 @@ again:
 
        rc = lnet_handle_send_case_locked(&send_data);
 
+       /*
+        * Update the local cpt since send_data.sd_cpt might've been
+        * updated as a result of calling lnet_handle_send_case_locked().
+        */
+       cpt = send_data.sd_cpt;
+
        if (rc == REPEAT_SEND)
                goto again;
 
-       lnet_net_unlock(send_data.sd_cpt);
+       lnet_net_unlock(cpt);
 
        return rc;
 }
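
The stale-cpt hazard described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the LNet implementation: every toy_* name, the boolean "locks", and the way the toy handler migrates from cpt 0 to cpt 1 are hypothetical stand-ins chosen for illustration. Only the shape of the again loop and the cpt = send_data.sd_cpt resynchronization come from the patch above.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NCPTS       4   /* hypothetical number of CPU partitions */
#define TOY_REPEAT_SEND 1   /* stand-in for REPEAT_SEND */

static bool toy_locked[TOY_NCPTS]; /* true while the toy per-cpt lock is held */
static int toy_bad_unlocks;        /* unlocks of a lock that was not held */

struct toy_send_data {
	int sd_cpt;                /* cpt whose lock is currently held */
};

static void toy_net_lock(int cpt)
{
	assert(!toy_locked[cpt]);
	toy_locked[cpt] = true;
}

static void toy_net_unlock(int cpt)
{
	if (!toy_locked[cpt])
		toy_bad_unlocks++; /* record the error instead of crashing */
	toy_locked[cpt] = false;
}

/*
 * Hypothetical stand-in for lnet_handle_send_case_locked(): when called
 * with cpt 0 it moves the lock to cpt 1, records that in sd_cpt, and
 * asks the caller to repeat. Otherwise it reports success.
 */
static int toy_handle_send_case_locked(struct toy_send_data *sd)
{
	if (sd->sd_cpt == 0) {
		toy_net_unlock(0);
		toy_net_lock(1);
		sd->sd_cpt = 1;
		return TOY_REPEAT_SEND;
	}
	return 0;
}

/*
 * Toy version of the select-pathway loop. Returns the number of bad
 * unlocks plus the number of locks still held on exit (0 means clean).
 * resync_cpt models the patch; discover_on_repeat models a discovery
 * path at the again label that unlocks through the local cpt.
 */
static int toy_select_pathway(bool resync_cpt, bool discover_on_repeat)
{
	struct toy_send_data sd;
	int cpt = 0, pass = 0, problems, rc, i;

	for (i = 0; i < TOY_NCPTS; i++)
		toy_locked[i] = false;
	toy_bad_unlocks = 0;

	toy_net_lock(cpt);
	sd.sd_cpt = cpt;
again:
	if (pass > 0 && discover_on_repeat) {
		toy_net_unlock(cpt); /* wrong lock if cpt is stale */
		goto out;
	}
	pass++;
	rc = toy_handle_send_case_locked(&sd);
	if (resync_cpt)
		cpt = sd.sd_cpt;     /* the fix: follow sd_cpt */
	if (rc == TOY_REPEAT_SEND)
		goto again;
	/* pre-patch code unlocked sd_cpt here, patched code unlocks cpt */
	toy_net_unlock(resync_cpt ? cpt : sd.sd_cpt);
out:
	problems = toy_bad_unlocks;
	for (i = 0; i < TOY_NCPTS; i++)
		if (toy_locked[i])
			problems++;
	return problems;
}
```

Calling toy_select_pathway(false, true) from a test harness returns 2 in this model: the discovery path unlocks the stale cpt 0, which is not held, and the lock on cpt 1 is never released. With resync_cpt set to true the same call returns 0. When discover_on_repeat is false both variants return 0, which matches the commit message's point that the problem only appears when the again path acts on the local cpt.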