LU-14027 ldlm: Do not wait for lock replay sending if import disconnected
author Oleg Drokin <green@whamcloud.com>
Fri, 16 Oct 2020 14:25:58 +0000 (10:25 -0400)
committer Oleg Drokin <green@whamcloud.com>
Thu, 19 Nov 2020 10:20:40 +0000 (10:20 +0000)
If the import disconnected while we were preparing to send some lock
replays, the replay RPC would get stuck on the sending list. That would
keep the reconnected import from progressing from the REPLAY state
to the REPLAY_LOCKS state, because the import would be waiting for
the queued replay RPCs to finish.

Set them as no_delay to ensure they don't wait.

LU-13600 exacerbated this issue somewhat, but it certainly existed
before it as well.

Change-Id: Id276a0be7657d9ad6cf40ad8e7a165d5cd341cb8
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40272
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>

index 23abc07..46b3751 100644
@@ -2482,6 +2482,8 @@ static int replay_one_lock(struct obd_import *imp, struct ldlm_lock *lock)
        /* We're part of recovery, so don't wait for it. */
        req->rq_send_state = LUSTRE_IMP_REPLAY_LOCKS;
+       /* If the state changed while we were prepared, don't wait */
+       req->rq_no_delay = 1;
        body = req_capsule_client_get(&req->rq_pill, &RMF_DLM_REQ);
        ldlm_lock2desc(lock, &body->lock_desc);
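
The effect of the flag can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the Lustre send path: the enum values, struct request, and classify() are hypothetical names invented for illustration, and the real code checks more conditions than this. It shows the idea from the commit message: a request whose required send state no longer matches the import state is either queued (and can block the state machine) or failed right away when it is marked no_delay.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of import states; names follow the commit text. */
enum imp_state { IMP_DISCON, IMP_REPLAY, IMP_REPLAY_LOCKS, IMP_FULL };

/* Hypothetical stand-in for the two request fields touched by the patch. */
struct request {
	enum imp_state send_state; /* import state in which it may be sent */
	bool no_delay;             /* fail instead of queueing on a mismatch */
};

enum send_action { SEND_NOW, SEND_QUEUE, SEND_FAIL };

/*
 * Decide what happens to a request for the current import state.
 * A matching state sends immediately; a mismatch either parks the
 * request on the sending list or fails it, depending on no_delay.
 */
static enum send_action classify(enum imp_state cur, const struct request *req)
{
	if (req->send_state == cur)
		return SEND_NOW;
	return req->no_delay ? SEND_FAIL : SEND_QUEUE;
}
```

In this model, a lock replay prepared for REPLAY_LOCKS that meets an import which has dropped back to REPLAY is classified SEND_QUEUE without the flag and SEND_FAIL with it, so nothing is left on the list for the import to wait on.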