Whamcloud - gitweb
LU-14847 ptlrpc: two replay lock threads 94/44294/5
authorVitaly Fertman <c17818@cray.com>
Tue, 13 Jul 2021 16:07:14 +0000 (19:07 +0300)
committerOleg Drokin <green@whamcloud.com>
Tue, 31 Aug 2021 05:20:43 +0000 (05:20 +0000)
commitd7d7eb50c8f5fd3fc5a7808fb112d233bdef34d7
treed409f5b5677dbe8c7b552f7d2b867bcda06d17da
parenta91839eb57103e6fbf5a3f6bacfcc0c471620057
LU-14847 ptlrpc: two replay lock threads

conflict to each other what leads to:
        ASSERTION( atomic_read(&imp->imp_replay_inflight) == 1 )

replay_lock_interpret() does ptlrpc_connect_import() on error, and one
thread will appear starting with connect reply interpret.

replay_lock_interpret() also wakes up ldlm_lock_replay_thread() which
does ptlrpc_import_recovery_state_machine().

It may happen that both threads will get to ldlm_replay_locks() on the
next round at the same time, both increment imp_replay_inflight and
the second one will assert.

The problem appeared in LU-13600 which added ldlm_lock_replay_thread()
with the ptlrpc_import_recovery_state_machine() call.

HPE-bug-id: LUS-10147
Fixes: 3b613a442b ("LU-13600 ptlrpc: limit rate of lock replays")
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: Ia9aafb631e3ba5f850504cc58b4826acec2813bd
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/158931
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-on: https://review.whamcloud.com/44294
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/ldlm/ldlm_request.c
lustre/obdclass/obd_config.c