LU-14027 ldlm: Do not hang if recovery restarted during lock replay
LU-13600 introduced lock replay rate limiting, but it did not take
into account a disconnection during the REPLAY_LOCKS phase. When
that happens, locks that have not been sent yet get stuck in the
sending queue, so the replay locks thread hangs with
imp_replay_inflight elevated above zero.
The direct consequence is that the recovery state machine never
advances from REPLAY to REPLAY_LOCKS status while imp_replay_inflight
is non-zero.
Adjust __ldlm_replay_locks() to check whether the import state has
changed before attempting to send any more requests.
Add a testcase.
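The idea behind the check can be sketched as a small userspace model.
This is not the kernel code: every sketch_* name, the struct layout and
the disconnect_after knob are hypothetical stand-ins, and the model
assumes a single in-flight reference held for the duration of the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real import states. */
enum sketch_imp_state {
	SKETCH_IMP_REPLAY_LOCKS,
	SKETCH_IMP_DISCON,
};

struct sketch_import {
	enum sketch_imp_state state;
	int replay_inflight;	/* models imp_replay_inflight */
	int sent;		/* lock replays actually sent */
	int disconnect_after;	/* disconnect after N sends; < 0 = never */
};

/* Models sending one lock replay; may trigger a simulated disconnect. */
static int sketch_replay_one_lock(struct sketch_import *imp)
{
	imp->sent++;
	if (imp->disconnect_after >= 0 && imp->sent >= imp->disconnect_after)
		imp->state = SKETCH_IMP_DISCON;
	return 0;
}

/*
 * Models the replay loop: every queued lock is taken off the queue,
 * but it is sent only while the import is still in REPLAY_LOCKS.
 * Returns the number of locks dropped without being sent.
 */
static int sketch_replay_locks(struct sketch_import *imp, int nr_locks)
{
	int skipped = 0;
	int rc = 0;
	int i;

	assert(imp != NULL);
	imp->replay_inflight++;		/* held while the loop runs */

	for (i = 0; i < nr_locks; i++) {
		/* Import state changed or an earlier send failed:
		 * drop the lock instead of sending another request. */
		if (rc != 0 || imp->state != SKETCH_IMP_REPLAY_LOCKS) {
			skipped++;
			continue;
		}
		rc = sketch_replay_one_lock(imp);
	}

	imp->replay_inflight--;		/* always released at the end */
	return skipped;
}
```

In this model a disconnect in the middle of the loop leaves the
remaining locks unsent and the in-flight counter back at zero, so a
restarted recovery is not blocked by the previous attempt.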
Lustre-change: https://review.whamcloud.com/40238
Lustre-commit: 7ca495ec67f474e10352077fc40123e4818b8e69
Change-Id: Idbaf5461f33d1884088269d67d01071c7e1bf8a5
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Fixes: 3b613a442b ("LU-13600 ptlrpc: limit rate of lock replays")
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Fixes: 6b6d9c0911 ("LU-13600 ptlrpc: limit rate of lock replays")
Reviewed-on: https://review.whamcloud.com/41224
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>