git://git.whamcloud.com - fs/lustre-release.git/commit

author	Vladimir Saveliev <c17830@cray.com>
	Tue, 28 Jul 2020 22:28:22 +0000 (01:28 +0300)
committer	Oleg Drokin <green@whamcloud.com>
	Mon, 12 Oct 2020 05:48:46 +0000 (05:48 +0000)
commit	2d24238a80be9ca924369d142148d4f6f1891102
tree	7466683a99dd6c8f2212d414d4c3c39bc1336fdb
parent	1e4bd16acfa26a06486ebd3426547502eedefa45

LU-13614 ldlm: revert LU-11762

Commit fe5c801657 ("LU-11762 ldlm: ensure the recovery timer is armed")
introduced a regression in recovery.

When the recovery timeout reaches the hard recovery timeout,
target_recovery_overseer() leaves the obd_recovery_expired flag set.
As a result, check_for_next_transno() does not wait for the next
replay request to arrive, which leads to the assertion failure:
LASSERT(atomic_read(&obd->obd_req_replay_clients) == 0);

A test to illustrate the issue is added.

replay-single.sh:test_59 is added to the EXCEPT_ALWAYS list:
the test was already broken, but harmlessly, before this patch;
with this patch, that defect makes the test actually fail.

Fixes: fe5c80165 ("LU-11762 ldlm: ensure the recovery timer is armed")
HPE-bug-id: LUS-8299
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Change-Id: Ia694a519b5d73620be3014e92fd671d388550979
Reviewed-on: https://review.whamcloud.com/39532
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

lustre/include/obd_support.h
lustre/ldlm/ldlm_lib.c
lustre/ptlrpc/niobuf.c
lustre/target/tgt_handler.c
lustre/tests/recovery-small.sh
lustre/tests/replay-single.sh