Whamcloud - gitweb
LU-13614 ldlm: revert LU-11762 32/39532/7
author Vladimir Saveliev <c17830@cray.com>
Tue, 28 Jul 2020 22:28:22 +0000 (01:28 +0300)
committer Oleg Drokin <green@whamcloud.com>
Mon, 12 Oct 2020 05:48:46 +0000 (05:48 +0000)
commit 2d24238a80be9ca924369d142148d4f6f1891102
tree 7466683a99dd6c8f2212d414d4c3c39bc1336fdb
parent 1e4bd16acfa26a06486ebd3426547502eedefa45
LU-13614 ldlm: revert LU-11762

Commit fe5c801657 introduced a problem for recovery.

When the recovery timeout reaches the hard recovery timeout,
target_recovery_overseer() leaves the obd_recovery_expired flag set. As a
result, check_for_next_transno() no longer waits for the next replay
request to arrive, which leads to the assertion failure:
LASSERT(atomic_read(&obd->obd_req_replay_clients) == 0);
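The failure mode described above can be illustrated with a deliberately simplified model in C. This is a sketch under stated assumptions, not Lustre code: struct recovery_model, model_check_for_next_transno() and model_step_is_safe() are hypothetical names, and the real wait condition is reduced to three cases. Only obd_recovery_expired, obd_req_replay_clients and the LASSERT come from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of target recovery state.
 * Field names echo the commit message; this is NOT struct obd_device. */
struct recovery_model {
	bool recovery_expired;          /* stands in for obd_recovery_expired */
	int req_replay_clients;         /* stands in for obd_req_replay_clients */
	unsigned long next_transno;     /* transno the target expects next */
	unsigned long queued_transno;   /* lowest queued replay transno, 0 = none */
};

/* Assumed model of "should the recovery thread stop waiting?".
 * Returns true when the thread would stop waiting and move on. */
static bool model_check_for_next_transno(const struct recovery_model *m)
{
	if (m->queued_transno == m->next_transno)
		return true;    /* the expected replay is already queued */
	if (m->req_replay_clients == 0)
		return true;    /* no client left that could send a replay */
	if (m->recovery_expired)
		return true;    /* expired flag set: give up waiting */
	return false;           /* otherwise keep waiting */
}

/* If the thread stops waiting with no replay queued, it proceeds to the
 * point guarded by LASSERT(obd_req_replay_clients == 0). Returns true
 * when that check would hold, false where the LASSERT would fire. */
static bool model_step_is_safe(const struct recovery_model *m)
{
	bool stop_waiting = model_check_for_next_transno(m);
	bool have_replay = m->queued_transno == m->next_transno;

	if (stop_waiting && !have_replay)
		return m->req_replay_clients == 0;
	return true;    /* still waiting, or processing a queued replay */
}
```

In this model, leaving recovery_expired set while one replay client is still outstanding makes the thread stop waiting with nothing queued, so the obd_req_replay_clients == 0 check fails; with the flag cleared, the same state simply keeps waiting.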

A test illustrating the issue is added.

replay-single.sh:test_59 is added to the EXCEPT_ALWAYS list:
  it was already broken, harmlessly, before this patch, and this patch
  makes the test fail outright because of that defect.

Fixes: fe5c80165 ("LU-11762 ldlm: ensure the recovery timer is armed")
HPE-bug-id: LUS-8299
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Change-Id: Ia694a519b5d73620be3014e92fd671d388550979
Reviewed-on: https://review.whamcloud.com/39532
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/include/obd_support.h
lustre/ldlm/ldlm_lib.c
lustre/ptlrpc/niobuf.c
lustre/target/tgt_handler.c
lustre/tests/recovery-small.sh
lustre/tests/replay-single.sh