LU-7638 recovery: do not abort update recovery. 85/17885/6
author Di Wang <di.wang@intel.com>
Thu, 7 Jan 2016 22:40:09 +0000 (17:40 -0500)
committer Oleg Drokin <oleg.drokin@intel.com>
Thu, 28 Jan 2016 16:52:01 +0000 (16:52 +0000)
commit b32e55b600ca2c9bf8b62287d9f889791d157426
tree 7162018d213725123c26907d328847e6a3681be1
parent 92890d8f555d12ad32dc9841a328e84c5d26e896
LU-7638 recovery: do not abort update recovery.

When normal recovery times out and there are still
update replays in the queue, the target should keep
the exports of the other MDTs and continue update
replay until recovery is manually aborted.
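
The timeout behaviour described above could be sketched roughly as
follows. This is only an illustration: struct recovery_state, enum
timeout_action and on_recovery_timeout() are hypothetical names made
up for this sketch and are not the Lustre structures or functions
touched by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified recovery state (not a Lustre structure). */
struct recovery_state {
	size_t update_replays_queued; /* update replay requests still queued */
	bool   abort_requested;       /* recovery was manually aborted */
};

/* Hypothetical outcomes of a recovery timeout. */
enum timeout_action {
	NORMAL_TIMEOUT_HANDLING, /* no update replay pending */
	CONTINUE_UPDATE_REPLAY,  /* keep MDT exports, keep replaying updates */
	ABORT_RECOVERY           /* manual abort wins over everything else */
};

/* Decide what to do when the normal recovery timer expires. */
enum timeout_action on_recovery_timeout(const struct recovery_state *rs)
{
	if (rs->abort_requested)
		return ABORT_RECOVERY;
	if (rs->update_replays_queued > 0)
		return CONTINUE_UPDATE_REPLAY;
	return NORMAL_TIMEOUT_HANDLING;
}
```

In this sketch only a manual abort ends update replay while requests
are still queued; the timeout alone never does.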

Add tdtd_recovery_threads_count/waitq to manage
the update recovery threads (which retrieve the
update logs), so that during abort these threads
are stopped first, and only then are the update
replay requests in the list cleaned up.
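
The general "thread count plus wait queue" pattern can be sketched in
userspace with POSIX threads. This is an analogy only, assuming a
mutex and condition variable in place of the kernel wait queue; struct
update_recovery, log_retrieval_thread(), abort_update_recovery() and
demo_abort() are hypothetical names, not the code added by the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_THREADS 64

/* Hypothetical stand-in for the state guarded by the count and waitq. */
struct update_recovery {
	pthread_mutex_t lock;
	pthread_cond_t  waitq;         /* signalled on stop and on thread exit */
	int             threads_count; /* running log-retrieval threads */
	bool            stopping;      /* set by abort */
	int             pending_reqs;  /* queued update replay requests */
};

/* Worker: idles until told to stop, then drops the thread count. */
static void *log_retrieval_thread(void *arg)
{
	struct update_recovery *ur = arg;

	pthread_mutex_lock(&ur->lock);
	while (!ur->stopping)
		pthread_cond_wait(&ur->waitq, &ur->lock);
	ur->threads_count--;
	pthread_cond_broadcast(&ur->waitq); /* wake the aborting thread */
	pthread_mutex_unlock(&ur->lock);
	return NULL;
}

/* Abort: stop all workers first, then clean up the request list. */
static int abort_update_recovery(struct update_recovery *ur)
{
	int dropped;

	pthread_mutex_lock(&ur->lock);
	ur->stopping = true;
	pthread_cond_broadcast(&ur->waitq);
	while (ur->threads_count > 0)
		pthread_cond_wait(&ur->waitq, &ur->lock);
	dropped = ur->pending_reqs; /* safe: no worker is running now */
	ur->pending_reqs = 0;
	pthread_mutex_unlock(&ur->lock);
	return dropped;
}

/* Start nthreads workers, abort, and return the number of dropped reqs. */
int demo_abort(int nthreads, int pending)
{
	struct update_recovery ur = { .pending_reqs = pending };
	pthread_t tids[MAX_THREADS];
	int started = 0, dropped;

	if (nthreads < 0 || nthreads > MAX_THREADS)
		return -1;
	pthread_mutex_init(&ur.lock, NULL);
	pthread_cond_init(&ur.waitq, NULL);

	for (int i = 0; i < nthreads; i++) {
		/* Count the thread before it starts so abort cannot miss it. */
		pthread_mutex_lock(&ur.lock);
		ur.threads_count++;
		pthread_mutex_unlock(&ur.lock);
		if (pthread_create(&tids[i], NULL, log_retrieval_thread, &ur)) {
			pthread_mutex_lock(&ur.lock);
			ur.threads_count--;
			pthread_mutex_unlock(&ur.lock);
			break;
		}
		started++;
	}

	dropped = abort_update_recovery(&ur);
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	pthread_cond_destroy(&ur.waitq);
	pthread_mutex_destroy(&ur.lock);
	return dropped;
}
```

The point of the ordering is that the request list is only touched
after the count has reached zero, so no retrieval thread can add to or
walk the list while it is being cleaned up.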

Fix the negative recovery time console message.

Add test cases replay-single 119 and 120 to verify
these cases.

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Iedcc4922f1500aedec664ff70266b6d2e9f812de
Reviewed-on: http://review.whamcloud.com/17885
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
12 files changed:
lustre/include/lu_target.h
lustre/include/obd.h
lustre/include/obd_support.h
lustre/ldlm/ldlm_lib.c
lustre/lod/lod_dev.c
lustre/mdt/mdt_handler.c
lustre/mdt/mdt_lproc.c
lustre/ofd/ofd_obd.c
lustre/target/update_trans.c
lustre/tests/conf-sanity.sh
lustre/tests/replay-single.sh
lustre/tests/test-framework.sh