LU-7256 tests: wait current LFSCK to exit before next test 06/17406/10
author Fan Yong <fan.yong@intel.com>
Sun, 29 Nov 2015 00:57:13 +0000 (08:57 +0800)
committer Oleg Drokin <oleg.drokin@intel.com>
Sun, 13 Mar 2016 06:25:48 +0000 (06:25 +0000)
commit 6871453b053d5756aba2321122db4564df5c1c57
tree d033d20ff0b8d15d5f36792aa9f0a25eee3b89ac
parent bcbcd5873589c71a5d1028c14e74f8897fc3ffc0
LU-7256 tests: wait current LFSCK to exit before next test

During the sanity-lfsck tests, some test cases only check the
LFSCK status on MDT0, and go ahead if the status matches the
expected one. For DNE cases, such a check may not be enough, and
may leave unfinished LFSCK instances on other MDT(s). This may
cause the following trouble:

1) When moving to the next test case, an unfinished LFSCK instance
   may cause the new LFSCK instance to fail, or cause other strange
   LFSCK check/repair behaviour.

2) If it is the last test case and some MDT(s) have been unmounted,
   then when an unfinished LFSCK instance on another online MDT
   needs to talk with an unmounted MDT, the related RPC will
   trigger a reconnect to the unmounted MDT, and the LFSCK engine
   will hang there until that MDT is mounted again. That is the
   case hit in this ticket. In fact, we should allow the server to
   unmount even though some LFSCK instances run on other nodes.
   The LU-6684 patch (http://review.whamcloud.com/#/c/17032/)
   will handle that more properly.

This patch mainly adjusts the test scripts to wait for all
the LFSCK instances to exit before the next test case.

There are two options for waiting for all the LFSCK instances to exit:

1) Check the LFSCK status via the related lproc interface on each
   target (MDT/OST) one by one.

2) Export a new interface to check the status of all the LFSCK
   instances via a single command.

We choose the latter solution because it is more convenient for the
sys-admin. The new interface is 'lctl lfsck_query'. Its usage:

lctl lfsck_query <-M | --device MDT_device> [-h | --help]
                 [-t | --type check_type[,check_type...]]
                 [-w | --wait]

options:
-M: device to query LFSCK on.
-t: LFSCK type(s) to be queried (default is all).
-w: do not return until LFSCK is no longer running.
-h: help message.
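
As a minimal sketch of how a test script might use the options above
to wait for LFSCK to exit, assuming a hypothetical helper name
(wait_lfsck_exit), an overridable LCTL variable, and an example device
name (lustre-MDT0000), none of which are taken from this patch:

```shell
#!/bin/sh
# Hypothetical helper, not part of this patch: block until LFSCK is no
# longer running, as reported by 'lctl lfsck_query' on the given MDT.
# LCTL may be overridden (e.g. LCTL=echo) for a dry run.
LCTL=${LCTL:-lctl}

wait_lfsck_exit() {
    device=$1        # MDT device to query, e.g. lustre-MDT0000 (assumed name)
    types=${2:-}     # optional comma-separated check types (default: all)

    if [ -n "$types" ]; then
        # -t restricts the query to the listed LFSCK types
        "$LCTL" lfsck_query -M "$device" -t "$types" -w
    else
        # without -t, all LFSCK types are queried
        "$LCTL" lfsck_query -M "$device" -w
    fi
}
```

A test case could then call 'wait_lfsck_exit lustre-MDT0000' before
moving on. The check type names accepted by -t are not listed here, so
any value passed as the second argument is an assumption on the
caller's side.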

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I0bdab85e47eb290bfe3605dfc37caf7ea35d186a
Reviewed-on: http://review.whamcloud.com/17406
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
22 files changed:
lustre/doc/lctl.8
lustre/include/lu_target.h
lustre/include/lustre/lustre_idl.h
lustre/include/lustre/lustre_lfsck_user.h
lustre/include/lustre_ioctl.h
lustre/include/lustre_lfsck.h
lustre/lfsck/lfsck_engine.c
lustre/lfsck/lfsck_internal.h
lustre/lfsck/lfsck_layout.c
lustre/lfsck/lfsck_lib.c
lustre/lfsck/lfsck_namespace.c
lustre/mdd/mdd_device.c
lustre/mdt/mdt_handler.c
lustre/ptlrpc/pack_generic.c
lustre/ptlrpc/wiretest.c
lustre/target/tgt_handler.c
lustre/tests/sanity-lfsck.sh
lustre/utils/lctl.c
lustre/utils/lustre_lfsck.c
lustre/utils/obdctl.h
lustre/utils/wirecheck.c
lustre/utils/wiretest.c