Whamcloud - gitweb
LU-6120 lfsck: notify ever failed server to exit LFSCK [25/13525/3]
author     Fan Yong <fan.yong@intel.com>
           Mon, 10 Nov 2014 08:48:08 +0000 (16:48 +0800)
committer  Oleg Drokin <oleg.drokin@intel.com>
           Sun, 8 Feb 2015 02:16:16 +0000 (02:16 +0000)
commit     4a656bc480baeb7ca2745a9565742fbcaedd81c1
tree       5a49e7317baab9e3ab93ade793b99616805369e4
parent     ecd28d9b6cb691bda8184a7e07f1acc1ccded391
LU-6120 lfsck: notify ever failed server to exit LFSCK

During the first-stage scanning, the local LFSCK instance records
which OSTs have ever failed to respond to LFSCK verification requests
(for example, because of network issues or a problem on the OST
itself). Then, before starting the second-stage scanning, the local
LFSCK instance notifies those ever-failed OSTs via la_sync_failures()
to skip orphan handling, since they missed the verification of some
OST-objects.
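The two-step bookkeeping described above (remember failures during the first stage, then tell those targets to skip orphan handling before the second stage) can be sketched as follows. This is a simplified model under stated assumptions, not the Lustre implementation: `struct demo_target`, `demo_record_failure()`, `demo_sync_failures()` and the `demo_notify_fn` callback are hypothetical names, and the callback merely stands in for whatever RPC the real la_sync_failures() path sends.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of an LFSCK target (an OST).
 * These names are illustrative only; they are not Lustre's types. */
struct demo_target {
	int  index;        /* OST index */
	bool ever_failed;  /* set when a verification request got no reply */
	bool skip_orphans; /* set once the target was told to skip orphans */
};

/* Stand-in for the "skip orphan handling" notification RPC.
 * Returns 0 on success, a negative value on failure. */
typedef int (*demo_notify_fn)(struct demo_target *tgt);

/* First stage: remember any target that failed to answer a request. */
static void demo_record_failure(struct demo_target *tgt)
{
	tgt->ever_failed = true;
}

/* Before the second stage: notify every ever-failed target to skip
 * orphan handling. Returns how many of those notifications failed. */
static int demo_sync_failures(struct demo_target *tgts, size_t count,
			      demo_notify_fn notify)
{
	int failures = 0;

	for (size_t i = 0; i < count; i++) {
		if (!tgts[i].ever_failed)
			continue;
		if (notify(&tgts[i]) == 0)
			tgts[i].skip_orphans = true;
		else
			failures++;
	}
	return failures;
}
```

Returning a failure count, rather than discarding the notification result, keeps visible the case the commit message is concerned with: a target whose sync notification did not succeed.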

Originally, after la_sync_failures(), the related OSTs were removed
from the LFSCK targets list regardless of whether la_sync_failures()
succeeded, so subsequent LFSCK notification RPCs were never sent to
those OSTs. As a result, some OSTs could fail to exit LFSCK as
expected, and a subsequent LFSCK start would then fail because the
former LFSCK instance had not exited.
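The failure mode above can be reproduced in a toy model: if ever-failed targets are dropped from the targets list after the sync step, the final "exit LFSCK" notification never reaches them, and a new run cannot start. The sketch below is an illustration under assumptions, not the actual patch: `struct demo_tgt`, `demo_after_sync()`, `demo_notify_exit()`, `demo_can_restart()` and the `keep_failed` switch are all hypothetical, and keeping the targets on the list is only one assumed way of ensuring they still receive the exit notification.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified target-list entry; not Lustre's types. */
struct demo_tgt {
	int  index;
	bool ever_failed; /* missed some verification in the first stage */
	bool on_list;     /* still on the LFSCK targets list */
	bool exited;      /* has received the "exit LFSCK" notification */
};

/* After the sync step. With keep_failed == false this mirrors the old
 * behaviour described above: ever-failed targets are removed from the
 * list whether or not the sync notification succeeded. */
static void demo_after_sync(struct demo_tgt *tgts, size_t n, bool keep_failed)
{
	for (size_t i = 0; i < n; i++)
		if (tgts[i].ever_failed && !keep_failed)
			tgts[i].on_list = false;
}

/* Send the final "exit LFSCK" notification. Only targets still on the
 * list are reached. Returns how many targets were left running. */
static size_t demo_notify_exit(struct demo_tgt *tgts, size_t n)
{
	size_t stuck = 0;

	for (size_t i = 0; i < n; i++) {
		if (tgts[i].on_list)
			tgts[i].exited = true;
		else
			stuck++;
	}
	return stuck;
}

/* A new LFSCK run can only start once every target has exited. */
static bool demo_can_restart(const struct demo_tgt *tgts, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!tgts[i].exited)
			return false;
	return true;
}
```

With `keep_failed` set to false, an ever-failed target is counted as stuck and `demo_can_restart()` returns false, which corresponds to the "subsequent LFSCK start will get failure" symptom; with it set to true, every target is notified and a restart is allowed.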

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Id0283c78527d6a3a6c563de7ce6af1fe2d3f1a30
Reviewed-on: http://review.whamcloud.com/13525
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
lustre/lfsck/lfsck_engine.c
lustre/lfsck/lfsck_internal.h
lustre/lfsck/lfsck_layout.c
lustre/lfsck/lfsck_lib.c
lustre/lfsck/lfsck_namespace.c