Whamcloud - gitweb
LU-8407 recovery: more clear message about recovery failure
Currently, the DNE recovery depends on the update logs on the MDTs.
If fail to get the update logs from some MDT(s), then the recovery
cannot go ahead. Different from client-side recovery failure, the
cross-MDT recovery failure may cause the namespace inconsistency.
Because we does not want to export the inconsistent namespace to
client, then we make the recovery (not abort because of timeout)
to wait there until related update logs available.
So if some MDT does not up or not mount, then the recovery on other
MDTs will hung there. As the time going, the client (re)connection
will trigger warning message on the MDTs to say about the recovery
hung. But such message does not clearly describe what happened.
This patch addes callback interface in target_distribute_txn_data,
called 'tdtd_show_update_logs_retrievers'. It allows the users to
check which MDTs are still in fetching update logs. Then the admin
can check related MDTs in detail when hit recovery trouble.
This patch also introduce new recovery status "WAITING" for the
case of update logs not ready for some MDT(s). Under such case,
the non-ready MDTs index and waited time will be shown.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: If5ed4487fe1e6d94f02479d83f6a187d6427b3a7
Reviewed-on: http://review.whamcloud.com/21759
Tested-by: Jenkins
Reviewed-by: wangdi <di.wang@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>