LU-8010 mdt: fix orphan layout_lock cases for restore 10/19710/7
author: Bruno Faccini <bruno.faccini@intel.com>
        Thu, 21 Apr 2016 13:56:28 +0000 (15:56 +0200)
committer: Oleg Drokin <oleg.drokin@intel.com>
        Mon, 16 May 2016 16:47:41 +0000 (16:47 +0000)
commit: c13ddec8e1cd3a63c16e08f28749771200b92f1b
tree: 1e32606c695bd88950ccd9640acfc4429469c538
parent: fd4ab6e6ae877c88e46c35c517349285aa6226d2
LU-8010 mdt: fix orphan layout_lock cases for restore

Before this patch, the layout lock was not released when a restore
failed before being sent to the CT. Other requestors then hung, and
an orphan restore_handle was kept on the CDT's list of registered
restore actions.
The only way to clear this situation was to stop the CDT.
This could happen at least when a restore was canceled but the CT
did not handle the cancel operation: the restore was allowed to
complete, while new restore requests were registered in the
meantime and then failed because of the incompatible (no longer
released) file state.

Also fix similar deadlock cases where the layout lock was taken for
previously started restore requests upon CDT restart, by forcing
their replay/restart. This should also strengthen the overall HSM
recovery process.
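
For illustration only, the failure path described above can be sketched
in a few lines of C. This is a simplified model, not the code changed by
this patch: the structures and the functions restore_register(),
restore_unregister() and restore_send() are hypothetical stand-ins, and
the layout lock is reduced to a boolean flag. It shows the intended
invariant: when a restore fails before reaching the CT, its handle is
removed from the registered list and its layout lock is dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real Lustre structures. */
struct restore_handle {
	unsigned long long	 fid;		/* file being restored */
	bool			 layout_locked;	/* layout lock held for it */
	struct restore_handle	*next;
};

struct coordinator {
	struct restore_handle	*restore_list;	/* registered restores */
};

/* Register a restore: take the layout lock and track the handle. */
static void restore_register(struct coordinator *cdt,
			     struct restore_handle *rh)
{
	rh->layout_locked = true;
	rh->next = cdt->restore_list;
	cdt->restore_list = rh;
}

/* Unlink the handle registered for fid; return it, or NULL if none. */
static struct restore_handle *restore_unregister(struct coordinator *cdt,
						 unsigned long long fid)
{
	struct restore_handle **pp;

	for (pp = &cdt->restore_list; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->fid == fid) {
			struct restore_handle *rh = *pp;

			*pp = rh->next;
			rh->next = NULL;
			return rh;
		}
	}
	return NULL;
}

/*
 * Try to hand the restore over to the copytool. If the file is no
 * longer released, the request fails before being sent, so undo the
 * registration: drop the handle from the list and release the lock.
 */
static int restore_send(struct coordinator *cdt, unsigned long long fid,
			bool file_released)
{
	if (!file_released) {
		struct restore_handle *rh = restore_unregister(cdt, fid);

		assert(rh != NULL);
		rh->layout_locked = false;	/* no orphan layout lock */
		return -EINVAL;
	}
	return 0;	/* would be sent to the CT here */
}
```

Without the cleanup branch in restore_send(), the handle would stay on
the list with its lock held, which is the orphan case this patch fixes.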

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Ib1ba9156793a230d256ff80d74372813f10b0321
Reviewed-on: http://review.whamcloud.com/19710
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: jacques-Charles Lafoucriere <jacques-charles.lafoucriere@cea.fr>
Reviewed-by: Aurelien Degremont <aurelien.degremont@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
lustre/mdt/mdt_coordinator.c
lustre/mdt/mdt_hsm_cdt_agent.c
lustre/mdt/mdt_internal.h