LU-17634 hsm: serialize HSM restore for a file on a client
For a file in the HSM "released, exists, archived" state, start tens
of processes on one client to read it in parallel; one of the read
processes may report a "No data available" error.
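The reproduction step described above can be sketched in user space as
follows. This is only an illustration, not the sanity-hsm test itself:
the helper names read_whole_file() and count_failed_readers() are
hypothetical, and the sketch assumes the target path points at a file
that has already been archived and released.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read the whole file once; return 0 on success or the errno of the failure. */
static int read_whole_file(const char *path)
{
	char buf[65536];
	ssize_t n;
	int err;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return errno;
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		;	/* discard the data, only the return code matters */
	err = n < 0 ? errno : 0;
	close(fd);
	return err;
}

/*
 * Fork up to nprocs readers of the same file (hypothetical helper).
 * Returns the number of readers that failed; each failing reader
 * prints its error, e.g. "No data available" for ENODATA.
 */
static int count_failed_readers(const char *path, int nprocs)
{
	int started = 0, failed = 0, status;

	for (int i = 0; i < nprocs; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;	/* stop forking, still reap what was started */
		if (pid == 0) {
			int err = read_whole_file(path);

			if (err)
				fprintf(stderr, "reader %d: %s\n", i, strerror(err));
			_exit(err ? 1 : 0);
		}
		started++;
	}
	for (int i = 0; i < started; i++) {
		if (wait(&status) < 0 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			failed++;
	}
	return failed;
}
```

On a Lustre client the file would first be put into the released state
(for example with lfs hsm_release) and then passed to
count_failed_readers(path, 20); a non-zero result would indicate the
failure described here.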
After analyzing the error, we found the following bug in the HSM code:
Reading a released file for which the client already holds a LAYOUT lock:
P1:
->vvp_io_init()
->lov_io_init_released(): io->ci_restore_needed = 1;
->vvp_io_fini()
->ll_layout_restore()
->mdc_ioc_hsm_request()
->mdc_hsm_request_lock_to_cancel()
->ldlm_cancel_resource_local()
moves the LAYOUT lock from the resource into the cancel list
has NOT yet canceled the LAYOUT lock on the client via ELC...
P2:
->vvp_io_init()
->lov_io_init_released(): io->ci_restore_needed = 1;
->vvp_io_fini()
->ll_layout_restore()
->mdc_ioc_hsm_request()
->mdc_hsm_request_lock_to_cancel()
SKIP: no conflicting LAYOUT lock is on the resource lock list, as P1
has already moved it (if any) into its cancel list
->mdt_hsm_request()
->cdt_restore_handle_add()
->cdt_restore_handle_find()
->list_add_tail(): add @crh to restore handle list
has NOT yet obtained the EX LAYOUT lock that cancels the cached LAYOUT
locks on the client side...
P3:
->ll_file_read_iter()
->ll_do_fast_read(): => return -ENODATA;
->vvp_io_init()
->lov_io_init_released(): io->ci_restore_needed = 1;
->vvp_io_fini()
->ll_layout_restore()
->mdc_ioc_hsm_request()
->mdc_hsm_request_lock_to_cancel()
SKIP, as P1 has already moved the conflicting LAYOUT lock
(if any) into its cancel list
->mdt_hsm_request()
->cdt_restore_handle_add()
->cdt_restore_handle_find()
SKIP, as a restore handle with the same FID, added by P2, is
already in the restore handle list.
->ll_layout_refresh()
->io->ci_need_restart = vio->vui_layout_gen != gen;
->LAYOUT gen has not changed because the LAYOUT lock on
the client is not revoked yet, so the I/O is not restarted...
->return -ENODATA; =>from fast read
We can fix this bug by serializing the HSM restore operation on a
client with the @lli->lli_layout_mutex.
Add sanity-hsm/test_12{t, u} to verify it.
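The idea of the fix can be illustrated with a minimal user-space
sketch. It is not the patch itself: struct inode_sketch,
send_restore_request() and max_concurrent_restores() are hypothetical
names, and a pthread mutex stands in for the kernel mutex
lli_layout_mutex. The sketch only shows that, with the per-file mutex
held around the restore request, at most one reader is inside the
restore path at any time.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_READERS 64

/* Hypothetical stand-in for per-inode state; not the real ll_inode_info. */
struct inode_sketch {
	pthread_mutex_t layout_mutex;	/* plays the role of lli_layout_mutex */
	int in_flight;			/* readers currently inside the restore path */
	int max_in_flight;		/* highest value in_flight ever reached */
	int requests;			/* total restore requests issued */
};

/* Stand-in for issuing the restore request; caller holds layout_mutex. */
static void send_restore_request(struct inode_sketch *ino)
{
	ino->in_flight++;
	if (ino->in_flight > ino->max_in_flight)
		ino->max_in_flight = ino->in_flight;
	ino->requests++;
	ino->in_flight--;
}

/* One reader: the whole restore step runs under the per-file mutex. */
static void *reader(void *arg)
{
	struct inode_sketch *ino = arg;

	pthread_mutex_lock(&ino->layout_mutex);
	send_restore_request(ino);
	pthread_mutex_unlock(&ino->layout_mutex);
	return NULL;
}

/* Run up to nreaders readers in parallel; return the peak concurrency seen. */
static int max_concurrent_restores(int nreaders)
{
	struct inode_sketch ino = { .in_flight = 0 };
	pthread_t tids[MAX_READERS];
	int started = 0;

	if (nreaders > MAX_READERS)
		nreaders = MAX_READERS;
	pthread_mutex_init(&ino.layout_mutex, NULL);
	for (int i = 0; i < nreaders; i++) {
		if (pthread_create(&tids[started], NULL, reader, &ino) == 0)
			started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&ino.layout_mutex);
	return ino.max_in_flight;
}
```

Because every reader takes the same mutex before entering the restore
step, P2 and P3 in the trace above could not run that step while P1 is
still between moving the LAYOUT lock to its cancel list and finishing
its request.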
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Idc2a8c1818386c64798d7e28500c20c80ff369f1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54366
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>