LU-15132 hsm: Protect against parallel HSM restore requests
author Etienne AUJAMES <etienne.aujames@cea.fr>
Thu, 21 Oct 2021 14:31:01 +0000 (16:31 +0200)
committer Andreas Dilger <adilger@whamcloud.com>
Tue, 11 Oct 2022 07:48:13 +0000 (07:48 +0000)
commit 4624a4f69dcb4bce583619b56d89ed621bee5ae4
tree c3b3e4f697a7c33f39f1b2b99efec90af29aaee2
parent 532fa8a41bdb611e6338750888dcddfff901fc4e

Multiple parallel accesses (read/write) to the same released file
could cause multiple HSM restore requests to be sent.
On the MDT side, each restore request waits for the first one to complete
before grabbing the MDS_INODELOCK_LAYOUT LCK_EX and registering the
llog record.

This could cause several MDT threads to hang for the same restore
request sent in parallel. In the worst case, all MDT threads can
hang and the MDS is no longer able to handle requests.

This patch checks if an HSM restore handle exists before taking the
lock.
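
The check-before-lock idea can be illustrated with a minimal sketch. This
is not the code from the patch: the names restore_table, request_restore
and restore_done are hypothetical, a plain integer stands in for the file
identifier, and a counter stands in for taking the exclusive layout lock.
It only shows that a duplicate request returns early instead of queueing
behind the first one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RESTORES 16

/* Hypothetical file identifier; not Lustre's struct lu_fid. */
typedef uint64_t file_id_t;

/* Hypothetical table of restores that are currently in flight. */
struct restore_table {
	pthread_mutex_t lock;              /* cheap lock guarding the table */
	file_id_t in_flight[MAX_RESTORES];
	size_t count;
	unsigned int layout_locks_taken;   /* stand-in for the costly lock */
};

static void restore_table_init(struct restore_table *t)
{
	pthread_mutex_init(&t->lock, NULL);
	t->count = 0;
	t->layout_locks_taken = 0;
}

/* Caller must hold t->lock. */
static bool restore_exists(const struct restore_table *t, file_id_t fid)
{
	for (size_t i = 0; i < t->count; i++)
		if (t->in_flight[i] == fid)
			return true;
	return false;
}

/*
 * Returns true if this call registered a new restore. Returns false,
 * without touching the "layout lock", if a restore for fid is already
 * in flight (or if the fixed-size table is full).
 */
static bool request_restore(struct restore_table *t, file_id_t fid)
{
	bool registered = false;

	assert(t != NULL);
	pthread_mutex_lock(&t->lock);
	if (!restore_exists(t, fid) && t->count < MAX_RESTORES) {
		t->in_flight[t->count++] = fid;
		t->layout_locks_taken++;
		registered = true;
	}
	pthread_mutex_unlock(&t->lock);
	return registered;
}

/* Drops fid from the table once its restore has completed. */
static void restore_done(struct restore_table *t, file_id_t fid)
{
	pthread_mutex_lock(&t->lock);
	for (size_t i = 0; i < t->count; i++) {
		if (t->in_flight[i] == fid) {
			t->in_flight[i] = t->in_flight[--t->count];
			break;
		}
	}
	pthread_mutex_unlock(&t->lock);
}

/* Issues five requests for one file; returns how often the lock was taken. */
static unsigned int demo_duplicate_requests(void)
{
	struct restore_table t;

	restore_table_init(&t);
	for (int i = 0; i < 5; i++)
		request_restore(&t, 42);
	return t.layout_locks_taken;
}
```

In this sketch the existence check and the registration happen under one
short-lived mutex, so duplicate callers never block on the expensive
lock. How the real MDT code orders these steps is defined by the patch
itself, not by this example.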

Lustre-change: https://review.whamcloud.com/45367
Lustre-commit: 66b3e74bccf1451d135b7f331459b6af1c06431b

Test-Parameters: testlist=sanity-hsm,sanity-hsm
Test-Parameters: testlist=sanity-hsm env=ONLY=12s,ONLY_REPEAT=50
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I9584edc2c7411aa41b2e318e55f57c117d1c3dfb
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48650
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
lustre/include/obd_support.h
lustre/mdt/mdt_coordinator.c
lustre/mdt/mdt_hsm_cdt_client.c
lustre/tests/sanity-hsm.sh