Whamcloud - gitweb
LU-15132 hsm: Protect against parallel HSM restore requests 67/45367/15
author Etienne AUJAMES <etienne.aujames@cea.fr>
Thu, 21 Oct 2021 14:31:01 +0000 (16:31 +0200)
committer Oleg Drokin <green@whamcloud.com>
Mon, 18 Jul 2022 05:34:17 +0000 (05:34 +0000)
commit 66b3e74bccf1451d135b7f331459b6af1c06431b
tree 00a5ec69f79f486c9db6149dd2135cc94cb5d238
parent f238540c879dc668e18cf99cba62f117ccae64d6
LU-15132 hsm: Protect against parallel HSM restore requests

Multiple parallel accesses (read/write) to the same released file
could cause multiple HSM restore requests to be sent.
On the MDT side, each restore request waits for the first one to complete
before grabbing the MDS_INODELOCK_LAYOUT LCK_EX and registering the
llog record.

This could cause several MDT threads to hang for the same restore
request sent in parallel. In the worst case, all MDT threads can
hang and the MDS is no longer able to handle requests.

This patch checks if an HSM restore handle exists before taking the
lock.
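
The check-before-lock idea can be sketched roughly as below. This is an
illustrative, single-threaded model and not the code from the patch: the
restore_table structure, the function names, and the lock_acquisitions
counter are all hypothetical stand-ins for the coordinator's restore list
and the MDS_INODELOCK_LAYOUT LCK_EX lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RESTORES 16 /* hypothetical capacity for this sketch */

/* Hypothetical stand-in for the list of in-flight restore handles. */
struct restore_table {
	uint64_t fids[MAX_RESTORES];   /* identifiers of files being restored */
	size_t count;                  /* number of in-flight restores */
	unsigned int lock_acquisitions; /* times the exclusive lock was "taken" */
};

/* Return true if a restore handle already exists for this file. */
bool restore_handle_exists(const struct restore_table *t, uint64_t fid)
{
	for (size_t i = 0; i < t->count; i++)
		if (t->fids[i] == fid)
			return true;
	return false;
}

/*
 * Handle one restore request.
 * Returns true if a new restore was registered, false if the request was
 * a duplicate of an in-flight restore or the table is full.
 * A real implementation would also need to make the check and the
 * registration atomic with respect to concurrent requests.
 */
bool handle_restore_request(struct restore_table *t, uint64_t fid)
{
	assert(t != NULL);

	/* Check first: a duplicate request returns without touching the lock. */
	if (restore_handle_exists(t, fid))
		return false;
	if (t->count == MAX_RESTORES)
		return false;

	/* Only the first request for a file reaches the exclusive lock. */
	t->lock_acquisitions++;
	t->fids[t->count++] = fid;
	return true;
}
```

In this model, two requests for the same file result in a single lock
acquisition; the second request returns immediately instead of waiting
behind the first.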

Test-Parameters: testlist=sanity-hsm,sanity-hsm
Test-Parameters: testlist=sanity-hsm env=ONLY=12s,ONLY_REPEAT=50
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I9584edc2c7411aa41b2e318e55f57c117d1c3dfb
Reviewed-on: https://review.whamcloud.com/45367
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/include/obd_support.h
lustre/mdt/mdt_coordinator.c
lustre/mdt/mdt_hsm_cdt_client.c
lustre/tests/sanity-hsm.sh