EX-5014 pcc: avoid deadlock during DIO open attach on rhel7
Maloo testing fails in sanity-pcc test_45 with the following
deadlock on the rhel7 kernel:
ll_fid_path_cop D ffff9a32db5eb180 0 10783 10782 0x00000080
Call Trace:
schedule_preempt_disabled+0x29/0x70
__mutex_lock_slowpath+0xc7/0x1d0
mutex_lock+0x1f/0x2f
lookup_slow+0x33/0xa7
link_path_walk+0x80f/0x8b0
path_openat+0xae/0x5a0
do_filp_open+0x4d/0xb0
do_sys_open+0x124/0x220
SyS_open+0x1e/0x20
dd D ffff9a32fb5b6300 0 10779 10755 0x00000080
Call Trace:
wait_for_completion+0xfd/0x140
call_usermodehelper_exec+0x179/0x1a0
call_usermodehelper+0x40/0x60
pcc_copy_data_dio+0x267/0x340 [lustre]
pcc_attach_data_archive+0x6ff/0xe80 [lustre]
pcc_readonly_attach+0x3d2/0xad0 [lustre]
pcc_readonly_attach_sync+0x205/0x260 [lustre]
pcc_file_open+0x798/0xdd0 [lustre]
ll_atomic_open+0xd80/0x1780 [lustre]
do_last+0xa53/0x1340
path_openat+0xcd/0x5a0
do_filp_open+0x4d/0xb0
do_sys_open+0x124/0x220
SyS_open+0x1e/0x20
This bug only occurs on the el7 kernel, which uses a mutex for inode
locking.
During ll_atomic_open(), the kernel holds this mutex on the parent
inode. However, when data is copied via the user space helper program
ll_fid_path_copy, the helper also tries to take this mutex on the
parent inode during lookup. The opener waits for the helper to finish
while the helper waits for the mutex, resulting in a deadlock.
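
The lock pattern described above can be illustrated with a minimal
user space sketch. This is not the Lustre code: it assumes a pthread
mutex as a stand-in for the parent inode mutex and a thread as a
stand-in for the user space helper, and every name in it
(parent_mutex, helper_lookup, open_with_sync_helper) is hypothetical.
The helper uses trylock so that the sketch reports the conflict
instead of hanging.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the parent directory's inode mutex. */
static pthread_mutex_t parent_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for the helper's path lookup, which needs the parent mutex.
 * trylock is used so the sketch terminates; a blocking lock here would
 * hang forever, which is the deadlock in the traces.
 */
static void *helper_lookup(void *arg)
{
	int *rc = arg;

	*rc = pthread_mutex_trylock(&parent_mutex);
	if (*rc == 0)
		pthread_mutex_unlock(&parent_mutex);
	return NULL;
}

/*
 * Stand-in for the open path: take the parent mutex, then wait
 * synchronously for the helper while still holding it.
 */
static int open_with_sync_helper(void)
{
	pthread_t helper;
	int helper_rc = -1;

	pthread_mutex_lock(&parent_mutex);
	if (pthread_create(&helper, NULL, helper_lookup, &helper_rc) != 0) {
		pthread_mutex_unlock(&parent_mutex);
		return -1;
	}
	/* Waiting here with the lock held is what blocks the helper. */
	pthread_join(helper, NULL);
	pthread_mutex_unlock(&parent_mutex);

	/* EBUSY means the helper could not take the mutex. */
	return helper_rc;
}
```

Calling open_with_sync_helper() is expected to return EBUSY, since
the helper can never acquire a mutex that the waiting opener still
holds.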
Test-Parameters: clientdistro=el7.9 testlist=sanity-pcc
Test-Parameters: clientdistro=el8.5 mdscount=2 mdtcount=4 testlist=sanity-pcc env=ONLY=45,ONLY_REPEAT=10
Change-Id: I384c7b1979d93183b86bbde311d29a50346a8d56
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/48405
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>