Whamcloud - gitweb
LU-17190 osc: client-side high prio I/O under blocking AST
We found the following deadlock with parallel DIO:
T1: writer
    Obtains DLM extent lock L1=<PW, [0, EOF]>.
T2: parallel DIO reader
    - reads 50M of data: iosize=64M, max_pages_per_rpc=1024 (4M),
      max_rpcs_in_flight=8
    ll_direct_IO_impl()
    - uses up all available RPC slots: 9 read RPCs in flight
    - OST side:
      ->tgt_brw_read()
        ->tgt_brw_lock()
          - server-side locking; tries to cancel the conflicting lock L1
T3: reader
    - takes a DLM lock reference on L1=<PW, [0, EOF]>
    ->ll_readpage()
      ->ll_io_read_page()
        ->cl_io_submit_rw()
          - waits for an RPC slot to send the read RPC to the OST
...
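A rough illustration of why T2 alone can exhaust the RPC slots, assuming 4 KiB pages (so 1024 pages per RPC is 4 MiB) and that the DIO read is split into RPCs of at most max_pages_per_rpc pages. The helper rpcs_needed() is hypothetical, not a Lustre function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so max_pages_per_rpc=1024 means 4 MiB per RPC. */
#define PAGE_SIZE_BYTES 4096ULL

/* Hypothetical helper: number of bulk RPCs needed to move `bytes`, rounded up. */
static uint64_t rpcs_needed(uint64_t bytes, uint64_t max_pages_per_rpc)
{
	uint64_t rpc_bytes;

	assert(max_pages_per_rpc > 0);
	rpc_bytes = max_pages_per_rpc * PAGE_SIZE_BYTES;
	return (bytes + rpc_bytes - 1) / rpc_bytes;
}
```

With these parameters, rpcs_needed(50ULL << 20, 1024) is 13, more than max_rpcs_in_flight=8, so T2's RPCs can occupy every slot while they sit on the OST waiting for L1 to be cancelled.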
Deadlock:
- T2 => T3: T2 is waiting for T3 to release DLM lock L1;
- T3 => T2: T3 is waiting for T2 to finish and free RPC slots.
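The two wait edges above form a cycle in a wait-for graph. A minimal sketch, assuming each thread waits on at most one other thread; the enum, NONE and has_deadlock() are hypothetical modelling names, not Lustre code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three threads in the scenario. */
enum { T1, T2, T3, NTHREADS };
#define NONE (-1)	/* thread is not waiting on anyone */

/*
 * Follow waits_for[] starting at `start`. Revisiting a thread means a
 * cycle (deadlock) is reachable from `start`.
 */
static bool has_deadlock(const int waits_for[NTHREADS], int start)
{
	bool seen[NTHREADS] = { false };
	int t = start;

	assert(start >= 0 && start < NTHREADS);
	while (t != NONE) {
		if (seen[t])
			return true;
		seen[t] = true;
		t = waits_for[t];
	}
	return false;
}
```

With T2 waiting on T3 and T3 waiting on T2, the walk revisits T2 and reports a cycle. If T3's read could be sent without waiting for a free slot, its edge disappears and no cycle remains.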
To solve this problem, we introduce a client-side mechanism that
handles I/O with high priority when the extent lock protecting the
I/O is under a blocking AST.
It is implemented as follows:
When a lock blocking AST is received and the corresponding lock is
still in use (its reader or writer count is nonzero), the client
checks whether any I/O (osc_extent) covered by this lock is still
outstanding, i.e. waiting for an RPC slot. If so, those I/Os are
marked high priority and put on the client's HP list. The client
then forces the HP I/Os out even when all available RPC slots are
in use.
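A simplified sketch of the blocking-AST step, assuming pending extents sit on a singly linked list and a lock counts as "in use" when its reader or writer count is nonzero. struct lock, struct extent, struct io_lists and on_blocking_ast() are hypothetical stand-ins, not Lustre's ldlm_lock or osc_extent definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a DLM lock and an osc_extent. */
struct lock {
	int id;
	int readers;
	int writers;
};

struct extent {
	const struct lock *lock;	/* lock protecting this I/O */
	bool hp;			/* marked high priority */
	struct extent *next;
};

/* Extents still waiting for an RPC slot, split by priority. */
struct io_lists {
	struct extent *normal;
	struct extent *hp;
};

/*
 * On a blocking AST: if the lock is still in use, move every pending
 * extent covered by it from the normal list to the HP list.
 * Returns the number of extents promoted.
 */
static int on_blocking_ast(struct io_lists *l, const struct lock *lk)
{
	struct extent **pp;
	int moved = 0;

	assert(l != NULL && lk != NULL);
	if (lk->readers == 0 && lk->writers == 0)
		return 0;	/* not in use: ordinary cancellation path */

	pp = &l->normal;
	while (*pp != NULL) {
		struct extent *e = *pp;

		if (e->lock == lk) {
			*pp = e->next;		/* unlink from normal list */
			e->hp = true;
			e->next = l->hp;	/* push onto HP list */
			l->hp = e;
			moved++;
		} else {
			pp = &e->next;
		}
	}
	return moved;
}
```

Extents covered by other locks stay on the normal list; unlinking through a pointer-to-pointer avoids a special case for the list head.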
This also makes the I/O engine in the OSC layer more efficient.
For normal I/O, the client must iterate over the object list and
send I/Os one by one, and the number of in-flight RPCs cannot
exceed @max_rpcs_in_flight.
High-priority I/Os are put on the HP list and are handled more
quickly. This avoids the possible deadlock caused by parallel DIO,
and the client can respond to the lock blocking AST more quickly.
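A sketch of the send decision under a simplified counter model: pending HP I/O is sent first and may exceed max_rpcs_in_flight, while normal I/O still respects the limit. struct client and send_next() are illustrative assumptions; the real OSC code may bound or schedule HP RPCs differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client RPC accounting. */
struct client {
	unsigned int in_flight;
	unsigned int max_rpcs_in_flight;
	unsigned int hp_pending;
	unsigned int normal_pending;
};

/*
 * Try to send one RPC. HP I/O goes first and ignores the in-flight
 * limit; normal I/O is sent only while a slot is free. Returns true if
 * an RPC was sent and reports its priority via *sent_hp.
 */
static bool send_next(struct client *c, bool *sent_hp)
{
	assert(c != NULL && sent_hp != NULL);
	if (c->hp_pending > 0) {
		c->hp_pending--;
		c->in_flight++;
		*sent_hp = true;
		return true;
	}
	if (c->normal_pending > 0 && c->in_flight < c->max_rpcs_in_flight) {
		c->normal_pending--;
		c->in_flight++;
		*sent_hp = false;
		return true;
	}
	return false;
}
```

With 9 RPCs already in flight against a limit of 8, one pending HP request is still sent, while a normal request is held back until a slot frees up.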
Test-Parameters: testlist=sanity-pcc env=ONLY=99a,ONLY_REPEAT=100 clientdistro=el8.10
Test-Parameters: testlist=sanity-pcc env=ONLY=99a,ONLY_REPEAT=100 clientdistro=el9.3
Test-Parameters: testlist=sanity-pcc env=ONLY=99b,ONLY_REPEAT=100 clientdistro=el8.10
Test-Parameters: testlist=sanity-pcc env=ONLY=99b,ONLY_REPEAT=100 clientdistro=el9.3
Change-Id: I9afe032a79f40d55b800ddb13d8b8e9a3e10ba56
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56327
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
12 files changed: