LU-12347 llite: do not take mod rpc slot for getxattr
The following scenario may lead to client eviction:
clientA                                 clientB                 MDS

threadA1: write to file F1, get
and hold DoM MDC LDLM lock L1:
  ->cl_io_loop()
  ->cl_io_lock()
  :
  ->mdc_lock_granted()
  ->lock->l_writers++
  [hold ref until write done]

threadA2-A8: create files F2-F8:
  ->ll_file_open()
  ->mdc_enqueue_base()
  ->ldlm_cli_enqueue()
  ->ptlrpc_get_mod_rpc_slot()
  ->ptlrpc_queue_wait()
  [hold RPC slot until create done]

                                                                OST(s) in recovery.
                                                                MDS waiting on OST(s) to
                                                                precreate new objects.

threadA1:
  ->cl_io_start()
  ->__generic_file_aio_write()
  ->file_remove_suid()
  ->ll_xattr_cache_refill()
  ->mdc_xattr_common()
  ->ptlrpc_get_mod_rpc_slot()
  [blocked waiting for RPC slot]

                                        threadB1: write file F1,
                                        enqueue DoM MDC lock L1

                                                                MDS sends blocking AST
                                                                to clientA for lock L1

ldlm_threadA3: cannot cancel busy lock L1:
  ->ldlm_handle_bl_callback()
  ["Lock L1 referenced, will be cancelled later"]

                                                                MDS evicts clientA for
                                                                not cancelling lock L1

threadA1: never completes write:
  ->cl_io_end()
  ->cl_io_unlock()
  ->osc_lock_cancel()
  ->lock->l_writers--;
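The slot exhaustion at the heart of this scenario can be modelled with a plain counter. This is only a sketch, not Lustre code: struct mod_rpc_pool, pool_try_get(), pool_put() and getxattr_would_block() are hypothetical names, and the limit of 7 slots is an assumption chosen to match the seven create threads A2-A8. pool_try_get() returns false where the real ptlrpc_get_mod_rpc_slot() would block.

```c
#include <stdbool.h>

/* Hypothetical model of a client's pool of modifying-RPC slots. */
struct mod_rpc_pool {
	int max_slots;	/* assumed limit; 7 matches threads A2-A8 */
	int in_use;	/* slots currently held */
};

/*
 * Non-blocking stand-in for ptlrpc_get_mod_rpc_slot(): returns false
 * in the case where the real call would sleep waiting for a free slot.
 */
static bool pool_try_get(struct mod_rpc_pool *p)
{
	if (p->in_use >= p->max_slots)
		return false;
	p->in_use++;
	return true;
}

/* Release one slot, as happens when a modifying RPC completes. */
static void pool_put(struct mod_rpc_pool *p)
{
	if (p->in_use > 0)
		p->in_use--;
}

/*
 * Replay the scenario: nr_creates create RPCs each take a slot and
 * keep it (the MDS is stuck waiting for OST precreate), then threadA1's
 * getxattr asks for one more. Returns true if getxattr would block.
 */
static bool getxattr_would_block(int max_slots, int nr_creates)
{
	struct mod_rpc_pool pool = { .max_slots = max_slots, .in_use = 0 };
	int i;

	for (i = 0; i < nr_creates; i++)
		pool_try_get(&pool);

	return !pool_try_get(&pool);
}
```

With 7 slots and 7 stalled creates, getxattr_would_block(7, 7) is true: threadA1 waits for a slot while still holding a reference on lock L1, so the blocking AST cannot be honoured.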
The fix is to add IT_GETXATTR to the list of operations that do not
need a mod RPC slot.
Tests to illustrate the issue are added.
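The shape of such an exemption list can be sketched as a predicate over the intent opcode. This is an illustration under assumptions, not the patched code: the enum sketch_it_op, its values, and needs_mod_rpc_slot() are hypothetical, and the set of other exempt opcodes is invented for the example. Only the GETXATTR exemption comes from this change.

```c
#include <stdbool.h>

/* Hypothetical intent opcodes; names and values are illustrative only. */
enum sketch_it_op {
	SK_IT_OPEN,
	SK_IT_CREAT,
	SK_IT_GETATTR,
	SK_IT_LOOKUP,
	SK_IT_GETXATTR,
};

/*
 * Returns true if an enqueue for this operation must first take a
 * modifying-RPC slot. Read-only operations are exempt; the other
 * exempt opcodes listed here are assumptions made for the sketch.
 */
static bool needs_mod_rpc_slot(enum sketch_it_op op)
{
	switch (op) {
	case SK_IT_GETATTR:
	case SK_IT_LOOKUP:
	case SK_IT_GETXATTR:	/* the exemption described by this change */
		return false;
	default:
		return true;
	}
}
```

Under this sketch a getxattr issued from the write path goes out without waiting for a slot, so the holder of lock L1 can finish its write and drop the reference even while all slots are held by stalled creates.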
wait_for_function(): fix the total sleep time (wait) so that it equals
max when 1 is returned.
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
HPE-bug-id: LUS-7271
Change-Id: I1b80677df084bda141b9ac58a78b765bd0b14a41
Reviewed-on: https://review.whamcloud.com/44151
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>