LU-11359 mdt: fix mdt_dom_discard_data() timeouts 97/35197/3
author Mikhail Pershin <mpershin@whamcloud.com>
Wed, 31 Oct 2018 13:28:29 +0000 (16:28 +0300)
committer Oleg Drokin <green@whamcloud.com>
Thu, 27 Jun 2019 21:47:19 +0000 (21:47 +0000)
commit e5810126b3fb488a3fed37e085e3ca4ae585324c
tree dea2c5758bf78e81a107cc40821716da9fd9c9e1
parent 568f3991739b47f8aabbcdd17c3a7d9b0b2cae8a
LU-11359 mdt: fix mdt_dom_discard_data() timeouts

mdt_dom_discard_data() takes a new lock to force data
discard on all conflicting client locks. This was done in
the context of unlink RPC processing, which could get stuck
waiting for clients to cancel their locks, leading to cascading
timeouts for any other locks waiting on the same resource and
on the parent directory.

The patch avoids waiting for the discard lock in the current
context by using its own CP callback, which doesn't wait for
blocking locks. Those locks are finished later by LDLM and
cleaned up in that completion callback. The current thread only
makes sure that discard locks are taken and BL ASTs are sent; it
doesn't wait for the lock to be granted, which fixes the original
problem.
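
As a rough illustration of the non-waiting completion callback described
above, here is a minimal toy model in C. It does not use the real LDLM
API: struct toy_lock, toy_cp_async(), toy_grant() and
toy_discard_cleanup() are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy lock; NOT the real LDLM lock structure. */
struct toy_lock {
	bool granted;                          /* true once conflicts are gone */
	void (*on_granted)(struct toy_lock *); /* cleanup to run at grant time */
	int cleaned_up;                        /* counts cleanup invocations */
};

enum toy_cp_result { TOY_CP_PENDING, TOY_CP_GRANTED };

/* Cleanup for a discard lock; here it only records that it ran. */
static void toy_discard_cleanup(struct toy_lock *lk)
{
	lk->cleaned_up++;
}

/*
 * Non-waiting completion callback: if the lock is not granted yet,
 * return immediately instead of sleeping in the caller's context.
 */
static enum toy_cp_result toy_cp_async(struct toy_lock *lk)
{
	assert(lk != NULL);
	if (!lk->granted)
		return TOY_CP_PENDING;
	if (lk->on_granted)
		lk->on_granted(lk);
	return TOY_CP_GRANTED;
}

/*
 * Stands in for the lock manager granting the lock later, after the
 * last conflicting lock is cancelled. The callback runs again and
 * performs the deferred cleanup.
 */
static void toy_grant(struct toy_lock *lk)
{
	lk->granted = true;
	toy_cp_async(lk);
}
```

In this model the unlink path would call toy_cp_async() once and move on
when it returns TOY_CP_PENDING; the cleanup happens only when toy_grant()
is invoked later.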

At the same time, this opens a window for a race with data being
flushed on the client, so new IO from a client may arrive for a
just-unlinked object, causing an error message that cannot be
distinguished from other, possibly critical, situations. To solve
that, the unlinked object is pinned in memory until the discard
lock is granted. Such objects can then be easily identified as
stale, and any IO against them can be silently ignored.
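
The pin-until-granted idea can be sketched with a toy object in C. This
is an assumption-laden model, not the MDT code: struct toy_obj and the
toy_* functions are hypothetical, and a plain counter stands in for
whatever reference mechanism the server actually uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy object; NOT the real MDT object structure. */
struct toy_obj {
	int refcount;       /* references keeping the object in memory */
	bool stale;         /* unlinked while the discard lock is pending */
	long bytes_written; /* data accepted so far */
};

/* Unlink: mark the object stale and pin it with an extra reference. */
static void toy_unlink_begin(struct toy_obj *o)
{
	o->stale = true;
	o->refcount++;
}

/* Discard lock granted: drop the pin taken at unlink time. */
static void toy_discard_granted(struct toy_obj *o)
{
	assert(o->refcount > 0);
	o->refcount--;
}

/*
 * Write path: IO against a stale object is silently ignored, with no
 * error reported; IO against a live object is accounted normally.
 */
static int toy_write(struct toy_obj *o, long len)
{
	if (o->stale)
		return 0;
	o->bytes_written += len;
	return 0;
}
```

Because the object stays in memory with its stale mark set, a late write
in this model hits a known state rather than a missing object, so no
ambiguous error has to be raised.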

Older clients are not fully compatible with async DoM discard, so
the patch also adds a new connection flag, ASYNC_DISCARD, to
identify old clients and use the old blocking discard for them.
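
A minimal sketch of choosing the discard mode from a negotiated
connection flag follows. The macro name and bit value below are
placeholders made up for illustration; the actual flag name and value
are defined in the patch's changes to lustre_idl.h and are not
reproduced here.

```c
#include <stdint.h>

/* Placeholder bit for illustration only; not the real flag value. */
#define TOY_CONNECT_ASYNC_DISCARD (1ULL << 0)

enum toy_discard_mode { TOY_DISCARD_BLOCKING, TOY_DISCARD_ASYNC };

/*
 * Pick the discard mode for a client: async only if the client
 * advertised support at connect time, otherwise the old blocking path.
 */
static enum toy_discard_mode toy_pick_discard_mode(uint64_t client_flags)
{
	if (client_flags & TOY_CONNECT_ASYNC_DISCARD)
		return TOY_DISCARD_ASYNC;
	return TOY_DISCARD_BLOCKING;
}
```

Defaulting to the blocking path when the bit is absent keeps old clients
on the behaviour they already expect.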

Lustre-change: https://review.whamcloud.com/34071
Lustre-commit: 9c028e74c2202a8a481557c4cb22225734aaf19f

Test-Parameters: testlist=racer,racer,racer
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I419677af43c33e365a246fe12205b506209deace
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35197
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 files changed:
lustre/include/lustre_dlm.h
lustre/include/uapi/linux/lustre/lustre_idl.h
lustre/ldlm/ldlm_internal.h
lustre/ldlm/ldlm_request.c
lustre/llite/llite_lib.c
lustre/llite/namei.c
lustre/mdt/mdt_internal.h
lustre/mdt/mdt_io.c
lustre/mdt/mdt_open.c
lustre/mdt/mdt_reint.c
lustre/obdclass/lprocfs_status.c
lustre/osc/osc_cache.c
lustre/ptlrpc/service.c
lustre/ptlrpc/wirehdr.c
lustre/ptlrpc/wiretest.c
lustre/tests/sanity.sh
lustre/utils/wirecheck.c
lustre/utils/wirehdr.c
lustre/utils/wiretest.c