From 5693165eb550296b83e0496ec848a27f6ed90804 Mon Sep 17 00:00:00 2001 From: Patrick Farrell Date: Tue, 6 Jul 2021 11:20:56 -0400 Subject: [PATCH] LU-14814 osc: osc: Do not flush on lockless cancel The cancellation of a an OSC lock without an LDLM lock (a 'lockless' OSC lock) should not flush pages. Only direct i/o is allowed to use a lockless OSC lock, and direct i/o does not create flushable pages. DIO pages are not flushable because: A) all synced ASAP, and B) the OSC extents created for them are not added to the extent tree which is used to track these pages. Instead, this has the effect of trying to flush pages from ongoing buffered i/o. This can lead to crashes like the following: osc_cache_writeback_range()) ASSERTION(hp == 0 && discard == 0) failed This assert essentially says the lock cancellation (hp == 1) found an active i/o (an extent in the OES_ACTIVE state). This is not allowed because the flushing code assumes an LDLM lock is being cancelled, which will only start once there is no active i/o. Because the OSC lock being cancelled is not associated with an LDLM lock, this is not true, and nothing prevents active i/o under a different lock, leading to this assert. The solution is simply to not flush pages when cancelling a no-LDLM-lock OSC lock. Additional note: New lockless OSC locks cannot be created if they are blocked by a regular OSC lock, but a new regular lock can be created if there is a lockless lock present. Thus, the sequence is something like this: Direct i/o creates lockless OSC lock Buffered i/o creates OSC and LDLM lock on the same range Direct i/o finishes, starts cancelling its OSC lock Buffered i/o is still ongoing, with extents in OES_ACTIVE This results in the above crash during the OSC lock cancellation. Note it would be possible to resolve this issue by not allowing lockless OSC locks to match regular OSC locks, but this is not necessary, since there's no reason for lockless locks to flush pages on cancellation. Lustre-change: https://review.whamcloud.com/44152 Lustre-commit: 6717c573ed90da9175e3c93c19759ea2dcd38bec Test-Parameters: env=ONLY=398b,ONLY_REPEAT=200 testlist=sanity Test-Parameters: env=ONLY=77,ONLY_REPEAT=100 testlist=sanityn Signed-off-by: Patrick Farrell Change-Id: Iceb1747b66232cad3f7e90ec271310a13a687a33 Reviewed-on: https://review.whamcloud.com/44438 Reviewed-by: Bobi Jam Tested-by: jenkins Tested-by: Maloo Reviewed-by: Andreas Dilger --- lustre/osc/osc_lock.c | 8 -------- 1 file changed, 8 deletions(-) diff --git a/lustre/osc/osc_lock.c b/lustre/osc/osc_lock.c index 204b93f..1dbb1a5 100644 --- a/lustre/osc/osc_lock.c +++ b/lustre/osc/osc_lock.c @@ -1138,16 +1138,8 @@ static void osc_lock_lockless_cancel(const struct lu_env *env, { struct osc_lock *ols = cl2osc_lock(slice); struct osc_object *osc = cl2osc(slice->cls_obj); - struct cl_lock_descr *descr = &slice->cls_lock->cll_descr; - int result; LASSERT(ols->ols_dlmlock == NULL); - result = osc_lock_flush(osc, descr->cld_start, descr->cld_end, - descr->cld_mode, false); - if (result) - CERROR("Pages for lockless lock %p were not purged(%d)\n", - ols, result); - osc_lock_wake_waiters(env, osc, ols); } -- 1.8.3.1