From: Patrick Farrell Date: Tue, 6 Jul 2021 15:20:56 +0000 (-0400) Subject: LU-14814 osc: osc: Do not flush on lockless cancel X-Git-Tag: 2.14.53~2 X-Git-Url: https://git.whamcloud.com/?p=fs%2Flustre-release.git;a=commitdiff_plain;h=6717c573ed90da9175e3c93c19759ea2dcd38bec LU-14814 osc: osc: Do not flush on lockless cancel The cancellation of a an OSC lock without an LDLM lock (a 'lockless' OSC lock) should not flush pages. Only direct i/o is allowed to use a lockless OSC lock, and direct i/o does not create flushable pages. DIO pages are not flushable because: A) all synced ASAP, and B) the OSC extents created for them are not added to the extent tree which is used to track these pages. Instead, this has the effect of trying to flush pages from ongoing buffered i/o. This can lead to crashes like the following: osc_cache_writeback_range()) ASSERTION(hp == 0 && discard == 0) failed This assert essentially says the lock cancellation (hp == 1) found an active i/o (an extent in the OES_ACTIVE state). This is not allowed because the flushing code assumes an LDLM lock is being cancelled, which will only start once there is no active i/o. Because the OSC lock being cancelled is not associated with an LDLM lock, this is not true, and nothing prevents active i/o under a different lock, leading to this assert. The solution is simply to not flush pages when cancelling a no-LDLM-lock OSC lock. Additional note: New lockless OSC locks cannot be created if they are blocked by a regular OSC lock, but a new regular lock can be created if there is a lockless lock present. Thus, the sequence is something like this: Direct i/o creates lockless OSC lock Buffered i/o creates OSC and LDLM lock on the same range Direct i/o finishes, starts cancelling its OSC lock Buffered i/o is still ongoing, with extents in OES_ACTIVE This results in the above crash during the OSC lock cancellation. Note it would be possible to resolve this issue by not allowing lockless OSC locks to match regular OSC locks, but this is not necessary, since there's no reason for lockless locks to flush pages on cancellation. Test-Parameters: env=ONLY=398b,ONLY_REPEAT=200 testlist=sanity Test-Parameters: env=ONLY=77,ONLY_REPEAT=100 testlist=sanityn Signed-off-by: Patrick Farrell Change-Id: Iceb1747b66232cad3f7e90ec271310a13a687a33 Reviewed-on: https://review.whamcloud.com/44152 Tested-by: jenkins Tested-by: Maloo Reviewed-by: Li Dongyang Reviewed-by: Wang Shilong Reviewed-by: Oleg Drokin --- diff --git a/lustre/osc/osc_lock.c b/lustre/osc/osc_lock.c index 6b8b460..2ffb63b 100644 --- a/lustre/osc/osc_lock.c +++ b/lustre/osc/osc_lock.c @@ -1148,16 +1148,8 @@ static void osc_lock_lockless_cancel(const struct lu_env *env, { struct osc_lock *ols = cl2osc_lock(slice); struct osc_object *osc = cl2osc(slice->cls_obj); - struct cl_lock_descr *descr = &slice->cls_lock->cll_descr; - int result; LASSERT(ols->ols_dlmlock == NULL); - result = osc_lock_flush(osc, descr->cld_start, descr->cld_end, - descr->cld_mode, false); - if (result) - CERROR("Pages for lockless lock %p were not purged(%d)\n", - ols, result); - osc_lock_wake_waiters(env, osc, ols); }