From: Arshad Hussain
Date: Wed, 19 May 2021 11:04:30 +0000 (+0530)
Subject: LU-14640 osd: ASSERTION(!PageDirty(lnb[i].lnb_page)
X-Git-Tag: 2.14.55~76
X-Git-Url: https://git.whamcloud.com/?p=fs%2Flustre-release.git;a=commitdiff_plain;h=76c5e8ed9560fe232bcc0c2ee0069dbdb8411565

LU-14640 osd: ASSERTION(!PageDirty(lnb[i].lnb_page)

fallocate(PUNCH_HOLE) was leaving the partially-zeroed page
in the buffer cache. This was causing an ASSERT when doing
large direct read/write operations.

This was seen when executing an fsx run with options:
$ fsx -c 50 -p 1000 -S 7919 -P /tmp -l 5407677 -N 100000

Lustre: DEBUG MARKER: GENERIC DEBUG start start
LustreError: 15768:0:(osd_io.c:1563:osd_write_commit()) ASSERTION( !PageDirty(lnb[i].lnb_page) ) failed:
LustreError: 15768:0:(osd_io.c:1563:osd_write_commit()) LBUG
Pid: 15768, comm: ll_ost_io00_000 3.10.0-957.el7_lustre.x86_64
Call Trace:
[<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[<0>] osd_write_commit+0x52c/0x870 [osd_ldiskfs]
[<0>] ofd_commitrw_write+0xe79/0x1510 [ofd]
[<0>] ofd_commitrw+0x2ad/0x9a0 [ofd]
[<0>] tgt_brw_write+0xfd0/0x1cb0 [ptlrpc]
[<0>] tgt_request_handle+0x7ea/0x1750 [ptlrpc]
[<0>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[<0>] ptlrpc_main+0xb3c/0x14e0 [ptlrpc]
[<0>] kthread+0xd1/0xe0
[<0>] ret_from_fork_nospec_begin+0xe/0x21
[<0>] 0xfffffffffffffffe
Kernel panic - not syncing: LBUG

Test-case: sanity-benchmark/fsx_partial_punch added

Test-Parameters: testlist=sanity-benchmark
Signed-off-by: Arshad Hussain
Change-Id: I89fcbc6af0cbf4b544b8d149703053909ecb6cad
Reviewed-on: https://review.whamcloud.com/43462
Tested-by: jenkins
Tested-by: Maloo
Reviewed-by: Mike Pershin
Reviewed-by: Alex Zhuravlev
Reviewed-by: Andreas Dilger
Reviewed-by: Oleg Drokin
---

diff --git a/lustre/osd-ldiskfs/osd_io.c b/lustre/osd-ldiskfs/osd_io.c
index d72896e..40ae417 100644
--- a/lustre/osd-ldiskfs/osd_io.c
+++ b/lustre/osd-ldiskfs/osd_io.c
@@ -2760,6 +2760,22 @@ void osd_trunc_unlock_all(const struct lu_env *env, struct list_head *list)
 	}
 }
 
+/* For a partial-page punch, flush punch range to disk immediately */
+static void osd_partial_page_flush_punch(struct osd_device *d,
+					 struct inode *inode, loff_t start,
+					 loff_t end)
+{
+	if (osd_use_page_cache(d)) {
+		filemap_fdatawrite_range(inode->i_mapping, start, end);
+	} else {
+		/* Notice we use "wait" version to ensure I/O is complete */
+		filemap_write_and_wait_range(inode->i_mapping, start,
+					     end);
+		invalidate_mapping_pages(inode->i_mapping, start >> PAGE_SHIFT,
+					 end >> PAGE_SHIFT);
+	}
+}
+
 /*
  * For a partial-page truncate, flush the page to disk immediately to
  * avoid data corruption during direct disk write.  b=17397
@@ -2827,8 +2843,7 @@ void osd_execute_punch(const struct lu_env *env, struct osd_object *obj,
 	struct file *file = osd_quasi_file(env, inode);
 
 	file->f_op->fallocate(file, mode, start, end - start);
-	osd_partial_page_flush(d, inode, start);
-	osd_partial_page_flush(d, inode, end - 1);
+	osd_partial_page_flush_punch(d, inode, start, end - 1);
 }
 
 void osd_process_truncates(const struct lu_env *env, struct list_head *list)
diff --git a/lustre/tests/sanity-benchmark.sh b/lustre/tests/sanity-benchmark.sh
index b01dc45..35a58fc 100644
--- a/lustre/tests/sanity-benchmark.sh
+++ b/lustre/tests/sanity-benchmark.sh
@@ -199,6 +199,31 @@ test_fsx() {
 }
 run_test fsx "fsx"
 
+test_fsx_partial_punch() {
+	local fsx_count=100000
+	local testfile=$DIR/f0.fsxfile
+	local fsx_size=5407677 # upper bound file size
+	local fsx_seed=7919
+
+	check_set_fallocate
+
+	rm -f $testfile
+	$LFS setstripe -c -1 $testfile
+
+	#
+	# $fsx_seed, $fsx_count and $fsx_size combination almost
+	# always reproduces the LASSERT under LU-14640. Therefore these
+	# constants are used as reproducer vs using a random value and
+	# hoping it hits the error condition
+	#
+	CMD="$FSX -c 50 -p 1000 -S $fsx_seed -P $TMP -l $fsx_size \
+		-N $fsx_count $testfile"
+	echo "Using: $CMD"
+	$CMD || error "fsx failed"
+	rm -f $testfile
+}
+run_test fsx_partial_punch "Verify fsx with partial punch via fallocate"
+
 complete $SECONDS
 check_and_cleanup_lustre
 exit_status