From d83c8868974785fd1bd378153424baecf29a915b Mon Sep 17 00:00:00 2001
From: Johann Lombardi
Date: Fri, 22 Jan 2010 22:03:26 +0100
Subject: [PATCH] b=21406 fix deadlock between kjournald2 and ost_io thread

i=adilger
i=girish

Calling clear_page_dirty_for_io() is no longer needed since we are
guaranteed that no dirty pages can be left in the page cache by partial
truncate. The problem is that clear_page_dirty_for_io() can temporarily
mark the page as dirty in the radix tree, which can cause deadlock
between jbd commit and bulk write handling.
---
 lustre/ChangeLog | 20 +++++++++++++++++---
 lustre/obdfilter/filter_io_26.c | 8 +++++---
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/lustre/ChangeLog b/lustre/ChangeLog
index 1315f90..07bb6c4 100644
--- a/lustre/ChangeLog
+++ b/lustre/ChangeLog
@@ -335,7 +335,7 @@ Description: add cascading_rw.c to lustre/tests
 Severity : normal
 Bugzilla : 21565
 Description: filter_last_id() NULL deref
-Details : lprocfs_filter_rd_last_id() should check for the fully
+Details : lprocfs_filter_rd_last_id() should check for the fully
  setup obd device, before proceeding further.
 
 Severity : enhancement
@@ -343,9 +343,10 @@ Bugzilla : 21571
 Description: Loadgen improvements
 Details : stacksize and locking fixes for loadgen
 
+Severity : normal
 Bugzilla : 21656
 Description: Quiet CERROR("dirty %d > system dirty_max %d\n"
-Details : The atomic_read() allowing the atomic_inc() are not covered
+Details : The atomic_read() allowing the atomic_inc() are not covered
  by a lock. Thus they may safely race and trip this CERROR()
  unless we add in a small fudge factor (+1).
 
@@ -354,30 +355,36 @@ Bugzilla : 21800
 Description: shrink_slab: nr=-9223362083340912175
 Details : fix spurious message from shrink_slab reporing negative nr
 
+Severity : normal
 Bugzilla : 21681
 Description: Quiet bogus previously committed transno error
-Details : suppress the "server went back in time" error message which
+Details : suppress the "server went back in time" error message which
  is always printed even in the common case after a client eviction
 
+Severity : enhancement
 Bugzilla : 20065
 Description: Parallel statfs() calls result in client eviction
 Details : cache statfs data for 1s.
 
+Severity : normal
 Bugzilla : 21574
 Description: parallel-scale test_compilebench: @@@@@@ FAIL: compilebench failed: 1
 Details : fix serveral issues in pinger code causing clients not to ping
  servers for too long, resulting in evictions.
 
+Severity : normal
 Bugzilla : 21564
 Description: e2fsck should warn when MMP update interval is extended
 Details : print mmp_check_interval and make it possible to abort mount
  operation in case it takes too long.
 
+Severity : normal
 Bugzilla : 21595
 Description: mdsrate-create-large.sh, BUG: soft lockup - CPU#0 stuck for 10s!
 Details : fix bug in the RHEL5's jbd2 callback patch.
 
+Severity : normal
 Bugzilla : 21828
 Description: drop number of active requests when queued for recovery
 Details : Now that we take a reference on the original request instead of
@@ -385,12 +392,19 @@ Details : Now that we take a reference on the original request instead of
  active requests or the queued requests will prevent all request
  processing when they exceed (srv->srv_threads_running - 1).
 
+Severity : enhancement
 Bugzilla : 21826
 Description: refuse to invalidate operational quota files when they are in use
 Details : an attempt to invalidate operational quota files on the quota
  master is not actually permitted by VFS (returning -EPERM), but we
  should not depend on that and should return the error earlier.
 
+Severity : normal
+Bugzilla : 21406
+Description: Applications stuck in jbd2_log_wait_commit during exit
+Details : fix deadlock between kjournald2 trying to acquire the page lock
+ owned by an ost_io thread waiting for journal commit.
+
 -------------------------------------------------------------------------------
 
 2009-10-16 Sun Microsystems, Inc.
diff --git a/lustre/obdfilter/filter_io_26.c b/lustre/obdfilter/filter_io_26.c
index bc884ad..eb0badf 100644
--- a/lustre/obdfilter/filter_io_26.c
+++ b/lustre/obdfilter/filter_io_26.c
@@ -632,9 +632,11 @@ int filter_commitrw_write(struct obd_export *exp, struct obdo *oa,
         LASSERT(PageLocked(lnb->page));
         LASSERT(!PageWriteback(lnb->page));
 
-        /* preceding filemap_write_and_wait() should have clean pages */
-        if (fo->fo_writethrough_cache)
-                clear_page_dirty_for_io(lnb->page);
+        /* Calling clear_page_dirty_for_io() is no longer needed since
+         * write and truncate are serialized by i_alloc_sem and
+         * filter_setattr_internal() flushes any partial-page
+         * truncates before releasing i_alloc_sem, so there should
+         * never be any dirty pages at this time. */
         LASSERT(!PageDirty(lnb->page));
 
         SetPageUptodate(lnb->page);
-- 
1.8.3.1