From: bobijam
Date: Thu, 15 Feb 2007 00:23:59 +0000 (+0000)
Subject: Branch HEAD
X-Git-Tag: v1_7_100~341
X-Git-Url: https://git.whamcloud.com/?p=fs%2Flustre-release.git;a=commitdiff_plain;h=6e9ad1d588e96a607721df65b998fc6743e81eb3

Branch HEAD
b=9829
r=adilger

Description: client incorrectly hits assertion in ptlrpc_replay_req()
Details    : for a short time RPCs with bulk IO are in the replay list,
             but replay of bulk IOs is unimplemented. If the OST filesystem
             is corrupted due to disk cache incoherency and then replay is
             started it is possible to trip an assertion. Avoid putting
             committed RPCs into the replay list at all to avoid this issue.
---
diff --git a/lustre/ChangeLog b/lustre/ChangeLog
index 17b852b..8fee9dd 100644
--- a/lustre/ChangeLog
+++ b/lustre/ChangeLog
@@ -197,6 +197,16 @@ Details    : Grouping plain/inodebits in granted list by their request
              modes and bits policy, thus improving the performance of search
              through the granted list.
 
+Severity   : major
+Frequency  : only if OST filesystem is corrupted
+Bugzilla   : 9829
+Description: client incorrectly hits assertion in ptlrpc_replay_req()
+Details    : for a short time RPCs with bulk IO are in the replay list,
+             but replay of bulk IOs is unimplemented. If the OST filesystem
+             is corrupted due to disk cache incoherency and then replay is
+             started it is possible to trip an assertion. Avoid putting
+             committed RPCs into the replay list at all to avoid this issue.
+
 ------------------------------------------------------------------------------
 
 TBD Cluster File Systems, Inc.
diff --git a/lustre/ptlrpc/client.c b/lustre/ptlrpc/client.c
index b5ecb75..142f4e4 100644
--- a/lustre/ptlrpc/client.c
+++ b/lustre/ptlrpc/client.c
@@ -642,7 +642,12 @@ static int after_reply(struct ptlrpc_request *req)
 
         if (req->rq_import->imp_replayable) {
                 spin_lock(&imp->imp_lock);
-                if (req->rq_transno != 0)
+                /* no point in adding already-committed requests to the replay
+                 * list, we will just remove them immediately. b=9829 */
+                if (req->rq_transno != 0 &&
+                    (req->rq_transno >
+                     lustre_msg_get_last_committed(req->rq_repmsg) ||
+                     req->rq_replay))
                         ptlrpc_retain_replayable_request(req, imp);
                 else if (req->rq_commit_cb != NULL) {
                         spin_unlock(&imp->imp_lock);