b=9829
r=adilger
Description: client incorrectly hits assertion in ptlrpc_replay_req()
Details : for a short time, RPCs with bulk IO are in the replay list,
but replay of bulk IOs is unimplemented. If the OST filesystem
is corrupted due to disk cache incoherency and replay is then
started, it is possible to trip an assertion. Avoid this by not
putting already-committed RPCs into the replay list at all.
and bits policy, thus improving the performance of search through
the granted list.
+Severity : major
+Frequency : only if OST filesystem is corrupted
+Bugzilla : 9829
+Description: client incorrectly hits assertion in ptlrpc_replay_req()
+Details : for a short time, RPCs with bulk IO are in the replay list,
+ but replay of bulk IOs is unimplemented. If the OST filesystem
+ is corrupted due to disk cache incoherency and replay is then
+ started, it is possible to trip an assertion. Avoid this by not
+ putting already-committed RPCs into the replay list at all.
+
------------------------------------------------------------------------------
TBD Cluster File Systems, Inc. <info@clusterfs.com>
if (req->rq_import->imp_replayable) {
spin_lock(&imp->imp_lock);
- if (req->rq_transno != 0)
+ /* There is no point in adding already-committed requests to the
+ * replay list; we would just remove them immediately. b=9829 */
+ if (req->rq_transno != 0 &&
+ (req->rq_transno >
+ lustre_msg_get_last_committed(req->rq_repmsg) ||
+ req->rq_replay))
ptlrpc_retain_replayable_request(req, imp);
else if (req->rq_commit_cb != NULL) {
spin_unlock(&imp->imp_lock);