r=johann,shadow
- fixes ptlrpcd blocking on a very long wait for reply unlink. To do so, a new rpc phase,
RQ_PHASE_UNREGISTERING, is introduced. A request stays in it until lnet calls reply_in_callback()
to signal that the reply is unlinked. ptlrpcd skips all requests in this phase during processing
instead of waiting n * 300s on each of them. This allows ptlrpcd to process other rpcs in the set;
- make sure that the inflight count is coherent with presence on the sending or delay list. That is,
if we see inflight != 0, the rpc must be on one of these lists. This is very helpful in
ptlrpc_invalidate_import() to show all rpcs still waiting after invalidating the import;
- in ptlrpc_invalidate_import(), wait for the maximum of rq_deadline - now over all inflight rpcs,
which may be much longer than obd_timeout, instead of waiting for obd_timeout. If the calculated
timeout is 0, obd_timeout is used. This fixes the case where rq_deadline - now > obd_timeout (very
easy to see in logs), which led to the inflight != 0 assert because inflight rpcs timed out after
our wait period had finished;
- in ptlrpc_invalidate_import(), wait forever for rpcs in the UNREGISTERING phase. When the wait
times out, assert inflight == 0 only if no rpcs are in the UNREGISTERING phase. Only rpcs in the
UNREGISTERING phase are allowed to stay longer than obd_timeout;
- added the ptlrpc_move_rqphase() function. All phase changes go through it. debug_req() is called
there to track all phase changes;
- added conf_sanity.sh test_45 to emulate a very long reply unlink and also the situation where
rq_deadline - now > obd_timeout;
- fixed the use of rq_timedout in debug_req();
- do not wait forever in ptlrpc_unregister_reply() in the async case (when it is used from sets).
The sync case is left unchanged.
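
For illustration only, the wait-period calculation described for ptlrpc_invalidate_import() can be
sketched roughly as below. This is not the patch code: struct sketch_req, its deadline field and
sketch_invalidate_timeout() are hypothetical names standing in for the real request structure,
rq_deadline and the logic inside ptlrpc_invalidate_import(); the fallback argument plays the role
of obd_timeout.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an inflight rpc. */
struct sketch_req {
	long deadline;	/* plays the role of rq_deadline, in seconds */
};

/*
 * Return the longest remaining time (deadline - now) over all requests.
 * If that is 0 (no requests, or all deadlines already passed), return
 * the fallback, which stands in for obd_timeout.
 */
static long sketch_invalidate_timeout(const struct sketch_req *reqs, size_t n,
				      long now, long fallback)
{
	long timeout = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		long left = reqs[i].deadline - now;

		if (left > timeout)
			timeout = left;
	}
	return timeout != 0 ? timeout : fallback;
}
```

With deadlines of 110 and 450, now = 100 and a fallback of 100, the sketch waits 350 seconds rather
than 100, so the wait no longer ends before the last inflight rpc can time out.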
remote_ost_nodsh && skip "remote OST with nodsh" && exit 0
#
-[ "$SLOW" = "no" ] && EXCEPT_SLOW="0 1 2 3 6 7 15 18 24b 25 30 31 32 33 34a "
+[ "$SLOW" = "no" ] && EXCEPT_SLOW="0 1 2 3 6 7 15 18 24b 25 30 31 32 33 34a 45"
assert_DIR
}
run_test 44 "mounted client proc entry exists"
+test_45() { #17310
+ setup
+ check_mount || return 2
+ stop_mds
+ df -h $MOUNT &
+ log "sleep 60 sec"
+ sleep 60
+#define OBD_FAIL_PTLRPC_LONG_UNLINK 0x50f
+ do_facet client "lctl set_param fail_loc=0x50f"
+ log "sleep 10 sec"
+ sleep 10
+ manual_umount_client --force || return 3
+ do_facet client "lctl set_param fail_loc=0x0"
+ start_mds
+ mount_client $MOUNT || return 4
+ cleanup
+ return 0
+}
+run_test 45 "long unlink handling in ptlrpcd"
+
+
equals_msg `basename $0`: test complete
[ -f "$TESTSUITELOG" ] && cat $TESTSUITELOG || true