When a request is removed from the active set on error (the
-ENOMEM path after ptl_send_rpc()), imp_inflight must also be
decremented. Otherwise the count never drops to zero and
cleanup hangs waiting for it.
This bug has been latent for some time, but running sanity
test 414 with hybrid IO tends to trigger it.
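
The accounting rule can be sketched in a small userspace model. This is
only an illustration under stated assumptions, not the ptlrpc code: the
model_* names are hypothetical, a plain int stands in for the atomic
imp_inflight counter, a bool stands in for rq_list membership, and the
model assumes the send path increments the counter when it queues the
request.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the import; not the real struct obd_import. */
struct model_import {
	int inflight;	/* models imp_inflight */
	int wakeups;	/* counts would-be wake_up(&imp->imp_recovery_waitq) calls */
};

/* Hypothetical stand-in for the request. */
struct model_request {
	bool on_list;	/* models !list_empty(&req->rq_list) */
};

/* Assumed send path: queue the request and count it as inflight. */
static void model_send(struct model_import *imp, struct model_request *req)
{
	req->on_list = true;
	imp->inflight++;
}

/*
 * Error path following the shape of the fix: unlinking the request and
 * dropping the inflight count happen together, and a waiter is woken
 * when the count reaches zero.
 */
static void model_unlink_on_error(struct model_import *imp,
				  struct model_request *req)
{
	if (!req->on_list)
		return;		/* already unlinked, nothing to account for */

	req->on_list = false;
	assert(imp->inflight > 0);
	if (--imp->inflight == 0)
		imp->wakeups++;
}
```

Without the decrement in model_unlink_on_error(), inflight would stay at 1
after the error and anything waiting for it to reach zero would never
proceed, which is the hang seen at umount.
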
Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: Ib73980724f6e2f5a74400a39840df2e8835a6e23
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54099
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
rc = ptl_send_rpc(req, 0);
if (rc == -ENOMEM) {
spin_lock(&imp->imp_lock);
- if (!list_empty(&req->rq_list))
+ if (!list_empty(&req->rq_list)) {
list_del_init(&req->rq_list);
+ if (atomic_dec_and_test(&imp->imp_inflight))
+ wake_up(&imp->imp_recovery_waitq);
+ }
spin_unlock(&imp->imp_lock);
ptlrpc_rqphase_move(req, RQ_PHASE_NEW);
continue;
$LCTL set_param fail_loc=0x80000521
dd if=/dev/zero of=$DIR/$tfile bs=2M count=1 oflag=sync
rm -f $DIR/$tfile
+ # This error path has sometimes left inflight requests dangling, so
+ # test for this by remounting the client (umount will hang if there's
+ # a dangling request)
+ umount_client $MOUNT
+ mount_client $MOUNT
}
run_test 414 "simulate ENOMEM in ptlrpc_register_bulk()"