From: Trung Nguyen
Date: Mon, 9 Nov 2020 06:46:44 +0000 (-0700)
Subject: EX-2010 scsi: requeue aborted commands instead of retry
X-Git-Url: https://git.whamcloud.com/?a=commitdiff_plain;h=8acbebd431675755fc605cc931a21c7eb0c6ca32;p=fs%2Flustre-release.git

EX-2010 scsi: requeue aborted commands instead of retry

If the underlying SCSI command returns an abort, rather than retry
it quickly in a loop, which can finish within a few milliseconds,
requeue it with delay so that the hardware has a chance to recover.

The command requeue will take several seconds each time and allows
more chance for the problem to be resolved at the SCSI layer instead
of returning an error to the filesystem and causing server failover.

Test-Parameters: trivial testlist=sanity
Signed-off-by: Trung Nguyen
Change-Id: Ibdf1b3a52dd0a1b388c7f5f97aa7a51620138845
Tested-by: Shuichi Ihara
Reviewed-on: https://review.whamcloud.com/41852
Reviewed-by: Andreas Dilger
Tested-by: jenkins
Tested-by: Maloo
---
diff --git a/lustre/kernel_patches/patches/scsi-requeue-aborted-commands-instead-of-retry.patch b/lustre/kernel_patches/patches/scsi-requeue-aborted-commands-instead-of-retry.patch
new file mode 100644
index 0000000..61e7304
--- /dev/null
+++ b/lustre/kernel_patches/patches/scsi-requeue-aborted-commands-instead-of-retry.patch
@@ -0,0 +1,22 @@
+DDN-1501 scsi: requeue aborted commands instead of retry
+
+If the underlying SCSI command returns an abort, rather than retry
+it quickly in a loop, which can finish within a few milliseconds,
+requeue it with delay so that the hardware has a chance to recover.
+
+The command requeue will take several seconds each time and allows
+more chance for the problem to be resolved at the SCSI layer instead
+of returning an error to the filesystem and causing server failover.
+
+Signed-off-by: Trung Nguyen
+--- ./drivers/scsi/scsi_error.c.orig	2020-02-12 06:45:22.000000000 -0700
++++ ./drivers/scsi/scsi_error.c	2020-11-08 23:11:41.045007688 -0700
+@@ -510,7 +510,7 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
+ 		if (sshdr.asc == 0x10) /* DIF */
+ 			return SUCCESS;
+
+-		return NEEDS_RETRY;
++		return ADD_TO_MLQUEUE;
+ 	case NOT_READY:
+ 	case UNIT_ATTENTION:
+ 		/*
diff --git a/lustre/kernel_patches/series/3.10-rhel7.7.series b/lustre/kernel_patches/series/3.10-rhel7.7.series
index 001175d..994bcba 100644
--- a/lustre/kernel_patches/series/3.10-rhel7.7.series
+++ b/lustre/kernel_patches/series/3.10-rhel7.7.series
@@ -3,6 +3,7 @@ blkdev_tunables-3.9.patch
 vfs-project-quotas-rhel7.patch
 fix-integrity-verify-rhel7.patch
 fix-sd-dif-complete-rhel7.patch
+scsi-requeue-aborted-commands-instead-of-retry.patch
 block-integrity-allow-optional-integrity-functions-rhel7.patch
 block-pass-bio-into-integrity_processing_fn-rhel7.patch
 block-Ensure-we-only-enable-integrity-metadata-for-reads-and-writes-rhel7.patch
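The commit message describes the trade-off between the two return values only in prose. The sketch below is a toy C simulation, not kernel code, that contrasts the two policies: an immediate retry that spends a per-command retry budget versus a delayed requeue that is bounded only by an overall deadline. It rests on an assumption, as we understand the SCSI midlayer of this kernel generation, that NEEDS_RETRY returned from scsi_check_sense() is counted against the command's retry allowance while ADD_TO_MLQUEUE is not. Every name and constant in it (TOY_*, simulate(), CMD_MS, ALLOWED_RETRIES, REQUEUE_DELAY_MS, DEADLINE_MS) is hypothetical and chosen only for illustration.

```c
#include <stdbool.h>

/*
 * Toy model only: the names echo the SCSI midlayer dispositions touched by
 * this patch, but nothing here is kernel code or exact kernel behaviour.
 */
enum toy_disposition { TOY_NEEDS_RETRY, TOY_ADD_TO_MLQUEUE };

struct toy_result {
	bool succeeded;      /* did the command eventually complete OK? */
	unsigned attempts;   /* how many times the command was issued */
	unsigned elapsed_ms; /* simulated time until the final outcome */
};

/* Hypothetical parameters, chosen only for illustration. */
#define CMD_MS           2u     /* an aborted command comes back quickly */
#define ALLOWED_RETRIES  5u     /* assumed per-command retry budget */
#define REQUEUE_DELAY_MS 3000u  /* assumed "several seconds" requeue delay */
#define DEADLINE_MS      60000u /* assumed overall per-command time limit */

/*
 * Simulate one command against a device that aborts every command issued
 * before recover_ms and completes every command issued at or after it.
 */
static struct toy_result simulate(enum toy_disposition on_abort,
				  unsigned recover_ms)
{
	struct toy_result r = { false, 0, 0 };
	unsigned now = 0, retries = 0;

	for (;;) {
		bool ok = now >= recover_ms; /* has the device recovered? */

		r.attempts++;
		now += CMD_MS;
		if (ok) {
			r.succeeded = true;
			break;
		}
		if (on_abort == TOY_NEEDS_RETRY) {
			/* Retry at once; each retry spends the budget. */
			if (++retries > ALLOWED_RETRIES)
				break; /* error is passed up the stack */
		} else {
			/* Requeue after a delay; only the deadline bounds it. */
			now += REQUEUE_DELAY_MS;
			if (now >= DEADLINE_MS)
				break;
		}
	}
	r.elapsed_ms = now;
	return r;
}
```

With a device that recovers 5 s after the first abort, simulate(TOY_NEEDS_RETRY, 5000) gives up after 6 attempts and about 12 ms of simulated time, whereas simulate(TOY_ADD_TO_MLQUEUE, 5000) succeeds on its third attempt at roughly 6 s. The price of the delayed policy in this model is that a device which never recovers is only declared failed once DEADLINE_MS has passed.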