Whamcloud - gitweb
LU-15068 ptlrpc: Do not unlink difficult reply until sent
author Chris Horn <chris.horn@hpe.com>
Tue, 5 Oct 2021 19:11:29 +0000 (14:11 -0500)
committer Andreas Dilger <adilger@whamcloud.com>
Thu, 10 Mar 2022 04:34:55 +0000 (04:34 +0000)
commit 2a57dfb1cf61396cdce714f164795acf7a23e94a
tree 0c7263f4e9afa1cad3eff650ad8fdb744370d96c
parent 23a44b3071f23082cd618bbfc6241eb9567e09db

If a difficult reply is queued in LNet, or the PUT for it is
otherwise delayed, then the commit callback can unlink the reply MD
before the send completes, which aborts the send. The client then
hits the "slow reply" timeout for the associated RPC and performs
an unnecessary reconnect (and possibly a resend).

This patch replaces the rs_on_net flag with rs_sent and rs_unlinked.
These flags indicate whether the send event for the reply MD has
been generated, and whether the MD has been unlinked, respectively.

If rs_sent is set but rs_unlinked is not, then ptlrpc_hr is free to
unlink the reply MD as a result of the commit callback. The
reply-ACK will simply be dropped by the server.
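
The unlink condition above can be written as a small predicate. This is
only an illustrative sketch, not code from the patch: struct reply_flags
and may_unlink_on_commit() are hypothetical names, and only the two new
flags described in this message are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the reply state; only the new flags. */
struct reply_flags {
	bool rs_sent;     /* send event for the reply MD was generated */
	bool rs_unlinked; /* reply MD has been unlinked */
};

/*
 * Returns true when the commit-callback path may unlink the reply MD
 * right away: the send has completed and the MD is still linked.
 */
static bool may_unlink_on_commit(const struct reply_flags *rs)
{
	assert(rs != NULL);
	return rs->rs_sent && !rs->rs_unlinked;
}
```

With rs_sent clear the predicate is false, which corresponds to the
deferred case described in the next paragraph.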

If ptlrpc_hr is processing the reply because of the commit callback
and rs_sent has not been set, then ptlrpc_hr will not unlink the
reply MD. The reply_out_callback must therefore also check for this
case when the send event occurs; otherwise, if the ACK never arrives
from the client, the MD would never be unlinked. Thus, when the send
event occurs and rs_handled is set, the reply_out_callback schedules
the reply for handling by ptlrpc_hr.
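
The interaction between the two paths can be modeled roughly as follows.
This is a simplified sketch under stated assumptions, not the patch
itself: struct reply_model, model_hr_commit(), model_send_event() and
the times_scheduled counter are hypothetical, locking is omitted, and
the model assumes rs_handled is set once ptlrpc_hr has processed the
reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the reply state. */
struct reply_model {
	bool rs_sent;         /* send event has been generated */
	bool rs_unlinked;     /* reply MD has been unlinked */
	bool rs_handled;      /* assumption: set once ptlrpc_hr ran */
	int  times_scheduled; /* stand-in for scheduling to ptlrpc_hr */
};

/* ptlrpc_hr handling the reply because of the commit callback. */
static void model_hr_commit(struct reply_model *rs)
{
	assert(rs != NULL);
	rs->rs_handled = true;
	if (rs->rs_sent && !rs->rs_unlinked)
		rs->rs_unlinked = true; /* send is done, safe to unlink */
	/* else: leave the MD linked so the pending send is not aborted */
}

/* Send event for the reply MD, as seen by the reply-out callback. */
static void model_send_event(struct reply_model *rs)
{
	assert(rs != NULL);
	rs->rs_sent = true;
	if (rs->rs_handled)
		rs->times_scheduled++; /* hand back to ptlrpc_hr to unlink */
}
```

In this model, a commit that arrives before the send event leaves the MD
linked; the later send event schedules the reply once more, and the
second pass through model_hr_commit() performs the unlink.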

Lustre-change: https://review.whamcloud.com/45138
Lustre-commit: 5c156b48425aae245537aaf10229734166463347

HPE-bug-id: LUS-10505
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ib8f4853c7ab35d72624fce7ee3fba9e59a746e1f
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46751
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
lustre/include/lustre_net.h
lustre/ldlm/ldlm_lib.c
lustre/ptlrpc/events.c
lustre/ptlrpc/pack_generic.c
lustre/ptlrpc/service.c