LU-12816 ptlrpc: ptlrpc_register_bulk LBUG on ENOMEM 09/36309/8
author Ann Koehler <amk@cray.com>
Mon, 14 Oct 2019 16:30:56 +0000 (11:30 -0500)
committer Oleg Drokin <green@whamcloud.com>
Fri, 6 Dec 2019 01:09:19 +0000 (01:09 +0000)
Another path through ptl_send_rpc() can trigger the assertion
reported in LU-10643. The !desc->bd_registered assertion in
ptlrpc_register_bulk() fails when an RPC is resent after a first
send attempt that failed to attach the reply buffer, because the
bulk error cleanup in ptl_send_rpc() does not reset the
bd_registered flag. Reset the flag in the cleanup_bulk path.

Cray-bug-id: LUS-7946
Signed-off-by: Ann Koehler <amk@cray.com>
Change-Id: I474211f196ea9bd83a036747e25c91c37c85ffbb
Reviewed-on: https://review.whamcloud.com/36309
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
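
The failure mode in the commit message can be illustrated with a small standalone model. This is only a sketch, not Lustre code: the sketch_* types and functions are hypothetical stand-ins, the assertion in ptlrpc_register_bulk() is modeled as a -1 return value, and the cleanup is assumed to leave bd_registered set unless the reset from this patch is applied, as the commit message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Lustre structures; only the fields
 * relevant to this illustration are modeled. */
struct sketch_bulk_desc {
	int bd_registered;
};

struct sketch_request {
	struct sketch_bulk_desc *rq_bulk;
};

/* Models ptlrpc_register_bulk(): the real code asserts on
 * !desc->bd_registered; this sketch returns -1 instead of crashing. */
static int sketch_register_bulk(struct sketch_request *req)
{
	assert(req != NULL);
	if (req->rq_bulk == NULL)
		return 0;		/* no bulk transfer attached */
	if (req->rq_bulk->bd_registered)
		return -1;		/* stale flag: the LBUG case */
	req->rq_bulk->bd_registered = 1;
	return 0;
}

/* Models the cleanup_bulk error path. reset_flag selects whether the
 * reset added by this patch is applied. */
static void sketch_cleanup_bulk(struct sketch_request *req, int reset_flag)
{
	/* The real path calls ptlrpc_unregister_bulk(request, 0) here. */
	if (reset_flag && req->rq_bulk != NULL)
		req->rq_bulk->bd_registered = 0;
}

/* Returns 1 if a resend can register the bulk descriptor again after a
 * failed first send, 0 if it trips over the stale flag. */
static int sketch_resend_survives(int reset_flag)
{
	struct sketch_bulk_desc desc = { 0 };
	struct sketch_request req = { &desc };

	if (sketch_register_bulk(&req) != 0)
		return 0;
	/* First send fails after bulk registration; run the cleanup. */
	sketch_cleanup_bulk(&req, reset_flag);
	/* Resend: registration is attempted a second time. */
	return sketch_register_bulk(&req) == 0;
}
```

In this model, sketch_resend_survives(0) corresponds to the behavior before the patch and sketch_resend_survives(1) to the behavior with the added reset.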
lustre/ptlrpc/niobuf.c

index eaf6bd0..4720592 100644
@@ -908,18 +908,20 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
                 GOTO(out, rc);
 
  cleanup_me:
-        /* MEUnlink is safe; the PUT didn't even get off the ground, and
-         * nobody apart from the PUT's target has the right nid+XID to
-         * access the reply buffer. */
-        rc2 = LNetMEUnlink(reply_me_h);
-        LASSERT (rc2 == 0);
-        /* UNLINKED callback called synchronously */
-        LASSERT(!request->rq_receiving_reply);
+       /* MEUnlink is safe; the PUT didn't even get off the ground, and
+        * nobody apart from the PUT's target has the right nid+XID to
+        * access the reply buffer. */
+       rc2 = LNetMEUnlink(reply_me_h);
+       LASSERT (rc2 == 0);
+       /* UNLINKED callback called synchronously */
+       LASSERT(!request->rq_receiving_reply);
 
  cleanup_bulk:
-        /* We do sync unlink here as there was no real transfer here so
-         * the chance to have long unlink to sluggish net is smaller here. */
+       /* We do sync unlink here as there was no real transfer here so
+        * the chance to have long unlink to sluggish net is smaller here. */
         ptlrpc_unregister_bulk(request, 0);
+       if (request->rq_bulk != NULL)
+               request->rq_bulk->bd_registered = 0;
  out:
        if (rc == -ENOMEM) {
                /* set rq_sent so that this request is treated