From cd3038b769ba1b7c5a4888ad84bdf03ecf51c709 Mon Sep 17 00:00:00 2001
From: Amir Shehata
Date: Mon, 8 Jul 2019 12:33:31 -0700
Subject: [PATCH] LU-10931 lnet: handle unlink before send completes

If LNetMDUnlink() is called on an md with md->md_refcount > 0 then
the eq callback isn't called.

There is a scenario where the response times out before the send
completes. So we have a refcount on the MD. The Unlink callback gets
dropped on the floor. Send completes, but because we've already
timed out, the REPLY for the GET is dropped. Now we're left with
a peer that is in the following state:
LNET_PEER_MULTI_RAIL
LNET_PEER_DISCOVERING
LNET_PEER_PING_SENT
But no more events are coming to it, and the discovery never completes.

This scenario can get RPCs stuck as well if the response times out
before the send completes.

The solution is to set the event status to -ETIMEDOUT to inform the
send event handler that it should not expect a reply.

Lustre-commit: d8fc5c23fe541e0ff6ce5bec6302957714c3f69f
Lustre-change: https://review.whamcloud.com/35444

Test-Parameters: trivial
Signed-off-by: Amir Shehata
Change-Id: Ica0e1a823d0d1200bb8cc42a6e058785da1d4fa4
Reviewed-by: Chris Horn
Reviewed-by: Alexandr Boyko
Reviewed-by: Olaf Weber
Reviewed-by: Serguei Smirnov
Reviewed-by: Li Xi
Reviewed-on: https://review.whamcloud.com/45898
Reviewed-by: Chris Horn
Tested-by: jenkins
Tested-by: Maloo
Reviewed-by: Oleg Drokin
---
 lnet/lnet/lib-msg.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lnet/lnet/lib-msg.c b/lnet/lnet/lib-msg.c
index 959c370..9e52200 100644
--- a/lnet/lnet/lib-msg.c
+++ b/lnet/lnet/lib-msg.c
@@ -864,7 +864,12 @@ lnet_msg_detach_md(struct lnet_msg *msg, int cpt, int status)
 	unlink = lnet_md_unlinkable(md);
 	if (md->md_eq != NULL) {
-		msg->msg_ev.status = status;
+		if ((md->md_flags & LNET_MD_FLAG_ABORTED) && !status) {
+			msg->msg_ev.status = -ETIMEDOUT;
+			CDEBUG(D_NET, "md 0x%p already unlinked\n", md);
+		} else {
+			msg->msg_ev.status = status;
+		}
 		msg->msg_ev.unlinked = unlink;
 		lnet_eq_enqueue_event(md->md_eq, &msg->msg_ev);
 	}
-- 
1.8.3.1