From: Frank Sehr
Date: Tue, 10 Sep 2024 23:30:21 +0000 (-0700)
Subject: LU-18208 lnet: Server VM crashed: unable to handle
X-Git-Tag: 2.15.91~3
X-Git-Url: https://git.whamcloud.com/gitweb?a=commitdiff_plain;h=dca487cd2dd6c0a805928bd22f49ceb7328df814;p=fs%2Flustre-release.git

LU-18208 lnet: Server VM crashed: unable to handle

Revert "LU-18160 lnet: ensure lnetctl ping completes in a finite time"

It seems that the patch for LU-18160 introduced crashes, possibly
because it changed wait_for_completion_timeout() to
wait_for_completion_interruptible_timeout().
Reverting that patch solved the problem.

This reverts commit 1666840bb06bbeeb35b2f9a51f9235c36886a3c6.

Test-Parameters: trivial testlist=sanity
Signed-off-by: Frank Sehr
Change-Id: Ie48185eb973eee65df2810d7acf940cf6981b83e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56322
Reviewed-by: Timothy Day
Reviewed-by: Serguei Smirnov
Reviewed-by: Oleg Drokin
Tested-by: Maloo
Tested-by: jenkins
---

diff --git a/lnet/lnet/api-ni.c b/lnet/lnet/api-ni.c
index 2053f1b..91f993f 100644
--- a/lnet/lnet/api-ni.c
+++ b/lnet/lnet/api-ni.c
@@ -10019,6 +10019,7 @@ static int lnet_ping(struct lnet_processid *id, struct lnet_nid *src_nid,
 	u32 *st;
 	int nob;
 	int rc;
+	int rc2;
 
 	genradix_init(&plist->lgpl_list);
 
@@ -10056,22 +10057,27 @@ static int lnet_ping(struct lnet_processid *id, struct lnet_nid *src_nid,
 	init_completion(&pd.completion);
 
 	rc = LNetMDBind(&md, LNET_UNLINK, &pd.mdh);
-	if (rc) {
+	if (rc != 0) {
 		CERROR("Can't bind MD: %d\n", rc);
 		goto fail_ping_buffer_decref;
 	}
 
 	rc = LNetGet(src_nid, pd.mdh, id, LNET_RESERVED_PORTAL,
 		     LNET_PROTO_PING_MATCHBITS, 0, false);
-	if (rc)
-		LASSERT(!LNetMDUnlink(pd.mdh));
+	if (rc != 0) {
+		/* Don't CERROR; this could be deliberate! */
+		rc2 = LNetMDUnlink(pd.mdh);
+		LASSERT(rc2 == 0);
 
-	/* Ensure completion in finite time... */
-	wait_for_completion_interruptible_timeout(&pd.completion,
-						  timeout);
+		/* NB must wait for the UNLINK event below... */
+	}
 
-	if (!pd.pd_unlinked)
+	/* Ensure completion in finite time... */
+	wait_for_completion_timeout(&pd.completion, timeout);
+	if (!pd.pd_unlinked) {
 		LNetMDUnlink(pd.mdh);
+		wait_for_completion(&pd.completion);
+	}
 
 	if (!pd.replied) {
 		rc = pd.rc ?: -EIO;
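
Note on the restored wait pattern: the code this revert brings back waits
with a timeout, and if the MD is still linked it unlinks it and then waits
again without a timeout for the final event. The sketch below is a userspace
analogue using pthreads, not Lustre or kernel code. Everything in it is a
hypothetical stand-in: struct completion and its helpers imitate the kernel
API names, event_handler() plays the LNet event callback, and setting
pd.cancel plays the role of LNetMDUnlink(). It assumes that the reason for
the second wait is that pd lives on the caller's stack and is written by the
handler; the commit message itself only says the cause is uncertain.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Wait for complete(); deadline == NULL means wait forever. */
static bool wait_common(struct completion *c, const struct timespec *deadline)
{
	bool completed;
	int rc = 0;

	pthread_mutex_lock(&c->lock);
	while (c->done == 0 && rc == 0)
		rc = deadline ?
		     pthread_cond_timedwait(&c->cond, &c->lock, deadline) :
		     pthread_cond_wait(&c->cond, &c->lock);
	completed = c->done > 0;
	if (completed)
		c->done--;
	pthread_mutex_unlock(&c->lock);
	return completed;
}

static void wait_for_completion(struct completion *c)
{
	wait_common(c, NULL);
}

static bool wait_for_completion_timeout(struct completion *c, unsigned int ms)
{
	struct timespec deadline;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += ms / 1000;
	deadline.tv_nsec += (long)(ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}
	return wait_common(c, &deadline);
}

/* Per-ping state; lives on the waiter's stack, like pd in lnet_ping(). */
struct ping_data {
	struct completion completion;
	atomic_bool cancel;	/* waiter asks for an unlink */
	atomic_bool unlinked;	/* handler has delivered its final event */
	atomic_bool replied;	/* handler saw a reply before any cancel */
	unsigned int reply_delay_ms;
};

/* Simulated event callback: replies after a delay unless cancelled first. */
static void *event_handler(void *arg)
{
	struct ping_data *pd = arg;
	const struct timespec tick = { 0, 1000000L };	/* 1 ms */
	unsigned int waited = 0;

	while (waited++ < pd->reply_delay_ms && !atomic_load(&pd->cancel))
		nanosleep(&tick, NULL);
	atomic_store(&pd->replied, !atomic_load(&pd->cancel));
	atomic_store(&pd->unlinked, true);
	complete(&pd->completion);	/* last access to pd by the handler */
	return NULL;
}

static int ping_once(unsigned int reply_delay_ms, unsigned int timeout_ms)
{
	struct ping_data pd = { .reply_delay_ms = reply_delay_ms };
	pthread_t handler;
	int rc;

	init_completion(&pd.completion);
	if (pthread_create(&handler, NULL, event_handler, &pd) != 0)
		return -EAGAIN;

	/* Ensure completion in finite time... */
	wait_for_completion_timeout(&pd.completion, timeout_ms);
	if (!atomic_load(&pd.unlinked)) {
		atomic_store(&pd.cancel, true);	/* stands in for the unlink */
		/* Must wait for the final event before pd leaves scope. */
		wait_for_completion(&pd.completion);
	}

	/* Keep the side effect outside the assertion, as the patch does. */
	rc = pthread_join(handler, NULL);
	assert(rc == 0);
	return atomic_load(&pd.replied) ? 0 : -EIO;
}
```

For example, ping_once(5, 500) lets the simulated reply arrive well inside
the timeout, while ping_once(2000, 20) times out, requests the unlink and
only returns after the handler has posted its final event. The
pthread_join() call is there only to reclaim the thread; kernel code has no
such join, so there the unconditional wait is what keeps pd valid until the
last event has been handled.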