LU-17476 lnet: prefer to use bits only to match ME 88/55488/2
author Serguei Smirnov <ssmirnov@whamcloud.com>
Sat, 27 Jan 2024 20:17:34 +0000 (12:17 -0800)
committer Oleg Drokin <green@whamcloud.com>
Sat, 22 Jun 2024 06:39:55 +0000 (06:39 +0000)
commit eb35ce5538512b67fd82955c54a148eb707a10ee
tree 839d287484155562cabe5bb84f2ba10282eb50ce
parent 7b45e8e96283b9b5c3552d672243a44bca4edc65
LU-17476 lnet: prefer to use bits only to match ME

In some cases, it has been observed that a reply will arrive
at the portal with the correct match bits, but is dropped by
lnet_parse_put().  This appears to happen with LNet Multi-Rail
peers, each having two separate NIDs.

If a reply arrives with match bits available and matching, but
the NIDs don't match, confirm the match if the NIDs are found
to belong to the same peer.  This check runs only in cases
where the reply would otherwise be dropped entirely, causing
hundreds of seconds of delay until the RPC is resent.  The extra
overhead of checking for a peer match before dropping the message
is therefore confined to the error path and is minimal compared
to the alternative.
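
The matching order described above can be sketched roughly as follows.
This is a simplified, self-contained illustration and not the code in
lib-ptl.c: the types and the names struct peer, struct match_entry,
nids_same_peer() and me_matches() are hypothetical stand-ins, and NIDs
are modelled as plain 64-bit integers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the real LNet structures. */
typedef uint64_t nid_t;

struct peer {
	const nid_t *nids;	/* all NIDs owned by this Multi-Rail peer */
	size_t nnids;
};

struct match_entry {
	nid_t src_nid;		/* NID the reply is expected to come from */
	uint64_t match_bits;
	uint64_t ignore_bits;
};

static bool peer_has_nid(const struct peer *p, nid_t nid)
{
	for (size_t i = 0; i < p->nnids; i++)
		if (p->nids[i] == nid)
			return true;
	return false;
}

/* Return true if some peer in the table owns both NIDs. */
static bool nids_same_peer(const struct peer *peers, size_t npeers,
			   nid_t a, nid_t b)
{
	for (size_t i = 0; i < npeers; i++)
		if (peer_has_nid(&peers[i], a) && peer_has_nid(&peers[i], b))
			return true;
	return false;
}

static bool me_matches(const struct match_entry *me, nid_t src,
		       uint64_t mbits, const struct peer *peers,
		       size_t npeers)
{
	/* Match bits must agree on every bit that is not ignored. */
	if (((me->match_bits ^ mbits) & ~me->ignore_bits) != 0)
		return false;

	/* Common case: the source NID is the expected one. */
	if (me->src_nid == src)
		return true;

	/* Error path only: the message would otherwise be dropped, so
	 * the peer lookup costs nothing in the common case. */
	return nids_same_peer(peers, npeers, me->src_nid, src);
}
```

In this sketch the peer lookup is reached only after the match bits
have matched and the direct NID comparison has failed, which mirrors
the "error path only" argument made above.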

Add CFS_FAIL_CHECK() for exercising the match NIDs code.

The new check sits in a hot codepath, but CFS_FAIL_CHECK() is marked
unlikely() and the check is in the error case, which _should_ only be
hit when the message would have been dropped anyway, so it seems
unlikely to impact performance in any meaningful way.
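
As a rough illustration of why a fault-injection check guarded by
unlikely() is cheap, consider the sketch below.  It is not the libcfs
implementation of CFS_FAIL_CHECK(): fail_loc, fail_check(),
FAIL_MATCH_NID and nid_equal_or_injected() are hypothetical names, and
unlikely() is defined locally on top of the GCC/Clang builtin
__builtin_expect().

```c
#include <stdbool.h>
#include <stdint.h>

/* Branch hint; assumes a GCC- or Clang-compatible compiler. */
#ifndef unlikely
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif

/* Hypothetical fault-injection state: 0 means nothing is armed. */
static uint32_t fail_loc;

#define FAIL_MATCH_NID 0x1	/* hypothetical injection point id */

/* True only when a test has armed this specific injection point. */
static inline bool fail_check(uint32_t id)
{
	return unlikely(fail_loc != 0 && fail_loc == id);
}

/* Compare NIDs, but pretend they differ when the fault is armed so
 * that the fallback "same peer" path can be exercised in tests. */
static bool nid_equal_or_injected(uint64_t expected, uint64_t actual)
{
	if (fail_check(FAIL_MATCH_NID))
		return false;
	return expected == actual;
}
```

With fail_loc left at 0, the extra cost in this sketch is a single
predicted-not-taken branch.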

Lustre-change: https://review.whamcloud.com/53843
Lustre-commit: 0b61b7d6d7940f67b75db2f4747169478512dd09

Test-Parameters: testlist=sanity env=ONLY=350,ONLY_REPEAT=10
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I10e1a2142539ddf5dabc26ce962cec1f2cfcf3db
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55488
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
lnet/include/lnet/lib-lnet.h
lnet/lnet/lib-ptl.c
lustre/tests/sanity.sh