Whamcloud - gitweb
LU-17476 lnet: prefer to use bits only to match ME
In some cases, it has been observed that a reply will arrive
at the portal with the correct match bits, but is dropped by
lnet_parse_put(). This appears to happen with LNet Multi-Rail
peers, each having two separate NIDs.
If a reply arrives with matchbits available and matching, but
the NIDs don't match, confirm the match if the NIDs are found
to belong to the same peer. This will only happen in cases
where the reply would be dropped entirely, causing hundreds of
seconds of delay until the RPC is resent, so the extra overhead
of checking for a peer match before dropping the request is
only in the error path and minimal compared to the alternative.
Add CFS_FAIL_CHECK() for exercising the match NIDs code.
That is in a hot codepath, but CFS_FAIL_CHECK() is marked unlikely()
and this check is in the error case and _should_ only be hit when
the message would have been dropped anyway, so it seems unlikely to
impact performance in any meaningful way.
Test-Parameters: testlist=sanity env=ONLY=350,ONLY_REPEAT=10
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I10e1a2142539ddf5dabc26ce962cec1f2cfcf3db
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53843
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>