Whamcloud - gitweb
LU-17999 lnet: prevent race in access to peer rtrcredits count 20/55620/9
author     Serguei Smirnov <ssmirnov@whamcloud.com>
           Thu, 4 Jul 2024 00:02:32 +0000 (17:02 -0700)
committer  Oleg Drokin <green@whamcloud.com>
           Wed, 31 Jul 2024 15:57:14 +0000 (15:57 +0000)
commit     f3ee8ec7c820979f278c3bcbf36789d9b9b24ae6
tree       310ccbcd73ba281072dd30dd8edaa2a0df0940db
parent     319970b5febb4d50544cb0775a0d7d36b113a400
LU-17999 lnet: prevent race in access to peer rtrcredits count

Refactor lnet_parse_forward_locked and lnet_post_routed_recv_locked
so that the code which checks and acts on the peer's rtrcredits count
lives in a single place. This avoids a race in which another thread
decrements the count after it has been checked for the purpose of
"eager receiving" the message, which could trigger the assertion on
msg_rx_ready_delay.

This race is possible if messages from the same peer NID are being
processed on different local NIs mapped to different CPTs.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ibe938882a69d860554cd9c875403bfb0399df8ec
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55620
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
lnet/lnet/lib-move.c