LU-14810 lnet: Do not issue multiple PUSHes
A PUSH ACK may be delayed in the network. Meanwhile, some event (e.g.
a config change or an NI state change) could cause the peer to go
through discovery again. The discovery state machine doesn't consider
whether there is an outstanding PUSH, so it may issue another one for
the same peer. When the delayed ACK arrives it clears PUSH_SENT, so
discovery no longer knows that a PUSH is still outstanding. If
discovery is then stopped, it doesn't unlink the push MD, which can
trigger an assert in lnet_assert_handler_unused() because the push
event handler is still in use.
Modify the discovery state machine to check for PUSH_SENT when
determining whether a peer needs a PUSH.
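A minimal sketch of the check described above, as a simplified
standalone model rather than the actual lnet code. The struct, the
flag names PEER_FORCE_PUSH and PEER_PUSH_SENT, and the helper
functions are hypothetical stand-ins for the real peer state.

```c
#include <stdbool.h>

/* Hypothetical flag bits, loosely modelled on the peer state flags. */
#define PEER_FORCE_PUSH (1u << 0) /* an event requested a new PUSH */
#define PEER_PUSH_SENT  (1u << 1) /* a PUSH was sent, ACK not yet seen */

/* Hypothetical, reduced peer: only the state bits matter here. */
struct peer {
	unsigned int state;
};

/* Decide whether discovery should issue a PUSH to this peer. */
static bool peer_needs_push(const struct peer *p)
{
	/* A PUSH is already outstanding: do not issue another one. */
	if (p->state & PEER_PUSH_SENT)
		return false;

	return (p->state & PEER_FORCE_PUSH) != 0;
}

/* Record that a PUSH has been sent to this peer. */
static void peer_push_sent(struct peer *p)
{
	p->state &= ~PEER_FORCE_PUSH;
	p->state |= PEER_PUSH_SENT;
}

/* Record that the ACK for the outstanding PUSH has arrived. */
static void peer_push_acked(struct peer *p)
{
	p->state &= ~PEER_PUSH_SENT;
}
```

In this model, an event that sets PEER_FORCE_PUSH while a PUSH is
outstanding leaves the request pending. peer_needs_push() returns true
again only after peer_push_acked() has run, so at most one PUSH per
peer is in flight at any time.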
sanity-lnet test_304 can reproduce this issue under an IPv6
configuration if the modules are unloaded at the end of the test.
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic3f7a8b44f85a18afb939fdbfa1f9bc5dc64d93d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55559
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>