LU-13001 lnet: Wait for single discovery attempt of routers
Historically, check_routers_before_use would cause LNet
initialization to pause until all routers had been ping'd once.
This behavior was changed in commit
fe17e9b8370affe063769b880f02b9190584baaa from LU-11298. Now, LNet
will wait indefinitely until discovery completes on all routers.
This is problematic, because if even one router is down then LNet
will stall forever.
Introduce a new lnet_peer state to indicate whether a router has
been discovered (either successfully or not) to restore the historic
behavior.
Fixes
fe17e9b8370a ("LU-11298 lnet: use peer for gateway")
Test-Parameters: trivial
Cray-bug-id: LUS-8184
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Ia064ffeb3e918cdb8d5a6150f443c48aa14e7a7c
Reviewed-on: https://review.whamcloud.com/36820
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>