LU-13145 lnet: use conservative health timeouts

Use more conservative lnet_transaction_timeout and lnet_retry_count
values by default. Currently, with timeout=10 and retry=3, there is
only a 3s window for the RPC to be sent before it is timed out.
This has caused fault injection rather than fault tolerance.

Increase the default timeout to 50s with retry=2, which should be
long enough to cover virtually all uses, but still allows LNet Health
to be enabled by default and to resend before Lustre itself times out.
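
The arithmetic behind the old and new windows can be sketched as
follows. This is an illustration only: per_attempt_window() is a
hypothetical helper, not an LNet function, and it assumes the
per-attempt window is the integer quotient of the transaction timeout
and the retry count, which is consistent with the 3s figure quoted
above for timeout=10 and retry=3.

```python
def per_attempt_window(transaction_timeout: int, retry_count: int) -> int:
    """Seconds available to each send attempt.

    Hypothetical helper for illustration; assumes the window is
    transaction_timeout divided by retry_count, rounded down.
    """
    if retry_count <= 0:
        # Assumption: with no retries the whole timeout is one window.
        return transaction_timeout
    return transaction_timeout // retry_count


# Compare the old defaults (10, 3) with the new defaults (50, 2).
for timeout, retry in ((10, 3), (50, 2)):
    window = per_attempt_window(timeout, retry)
    print(f"timeout={timeout} retry={retry}: {window}s per attempt")
```

Under that assumption the new defaults give each attempt 25s rather
than 3s.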

Fixes: 8632e94aeb7e ("LU-11816 lnet: setup health timeout defaults")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6bfc4d61cebab38c1554e1b42834b1f38fc34ba8
Reviewed-on: https://review.whamcloud.com/37430
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>