Whamcloud - gitweb
LU-13569 lnet: Introduce lnet_recovery_limit parameter 16/39716/12
authorChris Horn <chris.horn@hpe.com>
Fri, 21 Aug 2020 18:33:12 +0000 (13:33 -0500)
committerOleg Drokin <green@whamcloud.com>
Wed, 9 Dec 2020 07:48:09 +0000 (07:48 +0000)
This parameter controls how long LNet will attempt to recover an
unhealthy interface.

Defaults to 0 to indicate indefinite recovery. This maintains the
current behavior.

Test-Parameters: trivial
HPE-bug-id: LUS-9109
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I2f7897d9a293f0979f7402de2b91e160c77790d1
Reviewed-on: https://review.whamcloud.com/39716
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lnet/include/lnet/lib-lnet.h
lnet/lnet/api-ni.c

index c3202cb..d245c11 100644 (file)
@@ -520,6 +520,7 @@ extern unsigned int lnet_lnd_timeout;
 extern unsigned int lnet_numa_range;
 extern unsigned int lnet_health_sensitivity;
 extern unsigned int lnet_recovery_interval;
+extern unsigned int lnet_recovery_limit;
 extern unsigned int lnet_peer_discovery_disabled;
 extern unsigned int lnet_drop_asym_route;
 extern unsigned int router_sensitivity_percentage;
index dff911d..b75a625 100644 (file)
@@ -124,6 +124,11 @@ module_param_call(lnet_recovery_interval, recovery_interval_set, param_get_int,
 MODULE_PARM_DESC(lnet_recovery_interval,
                "Interval to recover unhealthy interfaces in seconds");
 
+unsigned int lnet_recovery_limit;
+module_param(lnet_recovery_limit, uint, 0644);
+MODULE_PARM_DESC(lnet_recovery_limit,
+                "How long to attempt recovery of unhealthy peer interfaces in seconds. Set to 0 to allow indefinite recovery");
+
 static int lnet_interfaces_max = LNET_INTERFACES_MAX_DEFAULT;
 static int intf_max_set(const char *val, cfs_kernel_param_arg_t *kp);