From: Andreas Dilger
Date: Thu, 2 Nov 2023 02:53:02 +0000 (-0600)
Subject: LU-15246 params: document per-device timeouts
X-Git-Url: https://git.whamcloud.com/?a=commitdiff_plain;h=39f6b8063c2f9344734f4c7db2a10cfc347d3800;p=doc%2Fmanual.git

LU-15246 params: document per-device timeouts

Add a mention of per-device at_min, at_max, at_history, and
ldlm_enqueue_min. Some reformatting of the timeout-related text to
wrap at 80 cols.

Signed-off-by: Andreas Dilger
Change-Id: I42557e708950d729a55e0ed17d424965c6dbc357
Reviewed-on: https://review.whamcloud.com/c/doc/manual/+/52945
Tested-by: jenkins
---

diff --git a/LustreProc.xml b/LustreProc.xml
index bc70d6f..6d76ae1 100644
--- a/LustreProc.xml
+++ b/LustreProc.xml
@@ -1853,15 +1853,32 @@ rpcs in flight rpcs % cum %
 proc adaptive timeouts
 Configuring Adaptive Timeouts
- The adaptive timeout parameters in the table below can be set persistently system-wide
- using lctl conf_param on the MGS. For example, the following command sets
- the at_max value for all servers and clients associated with the file
- system
- testfs:lctl conf_param testfs.sys.at_max=1500
+ The adaptive timeout parameters in the table below can be set
+ persistently system-wide using lctl set_param -P
+ on the MGS. For example, the following command sets the
+ at_max value for all servers and clients
+ associated with the file systems connected to this MGS:
+
+mgs# lctl set_param -P at_max=1500
+
- Clients that access multiple Lustre file systems must use the same parameter values
+ Clients that access multiple Lustre file systems
+ must use the same adaptive timeout values for all file systems.
+
+ Since Lustre 2.16 it is preferred to set
+ at_max as a per-target tunable using the
+ *.fsname*.at_max
+ parameter instead of the global at_max
+ parameter. This avoids issues if a single client mounts two
+ separate filesystems with different at_max
+ tunable settings.
+
+mgs# lctl set_param -P *.testfs-*.at_max=1500
+
@@ -1883,14 +1900,28 @@ rpcs in flight rpcs % cum %
 at_min
- Minimum adaptive timeout (in seconds). The default value is 0. The
- at_min parameter is the minimum processing time that a server
- will report. Ideally, at_min should be set to its default
- value. Clients base their timeouts on this value, but they do not use this value
- directly.
- If, for unknown reasons (usually due to temporary network outages), the
- adaptive timeout value is too short and clients time out their RPCs, you can
- increase the at_min value to compensate for this.
+ Minimum adaptive timeout (in seconds). The default value
+ is 5 (since 2.16). The at_min parameter is
+ the minimum processing time that a server will report.
+ Ideally, at_min should be left at its
+ default value. Clients base their timeouts on this value,
+ but they do not use this value directly.
+
+ If, for some reason (usually due to temporary network
+ outages or sudden spikes in load immediately after mount),
+ the adaptive timeout value is too short and clients time
+ out their RPCs, you can increase the at_min
+ value to compensate for this.
+
+ Since Lustre 2.16 it is preferred to set
+ at_min as a per-target tunable using the
+ *.fsname*.at_min
+ parameter instead of the global at_min
+ parameter. This avoids issues if a single client mounts two
+ separate filesystems with different at_min
+ tunable settings.
+
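For example, assuming a file system named testfs on a Lustre 2.16 or later
system (the value 15 is only a placeholder, not a recommendation), a
per-target override and a check of the resulting values might look like:

mgs# lctl set_param -P *.testfs-*.at_min=15
client# lctl get_param *.testfs-*.at_min
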
@@ -1899,16 +1930,28 @@ rpcs in flight rpcs % cum %
 at_max
- Maximum adaptive timeout (in seconds). The at_max parameter
- is an upper-limit on the service time estimate. If at_max is
+ Maximum adaptive timeout (in seconds). The
+ at_max parameter is an upper limit on the
+ service time estimate. If at_max is
 reached, an RPC request times out.
- Setting at_max to 0 causes adaptive timeouts to be disabled
+ Setting at_max to 0 causes adaptive
+ timeouts to be disabled and a fixed timeout method to be used instead (see
+
+ Since Lustre 2.16 it is preferred to set
+ at_max as a per-target tunable using the
+ *.fsname*.at_max
+ parameter instead of the global at_max
+ parameter. This avoids issues if a single client mounts two
+ separate filesystems with different at_max
+ settings.
+
- If slow hardware causes the service estimate to increase beyond the default
- value of at_max, increase at_max to the
- maximum time you are willing to wait for an RPC completion.
+ If slow hardware causes the service estimate to
+ increase beyond the default at_max value,
+ increase at_max to the maximum time you
+ are willing to wait for an RPC completion.
@@ -1918,8 +1961,18 @@ rpcs in flight rpcs % cum %
 at_history
- Time period (in seconds) within which adaptive timeouts remember the slowest
+ Time period (in seconds) within which adaptive timeouts
+ remember the slowest
 event that occurred. The default is 600.
+
+ Since Lustre 2.16 it is preferred to set
+ at_history as a per-target tunable using the
+ *.fsname*.at_history
+ parameter instead of the global at_history
+ parameter. This avoids issues if a single client mounts two
+ filesystems with different at_history
+ values.
+
@@ -1928,8 +1981,8 @@ rpcs in flight rpcs % cum %
 at_early_margin
- Amount of time before the Lustre server sends an early reply (in seconds).
- Default is 5.
+ Amount of time before the Lustre server sends an early
+ reply (in seconds). Default is 5.
@@ -1938,18 +1991,22 @@ rpcs in flight rpcs % cum %
 at_extra
- Incremental amount of time that a server requests with each early reply (in
- seconds). The server does not know how much time the RPC will take, so it asks for
- a fixed value. The default is 30, which provides a balance between sending too
- many early replies for the same RPC and overestimating the actual completion
- time.
- When a server finds a queued request about to time out and needs to send an
- early reply out, the server adds the at_extra value. If the
- time expires, the Lustre server drops the request, and the client enters recovery
- status and reconnects to restore the connection to normal status.
- If you see multiple early replies for the same RPC asking for 30-second
- increases, change the at_extra value to a larger number to cut
- down on early replies sent and, therefore, network load.
+ Incremental amount of time that a server requests with
+ each early reply (in seconds). The server does not know how
+ much time the RPC will take, so it asks for a fixed value.
+ The default is 30, which provides a balance between sending
+ too many early replies for the same RPC and overestimating
+ the actual completion time.
+ When a server finds a queued request about to time out
+ and needs to send an early reply out, the server adds the
+ at_extra value. If the time expires, the
+ Lustre server drops the request, and the client enters
+ recovery status and reconnects to restore the connection to
+ normal status.
+ If you see multiple early replies for the same RPC asking
+ for 30-second increases, change at_extra
+ to a larger number to cut down on early replies sent and,
+ therefore, network load.
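To see what the adaptive timeout machinery is currently estimating, the
per-import timeouts files on a client report the tracked service estimates
and network latency; a quick check might look like the following (the exact
output format varies between Lustre releases):

client# lctl get_param osc.*.timeouts
client# lctl get_param mdc.*.timeouts
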
@@ -1958,14 +2015,26 @@ rpcs in flight rpcs % cum %
 ldlm_enqueue_min
- Minimum lock enqueue time (in seconds). The default is 100. The time it takes
- to enqueue a lock, ldlm_enqueue, is the maximum of the measured
- enqueue estimate (influenced by at_min and
- at_max parameters), multiplied by a weighting factor and the
- value of ldlm_enqueue_min.
- Lustre Distributed Lock Manager (LDLM) lock enqueues have a dedicated minimum
- value for ldlm_enqueue_min. Lock enqueue timeouts increase as
- the measured enqueue times increase (similar to adaptive timeouts).
+ Minimum lock enqueue time (in seconds). The default is
+ 100. The time it takes to enqueue a lock, shown as the
+ ldlm_enqueue operation in the stats files,
+ is the maximum of the measured enqueue estimate (influenced
+ by at_min and at_max
+ parameters), multiplied by a weighting factor and the value
+ of ldlm_enqueue_min.
+ Lustre Distributed Lock Manager (LDLM) lock enqueues
+ have a dedicated minimum, ldlm_enqueue_min.
+ Lock enqueue timeouts increase as the measured enqueue times
+ increase (similar to adaptive timeouts).
+
+ Since Lustre 2.16 it is preferred to set
+ ldlm_enqueue_min as a per-target tunable with
+ *.fsname*.ldlm_enqueue_min
+ instead of the global ldlm_enqueue_min
+ parameter. This avoids issues if a client mounts multiple
+ filesystems with different ldlm_enqueue_min
+ tunable settings.
+
@@ -2039,20 +2108,30 @@ req_timeout 6 samples [sec] 1 10 15 105
 Lustre timeouts
-
- Lustre timeouts ensure that Lustre RPCs complete in a finite
- time in the presence of failures when adaptive timeouts are not enabled. Adaptive
- timeouts are enabled by default. To disable adaptive timeouts at run time, set
- at_max to 0 by running on the
- MGS:# lctl conf_param fsname.sys.at_max=0
+
+ Lustre timeouts ensure that Lustre RPCs
+ complete in a finite time in the presence of failures when
+ adaptive timeouts are not enabled. Adaptive timeouts are enabled
+ by default. To disable adaptive timeouts at run time, set
+ at_max to 0 by running on the MGS:
+
+# lctl conf_param fsname.sys.at_max=0
+
- Changing the status of adaptive timeouts at runtime may cause a transient client
- timeout, recovery, and reconnection.
+ Changing the state of adaptive timeouts at runtime may
+ cause transient client timeouts, recovery, and reconnection.
- Lustre timeouts are always printed as console messages.
- If Lustre timeouts are not accompanied by LND timeouts, increase the Lustre
- timeout on both servers and clients. Lustre timeouts are set using a command such as
- the following:# lctl set_param timeout=30
- Lustre timeout parameters are described in the table below.
+ Lustre timeouts are always printed as console messages.
+
+ If Lustre timeouts are not accompanied by LND timeouts,
+ increase the Lustre timeout on both servers and clients. Lustre
+ timeouts are set across the whole filesystem using a command
+ such as the following:
+
+mgs# lctl set_param -P timeout=30
+
+ Timeout parameters are described in the table below.
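If the newer lctl set_param -P syntax is preferred over conf_param, a
roughly equivalent way to disable adaptive timeouts and then choose a
static timeout might be the following (the value 300 is only a placeholder):

mgs# lctl set_param -P at_max=0
mgs# lctl set_param -P timeout=300
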
@@ -2069,48 +2148,55 @@ req_timeout 6 samples [sec] 1 10 15 105
 timeout
- The time that a client waits for a server to complete an RPC (default 100s).
- Servers wait half this time for a normal client RPC to complete and a quarter of
- this time for a single bulk request (read or write of up to 4 MB) to complete.
- The client pings recoverable targets (MDS and OSTs) at one quarter of the
- timeout, and the server waits one and a half times the timeout before evicting a
+ The time that a client waits for a server to complete
+ an RPC (default 100s). Servers wait half this time for a
+ normal client RPC to complete and a quarter of this time
+ for a single bulk request (read or write of up to 4 MB)
+ to complete. The client pings recoverable targets (MDS
+ and OSTs) at one quarter of the timeout, and the server
+ waits one and a half times the timeout before evicting a
 client for being "stale."
- Lustre client sends periodic 'ping' messages to servers with which
- it has had no communication for the specified period of time. Any network
- activity between a client and a server in the file system also serves as a
+ The Lustre client sends periodic 'ping' messages
+ to servers with which it has had no communication for the
+ specified period of time. Any network activity between a
+ client and a server in the file system also serves as a
 ping.
 ldlm_timeout
- The time that a server waits for a client to reply to an initial AST (lock
- cancellation request). The default is 20s for an OST and 6s for an MDS. If the
- client replies to the AST, the server will give it a normal timeout (half the
- client timeout) to flush any dirty data and release the lock.
+ The time that a server waits for a client to reply to
+ an initial AST (lock cancellation request). The default
+ is 20s for an OST and 6s for an MDS. If the client replies
+ to the AST, the server will give it a normal timeout (half
+ the client timeout) to flush any dirty data and release
+ the lock.
 fail_loc
 An internal debugging failure hook. The default value of
- 0 means that no failure will be triggered or
+ 0 means that no failure will be triggered or
 injected.
 dump_on_timeout
- Triggers a dump of the Lustre debug log when a timeout occurs. The default
- value of 0 (zero) means a dump of the Lustre debug log will
- not be triggered.
+ Triggers a dump of the Lustre debug log when a timeout
+ occurs. The default value of 0 (zero)
+ means a dump of the Lustre debug log will not be triggered.
+
 dump_on_eviction
- Triggers a dump of the Lustre debug log when an eviction occurs. The default
- value of 0 (zero) means a dump of the Lustre debug log will
+ Triggers a dump of the Lustre debug log when an
+ eviction occurs. The default value of 0
+ (zero) means a dump of the Lustre debug log will
 not be triggered.
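The per-target forms of the other timeout tunables described above follow
the same pattern; for example, assuming a file system named testfs (the
values are shown only to illustrate the syntax):

mgs# lctl set_param -P *.testfs-*.at_history=600
mgs# lctl set_param -P *.testfs-*.ldlm_enqueue_min=100
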
diff --git a/SystemConfigurationUtilities.xml b/SystemConfigurationUtilities.xml
index a279080..f0a42d1 100644
--- a/SystemConfigurationUtilities.xml
+++ b/SystemConfigurationUtilities.xml
@@ -254,16 +254,28 @@ mgs# lctl set_param -P -d mdt.testfs-MDT0000.identity_upcall
 mgs# lctl conf_param testfs-MDT0000.mdt.identity_upcall=NONE
 $ lctl conf_param testfs.llite.max_read_ahead_mb=16
- The lctl conf_param command permanently sets parameters in the file system configuration for all nodes of the specified type.
+ The lctl conf_param command
+ permanently sets parameters in the file system
+ configuration for all nodes of the specified type.
- To get current Lustre parameter settings, use the lctl get_param command on the desired node with the same parameter name as lctl set_param:
- lctl get_param [-n] obdtype.obdname.parameter
+ To get current Lustre parameter settings, use the
+ lctl get_param command on the desired node with the
+ same parameter name as lctl set_param:
+
+# lctl get_param [-n] obdtype.obdname.parameter
+
 For example:
- mds# lctl get_param mdt.testfs-MDT0000.identity_upcall
+
+mds# lctl get_param mdt.testfs-MDT0000.identity_upcall
+
 To list Lustre parameters that are available to set, use the lctl list_param command, with this syntax:
- lctl list_param [-R] [-F] obdtype.obdname.*
+
+# lctl list_param [-R] [-F] obdtype.obdname.*
+
 For example, to list all of the parameters on the MDT:
- oss# lctl list_param -RF mdt
+
+mds# lctl list_param -RF mdt
+
 For more information on using lctl to set temporary and permanent parameters, see .
 Network Configuration
@@ -514,9 +526,20 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16
 conf_param [-d] device|fsname parameter=value
- Sets a permanent configuration parameter for any device via the MGS. This command must be run on the MGS node.
- All writeable parameters under lctl list_param (e.g. lctl list_param -F osc.*.* | grep =) can be permanently set using lctl conf_param, but the format is slightly different. For conf_param, the device is specified first, then the obdtype. Wildcards are not supported. Additionally, failover nodes may be added (or removed), and some system-wide parameters may be set as well (sys.at_max, sys.at_min, sys.at_extra, sys.at_early_margin, sys.at_history, sys.timeout, sys.ldlm_timeout). For system-wide parameters, device is ignored.
- For more information on setting permanent parameters and lctl conf_param command examples, see (Setting Permanent Parameters).
+ Sets a permanent configuration parameter for any device
+ via the MGS. This command must be run on the MGS node.
+
+ All writeable parameters under
+ lctl list_param (e.g.
+ lctl list_param -F osc.*.* | grep =) can
+ be permanently set using lctl conf_param,
+ but the conversion of list_param names to
+ conf_param names is not obvious, so it is
+ preferred to use the set_param -P command.
+
+ For more information on setting permanent parameters, see
+
+ (Setting Permanent Parameters).
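As an illustration of the preference stated above, the conf_param example
used earlier in this file and one plausible set_param -P equivalent are
shown below; the exact llite device name depends on the client mount, so a
wildcard pattern is assumed here:

mgs# lctl conf_param testfs.llite.max_read_ahead_mb=16
mgs# lctl set_param -P llite.testfs-*.max_read_ahead_mb=16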