From 5cfb40263106cc848447c33dda03665455f7679e Mon Sep 17 00:00:00 2001
From: Andreas Dilger
Date: Mon, 26 Sep 2016 14:23:15 -0600
Subject: [PATCH] LUDOC-11 tuning: improve description of CPT config

Improve the description of CPT configuration tunables, such as
module parameters and /proc file output for various potential uses.

Signed-off-by: Andreas Dilger
Change-Id: I7d7df85091cc099c22f811c1285433a57da0c4ae
Reviewed-on: http://review.whamcloud.com/22741
Tested-by: Jenkins
Reviewed-by: Amir Shehata
---
 LustreTuning.xml | 120 ++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 78 insertions(+), 42 deletions(-)

diff --git a/LustreTuning.xml b/LustreTuning.xml
index 76a7f7d..2e3a496 100644
--- a/LustreTuning.xml
+++ b/LustreTuning.xml
@@ -94,8 +94,8 @@ options ost oss_num_threads={N}
 lctl {get,set}_param {service}.thread_{min,max,started}
-      Lustre software release 2.3 introduced binding
-      service threads to CPU partition. This works in a similar fashion to
+
+      This works in a similar fashion to
       binding of threads on MDS. MDS thread tuning is covered in
       .
@@ -125,9 +125,7 @@ lctl {get,set}_param {service}.thread_{min,max,started}
      mds_num_threads parameter enables the number of MDS service threads
      to be specified at module load time on the MDS node:
-
-options mds mds_num_threads={N}
-
+      options mds mds_num_threads={N}
       After startup, the minimum and maximum number of MDS thread counts
       can be set via the {service}.thread_{min,max,started} tunable. To change
@@ -139,19 +137,20 @@ lctl {get,set}_param {service}.thread_{min,max,started}
       For details, see .
-      At this time, no testing has been done to determine the optimal
-      number of MDS threads. The default value varies, based on server size, up
-      to a maximum of 32. The maximum number of threads (
-      MDS_MAX_THREADS) is 512.
+      The number of MDS service threads started depends on system size
+      and the load on the server, and has a default maximum of 64. The
+      maximum potential number of threads (MDS_MAX_THREADS)
+      is 1024.
-      The OSS and MDS automatically start new service threads
-      dynamically, in response to server load within a factor of 4. The
-      default value is calculated the same way as before. Setting the
-      _mu_threads module parameter disables automatic
-      thread creation behavior.
+      The OSS and MDS start two threads per service per CPT at mount
+      time, and dynamically increase the number of running service threads in
+      response to server load. Setting the *_num_threads
+      module parameter starts the specified number of threads for that
+      service immediately and disables automatic thread creation behavior.
+
-      Lustre software release 2.3 introduced new parameters to provide
-      more control to administrators.
+      Lustre software release 2.3 introduced new
+      parameters to provide more control to administrators.
@@ -166,12 +165,6 @@ lctl {get,set}_param {service}.thread_{min,max,started}
       release 1.8.
-
-      Default values for the thread counts are automatically selected.
-      The values are chosen to best exploit the number of CPUs present in the
-      system and to provide best overall performance for typical
-      workloads.
-
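      As a hedged sketch of the thread-count tunables described in the
      hunks above: the /etc/modprobe.d/lustre.conf location and the concrete
      thread counts are illustrative assumptions only, and {service} is a
      placeholder exactly as used in the text above.

# Assumed module options file: start a fixed number of service threads
# at mount time and disable automatic thread creation
options mds mds_num_threads=64
options ost oss_num_threads=128

# Runtime inspection and adjustment; replace {service} with the actual
# service name on the server, as described above
$ lctl get_param {service}.thread_started
$ lctl set_param {service}.thread_max=512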
@@ -182,10 +175,13 @@ lctl {get,set}_param {service}.thread_{min,max,started}
       Binding MDS Service Thread to CPU Partitions
       With the introduction of Node Affinity (
       ) in Lustre software release 2.3, MDS threads
-      can be bound to particular CPU partitions (CPTs). Default values for
+      can be bound to particular CPU partitions (CPTs) to improve CPU cache
+      usage and memory locality. Default values for
       CPT counts and CPU core bindings are selected automatically to provide
       good overall performance for a given CPU count. However, an administrator
       can deviate from these settings
-      if they choose.
+      if they choose. For details on specifying the mapping of CPU cores to
+      CPTs see .
+
@@ -529,45 +525,85 @@ lnet large_router_buffers=8192
       be MAX.
-
+
       <indexterm> <primary>tuning</primary> <secondary>libcfs</secondary> </indexterm>libcfs Tuning
+      Lustre software release 2.3 introduced binding service threads via
+      CPU Partition Tables (CPTs). This allows the system administrator to
+      fine-tune on which CPU cores the Lustre service threads are run, for
+      both OSS and MDS services, as well as on the client.
+
+      CPTs are useful to reserve some cores on the OSS or MDS nodes for
+      system functions such as system monitoring, HA heartbeat, or similar
+      tasks. On the client it may be useful to restrict Lustre RPC service
+      threads to a small subset of cores so that they do not interfere with
+      computation, or because these cores are directly attached to the network
+      interfaces.
+
       By default, the Lustre software will automatically generate CPU
-      partitions (CPT) based on the number of CPUs in the system. The CPT number
-      will be 1 if the online CPU number is less than five.
-      The CPT number can be explicitly set on the libcfs module using
-      cpu_npartitions=NUMBER. The value of
-      cpu_npartitions must be an integer between 1 and the
-      number of online CPUs.
+      partitions (CPT) based on the number of CPUs in the system.
+      The CPT count can be explicitly set on the libcfs module using
+      cpu_npartitions=NUMBER.
+      The value of cpu_npartitions must be an integer between
+      1 and the number of online CPUs.
+
+      In Lustre 2.9 and later the default is to use
+      one CPT per NUMA node. In earlier versions of Lustre, by default there
+      was a single CPT if the online CPU core count was four or fewer, and
+      additional CPTs would be created depending on the number of CPU cores,
+      typically with 4-8 cores per CPT.
+
-      Setting CPT to 1 will disable most of the SMP Node Affinity
-      functionality.
+      Setting cpu_npartitions=1 will disable most
+      of the SMP Node Affinity functionality.
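      A minimal sketch of how the cpu_npartitions tunable described above
      could be applied persistently; the /etc/modprobe.d/lustre.conf location
      and the value 4 are illustrative assumptions, not part of this patch.

# Assumed module options file: force four CPU partitions instead of the
# automatically generated layout (must be between 1 and the number of
# online CPUs, as noted above)
options libcfs cpu_npartitions=4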
       CPU Partition String Patterns
-      CPU partitions can be described using string pattern notation. For
-      example:
+      CPU partitions can be described using string pattern notation.
+      If cpu_pattern=N is used, then there will be one
+      CPT for each NUMA node in the system, with each CPT mapping all of
+      the CPU cores for that NUMA node.
+
+      It is also possible to explicitly specify the mapping between
+      CPU cores and CPTs, for example:

-cpu_pattern="0[0,2,4,6] 1[1,3,5,7]"
+cpu_pattern="0[2,4,6] 1[3,5,7]"

-      Create two CPTs, CPT0 contains CPU[0, 2, 4, 6]. CPT1 contains
-      CPU[1,3,5,7].
+      Create two CPTs, CPT0 contains cores 2, 4, and 6, while CPT1
+      contains cores 3, 5, and 7. CPU cores 0 and 1 will not be used by
+      Lustre service threads, and could be used for node services such as
+      system monitoring, HA heartbeat threads, etc. The binding of
+      non-Lustre services to those CPU cores may be done in userspace
+      using numactl(8) or other application-specific
+      methods, but is beyond the scope of this document.

 cpu_pattern="N 0[0-3] 1[4-7]"

-      Create two CPTs, CPT0 contains all CPUs in NUMA node[0-3], CPT1
-      contains all CPUs in NUMA node [4-7].
+      Create two CPTs, with CPT0 containing all CPUs in NUMA
+      node[0-3], while CPT1 contains all CPUs in NUMA node [4-7].

-      The current configuration of the CPU partition can be read from
-      /proc/sys/lnet/cpu_partition_table
+      The current configuration of the CPU partition can be read via
+      lctl get_param cpu_partition_table. For example,
+      a simple 4-core system has a single CPT with all four CPU cores:
+$ lctl get_param cpu_partition_table
+cpu_partition_table=0   : 0 1 2 3
+      while a larger NUMA system with four 12-core CPUs may have four CPTs:
+$ lctl get_param cpu_partition_table
+cpu_partition_table=
+0   : 0 1 2 3 4 5 6 7 8 9 10 11
+1   : 12 13 14 15 16 17 18 19 20 21 22 23
+2   : 24 25 26 27 28 29 30 31 32 33 34 35
+3   : 36 37 38 39 40 41 42 43 44 45 46 47
+
+
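      A minimal sketch tying the cpu_pattern notation to the verification
      step shown above; the /etc/modprobe.d/lustre.conf location is an
      assumption, and the pattern simply reuses the example from this hunk.

# Assumed module options file: leave cores 0 and 1 for non-Lustre services
# and split the remaining cores into two CPTs, as in the example above
options libcfs cpu_pattern="0[2,4,6] 1[3,5,7]"

# After the Lustre modules are reloaded, confirm the resulting CPT layout
$ lctl get_param cpu_partition_table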
--
1.8.3.1