lctl {get,set}_param {service}.threads_{min,max,started}
</screen>
</para>
- <para condition='l23'>Lustre software release 2.3 introduced binding
- service threads to CPU partition. This works in a similar fashion to
+ <para>
+ Binding service threads to CPU partitions works in a similar fashion to
binding of threads on MDS. MDS thread tuning is covered in
<xref linkend="dbdoclet.mdsbinding" />.</para>
<itemizedlist>
<literal>mds_num_threads</literal> parameter enables the number of MDS
service threads to be specified at module load time on the MDS
node:</para>
- <screen>
-options mds mds_num_threads={N}
-</screen>
+ <screen>options mds mds_num_threads={N}</screen>
<para>After startup, the minimum and maximum number of MDS thread counts
can be set via the
<literal>{service}.threads_{min,max,started}</literal> tunable. To change
</para>
<para>For details, see
<xref linkend="dbdoclet.50438271_87260" />.</para>
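+ <para>For example, to raise the maximum MDT service thread count to
+ 128 (a sketch; the service name <literal>mds.MDS.mdt</literal> shown
+ here is typical, but the exact name on a given system should be
+ confirmed with <literal>lctl list_param mds.*.*</literal>):</para>
+ <screen>lctl set_param mds.MDS.mdt.threads_max=128</screen>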
- <para>At this time, no testing has been done to determine the optimal
- number of MDS threads. The default value varies, based on server size, up
- to a maximum of 32. The maximum number of threads (
- <literal>MDS_MAX_THREADS</literal>) is 512.</para>
+ <para>The number of MDS service threads started depends on system size
+ and the load on the server, and has a default maximum of 64. The
+ maximum potential number of threads (<literal>MDS_MAX_THREADS</literal>)
+ is 1024.</para>
<note>
- <para>The OSS and MDS automatically start new service threads
- dynamically, in response to server load within a factor of 4. The
- default value is calculated the same way as before. Setting the
- <literal>_mu_threads</literal> module parameter disables automatic
- thread creation behavior.</para>
+ <para>The OSS and MDS start two threads per service per CPT at mount
+ time, and dynamically increase the number of running service threads in
+ response to server load. Setting the <literal>*_num_threads</literal>
+ module parameter starts the specified number of threads for that
+ service immediately and disables automatic thread creation behavior.
+ </para>
</note>
- <para>Lustre software release 2.3 introduced new parameters to provide
- more control to administrators.</para>
+ <para condition='l23'>Lustre software release 2.3 introduced new
+ parameters to provide more control to administrators.</para>
<itemizedlist>
<listitem>
<para>
release 1.8.</para>
</listitem>
</itemizedlist>
- <note>
- <para>Default values for the thread counts are automatically selected.
- The values are chosen to best exploit the number of CPUs present in the
- system and to provide best overall performance for typical
- workloads.</para>
- </note>
</section>
</section>
<section xml:id="dbdoclet.mdsbinding" condition='l23'>
</indexterm>Binding MDS Service Thread to CPU Partitions</title>
<para>With the introduction of Node Affinity (
<xref linkend="nodeaffdef" />) in Lustre software release 2.3, MDS threads
- can be bound to particular CPU partitions (CPTs). Default values for
+ can be bound to particular CPU partitions (CPTs) to improve CPU cache
+ usage and memory locality. Default values for CPT counts and CPU core
bindings are selected automatically to provide good overall performance for
a given CPU count. However, an administrator can deviate from these settings
- if they choose.</para>
+ if they choose. For details on specifying the mapping of CPU cores to
+ CPTs see <xref linkend="dbdoclet.libcfstuning"/>.
+ </para>
<itemizedlist>
<listitem>
<para>
be MAX.</para>
</section>
</section>
- <section xml:id="dbdoclet.libcfstuning">
+ <section xml:id="dbdoclet.libcfstuning" condition='l23'>
<title>
<indexterm>
<primary>tuning</primary>
<secondary>libcfs</secondary>
</indexterm>libcfs Tuning</title>
+ <para>Lustre software release 2.3 introduced binding service threads via
+ CPU Partition Tables (CPTs). This allows the system administrator to
+ fine-tune on which CPU cores the Lustre service threads are run, for both
+ OSS and MDS services, as well as on the client.
+ </para>
+ <para>CPTs are useful to reserve some cores on the OSS or MDS nodes for
+ system functions such as system monitoring, HA heartbeat, or similar
+ tasks. On the client it may be useful to restrict Lustre RPC service
+ threads to a small subset of cores so that they do not interfere with
+ computation, or because these cores are directly attached to the network
+ interfaces.
+ </para>
<para>By default, the Lustre software will automatically generate CPU
- partitions (CPT) based on the number of CPUs in the system. The CPT number
- will be 1 if the online CPU number is less than five.</para>
- <para>The CPT number can be explicitly set on the libcfs module using
- <literal>cpu_npartitions=NUMBER</literal>. The value of
- <literal>cpu_npartitions</literal> must be an integer between 1 and the
- number of online CPUs.</para>
+ partitions (CPT) based on the number of CPUs in the system.
+ The CPT count can be explicitly set on the libcfs module using
+ <literal>cpu_npartitions=<replaceable>NUMBER</replaceable></literal>.
+ The value of <literal>cpu_npartitions</literal> must be an integer between
+ 1 and the number of online CPUs.
+ </para>
+ <para condition='l29'>In Lustre 2.9 and later the default is to use
+ one CPT per NUMA node. In earlier versions of Lustre, by default there
+ was a single CPT if the online CPU core count was four or fewer, and
+ additional CPTs would be created depending on the number of CPU cores,
+ typically with 4-8 cores per CPT.
+ </para>
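+ <para>For example, to explicitly create four CPU partitions at module
+ load time, an <literal>options</literal> line can be added to the
+ modprobe configuration (the value 4 here is illustrative and should be
+ chosen based on the CPU and NUMA topology of the node):</para>
+ <screen>options libcfs cpu_npartitions=4</screen>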
<tip>
- <para>Setting CPT to 1 will disable most of the SMP Node Affinity
- functionality.</para>
+ <para>Setting <literal>cpu_npartitions=1</literal> will disable most
+ of the SMP Node Affinity functionality.</para>
</tip>
<section>
<title>CPU Partition String Patterns</title>
- <para>CPU partitions can be described using string pattern notation. For
- example:</para>
+ <para>CPU partitions can be described using string pattern notation.
+ If <literal>cpu_pattern=N</literal> is used, then there will be one
+ CPT for each NUMA node in the system, with each CPT mapping all of
+ the CPU cores for that NUMA node.
+ </para>
+ <para>It is also possible to explicitly specify the mapping between
+ CPU cores and CPTs, for example:</para>
<itemizedlist>
<listitem>
<para>
- <literal>cpu_pattern="0[0,2,4,6] 1[1,3,5,7]</literal>
+ <literal>cpu_pattern="0[2,4,6] 1[3,5,7]"</literal>
</para>
- <para>Create two CPTs, CPT0 contains CPU[0, 2, 4, 6]. CPT1 contains
- CPU[1,3,5,7].</para>
+ <para>Create two CPTs: CPT0 contains cores 2, 4, and 6, while CPT1
+ contains cores 3, 5, and 7. CPU cores 0 and 1 will not be used by Lustre
+ service threads, and could be used for node services such as
+ system monitoring, HA heartbeat threads, etc. The binding of
+ non-Lustre services to those CPU cores may be done in userspace
+ using <literal>numactl(8)</literal> or other application-specific
+ methods, but is beyond the scope of this document.</para>
</listitem>
<listitem>
<para>
<literal>cpu_pattern="N 0[0-3] 1[4-7]"</literal>
</para>
- <para>Create two CPTs, CPT0 contains all CPUs in NUMA node[0-3], CPT1
- contains all CPUs in NUMA node [4-7].</para>
+ <para>Create two CPTs: CPT0 contains all CPU cores in NUMA
+ nodes 0-3, while CPT1 contains all CPU cores in NUMA nodes 4-7.</para>
</listitem>
</itemizedlist>
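+ <para>These patterns are applied by setting the
+ <literal>cpu_pattern</literal> option on the libcfs module at load
+ time, for example (using the first pattern above):</para>
+ <screen>options libcfs cpu_pattern="0[2,4,6] 1[3,5,7]"</screen>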
- <para>The current configuration of the CPU partition can be read from
- <literal>/proc/sys/lnet/cpu_partition_table</literal></para>
+ <para>The current configuration of the CPU partition can be read via
+ <literal>lctl get_param cpu_partition_table</literal>. For example,
+ a simple 4-core system has a single CPT with all four CPU cores:
+ <screen>$ lctl get_param cpu_partition_table
+cpu_partition_table=0 : 0 1 2 3</screen>
+ while a larger NUMA system with four 12-core CPUs may have four CPTs:
+ <screen>$ lctl get_param cpu_partition_table
+cpu_partition_table=
+0 : 0 1 2 3 4 5 6 7 8 9 10 11
+1 : 12 13 14 15 16 17 18 19 20 21 22 23
+2 : 24 25 26 27 28 29 30 31 32 33 34 35
+3 : 36 37 38 39 40 41 42 43 44 45 46 47
+</screen>
+ </para>
</section>
</section>
<section xml:id="dbdoclet.lndtuning">