lctl {get,set}_param {service}.threads_{min,max,started}
</screen>
</para>
- <para condition='l23'>Lustre software release 2.3 introduced binding
- service threads to CPU partition. This works in a similar fashion to
+ <para>
+ Binding service threads to CPU partitions works in a similar fashion to
binding of threads on MDS. MDS thread tuning is covered in
<xref linkend="dbdoclet.mdsbinding" />.</para>
<itemizedlist>
<literal>mds_num_threads</literal> parameter enables the number of MDS
service threads to be specified at module load time on the MDS
node:</para>
- <screen>
-options mds mds_num_threads={N}
-</screen>
+ <screen>options mds mds_num_threads={N}</screen>
<para>After startup, the minimum and maximum number of MDS thread counts
can be set via the
<literal>{service}.threads_{min,max,started}</literal> tunable. To change
</para>
<para>For details, see
<xref linkend="dbdoclet.50438271_87260" />.</para>
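+ <para>For example, to raise the maximum MDT service thread count to
+ 128 (a sketch; the service name <literal>mds.MDS.mdt</literal> shown
+ here is typical, but the exact name on a given system should be
+ confirmed with <literal>lctl list_param mds.*.*</literal>):</para>
+ <screen>lctl set_param mds.MDS.mdt.threads_max=128</screen>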
- <para>At this time, no testing has been done to determine the optimal
- number of MDS threads. The default value varies, based on server size, up
- to a maximum of 32. The maximum number of threads (
- <literal>MDS_MAX_THREADS</literal>) is 512.</para>
+ <para>The number of MDS service threads started depends on system size
+ and the load on the server, and has a default maximum of 64. The
+ maximum potential number of threads (<literal>MDS_MAX_THREADS</literal>)
+ is 1024.</para>
<note>
- <para>The OSS and MDS automatically start new service threads
- dynamically, in response to server load within a factor of 4. The
- default value is calculated the same way as before. Setting the
- <literal>_mu_threads</literal> module parameter disables automatic
- thread creation behavior.</para>
+ <para>The OSS and MDS start two threads per service per CPT at mount
+ time, and dynamically increase the number of running service threads in
+ response to server load. Setting the <literal>*_num_threads</literal>
+ module parameter starts the specified number of threads for that
+ service immediately and disables automatic thread creation behavior.
+ </para>
</note>
- <para>Lustre software release 2.3 introduced new parameters to provide
- more control to administrators.</para>
+ <para condition='l23'>Lustre software release 2.3 introduced new
+ parameters to provide more control to administrators.</para>
<itemizedlist>
<listitem>
<para>
release 1.8.</para>
</listitem>
</itemizedlist>
- <note>
- <para>Default values for the thread counts are automatically selected.
- The values are chosen to best exploit the number of CPUs present in the
- system and to provide best overall performance for typical
- workloads.</para>
- </note>
</section>
</section>
<section xml:id="dbdoclet.mdsbinding" condition='l23'>
</indexterm>Binding MDS Service Thread to CPU Partitions</title>
<para>With the introduction of Node Affinity (
<xref linkend="nodeaffdef" />) in Lustre software release 2.3, MDS threads
- can be bound to particular CPU partitions (CPTs). Default values for
+ can be bound to particular CPU partitions (CPTs) to improve CPU cache
+ usage and memory locality. Default values for CPT counts and CPU core
bindings are selected automatically to provide good overall performance for
a given CPU count. However, an administrator can deviate from these settings
- if they choose.</para>
+ if they choose. For details on specifying the mapping of CPU cores to
+ CPTs see <xref linkend="dbdoclet.libcfstuning"/>.
+ </para>
<itemizedlist>
<listitem>
<para>
be MAX.</para>
</section>
</section>
- <section xml:id="dbdoclet.libcfstuning">
+ <section xml:id="dbdoclet.libcfstuning" condition='l23'>
<title>
<indexterm>
<primary>tuning</primary>
<secondary>libcfs</secondary>
</indexterm>libcfs Tuning</title>
+ <para>Lustre software release 2.3 introduced binding service threads via
+ CPU Partition Tables (CPTs). This allows the system administrator to
+ fine-tune on which CPU cores the Lustre service threads are run, for both
+ OSS and MDS services, as well as on the client.
+ </para>
+ <para>CPTs are useful to reserve some cores on the OSS or MDS nodes for
+ system functions such as system monitoring, HA heartbeat, or similar
+ tasks. On the client it may be useful to restrict Lustre RPC service
+ threads to a small subset of cores so that they do not interfere with
+ computation, or because these cores are directly attached to the network
+ interfaces.
+ </para>
<para>By default, the Lustre software will automatically generate CPU
- partitions (CPT) based on the number of CPUs in the system. The CPT number
- will be 1 if the online CPU number is less than five.</para>
- <para>The CPT number can be explicitly set on the libcfs module using
- <literal>cpu_npartitions=NUMBER</literal>. The value of
- <literal>cpu_npartitions</literal> must be an integer between 1 and the
- number of online CPUs.</para>
+ partitions (CPT) based on the number of CPUs in the system.
+ The CPT count can be explicitly set on the libcfs module using
+ <literal>cpu_npartitions=<replaceable>NUMBER</replaceable></literal>.
+ The value of <literal>cpu_npartitions</literal> must be an integer between
+ 1 and the number of online CPUs.
+ </para>
+ <para condition='l29'>In Lustre 2.9 and later the default is to use
+ one CPT per NUMA node. In earlier versions of Lustre, by default there
+ was a single CPT if the online CPU core count was four or fewer, and
+ additional CPTs would be created depending on the number of CPU cores,
+ typically with 4-8 cores per CPT.
+ </para>
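+ <para>For example, to explicitly create four CPU partitions at module
+ load time, an <literal>options</literal> line can be added to the
+ modprobe configuration (the value 4 here is illustrative and should be
+ chosen based on the CPU and NUMA topology of the node):</para>
+ <screen>options libcfs cpu_npartitions=4</screen>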
<tip>
- <para>Setting CPT to 1 will disable most of the SMP Node Affinity
- functionality.</para>
+ <para>Setting <literal>cpu_npartitions=1</literal> will disable most
+ of the SMP Node Affinity functionality.</para>
</tip>
<section>
<title>CPU Partition String Patterns</title>
- <para>CPU partitions can be described using string pattern notation. For
- example:</para>
+ <para>CPU partitions can be described using string pattern notation.
+ If <literal>cpu_pattern=N</literal> is used, then there will be one
+ CPT for each NUMA node in the system, with each CPT mapping all of
+ the CPU cores for that NUMA node.
+ </para>
+ <para>It is also possible to explicitly specify the mapping between
+ CPU cores and CPTs, for example:</para>
<itemizedlist>
<listitem>
<para>
- <literal>cpu_pattern="0[0,2,4,6] 1[1,3,5,7]</literal>
+ <literal>cpu_pattern="0[2,4,6] 1[3,5,7]"</literal>
</para>
- <para>Create two CPTs, CPT0 contains CPU[0, 2, 4, 6]. CPT1 contains
- CPU[1,3,5,7].</para>
+ <para>Create two CPTs: CPT0 contains cores 2, 4, and 6, while CPT1
+ contains cores 3, 5, and 7. CPU cores 0 and 1 will not be used by Lustre
+ service threads, and could be used for node services such as
+ system monitoring, HA heartbeat threads, etc. The binding of
+ non-Lustre services to those CPU cores may be done in userspace
+ using <literal>numactl(8)</literal> or other application-specific
+ methods, but is beyond the scope of this document.</para>
</listitem>
<listitem>
<para>
<literal>cpu_pattern="N 0[0-3] 1[4-7]"</literal>
</para>
- <para>Create two CPTs, CPT0 contains all CPUs in NUMA node[0-3], CPT1
- contains all CPUs in NUMA node [4-7].</para>
+ <para>Create two CPTs: CPT0 contains all CPU cores in NUMA
+ nodes 0-3, while CPT1 contains all CPU cores in NUMA nodes 4-7.</para>
</listitem>
</itemizedlist>
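+ <para>These patterns are applied by setting the
+ <literal>cpu_pattern</literal> option on the libcfs module at load
+ time, for example (using the first pattern above):</para>
+ <screen>options libcfs cpu_pattern="0[2,4,6] 1[3,5,7]"</screen>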
- <para>The current configuration of the CPU partition can be read from
- <literal>/proc/sys/lnet/cpu_partition_table</literal></para>
+ <para>The current configuration of the CPU partition can be read via
+ <literal>lctl get_param cpu_partition_table</literal>. For example,
+ a simple 4-core system has a single CPT with all four CPU cores:
+ <screen>$ lctl get_param cpu_partition_table
+cpu_partition_table=0 : 0 1 2 3</screen>
+ while a larger NUMA system with four 12-core CPUs may have four CPTs:
+ <screen>$ lctl get_param cpu_partition_table
+cpu_partition_table=
+0 : 0 1 2 3 4 5 6 7 8 9 10 11
+1 : 12 13 14 15 16 17 18 19 20 21 22 23
+2 : 24 25 26 27 28 29 30 31 32 33 34 35
+3 : 36 37 38 39 40 41 42 43 44 45 46 47
+</screen>
+ </para>
</section>
</section>
<section xml:id="dbdoclet.lndtuning">