LUDOC-504 nodemap: servers must be in a trusted+admin group

[doc/manual.git] / LustreProc.xml
diff --git a/LustreProc.xml b/LustreProc.xml

index 64b3f9f..bc70d6f 100644
--- a/LustreProc.xml
+++ b/LustreProc.xml
@@ -1,14 +1,16 @@
  <?xml version='1.0' encoding='UTF-8'?>
-<chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0"
-  xml:lang="en-US" xml:id="lustreproc">
+<chapter xmlns="http://docbook.org/ns/docbook"
+ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
+ xml:id="lustreproc">
    <title xml:id="lustreproc.title">Lustre Parameters</title>
-  <para>The <literal>/proc</literal> and <literal>/sys</literal> file systems
-  acts as an interface to internal data structures in the kernel. This chapter
-  describes parameters and tunables that are useful for optimizing and
-  monitoring aspects of a Lustre file system. It includes these sections:</para>
+  <para>Lustre provides many parameters that can tune client and server
+  performance, change the behavior of the system, and report statistics about
+  various subsystems.  This chapter describes the parameters and tunables
+  that are useful for optimizing and monitoring aspects of a Lustre file
+  system.  It includes these sections:</para>
    <itemizedlist>
      <listitem>
-      <para><xref linkend="dbdoclet.50438271_83523"/></para>
+      <para><xref linkend="enabling_interpreting_debugging_logs"/></para>
        <para>.</para>
      </listitem>
    </itemizedlist>
@@ -23,9 +25,11 @@
      </para>
      <para>Typically, metrics are accessed via <literal>lctl get_param</literal>
      files and settings are changed by via <literal>lctl set_param</literal>.
-    While it is possible to access parameters in <literal>/proc</literal>
+    These commands can get or set multiple parameters at once by using
+    wildcards in one or more parts of the parameter name.
+    While each of these parameters maps to files in <literal>/proc</literal>
      and <literal>/sys</literal> directly, the location of these parameters may
-    change between releases, so it is recommended to always use
+    change between Lustre releases, so it is recommended to always use
      <literal>lctl</literal> to access the parameters from userspace scripts.
      Some data is server-only, some data is client-only, and some data is
      exported from the client to the server and is thus duplicated in both
@@ -34,24 +38,29 @@
        <para>In the examples in this chapter, <literal>#</literal> indicates
        a command is entered as root.  Lustre servers are named according to the
        convention <literal><replaceable>fsname</replaceable>-<replaceable>MDT|OSTnumber</replaceable></literal>.
-        The standard UNIX wildcard designation (*) is used.</para>
+      The standard UNIX wildcard designation (*) is used to match any
+      part of a single component of the parameter name, excluding
+      "<literal>.</literal>" and "<literal>/</literal>".
+      It is also possible to use brace (<literal>{}</literal>) expansion
+      to specify a list of parameter names concisely.</para>
      </note>
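The matching rules in the note above can be modeled with a short Python sketch. It is illustrative only and is not how `lctl` is implemented: it assumes that brace lists are expanded first (without nesting, as in shell brace expansion), and that each `*` then matches any run of characters within a single name component, never crossing `.` or `/`. The helper names `expand_braces`, `glob_to_regex`, and `param_matches` are hypothetical.

```python
import re


def expand_braces(pattern: str) -> list[str]:
    """Expand each {a,b,...} group into separate patterns (no nesting)."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded = []
    for choice in match.group(1).split(","):
        # Recurse so that later brace groups are expanded as well.
        expanded.extend(expand_braces(head + choice + tail))
    return expanded


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate '*' so that it never matches '.' or '/'."""
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(r"[^./]*".join(literal_parts) + r"\Z")


def param_matches(pattern: str, name: str) -> bool:
    """Return True if name matches any brace expansion of pattern."""
    return any(glob_to_regex(p).match(name) for p in expand_braces(pattern))


names = [
    "osc.testfs-OST0000-osc-ffff881071d5cc00.checksum_type",
    "osc.testfs-OST0000-osc-ffff881071d5cc00.checksums",
    "osc.testfs-OST0000-osc-ffff881071d5cc00.connect_flags",
    "osc.testfs-OST0000-osc-ffff881071d5cc00.rpc_stats",
]
for name in names:
    if param_matches("osc.*.{checksum,connect}*", name):
        print(name)
```

Run against these four sample names, the loop prints the `checksum_type`, `checksums`, and `connect_flags` entries and skips `rpc_stats`, mirroring the brace example in this section.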
      <para>Some examples are shown below:</para>
      <itemizedlist>
        <listitem>
-        <para> To obtain data from a Lustre client:</para>
-        <screen># lctl list_param osc.*
-osc.testfs-OST0000-osc-ffff881071d5cc00
-osc.testfs-OST0001-osc-ffff881071d5cc00
-osc.testfs-OST0002-osc-ffff881071d5cc00
-osc.testfs-OST0003-osc-ffff881071d5cc00
-osc.testfs-OST0004-osc-ffff881071d5cc00
-osc.testfs-OST0005-osc-ffff881071d5cc00
-osc.testfs-OST0006-osc-ffff881071d5cc00
-osc.testfs-OST0007-osc-ffff881071d5cc00
-osc.testfs-OST0008-osc-ffff881071d5cc00</screen>
+        <para> To list available OST targets on a Lustre client:</para>
+        <screen># lctl list_param -F osc.*
+osc.testfs-OST0000-osc-ffff881071d5cc00/
+osc.testfs-OST0001-osc-ffff881071d5cc00/
+osc.testfs-OST0002-osc-ffff881071d5cc00/
+osc.testfs-OST0003-osc-ffff881071d5cc00/
+osc.testfs-OST0004-osc-ffff881071d5cc00/
+osc.testfs-OST0005-osc-ffff881071d5cc00/
+osc.testfs-OST0006-osc-ffff881071d5cc00/
+osc.testfs-OST0007-osc-ffff881071d5cc00/
+osc.testfs-OST0008-osc-ffff881071d5cc00/</screen>
          <para>In this example, information about OST connections available
-        on a client is displayed (indicated by "osc").</para>
+        on a client is displayed (indicated by "osc").  Each of these
+        connections may have numerous sub-parameters as well.</para>
        </listitem>
      </itemizedlist>
      <itemizedlist>
@@ -71,12 +80,22 @@ osc.testfs-OST0000-osc-ffff881071d5cc00.rpc_stats</screen></para>
      </itemizedlist>
      <itemizedlist>
        <listitem>
+        <para> To see a specific subset of parameters, use brace expansion, for example:
+<screen># lctl list_param osc.*.{checksum,connect}*
+osc.testfs-OST0000-osc-ffff881071d5cc00.checksum_type
+osc.testfs-OST0000-osc-ffff881071d5cc00.checksums
+osc.testfs-OST0000-osc-ffff881071d5cc00.connect_flags
+</screen></para>
+      </listitem>
+    </itemizedlist>
+    <itemizedlist>
+      <listitem>
          <para> To view a specific file, use <literal>lctl get_param</literal>:
            <screen># lctl get_param osc.lustre-OST0000*.rpc_stats</screen></para>
        </listitem>
      </itemizedlist>
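For scripts that follow the advice to use `lctl` rather than reading `/proc` or `/sys` directly, a minimal Python sketch might run `lctl get_param` and split each output line on the first `=`. This assumes every parameter of interest prints as a single `name=value` line; multi-line values such as the `rpc_stats` histograms are not handled. The function names are hypothetical, and `get_params` needs root on a node with Lustre installed, so the example below parses captured output instead.

```python
import subprocess


def parse_get_param(output: str) -> dict[str, str]:
    """Parse 'name=value' lines from 'lctl get_param' output.

    Assumes each parameter prints on one line; lines without '=' (such as
    the body of a multi-line histogram) are skipped.
    """
    params: dict[str, str] = {}
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        # Parameter names contain no whitespace, so this skips stray lines.
        if sep and name and not any(ch.isspace() for ch in name):
            params[name] = value
    return params


def get_params(pattern: str) -> dict[str, str]:
    """Run 'lctl get_param PATTERN' and parse the result (needs root)."""
    result = subprocess.run(
        ["lctl", "get_param", pattern],
        capture_output=True, text=True, check=True,
    )
    return parse_get_param(result.stdout)


# Example with captured output instead of a live 'lctl' call.
sample = """\
osc.testfs-OST0000-osc-ffff881071d5cc00.uuid=594db456-0685-bd16-f59b-e72ee90e9819
osc.testfs-OST0001-osc-ffff881071d5cc00.uuid=594db456-0685-bd16-f59b-e72ee90e9819
"""
for name, value in parse_get_param(sample).items():
    print(name, value)
```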
      <para>For more information about using <literal>lctl</literal>, see <xref
-        xmlns:xlink="http://www.w3.org/1999/xlink" linkend="dbdoclet.50438194_51490"/>.</para>
+        xmlns:xlink="http://www.w3.org/1999/xlink" linkend="setting_param_with_lctl"/>.</para>
      <para>Data can also be viewed using the <literal>cat</literal> command
      with the full path to the file. The form of the <literal>cat</literal>
      command is similar to that of the <literal>lctl get_param</literal>
@@ -89,35 +108,18 @@ osc.testfs-OST0000-osc-ffff881071d5cc00.rpc_stats</screen></para>
      version and the Lustre version being used.  The <literal>lctl</literal>
      command insulates scripts from these changes and is preferred over direct
      file access, unless as part of a high-performance monitoring system.
-    In the <literal>cat</literal> command:</para>
-    <itemizedlist>
-      <listitem>
-        <para>Replace the dots in the path with slashes.</para>
-      </listitem>
-      <listitem>
-        <para>Prepend the path with the appropriate directory component:
-          <screen>/{proc,sys}/{fs,sys}/{lustre,lnet}</screen></para>
-      </listitem>
-    </itemizedlist>
-    <para>For example, an <literal>lctl get_param</literal> command may look like
-      this:<screen># lctl get_param osc.*.uuid
-osc.testfs-OST0000-osc-ffff881071d5cc00.uuid=594db456-0685-bd16-f59b-e72ee90e9819
-osc.testfs-OST0001-osc-ffff881071d5cc00.uuid=594db456-0685-bd16-f59b-e72ee90e9819
-...</screen></para>
-    <para>The equivalent <literal>cat</literal> command may look like this:
-     <screen># cat /proc/fs/lustre/osc/*/uuid
-594db456-0685-bd16-f59b-e72ee90e9819
-594db456-0685-bd16-f59b-e72ee90e9819
-...</screen></para>
-    <para>or like this:
-     <screen># cat /sys/fs/lustre/osc/*/uuid
-594db456-0685-bd16-f59b-e72ee90e9819
-594db456-0685-bd16-f59b-e72ee90e9819
-...</screen></para>
+    </para>
+    <note condition='l2c'><para>Starting in Lustre 2.12, the
+    <literal>lctl get_param</literal> and <literal>lctl set_param</literal>
+    commands can provide <emphasis>tab completion</emphasis> when used in an
+    interactive shell with <literal>bash-completion</literal> installed.
+    This simplifies the use of <literal>get_param</literal> significantly,
+    since it provides an interactive list of available parameters.
+    </para></note>
      <para>The <literal>llstat</literal> utility can be used to monitor some
      Lustre file system I/O activity over a specified time period. For more
      details, see
-    <xref xmlns:xlink="http://www.w3.org/1999/xlink" linkend="dbdoclet.50438219_23232"/></para>
+    <xref xmlns:xlink="http://www.w3.org/1999/xlink" linkend="config_llstat"/></para>
      <para>Some data is imported from attached clients and is available in a
      directory called <literal>exports</literal> located in the corresponding
      per-service directory on a Lustre server. For example:
@@ -373,7 +375,7 @@ testfs-MDT0000</screen></para>
          <listitem>
            <para><literal>brw_stats</literal> – Histogram data characterizing I/O requests to the
              OSTs. For more details, see <xref xmlns:xlink="http://www.w3.org/1999/xlink"
-              linkend="dbdoclet.50438271_55057"/>.</para>
+              linkend="monitor_ost_block_io_stream"/>.</para>
          </listitem>
          <listitem>
            <para><literal>rpc_stats</literal> – Histogram data showing information about RPCs made by
@@ -991,17 +993,19 @@ PID: 11429
            to that point in the table of calls (<literal>cum %</literal>). </para>
        </section>
      </section>
-    <section xml:id="dbdoclet.50438271_55057">
+    <section xml:id="monitor_ost_block_io_stream">
        <title><indexterm>
            <primary>proc</primary>
            <secondary>block I/O</secondary>
          </indexterm>Monitoring the OST Block I/O Stream</title>
-      <para>The <literal>brw_stats</literal> file in the <literal>obdfilter</literal> directory
-        contains histogram data showing statistics for number of I/O requests sent to the disk,
-        their size, and whether they are contiguous on the disk or not.</para>
+      <para>The <literal>brw_stats</literal> parameter file below the
+      <literal>osd-ldiskfs</literal> or <literal>osd-zfs</literal> directory
+        contains histogram data showing statistics for the number of I/O
+        requests sent to the disk, their size, and whether they are
+        contiguous on the disk or not.</para>
        <para><emphasis role="italic"><emphasis role="bold">Example:</emphasis></emphasis></para>
-      <para>Enter on the OSS:</para>
-      <screen># lctl get_param obdfilter.testfs-OST0000.brw_stats 
+      <para>Enter on the OSS or MDS:</para>
+      <screen>oss# lctl get_param osd-*.*.brw_stats 
  snapshot_time:         1372775039.769045 (secs.usecs)
                             read      |      write
  pages per bulk r/w     rpcs  % cum % |  rpcs   % cum %
@@ -1071,10 +1075,11 @@ disk I/O size          ios   % cum % |   ios   % cum %
  512K:                    0   0 100   |    24   0   0
  1M:                      0   0 100   | 23142  99 100
  </screen>
-      <para>The tabular data is described in the table below. Each row in the table shows the number
-        of reads and writes occurring for the statistic (<literal>ios</literal>), the relative
-        percentage of total reads or writes (<literal>%</literal>), and the cumulative percentage to
-        that point in the table for the statistic (<literal>cum %</literal>). </para>
+      <para>The tabular data is described in the table below. Each row in the
+        table shows the number of reads and writes occurring for the statistic
+        (<literal>ios</literal>), the relative percentage of total reads or
+        writes (<literal>%</literal>), and the cumulative percentage to that
+        point in the table for the statistic (<literal>cum %</literal>). </para>
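The `%` and `cum %` columns follow directly from the `ios` counts in each column. The Python sketch below recomputes them for one column of a histogram. Truncating to whole percentages is an assumption that appears consistent with the example output above (24 out of more than 23,000 writes shows as 0%), not something this section states; the function name and the input counts are hypothetical.

```python
def histogram_percentages(ios: list[int]) -> list[tuple[int, int, int]]:
    """Return (ios, %, cum %) rows for one column of a brw_stats histogram.

    Percentages are truncated to whole numbers (an assumption).
    """
    total = sum(ios)
    rows = []
    cumulative = 0
    for count in ios:
        cumulative += count
        # Guard against an empty column (no reads or no writes at all).
        pct = count * 100 // total if total else 0
        cum_pct = cumulative * 100 // total if total else 0
        rows.append((count, pct, cum_pct))
    return rows


# Hypothetical write counts for three consecutive "disk I/O size" buckets.
for row in histogram_percentages([10, 24, 23142]):
    print(row)
```

With these counts the last bucket reports 99% of the writes and a cumulative 100%, while the two smaller buckets both truncate to 0%.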
        <informaltable frame="all">
          <tgroup cols="2">
            <colspec colname="c1" colwidth="40*"/>
@@ -1292,9 +1297,9 @@ osc.myth-OST0001-osc-ffff8804296c2800.checksum_type
              </para>
            </listitem>
            <listitem>
-            <para><literal>llite.<replaceable>fsname_instance</replaceable>.max_cache_mb</literal>
-              - Maximum amount of inactive data cached by the client.  The
-              default value is 3/4 of the client RAM.
+            <para><literal>llite.<replaceable>fsname_instance</replaceable>.max_cached_mb</literal>
+              - Maximum amount of read+write data cached by the client.  The
+              default value is 1/2 of the client RAM.
              </para>
            </listitem>
          </itemizedlist>
@@ -1343,7 +1348,7 @@ write RPCs in flight: 0
          <itemizedlist>
            <listitem>
              <para><literal>llite.<replaceable>fsname_instance</replaceable>.max_read_ahead_mb</literal>
-              - Controls the maximum amount of data readahead on a file.
+              - Controls the maximum total amount of readahead data across all files.
                Files are read ahead in RPC-sized chunks (4 MiB, or the size of
                the <literal>read()</literal> call, if larger) after the second
                sequential read on a file descriptor. Random reads are done at
@@ -1362,7 +1367,7 @@ write RPCs in flight: 0
              <para><literal>llite.<replaceable>fsname_instance</replaceable>.max_read_ahead_per_file_mb</literal>
                - Controls the maximum number of megabytes (MiB) of data that
                should be prefetched by the client when sequential reads are
-              detected on a file.  This is the per-file readahead limit and
+              detected on one file.  This is the per-file readahead limit and
                cannot be larger than <literal>max_read_ahead_mb</literal>.
              </para>
            </listitem>
@@ -1428,118 +1433,227 @@ write RPCs in flight: 0
        <title><indexterm>
            <primary>proc</primary>
            <secondary>read cache</secondary>
-        </indexterm>Tuning OSS Read Cache</title>
-      <para>The OSS read cache feature provides read-only caching of data on an OSS. This
-        functionality uses the Linux page cache to store the data and uses as much physical memory
+        </indexterm>Tuning Server Read Cache</title>
+      <para>The server read cache feature provides read-only caching of file
+        data on an OSS or MDS (for Data-on-MDT). This functionality uses the
+        Linux page cache to store the data and uses as much physical memory
          as is allocated.</para>
-      <para>OSS read cache improves Lustre file system performance in these situations:</para>
+      <para>The server read cache can improve Lustre file system performance
+        in these situations:</para>
        <itemizedlist>
          <listitem>
-          <para>Many clients are accessing the same data set (as in HPC applications or when
-            diskless clients boot from the Lustre file system).</para>
+          <para>Many clients are accessing the same data set (as in HPC
+            applications or when diskless clients boot from the Lustre file
+            system).</para>
          </listitem>
          <listitem>
-          <para>One client is storing data while another client is reading it (i.e., clients are
-            exchanging data via the OST).</para>
+          <para>One client is writing data while another client is reading
+            it (i.e., clients are exchanging data via the file system).</para>
          </listitem>
          <listitem>
            <para>A client has very limited caching of its own.</para>
          </listitem>
        </itemizedlist>
-      <para>OSS read cache offers these benefits:</para>
+      <para>The server read cache offers these benefits:</para>
        <itemizedlist>
          <listitem>
-          <para>Allows OSTs to cache read data more frequently.</para>
+          <para>Allows servers to cache read data more frequently.</para>
          </listitem>
          <listitem>
-          <para>Improves repeated reads to match network speeds instead of disk speeds.</para>
+          <para>Improves repeated reads to match network speeds instead of
+             storage speeds.</para>
          </listitem>
          <listitem>
-          <para>Provides the building blocks for OST write cache (small-write aggregation).</para>
+          <para>Provides the building blocks for server write cache
+            (small-write aggregation).</para>
          </listitem>
        </itemizedlist>
        <section remap="h4">
-        <title>Using OSS Read Cache</title>
-        <para>OSS read cache is implemented on the OSS, and does not require any special support on
-          the client side. Since OSS read cache uses the memory available in the Linux page cache,
-          the appropriate amount of memory for the cache should be determined based on I/O patterns;
-          if the data is mostly reads, then more cache is required than would be needed for mostly
-          writes.</para>
-        <para>OSS read cache is managed using the following tunables:</para>
+        <title>Using Server Read Cache</title>
+        <para>The server read cache is implemented on the OSS and MDS, and does
+          not require any special support on the client side. Since the server
+          read cache uses the memory available in the Linux page cache, the
+          appropriate amount of memory for the cache should be determined based
+          on I/O patterns.  If the data is mostly reads, then more cache is
+          beneficial on the server than would be needed for mostly writes.
+        </para>
+        <para>The server read cache is managed using the following tunables.
+          Many tunables are available for both <literal>osd-ldiskfs</literal>
+          and <literal>osd-zfs</literal>, but in some cases the implementation
+          of <literal>osd-zfs</literal> prevents their use.</para>
          <itemizedlist>
            <listitem>
-            <para><literal>read_cache_enable</literal> - Controls whether data read from disk during
-              a read request is kept in memory and available for later read requests for the same
-              data, without having to re-read it from disk. By default, read cache is enabled
-                (<literal>read_cache_enable=1</literal>).</para>
-            <para>When the OSS receives a read request from a client, it reads data from disk into
-              its memory and sends the data as a reply to the request. If read cache is enabled,
-              this data stays in memory after the request from the client has been fulfilled. When
-              subsequent read requests for the same data are received, the OSS skips reading data
-              from disk and the request is fulfilled from the cached data. The read cache is managed
-              by the Linux kernel globally across all OSTs on that OSS so that the least recently
-              used cache pages are dropped from memory when the amount of free memory is running
-              low.</para>
-            <para>If read cache is disabled (<literal>read_cache_enable=0</literal>), the OSS
-              discards the data after a read request from the client is serviced and, for subsequent
-              read requests, the OSS again reads the data from disk.</para>
-            <para>To disable read cache on all the OSTs of an OSS, run:</para>
-            <screen>root@oss1# lctl set_param obdfilter.*.read_cache_enable=0</screen>
-            <para>To re-enable read cache on one OST, run:</para>
-            <screen>root@oss1# lctl set_param obdfilter.{OST_name}.read_cache_enable=1</screen>
-            <para>To check if read cache is enabled on all OSTs on an OSS, run:</para>
-            <screen>root@oss1# lctl get_param obdfilter.*.read_cache_enable</screen>
+            <para><literal>read_cache_enable</literal> - High-level control of
+              whether data read from storage during a read request is kept in
+              memory and available for later read requests for the same data,
+              without having to re-read it from storage. By default, read cache
+              is enabled (<literal>read_cache_enable=1</literal>) for HDD OSDs
+              and automatically disabled for flash OSDs
+              (<literal>nonrotational=1</literal>).
+              The read cache cannot be disabled for <literal>osd-zfs</literal>,
+              and as a result this parameter is unavailable for that backend.
+              </para>
+            <para>When the server receives a read request from a client,
+              it reads data from storage into its memory and sends the data
+              to the client. If read cache is enabled for the target,
+              and the RPC and object sizes also meet the other criteria below,
+              this data may stay in memory after the client request has
+              completed.  If later read requests for the same data arrive
+              while the data is still in cache, the server skips reading it
+              from storage. The cache is managed by the Linux kernel globally
+              across all targets on that server so that the infrequently used
+              cache pages are dropped from memory when the free memory is
+              running low.</para>
+            <para>If read cache is disabled
+              (<literal>read_cache_enable=0</literal>), or the read or object
+              is large enough that it will not benefit from caching, the server
+              discards the data after the read request from the client is
+              completed. For subsequent read requests the server again reads
+              the data from storage.</para>
+            <para>To disable read cache on all targets of a server, run:</para>
+            <screen>
+              oss1# lctl set_param osd-*.*.read_cache_enable=0
+            </screen>
+            <para>To re-enable read cache on one target, run:</para>
+            <screen>
+              oss1# lctl set_param osd-*.{target_name}.read_cache_enable=1
+            </screen>
+            <para>To check if read cache is enabled on targets on a server, run:
+            </para>
+            <screen>
+              oss1# lctl get_param osd-*.*.read_cache_enable
+            </screen>
            </listitem>
            <listitem>
-            <para><literal>writethrough_cache_enable</literal> - Controls whether data sent to the
-              OSS as a write request is kept in the read cache and available for later reads, or if
-              it is discarded from cache when the write is completed. By default, the writethrough
-              cache is enabled (<literal>writethrough_cache_enable=1</literal>).</para>
-            <para>When the OSS receives write requests from a client, it receives data from the
-              client into its memory and writes the data to disk. If the writethrough cache is
-              enabled, this data stays in memory after the write request is completed, allowing the
-              OSS to skip reading this data from disk if a later read request, or partial-page write
-              request, for the same data is received.</para>
+            <para><literal>writethrough_cache_enable</literal> - High-level
+              control of whether data sent to the server as a write request is
+              kept in the read cache and available for later reads, or if it is
+              discarded when the write completes. By default, writethrough
+              cache is enabled (<literal>writethrough_cache_enable=1</literal>)
+              for HDD OSDs and automatically disabled for flash OSDs
+              (<literal>nonrotational=1</literal>).
+              The write cache cannot be disabled for <literal>osd-zfs</literal>,
+              and as a result this parameter is unavailable for that backend.
+              </para>
+            <para>When the server receives write requests from a client, it
+              fetches data from the client into its memory and writes the data
+              to storage. If the writethrough cache is enabled for the target,
+              and the RPC and object sizes meet the other criteria below,
+              this data may stay in memory after the write request has
+              completed. If later read or partial-block write requests for this
+              same data arrive while the data is still in cache, the server
+              skips reading it from storage.
+              </para>
              <para>If the writethrough cache is disabled
-                (<literal>writethrough_cache_enabled=0</literal>), the OSS discards the data after
-              the write request from the client is completed. For subsequent read requests, or
-              partial-page write requests, the OSS must re-read the data from disk.</para>
-            <para>Enabling writethrough cache is advisable if clients are doing small or unaligned
-              writes that would cause partial-page updates, or if the files written by one node are
-              immediately being accessed by other nodes. Some examples where enabling writethrough
-              cache might be useful include producer-consumer I/O models or shared-file writes with
-              a different node doing I/O not aligned on 4096-byte boundaries. </para>
-            <para>Disabling the writethrough cache is advisable when files are mostly written to the
-              file system but are not re-read within a short time period, or files are only written
-              and re-read by the same node, regardless of whether the I/O is aligned or not.</para>
-            <para>To disable the writethrough cache on all OSTs of an OSS, run:</para>
-            <screen>root@oss1# lctl set_param obdfilter.*.writethrough_cache_enable=0</screen>
+               (<literal>writethrough_cache_enable=0</literal>), or the
+               write or object is large enough that it will not benefit from
+               caching, the server discards the data after the write request
+               from the client is completed. For subsequent read requests, or
+               partial-page write requests, the server must re-read the data
+               from storage.</para>
+            <para>Enabling writethrough cache is advisable if clients are doing
+              small or unaligned writes that would cause partial-page updates,
+              or if the files written by one node are immediately being read by
+              other nodes. Some examples where enabling writethrough cache
+              might be useful include producer-consumer I/O models or
+              shared-file writes that are not aligned on 4096-byte boundaries.
+            </para>
+            <para>Disabling the writethrough cache is advisable when files are
+              mostly written to the file system but are not re-read within a
+              short time period, or files are only written and re-read by the
+              same node, regardless of whether the I/O is aligned or not.</para>
+            <para>To disable writethrough cache on all targets on a server, run:
+            </para>
+            <screen>
+              oss1# lctl set_param osd-*.*.writethrough_cache_enable=0
+            </screen>
              <para>To re-enable the writethrough cache on one OST, run:</para>
-            <screen>root@oss1# lctl set_param obdfilter.{OST_name}.writethrough_cache_enable=1</screen>
+            <screen>
+              oss1# lctl set_param osd-*.{OST_name}.writethrough_cache_enable=1
+            </screen>
              <para>To check if the writethrough cache is enabled, run:</para>
-            <screen>root@oss1# lctl get_param obdfilter.*.writethrough_cache_enable</screen>
+            <screen>
+              oss1# lctl get_param osd-*.*.writethrough_cache_enable
+            </screen>
            </listitem>
            <listitem>
-            <para><literal>readcache_max_filesize</literal> - Controls the maximum size of a file
-              that both the read cache and writethrough cache will try to keep in memory. Files
-              larger than <literal>readcache_max_filesize</literal> will not be kept in cache for
-              either reads or writes.</para>
-            <para>Setting this tunable can be useful for workloads where relatively small files are
-              repeatedly accessed by many clients, such as job startup files, executables, log
-              files, etc., but large files are read or written only once. By not putting the larger
-              files into the cache, it is much more likely that more of the smaller files will
-              remain in cache for a longer time.</para>
-            <para>When setting <literal>readcache_max_filesize</literal>, the input value can be
-              specified in bytes, or can have a suffix to indicate other binary units such as
-                <literal>K</literal> (kilobytes), <literal>M</literal> (megabytes),
-                <literal>G</literal> (gigabytes), <literal>T</literal> (terabytes), or
-                <literal>P</literal> (petabytes).</para>
-            <para>To limit the maximum cached file size to 32 MB on all OSTs of an OSS, run:</para>
-            <screen>root@oss1# lctl set_param obdfilter.*.readcache_max_filesize=32M</screen>
-            <para>To disable the maximum cached file size on an OST, run:</para>
-            <screen>root@oss1# lctl set_param obdfilter.{OST_name}.readcache_max_filesize=-1</screen>
-            <para>To check the current maximum cached file size on all OSTs of an OSS, run:</para>
-            <screen>root@oss1# lctl get_param obdfilter.*.readcache_max_filesize</screen>
+            <para><literal>readcache_max_filesize</literal> - Controls the
+              maximum size of an object that both the read cache and
+              writethrough cache will try to keep in memory. Objects larger
+              than <literal>readcache_max_filesize</literal> will not be kept
+              in cache for either reads or writes regardless of the
+              <literal>read_cache_enable</literal> or
+              <literal>writethrough_cache_enable</literal> settings.</para>
+            <para>Setting this tunable can be useful for workloads where
+              relatively small objects are repeatedly accessed by many clients,
+              such as job startup objects, executables, log objects, etc., but
+              large objects are read or written only once. By not putting the
+              larger objects into the cache, it is much more likely that more
+              of the smaller objects will remain in cache for a longer time.
+            </para>
+            <para>When setting <literal>readcache_max_filesize</literal>,
+              the input value can be specified in bytes, or can have a suffix
+              to indicate other binary units such as
+                <literal>K</literal> (kibibytes),
+                <literal>M</literal> (mebibytes),
+                <literal>G</literal> (gibibytes),
+                <literal>T</literal> (tebibytes), or
+                <literal>P</literal> (pebibytes).</para>
+            <para>
+              To limit the maximum cached object size to 64 MiB on all targets
+              of a server, run:
+            </para>
+            <screen>
+              oss1# lctl set_param osd-*.*.readcache_max_filesize=64M
+            </screen>
+            <para>To remove the cached object size limit on all targets, run:
+            </para>
+            <screen>
+              oss1# lctl set_param osd-*.*.readcache_max_filesize=-1
+            </screen>
+            <para>
+              To check the current maximum cached object size on all targets of
+              a server, run:
+            </para>
+            <screen>
+              oss1# lctl get_param osd-*.*.readcache_max_filesize
+            </screen>
+          </listitem>
+          <listitem>
+            <para><literal>readcache_max_io_mb</literal> - Controls the maximum
+              size of a single read I/O that will be cached in memory. Reads
+              larger than <literal>readcache_max_io_mb</literal> will be read
+              directly from storage and bypass the page cache completely.
+              This avoids significant CPU overhead at high I/O rates.
+              The read cache cannot be disabled for <literal>osd-zfs</literal>,
+              and as a result this parameter is unavailable for that backend.
+            </para>
+            <para>When setting <literal>readcache_max_io_mb</literal>, the
+              input value can be specified in mebibytes, or can have a suffix
+              to indicate other binary units such as
+                <literal>K</literal> (kibibytes),
+                <literal>M</literal> (mebibytes),
+                <literal>G</literal> (gibibytes),
+                <literal>T</literal> (tebibytes), or
+                <literal>P</literal> (pebibytes).</para>
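+            <para>
+              For example, to limit cached read IOs to 8 MiB (an illustrative
+              value) on all OSTs of a server, run:
+            </para>
+            <screen>
+              oss1# lctl set_param osd-*.*.readcache_max_io_mb=8
+            </screen>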
+          </listitem>
+          <listitem>
+            <para><literal>writethrough_max_io_mb</literal> - Controls the
+              maximum size of a single write IO that will be cached in memory.
+              Writes larger than <literal>writethrough_max_io_mb</literal> will
+              be written directly to storage and bypass the page cache entirely.
+              This avoids significant CPU overhead at high IO rates.
+              The write cache cannot be disabled for <literal>osd-zfs</literal>,
+              and as a result this parameter is unavailable for that backend.
+            </para>
+            <para>When setting <literal>writethrough_max_io_mb</literal>, the
+              input value can be specified in mebibytes, or can have a suffix
+              to indicate other binary units such as
+                <literal>K</literal> (kibibytes),
+                <literal>M</literal> (mebibytes),
+                <literal>G</literal> (gibibytes),
+                <literal>T</literal> (tebibytes), or
+                <literal>P</literal> (pebibytes).</para>
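+            <para>
+              For example, to limit cached write IOs to 8 MiB (an illustrative
+              value) on all OSTs of a server, run:
+            </para>
+            <screen>
+              oss1# lctl set_param osd-*.*.writethrough_max_io_mb=8
+            </screen>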
            </listitem>
          </itemizedlist>
        </section>
@@ -1606,7 +1720,7 @@ obdfilter.lol-OST0001.sync_journal=0</screen>
        <screen>$ lctl get_param obdfilter.*.sync_on_lock_cancel
  obdfilter.lol-OST0001.sync_on_lock_cancel=never</screen>
      </section>
-    <section xml:id="dbdoclet.TuningModRPCs" condition='l28'>
+    <section xml:id="TuningModRPCs" condition='l28'>
        <title>
          <indexterm>
            <primary>proc</primary>
@@ -2088,7 +2202,7 @@ nid                refs   state  max  rtr  min   tx    min   queue
                        <literal>rtr </literal></para>
                    </entry>
                    <entry>
-                    <para>Number of routing buffer credits.</para>
+                    <para>Number of available routing buffer credits.</para>
                    </entry>
                  </row>
                  <row>
@@ -2106,7 +2220,7 @@ nid                refs   state  max  rtr  min   tx    min   queue
                        <literal>tx </literal></para>
                    </entry>
                    <entry>
-                    <para>Number of send credits.</para>
+                    <para>Number of available send credits.</para>
                    </entry>
                  </row>
                  <row>
@@ -2130,27 +2244,41 @@ nid                refs   state  max  rtr  min   tx    min   queue
                </tbody>
              </tgroup>
            </informaltable>
-          <para>Credits are initialized to allow a certain number of operations (in the example
-            above the table, eight as shown in the <literal>max</literal> column. LNet keeps track
-            of the minimum number of credits ever seen over time showing the peak congestion that
-            has occurred during the time monitored. Fewer available credits indicates a more
-            congested resource. </para>
-          <para>The number of credits currently in flight (number of transmit credits) is shown in
-            the <literal>tx</literal> column. The maximum number of send credits available is shown
-            in the <literal>max</literal> column and never changes. The number of router buffers
-            available for consumption by a peer is shown in the <literal>rtr</literal>
-            column.</para>
-          <para>Therefore, <literal>rtr</literal> – <literal>tx</literal> is the number of transmits
-            in flight. Typically, <literal>rtr == max</literal>, although a configuration can be set
-            such that <literal>max >= rtr</literal>. The ratio of routing buffer credits to send
-            credits (<literal>rtr/tx</literal>) that is less than <literal>max</literal> indicates
-            operations are in progress. If the ratio <literal>rtr/tx</literal> is greater than
-              <literal>max</literal>, operations are blocking.</para>
-          <para>LNet also limits concurrent sends and number of router buffers allocated to a single
-            peer so that no peer can occupy all these resources.</para>
+          <para>Credits are initialized to allow a certain number of operations
+            (in the example above the table, eight, as shown in the
+            <literal>max</literal> column). LNet keeps track of the minimum
+            number of credits ever seen over time, showing the peak congestion
+            that has occurred during the time monitored. Fewer available credits
+            indicate a more congested resource.</para>
+          <para>The number of send credits currently available is shown in the
+            <literal>tx</literal> column. The maximum number of send credits is
+            shown in the <literal>max</literal> column and never changes. The
+            number of currently active transmits can be derived by
+            <literal>(max - tx)</literal>, as long as
+            <literal>tx</literal> is greater than or equal to 0. When
+            <literal>tx</literal> is negative, its absolute value is the number
+            of transmits on that peer which have been queued for lack of credits.
+          </para>
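+          <para>For example, if a peer shows a <literal>max</literal> of 8 and
+            a <literal>tx</literal> of 5, then 3 transmits are currently active.
+            If <literal>tx</literal> is -2, all 8 send credits are in use and 2
+            further transmits are queued for that peer.</para>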
+          <para>The number of router buffer credits available for consumption
+            by a peer is shown in the <literal>rtr</literal> column. The number
+            of routing credits can be configured separately at the LND level or
+            at the LNet level by using the <literal>peer_buffer_credits</literal>
+            module parameter for the appropriate module. If the number of routing
+            credits is not set explicitly, it defaults to the maximum transmit
+            credits defined by the <literal>peer_credits</literal> module
+            parameter. Whenever a gateway routes a message from a peer, it
+            decrements the number of available routing credits for that peer. If
+            that value reaches zero, further messages are queued. Negative values
+            show the number of queued messages waiting to be routed. The number
+            of messages currently being routed from a peer can be derived by
+            <literal>(max_rtr_credits - rtr)</literal>.</para>
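+          <para>For example, with 8 routing credits configured for a peer, an
+            <literal>rtr</literal> value of 6 means that 2 messages from that
+            peer are currently being routed, while an <literal>rtr</literal>
+            value of -3 means that 3 messages are queued waiting to be routed.
+          </para>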
+          <para>LNet also limits concurrent sends and number of router buffers
+            allocated to a single peer so that no peer can occupy all resources.
+          </para>
          </listitem>
          <listitem>
-          <para><literal>nis</literal> - Shows the current queue health on this node.</para>
+          <para><literal>nis</literal> - Shows current queue health on the node.
+          </para>
            <para>Example:</para>
            <screen># lctl get_param nis
  nid                    refs   peer    max   tx    min
@@ -2247,7 +2375,7 @@ nid                    refs   peer    max   tx    min
          </listitem>
        </itemizedlist></para>
    </section>
-  <section remap="h3" xml:id="dbdoclet.balancing_free_space">
+  <section remap="h3" xml:id="balancing_free_space">
      <title><indexterm>
          <primary>proc</primary>
          <secondary>free space</secondary>
@@ -2288,8 +2416,9 @@ nid                    refs   peer    max   tx    min
            space is more than this. The default is 0.2% of total OST size.</para>
        </listitem>
      </itemizedlist>
-    <para>For more information about monitoring and managing free space, see <xref
-        xmlns:xlink="http://www.w3.org/1999/xlink" linkend="dbdoclet.50438209_10424"/>.</para>
+    <para>For more information about monitoring and managing free space, see
+    <xref xmlns:xlink="http://www.w3.org/1999/xlink"
+          linkend="file_striping.managing_free_space"/>.</para>
    </section>
    <section remap="h3">
      <title><indexterm>
@@ -2309,19 +2438,19 @@ nid                    refs   peer    max   tx    min
      <itemizedlist>
        <listitem>
          <para>To enable automatic LRU sizing, set the
-       <literal>lru_size</literal> parameter to 0. In this case, the
-       <literal>lru_size</literal> parameter shows the current number of locks
+        <literal>lru_size</literal> parameter to 0. In this case, the
+        <literal>lru_size</literal> parameter shows the current number of locks
          being used on the client. Dynamic LRU resizing is enabled by default.
-       </para>
+        </para>
        </listitem>
        <listitem>
          <para>To specify a maximum number of locks, set the
-       <literal>lru_size</literal> parameter to a value other than zero.
-       A good default value for compute nodes is around
-       <literal>100 * <replaceable>num_cpus</replaceable></literal>.
+        <literal>lru_size</literal> parameter to a value other than zero.
+        A good default value for compute nodes is around
+        <literal>100 * <replaceable>num_cpus</replaceable></literal>.
          It is recommended that you only set <literal>lru_size</literal>
-       to be signifivantly larger on a few login nodes where multiple
-       users access the file system interactively.</para>
+        to be significantly larger on a few login nodes where multiple
+        users access the file system interactively.</para>
        </listitem>
      </itemizedlist>
      <para>To clear the LRU on a single client, and, as a result, flush client
@@ -2334,7 +2463,7 @@ nid                    refs   peer    max   tx    min
      <note>
        <para>The <literal>lru_size</literal> parameter can only be set
          temporarily using <literal>lctl set_param</literal>, it cannot be set
-       permanently.</para>
+        permanently.</para>
      </note>
      <para>To disable dynamic LRU resizing on the clients, run for example:
      </para>
@@ -2362,7 +2491,7 @@ nid                    refs   peer    max   tx    min
  ldlm.namespaces.myth-MDT0000-mdc-ffff8804296c2800.lru_max_age=900000
      </screen>
    </section>
-  <section xml:id="dbdoclet.50438271_87260">
+  <section xml:id="tuning_setting_thread_count">
      <title><indexterm>
          <primary>proc</primary>
          <secondary>thread counts</secondary>
@@ -2460,15 +2589,18 @@ ldlm.namespaces.myth-MDT0000-mdc-ffff8804296c2800.lru_max_age=900000
          <screen># lctl set_param <replaceable>service</replaceable>.threads_<replaceable>min|max|started=num</replaceable> </screen>
          </listitem>
        <listitem>
-        <para>To permanently set this tunable, run:</para>
-       <screen># lctl conf_param <replaceable>obdname|fsname.obdtype</replaceable>.threads_<replaceable>min|max|started</replaceable> </screen>
-       <para condition='l25'>For version 2.5 or later, run:
-               <screen># lctl set_param -P <replaceable>service</replaceable>.threads_<replaceable>min|max|started</replaceable></screen></para>
+        <para>To permanently set this tunable, run the following command on
+        the MGS:
+        <screen>mgs# lctl set_param -P <replaceable>service</replaceable>.threads_<replaceable>min|max|started</replaceable></screen></para>
+        <para condition='l25'>For Lustre 2.5 or earlier, run:
+        <screen>mgs# lctl conf_param <replaceable>obdname|fsname.obdtype</replaceable>.threads_<replaceable>min|max|started</replaceable></screen>
+        </para>
        </listitem>
      </itemizedlist>
-      <para>The following examples show how to set thread counts and get the number of running threads
-        for the service <literal>ost_io</literal>  using the tunable
-       <literal><replaceable>service</replaceable>.threads_<replaceable>min|max|started</replaceable></literal>.</para>
+      <para>The following examples show how to set thread counts and get the
+        number of running threads for the service <literal>ost_io</literal>
+        using the tunable
+        <literal><replaceable>service</replaceable>.threads_<replaceable>min|max|started</replaceable></literal>.</para>
      <itemizedlist>
        <listitem>
          <para>To get the number of running threads, run:</para>
@@ -2507,7 +2639,7 @@ ost.OSS.ost_io.threads_max=256</screen>
      </note>
      <para>See also <xref xmlns:xlink="http://www.w3.org/1999/xlink" linkend="lustretuning"/></para>
    </section>
-  <section xml:id="dbdoclet.50438271_83523">
+  <section xml:id="enabling_interpreting_debugging_logs">
      <title><indexterm>
          <primary>proc</primary>
          <secondary>debug</secondary>
@@ -2608,8 +2740,8 @@ debug=neterror warning error emerg console</screen>
      <section>
        <title>Interpreting OST Statistics</title>
        <note>
-        <para>See also <xref linkend="dbdoclet.50438219_84890"/> (<literal>llobdstat</literal>) and
-            <xref linkend="dbdoclet.50438273_80593"/> (<literal>collectl</literal>).</para>
+        <para>See also
+            <xref linkend="collectl"/> (<literal>collectl</literal>).</para>
        </note>
        <para>OST <literal>stats</literal> files can be used to provide statistics showing activity
          for each OST. For example:</para>
@@ -2868,8 +3000,8 @@ ost_write    21   2    59   [bytes] 7648424 15019  332725.08 910694 180397.87
      <section>
        <title>Interpreting MDT Statistics</title>
        <note>
-        <para>See also <xref linkend="dbdoclet.50438219_84890"/> (<literal>llobdstat</literal>) and
-            <xref linkend="dbdoclet.50438273_80593"/> (<literal>collectl</literal>).</para>
+        <para>See also
+            <xref linkend="collectl"/> (<literal>collectl</literal>).</para>
        </note>
        <para>MDT <literal>stats</literal> files can be used to track MDT
        statistics for the MDS. The example below shows sample output from an
@@ -2890,3 +3022,6 @@ notify                          16 samples [reqs]</screen>
      </section>
    </section>
  </chapter>
+<!--
+  vim:expandtab:shiftwidth=2:tabstop=8:
+  -->