LUDOC-11 misc: update URLs from http to https
[doc/manual.git] / LustreProc.xml
index 8895f6d..327a428 100644
-<?xml version="1.0" encoding="UTF-8"?>
-<chapter version="5.0" xml:lang="en-US" xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" xml:id='lustreproc'>
-  <info>
-    <title xml:id='lustreproc.title'>LustreProc</title>
-  </info>
-  <para><anchor xml:id="dbdoclet.50438271_pgfId-1301083" xreflabel=""/>The /proc file system acts as an interface to internal data structures in the kernel. The /proc variables can be used to control aspects of Lustre performance and provide information.</para>
-  <para><anchor xml:id="dbdoclet.50438271_pgfId-1290340" xreflabel=""/>This chapter describes Lustre /proc entries and includes the following sections:</para>
-  <itemizedlist><listitem>
-      <para><xref linkend="dbdoclet.50438271_90999"/></para>
-    </listitem>
-
-<listitem>
-      <para><xref linkend="dbdoclet.50438271_78950"/></para>
-    </listitem>
-
-<listitem>
+<?xml version='1.0' encoding='UTF-8'?>
+<chapter xmlns="http://docbook.org/ns/docbook"
+ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
+ xml:id="lustreproc">
+  <title xml:id="lustreproc.title">Lustre Parameters</title>
+  <para>Lustre has many parameters that can be used to tune client and server
+  performance, change the behavior of the system, and report statistics about
+  various subsystems.  This chapter describes the parameters and tunables
+  that are useful for optimizing and monitoring aspects of a Lustre file
+  system.  It includes these sections:</para>
+  <itemizedlist>
+    <listitem>
       <para><xref linkend="dbdoclet.50438271_83523"/></para>
     </listitem>
+  </itemizedlist>
+  <section>
+    <title>Introduction to Lustre Parameters</title>
+    <para>Lustre parameters and statistics files provide an interface to
+    internal data structures in the kernel that enables monitoring and
+    tuning of many aspects of Lustre file system and application performance.
+    These data structures include settings and metrics for components such
+    as memory, networking, file systems, and kernel housekeeping routines,
+    which are available throughout the hierarchical file layout.
+    </para>
+    <para>Typically, metrics are accessed via <literal>lctl get_param</literal>
+    and settings are changed via <literal>lctl set_param</literal>.
+    These commands allow getting and setting multiple parameters with a single
+    command, through the use of wildcards in one or more parts of the
+    parameter name.
+    While each of these parameters maps to files in <literal>/proc</literal>
+    and <literal>/sys</literal> directly, the location of these parameters may
+    change between Lustre releases, so it is recommended to always use
+    <literal>lctl</literal> to access the parameters from userspace scripts.
+    Some data is server-only, some data is client-only, and some data is
+    exported from the client to the server and is thus duplicated in both
+    locations.</para>
+    <note>
+      <para>In the examples in this chapter, <literal>#</literal> indicates
+      a command is entered as root.  Lustre servers are named according to the
+      convention <literal><replaceable>fsname</replaceable>-<replaceable>MDT|OSTnumber</replaceable></literal>.
+      The standard UNIX wildcard designation (*) is used to represent any
+      part of a single component of the parameter name, excluding
+      "<literal>.</literal>" and "<literal>/</literal>".
+      It is also possible to use brace <literal>{}</literal> expansion
+      to specify a list of parameter names efficiently.</para>
+    </note>
+    <para>Some examples are shown below:</para>
+    <itemizedlist>
+      <listitem>
+        <para> To list available OST targets on a Lustre client:</para>
+        <screen># lctl list_param -F osc.*
+osc.testfs-OST0000-osc-ffff881071d5cc00/
+osc.testfs-OST0001-osc-ffff881071d5cc00/
+osc.testfs-OST0002-osc-ffff881071d5cc00/
+osc.testfs-OST0003-osc-ffff881071d5cc00/
+osc.testfs-OST0004-osc-ffff881071d5cc00/
+osc.testfs-OST0005-osc-ffff881071d5cc00/
+osc.testfs-OST0006-osc-ffff881071d5cc00/
+osc.testfs-OST0007-osc-ffff881071d5cc00/
+osc.testfs-OST0008-osc-ffff881071d5cc00/</screen>
+        <para>In this example, information about OST connections available
+        on a client is displayed (indicated by "osc").  Each of these
+        connections may have numerous sub-parameters as well.</para>
+      </listitem>
+    </itemizedlist>
+    <itemizedlist>
+      <listitem>
+        <para> To see multiple levels of parameters, use multiple
+          wildcards:<screen># lctl list_param osc.*.*
+osc.testfs-OST0000-osc-ffff881071d5cc00.active
+osc.testfs-OST0000-osc-ffff881071d5cc00.blocksize
+osc.testfs-OST0000-osc-ffff881071d5cc00.checksum_type
+osc.testfs-OST0000-osc-ffff881071d5cc00.checksums
+osc.testfs-OST0000-osc-ffff881071d5cc00.connect_flags
+osc.testfs-OST0000-osc-ffff881071d5cc00.contention_seconds
+osc.testfs-OST0000-osc-ffff881071d5cc00.cur_dirty_bytes
+...
+osc.testfs-OST0000-osc-ffff881071d5cc00.rpc_stats</screen></para>
+      </listitem>
+    </itemizedlist>
+    <itemizedlist>
+      <listitem>
+        <para> To see a specific subset of parameters, use braces, like:
+<screen># lctl list_param osc.*.{checksum,connect}*
+osc.testfs-OST0000-osc-ffff881071d5cc00.checksum_type
+osc.testfs-OST0000-osc-ffff881071d5cc00.checksums
+osc.testfs-OST0000-osc-ffff881071d5cc00.connect_flags
+</screen></para>
+      </listitem>
+    </itemizedlist>
+    <itemizedlist>
+      <listitem>
+        <para> To view a specific file, use <literal>lctl get_param</literal>:
+          <screen># lctl get_param osc.lustre-OST0000*.rpc_stats</screen></para>
+      </listitem>
+    </itemizedlist>
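+    <itemizedlist>
+      <listitem>
+        <para> To set the value of a writable parameter, use
+          <literal>lctl set_param</literal>.  The parameter name and value
+          below are illustrative only:
+          <screen># lctl set_param osc.*.max_rpcs_in_flight=16
+osc.testfs-OST0000-osc-ffff881071d5cc00.max_rpcs_in_flight=16</screen></para>
+      </listitem>
+    </itemizedlist>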
+    <para>For more information about using <literal>lctl</literal>, see <xref
+        xmlns:xlink="http://www.w3.org/1999/xlink" linkend="dbdoclet.50438194_51490"/>.</para>
+    <para>Data can also be viewed using the <literal>cat</literal> command
+    with the full path to the file. The form of the <literal>cat</literal>
+    command is similar to that of the <literal>lctl get_param</literal>
+    command, with some differences.  As the Linux kernel has changed over
+    the years, the location of statistics and parameter files has also
+    changed, which means that a Lustre parameter file may be located in the
+    <literal>/proc</literal> directory, the <literal>/sys</literal>
+    directory, or the <literal>/sys/kernel/debug</literal> directory,
+    depending on the kernel version and the Lustre version being used.
+    The <literal>lctl</literal> command insulates scripts from these changes
+    and is preferred over direct file access, except when it is part of a
+    high-performance monitoring system.
+    </para>
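+    <para>For example, the same parameter can typically be read either
+    through <literal>lctl</literal> or directly from the corresponding file.
+    The pathname shown below is illustrative only and may differ depending
+    on the kernel and Lustre versions in use:</para>
+    <screen># lctl get_param osc.testfs-OST0000*.max_rpcs_in_flight
+osc.testfs-OST0000-osc-ffff881071d5cc00.max_rpcs_in_flight=8
+# cat /sys/fs/lustre/osc/testfs-OST0000-osc-ffff881071d5cc00/max_rpcs_in_flight
+8</screen>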
+    <note condition='l2c'><para>Starting in Lustre 2.12, the
+    <literal>lctl get_param</literal> and <literal>lctl set_param</literal>
+    commands can provide <emphasis>tab completion</emphasis> when using an
+    interactive shell with <literal>bash-completion</literal> installed.
+    This simplifies the use of <literal>get_param</literal> significantly,
+    since it provides an interactive list of available parameters.
+    </para></note>
+    <para>The <literal>llstat</literal> utility can be used to monitor some
+    Lustre file system I/O activity over a specified time period. For more
+    details, see
+    <xref xmlns:xlink="http://www.w3.org/1999/xlink" linkend="dbdoclet.50438219_23232"/>.</para>
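+    <para>For example, to sample the <literal>ost_io</literal> service
+    statistics on an OSS at one-second intervals (the interval and service
+    name shown are illustrative):</para>
+    <screen>oss# llstat -i 1 ost_io</screen>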
+    <para>Some data is imported from attached clients and is available in a
+    directory called <literal>exports</literal> located in the corresponding
+    per-service directory on a Lustre server. For example:
+    <screen>oss:/root# lctl list_param obdfilter.testfs-OST0000.exports.*
+# hash ldlm_stats stats uuid</screen></para>
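+    <para>The contents of a per-export file can then be read with
+    <literal>lctl get_param</literal>, for example:
+    <screen>oss:/root# lctl get_param obdfilter.testfs-OST0000.exports.*.stats</screen></para>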
+    <section remap="h3">
+      <title>Identifying Lustre File Systems and Servers</title>
+      <para>Several parameter files on the MGS list existing
+      Lustre file systems and file system servers. The examples below are for
+      a Lustre file system called
+          <literal>testfs</literal> with one MDT and three OSTs.</para>
+      <itemizedlist>
+        <listitem>
+          <para> To view all known Lustre file systems, enter:</para>
+          <screen>mgs# lctl get_param mgs.*.filesystems
+testfs</screen>
+        </listitem>
+        <listitem>
+          <para> To view the names of the servers in a file system in which
+            at least one server is running,
+            enter:<screen>lctl get_param mgs.*.live.<replaceable>&lt;filesystem name></replaceable></screen></para>
+          <para>For example:</para>
+          <screen>mgs# lctl get_param mgs.*.live.testfs
+fsname: testfs
+flags: 0x20     gen: 45
+testfs-MDT0000
+testfs-OST0000
+testfs-OST0001
+testfs-OST0002 
 
-</itemizedlist>
+Secure RPC Config Rules: 
 
-    <section xml:id="dbdoclet.50438271_90999">
-      <title>31.1 Proc Entries for Lustre</title>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1290360" xreflabel=""/>This section describes /proc entries for Lustre.</para>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1290361" xreflabel=""/>31.1.1 Locating Lustre <anchor xml:id="dbdoclet.50438271_marker-1296151" xreflabel=""/>File Systems and Servers</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290362" xreflabel=""/>Use the proc files on the MGS to locate the following:</para>
-        <itemizedlist><listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1290363" xreflabel=""/> All known file systems</para>
-          </listitem>
+imperative_recovery_state:
+    state: startup
+    nonir_clients: 0
+    nidtbl_version: 6
+    notify_duration_total: 0.001000
+    notify_duation_max:  0.001000
+    notify_count: 4</screen>
+        </listitem>
+        <listitem>
+          <para>To list all configured devices on the local node, enter:</para>
+          <screen># lctl device_list
+0 UP mgs MGS MGS 11
+1 UP mgc MGC192.168.10.34@tcp 1f45bb57-d9be-2ddb-c0b0-5431a49226705
+2 UP mdt MDS MDS_uuid 3
+3 UP lov testfs-mdtlov testfs-mdtlov_UUID 4
+4 UP mds testfs-MDT0000 testfs-MDT0000_UUID 7
+5 UP osc testfs-OST0000-osc testfs-mdtlov_UUID 5
+6 UP osc testfs-OST0001-osc testfs-mdtlov_UUID 5
+7 UP lov testfs-clilov-ce63ca00 08ac6584-6c4a-3536-2c6d-b36cf9cbdaa04
+8 UP mdc testfs-MDT0000-mdc-ce63ca00 08ac6584-6c4a-3536-2c6d-b36cf9cbdaa05
+9 UP osc testfs-OST0000-osc-ce63ca00 08ac6584-6c4a-3536-2c6d-b36cf9cbdaa05
+10 UP osc testfs-OST0001-osc-ce63ca00 08ac6584-6c4a-3536-2c6d-b36cf9cbdaa05</screen>
+          <para>The information provided on each line includes:</para>
+          <itemizedlist>
+            <listitem>
+              <para>Device number</para>
+            </listitem>
+            <listitem>
+              <para>Device status (UP, INactive, or STopping)</para>
+            </listitem>
+            <listitem>
+              <para>Device name</para>
+            </listitem>
+            <listitem>
+              <para>Device UUID</para>
+            </listitem>
+            <listitem>
+              <para>Reference count (how many users this device has)</para>
+            </listitem>
+          </itemizedlist>
+        </listitem>
+        <listitem>
+          <para>To display the name of any server, view the device
+            label:<screen>mds# e2label /dev/sda
+testfs-MDT0000</screen></para>
+        </listitem>
+      </itemizedlist>
+    </section>
+  </section>
+  <section>
+    <title>Tuning Multi-Block Allocation (mballoc)</title>
+    <para>Capabilities supported by <literal>mballoc</literal> include:</para>
+    <itemizedlist>
+      <listitem>
+        <para> Pre-allocation for single files to help to reduce fragmentation.</para>
+      </listitem>
+      <listitem>
+        <para> Pre-allocation for a group of files to enable packing of small files into large,
+          contiguous chunks.</para>
+      </listitem>
+      <listitem>
+        <para> Stream allocation to help decrease the seek rate.</para>
+      </listitem>
+    </itemizedlist>
+    <para>The following <literal>mballoc</literal> tunables are available:</para>
+    <informaltable frame="all">
+      <tgroup cols="2">
+        <colspec colname="c1" colwidth="30*"/>
+        <colspec colname="c2" colwidth="70*"/>
+        <thead>
+          <row>
+            <entry>
+              <para><emphasis role="bold">Field</emphasis></para>
+            </entry>
+            <entry>
+              <para><emphasis role="bold">Description</emphasis></para>
+            </entry>
+          </row>
+        </thead>
+        <tbody>
+          <row>
+            <entry>
+              <para>
+                <literal>mb_max_to_scan</literal></para>
+            </entry>
+            <entry>
+              <para>Maximum number of free chunks that <literal>mballoc</literal> examines before
+                making a final allocation decision, to avoid a livelock situation.</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para>
+                <literal>mb_min_to_scan</literal></para>
+            </entry>
+            <entry>
+              <para>Minimum number of free chunks that <literal>mballoc</literal> searches before
+                picking the best chunk for allocation. This is useful for small requests to reduce
+                fragmentation of big free chunks.</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para>
+                <literal>mb_order2_req</literal></para>
+            </entry>
+            <entry>
+              <para>For requests equal to 2^N, where N &gt;= <literal>mb_order2_req</literal>, a
+                fast search is done using a base 2 buddy allocation service.</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para>
+                <literal>mb_small_req</literal></para>
+            </entry>
+            <entry morerows="1">
+              <para><literal>mb_small_req</literal> - Defines (in MB) the upper bound of "small
+                requests".</para>
+              <para><literal>mb_large_req</literal> - Defines (in MB) the lower bound of "large
+                requests".</para>
+              <para>Requests are handled differently based on size:<itemizedlist>
+                  <listitem>
+                    <para>&lt; <literal>mb_small_req</literal> - Requests are packed together to
+                      form large, aggregated requests.</para>
+                  </listitem>
+                  <listitem>
+                    <para>> <literal>mb_small_req</literal> and &lt; <literal>mb_large_req</literal>
+                      - Requests are primarily allocated linearly.</para>
+                  </listitem>
+                  <listitem>
+                    <para>> <literal>mb_large_req</literal> - Requests are allocated since hard disk
+                      seek time is less of a concern in this case.</para>
+                  </listitem>
+                </itemizedlist></para>
+              <para>In general, small requests are combined to create larger requests, which are
+                then placed close to one another to minimize the number of seeks required to access
+                the data.</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para>
+                <literal>mb_large_req</literal></para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para>
+                <literal>prealloc_table</literal></para>
+            </entry>
+            <entry>
+              <para>A table of values used to preallocate space when a new request is received. By
+                default, the table looks like
+                this:<screen>prealloc_table
+4 8 16 32 64 128 256 512 1024 2048 </screen></para>
+              <para>When a new request is received, space is preallocated at the next higher
+                increment specified in the table. For example, for requests of less than 4 file
+                system blocks, 4 blocks of space are preallocated; for requests between 4 and 8, 8
+                blocks are preallocated; and so forth.</para>
+              <para>Although customized values can be entered in the table, the performance of
+                general usage file systems will not typically be improved by modifying the table (in
+                fact, in ext4 systems, the table values are fixed).  However, for some specialized
+                workloads, tuning the <literal>prealloc_table</literal> values may result in smarter
+                preallocation decisions. </para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para>
+                <literal>mb_group_prealloc</literal></para>
+            </entry>
+            <entry>
+              <para>The amount of space (in kilobytes) preallocated for groups of small
+                requests.</para>
+            </entry>
+          </row>
+        </tbody>
+      </tgroup>
+    </informaltable>
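+    <para>These tunables reside under the ldiskfs device directory on the OSS
+    and can be read and written there.  The device name
+    <literal>sda</literal> and the values shown below are only examples, and
+    on older kernels the files may be found under
+    <literal>/proc/fs/ldiskfs</literal> instead:</para>
+    <screen>oss# cat /sys/fs/ldiskfs/sda/mb_max_to_scan
+200
+oss# echo 300 &gt; /sys/fs/ldiskfs/sda/mb_max_to_scan</screen>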
+    <para>Buddy group cache information found in
+          <literal>/sys/fs/ldiskfs/<replaceable>disk_device</replaceable>/mb_groups</literal> may
+      be useful for assessing on-disk fragmentation. For
+      example:<screen>cat /proc/fs/ldiskfs/loop0/mb_groups 
+#group: free free frags first pa [ 2^0 2^1 2^2 2^3 2^4 2^5 2^6 2^7 2^8 2^9 
+     2^10 2^11 2^12 2^13] 
+#0    : 2936 2936 1     42    0  [ 0   0   0   1   1   1   1   2   0   1 
+     2    0    0    0   ]</screen></para>
+    <para>In this example, the columns show:<itemizedlist>
+        <listitem>
+          <para>#group number</para>
+        </listitem>
+        <listitem>
+          <para>Available blocks in the group</para>
+        </listitem>
+        <listitem>
+          <para>Blocks free on a disk</para>
+        </listitem>
+        <listitem>
+          <para>Number of free fragments</para>
+        </listitem>
+        <listitem>
+          <para>First free block in the group</para>
+        </listitem>
+        <listitem>
+          <para>Number of preallocated chunks (not blocks)</para>
+        </listitem>
+        <listitem>
+          <para>A series of available chunks of different sizes</para>
+        </listitem>
+      </itemizedlist></para>
+  </section>
+  <section>
+    <title>Monitoring Lustre File System I/O</title>
+    <para>A number of system utilities are provided to enable collection of data related to I/O
+      activity in a Lustre file system. In general, the data collected describes:</para>
+    <itemizedlist>
+      <listitem>
+        <para> Data transfer rates and throughput of inputs and outputs external to the Lustre file
+          system, such as network requests or disk I/O operations performed</para>
+      </listitem>
+      <listitem>
+        <para> Data about the throughput or transfer rates of internal Lustre file system data, such
+          as locks or allocations. </para>
+      </listitem>
+    </itemizedlist>
+    <note>
+      <para>It is highly recommended that you complete baseline testing for your Lustre file system
+        to determine normal I/O activity for your hardware, network, and system workloads. Baseline
+        data will allow you to easily determine when performance becomes degraded in your system.
+        Two particularly useful baseline statistics are:</para>
+      <itemizedlist>
+        <listitem>
+          <para><literal>brw_stats</literal> – Histogram data characterizing I/O requests to the
+            OSTs. For more details, see <xref xmlns:xlink="http://www.w3.org/1999/xlink"
+              linkend="dbdoclet.50438271_55057"/>.</para>
+        </listitem>
+        <listitem>
+          <para><literal>rpc_stats</literal> – Histogram data showing information about RPCs made by
+            clients. For more details, see <xref xmlns:xlink="http://www.w3.org/1999/xlink"
+              linkend="MonitoringClientRCPStream"/>.</para>
+        </listitem>
+      </itemizedlist>
+    </note>
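+    <para>One simple way to capture such a baseline is to clear the
+    statistics, run a known workload, and save the results for later
+    comparison.  The workload, mount point, and file names below are only
+    placeholders:</para>
+    <screen>client# lctl set_param osc.*.rpc_stats=0
+client# dd if=/dev/zero of=/mnt/testfs/baseline bs=1M count=1024
+client# lctl get_param osc.testfs-OST*.rpc_stats &gt; /tmp/rpc_stats.baseline</screen>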
+    <section remap="h3" xml:id="MonitoringClientRCPStream">
+      <title><indexterm>
+          <primary>proc</primary>
+          <secondary>watching RPC</secondary>
+        </indexterm>Monitoring the Client RPC Stream</title>
+      <para>The <literal>rpc_stats</literal> file contains histogram data showing information about
+        remote procedure calls (RPCs) that have been made since this file was last cleared. The
+        histogram data can be cleared by writing any value into the <literal>rpc_stats</literal>
+        file.</para>
+      <para><emphasis role="italic"><emphasis role="bold">Example:</emphasis></emphasis></para>
+      <screen># lctl get_param osc.testfs-OST0000-osc-ffff810058d2f800.rpc_stats
+snapshot_time:            1372786692.389858 (secs.usecs)
+read RPCs in flight:      0
+write RPCs in flight:     1
+dio read RPCs in flight:  0
+dio write RPCs in flight: 0
+pending write pages:      256
+pending read pages:       0
 
-</itemizedlist>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290364" xreflabel=""/># cat /proc/fs/lustre/mgs/MGS/filesystems
-<anchor xml:id="dbdoclet.50438271_pgfId-1291584" xreflabel=""/>spfs
-<anchor xml:id="dbdoclet.50438271_pgfId-1291587" xreflabel=""/>lustre
-</screen>
-        <itemizedlist><listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1290367" xreflabel=""/> The server names participating in a file system (for each file system that has at least one server running)</para>
-          </listitem>
+                     read                   write
+pages per rpc   rpcs   % cum % |       rpcs   % cum %
+1:                 0   0   0   |          0   0   0
+2:                 0   0   0   |          1   0   0
+4:                 0   0   0   |          0   0   0
+8:                 0   0   0   |          0   0   0
+16:                0   0   0   |          0   0   0
+32:                0   0   0   |          2   0   0
+64:                0   0   0   |          2   0   0
+128:               0   0   0   |          5   0   0
+256:             850 100 100   |      18346  99 100
 
-</itemizedlist>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290368" xreflabel=""/># cat /proc/fs/lustre/mgs/MGS/live/spfs
-<anchor xml:id="dbdoclet.50438271_pgfId-1291593" xreflabel=""/>fsname: spfs
-<anchor xml:id="dbdoclet.50438271_pgfId-1291596" xreflabel=""/>flags: 0x0         gen: 7
-<anchor xml:id="dbdoclet.50438271_pgfId-1291599" xreflabel=""/>spfs-MDT0000
-<anchor xml:id="dbdoclet.50438271_pgfId-1291602" xreflabel=""/>spfs-OST0000
-</screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290373" xreflabel=""/>All servers are named according to this convention: &lt;fsname&gt;-&lt;MDT|OST&gt;&lt;XXXX&gt; This can be shown for live servers under /proc/fs/lustre/devices:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290374" xreflabel=""/># cat /proc/fs/lustre/devices 
-<anchor xml:id="dbdoclet.50438271_pgfId-1290375" xreflabel=""/>0 UP mgs MGS MGS 11
-<anchor xml:id="dbdoclet.50438271_pgfId-1290376" xreflabel=""/>1 UP mgc MGC192.168.10.34@tcp 1f45bb57-d9be-2ddb-c0b0-5431a49226705
-<anchor xml:id="dbdoclet.50438271_pgfId-1290377" xreflabel=""/>2 UP mdt MDS MDS_uuid 3
-<anchor xml:id="dbdoclet.50438271_pgfId-1290378" xreflabel=""/>3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
-<anchor xml:id="dbdoclet.50438271_pgfId-1290379" xreflabel=""/>4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 7
-<anchor xml:id="dbdoclet.50438271_pgfId-1290380" xreflabel=""/>5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
-<anchor xml:id="dbdoclet.50438271_pgfId-1290381" xreflabel=""/>6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
-<anchor xml:id="dbdoclet.50438271_pgfId-1290382" xreflabel=""/>7 UP lov lustre-clilov-ce63ca00 08ac6584-6c4a-3536-2c6d-b36cf9cbdaa04
-<anchor xml:id="dbdoclet.50438271_pgfId-1290383" xreflabel=""/>8 UP mdc lustre-MDT0000-mdc-ce63ca00 08ac6584-6c4a-3536-2c6d-b36cf9cbdaa05
-<anchor xml:id="dbdoclet.50438271_pgfId-1290384" xreflabel=""/>9 UP osc lustre-OST0000-osc-ce63ca00 08ac6584-6c4a-3536-2c6d-b36cf9cbdaa05
-<anchor xml:id="dbdoclet.50438271_pgfId-1290385" xreflabel=""/>10 UP osc lustre-OST0001-osc-ce63ca00 08ac6584-6c4a-3536-2c6d-b36cf9cbdaa05
-</screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290386" xreflabel=""/>Or from the device label at any time:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290387" xreflabel=""/># e2label /dev/sda
-<anchor xml:id="dbdoclet.50438271_pgfId-1290388" xreflabel=""/>lustre-MDT0000
-</screen>
-      </section>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1290389" xreflabel=""/>31.1.2 Lustre <anchor xml:id="dbdoclet.50438271_marker-1296153" xreflabel=""/>Timeouts</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1294163" xreflabel=""/>Lustre uses two types of timeouts.</para>
-        <itemizedlist><listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1294195" xreflabel=""/>LND timeouts that ensure point-to-point communications complete in finite time in the presence of failures. These timeouts are logged with the S_LND flag set. They may <emphasis>not</emphasis> be printed as console messages, so you should check the Lustre log for D_NETERROR messages, or enable printing of D_NETERROR messages to the console (echo + neterror &gt; /proc/sys/lnet/printk).</para>
-          </listitem>
+                     read                   write
+rpcs in flight  rpcs   % cum % |       rpcs   % cum %
+0:               691  81  81   |       1740   9   9
+1:                48   5  86   |        938   5  14
+2:                29   3  90   |       1059   5  20
+3:                17   2  92   |       1052   5  26
+4:                13   1  93   |        920   5  31
+5:                12   1  95   |        425   2  33
+6:                10   1  96   |        389   2  35
+7:                30   3 100   |      11373  61  97
+8:                 0   0 100   |        460   2 100
 
-</itemizedlist>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1294211" xreflabel=""/>Congested routers can be a source of spurious LND timeouts. To avoid this, increase the number of LNET router buffers to reduce back-pressure and/or increase LND timeouts on all nodes on all connected networks. You should also consider increasing the total number of LNET router nodes in the system so that the aggregate router bandwidth matches the aggregate server bandwidth.</para>
-        <itemizedlist><listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1294177" xreflabel=""/>Lustre timeouts that ensure Lustre RPCs complete in finite time in the presence of failures. These timeouts should <emphasis>always</emphasis> be printed as console messages. If Lustre timeouts are not accompanied by LNET timeouts, then you need to increase the lustre timeout on both servers and clients.</para>
-          </listitem>
+                     read                   write
+offset          rpcs   % cum % |       rpcs   % cum %
+0:               850 100 100   |      18347  99  99
+1:                 0   0 100   |          0   0  99
+2:                 0   0 100   |          0   0  99
+4:                 0   0 100   |          0   0  99
+8:                 0   0 100   |          0   0  99
+16:                0   0 100   |          1   0  99
+32:                0   0 100   |          1   0  99
+64:                0   0 100   |          3   0  99
+128:               0   0 100   |          4   0 100
 
-</itemizedlist>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1294236" xreflabel=""/>Specific Lustre timeouts are described below.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290390" xreflabel=""/><emphasis role="bold">/proc/sys/lustre/timeout</emphasis></para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290391" xreflabel=""/>This is the time period that a client waits for a server to complete an RPC (default is 100s). Servers wait half of this time for a normal client RPC to complete and a quarter of this time for a single bulk request (read or write of up to 1 MB) to complete. The client pings recoverable targets (MDS and OSTs) at one quarter of the timeout, and the server waits one and a half times the timeout before evicting a client for being &quot;stale.&quot;</para>
-                <note><para>Lustre sends periodic 'PING' messages to servers with which it had no communication for a specified period of time. Any network activity on the file system that triggers network traffic toward servers also works as a health check.</para></note>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1292930" xreflabel=""/><emphasis role="bold">/proc/sys/lustre/ldlm_timeout</emphasis></para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290393" xreflabel=""/>This is the time period for which a server will wait for a client to reply to an initial AST (lock cancellation request) where default is 20s for an OST and 6s for an MDS. If the client replies to the AST, the server will give it a normal timeout (half of the client timeout) to flush any dirty data and release the lock.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290394" xreflabel=""/><emphasis role="bold">/proc/sys/lustre/fail_loc</emphasis></para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290395" xreflabel=""/>This is the internal debugging failure hook.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290396" xreflabel=""/>See lustre/include/linux/obd_support.h for the definitions of individual failure locations. The default value is 0 (zero).</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290397" xreflabel=""/>sysctl -w lustre.fail_loc=0x80000122 # drop a single reply
 </screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290398" xreflabel=""/><emphasis role="bold">/proc/sys/lustre/dump_on_timeout</emphasis></para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290399" xreflabel=""/>This triggers dumps of the Lustre debug log when timeouts occur. The default value is 0 (zero).</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1294958" xreflabel=""/><emphasis role="bold">/proc/sys/lustre/dump_on_eviction</emphasis></para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1294959" xreflabel=""/>This triggers dumps of the Lustre debug log when an eviction occurs. The default value is 0 (zero). By default, debug logs are dumped to the /tmp folder; this location can be changed via /proc.</para>
-      </section>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1292935" xreflabel=""/>31.1.3 Adaptive <anchor xml:id="dbdoclet.50438271_marker-1293380" xreflabel=""/>Timeouts</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1292944" xreflabel=""/>Lustre offers an adaptive mechanism to set RPC timeouts. The adaptive timeouts feature (enabled, by default) causes servers to track actual RPC completion times, and to report estimated completion times for future RPCs back to clients. The clients use these estimates to set their future RPC timeout values. If server request processing slows down for any reason, the RPC completion estimates increase, and the clients allow more time for RPC completion.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1293013" xreflabel=""/>If RPCs queued on the server approach their timeouts, then the server sends an early reply to the client, telling the client to allow more time. In this manner, clients avoid RPC timeouts and disconnect/reconnect cycles. Conversely, as a server speeds up, RPC timeout values decrease, allowing faster detection of non-responsive servers and faster attempts to reconnect to a server&apos;s failover partner.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1292945" xreflabel=""/>In previous Lustre versions, the static obd_timeout (/proc/sys/lustre/timeout) value was used as the maximum completion time for all RPCs; this value also affected the client-server ping interval and initial recovery timer. Now, with adaptive timeouts, obd_timeout is only used for the ping interval and initial recovery estimate. When a client reconnects during recovery, the server uses the client&apos;s timeout value to reset the recovery wait period; i.e., the server learns how long the client had been willing to wait, and takes this into account when adjusting the recovery period.</para>
-        <section remap="h4">
-          <title><anchor xml:id="dbdoclet.50438271_pgfId-1292947" xreflabel=""/>31.1.3.1 Configuring <anchor xml:id="dbdoclet.50438271_marker-1293381" xreflabel=""/>Adaptive Timeouts</title>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1292948" xreflabel=""/>One of the goals of adaptive timeouts is to relieve users from having to tune the obd_timeout value. In general, obd_timeout should no longer need to be changed. However, there are several parameters related to adaptive timeouts that users can set. In most situations, the default values should be used.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1299111" xreflabel=""/>The following parameters can be set persistently system-wide using lctl conf_param on the MGS. For example, lctl conf_param work1.sys.at_max=1500 sets the at_max value for all servers and clients using the work1 file system.</para>
-                  <note><para>Nodes using multiple Lustre file systems must use the same at_* values for all file systems.)</para></note>
-           <informaltable frame="all">
-            <tgroup cols="2">
-              <colspec colname="c1" colwidth="50*"/>
-              <colspec colname="c2" colwidth="50*"/>
-              <thead>
-                <row>
-                  <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1294373" xreflabel=""/>Parameter</emphasis></para></entry>
-                  <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1294375" xreflabel=""/>Description</emphasis></para></entry>
-                </row>
-              </thead>
-              <tbody>
-                <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294377" xreflabel=""/><emphasis role="bold">at_min</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294379" xreflabel=""/>Sets the minimum adaptive timeout (in seconds). Default value is 0. The at_min parameter is the minimum processing time that a server will report. Clients base their timeouts on this value, but they do not use this value directly. If you experience cases in which, for unknown reasons, the adaptive timeout value is too short and clients time out their RPCs (usually due to temporary network outages), then you can increase the at_min value to compensate for this. Ideally, users should leave at_min set to its default.</para></entry>
-                </row>
-                <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294381" xreflabel=""/><emphasis role="bold">at_max</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1299206" xreflabel=""/>Sets the maximum adaptive timeout (in seconds). The at_max parameter is an upper-limit on the service time estimate, and is used as a &apos;failsafe&apos; in case of rogue/bad/buggy code that would lead to never-ending estimate increases. If at_max is reached, an RPC request is considered &apos;broken&apos; and should time out.</para><para><anchor xml:id="dbdoclet.50438271_pgfId-1294383" xreflabel=""/>Setting at_max to 0 causes adaptive timeouts to be disabled and the old fixed-timeout method (obd_timeout) to be used. This is the default value in Lustre 1.6.5.</para><para><anchor xml:id="dbdoclet.50438271_pgfId-1294387" xreflabel=""/> </para>
-                      
-                      <note><para>It is possible that slow hardware might validly cause the service estimate to increase beyond the default value of at_max. In this case, you should increase at_max to the maximum time you are willing to wait for an RPC completion.</para></note></entry>
-                </row>
-                <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294390" xreflabel=""/><emphasis role="bold">at_history</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294392" xreflabel=""/>Sets a time period (in seconds) within which adaptive timeouts remember the slowest event that occurred. Default value is 600.</para></entry>
-                </row>
-                <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294394" xreflabel=""/><emphasis role="bold">at_early_margin</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294396" xreflabel=""/>Sets how far before the deadline Lustre sends an early reply. Default value is 5<footnote><para><anchor xml:id="dbdoclet.50438271_pgfId-1294399" xreflabel=""/>This default was chosen as a reasonable time in which to send a reply from the point at which it was sent.</para></footnote>.</para></entry>
-          
-                </row>
-                <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294401" xreflabel=""/><emphasis role="bold">at_extra</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294403" xreflabel=""/>Sets the incremental amount of time that a server asks for, with each early reply. The server does not know how much time the RPC will take, so it asks for a fixed value. Default value is 30<footnote><para><anchor xml:id="dbdoclet.50438271_pgfId-1294406" xreflabel=""/>This default was chosen as a balance between sending too many early replies for the same RPC and overestimating the actual completion time</para></footnote>. When a server finds a queued request about to time out (and needs to send an early reply out), the server adds the at_extra value. If the time expires, the Lustre client enters recovery status and reconnects to restore it to normal status.</para><para><anchor xml:id="dbdoclet.50438271_pgfId-1294407" xreflabel=""/>If you see multiple early replies for the same RPC asking for multiple 30-second increases, change the at_extra value to a larger number to cut down on early replies sent and, therefore, network load.</para></entry>
-                </row>
-                <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294409" xreflabel=""/><emphasis role="bold">ldlm_enqueue_min</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294411" xreflabel=""/>Sets the minimum lock enqueue time. Default value is 100. The ldlm_enqueue time is the maximum of the measured enqueue estimate (influenced by at_min and at_max parameters), multiplied by a weighting factor, and the ldlm_enqueue_min setting. LDLM lock enqueues were based on the obd_timeout value; now they have a dedicated minimum value. Lock enqueues increase as the measured enqueue times increase (similar to adaptive timeouts).</para></entry>
-                </row>
-              </tbody>
-            </tgroup>
-          </informaltable>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1293587" xreflabel=""/>Adaptive timeouts are enabled, by default. To disable adaptive timeouts, at run time, set at_max to 0. On the MGS, run:</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1294471" xreflabel=""/>$ lctl conf_param &lt;fsname&gt;.sys.at_max=0
-</screen>
-                  <note><para>Changing adaptive timeouts status at runtime may cause transient timeout, reconnect, recovery, etc.</para></note>
-        </section>
-        <section remap="h4">
-          <title><anchor xml:id="dbdoclet.50438271_pgfId-1292959" xreflabel=""/>31.1.3.2 Interpreting <anchor xml:id="dbdoclet.50438271_marker-1293383" xreflabel=""/>Adaptive Timeouts Information</title>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1299183" xreflabel=""/>Adaptive timeouts information can be read from /proc/fs/lustre/*/timeouts files (for each service and client) or with the lctl command.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1294311" xreflabel=""/>This is an example from the /proc/fs/lustre/*/timeouts files:</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1293196" xreflabel=""/>cfs21:~# cat /proc/fs/lustre/ost/OSS/ost_io/timeouts
-</screen>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1299178" xreflabel=""/>This is an example using the lctl command:</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1294318" xreflabel=""/>$ lctl get_param -n ost.*.ost_io.timeouts
-</screen>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1294307" xreflabel=""/>This is the sample output:</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1294322" xreflabel=""/>service : cur 33  worst 34 (at 1193427052, 0d0h26m40s ago) 1 1 33 2
-</screen>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1292962" xreflabel=""/>The ost_io service on this node is currently reporting an estimate of 33 seconds. The worst RPC service time was 34 seconds, and it happened 26 minutes ago.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1293207" xreflabel=""/>The output also provides a history of service times. In the example, there are 4 &quot;bins&quot; of adaptive_timeout_history, with the maximum RPC time in each bin reported. In 0-150 seconds, the maximum RPC time was 1, with the same result in 150-300 seconds. From 300-450 seconds, the worst (maximum) RPC time was 33 seconds, and from 450-600s the worst time was 2 seconds. The current estimated service time is the maximum value of the 4 bins (33 seconds in this example).</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1294437" xreflabel=""/>Service times (as reported by the servers) are also tracked in the client OBDs:</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1292965" xreflabel=""/>cfs21:# lctl get_param osc.*.timeouts
-<anchor xml:id="dbdoclet.50438271_pgfId-1292966" xreflabel=""/>last reply : 1193428639, 0d0h00m00s ago
-<anchor xml:id="dbdoclet.50438271_pgfId-1292967" xreflabel=""/>network    : cur   1  worst   2 (at 1193427053, 0d0h26m26s ago)   1   1   1\
-   1
-<anchor xml:id="dbdoclet.50438271_pgfId-1292968" xreflabel=""/>portal 6   : cur  33  worst  34 (at 1193427052, 0d0h26m27s ago)  33  33  33\
-   2
-<anchor xml:id="dbdoclet.50438271_pgfId-1292969" xreflabel=""/>portal 28  : cur   1  worst   1 (at 1193426141, 0d0h41m38s ago)   1   1   1\
-   1
-<anchor xml:id="dbdoclet.50438271_pgfId-1292970" xreflabel=""/>portal 7   : cur   1  worst   1 (at 1193426141, 0d0h41m38s ago)   1   0   1\
-   1
-<anchor xml:id="dbdoclet.50438271_pgfId-1292971" xreflabel=""/>portal 17  : cur   1  worst   1 (at 1193426177, 0d0h41m02s ago)   1   0   0\
-   1
-</screen>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1292972" xreflabel=""/>In this case, RPCs to portal 6, the OST_IO_PORTAL (see lustre/include/lustre/lustre_idl.h), shows the history of what the ost_io portal has reported as the service estimate.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1292974" xreflabel=""/>Server statistic files also show the range of estimates in the normal min/max/sum/sumsq manner.</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1292975" xreflabel=""/>cfs21:~# lctl get_param mdt.*.mdt.stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1292976" xreflabel=""/>...
-<anchor xml:id="dbdoclet.50438271_pgfId-1292977" xreflabel=""/>req_timeout               6 samples [sec] 1 10 15 105
-<anchor xml:id="dbdoclet.50438271_pgfId-1292978" xreflabel=""/>...
-</screen>
-        </section>
-      </section>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1290400" xreflabel=""/>31.1.4 LNET <anchor xml:id="dbdoclet.50438271_marker-1296164" xreflabel=""/>Information</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1291795" xreflabel=""/>This section describes /proc entries for LNET information.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290401" xreflabel=""/><emphasis role="bold">/proc/sys/lnet/peers</emphasis></para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290402" xreflabel=""/>Shows all NIDs known to this node and also gives information on the queue state.</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290403" xreflabel=""/># cat /proc/sys/lnet/peers
-<anchor xml:id="dbdoclet.50438271_pgfId-1290404" xreflabel=""/>nid                        refs            state           max             \
-rtr             min             tx              min             queue
-<anchor xml:id="dbdoclet.50438271_pgfId-1290405" xreflabel=""/>0@lo                       1               ~rtr            0               \
-0               0               0               0               0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290406" xreflabel=""/>192.168.10.35@tcp  1               ~rtr            8               8       \
-        8               8               6               0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290407" xreflabel=""/>192.168.10.36@tcp  1               ~rtr            8               8       \
-        8               8               6               0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290408" xreflabel=""/>192.168.10.37@tcp  1               ~rtr            8               8       \
-        8               8               6               0
+      <para>The header information includes:</para>
+      <itemizedlist>
+        <listitem>
+          <para><literal>snapshot_time</literal> - UNIX epoch instant the file was read.</para>
+        </listitem>
+        <listitem>
+          <para><literal>read RPCs in flight</literal> - Number of read RPCs issued by the OSC, but
+            not complete at the time of the snapshot. This value should always be less than or equal
+            to <literal>max_rpcs_in_flight</literal>.</para>
+        </listitem>
+        <listitem>
+          <para><literal>write RPCs in flight</literal> - Number of write RPCs issued by the OSC,
+            but not complete at the time of the snapshot. This value should always be less than or
+            equal to <literal>max_rpcs_in_flight</literal>.</para>
+        </listitem>
+        <listitem>
+          <para><literal>dio read RPCs in flight</literal> - Direct I/O (as opposed to block I/O)
+            read RPCs issued but not completed at the time of the snapshot.</para>
+        </listitem>
+        <listitem>
+          <para><literal>dio write RPCs in flight</literal> - Direct I/O (as opposed to block I/O)
+            write RPCs issued but not completed at the time of the snapshot.</para>
+        </listitem>
+        <listitem>
+          <para><literal>pending write pages</literal>  - Number of pending write pages that have
+            been queued for I/O in the OSC.</para>
+        </listitem>
+        <listitem>
+          <para><literal>pending read pages</literal> - Number of pending read pages that have been
+            queued for I/O in the OSC.</para>
+        </listitem>
+      </itemizedlist>
+      <para>The tabular data is described in the table below. Each row in the
+        table shows the number of read or write RPCs (<literal>rpcs</literal>)
+        occurring for the statistic, the relative percentage
+        (<literal>%</literal>) of total reads or writes, and the cumulative
+        percentage (<literal>cum %</literal>) to that point in the table for
+        the statistic.</para>
+      <informaltable frame="all">
+        <tgroup cols="2">
+          <colspec colname="c1" colwidth="40*"/>
+          <colspec colname="c2" colwidth="60*"/>
+          <thead>
+            <row>
+              <entry>
+                <para><emphasis role="bold">Field</emphasis></para>
+              </entry>
+              <entry>
+                <para><emphasis role="bold">Description</emphasis></para>
+              </entry>
+            </row>
+          </thead>
+          <tbody>
+            <row>
+              <entry>
+                <para> pages per RPC</para>
+              </entry>
+              <entry>
+                <para>Shows cumulative RPC reads and writes organized according to the number of
+                  pages in the RPC. A single page RPC increments the <literal>1:</literal>
+                  row.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para> RPCs in flight</para>
+              </entry>
+              <entry>
+                <para> Shows the number of RPCs that are pending when an RPC is sent. When the first
+                  RPC is sent, the <literal>0:</literal> row is incremented. If the first RPC is
+                  sent while another RPC is pending, the <literal>1:</literal> row is incremented
+                  and so on. </para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para> offset</para>
+              </entry>
+              <entry>
+                <para> The page index of the first page read from or written to the object by the
+                  RPC. </para>
+              </entry>
+            </row>
+          </tbody>
+        </tgroup>
+      </informaltable>
+      <para><emphasis role="italic"><emphasis role="bold">Analysis:</emphasis></emphasis></para>
+      <para>This table provides a way to visualize the concurrency of the RPC stream. Ideally, you
+        will see a large clump around the <literal>max_rpcs_in_flight</literal> value, which shows
+        that the network is being kept busy.</para>
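+      <para>The current limit can be checked on the client, for
+        example:</para>
+      <screen>client# lctl get_param osc.*.max_rpcs_in_flight</screen>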
+      <para>For information about optimizing the client I/O RPC stream, see <xref
+          xmlns:xlink="http://www.w3.org/1999/xlink" linkend="TuningClientIORPCStream"/>.</para>
+    </section>
+    <section xml:id="lustreproc.clientstats" remap="h3">
+      <title><indexterm>
+          <primary>proc</primary>
+          <secondary>client stats</secondary>
+        </indexterm>Monitoring Client Activity</title>
+      <para>The <literal>stats</literal> file maintains statistics accumulated during typical
+        operation of a client across the VFS interface of the Lustre file system. Only non-zero
+        parameters are displayed in the file. </para>
+      <para>Client statistics are enabled by default.</para>
+      <note>
+        <para>Statistics for all mounted file systems can be discovered by
+          entering:<screen>lctl get_param llite.*.stats</screen></para>
+      </note>
+      <para><emphasis role="italic"><emphasis role="bold">Example:</emphasis></emphasis></para>
+      <screen>client# lctl get_param llite.*.stats
+snapshot_time          1308343279.169704 secs.usecs
+dirty_pages_hits       14819716 samples [regs]
+dirty_pages_misses     81473472 samples [regs]
+read_bytes             36502963 samples [bytes] 1 26843582 55488794
+write_bytes            22985001 samples [bytes] 0 125912 3379002
+brw_read               2279 samples [pages] 1 1 2270
+ioctl                  186749 samples [regs]
+open                   3304805 samples [regs]
+close                  3331323 samples [regs]
+seek                   48222475 samples [regs]
+fsync                  963 samples [regs]
+truncate               9073 samples [regs]
+setxattr               19059 samples [regs]
+getxattr               61169 samples [regs]
 </screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290449" xreflabel=""/>The fields are explained below:</para>
-        <informaltable frame="all">
-          <tgroup cols="2">
-            <colspec colname="c1" colwidth="50*"/>
-            <colspec colname="c2" colwidth="50*"/>
-            <thead>
-              <row>
-                <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1291804" xreflabel=""/>Field</emphasis></para></entry>
-                <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1291806" xreflabel=""/>Description</emphasis></para></entry>
-              </row>
-            </thead>
-            <tbody>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291808" xreflabel=""/><emphasis role="bold">refs</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291810" xreflabel=""/>A reference count (principally used for debugging)</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291812" xreflabel=""/><emphasis role="bold">state</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291863" xreflabel=""/>Only valid to refer to routers. Possible values:</para><itemizedlist><listitem>
-                      <para><anchor xml:id="dbdoclet.50438271_pgfId-1291864" xreflabel=""/> ~ rtr (indicates this node is not a router)</para>
+      <para> The statistics can be cleared by echoing an empty string into the
+          <literal>stats</literal> file or by using the command:
+        <screen>lctl set_param llite.*.stats=0</screen></para>
+      <para>The statistics displayed are described in the table below.</para>
+      <informaltable frame="all">
+        <tgroup cols="2">
+          <colspec colname="c1" colwidth="3*"/>
+          <colspec colname="c2" colwidth="7*"/>
+          <thead>
+            <row>
+              <entry>
+                <para><emphasis role="bold">Entry</emphasis></para>
+              </entry>
+              <entry>
+                <para><emphasis role="bold">Description</emphasis></para>
+              </entry>
+            </row>
+          </thead>
+          <tbody>
+            <row>
+              <entry>
+                <para>
+                  <literal>snapshot_time</literal></para>
+              </entry>
+              <entry>
+                <para>UNIX epoch instant the stats file was read.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>dirty_page_hits</literal></para>
+              </entry>
+              <entry>
+                <para>The number of write operations that have been satisfied by the dirty page
+                  cache. See <xref xmlns:xlink="http://www.w3.org/1999/xlink"
+                    linkend="TuningClientIORPCStream"/> for more information about dirty cache
+                  behavior in a Lustre file system.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>dirty_page_misses</literal></para>
+              </entry>
+              <entry>
+                <para>The number of write operations that were not satisfied by the dirty page
+                  cache.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>read_bytes</literal></para>
+              </entry>
+              <entry>
+                <para>The number of read operations that have occurred. Three additional parameters
+                  are displayed:</para>
+                <variablelist>
+                  <varlistentry>
+                    <term>min</term>
+                    <listitem>
+                      <para>The minimum number of bytes read in a single request since the counter
+                        was reset.</para>
                     </listitem>
-<listitem>
-                      <para><anchor xml:id="dbdoclet.50438271_pgfId-1291865" xreflabel=""/> up/down (indicates this node is a router)</para>
+                  </varlistentry>
+                  <varlistentry>
+                    <term>max</term>
+                    <listitem>
+                      <para>The maximum number of bytes read in a single request since the counter
+                        was reset.</para>
                     </listitem>
-<listitem>
-                      <para><anchor xml:id="dbdoclet.50438271_pgfId-1291871" xreflabel=""/> auto_fail must be enabled</para>
+                  </varlistentry>
+                  <varlistentry>
+                    <term>sum</term>
+                    <listitem>
+                      <para>The accumulated sum of bytes of all read requests since the counter was
+                        reset.</para>
                     </listitem>
-</itemizedlist></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291816" xreflabel=""/><emphasis role="bold">max</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291818" xreflabel=""/>Maximum number of concurrent sends from this peer</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291820" xreflabel=""/><emphasis role="bold">rtr</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291822" xreflabel=""/>Routing buffer credits.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291824" xreflabel=""/><emphasis role="bold">min</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291826" xreflabel=""/>Minimum routing buffer credits seen.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291828" xreflabel=""/><emphasis role="bold">tx</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291830" xreflabel=""/>Send credits.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291832" xreflabel=""/><emphasis role="bold">min</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291834" xreflabel=""/>Minimum send credits seen.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291836" xreflabel=""/><emphasis role="bold">queue</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291838" xreflabel=""/>Total bytes in active/queued sends.</para></entry>
-              </row>
-            </tbody>
-          </tgroup>
-        </informaltable>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290450" xreflabel=""/>Credits work like a semaphore. At start they are initialized to allow a certain number of operations (8 in this example). LNET keeps a track of the minimum value so that you can see how congested a resource was.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290451" xreflabel=""/>If rtr/tx is less than max, there are operations in progress. The number of operations is equal to rtr or tx subtracted from max.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290452" xreflabel=""/>If rtr/tx is greater that max, there are operations blocking.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290453" xreflabel=""/>LNET also limits concurrent sends and router buffers allocated to a single peer so that no peer can occupy all these resources.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290454" xreflabel=""/><emphasis role="bold">/proc/sys/lnet/nis</emphasis></para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290455" xreflabel=""/># cat /proc/sys/lnet/nis
-<anchor xml:id="dbdoclet.50438271_pgfId-1290456" xreflabel=""/>nid                                refs            peer            max     \
-        tx              min
-<anchor xml:id="dbdoclet.50438271_pgfId-1290457" xreflabel=""/>0@lo                               3               0               0       \
-        0               0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290458" xreflabel=""/>192.168.10.34@tcp          4               8               256             \
-256             252
-</screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290459" xreflabel=""/>Shows the current queue health on this node. The fields are explained below:</para>
-        <informaltable frame="all">
-          <tgroup cols="2">
-            <colspec colname="c1" colwidth="50*"/>
-            <colspec colname="c2" colwidth="50*"/>
-            <thead>
-              <row>
-                <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1291912" xreflabel=""/>Field</emphasis></para></entry>
-                <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1291914" xreflabel=""/>Description</emphasis></para></entry>
-              </row>
-            </thead>
-            <tbody>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291963" xreflabel=""/><emphasis role="bold">nid</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291965" xreflabel=""/>Network interface</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291967" xreflabel=""/><emphasis role="bold">refs</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291969" xreflabel=""/>Internal reference counter</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291971" xreflabel=""/><emphasis role="bold">peer</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291973" xreflabel=""/>Number of peer-to-peer send credits on this NID. Credits are used to size buffer pools</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291975" xreflabel=""/><emphasis role="bold">max</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291977" xreflabel=""/>Total number of send credits on this NID.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291979" xreflabel=""/><emphasis role="bold">tx</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291981" xreflabel=""/>Current number of send credits available on this NID.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291983" xreflabel=""/><emphasis role="bold">min</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291985" xreflabel=""/>Lowest number of send credits available on this NID.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291987" xreflabel=""/><emphasis role="bold">queue</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1291989" xreflabel=""/>Total bytes in active/queued sends.</para></entry>
-              </row>
-            </tbody>
-          </tgroup>
-        </informaltable>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290494" xreflabel=""/>Subtracting max - tx yields the number of sends currently active. A large or increasing number of active sends may indicate a problem.</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290495" xreflabel=""/># cat /proc/sys/lnet/nis
-<anchor xml:id="dbdoclet.50438271_pgfId-1290496" xreflabel=""/>nid                                refs            peer            max     \
-        tx              min
-<anchor xml:id="dbdoclet.50438271_pgfId-1290497" xreflabel=""/>0@lo                               2               0               0       \
-        0               0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290498" xreflabel=""/>10.67.73.173@tcp           4               8               256             \
-256             253
-</screen>
-      </section>
+                  </varlistentry>
+                </variablelist>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>write_bytes</literal></para>
+              </entry>
+              <entry>
+                <para>The number of write operations that have occurred. Three additional parameters
+                  are displayed:</para>
+                <variablelist>
+                  <varlistentry>
+                    <term>min</term>
+                    <listitem>
+                      <para>The minimum number of bytes written in a single request since the
+                        counter was reset.</para>
+                    </listitem>
+                  </varlistentry>
+                  <varlistentry>
+                    <term>max</term>
+                    <listitem>
+                      <para>The maximum number of bytes written in a single request since the
+                        counter was reset.</para>
+                    </listitem>
+                  </varlistentry>
+                  <varlistentry>
+                    <term>sum</term>
+                    <listitem>
+                      <para>The accumulated sum of bytes of all write requests since the counter was
+                        reset.</para>
+                    </listitem>
+                  </varlistentry>
+                </variablelist>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>brw_read</literal></para>
+              </entry>
+              <entry>
+                <para>The number of pages that have been read. Three additional parameters are
+                  displayed:</para>
+                <variablelist>
+                  <varlistentry>
+                    <term>min</term>
+                    <listitem>
+                      <para>The minimum number of bytes read in a single block read/write
+                          (<literal>brw</literal>) read request since the counter was reset.</para>
+                    </listitem>
+                  </varlistentry>
+                  <varlistentry>
+                    <term>max</term>
+                    <listitem>
+                      <para>The maximum number of bytes read in a single <literal>brw</literal> read
+                        request since the counter was reset.</para>
+                    </listitem>
+                  </varlistentry>
+                  <varlistentry>
+                    <term>sum</term>
+                    <listitem>
+                      <para>The accumulated sum of bytes of all <literal>brw</literal> read requests
+                        since the counter was reset.</para>
+                    </listitem>
+                  </varlistentry>
+                </variablelist>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>ioctl</literal></para>
+              </entry>
+              <entry>
+                <para>The number of combined file and directory <literal>ioctl</literal>
+                  operations.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>open</literal></para>
+              </entry>
+              <entry>
+                <para>The number of open operations that have succeeded.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>close</literal></para>
+              </entry>
+              <entry>
+                <para>The number of close operations that have succeeded.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>seek</literal></para>
+              </entry>
+              <entry>
+                <para>The number of times <literal>seek</literal> has been called.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>fsync</literal></para>
+              </entry>
+              <entry>
+                <para>The number of times <literal>fsync</literal> has been called.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>truncate</literal></para>
+              </entry>
+              <entry>
+                <para>The total number of calls to both locked and lockless
+                    <literal>truncate</literal>.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>setxattr</literal></para>
+              </entry>
+              <entry>
+                <para>The number of times extended attributes have been set. </para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>getxattr</literal></para>
+              </entry>
+              <entry>
+                <para>The number of times extended attribute values have been fetched.</para>
+              </entry>
+            </row>
+          </tbody>
+        </tgroup>
+      </informaltable>
+      <para><emphasis role="italic"><emphasis role="bold">Analysis:</emphasis></emphasis></para>
+      <para>This file provides information about the amount and type of I/O
+        activity taking place on the client.</para>
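+      <para>Because the counters in this file are cumulative, one way to see
+        the activity generated by a specific workload is to clear the
+        counters, run the workload, and then read the file again.  A minimal
+        sketch, assuming the <literal>testfs</literal> file system name used
+        in the examples in this chapter:</para>
+      <screen># lctl set_param llite.*.stats=0
+# <replaceable>run the workload of interest</replaceable>
+# lctl get_param llite.testfs-*.stats</screen>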
+    </section>
+    <section remap="h3">
+      <title><indexterm>
+          <primary>proc</primary>
+          <secondary>read/write survey</secondary>
+        </indexterm>Monitoring Client Read-Write Offset Statistics</title>
+      <para>When the <literal>offset_stats</literal> parameter is set, statistics are maintained for
+        occurrences of a series of read or write calls from a process that did not access the next
+        sequential location. The <literal>OFFSET</literal> field is reset to 0 (zero) whenever a
+        different file is read or written.</para>
+      <note>
+        <para>By default, statistics are not collected in the <literal>offset_stats</literal>,
+            <literal>extents_stats</literal>, and <literal>extents_stats_per_process</literal> files
+          to reduce monitoring overhead when this information is not needed.  The collection of
+          statistics in all three of these files is activated by writing
+          anything, except for 0 (zero) and "disable", into any one of the
+          files.</para>
+      </note>
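+      <para>Collection for all three files can be enabled (and any existing
+        counters cleared) by writing a non-zero value into any one of them,
+        for instance:</para>
+      <screen># lctl set_param llite.*.extents_stats=1</screen>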
+      <para><emphasis role="italic"><emphasis role="bold">Example:</emphasis></emphasis></para>
+      <screen># lctl get_param llite.testfs-f57dee0.offset_stats
+snapshot_time: 1155748884.591028 (secs.usecs)
+             RANGE   RANGE    SMALLEST   LARGEST
+R/W   PID    START   END      EXTENT     EXTENT    OFFSET
+R     8385   0       128      128        128       0
+R     8385   0       224      224        224       -128
+W     8385   0       250      50         100       0
+W     8385   100     1110     10         500       -150
+W     8384   0       5233     5233       5233      0
+R     8385   500     600      100        100       -610</screen>
+      <para>In this example, <literal>snapshot_time</literal> is the UNIX epoch instant the file was
+        read. The tabular data is described in the table below.</para>
+      <para>The <literal>offset_stats</literal> file can be cleared by
+        entering:<screen>lctl set_param llite.*.offset_stats=0</screen></para>
+      <informaltable frame="all">
+        <tgroup cols="2">
+          <colspec colname="c1" colwidth="50*"/>
+          <colspec colname="c2" colwidth="50*"/>
+          <thead>
+            <row>
+              <entry>
+                <para><emphasis role="bold">Field</emphasis></para>
+              </entry>
+              <entry>
+                <para><emphasis role="bold">Description</emphasis></para>
+              </entry>
+            </row>
+          </thead>
+          <tbody>
+            <row>
+              <entry>
+                <para>R/W</para>
+              </entry>
+              <entry>
+                <para>Indicates whether the non-sequential call was a read or a write.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>PID </para>
+              </entry>
+              <entry>
+                <para>Process ID of the process that made the read/write call.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>RANGE START/RANGE END</para>
+              </entry>
+              <entry>
+                <para>Range in which the read/write calls were sequential.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>SMALLEST EXTENT </para>
+              </entry>
+              <entry>
+                <para>Smallest single read/write in the corresponding range (in bytes).</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>LARGEST EXTENT </para>
+              </entry>
+              <entry>
+                <para>Largest single read/write in the corresponding range (in bytes).</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>OFFSET </para>
+              </entry>
+              <entry>
+                <para>Difference between the previous range end and the current range start.</para>
+              </entry>
+            </row>
+          </tbody>
+        </tgroup>
+      </informaltable>
+      <para><emphasis role="italic"><emphasis role="bold">Analysis:</emphasis></emphasis></para>
+      <para>This data provides an indication of how contiguous or fragmented the
+        data access pattern is. For example, the fourth entry in the example
+        above shows that the writes from PID 8385 were sequential in the range
+        100 to 1110, with a minimum write of 10 bytes and a maximum write of
+        500 bytes. The range started with an offset of -150 from the
+        <literal>RANGE END</literal> of the previous entry in the example.</para>
+    </section>
+    <section remap="h3">
+      <title><indexterm>
+          <primary>proc</primary>
+          <secondary>read/write survey</secondary>
+        </indexterm>Monitoring Client Read-Write Extent Statistics</title>
+      <para>For in-depth troubleshooting, client read-write extent statistics can be accessed to
+        obtain more detail about read/write I/O extents for the file system or for a particular
+        process.</para>
+      <note>
+        <para>By default, statistics are not collected in the <literal>offset_stats</literal>,
+            <literal>extents_stats</literal>, and <literal>extents_stats_per_process</literal> files
+          to reduce monitoring overhead when this information is not needed.  The collection of
+          statistics in all three of these files is activated by writing
+          anything, except for 0 (zero) and "disable", into any one of the
+          files.</para>
+      </note>
       <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1290499" xreflabel=""/>31.1.5 Free Space <anchor xml:id="dbdoclet.50438271_marker-1296165" xreflabel=""/>Distribution</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1296514" xreflabel=""/>Free-space stripe weighting, as set, gives a priority of &quot;0&quot; to free space (versus trying to place the stripes &quot;widely&quot; -- nicely distributed across OSSs and OSTs to maximize network balancing). To adjust this priority (as a percentage), use the qos_prio_free proc tunable:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1296515" xreflabel=""/>$ cat /proc/fs/lustre/lov/&lt;fsname&gt;-mdtlov/qos_prio_free
-</screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1296516" xreflabel=""/>Currently, the default is 90%. You can permanently set this value by running this command on the MGS:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1296517" xreflabel=""/>$ lctl conf_param &lt;fsname&gt;-MDT0000.lov.qos_prio_free=90
+        <title>Client-Based I/O Extent Size Survey</title>
+        <para>The <literal>extents_stats</literal> histogram in the
+          <literal>llite</literal> directory shows the statistics for the sizes
+          of the read/write I/O extents. This file does not maintain per-process
+          statistics.</para>
+        <para><emphasis role="italic"><emphasis role="bold">Example:</emphasis></emphasis></para>
+        <screen># lctl get_param llite.testfs-*.extents_stats
+snapshot_time:                     1213828728.348516 (secs.usecs)
+                       read           |            write
+extents          calls  %      cum%   |     calls  %     cum%
+
+0K - 4K :        0      0      0      |     2      2     2
+4K - 8K :        0      0      0      |     0      0     2
+8K - 16K :       0      0      0      |     0      0     2
+16K - 32K :      0      0      0      |     20     23    26
+32K - 64K :      0      0      0      |     0      0     26
+64K - 128K :     0      0      0      |     51     60    86
+128K - 256K :    0      0      0      |     0      0     86
+256K - 512K :    0      0      0      |     0      0     86
+512K - 1024K :   0      0      0      |     0      0     86
+1M - 2M :        0      0      0      |     11     13    100</screen>
+        <para>In this example, <literal>snapshot_time</literal> is the UNIX epoch instant the file
+          was read. The table shows cumulative extents organized according to size, with statistics
+          provided separately for reads and writes. Each row in the table shows the number of read
+          and write calls for that extent size (<literal>calls</literal>), the relative percentage
+          of total calls (<literal>%</literal>), and the cumulative percentage of calls up to that
+          point in the table (<literal>cum%</literal>).</para>
+        <para> The file can be cleared by issuing the following command:
+        <screen># lctl set_param llite.testfs-*.extents_stats=1</screen></para>
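+        <para>Because the output is plain text, it can be post-processed with
+          standard shell tools.  The following sketch prints only the extent
+          size and the number of write calls in each bucket; the field
+          positions are an assumption based on the column layout of the sample
+          output above, not part of the interface:</para>
+        <screen># lctl get_param -n llite.testfs-*.extents_stats |
+        awk '/ : .*\|/ { print $1 " - " $3, $9 }'</screen>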
+      </section>
+      <section>
+        <title>Per-Process Client I/O Statistics</title>
+        <para>The <literal>extents_stats_per_process</literal> file maintains the I/O extent size
+          statistics on a per-process basis.</para>
+        <para><emphasis role="italic"><emphasis role="bold">Example:</emphasis></emphasis></para>
+        <screen># lctl get_param llite.testfs-*.extents_stats_per_process
+snapshot_time:                     1213828762.204440 (secs.usecs)
+                          read            |             write
+extents            calls   %      cum%    |      calls   %       cum%
+PID: 11488
+   0K - 4K :       0       0       0      |      0       0       0
+   4K - 8K :       0       0       0      |      0       0       0
+   8K - 16K :      0       0       0      |      0       0       0
+   16K - 32K :     0       0       0      |      0       0       0
+   32K - 64K :     0       0       0      |      0       0       0
+   64K - 128K :    0       0       0      |      0       0       0
+   128K - 256K :   0       0       0      |      0       0       0
+   256K - 512K :   0       0       0      |      0       0       0
+   512K - 1024K :  0       0       0      |      0       0       0
+   1M - 2M :       0       0       0      |      10      100     100
+PID: 11491
+   0K - 4K :       0       0       0      |      0       0       0
+   4K - 8K :       0       0       0      |      0       0       0
+   8K - 16K :      0       0       0      |      0       0       0
+   16K - 32K :     0       0       0      |      20      100     100
+   
+PID: 11424
+   0K - 4K :       0       0       0      |      0       0       0
+   4K - 8K :       0       0       0      |      0       0       0
+   8K - 16K :      0       0       0      |      0       0       0
+   16K - 32K :     0       0       0      |      0       0       0
+   32K - 64K :     0       0       0      |      0       0       0
+   64K - 128K :    0       0       0      |      16      100     100
+PID: 11426
+   0K - 4K :       0       0       0      |      1       100     100
+PID: 11429
+   0K - 4K :       0       0       0      |      1       100     100
 </screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1296518" xreflabel=""/>Setting the priority to 100% means that OSS distribution does not count in the weighting, but the stripe assignment is still done via weighting. If OST 2 has twice as much free space as OST 1, it is twice as likely to be used, but it is NOT guaranteed to be used.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290506" xreflabel=""/>Also note that free-space stripe weighting does not activate until two OSTs are imbalanced by more than 20%. Until then, a faster round-robin stripe allocator is used. (The new round-robin order also maximizes network balancing.)</para>
-        <section remap="h4">
-          <title><anchor xml:id="dbdoclet.50438271_pgfId-1296529" xreflabel=""/>31.1.5.1 Managing Stripe Allocation</title>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1296530" xreflabel=""/>The MDS uses two methods to manage stripe allocation and determine which OSTs to use for file object storage:</para>
-          <itemizedlist><listitem>
-              <para><anchor xml:id="dbdoclet.50438271_pgfId-1296531" xreflabel=""/><emphasis role="bold">QOS</emphasis></para>
-            </listitem>
+        <para>This table shows cumulative extents organized according to size for each process ID
+          (PID), with statistics provided separately for reads and writes. Each row in the table
+          shows the number of read and write calls for that extent size
+          (<literal>calls</literal>), the relative percentage of total calls
+          (<literal>%</literal>), and the cumulative percentage of calls up to that point in the
+          table (<literal>cum%</literal>).</para>
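+        <para>When only one process is of interest, its block of the output
+          can be isolated with a small <literal>awk</literal> sketch such as
+          the one below; the PID shown is taken from the sample output above
+          and is only illustrative:</para>
+        <screen># lctl get_param -n llite.testfs-*.extents_stats_per_process |
+        awk -v pid=11491 '/^PID:/ { show = ($2 == pid) } show'</screen>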
+      </section>
+    </section>
+    <section xml:id="dbdoclet.50438271_55057">
+      <title><indexterm>
+          <primary>proc</primary>
+          <secondary>block I/O</secondary>
+        </indexterm>Monitoring the OST Block I/O Stream</title>
+      <para>The <literal>brw_stats</literal> parameter file below the
+        <literal>osd-ldiskfs</literal> or <literal>osd-zfs</literal> directory
+        contains histogram data showing statistics for the number of I/O
+        requests sent to the disk, their sizes, and whether they are
+        contiguous on the disk.</para>
+      <para><emphasis role="italic"><emphasis role="bold">Example:</emphasis></emphasis></para>
+      <para>Enter on the OSS or MDS:</para>
+      <screen>oss# lctl get_param osd-*.*.brw_stats 
+snapshot_time:         1372775039.769045 (secs.usecs)
+                           read      |      write
+pages per bulk r/w     rpcs  % cum % |  rpcs   % cum %
+1:                     108 100 100   |    39   0   0
+2:                       0   0 100   |     6   0   0
+4:                       0   0 100   |     1   0   0
+8:                       0   0 100   |     0   0   0
+16:                      0   0 100   |     4   0   0
+32:                      0   0 100   |    17   0   0
+64:                      0   0 100   |    12   0   0
+128:                     0   0 100   |    24   0   0
+256:                     0   0 100   | 23142  99 100
 
-</itemizedlist>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1297308" xreflabel=""/>Quality of Service (QOS) considers an OST's available blocks, speed, and the number of existing objects, etc. Using these criteria, the MDS selects OSTs with more free space more often than OSTs with less free space.</para>
-          <itemizedlist><listitem>
-              <para><anchor xml:id="dbdoclet.50438271_pgfId-1297309" xreflabel=""/><emphasis role="bold">RR</emphasis></para>
-            </listitem>
+                           read      |      write
+discontiguous pages    rpcs  % cum % |  rpcs   % cum %
+0:                     108 100 100   | 23245 100 100
 
-</itemizedlist>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1296534" xreflabel=""/>Round-Robin (RR) allocates objects evenly across all OSTs. The RR stripe allocator is faster than QOS, and used often because it distributes space usage/load best in most situations, maximizing network balancing and improving performance.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1296535" xreflabel=""/>Whether QOS or RR is used depends on the setting of the qos_threshold_rr proc tunable. The qos_threshold_rr variable specifies a percentage threshold where the use of QOS or RR becomes more/less likely. The qos_threshold_rr tunable can be set as an integer, from 0 to 100, and results in this stripe allocation behavior:</para>
-          <itemizedlist><listitem>
-              <para><anchor xml:id="dbdoclet.50438271_pgfId-1296536" xreflabel=""/> If qos_threshold_rr is set to 0, then QOS is always used</para>
-            </listitem>
+                           read      |      write
+discontiguous blocks   rpcs  % cum % |  rpcs   % cum %
+0:                     108 100 100   | 23243  99  99
+1:                       0   0 100   |     2   0 100
 
-<listitem>
-              <para><anchor xml:id="dbdoclet.50438271_pgfId-1296537" xreflabel=""/> If qos_threshold_rr is set to 100, then RR is always used</para>
-            </listitem>
+                           read      |      write
+disk fragmented I/Os   ios   % cum % |   ios   % cum %
+0:                      94  87  87   |     0   0   0
+1:                      14  12 100   | 23243  99  99
+2:                       0   0 100   |     2   0 100
 
-<listitem>
-              <para><anchor xml:id="dbdoclet.50438271_pgfId-1296538" xreflabel=""/> The larger the qos_threshold_rr setting, the greater the possibility that RR is used instead of QOS</para>
-            </listitem>
+                           read      |      write
+disk I/Os in flight    ios   % cum % |   ios   % cum %
+1:                      14 100 100   | 20896  89  89
+2:                       0   0 100   |  1071   4  94
+3:                       0   0 100   |   573   2  96
+4:                       0   0 100   |   300   1  98
+5:                       0   0 100   |   166   0  98
+6:                       0   0 100   |   108   0  99
+7:                       0   0 100   |    81   0  99
+8:                       0   0 100   |    47   0  99
+9:                       0   0 100   |     5   0 100
 
-</itemizedlist>
-        </section>
-      </section>
-    </section>
-    <section xml:id="dbdoclet.50438271_78950">
-      <title>31.2 Lustre I/O <anchor xml:id="dbdoclet.50438271_marker-1290508" xreflabel=""/>Tunables</title>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1290510" xreflabel=""/>The section describes I/O tunables.</para>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1290511" xreflabel=""/><emphasis role="bold">/proc/fs/lustre/llite/&lt;fsname&gt;-&lt;uid&gt;/max_cache_mb</emphasis></para>
-      <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290512" xreflabel=""/># cat /proc/fs/lustre/llite/lustre-ce63ca00/max_cached_mb 128
-</screen>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1290513" xreflabel=""/>This tunable is the maximum amount of inactive data cached by the client (default is 3/4 of RAM).</para>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1290515" xreflabel=""/>31.2.1 Client I/O RPC<anchor xml:id="dbdoclet.50438271_marker-1290514" xreflabel=""/> Stream Tunables</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290516" xreflabel=""/>The Lustre engine always attempts to pack an optimal amount of data into each I/O RPC and attempts to keep a consistent number of issued RPCs in progress at a time. Lustre exposes several tuning variables to adjust behavior according to network conditions and cluster size. Each OSC has its own tree of these tunables. For example:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290517" xreflabel=""/>$ ls -d /proc/fs/lustre/osc/OSC_client_ost1_MNT_client_2 /localhost
-<anchor xml:id="dbdoclet.50438271_pgfId-1290518" xreflabel=""/>/proc/fs/lustre/osc/OSC_uml0_ost1_MNT_localhost
-<anchor xml:id="dbdoclet.50438271_pgfId-1290519" xreflabel=""/>/proc/fs/lustre/osc/OSC_uml0_ost2_MNT_localhost
-<anchor xml:id="dbdoclet.50438271_pgfId-1290520" xreflabel=""/>/proc/fs/lustre/osc/OSC_uml0_ost3_MNT_localhost
-<anchor xml:id="dbdoclet.50438271_pgfId-1290521" xreflabel=""/>$ ls /proc/fs/lustre/osc/OSC_uml0_ost1_MNT_localhost
-<anchor xml:id="dbdoclet.50438271_pgfId-1290522" xreflabel=""/>blocksizefilesfree max_dirty_mb ost_server_uuid stats
-</screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290523" xreflabel=""/>... and so on.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290524" xreflabel=""/>RPC stream tunables are described below.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1291134" xreflabel=""/><emphasis role="bold">/proc/fs/lustre/osc/&lt;object name&gt;/max_dirty_mb</emphasis></para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1291135" xreflabel=""/>This tunable controls how many MBs of dirty data can be written and queued up in the OSC. POSIX file writes that are cached contribute to this count. When the limit is reached, additional writes stall until previously-cached writes are written to the server. This may be changed by writing a single ASCII integer to the file. Only values between 0 and 512 are allowable. If 0 is given, no writes are cached. Performance suffers noticeably unless you use large writes (1 MB or more).</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290527" xreflabel=""/><emphasis role="bold">/proc/fs/lustre/osc/&lt;object name&gt;/cur_dirty_bytes</emphasis></para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290528" xreflabel=""/>This tunable is a read-only value that returns the current amount of bytes written and cached on this OSC.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290529" xreflabel=""/><emphasis role="bold">/proc/fs/lustre/osc/&lt;object name&gt;/max_pages_per_rpc</emphasis></para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290530" xreflabel=""/>This tunable is the maximum number of pages that will undergo I/O in a single RPC to the OST. The minimum is a single page and the maximum for this setting is platform dependent (256 for i386/x86_64, possibly less for ia64/PPC with larger PAGE_SIZE), though generally amounts to a total of 1 MB in the RPC.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290531" xreflabel=""/><emphasis role="bold">/proc/fs/lustre/osc/&lt;object name&gt;/max_rpcs_in_flight</emphasis></para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1296189" xreflabel=""/>This tunable is the maximum number of concurrent RPCs in flight from an OSC to its OST. If the OSC tries to initiate an RPC but finds that it already has the same number of RPCs outstanding, it will wait to issue further RPCs until some complete. The minimum setting is 1 and maximum setting is 32. If you are looking to improve small file I/O performance, increase the max_rpcs_in_flight value.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290533" xreflabel=""/>To maximize performance, the value for max_dirty_mb is recommended to be 4 * max_pages_per_rpc * max_rpcs_in_flight.</para>
-                <note><para>The &lt;object name&gt; varies depending on the specific Lustre configuration. For &lt;object name&gt; examples, refer to the sample command output.</para></note>
-      </section>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1290536" xreflabel=""/>31.2.2 Watching the <anchor xml:id="dbdoclet.50438271_marker-1290535" xreflabel=""/>Client RPC Stream</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290537" xreflabel=""/>The same directory contains a rpc_stats file with a histogram showing the composition of previous RPCs. The histogram can be cleared by writing any value into the rpc_stats file.</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290538" xreflabel=""/># cat /proc/fs/lustre/osc/spfs-OST0000-osc-c45f9c00/rpc_stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1290539" xreflabel=""/>snapshot_time:                                     1174867307.156604 (secs.\
-usecs)
-<anchor xml:id="dbdoclet.50438271_pgfId-1290540" xreflabel=""/>read RPCs in flight:                               0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290541" xreflabel=""/>write RPCs in flight:                              0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290542" xreflabel=""/>pending write pages:                               0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290543" xreflabel=""/>pending read pages:                                0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290544" xreflabel=""/>                   read                                    write
-<anchor xml:id="dbdoclet.50438271_pgfId-1290545" xreflabel=""/>pages per rpc              rpcs    %       cum     %       |       rpcs    \
-%       cum     %
-<anchor xml:id="dbdoclet.50438271_pgfId-1290546" xreflabel=""/>1:                 0       0       0               |       0               \
-0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290547" xreflabel=""/> 
-<anchor xml:id="dbdoclet.50438271_pgfId-1290548" xreflabel=""/>                   read                                    write
-<anchor xml:id="dbdoclet.50438271_pgfId-1290549" xreflabel=""/>rpcs in flight             rpcs    %       cum     %       |       rpcs    \
-%       cum     %
-<anchor xml:id="dbdoclet.50438271_pgfId-1290550" xreflabel=""/>0:                 0       0       0               |       0               \
-0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290551" xreflabel=""/> 
-<anchor xml:id="dbdoclet.50438271_pgfId-1290552" xreflabel=""/>                   read                                    write
-<anchor xml:id="dbdoclet.50438271_pgfId-1290553" xreflabel=""/>offset                     rpcs    %       cum     %       |       rpcs    \
-%       cum     %
-<anchor xml:id="dbdoclet.50438271_pgfId-1290554" xreflabel=""/>0:                 0       0       0               |       0               \
-0       0
-</screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1298886" xreflabel=""/>Where:</para>
-        <informaltable frame="all">
-          <tgroup cols="2">
-            <colspec colname="c1" colwidth="50*"/>
-            <colspec colname="c2" colwidth="50*"/>
-            <thead>
-              <row>
-                <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1298854" xreflabel=""/>Field</emphasis></para></entry>
-                <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1298856" xreflabel=""/>Description</emphasis></para></entry>
-              </row>
-            </thead>
-            <tbody>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298858" xreflabel=""/><emphasis role="bold">{read,write} RPCs in flight</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298860" xreflabel=""/>Number of read/write RPCs issued by the OSC, but not complete at the time of the snapshot. This value should always be less than or equal to max_rpcs_in_flight.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298862" xreflabel=""/><emphasis role="bold">pending {read,write} pages</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298911" xreflabel=""/>Number of pending read/write pages that have been queued for I/O in the OSC.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298866" xreflabel=""/><emphasis role="bold">pages per RPC</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298924" xreflabel=""/>When an RPC is sent, the number of pages it consists of is recorded (in order). A single page RPC increments the 0: row.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298871" xreflabel=""/><emphasis role="bold">RPCs in flight</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298873" xreflabel=""/>When an RPC is sent, the number of other RPCs that are pending is recorded. When the first RPC is sent, the 0: row is incremented. If the first RPC is sent while another is pending, the 1: row is incremented and so on. As each RPC *completes*, the number of pending RPCs is not tabulated.</para><para><anchor xml:id="dbdoclet.50438271_pgfId-1298973" xreflabel=""/>This table is a good way to visualize the concurrency of the RPC stream. Ideally, you will see a large clump around the max_rpcs_in_flight value, which shows that the network is being kept busy.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1299025" xreflabel=""/><emphasis role="bold">offset</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1299027" xreflabel=""/> </para></entry>
-              </row>
-            </tbody>
-          </tgroup>
-        </informaltable>
-      </section>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1290565" xreflabel=""/>31.2.3 Client Read-Write <anchor xml:id="dbdoclet.50438271_marker-1290564" xreflabel=""/>Offset Survey</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290566" xreflabel=""/>The offset_stats parameter maintains statistics for occurrences where a series of read or write calls from a process did not access the next sequential location. The offset field is reset to 0 (zero) whenever a different file is read/written.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1293887" xreflabel=""/>Read/write offset statistics are off, by default. The statistics can be activated by writing anything into the offset_stats file.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290567" xreflabel=""/>Example:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290568" xreflabel=""/># cat /proc/fs/lustre/llite/lustre-f57dee00/rw_offset_stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1290569" xreflabel=""/>snapshot_time: 1155748884.591028 (secs.usecs)
-<anchor xml:id="dbdoclet.50438271_pgfId-1290570" xreflabel=""/>R/W                PID             RANGE START             RANGE END       \
-        SMALLEST EXTENT         LARGEST EXTENT                          OFF\
-SET
-<anchor xml:id="dbdoclet.50438271_pgfId-1290571" xreflabel=""/>R          8385            0                       128                     \
-128                     128                             0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290572" xreflabel=""/>R          8385            0                       224                     \
-224                     224                             -128
-<anchor xml:id="dbdoclet.50438271_pgfId-1290573" xreflabel=""/>W          8385            0                       250                     \
-50                      100                             0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290574" xreflabel=""/>W          8385            100                     1110                    \
-10                      500                             -150
-<anchor xml:id="dbdoclet.50438271_pgfId-1290575" xreflabel=""/>W          8384            0                       5233                    \
-5233                    5233                            0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290576" xreflabel=""/>R          8385            500                     600                     \
-100                     100                             -610
-</screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290611" xreflabel=""/>Where:</para>
-        <informaltable frame="all">
-          <tgroup cols="2">
-            <colspec colname="c1" colwidth="50*"/>
-            <colspec colname="c2" colwidth="50*"/>
-            <thead>
-              <row>
-                <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1291997" xreflabel=""/>Field</emphasis></para></entry>
-                <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1291999" xreflabel=""/>Description</emphasis></para></entry>
-              </row>
-            </thead>
-            <tbody>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292045" xreflabel=""/><emphasis role="bold">R/W</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292047" xreflabel=""/>Whether the non-sequential call was a read or write</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292049" xreflabel=""/><emphasis role="bold">PID</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292051" xreflabel=""/>Process ID which made the read/write call.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292053" xreflabel=""/><emphasis role="bold">Range Start/Range End</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292055" xreflabel=""/>Range in which the read/write calls were sequential.</para><para><anchor xml:id="dbdoclet.50438271_pgfId-1292056" xreflabel=""/> </para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292058" xreflabel=""/><emphasis role="bold">Smallest Extent</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292060" xreflabel=""/>Smallest extent (single read/write) in the corresponding range.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292062" xreflabel=""/><emphasis role="bold">Largest Extent</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292064" xreflabel=""/>Largest extent (single read/write) in the corresponding range.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292066" xreflabel=""/><emphasis role="bold">Offset</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292068" xreflabel=""/>Difference from the previous range end to the current range start.</para><para><anchor xml:id="dbdoclet.50438271_pgfId-1292069" xreflabel=""/>For example, Smallest-Extent indicates that the writes in the range 100 to 1110 were sequential, with a minimum write of 10 and a maximum write of 500. This range was started with an offset of -150. That means this is the difference between the last entry's range-end and this entry's range-start for the same file.</para><para><anchor xml:id="dbdoclet.50438271_pgfId-1292070" xreflabel=""/>The rw_offset_stats file can be cleared by writing to it:</para><screen><anchor xml:id="dbdoclet.50438271_pgfId-1292071" xreflabel=""/> 
-<anchor xml:id="dbdoclet.50438271_pgfId-1292072" xreflabel=""/>echo &gt; /proc/fs/lustre/llite/lustre-f57dee00/rw_offset_stats
-</screen></entry>
-              </row>
-            </tbody>
-          </tgroup>
-        </informaltable>
-      </section>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1290613" xreflabel=""/>31.2.4 Client Read-Write <anchor xml:id="dbdoclet.50438271_marker-1290612" xreflabel=""/>Extents Survey</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290614" xreflabel=""/><emphasis role="bold">Client-Based I/O Extent Size Survey</emphasis></para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290615" xreflabel=""/>The rw_extent_stats histogram in the llite directory shows you the statistics for the sizes of the read-write I/O extents. This file does not maintain the per-process statistics.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290616" xreflabel=""/>Example:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290617" xreflabel=""/>$ cat /proc/fs/lustre/llite/lustre-ee5af200/extents_stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1293904" xreflabel=""/>snapshot_time:                     1213828728.348516 (secs.usecs)
-<anchor xml:id="dbdoclet.50438271_pgfId-1294259" xreflabel=""/>                           read            |               write
-<anchor xml:id="dbdoclet.50438271_pgfId-1294260" xreflabel=""/>extents                    calls   %       cum%    |       calls   %       \
-cum%
-<anchor xml:id="dbdoclet.50438271_pgfId-1290621" xreflabel=""/> 
-<anchor xml:id="dbdoclet.50438271_pgfId-1294257" xreflabel=""/>0K - 4K :          0       0       0       |       2       2       2
-<anchor xml:id="dbdoclet.50438271_pgfId-1293950" xreflabel=""/>4K - 8K :          0       0       0       |       0       0       2
-<anchor xml:id="dbdoclet.50438271_pgfId-1293918" xreflabel=""/>8K - 16K :         0       0       0       |       0       0       2
-<anchor xml:id="dbdoclet.50438271_pgfId-1293922" xreflabel=""/>16K - 32K :                0       0       0       |       20      23      \
-26
-<anchor xml:id="dbdoclet.50438271_pgfId-1293926" xreflabel=""/>32K - 64K :                0       0       0       |       0       0       \
-26
-<anchor xml:id="dbdoclet.50438271_pgfId-1293930" xreflabel=""/>64K - 128K :               0       0       0       |       51      60      \
-86
-<anchor xml:id="dbdoclet.50438271_pgfId-1293934" xreflabel=""/>128K - 256K :              0       0       0       |       0       0       \
-86
-<anchor xml:id="dbdoclet.50438271_pgfId-1293938" xreflabel=""/>256K - 512K :              0       0       0       |       0       0       \
-86
-<anchor xml:id="dbdoclet.50438271_pgfId-1293942" xreflabel=""/>512K - 1024K :             0       0       0       |       0       0       \
-86
-<anchor xml:id="dbdoclet.50438271_pgfId-1293946" xreflabel=""/>1M - 2M :          0       0       0       |       11      13      100
-<anchor xml:id="dbdoclet.50438271_pgfId-1293908" xreflabel=""/> 
-</screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290622" xreflabel=""/>The file can be cleared by issuing the following command:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290623" xreflabel=""/>$ echo &gt; cat /proc/fs/lustre/llite/lustre-ee5af200/extents_stats
-</screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290624" xreflabel=""/><emphasis role="bold">Per-Process Client I/O Statistics</emphasis></para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290625" xreflabel=""/>The extents_stats_per_process file maintains the I/O extent size statistics on a per-process basis. So you can track the per-process statistics for the last MAX_PER_PROCESS_HIST processes.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1293971" xreflabel=""/>Example:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1293972" xreflabel=""/>$ cat /proc/fs/lustre/llite/lustre-ee5af200/extents_stats_per_process
-<anchor xml:id="dbdoclet.50438271_pgfId-1293973" xreflabel=""/>snapshot_time:                     1213828762.204440 (secs.usecs)
-<anchor xml:id="dbdoclet.50438271_pgfId-1293974" xreflabel=""/>                           read            |               write
-<anchor xml:id="dbdoclet.50438271_pgfId-1293975" xreflabel=""/>extents                    calls   %       cum%    |       calls   %       \
-cum%
-<anchor xml:id="dbdoclet.50438271_pgfId-1293976" xreflabel=""/> 
-<anchor xml:id="dbdoclet.50438271_pgfId-1293998" xreflabel=""/>PID: 11488
-<anchor xml:id="dbdoclet.50438271_pgfId-1293999" xreflabel=""/>   0K - 4K :       0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1293977" xreflabel=""/>   4K - 8K :       0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1293978" xreflabel=""/>   8K - 16K :      0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1293979" xreflabel=""/>   16K - 32K :     0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1293980" xreflabel=""/>   32K - 64K :     0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1293981" xreflabel=""/>   64K - 128K :    0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1293982" xreflabel=""/>   128K - 256K :   0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1293983" xreflabel=""/>   256K - 512K :   0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1294019" xreflabel=""/>   512K - 1024K :  0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1294017" xreflabel=""/>   1M - 2M :       0       0        0      |       10      100     100
-<anchor xml:id="dbdoclet.50438271_pgfId-1294028" xreflabel=""/> 
-<anchor xml:id="dbdoclet.50438271_pgfId-1294031" xreflabel=""/>PID: 11491
-<anchor xml:id="dbdoclet.50438271_pgfId-1294032" xreflabel=""/>   0K - 4K :       0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1294033" xreflabel=""/>   4K - 8K :       0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1294034" xreflabel=""/>   8K - 16K :      0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1294035" xreflabel=""/>   16K - 32K :     0       0        0      |       20      100     100
-<anchor xml:id="dbdoclet.50438271_pgfId-1294036" xreflabel=""/>   
-<anchor xml:id="dbdoclet.50438271_pgfId-1294052" xreflabel=""/>PID: 11424
-<anchor xml:id="dbdoclet.50438271_pgfId-1294053" xreflabel=""/>   0K - 4K :       0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1294054" xreflabel=""/>   4K - 8K :       0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1294055" xreflabel=""/>   8K - 16K :      0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1294056" xreflabel=""/>   16K - 32K :     0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1294070" xreflabel=""/>   32K - 64K :     0       0        0      |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1294086" xreflabel=""/>   64K - 128K :    0       0        0      |       16      100     100
-<anchor xml:id="dbdoclet.50438271_pgfId-1294087" xreflabel=""/> 
-<anchor xml:id="dbdoclet.50438271_pgfId-1294060" xreflabel=""/>PID: 11426
-<anchor xml:id="dbdoclet.50438271_pgfId-1294061" xreflabel=""/>   0K - 4K :       0       0        0      |       1       100     100
-<anchor xml:id="dbdoclet.50438271_pgfId-1294096" xreflabel=""/> 
-<anchor xml:id="dbdoclet.50438271_pgfId-1294099" xreflabel=""/>PID: 11429
-<anchor xml:id="dbdoclet.50438271_pgfId-1294100" xreflabel=""/>   0K - 4K :       0       0        0      |       1       100     100
-<anchor xml:id="dbdoclet.50438271_pgfId-1294097" xreflabel=""/> 
+                           read      |      write
+I/O time (1/1000s)     ios   % cum % |   ios   % cum %
+1:                      94  87  87   |     0   0   0
+2:                       0   0  87   |     7   0   0
+4:                      14  12 100   |    27   0   0
+8:                       0   0 100   |    14   0   0
+16:                      0   0 100   |    31   0   0
+32:                      0   0 100   |    38   0   0
+64:                      0   0 100   | 18979  81  82
+128:                     0   0 100   |   943   4  86
+256:                     0   0 100   |  1233   5  91
+512:                     0   0 100   |  1825   7  99
+1K:                      0   0 100   |   99   0  99
+2K:                      0   0 100   |     0   0  99
+4K:                      0   0 100   |     0   0  99
+8K:                      0   0 100   |    49   0 100
+
+                           read      |      write
+disk I/O size          ios   % cum % |   ios   % cum %
+4K:                     14 100 100   |    41   0   0
+8K:                      0   0 100   |     6   0   0
+16K:                     0   0 100   |     1   0   0
+32K:                     0   0 100   |     0   0   0
+64K:                     0   0 100   |     4   0   0
+128K:                    0   0 100   |    17   0   0
+256K:                    0   0 100   |    12   0   0
+512K:                    0   0 100   |    24   0   0
+1M:                      0   0 100   | 23142  99 100
 </screen>
-      </section>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1290632" xreflabel=""/>31.2.5 <anchor xml:id="dbdoclet.50438271_55057" xreflabel=""/> Watching the <anchor xml:id="dbdoclet.50438271_marker-1290631" xreflabel=""/>OST Block I/O Stream</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290633" xreflabel=""/>Similarly, there is a brw_stats histogram in the obdfilter directory which shows you the statistics for number of I/O requests sent to the disk, their size and whether they are contiguous on the disk or not.</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290634" xreflabel=""/>cat /proc/fs/lustre/obdfilter/lustre-OST0000/brw_stats 
-<anchor xml:id="dbdoclet.50438271_pgfId-1290635" xreflabel=""/>snapshot_time:                     1174875636.764630 (secs:usecs)
-<anchor xml:id="dbdoclet.50438271_pgfId-1290636" xreflabel=""/>                           read                            write
-<anchor xml:id="dbdoclet.50438271_pgfId-1290637" xreflabel=""/>pages per brw              brws    %       cum %   |       rpcs    %       \
-cum %
-<anchor xml:id="dbdoclet.50438271_pgfId-1290638" xreflabel=""/>1:                 0       0       0       |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290639" xreflabel=""/>                           read                                    write
-<anchor xml:id="dbdoclet.50438271_pgfId-1290640" xreflabel=""/>discont pages              rpcs    %       cum %   |       rpcs    %       \
-cum %
-<anchor xml:id="dbdoclet.50438271_pgfId-1290641" xreflabel=""/>1:                 0       0       0       |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290642" xreflabel=""/>                           read                                    write
-<anchor xml:id="dbdoclet.50438271_pgfId-1290643" xreflabel=""/>discont blocks             rpcs    %       cum %   |       rpcs    %       \
-cum %
-<anchor xml:id="dbdoclet.50438271_pgfId-1290644" xreflabel=""/>1:                 0       0       0       |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290645" xreflabel=""/>                           read                                    write
-<anchor xml:id="dbdoclet.50438271_pgfId-1290646" xreflabel=""/>dio frags          rpcs    %       cum %   |       rpcs    %       cum %
-<anchor xml:id="dbdoclet.50438271_pgfId-1290647" xreflabel=""/>1:                 0       0       0       |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290648" xreflabel=""/>                           read                                    write
-<anchor xml:id="dbdoclet.50438271_pgfId-1290649" xreflabel=""/>disk ios in flight rpcs    %       cum %   |       rpcs    %       cum %
-<anchor xml:id="dbdoclet.50438271_pgfId-1290650" xreflabel=""/>1:                 0       0       0       |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290651" xreflabel=""/>                           read                                    write
-<anchor xml:id="dbdoclet.50438271_pgfId-1290652" xreflabel=""/>io time (1/1000s)  rpcs    %       cum %   |       rpcs    %       cum %
-<anchor xml:id="dbdoclet.50438271_pgfId-1290653" xreflabel=""/>1:                 0       0       0       |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290654" xreflabel=""/>                           read                                    write
-<anchor xml:id="dbdoclet.50438271_pgfId-1290655" xreflabel=""/>disk io size               rpcs    %       cum %   |       rpcs    %       \
-cum %
-<anchor xml:id="dbdoclet.50438271_pgfId-1290656" xreflabel=""/>1:                 0       0       0       |       0       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290657" xreflabel=""/>                           read                                    write
+      <para>The tabular data is described in the table below. Each row in the
+        table shows the number of reads and writes occurring for the statistic
+        (<literal>ios</literal>), the relative percentage of total reads or
+        writes (<literal>%</literal>), and the cumulative percentage to that
+        point in the table for the statistic (<literal>cum %</literal>). </para>
+      <informaltable frame="all">
+        <tgroup cols="2">
+          <colspec colname="c1" colwidth="40*"/>
+          <colspec colname="c2" colwidth="60*"/>
+          <thead>
+            <row>
+              <entry>
+                <para><emphasis role="bold">Field</emphasis></para>
+              </entry>
+              <entry>
+                <para><emphasis role="bold">Description</emphasis></para>
+              </entry>
+            </row>
+          </thead>
+          <tbody>
+            <row>
+              <entry>
+                <para>
+                  <literal>pages per bulk r/w</literal></para>
+              </entry>
+              <entry>
+                <para>Number of pages per RPC request, which should match aggregate client
+                    <literal>rpc_stats</literal> (see <xref
+                    xmlns:xlink="http://www.w3.org/1999/xlink" linkend="MonitoringClientRCPStream"
+                  />).</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>discontiguous pages</literal></para>
+              </entry>
+              <entry>
+                <para>Number of discontinuities in the logical file offset of each page in a single
+                  RPC.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>discontiguous blocks</literal></para>
+              </entry>
+              <entry>
+                <para>Number of discontinuities in the physical block allocation in the file system
+                  for a single RPC.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para><literal>disk fragmented I/Os</literal></para>
+              </entry>
+              <entry>
+                <para>Number of I/Os that were not written entirely sequentially.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para><literal>disk I/Os in flight</literal></para>
+              </entry>
+              <entry>
+                <para>Number of disk I/Os currently pending.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para><literal>I/O time (1/1000s)</literal></para>
+              </entry>
+              <entry>
+                <para>Amount of time for each I/O operation to complete.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para><literal>disk I/O size</literal></para>
+              </entry>
+              <entry>
+                <para>Size of each I/O operation.</para>
+              </entry>
+            </row>
+          </tbody>
+        </tgroup>
+      </informaltable>
+      <para><emphasis role="italic"><emphasis role="bold">Analysis:</emphasis></emphasis></para>
+      <para>This data provides an indication of extent size and distribution in the file
+        system.</para>
+    </section>
+  </section>
+  <section>
+    <title>Tuning Lustre File System I/O</title>
+    <para>Each OSC has its own tree of tunables. For example:</para>
+    <screen>$ lctl list_param osc.*.*
+osc.myth-OST0000-osc-ffff8804296c2800.active
+osc.myth-OST0000-osc-ffff8804296c2800.blocksize
+osc.myth-OST0000-osc-ffff8804296c2800.checksum_dump
+osc.myth-OST0000-osc-ffff8804296c2800.checksum_type
+osc.myth-OST0000-osc-ffff8804296c2800.checksums
+osc.myth-OST0000-osc-ffff8804296c2800.connect_flags
+:
+:
+osc.myth-OST0000-osc-ffff8804296c2800.state
+osc.myth-OST0000-osc-ffff8804296c2800.stats
+osc.myth-OST0000-osc-ffff8804296c2800.timeouts
+osc.myth-OST0000-osc-ffff8804296c2800.unstable_stats
+osc.myth-OST0000-osc-ffff8804296c2800.uuid
+osc.myth-OST0001-osc-ffff8804296c2800.active
+osc.myth-OST0001-osc-ffff8804296c2800.blocksize
+osc.myth-OST0001-osc-ffff8804296c2800.checksum_dump
+osc.myth-OST0001-osc-ffff8804296c2800.checksum_type
+:
+:
 </screen>
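+    <para>An individual tunable from this tree can be read or set with
+      <literal>lctl get_param</literal> and <literal>lctl set_param</literal>.
+      For example (the value shown is illustrative only):</para>
+    <screen>$ lctl get_param osc.myth-OST0000-osc-ffff8804296c2800.max_rpcs_in_flight
+osc.myth-OST0000-osc-ffff8804296c2800.max_rpcs_in_flight=8</screen>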
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1294291" xreflabel=""/>The fields are explained below:</para>
-        <informaltable frame="all">
-          <tgroup cols="2">
-            <colspec colname="c1" colwidth="50*"/>
-            <colspec colname="c2" colwidth="50*"/>
-            <thead>
-              <row>
-                <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1294276" xreflabel=""/>Field</emphasis></para></entry>
-                <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1294278" xreflabel=""/>Description</emphasis></para></entry>
-              </row>
-            </thead>
-            <tbody>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294280" xreflabel=""/><emphasis role="bold">pages per brw</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294282" xreflabel=""/>Number of pages per RPC request, which should match aggregate client rpc_stats.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294284" xreflabel=""/><emphasis role="bold">discont pages</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294286" xreflabel=""/>Number of discontinuities in the logical file offset of each page in a single RPC.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294288" xreflabel=""/><emphasis role="bold">discont blocks</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1294290" xreflabel=""/>Number of discontinuities in the physical block allocation in the file system for a single RPC.</para></entry>
-              </row>
-            </tbody>
-          </tgroup>
-        </informaltable>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1299070" xreflabel=""/>For each Lustre service, the following information is provided:</para>
-        <itemizedlist><listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1299055" xreflabel=""/> Number of requests</para>
+    <para>The following sections describe some of the parameters that can
+      be tuned in a Lustre file system.</para>
+    <section remap="h3" xml:id="TuningClientIORPCStream">
+      <title><indexterm>
+          <primary>proc</primary>
+          <secondary>RPC tunables</secondary>
+        </indexterm>Tuning the Client I/O RPC Stream</title>
+      <para>Ideally, an optimal amount of data is packed into each I/O RPC
+        and a consistent number of issued RPCs are in progress at any time.
+        To help optimize the client I/O RPC stream, several tuning variables
+        are provided to adjust behavior according to network conditions and
+        cluster size. For information about monitoring the client I/O RPC
+        stream, see <xref
+          xmlns:xlink="http://www.w3.org/1999/xlink" linkend="MonitoringClientRCPStream"/>.</para>
+      <para>RPC stream tunables include the following (an example of adjusting
+        them is shown after the list):</para>
+      <para>
+        <itemizedlist>
+          <listitem>
+            <para><literal>osc.<replaceable>osc_instance</replaceable>.checksums</literal>
+              - Controls whether the client will calculate data integrity
+              checksums for the bulk data transferred to the OST.  Data
+              integrity checksums are enabled by default.  The algorithm used
+              can be set using the <literal>checksum_type</literal> parameter.
+            </para>
           </listitem>
-
-<listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1299083" xreflabel=""/> Request wait time (avg, min, max and std dev)</para>
+          <listitem>
+            <para><literal>osc.<replaceable>osc_instance</replaceable>.checksum_type</literal>
+              - Controls the data integrity checksum algorithm used by the
+              client.  The available algorithms are determined by the set of
+              algorithms supported by both the client and the server.  The
+              checksum algorithm used by default is determined
+              by first selecting the fastest algorithms available on the OST,
+              and then selecting the fastest of those algorithms on the client,
+              which depends on available optimizations in the CPU hardware and
+              kernel.  The default algorithm can be overridden by writing the
+              algorithm name into the <literal>checksum_type</literal>
+              parameter.  Available checksum types can be seen on the client by
+              reading the <literal>checksum_type</literal> parameter. Currently
+              supported checksum types are:
+              <literal>adler</literal>,
+              <literal>crc32</literal>,
+              <literal>crc32c</literal>
+            </para>
+            <para condition="l2C">
+              In Lustre release 2.12 additional checksum types were added to
+              allow end-to-end checksum integration with T10-PI capable
+              hardware.  The client will compute the appropriate checksum
+              type, based on the checksum type used by the storage, for the
+              RPC checksum, which will be verified by the server and passed
+              on to the storage.  The T10-PI checksum types are:
+              <literal>t10ip512</literal>,
+              <literal>t10ip4K</literal>,
+              <literal>t10crc512</literal>,
+              <literal>t10crc4K</literal>
+            </para>
           </listitem>
-
-<listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1299086" xreflabel=""/> Service idle time (% of elapsed time)</para>
+          <listitem>
+            <para><literal>osc.<replaceable>osc_instance</replaceable>.max_dirty_mb</literal>
+              - Controls how many MiB of dirty data can be written into the
+              client pagecache for writes by <emphasis>each</emphasis> OSC.
+              When this limit is reached, additional writes block until
+              previously-cached data is written to the server. This may be
+              changed by the <literal>lctl set_param</literal> command. Only
+              values larger than 0 and smaller than the lesser of 2048 MiB or
+              1/4 of client RAM are valid. Performance can suffer if the
+              client cannot aggregate enough data per OSC to form a full RPC
+              (as set by the <literal>max_pages_per_rpc</literal> parameter),
+              unless the application is doing very large writes itself.
+            </para>
+            <para>To maximize performance, the value for
+              <literal>max_dirty_mb</literal> is recommended to be at least
+              4 * <literal>max_pages_per_rpc</literal> *
+              <literal>max_rpcs_in_flight</literal>.
+            </para>
           </listitem>
-
-</itemizedlist>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1299058" xreflabel=""/>Additionally, data on each Lustre service is provided by service type:</para>
-        <itemizedlist><listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1299059" xreflabel=""/> Number of requests of this type</para>
+          <listitem>
+            <para><literal>osc.<replaceable>osc_instance</replaceable>.cur_dirty_bytes</literal>
+              - A read-only value that returns the current number of bytes
+              written and cached by this OSC.
+            </para>
           </listitem>
-
-<listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1299092" xreflabel=""/> Request service time (avg, min, max and std dev)</para>
+          <listitem>
+            <para><literal>osc.<replaceable>osc_instance</replaceable>.max_pages_per_rpc</literal>
+              - The maximum number of pages that will be sent in a single RPC
+              request to the OST. The minimum value is one page and the maximum
+              value is 16 MiB (4096 on systems with <literal>PAGE_SIZE</literal>
+              of 4 KiB), with the default value of 4 MiB in one RPC.  The upper
+              limit may also be constrained by <literal>ofd.*.brw_size</literal>
+              setting on the OSS, and applies to all clients connected to that
+              OST.  It is also possible to specify a units suffix (e.g.
+              <literal>max_pages_per_rpc=4M</literal>), so the RPC size can be
+              set independently of the client <literal>PAGE_SIZE</literal>.
+            </para>
           </listitem>
-
-</itemizedlist>
+          <listitem>
+            <para><literal>osc.<replaceable>osc_instance</replaceable>.max_rpcs_in_flight</literal>
+              - The maximum number of concurrent RPCs in flight from an OSC to
+              its OST. If the OSC tries to initiate an RPC but finds that it
+              already has the same number of RPCs outstanding, it will wait to
+              issue further RPCs until some complete. The minimum setting is 1
+              and maximum setting is 256. The default value is 8 RPCs.
+            </para>
+            <para>To improve small file I/O performance, increase the
+              <literal>max_rpcs_in_flight</literal> value.
+            </para>
+          </listitem>
+          <listitem>
+            <para><literal>llite.<replaceable>fsname_instance</replaceable>.max_cached_mb</literal>
+              - Maximum amount of read+write data cached by the client.  The
+              default value is 1/2 of the client RAM.
+            </para>
+          </listitem>
+        </itemizedlist>
+      </para>
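+      <para>For example, these tunables can be checked and adjusted on a
+        client with commands such as the following (the values shown are
+        illustrative only and should be chosen to suit the actual network
+        and workload):</para>
+      <screen>client$ lctl get_param osc.testfs-OST0000*.max_rpcs_in_flight
+osc.testfs-OST0000-osc-ffff88107412f400.max_rpcs_in_flight=8
+client$ lctl set_param osc.testfs-*.max_rpcs_in_flight=16
+client$ lctl set_param osc.testfs-*.max_dirty_mb=256
+client$ lctl set_param osc.testfs-*.checksum_type=crc32c</screen>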
+      <note>
+        <para>The values for <literal><replaceable>osc_instance</replaceable></literal>
+          and <literal><replaceable>fsname_instance</replaceable></literal>
+          are unique to each mount point to allow associating osc, mdc, lov,
+          lmv, and llite parameters with the same mount point.  However, it is
+          common for scripts to use a wildcard <literal>*</literal> or a
+          filesystem-specific wildcard
+          <literal><replaceable>fsname-*</replaceable></literal> to specify
+          the parameter settings uniformly on all clients. For example:
+<screen>
+client$ lctl get_param osc.testfs-OST0000*.rpc_stats
+osc.testfs-OST0000-osc-ffff88107412f400.rpc_stats=
+snapshot_time:         1375743284.337839 (secs.usecs)
+read RPCs in flight:  0
+write RPCs in flight: 0
+</screen></para>
+      </note>
+    </section>
+    <section remap="h3" xml:id="TuningClientReadahead">
+      <title><indexterm>
+          <primary>proc</primary>
+          <secondary>readahead</secondary>
+        </indexterm>Tuning File Readahead and Directory Statahead</title>
+      <para>File readahead and directory statahead enable reading of data
+      into memory before a process requests the data. File readahead prefetches
+      file content data into memory for <literal>read()</literal> related
+      calls, while directory statahead fetches file metadata into memory for
+      <literal>readdir()</literal> and <literal>stat()</literal> related
+      calls.  When readahead and statahead work well, a process that accesses
+      data finds that the information it needs is available immediately in
+      memory on the client when requested without the delay of network I/O.
+      </para>
+      <section remap="h4">
+        <title>Tuning File Readahead</title>
+        <para>File readahead is triggered when two or more sequential reads
+          by an application fail to be satisfied by data in the Linux buffer
+          cache. The size of the initial readahead is determined by the RPC
+          size and the file stripe size, but will typically be at least 1 MiB.
+          Additional readaheads grow linearly and increment until the per-file
+          or per-system readahead cache limit on the client is reached.</para>
+        <para>Readahead tunables include the following (an example of adjusting
+          them is shown after the list):</para>
+        <itemizedlist>
+          <listitem>
+            <para><literal>llite.<replaceable>fsname_instance</replaceable>.max_read_ahead_mb</literal>
+              - Controls the maximum total amount of data readahead on a
+              client mount point.
+              Files are read ahead in RPC-sized chunks (4 MiB, or the size of
+              the <literal>read()</literal> call, if larger) after the second
+              sequential read on a file descriptor. Random reads are done at
+              the size of the <literal>read()</literal> call only (no
+              readahead). Reads to non-contiguous regions of the file reset
+              the readahead algorithm, and readahead is not triggered until
+              sequential reads take place again.
+            </para>
+            <para>
+              This is the global limit for all files and cannot be larger than
+              1/2 of the client RAM.  To disable readahead, set
+              <literal>max_read_ahead_mb=0</literal>.
+            </para>
+          </listitem>
+          <listitem>
+            <para><literal>llite.<replaceable>fsname_instance</replaceable>.max_read_ahead_per_file_mb</literal>
+              - Controls the maximum number of megabytes (MiB) of data that
+              should be prefetched by the client when sequential reads are
+              detected on a file.  This is the per-file readahead limit and
+              cannot be larger than <literal>max_read_ahead_mb</literal>.
+            </para>
+          </listitem>
+          <listitem>
+            <para><literal>llite.<replaceable>fsname_instance</replaceable>.max_read_ahead_whole_mb</literal>
+              - Controls the maximum size of a file in MiB that is read in its
+              entirety upon access, regardless of the size of the
+              <literal>read()</literal> call.  This avoids multiple small read
+              RPCs on relatively small files, when it is not possible to
+              efficiently detect a sequential read pattern before the whole
+              file has been read.
+            </para>
+            <para>The default value is the greater of 2 MiB or the size of one
+              RPC, as given by <literal>max_pages_per_rpc</literal>.
+            </para>
+          </listitem>
+        </itemizedlist>
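+        <para>For example, the readahead limits can be checked, and the global
+          limit raised, on a client with commands such as the following (the
+          value shown is illustrative only and must not exceed 1/2 of the
+          client RAM):</para>
+        <screen>client$ lctl get_param llite.*.max_read_ahead_mb llite.*.max_read_ahead_per_file_mb
+client$ lctl set_param llite.*.max_read_ahead_mb=1024</screen>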
       </section>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1294293" xreflabel=""/>31.2.6 Using File <anchor xml:id="dbdoclet.50438271_marker-1294292" xreflabel=""/>Readahead and Directory Statahead</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1295106" xreflabel=""/>Lustre 1.6.5.1 introduced file readahead and directory statahead functionality that read data into memory in anticipation of a process actually requesting the data. File readahead functionality reads file content data into memory. Directory statahead functionality reads metadata into memory. When readahead and/or statahead work well, a data-consuming process finds that the information it needs is available when requested, and it is unnecessary to wait for network I/O.</para>
-        <section remap="h4">
-          <title><anchor xml:id="dbdoclet.50438271_pgfId-1295107" xreflabel=""/>31.2.6.1 Tuning <anchor xml:id="dbdoclet.50438271_marker-1295183" xreflabel=""/>File Readahead</title>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1290680" xreflabel=""/>File readahead is triggered when two or more sequential reads by an application fail to be satisfied by the Linux buffer cache. The size of the initial readahead is 1 MB. Additional readaheads grow linearly, and increment until the readahead cache on the client is full at 40 MB.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1290681" xreflabel=""/><emphasis role="bold">/proc/fs/lustre/llite/&lt;fsname&gt;-&lt;uid&gt;/max_read_ahead_mb</emphasis></para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1290682" xreflabel=""/>This tunable controls the maximum amount of data readahead on a file. Files are read ahead in RPC-sized chunks (1 MB or the size of read() call, if larger) after the second sequential read on a file descriptor. Random reads are done at the size of the read() call only (no readahead). Reads to non-contiguous regions of the file reset the readahead algorithm, and readahead is not triggered again until there are sequential reads again. To disable readahead, set this tunable to 0. The default value is 40 MB.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1290683" xreflabel=""/><emphasis role="bold">/proc/fs/lustre/llite/&lt;fsname&gt;-&lt;uid&gt;/max_read_ahead_whole_mb</emphasis></para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1290684" xreflabel=""/>This tunable controls the maximum size of a file that is read in its entirety, regardless of the size of the read().</para>
-        </section>
-        <section remap="h4">
-          <title><anchor xml:id="dbdoclet.50438271_pgfId-1295046" xreflabel=""/>31.2.6.2 Tuning Directory <anchor xml:id="dbdoclet.50438271_marker-1295184" xreflabel=""/>Statahead</title>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1295049" xreflabel=""/>When the ls -l process opens a directory, its process ID is recorded. When the first directory entry is &apos;&apos;stated&apos;&apos; with this recorded process ID, a statahead thread is triggered which stats ahead all of the directory entries, in order. The ls -l process can use the stated directory entries directly, improving performance.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1295060" xreflabel=""/>/proc/fs/lustre/llite/*/statahead_max</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1295152" xreflabel=""/>This tunable controls whether directory statahead is enabled and the maximum statahead count. By default, statahead is active.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1295168" xreflabel=""/>To disable statahead, set this tunable to:</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1295061" xreflabel=""/>echo 0 &gt; /proc/fs/lustre/llite/*/statahead_max</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1295063" xreflabel=""/>To set the maximum statahead count (n), set this tunable to:</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1295280" xreflabel=""/>echo n &gt; /proc/fs/lustre/llite/*/statahead_max
-</screen>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1295281" xreflabel=""/>The maximum value of n is 8192.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1295282" xreflabel=""/>/proc/fs/lustre/llite/*/statahead_status</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1295067" xreflabel=""/>This is a read-only interface that indicates the current statahead status.</para>
-        </section>
+      <section>
+        <title>Tuning Directory Statahead and AGL</title>
+        <para>Many system commands, such as <literal>ls -l</literal>,
+        <literal>du</literal>, and <literal>find</literal>, traverse a
+        directory sequentially. To make these commands run efficiently, the
+        directory statahead can be enabled to improve the performance of
+        directory traversal.</para>
+        <para>The statahead tunables are:</para>
+        <itemizedlist>
+          <listitem>
+            <para><literal>statahead_max</literal> -
+            Controls the maximum number of file attributes that will be
+            prefetched by the statahead thread. By default, statahead is
+            enabled and <literal>statahead_max</literal> is 32 files.</para>
+            <para>To disable statahead, set <literal>statahead_max</literal>
+            to zero via the following command on the client:</para>
+            <screen>lctl set_param llite.*.statahead_max=0</screen>
+            <para>To change the maximum statahead window size on a client:</para>
+            <screen>lctl set_param llite.*.statahead_max=<replaceable>n</replaceable></screen>
+            <para>The maximum <literal>statahead_max</literal> is 8192 files.
+            </para>
+            <para>The directory statahead thread will also prefetch the file
+            size/block attributes from the OSTs, so that all file attributes
+            are available on the client when requested by an application.
+            This is controlled by the asynchronous glimpse lock (AGL) setting.
+            The AGL behaviour can be disabled by setting:</para>
+            <screen>lctl set_param llite.*.statahead_agl=0</screen>
+          </listitem>
+          <listitem>
+            <para><literal>statahead_stats</literal> -
+            A read-only interface that provides current statahead and AGL
+            statistics, such as how many times statahead/AGL has been triggered
+            since the last mount, how many statahead/AGL failures have occurred
+            due to an incorrect prediction or other causes.</para>
+            <note>
+              <para>AGL behaviour is affected by statahead since the inodes
+              processed by AGL are built by the statahead thread.  If
+              statahead is disabled, then AGL is also disabled.</para>
+            </note>
+          </listitem>
+        </itemizedlist>
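+        <para>For example, the current statahead and AGL statistics can be
+          displayed on a client with the following command (the output fields
+          vary by release and are omitted here):</para>
+        <screen>client$ lctl get_param llite.*.statahead_stats</screen>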
       </section>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1290686" xreflabel=""/>31.2.7 OSS <anchor xml:id="dbdoclet.50438271_marker-1296183" xreflabel=""/>Read Cache</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1295320" xreflabel=""/>The OSS read cache feature provides read-only caching of data on an OSS. This functionality uses the regular Linux page cache to store the data. Just like caching from a regular filesytem in Linux, OSS read cache uses as much physical memory as is allocated.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1295462" xreflabel=""/>OSS read cache improves Lustre performance in these situations:</para>
-        <itemizedlist><listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1295517" xreflabel=""/> Many clients are accessing the same data set (as in HPC applications and when diskless clients boot from Lustre)</para>
+    </section>
+    <section remap="h3">
+      <title><indexterm>
+          <primary>proc</primary>
+          <secondary>read cache</secondary>
+        </indexterm>Tuning Server Read Cache</title>
+      <para>The server read cache feature provides read-only caching of file
+        data on an OSS or MDS (for Data-on-MDT). This functionality uses the
+        Linux page cache to store the data and uses as much physical memory
+        as is allocated.</para>
+      <para>The server read cache can improve Lustre file system performance
+        in these situations:</para>
+      <itemizedlist>
+        <listitem>
+          <para>Many clients are accessing the same data set (as in HPC
+            applications or when diskless clients boot from the Lustre file
+            system).</para>
+        </listitem>
+        <listitem>
+          <para>One client is writing data while another client is reading
+            it (i.e., clients are exchanging data via the filesystem).</para>
+        </listitem>
+        <listitem>
+          <para>A client has very limited caching of its own.</para>
+        </listitem>
+      </itemizedlist>
+      <para>The server read cache offers these benefits:</para>
+      <itemizedlist>
+        <listitem>
+          <para>Allows servers to cache read data more frequently.</para>
+        </listitem>
+        <listitem>
+          <para>Improves repeated reads to match network speeds instead of
+             storage speeds.</para>
+        </listitem>
+        <listitem>
+          <para>Provides the building blocks for server write cache
+            (small-write aggregation).</para>
+        </listitem>
+      </itemizedlist>
+      <section remap="h4">
+        <title>Using Server Read Cache</title>
+        <para>The server read cache is implemented on the OSS and MDS, and does
+          not require any special support on the client side. Since the server
+          read cache uses the memory available in the Linux page cache, the
+          appropriate amount of memory for the cache should be determined based
+          on I/O patterns.  If the data is mostly reads, then more cache is
+          beneficial on the server than would be needed for mostly writes.
+        </para>
+        <para>The server read cache is managed using the following tunables.
+          Many tunables are available for both <literal>osd-ldiskfs</literal>
+          and <literal>osd-zfs</literal>, but in some cases the implementation
+          of <literal>osd-zfs</literal> prevents their use.</para>
+        <itemizedlist>
+          <listitem>
+            <para><literal>read_cache_enable</literal> - High-level control of
+              whether data read from storage during a read request is kept in
+              memory and available for later read requests for the same data,
+              without having to re-read it from storage. By default, read cache
+              is enabled (<literal>read_cache_enable=1</literal>) for HDD OSDs
+              and automatically disabled for flash OSDs
+              (<literal>nonrotational=1</literal>).
+              The read cache cannot be disabled for <literal>osd-zfs</literal>,
+              and as a result this parameter is unavailable for that backend.
+              </para>
+            <para>When the server receives a read request from a client,
+              it reads data from storage into its memory and sends the data
+              to the client. If read cache is enabled for the target,
+              and the RPC and object size also meet the other criteria below,
+              this data may stay in memory after the client request has
+              completed.  If later read requests for the same data are received
+              and the data is still in cache, the server skips reading it from
+              storage. The cache is managed by the Linux kernel globally
+              across all targets on that server so that the infrequently used
+              cache pages are dropped from memory when the free memory is
+              running low.</para>
+            <para>If read cache is disabled
+              (<literal>read_cache_enable=0</literal>), or the read or object
+              is large enough that it will not benefit from caching, the server
+              discards the data after the read request from the client is
+              completed. For subsequent read requests the server again reads
+              the data from storage.</para>
+            <para>To disable read cache on all targets of a server, run:</para>
+            <screen>oss1# lctl set_param osd-*.*.read_cache_enable=0</screen>
+            <para>To re-enable read cache on one target, run:</para>
+            <screen>oss1# lctl set_param osd-*.{target_name}.read_cache_enable=1</screen>
+            <para>To check if read cache is enabled on targets on a server, run:
+            </para>
+            <screen>oss1# lctl get_param osd-*.*.read_cache_enable</screen>
           </listitem>
-
-<listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1295521" xreflabel=""/>  One client is storing data while another client is reading it (essentially exchanging data via the OST)</para>
+          <listitem>
+            <para><literal>writethrough_cache_enable</literal> - High-level
+              control of whether data sent to the server as a write request is
+              kept in the read cache and available for later reads, or if it is
+              discarded when the write completes. By default, writethrough
+              cache is enabled (<literal>writethrough_cache_enable=1</literal>)
+              for HDD OSDs and automatically disabled for flash OSDs
+              (<literal>nonrotational=1</literal>).
+              The write cache cannot be disabled for <literal>osd-zfs</literal>,
+              and as a result this parameter is unavailable for that backend.
+              </para>
+            <para>When the server receives write requests from a client, it
+              fetches data from the client into its memory and writes the data
+              to storage. If the writethrough cache is enabled for the target,
+              and the RPC and object size meet the other criteria below,
+              this data may stay in memory after the write request has
+              completed. If later read or partial-block write requests for this
+              same data are received and the data is still in cache, the server
+              skips reading it from storage.
+              </para>
+            <para>If the writethrough cache is disabled
+               (<literal>writethrough_cache_enable=0</literal>), or the
+               write or object is large enough that it will not benefit from
+               caching, the server discards the data after the write request
+               from the client is completed. For subsequent read requests, or
+               partial-page write requests, the server must re-read the data
+               from storage.</para>
+            <para>Enabling writethrough cache is advisable if clients are doing
+              small or unaligned writes that would cause partial-page updates,
+              or if the files written by one node are immediately being read by
+              other nodes. Some examples where enabling writethrough cache
+              might be useful include producer-consumer I/O models or
+              shared-file writes that are not aligned on 4096-byte boundaries.
+            </para>
+            <para>Disabling the writethrough cache is advisable when files are
+              mostly written to the file system but are not re-read within a
+              short time period, or files are only written and re-read by the
+              same node, regardless of whether the I/O is aligned or not.</para>
+            <para>To disable writethrough cache on all targets on a server, run:
+            </para>
+            <screen>oss1# lctl set_param osd-*.*.writethrough_cache_enable=0</screen>
+            <para>To re-enable the writethrough cache on one OST, run:</para>
+            <screen>oss1# lctl set_param osd-*.{OST_name}.writethrough_cache_enable=1</screen>
+            <para>To check if the writethrough cache is enabled, run:</para>
+            <screen>oss1# lctl get_param osd-*.*.writethrough_cache_enable</screen>
           </listitem>
-
-<listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1295528" xreflabel=""/> A client has very limited caching of its own</para>
+          <listitem>
+            <para><literal>readcache_max_filesize</literal> - Controls the
+              maximum size of an object that both the read cache and
+              writethrough cache will try to keep in memory. Objects larger
+              than <literal>readcache_max_filesize</literal> will not be kept
+              in cache for either reads or writes regardless of the
+              <literal>read_cache_enable</literal> or
+              <literal>writethrough_cache_enable</literal> settings.</para>
+            <para>Setting this tunable can be useful for workloads where
+              relatively small objects are repeatedly accessed by many clients,
+              such as job startup objects, executables, log objects, etc., but
+              large objects are read or written only once. By not putting the
+              larger objects into the cache, it is much more likely that more
+              of the smaller objects will remain in cache for a longer time.
+            </para>
+            <para>When setting <literal>readcache_max_filesize</literal>,
+              the input value can be specified in bytes, or can have a suffix
+              to indicate other binary units such as
+                <literal>K</literal> (kibibytes),
+                <literal>M</literal> (mebibytes),
+                <literal>G</literal> (gibibytes),
+                <literal>T</literal> (tebibytes), or
+                <literal>P</literal> (pebibytes).</para>
+            <para>
+              To limit the maximum cached object size to 64 MiB on all OSTs of
+              a server, run:
+            </para>
+            <screen>oss1# lctl set_param osd-*.*.readcache_max_filesize=64M</screen>
+            <para>To disable the maximum cached object size on all targets, run:
+            </para>
+            <screen>oss1# lctl set_param osd-*.*.readcache_max_filesize=-1</screen>
+            <para>
+              To check the current maximum cached object size on all targets of
+              a server, run:
+            </para>
+            <screen>oss1# lctl get_param osd-*.*.readcache_max_filesize</screen>
           </listitem>
-
-</itemizedlist>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1295515" xreflabel=""/>OSS read cache offers these benefits:</para>
-        <itemizedlist><listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1295469" xreflabel=""/> Allows OSTs to cache read data more frequently</para>
+          <listitem>
+            <para><literal>readcache_max_io_mb</literal> - Controls the maximum
+              size of a single read I/O that will be cached in memory. Reads
+              larger than <literal>readcache_max_io_mb</literal> will be read
+              directly from storage and bypass the page cache completely.
+              This avoids significant CPU overhead at high I/O rates.
+              The read cache cannot be disabled for <literal>osd-zfs</literal>,
+              and as a result this parameter is unavailable for that backend.
+            </para>
+            <para>When setting <literal>readcache_max_io_mb</literal>, the
+              input value can be specified in mebibytes, or can have a suffix
+              to indicate other binary units such as
+                <literal>K</literal> (kibibytes),
+                <literal>M</literal> (mebibytes),
+                <literal>G</literal> (gibibytes),
+                <literal>T</literal> (tebibytes), or
+                <literal>P</literal> (pebibytes).</para>
           </listitem>
-
-<listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1295485" xreflabel=""/> Improves repeated reads to match network speeds instead of disk speeds</para>
+          <listitem>
+            <para><literal>writethrough_max_io_mb</literal> - Controls the
+              maximum size of a single write I/O that will be cached in memory.
+              Writes larger than <literal>writethrough_max_io_mb</literal> will
+              be written directly to storage and bypass the page cache entirely.
+              This avoids significant CPU overhead at high I/O rates.
+              The write cache cannot be disabled for <literal>osd-zfs</literal>,
+              and as a result this parameter is unavailable for that backend.
+            </para>
+            <para>When setting <literal>writethrough_max_io_mb</literal>, the
+              input value can be specified in mebibytes, or can have a suffix
+              to indicate other binary units such as
+                <literal>K</literal> (kibibytes),
+                <literal>M</literal> (mebibytes),
+                <literal>G</literal> (gibibytes),
+                <literal>T</literal> (tebibytes), or
+                <literal>P</literal> (pebibytes).</para>
           </listitem>
+        </itemizedlist>
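+        <para>For example, to limit individual cached read and write I/Os to
+          32 MiB on all targets of a server (the value is illustrative only),
+          run:</para>
+        <screen>oss1# lctl set_param osd-*.*.readcache_max_io_mb=32 osd-*.*.writethrough_max_io_mb=32</screen>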
+      </section>
+    </section>
+    <section>
+      <title><indexterm>
+          <primary>proc</primary>
+          <secondary>OSS journal</secondary>
+        </indexterm>Enabling OSS Asynchronous Journal Commit</title>
+      <para>The OSS asynchronous journal commit feature asynchronously writes data to disk without
+        forcing a journal flush. This reduces the number of seeks and significantly improves
+        performance on some hardware.</para>
+      <note>
+        <para>Asynchronous journal commit cannot work with direct I/O-originated writes
+            (<literal>O_DIRECT</literal> flag set). In this case, a journal flush is forced. </para>
+      </note>
+      <para>When the asynchronous journal commit feature is enabled, client nodes keep data in the
+        page cache (a page reference). Lustre clients monitor the last committed transaction number
+          (<literal>transno</literal>) in messages sent from the OSS to the clients. When a client
+        sees that the last committed <literal>transno</literal> reported by the OSS is at least
+        equal to the bulk write <literal>transno</literal>, it releases the reference on the
+        corresponding pages. To avoid page references being held for too long on clients after a
+        bulk write, a 7 second ping request is scheduled (the default OSS file system commit time
+        interval is 5 seconds) after the bulk write reply is received, so the OSS has an opportunity
+        to report the last committed <literal>transno</literal>.</para>
+      <para>If the OSS crashes before the journal commit occurs, then intermediate data is lost.
+        However, OSS recovery functionality incorporated into the asynchronous journal commit
+        feature causes clients to replay their write requests and compensate for the missing disk
+        updates by restoring the state of the file system.</para>
+      <para>By default, <literal>sync_journal</literal> is enabled
+          (<literal>sync_journal=1</literal>), so that journal entries are committed synchronously.
+        To enable asynchronous journal commit, set the <literal>sync_journal</literal> parameter to
+          <literal>0</literal> by entering: </para>
+      <screen>$ lctl set_param obdfilter.*.sync_journal=0 
+obdfilter.lol-OST0001.sync_journal=0</screen>
+      <para>An associated <literal>sync-on-lock-cancel</literal> feature (enabled by default)
+        addresses a data consistency issue that can result if an OSS crashes after multiple clients
+        have written data into intersecting regions of an object, and then one of the clients also
+        crashes. A condition is created in which the POSIX requirement for continuous writes is
+        violated along with a potential for corrupted data. With
+          <literal>sync-on-lock-cancel</literal> enabled, if a cancelled lock has any volatile
+        writes attached to it, the OSS synchronously writes the journal to disk on lock
+        cancellation. Disabling the <literal>sync-on-lock-cancel</literal> feature may enhance
+        performance for concurrent write workloads, but it is recommended that you not disable this
+        feature.</para>
+      <para> The <literal>sync_on_lock_cancel</literal> parameter can be set to the following
+        values:</para>
+      <itemizedlist>
+        <listitem>
+          <para><literal>always</literal> - Always force a journal flush on lock cancellation
+            (default when asynchronous journal commit is enabled, i.e.
+            <literal>sync_journal=0</literal>).</para>
+        </listitem>
+        <listitem>
+          <para><literal>blocking</literal> - Force a journal flush only when the local cancellation
+            is due to a blocking callback.</para>
+        </listitem>
+        <listitem>
+          <para><literal>never</literal> - Do not force any journal flush (default when
+              asynchronous journal commit is disabled, i.e.
+              <literal>sync_journal=1</literal>).</para>
+        </listitem>
+      </itemizedlist>
+      <para>For example, to set <literal>sync_on_lock_cancel</literal> not to force a journal
+        flush, use a command similar to:</para>
+      <screen>$ lctl set_param obdfilter.*.sync_on_lock_cancel=never
+obdfilter.lol-OST0001.sync_on_lock_cancel=never</screen>
+    </section>
+    <section xml:id="dbdoclet.TuningModRPCs" condition='l28'>
+      <title>
+        <indexterm>
+          <primary>proc</primary>
+          <secondary>client metadata performance</secondary>
+        </indexterm>
+        Tuning the Client Metadata RPC Stream
+      </title>
+      <para>The client metadata RPC stream represents the metadata RPCs issued
+        in parallel by a client to an MDT target. The metadata RPCs can be
+        split into two categories: requests that do not modify the file system
+        (such as getattr operations), and requests that do modify the file
+        system (such as create, unlink, and setattr operations). To help
+        optimize the client
+        metadata RPC stream, several tuning variables are provided to adjust
+        behavior according to network conditions and cluster size.</para>
+      <para>Note that increasing the number of metadata RPCs issued in parallel
+        might improve the performance of metadata-intensive parallel applications,
+        but as a consequence it will consume more memory on the client and on
+        the MDS.</para>
+      <section>
+        <title>Configuring the Client Metadata RPC Stream</title>
+        <para>The MDC <literal>max_rpcs_in_flight</literal> parameter defines
+          the maximum number of metadata RPCs, both modifying and
+          non-modifying RPCs, that can be sent in parallel by a client to an
+          MDT target. This includes all file system metadata operations, such
+          as file or directory stat, creation, and unlink. The default setting
+          is 8, the minimum setting is 1, and the maximum setting is 256.</para>
+        <para>To set the <literal>max_rpcs_in_flight</literal> parameter, run
+          the following command on the Lustre client:</para>
+        <screen>client$ lctl set_param mdc.*.max_rpcs_in_flight=16</screen>
+        <para>The MDC <literal>max_mod_rpcs_in_flight</literal> parameter
+          defines the maximum number of file system modifying RPCs that can be
+          sent in parallel by a client to an MDT target. For example, the
+          Lustre client sends modify RPCs when it performs file or directory
+          creation, unlink, access permission modification, or ownership
+          modification. The default setting is 7, the minimum setting is 1,
+          and the maximum setting is 256.</para>
+        <para>To set the <literal>max_mod_rpcs_in_flight</literal> parameter,
+          run the following command on the Lustre client:</para>
+        <screen>client$ lctl set_param mdc.*.max_mod_rpcs_in_flight=12</screen>
+        <para>The <literal>max_mod_rpcs_in_flight</literal> value must be
+          strictly less than the <literal>max_rpcs_in_flight</literal> value.
+          It must also be less than or equal to the MDT
+          <literal>max_mod_rpcs_per_client</literal> value. If either of these
+          conditions is not met, the setting fails and an explicit message
+          is written to the Lustre log.</para>
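+        <para>The current values can be checked on the client before changing
+          them, for example (the instance name and values shown are
+          illustrative):</para>
+        <screen>client$ lctl get_param mdc.*.max_rpcs_in_flight mdc.*.max_mod_rpcs_in_flight
+mdc.testfs-MDT0000-mdc-ffff88107412f400.max_rpcs_in_flight=8
+mdc.testfs-MDT0000-mdc-ffff88107412f400.max_mod_rpcs_in_flight=7</screen>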
+        <para>The MDT <literal>max_mod_rpcs_per_client</literal> parameter is a
+          tunable of the kernel module <literal>mdt</literal> that defines the
+          maximum number of file system modifying RPCs in flight allowed per
+          client. The parameter can be updated at runtime, but the change only
+          takes effect for new client connections. The default setting is 8.
+        </para>
+        <para>To set the <literal>max_mod_rpcs_per_client</literal> parameter,
+          run the following command on the MDS:</para>
+        <screen>mds$ echo 12 > /sys/module/mdt/parameters/max_mod_rpcs_per_client</screen>
+      </section>
+      <section>
+        <title>Monitoring the Client Metadata RPC Stream</title>
+        <para>The <literal>rpc_stats</literal> file contains histogram data
+          showing information about modify metadata RPCs. It can be helpful to
+          identify the level of parallelism achieved by an application doing
+          modify metadata operations.</para>
+        <para><emphasis role="bold">Example:</emphasis></para>
+        <screen>client$ lctl get_param mdc.*.rpc_stats
+snapshot_time:         1441876896.567070 (secs.usecs)
+modify_RPCs_in_flight:  0
 
-<listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1295488" xreflabel=""/> Provides the building blocks for OST write cache (small-write aggregation)</para>
+                        modify
+rpcs in flight        rpcs   % cum %
+0:                       0   0   0
+1:                      56   0   0
+2:                      40   0   0
+3:                      70   0   0
+4:                      41   0   0
+5:                      51   0   1
+6:                      88   0   1
+7:                     366   1   2
+8:                    1321   5   8
+9:                    3624  15  23
+10:                   6482  27  50
+11:                   7321  30  81
+12:                   4540  18 100</screen>
+        <para>The file information includes:</para>
+        <itemizedlist>
+          <listitem>
+            <para><literal>snapshot_time</literal> - UNIX epoch instant the
+              file was read.</para>
           </listitem>
-
-</itemizedlist>
-        <section remap="h4">
-          <title><anchor xml:id="dbdoclet.50438271_pgfId-1295497" xreflabel=""/>31.2.7.1 Using OSS Read Cache</title>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1295566" xreflabel=""/>OSS read cache is implemented on the OSS, and does not require any special support on the client side. Since OSS read cache uses the memory available in the Linux page cache, you should use I/O patterns to determine the appropriate amount of memory for the cache; if the data is mostly reads, then more cache is required than for writes.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1295331" xreflabel=""/>OSS read cache is enabled, by default, and managed by the following tunables:</para>
-          <itemizedlist><listitem>
-              <para><anchor xml:id="dbdoclet.50438271_pgfId-1296609" xreflabel=""/>read_cache_enable  controls whether data read from disk during a read request is kept in memory and available for later read requests for the same data, without having to re-read it from disk. By default, read cache is enabled (read_cache_enable = 1).</para>
-            </listitem>
-
-</itemizedlist>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1296624" xreflabel=""/>When the OSS receives a read request from a client, it reads data from disk into its memory and sends the data as a reply to the requests. If read cache is enabled, this data stays in memory after the client's request is finished, and the OSS skips reading data from disk when subsequent read requests for the same are received. The read cache is managed by the Linux kernel globally across all OSTs on that OSS, and the least recently used cache pages will be dropped from memory when the amount of free memory is running low.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1295918" xreflabel=""/>If read cache is disabled (read_cache_enable = 0), then the OSS will discard the data after the client's read requests are serviced and, for subsequent read requests, the OSS must read the data from disk.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1296677" xreflabel=""/>To disable read cache on all OSTs of an OSS, run:</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1296709" xreflabel=""/>root@oss1# lctl set_param obdfilter.*.read_cache_enable=0
-</screen>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1296681" xreflabel=""/>To re-enable read cache on one OST, run:</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1296722" xreflabel=""/>root@oss1# lctl set_param obdfilter.{OST_name}.read_cache_enable=1
-</screen>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1296685" xreflabel=""/>To check if read cache is enabled on all OSTs on an OSS, run:</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1296687" xreflabel=""/>root@oss1# lctl get_param obdfilter.*.read_cache_enable
-</screen>
-          <itemizedlist><listitem>
-              <para><anchor xml:id="dbdoclet.50438271_pgfId-1296775" xreflabel=""/>writethrough_cache_enable  controls whether data sent to the OSS as a write request is kept in the read cache and available for later reads, or if it is discarded from cache when the write is completed. By default, writethrough cache is enabled (writethrough_cache_enable = 1).</para>
-            </listitem>
-
-</itemizedlist>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1296844" xreflabel=""/>When the OSS receives write requests from a client, it receives data from the client into its memory and writes the data to disk. If writethrough cache is enabled, this data stays in memory after the write request is completed, allowing the OSS to skip reading this data from disk if a later read request, or partial-page write request, for the same data is received.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1296871" xreflabel=""/>If writethrough cache is disabled (writethrough_cache_enabled = 0), then the OSS discards the data after the client's write request is completed, and for subsequent read request, or partial-page write request, the OSS must re-read the data from disk.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1296892" xreflabel=""/>Enabling writethrough cache is advisable if clients are doing small or unaligned writes that would cause partial-page updates, or if the files written by one node are immediately being accessed by other nodes. Some examples where this might be useful include producer-consumer I/O models or shared-file writes with a different node doing I/O not aligned on 4096-byte boundaries. Disabling writethrough cache is advisable in the case where files are mostly written to the file system but are not re-read within a short time period, or files are only written and re-read by the same node, regardless of whether the I/O is aligned or not.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1296922" xreflabel=""/>To disable writethrough cache on all OSTs of an OSS, run:</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1296924" xreflabel=""/>root@oss1# lctl set_param obdfilter.*.writethrough_cache_enable=0
-</screen>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1296926" xreflabel=""/>To re-enable writethrough cache on one OST, run:</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1296928" xreflabel=""/>root@oss1# lctl set_param \obdfilter.{OST_name}.writethrough_cache_enable=1
-</screen>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1296930" xreflabel=""/>To check if writethrough cache is</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1297043" xreflabel=""/>root@oss1# lctl set_param obdfilter.*.writethrough_cache_enable=1
-</screen>
-          <itemizedlist><listitem>
-              <para><anchor xml:id="dbdoclet.50438271_pgfId-1297052" xreflabel=""/>readcache_max_filesize  controls the maximum size of a file that both the read cache and writethrough cache will try to keep in memory. Files larger than readcache_max_filesize will not be kept in cache for either reads or writes.</para>
-            </listitem>
-
-</itemizedlist>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1297105" xreflabel=""/>This can be very useful for workloads where relatively small files are repeatedly accessed by many clients, such as job startup files, executables, log files, etc., but large files are read or written only once. By not putting the larger files into the cache, it is much more likely that more of the smaller files will remain in cache for a longer time.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1297126" xreflabel=""/>When setting readcache_max_filesize, the input value can be specified in bytes, or can have a suffix to indicate other binary units such as <emphasis role="bold">K</emphasis>ilobytes, <emphasis role="bold">M</emphasis>egabytes, <emphasis role="bold">G</emphasis>igabytes, <emphasis role="bold">T</emphasis>erabytes, or <emphasis role="bold">P</emphasis>etabytes.</para>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1297068" xreflabel=""/>To limit the maximum cached file size to 32MB on all OSTs of an OSS, run:</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1297070" xreflabel=""/>root@oss1# lctl set_param obdfilter.*.readcache_max_filesize=32M
-</screen>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1297072" xreflabel=""/>To disable the maximum cached file size on an OST, run:</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1297074" xreflabel=""/>root@oss1# lctl set_param \obdfilter.{OST_name}.readcache_max_filesize=-1
-</screen>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1297076" xreflabel=""/>To check the current maximum cached file size on all OSTs of an OSS, run:</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1297049" xreflabel=""/>root@oss1# lctl get_param obdfilter.*.readcache_max_filesize
-</screen>
-        </section>
+          <listitem>
+            <para><literal>modify_RPCs_in_flight</literal> - Number of modify
+              RPCs issued by the MDC, but not completed at the time of the
+              snapshot. This value should always be less than or equal to
+              <literal>max_mod_rpcs_in_flight</literal>.</para>
+          </listitem>
+          <listitem>
+            <para><literal>rpcs in flight</literal> - Number of modify RPCs
+              that are pending when an RPC is sent, the relative percentage
+              (<literal>%</literal>) of total modify RPCs, and the cumulative
+              percentage (<literal>cum %</literal>) to that point.</para>
+          </listitem>
+        </itemizedlist>
+        <para>If a large proportion of modify metadata RPCs are issued while
+          the number of pending modify RPCs is close to the
+          <literal>max_mod_rpcs_in_flight</literal> limit, increasing the
+          <literal>max_mod_rpcs_in_flight</literal> value may improve modify
+          metadata performance.</para>
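+        <para>For example, the current limit on a client can be checked and
+          temporarily raised with <literal>lctl</literal>.  This is only a
+          minimal sketch; the value of 16 used below is illustrative, and an
+          appropriate value depends on the workload and on the server-side
+          limits:</para>
+        <screen># lctl get_param mdc.*.max_mod_rpcs_in_flight
+# lctl set_param mdc.*.max_mod_rpcs_in_flight=16</screen>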
       </section>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1300258" xreflabel=""/>31.2.8 OSS Asynchronous Journal Commit</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1300265" xreflabel=""/>The OSS asynchronous journal commit feature synchronously writes data to disk without forcing a journal flush. This reduces the number of seeks and significantly improves performance on some hardware.</para>
-                <note><para>Asynchronous journal commit cannot work with O_DIRECT writes, a journal flush is still forced.</para></note>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1300718" xreflabel=""/>When asynchronous journal commit is enabled, client nodes keep data in the page cache (a page reference). Lustre clients monitor the last committed transaction number (transno) in messages sent from the OSS to the clients. When a client sees that the last committed transno reported by the OSS is &gt;=bulk write transno, it releases the reference on the corresponding pages. To avoid page references being held for too long on clients after a bulk write, a 7 second ping request is scheduled (jbd commit time is 5 seconds) after the bulk write reply is received, so the OSS has an opportunity to report the last committed transno.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1300661" xreflabel=""/>If the OSS crashes before the journal commit occurs, then the intermediate data is lost. However, new OSS recovery functionality (introduced in the asynchronous journal commit feature), causes clients to replay their write requests and compensate for the missing disk updates by restoring the state of the file system.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1300287" xreflabel=""/>To enable asynchronous journal commit, set the sync_journal parameter to zero (sync_journal=0):</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1300289" xreflabel=""/>$ lctl set_param obdfilter.*.sync_journal=0 
-<anchor xml:id="dbdoclet.50438271_pgfId-1300410" xreflabel=""/>obdfilter.lol-OST0001.sync_journal=0
-</screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1300292" xreflabel=""/>By default, sync_journal is disabled (sync_journal=1), which forces a journal flush after every bulk write.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1300727" xreflabel=""/>When asynchronous journal commit is used, clients keep a page reference until the journal transaction commits. This can cause problems when a client receives a blocking callback, because pages need to be removed from the page cache, but they cannot be removed because of the extra page reference.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1300728" xreflabel=""/>This problem is solved by forcing a journal flush on lock cancellation. When this happens, the client is granted the metadata blocks that have hit the disk, and it can safely release the page reference before processing the blocking callback. The parameter which controls this action is sync_on_lock_cancel, which can be set to the following values:</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1300441" xreflabel=""/>always: Always force a journal flush on lock cancellation</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1300530" xreflabel=""/>blocking: Force a journal flush only when the local cancellation is due to a blocking callback</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1300533" xreflabel=""/>never: Do not force any journal flush</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1300297" xreflabel=""/>Here is an example of sync_on_lock_cancel being set not to force a journal flush:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1300298" xreflabel=""/>$ lctl get_param obdfilter.*.sync_on_lock_cancel
-<anchor xml:id="dbdoclet.50438271_pgfId-1300299" xreflabel=""/>obdfilter.lol-OST0001.sync_on_lock_cancel=never
+    </section>
+  </section>
+  <section>
+    <title>Configuring Timeouts in a Lustre File System</title>
+    <para>In a Lustre file system, RPC timeouts are set using an adaptive timeouts mechanism, which
+      is enabled by default. Servers track RPC completion times and then report back to clients
+      estimates for completion times for future RPCs. Clients  use these estimates to set RPC
+      timeout values. If the processing of server requests slows down for any reason, the server
+      estimates for RPC completion increase, and clients then revise RPC timeout values to allow
+      more time for RPC completion.</para>
+    <para>If the RPCs queued on the server approach the RPC timeout specified
+      by the client, the server sends an "early reply" to the client, telling
+      the client to allow more time; this avoids RPC timeouts and
+      disconnect/reconnect cycles. Conversely, as server processing speeds up,
+      RPC timeout values decrease, allowing faster detection of a
+      non-responsive server and quicker connection to its failover
+      partner.</para>
+    <section>
+      <title><indexterm>
+          <primary>proc</primary>
+          <secondary>configuring adaptive timeouts</secondary>
+        </indexterm><indexterm>
+          <primary>configuring</primary>
+          <secondary>adaptive timeouts</secondary>
+        </indexterm><indexterm>
+          <primary>proc</primary>
+          <secondary>adaptive timeouts</secondary>
+        </indexterm>Configuring Adaptive Timeouts</title>
+      <para>The adaptive timeout parameters in the table below can be set persistently system-wide
+        using <literal>lctl conf_param</literal> on the MGS. For example, the following command sets
+        the <literal>at_max</literal> value  for all servers and clients associated with the file
+        system
+        <literal>testfs</literal>:<screen>lctl conf_param testfs.sys.at_max=1500</screen></para>
+      <note>
+        <para>Clients that access multiple Lustre file systems must use the same parameter values
+          for all file systems.</para>
+      </note>
+      <informaltable frame="all">
+        <tgroup cols="2">
+          <colspec colname="c1" colwidth="30*"/>
+          <colspec colname="c2" colwidth="80*"/>
+          <thead>
+            <row>
+              <entry>
+                <para><emphasis role="bold">Parameter</emphasis></para>
+              </entry>
+              <entry>
+                <para><emphasis role="bold">Description</emphasis></para>
+              </entry>
+            </row>
+          </thead>
+          <tbody>
+            <row>
+              <entry>
+                <para>
+                  <literal> at_min </literal></para>
+              </entry>
+              <entry>
+                <para>Minimum adaptive timeout (in seconds). The default value is 0. The
+                    <literal>at_min</literal> parameter is the minimum processing time that a server
+                  will report. Ideally, <literal>at_min</literal> should be set to its default
+                  value. Clients base their timeouts on this value, but they do not use this value
+                  directly. </para>
+                <para>If, for unknown reasons (usually due to temporary network outages), the
+                  adaptive timeout value is too short and clients time out their RPCs, you can
+                  increase the <literal>at_min</literal> value to compensate for this.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal> at_max </literal></para>
+              </entry>
+              <entry>
+                <para>Maximum adaptive timeout (in seconds). The <literal>at_max</literal> parameter
+                  is an upper-limit on the service time estimate. If <literal>at_max</literal> is
+                  reached, an RPC request times out.</para>
+                <para>Setting <literal>at_max</literal> to 0 causes adaptive timeouts to be disabled
+                  and a fixed timeout method to be used instead (see <xref
+                    xmlns:xlink="http://www.w3.org/1999/xlink" linkend="section_c24_nt5_dl"/>).</para>
+                <note>
+                  <para>If slow hardware causes the service estimate to increase beyond the default
+                    value of <literal>at_max</literal>, increase <literal>at_max</literal> to the
+                    maximum time you are willing to wait for an RPC completion.</para>
+                </note>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal> at_history </literal></para>
+              </entry>
+              <entry>
+                <para>Time period (in seconds) within which adaptive timeouts remember the slowest
+                  event that occurred. The default is 600.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal> at_early_margin </literal></para>
+              </entry>
+              <entry>
+                <para>Amount of time before the Lustre server sends an early reply (in seconds).
+                  Default is 5.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal> at_extra </literal></para>
+              </entry>
+              <entry>
+                <para>Incremental amount of time that a server requests with each early reply (in
+                  seconds). The server does not know how much time the RPC will take, so it asks for
+                  a fixed value. The default is 30, which provides a balance between sending too
+                  many early replies for the same RPC and overestimating the actual completion
+                  time.</para>
+                <para>When a server finds a queued request about to time out and needs to send an
+                  early reply, it adds the <literal>at_extra</literal> value to the requested time.
+                  If the extended time also expires, the Lustre server drops the request, and the
+                  client enters recovery and reconnects to restore the connection to normal
+                  status.</para>
+                <para>If you see multiple early replies for the same RPC asking for 30-second
+                  increases, change the <literal>at_extra</literal> value to a larger number to cut
+                  down on early replies sent and, therefore, network load.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal> ldlm_enqueue_min </literal></para>
+              </entry>
+              <entry>
+                <para>Minimum lock enqueue time (in seconds). The default is 100. The timeout used
+                  for a lock enqueue, <literal>ldlm_enqueue</literal>, is the larger of the measured
+                  enqueue estimate (influenced by the <literal>at_min</literal> and
+                  <literal>at_max</literal> parameters) multiplied by a weighting factor, and the
+                  value of <literal>ldlm_enqueue_min</literal>.</para>
+                <para>Lustre Distributed Lock Manager (LDLM) lock enqueues have a dedicated minimum
+                  value for <literal>ldlm_enqueue_min</literal>. Lock enqueue timeouts increase as
+                  the measured enqueue times increase (similar to adaptive timeouts).</para>
+              </entry>
+            </row>
+          </tbody>
+        </tgroup>
+      </informaltable>
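+      <para>In addition to setting these parameters persistently with
+        <literal>lctl conf_param</literal> as shown above, on releases where
+        the tunables are exported on each node they can also be changed
+        temporarily with <literal>lctl set_param</literal>.  The values below
+        are only illustrative:</para>
+      <screen># lctl set_param at_min=40 at_max=1200</screen>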
+      <section>
+        <title>Interpreting Adaptive Timeout Information</title>
+        <para>Adaptive timeout information can be obtained via
+          <literal>lctl get_param {osc,mdc}.*.timeouts</literal> files on each
+          client and <literal>lctl get_param {ost,mds}.*.*.timeouts</literal>
+          on each server.  To read information from a
+          <literal>timeouts</literal> file, enter a command similar to:</para>
+        <screen># lctl get_param -n ost.*.ost_io.timeouts
+service : cur 33  worst 34 (at 1193427052, 1600s ago) 1 1 33 2</screen>
+        <para>In this example, the <literal>ost_io</literal> service on this
+          node is currently reporting an estimated RPC service time of 33
+          seconds. The worst RPC service time was 34 seconds, which occurred
+          26 minutes ago.</para>
+        <para>The output also provides a history of service times.
+          Four &quot;bins&quot; of adaptive timeout history are shown, with the
+          maximum RPC time in each bin reported. In both the 0-150s bin and the
+          150-300s bin, the maximum RPC time was 1. The 300-450s bin shows the
+          worst (maximum) RPC time at 33 seconds, and the 450-600s bin shows a
+          maximum of RPC time of 2 seconds. The estimated service time is the
+          maximum value in the four bins (33 seconds in this example).</para>
+        <para>Service times (as reported by the servers) are also tracked in
+          the client OBDs, as shown in this example:</para>
+        <screen># lctl get_param osc.*.timeouts
+last reply : 1193428639, 0d0h00m00s ago
+network    : cur  1 worst  2 (at 1193427053, 0d0h26m26s ago)  1  1  1  1
+portal 6   : cur 33 worst 34 (at 1193427052, 0d0h26m27s ago) 33 33 33  2
+portal 28  : cur  1 worst  1 (at 1193426141, 0d0h41m38s ago)  1  1  1  1
+portal 7   : cur  1 worst  1 (at 1193426141, 0d0h41m38s ago)  1  0  1  1
+portal 17  : cur  1 worst  1 (at 1193426177, 0d0h41m02s ago)  1  0  0  1
 </screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1300317" xreflabel=""/>By default, sync_on_lock_cancel is set to never, because asynchronous journal commit is disabled by default.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1300592" xreflabel=""/>When asynchronous journal commit is enabled (sync_journal=0), sync_on_lock_cancel is automatically set to always, if it was previously set to never.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1300596" xreflabel=""/>Similarly, when asynchronous journal commit is disabled, (sync_journal=1), sync_on_lock_cancel is enforced to never.</para>
-      </section>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1297046" xreflabel=""/>31.2.9 mballoc <anchor xml:id="dbdoclet.50438271_marker-1297045" xreflabel=""/>History</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290687" xreflabel=""/><emphasis role="bold">/proc/fs/ldiskfs/sda/mb_history</emphasis></para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290688" xreflabel=""/>Multi-Block-Allocate (mballoc), enables Lustre to ask ldiskfs to allocate multiple blocks with a single request to the block allocator. Typically, an ldiskfs file system allocates only one block per time. Each mballoc-enabled partition has this file. This is sample output:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290689" xreflabel=""/>pid  inode   goal            result          found   grps    cr      \   me\
-rge   tail    broken
-<anchor xml:id="dbdoclet.50438271_pgfId-1290690" xreflabel=""/>2838       139267  17/12288/1      17/12288/1      1       0       0       \
-\   M       1       8192
-<anchor xml:id="dbdoclet.50438271_pgfId-1290691" xreflabel=""/>2838       139267  17/12289/1      17/12289/1      1       0       0       \
-\   M       0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290692" xreflabel=""/>2838       139267  17/12290/1      17/12290/1      1       0       0       \
-\   M       1       2
-<anchor xml:id="dbdoclet.50438271_pgfId-1290693" xreflabel=""/>2838       24577   3/12288/1       3/12288/1       1       0       0       \
-\   M       1       8192
-<anchor xml:id="dbdoclet.50438271_pgfId-1290694" xreflabel=""/>2838       24578   3/12288/1       3/771/1         1       1       1       \
-\           0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290695" xreflabel=""/>2838       32769   4/12288/1       4/12288/1       1       0       0       \
-\   M       1       8192
-<anchor xml:id="dbdoclet.50438271_pgfId-1290696" xreflabel=""/>2838       32770   4/12288/1       4/12289/1       13      1       1       \
-\           0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290697" xreflabel=""/>2838       32771   4/12288/1       5/771/1         26      2       1       \
-\           0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290698" xreflabel=""/>2838       32772   4/12288/1       5/896/1         31      2       1       \
-\           1       128
-<anchor xml:id="dbdoclet.50438271_pgfId-1290699" xreflabel=""/>2838       32773   4/12288/1       5/897/1         31      2       1       \
-\           0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290700" xreflabel=""/>2828       32774   4/12288/1       5/898/1         31      2       1       \
-\           1       2
-<anchor xml:id="dbdoclet.50438271_pgfId-1290701" xreflabel=""/>2838       32775   4/12288/1       5/899/1         31      2       1       \
-\           0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290702" xreflabel=""/>2838       32776   4/12288/1       5/900/1         31      2       1       \
-\           1       4
-<anchor xml:id="dbdoclet.50438271_pgfId-1290703" xreflabel=""/>2838       32777   4/12288/1       5/901/1         31      2       1       \
-\           0       0
-<anchor xml:id="dbdoclet.50438271_pgfId-1290704" xreflabel=""/>2838       32778   4/12288/1       5/902/1         31      2       1       \
-\           1       2
+        <para>In this example, portal 6, the <literal>ost_io</literal> service
+          portal, shows the history of service estimates reported by the portal.
+        </para>
+        <para>Server statistic files also show the range of estimates including
+          min, max, sum, and sum-squared. For example:</para>
+        <screen># lctl get_param mdt.*.mdt.stats
+...
+req_timeout               6 samples [sec] 1 10 15 105
+...
 </screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290791" xreflabel=""/>The parameters are described below:</para>
-        <informaltable frame="all">
-          <tgroup cols="2">
-            <colspec colname="c1" colwidth="50*"/>
-            <colspec colname="c2" colwidth="50*"/>
-            <thead>
-              <row>
-                <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1292192" xreflabel=""/>Parameter</emphasis></para></entry>
-                <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1292194" xreflabel=""/>Description</emphasis></para></entry>
-              </row>
-            </thead>
-            <tbody>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292196" xreflabel=""/><emphasis role="bold">pid</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292198" xreflabel=""/>Process that made the allocation.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292200" xreflabel=""/><emphasis role="bold">inode</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292202" xreflabel=""/>inode number allocated blocks</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292204" xreflabel=""/><emphasis role="bold">goal</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292206" xreflabel=""/>Initial request that came to mballoc (group/block-in-group/number-of-blocks)</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292208" xreflabel=""/><emphasis role="bold">result</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292210" xreflabel=""/>What mballoc actually found for this request.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292212" xreflabel=""/><emphasis role="bold">found</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292214" xreflabel=""/>Number of free chunks mballoc found and measured before the final decision.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292216" xreflabel=""/><emphasis role="bold">grps</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292218" xreflabel=""/>Number of groups mballoc scanned to satisfy the request.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292220" xreflabel=""/><emphasis role="bold">cr</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292222" xreflabel=""/>Stage at which mballoc found the result:</para><para><anchor xml:id="dbdoclet.50438271_pgfId-1292223" xreflabel=""/><emphasis role="bold">0</emphasis> - best in terms of resource allocation. The request was 1MB or larger and was satisfied directly via the kernel buddy allocator.</para><para><anchor xml:id="dbdoclet.50438271_pgfId-1292224" xreflabel=""/><emphasis role="bold">1</emphasis> - regular stage (good at resource consumption)</para><para><anchor xml:id="dbdoclet.50438271_pgfId-1292225" xreflabel=""/><emphasis role="bold">2</emphasis> - fs is quite fragmented (not that bad at resource consumption)</para><para><anchor xml:id="dbdoclet.50438271_pgfId-1292226" xreflabel=""/><emphasis role="bold">3</emphasis> - fs is very fragmented (worst at resource consumption)</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292228" xreflabel=""/><emphasis role="bold">queue</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292230" xreflabel=""/>Total bytes in active/queued sends.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292232" xreflabel=""/><emphasis role="bold">merge</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292234" xreflabel=""/>Whether the request hit the goal. This is good as extents code can now merge new blocks to existing extent, eliminating the need for extents tree growth.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292236" xreflabel=""/><emphasis role="bold">tail</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292238" xreflabel=""/>Number of blocks left free after the allocation breaks large free chunks.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292240" xreflabel=""/><emphasis role="bold">broken</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292242" xreflabel=""/>How large the broken chunk was.</para></entry>
-              </row>
-            </tbody>
-          </tgroup>
-        </informaltable>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290792" xreflabel=""/>Most customers are probably interested in found/cr. If cr is 0 1 and found is less than 100, then mballoc is doing quite well.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290793" xreflabel=""/>Also, number-of-blocks-in-request (third number in the goal triple) can tell the number of blocks requested by the obdfilter. If the obdfilter is doing a lot of small requests (just few blocks), then either the client is processing input/output to a lot of small files, or something may be wrong with the client (because it is better if client sends large input/output requests). This can be investigated with the OSC rpc_stats or OST brw_stats mentioned above.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290794" xreflabel=""/>Number of groups scanned (grps column) should be small. If it reaches a few dozen often, then either your disk file system is pretty fragmented or mballoc is doing something wrong in the group selection part.</para>
       </section>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1290796" xreflabel=""/>31.2.10 mballoc3<anchor xml:id="dbdoclet.50438271_marker-1290795" xreflabel=""/> Tunables</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290800" xreflabel=""/>Lustre version 1.6.1 and later includes mballoc3, which was built on top of mballoc2. By default, mballoc3 is enabled, and adds these features:</para>
-        <itemizedlist><listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1290801" xreflabel=""/> Pre-allocation for single files (helps to resist fragmentation)</para>
-          </listitem>
-
-<listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1290802" xreflabel=""/> Pre-allocation for a group of files (helps to pack small files into large, contiguous chunks)</para>
+    </section>
+    <section xml:id="section_c24_nt5_dl">
+      <title>Setting Static Timeouts<indexterm>
+          <primary>proc</primary>
+          <secondary>static timeouts</secondary>
+        </indexterm></title>
+      <para>The Lustre software provides two sets of static (fixed) timeouts, LND timeouts and
+        Lustre timeouts, which are used when adaptive timeouts are not enabled.</para>
+      <para>
+        <itemizedlist>
+          <listitem>
+            <para><emphasis role="italic"><emphasis role="bold">LND timeouts</emphasis></emphasis> -
+              LND timeouts ensure that point-to-point communications across a network complete in a
+              finite time in the presence of failures, such as lost packets or broken connections.
+              LND timeout parameters are set for each individual LND.</para>
+            <para>LND timeouts are logged with the <literal>S_LND</literal> flag set. They are not
+              printed as console messages, so check the Lustre log for <literal>D_NETERROR</literal>
+              messages or enable printing of <literal>D_NETERROR</literal> messages to the console
+              using:<screen>lctl set_param printk=+neterror</screen></para>
+            <para>Congested routers can be a source of spurious LND timeouts. To avoid this
+              situation, increase the number of LNet router buffers to reduce back-pressure and/or
+              increase LND timeouts on all nodes on all connected networks. Also consider increasing
+              the total number of LNet router nodes in the system so that the aggregate router
+              bandwidth matches the aggregate server bandwidth.</para>
           </listitem>
-
-<listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1290803" xreflabel=""/> Stream allocation (helps to decrease the seek rate)</para>
+          <listitem>
+            <para><emphasis role="italic"><emphasis role="bold">Lustre timeouts
+                </emphasis></emphasis>- Lustre timeouts ensure that Lustre RPCs complete in a finite
+              time in the presence of failures when adaptive timeouts are not enabled. Adaptive
+              timeouts are enabled by default. To disable adaptive timeouts at run time, set
+                <literal>at_max</literal> to 0 by running on the
+              MGS:<screen># lctl conf_param <replaceable>fsname</replaceable>.sys.at_max=0</screen></para>
+            <note>
+              <para>Changing the status of adaptive timeouts at runtime may cause a transient client
+                timeout, recovery, and reconnection.</para>
+            </note>
+            <para>Lustre timeouts are always printed as console messages. </para>
+            <para>If Lustre timeouts are not accompanied by LND timeouts, increase the Lustre
+              timeout on both servers and clients. Lustre timeouts are set using a command such as
+              the following:<screen># lctl set_param timeout=30</screen></para>
+            <para>Lustre timeout parameters are described in the table below.</para>
           </listitem>
-
-</itemizedlist>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290830" xreflabel=""/>The following mballoc3 tunables are available:</para>
+        </itemizedlist>
         <informaltable frame="all">
           <tgroup cols="2">
-            <colspec colname="c1" colwidth="50*"/>
-            <colspec colname="c2" colwidth="50*"/>
+            <colspec colname="c1" colnum="1" colwidth="30*"/>
+            <colspec colname="c2" colnum="2" colwidth="70*"/>
             <thead>
               <row>
-                <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1292250" xreflabel=""/>Field</emphasis></para></entry>
-                <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1292252" xreflabel=""/>Description</emphasis></para></entry>
+                <entry>
+                  <para><emphasis role="bold">Parameter</emphasis></para>
+                </entry>
+                <entry>
+                  <para><emphasis role="bold">Description</emphasis></para>
+                </entry>
               </row>
             </thead>
             <tbody>
               <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292283" xreflabel=""/><emphasis role="bold">stats</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292285" xreflabel=""/>Enables/disables the collection of statistics. Collected statistics can be found</para><para><anchor xml:id="dbdoclet.50438271_pgfId-1292286" xreflabel=""/>in /proc/fs/ldiskfs2/&lt;dev&gt;/mb_history.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292288" xreflabel=""/><emphasis role="bold">max_to_scan</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292290" xreflabel=""/>Maximum number of free chunks that mballoc finds before a final decision to avoid livelock.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292292" xreflabel=""/><emphasis role="bold">min_to_scan</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292294" xreflabel=""/>Minimum number of free chunks that mballoc finds before a final decision. This is useful for a very small request, to resist fragmentation of big free chunks.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292296" xreflabel=""/><emphasis role="bold">order2_req</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292298" xreflabel=""/>For requests equal to 2^N (where N &gt;= order2_req), a very fast search via buddy structures is used.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292300" xreflabel=""/><emphasis role="bold">stream_req</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292302" xreflabel=""/>Requests smaller or equal to this value are packed together to form large write I/Os.</para></entry>
+                <entry><literal>timeout</literal></entry>
+                <entry>
+                  <para>The time that a client waits for a server to complete an RPC (default 100s).
+                    Servers wait half this time for a normal client RPC to complete and a quarter of
+                    this time for a single bulk request (read or write of up to 4 MB) to complete.
+                    The client pings recoverable targets (MDS and OSTs) at one quarter of the
+                    timeout, and the server waits one and a half times the timeout before evicting a
+                    client for being &quot;stale.&quot;</para>
+                  <para>A Lustre client sends periodic &apos;ping&apos; messages to servers with which
+                    it has had no communication for the specified period of time. Any network
+                    activity between a client and a server in the file system also serves as a
+                    ping.</para>
+                </entry>
+              </row>
+              <row>
+                <entry><literal>ldlm_timeout</literal></entry>
+                <entry>
+                  <para>The time that a server waits for a client to reply to an initial AST (lock
+                    cancellation request). The default is 20s for an OST and 6s for an MDS. If the
+                    client replies to the AST, the server will give it a normal timeout (half the
+                    client timeout) to flush any dirty data and release the lock.</para>
+                </entry>
+              </row>
+              <row>
+                <entry><literal>fail_loc</literal></entry>
+                <entry>
+                  <para>An internal debugging failure hook. The default value of
+                      <literal>0</literal> means that no failure will be triggered or
+                    injected.</para>
+                </entry>
+              </row>
+              <row>
+                <entry><literal>dump_on_timeout</literal></entry>
+                <entry>
+                  <para>Triggers a dump of the Lustre debug log when a timeout occurs. The default
+                    value of <literal>0</literal> (zero) means a dump of the Lustre debug log will
+                    not be triggered.</para>
+                </entry>
+              </row>
+              <row>
+                <entry><literal>dump_on_eviction</literal></entry>
+                <entry>
+                  <para>Triggers a dump of the Lustre debug log when an eviction occurs. The default
+                    value of <literal>0</literal> (zero) means a dump of the Lustre debug log will
+                    not be triggered. </para>
+                </entry>
               </row>
             </tbody>
           </tgroup>
         </informaltable>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290831" xreflabel=""/>The following tunables, providing more control over allocation policy, will be available in the next version:</para>
-        <informaltable frame="all">
-          <tgroup cols="2">
-            <colspec colname="c1" colwidth="50*"/>
-            <colspec colname="c2" colwidth="50*"/>
-            <thead>
-              <row>
-                <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1292371" xreflabel=""/>Field</emphasis></para></entry>
-                <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1292373" xreflabel=""/>Description</emphasis></para></entry>
-              </row>
-            </thead>
-            <tbody>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292405" xreflabel=""/><emphasis role="bold">stats</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292407" xreflabel=""/>Enables/disables the collection of statistics. Collected statistics can be found in /proc/fs/ldiskfs2/&lt;dev&gt;/mb_history.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292409" xreflabel=""/><emphasis role="bold">max_to_scan</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292411" xreflabel=""/>Maximum number of free chunks that mballoc finds before a final decision to avoid livelock.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292413" xreflabel=""/><emphasis role="bold">min_to_scan</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292415" xreflabel=""/>Minimum number of free chunks that mballoc finds before a final decision. This is useful for a very small request, to resist fragmentation of big free chunks.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292417" xreflabel=""/><emphasis role="bold">order2_req</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292419" xreflabel=""/>For requests equal to 2^N (where N &gt;= order2_req), a very fast search via buddy structures is used.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292392" xreflabel=""/><emphasis role="bold">small_req</emphasis></para></entry>
-                <entry morerows="1"><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292427" xreflabel=""/>All requests are divided into 3 categories:</para><para><anchor xml:id="dbdoclet.50438271_pgfId-1292428" xreflabel=""/>&lt; small_req (packed together to form large, aggregated requests)</para><para><anchor xml:id="dbdoclet.50438271_pgfId-1292429" xreflabel=""/>&lt; large_req (allocated mostly in linearly)</para><para><anchor xml:id="dbdoclet.50438271_pgfId-1292430" xreflabel=""/>&gt; large_req (very large requests so the arm seek does not matter)</para><para><anchor xml:id="dbdoclet.50438271_pgfId-1292431" xreflabel=""/>The idea is that we try to pack small requests to form large requests, and then place all large requests (including compound from the small ones) close to one another, causing as few arm seeks as possible.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292437" xreflabel=""/><emphasis role="bold">large_req</emphasis></para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292449" xreflabel=""/><emphasis role="bold">prealloc_table</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292451" xreflabel=""/>The amount of space to preallocate depends on the current file size. The idea is that for small files we do not need 1 MB preallocations and for large files, 1 MB preallocations are not large enough; it is better to preallocate 4 MB.</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292453" xreflabel=""/><emphasis role="bold">group_prealloc</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292455" xreflabel=""/>The amount of space preallocated for small requests to be grouped.</para></entry>
-              </row>
-            </tbody>
-          </tgroup>
-        </informaltable>
-      </section>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1290875" xreflabel=""/>31.2.11 <anchor xml:id="dbdoclet.50438271_13474" xreflabel=""/>Lo<anchor xml:id="dbdoclet.50438271_marker-1290874" xreflabel=""/>cking</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290876" xreflabel=""/><emphasis role="bold">/proc/fs/lustre/ldlm/ldlm/namespaces/&lt;OSC name|MDC name&gt;/lru_size</emphasis></para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290877" xreflabel=""/>The lru_size parameter is used to control the number of client-side locks in an LRU queue. LRU size is dynamic, based on load. This optimizes the number of locks available to nodes that have different workloads (e.g., login/build nodes vs. compute nodes vs. backup nodes).</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1297289" xreflabel=""/>The total number of locks available is a function of the server's RAM. The default limit is 50 locks/1 MB of RAM. If there is too much memory pressure, then the LRU size is shrunk. The number of locks on the server is limited to {number of OST/MDT on node} * {number of clients} * {client lru_size}.</para>
-        <itemizedlist><listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1294671" xreflabel=""/> To enable automatic LRU sizing, set the lru_size parameter to 0. In this case, the lru_size parameter shows the current number of locks being used on the export. (In Lustre 1.6.5.1 and later, LRU sizing is enabled, by default.)</para>
-          </listitem>
-
-<listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1294717" xreflabel=""/> To specify a maximum number of locks, set the lru_size parameter to a value &gt; 0 (former numbers are okay, 100 * CPU_NR). We recommend that you only increase the LRU size on a few login nodes where users access the file system interactively.</para>
-          </listitem>
-
-</itemizedlist>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290878" xreflabel=""/>To clear the LRU on a single client, and as a result flush client cache, without changing the lru_size value:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1294737" xreflabel=""/>$ lctl set_param ldlm.namespaces.&lt;osc_name|mdc_name&gt;.lru_size=clear
-</screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290880" xreflabel=""/>If you shrink the LRU size below the number of existing unused locks, then the unused locks are canceled immediately. Use echo clear to cancel all locks without changing the value.</para>
-                <note><para>Currently, the lru_size parameter can only be set temporarily with lctl set_param; it cannot be set permanently.</para></note>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1294998" xreflabel=""/>To disable LRU sizing, run this command on the Lustre clients:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1295001" xreflabel=""/>$ lctl set_param ldlm.namespaces.*osc*.lru_size=$((NR_CPU*100))
-</screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1295002" xreflabel=""/>Replace NR_CPU value with the number of CPUs on the node.</para>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1297273" xreflabel=""/>To determine the number of locks being granted:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1297276" xreflabel=""/>$ lctl get_param ldlm.namespaces.*.pool.limit
-</screen>
-      </section>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1296246" xreflabel=""/>31.2.12 <anchor xml:id="dbdoclet.50438271_87260" xreflabel=""/>Setting MDS and OSS Thread Counts</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1298326" xreflabel=""/>MDS and OSS thread counts (minimum and maximum) can be set via the {min,max}_thread_count tunable. For each service, a new /proc/fs/lustre/{service}/*/thread_{min,max,started} entry is created. The tunable, {service}.thread_{min,max,started}, can be used to set the minimum and maximum thread counts or get the current number of running threads for the following services.</para>
-        <informaltable frame="all">
-          <tgroup cols="2">
-            <colspec colname="c1" colwidth="50*"/>
-            <colspec colname="c2" colwidth="50*"/>
-            <tbody>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298639" xreflabel=""/><emphasis role="bold">Service</emphasis></para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298641" xreflabel=""/><emphasis role="bold">Description</emphasis></para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298556" xreflabel=""/>mdt.MDS.mds</para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298558" xreflabel=""/>normal metadata ops</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298560" xreflabel=""/>mdt.MDS.mds_readpage</para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298562" xreflabel=""/>metadata readdir</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298564" xreflabel=""/>mdt.MDS.mds_setattr</para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298566" xreflabel=""/>metadata setattr</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298581" xreflabel=""/>ost.OSS.ost</para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298583" xreflabel=""/>normal data</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298585" xreflabel=""/>ost.OSS.ost_io</para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298587" xreflabel=""/>bulk data IO</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298589" xreflabel=""/>ost.OSS.ost_create</para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298591" xreflabel=""/>OST object pre-creation service</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298593" xreflabel=""/>ldlm.services.ldlm_canceld</para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298595" xreflabel=""/>DLM lock cancel</para></entry>
-              </row>
-              <row>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298568" xreflabel=""/>ldlm.services.ldlm_cbd</para></entry>
-                <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1298570" xreflabel=""/>DLM lock grant</para></entry>
-              </row>
-            </tbody>
-          </tgroup>
-        </informaltable>
-        <itemizedlist><listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1299411" xreflabel=""/> To temporarily set this tunable, run:</para>
-          </listitem>
-
-</itemizedlist>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1299412" xreflabel=""/># lctl {get,set}_param {service}.thread_{min,max,started} 
-</screen>
-        <itemizedlist><listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1299413" xreflabel=""/> To permanently set this tunable, run:</para>
-          </listitem>
-
-</itemizedlist>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1299414" xreflabel=""/># lctl conf_param {service}.thread_{min,max,started} </screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1299409" xreflabel=""/>The following examples show how to set thread counts and get the number of running threads for the ost_io service.</para>
-        <itemizedlist><listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1299429" xreflabel=""/> To get the number of running threads, run:</para>
-          </listitem>
-
-</itemizedlist>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1299270" xreflabel=""/># lctl get_param ost.OSS.ost_io.threads_started</screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1299276" xreflabel=""/>The command output will be similar to this:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1299282" xreflabel=""/>ost.OSS.ost_io.threads_started=128
-</screen>
-        <itemizedlist><listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1299257" xreflabel=""/> To set the maximum number of threads (512), run:</para>
-          </listitem>
-
-</itemizedlist>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1299294" xreflabel=""/># lctl get_param ost.OSS.ost_io.threads_max
-</screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1299330" xreflabel=""/>The command output will be:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1299331" xreflabel=""/>ost.OSS.ost_io.threads_max=512
-</screen>
-        <itemizedlist><listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1299306" xreflabel=""/> To set the maximum thread count to 256 instead of 512 (to avoid overloading the storage or for an array with requests), run:</para>
-          </listitem>
-
-</itemizedlist>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1299316" xreflabel=""/># lctl set_param ost.OSS.ost_io.threads_max=256
-</screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1299352" xreflabel=""/>The command output will be:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1299353" xreflabel=""/>ost.OSS.ost_io.threads_max=256
-</screen>
-        <itemizedlist><listitem>
-            <para><anchor xml:id="dbdoclet.50438271_pgfId-1299368" xreflabel=""/> To check if the new threads_max setting is active, run:</para>
-          </listitem>
-
-</itemizedlist>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1299375" xreflabel=""/># lctl get_param ost.OSS.ost_io.threads_max
-</screen>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1299381" xreflabel=""/>The command output will be similar to this:</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1299382" xreflabel=""/>ost.OSS.ost_io.threads_max=256
-</screen>
-                <note><para>Currently, the maximum thread count setting is advisory because Lustre does not reduce the number of service threads in use, even if that number exceeds the threads_max value. Lustre does not stop service threads once they are started.</para></note>
-      </section>
+      </para>
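+      <para>The current values of these parameters on a node can be inspected
+        with <literal>lctl get_param</literal>.  A minimal example follows;
+        the output shown is illustrative only and will vary by node type and
+        configuration:</para>
+      <screen># lctl get_param timeout ldlm_timeout dump_on_timeout
+timeout=100
+ldlm_timeout=20
+dump_on_timeout=0</screen>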
     </section>
-    <section xml:id="dbdoclet.50438271_83523">
-      <title>31.3 Debug</title>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1290884" xreflabel=""/><emphasis role="bold">/proc/sys/lnet/debug</emphasis></para>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1298213" xreflabel=""/>By default, Lustre generates a detailed log of all operations to aid in debugging. The level of debugging can affect the performance or speed you achieve with Lustre. Therefore, it is useful to reduce this overhead by turning down the debug level<footnote><para><anchor xml:id="dbdoclet.50438271_pgfId-1298216" xreflabel=""/>This controls the level of Lustre debugging kept in the internal log buffer. It does not alter the level of debugging that goes to syslog.</para></footnote> to improve performance. Raise the debug level when you need to collect the logs for debugging problems. The debugging mask can be set with &quot;symbolic names&quot; instead of the numerical values that were used in prior releases. The new symbolic format is shown in the examples below.</para>
-              <note><para>All of the commands below must be run as root; note the # nomenclature.</para></note>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1298218" xreflabel=""/>To verify the debug level used by examining the sysctl that controls debugging, run:</para>
-      <screen><anchor xml:id="dbdoclet.50438271_pgfId-1297891" xreflabel=""/># sysctl lnet.debug 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297995" xreflabel=""/>lnet.debug = ioctl neterror warning error emerg ha config console
-</screen>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1297894" xreflabel=""/>To turn off debugging (except for network error debugging), run this command on all concerned nodes:</para>
-      <screen><anchor xml:id="dbdoclet.50438271_pgfId-1297895" xreflabel=""/># sysctl -w lnet.debug=&quot;neterror&quot; 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297938" xreflabel=""/>lnet.debug = neterror
-</screen>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1298239" xreflabel=""/>To turn off debugging completely, run this command on all concerned nodes:</para>
-      <screen><anchor xml:id="dbdoclet.50438271_pgfId-1298240" xreflabel=""/># sysctl -w lnet.debug=0 
-<anchor xml:id="dbdoclet.50438271_pgfId-1298657" xreflabel=""/>lnet.debug = 0
-</screen>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1298237" xreflabel=""/>To set an appropriate debug level for a production environment, run:</para>
-      <screen><anchor xml:id="dbdoclet.50438271_pgfId-1298260" xreflabel=""/># sysctl -w lnet.debug=&quot;warning dlmtrace error emerg ha rpctrace vfstrace&quot; 
-<anchor xml:id="dbdoclet.50438271_pgfId-1298658" xreflabel=""/>lnet.debug = warning dlmtrace error emerg ha rpctrace vfstrace
-</screen>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1298258" xreflabel=""/>The flags above collect enough high-level information to aid debugging, but they do not cause any serious performance impact.</para>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1298069" xreflabel=""/>To clear all flags and set new ones, run:</para>
-      <screen><anchor xml:id="dbdoclet.50438271_pgfId-1297899" xreflabel=""/># sysctl -w lnet.debug=&quot;warning&quot; 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297944" xreflabel=""/>lnet.debug = warning
-</screen>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1297902" xreflabel=""/>To add new flags to existing ones, prefix them with a &quot;+&quot;:</para>
-      <screen><anchor xml:id="dbdoclet.50438271_pgfId-1297903" xreflabel=""/># sysctl -w lnet.debug=&quot;+neterror +ha&quot; 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297950" xreflabel=""/>lnet.debug = +neterror +ha
-<anchor xml:id="dbdoclet.50438271_pgfId-1297905" xreflabel=""/># sysctl lnet.debug 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297953" xreflabel=""/>lnet.debug = neterror warning ha
-</screen>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1297908" xreflabel=""/>To remove flags, prefix them with a &quot;-&quot;:</para>
-      <screen><anchor xml:id="dbdoclet.50438271_pgfId-1297909" xreflabel=""/># sysctl -w lnet.debug=&quot;-ha&quot; 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297962" xreflabel=""/>lnet.debug = -ha
-<anchor xml:id="dbdoclet.50438271_pgfId-1297911" xreflabel=""/># sysctl lnet.debug 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297974" xreflabel=""/>lnet.debug = neterror warning
-</screen>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1297914" xreflabel=""/>You can verify and change the debug level using the /proc interface in Lustre. To use the flags with /proc, run:</para>
-      <screen><anchor xml:id="dbdoclet.50438271_pgfId-1297916" xreflabel=""/># cat /proc/sys/lnet/debug 
-<anchor xml:id="dbdoclet.50438271_pgfId-1298027" xreflabel=""/>neterror warning
-<anchor xml:id="dbdoclet.50438271_pgfId-1298104" xreflabel=""/># echo &quot;+ha&quot; &gt; /proc/sys/lnet/debug 
-<anchor xml:id="dbdoclet.50438271_pgfId-1298108" xreflabel=""/># cat /proc/sys/lnet/debug 
-<anchor xml:id="dbdoclet.50438271_pgfId-1298037" xreflabel=""/>neterror warning ha
-<anchor xml:id="dbdoclet.50438271_pgfId-1297921" xreflabel=""/># echo &quot;-warning&quot; &gt; /proc/sys/lnet/debug
-<anchor xml:id="dbdoclet.50438271_pgfId-1297922" xreflabel=""/># cat /proc/sys/lnet/debug 
-<anchor xml:id="dbdoclet.50438271_pgfId-1298040" xreflabel=""/>neterror ha
-</screen>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1290897" xreflabel=""/><emphasis role="bold">/proc/sys/lnet/subsystem_debug</emphasis></para>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1290898" xreflabel=""/>This controls the debug logs3 for subsystems (see S_* definitions).</para>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1290899" xreflabel=""/><emphasis role="bold">/proc/sys/lnet/debug_path</emphasis></para>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1290900" xreflabel=""/>This indicates the location where debugging symbols should be stored for gdb. The default is set to /r/tmp/lustre-log-localhost.localdomain.</para>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1290901" xreflabel=""/>These values can also be set via sysctl -w lnet.debug={value}</para>
-              <note><para>The above entries only exist when Lustre has already been loaded.</para></note>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1294968" xreflabel=""/><emphasis role="bold">/proc/sys/lnet/panic_on_lbug</emphasis></para>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1294969" xreflabel=""/>This causes Lustre to call &apos;&apos;panic&apos;&apos; when it detects an internal problem (an LBUG); panic crashes the node. This is particularly useful when a kernel crash dump utility is configured. The crash dump is triggered when the internal inconsistency is detected by Lustre.</para>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1294977" xreflabel=""/><emphasis role="bold">/proc/sys/lnet/upcall</emphasis></para>
-      <para><anchor xml:id="dbdoclet.50438271_pgfId-1294978" xreflabel=""/>This allows you to specify the path to the binary which will be invoked when an LBUG is encountered. This binary is called with four parameters. The first one is the string &apos;&apos;LBUG&apos;&apos;. The second one is the file where the LBUG occurred. The third one is the function name. The fourth one is the line number in the file.</para>
-      <section remap="h3">
-        <title><anchor xml:id="dbdoclet.50438271_pgfId-1290905" xreflabel=""/>31.3.1 RPC Information for Other OBD Devices</title>
-        <para><anchor xml:id="dbdoclet.50438271_pgfId-1290906" xreflabel=""/>Some OBD devices maintain a count of the number of RPC events that they process. Sometimes these events are more specific to operations of the device, like llite, than actual raw RPC counts.</para>
-        <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290907" xreflabel=""/>$ find /proc/fs/lustre/ -name stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1290908" xreflabel=""/>/proc/fs/lustre/osc/lustre-OST0001-osc-ce63ca00/stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1290909" xreflabel=""/>/proc/fs/lustre/osc/lustre-OST0000-osc-ce63ca00/stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1290910" xreflabel=""/>/proc/fs/lustre/osc/lustre-OST0001-osc/stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1290911" xreflabel=""/>/proc/fs/lustre/osc/lustre-OST0000-osc/stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1290912" xreflabel=""/>/proc/fs/lustre/mdt/MDS/mds_readpage/stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1290913" xreflabel=""/>/proc/fs/lustre/mdt/MDS/mds_setattr/stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1290914" xreflabel=""/>/proc/fs/lustre/mdt/MDS/mds/stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1290915" xreflabel=""/>/proc/fs/lustre/mds/lustre-MDT0000/exports/ab206805-0630-6647-8543-d24265c9\
-1a3d/stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1290916" xreflabel=""/>/proc/fs/lustre/mds/lustre-MDT0000/exports/08ac6584-6c4a-3536-2c6d-b36cf9cb\
-daa0/stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1290917" xreflabel=""/>/proc/fs/lustre/mds/lustre-MDT0000/stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1290918" xreflabel=""/>/proc/fs/lustre/ldlm/services/ldlm_canceld/stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1290919" xreflabel=""/>/proc/fs/lustre/ldlm/services/ldlm_cbd/stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1290920" xreflabel=""/>/proc/fs/lustre/llite/lustre-ce63ca00/stats
-</screen>
-        <section remap="h4">
-          <title><anchor xml:id="dbdoclet.50438271_pgfId-1290921" xreflabel=""/>31.3.1.1 Interpreting OST Statistics</title>
-          <note><para>See also <xref linkend="dbdoclet.50438219_84890"/> (llobdstat) and <xref linend="dbdoclet.50438273_80593"/> (CollectL).</para></note>
-
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1301139" xreflabel=""/>The OST .../stats files can be used to track client statistics (client activity) for each OST. It is possible to get a periodic dump of values from these file (for example, every 10 seconds), that show the RPC rates (similar to iostat) by using the llstat.pl tool:</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290922" xreflabel=""/># llstat /proc/fs/lustre/osc/lustre-OST0000-osc/stats 
-<anchor xml:id="dbdoclet.50438271_pgfId-1290923" xreflabel=""/>/usr/bin/llstat: STATS on 09/14/07 /proc/fs/lustre/osc/lustre-OST0000-osc/s\
-tats on 192.168.10.34@tcp
-<anchor xml:id="dbdoclet.50438271_pgfId-1290924" xreflabel=""/>snapshot_time                      1189732762.835363
-<anchor xml:id="dbdoclet.50438271_pgfId-1290925" xreflabel=""/>ost_create                 1
-<anchor xml:id="dbdoclet.50438271_pgfId-1290926" xreflabel=""/>ost_get_info                       1
-<anchor xml:id="dbdoclet.50438271_pgfId-1290927" xreflabel=""/>ost_connect                        1
-<anchor xml:id="dbdoclet.50438271_pgfId-1290928" xreflabel=""/>ost_set_info                       1
-<anchor xml:id="dbdoclet.50438271_pgfId-1290929" xreflabel=""/>obd_ping                   212
-</screen>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1290930" xreflabel=""/>To clear the statistics, give the -c option to llstat.pl. To specify how frequently the statistics should be cleared (in seconds), use an integer for the -i option. This is sample output with -c and -i10 options used, providing statistics every 10s):</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1290931" xreflabel=""/>$ llstat -c -i10 /proc/fs/lustre/ost/OSS/ost_io/stats
-<anchor xml:id="dbdoclet.50438271_pgfId-1290932" xreflabel=""/> 
-<anchor xml:id="dbdoclet.50438271_pgfId-1290933" xreflabel=""/>/usr/bin/llstat: STATS on 06/06/07 /proc/fs/lustre/ost/OSS/ost_io/ stats on\
- 192.168.16.35@tcp
-<anchor xml:id="dbdoclet.50438271_pgfId-1290934" xreflabel=""/>snapshot_time                              1181074093.276072
-<anchor xml:id="dbdoclet.50438271_pgfId-1290935" xreflabel=""/> 
-<anchor xml:id="dbdoclet.50438271_pgfId-1290936" xreflabel=""/>/proc/fs/lustre/ost/OSS/ost_io/stats @ 1181074103.284895
-<anchor xml:id="dbdoclet.50438271_pgfId-1290937" xreflabel=""/>Name               Cur.Count       Cur.Rate        #Events Unit            \
-\last               min             avg             max             stddev
-<anchor xml:id="dbdoclet.50438271_pgfId-1290938" xreflabel=""/>req_waittime       8               0               8       [usec]          \
-2078\               34              259.75          868             317.49
-<anchor xml:id="dbdoclet.50438271_pgfId-1290939" xreflabel=""/>req_qdepth 8               0               8       [reqs]          1\      \
-    0               0.12            1               0.35
-<anchor xml:id="dbdoclet.50438271_pgfId-1290940" xreflabel=""/>req_active 8               0               8       [reqs]          11\     \
-            1               1.38            2               0.52
-<anchor xml:id="dbdoclet.50438271_pgfId-1290941" xreflabel=""/>reqbuf_avail       8               0               8       [bufs]          \
-511\                63              63.88           64              0.35
-<anchor xml:id="dbdoclet.50438271_pgfId-1290942" xreflabel=""/>ost_write  8               0               8       [bytes]         1697677\\
-    72914           212209.62       387579          91874.29
-<anchor xml:id="dbdoclet.50438271_pgfId-1290943" xreflabel=""/> 
-<anchor xml:id="dbdoclet.50438271_pgfId-1290944" xreflabel=""/>/proc/fs/lustre/ost/OSS/ost_io/stats @ 1181074113.290180
-<anchor xml:id="dbdoclet.50438271_pgfId-1290945" xreflabel=""/>Name               Cur.Count       Cur.Rate        #Events Unit            \
-\last               min             avg             max             stddev
-<anchor xml:id="dbdoclet.50438271_pgfId-1290946" xreflabel=""/>req_waittime       31              3               39      [usec]          \
-30011\              34              822.79          12245           2047.71
-<anchor xml:id="dbdoclet.50438271_pgfId-1290947" xreflabel=""/>req_qdepth 31              3               39      [reqs]          0\      \
-    0               0.03            1               0.16
-<anchor xml:id="dbdoclet.50438271_pgfId-1290948" xreflabel=""/>req_active 31              3               39      [reqs]          58\     \
-    1               1.77            3               0.74
-<anchor xml:id="dbdoclet.50438271_pgfId-1290949" xreflabel=""/>reqbuf_avail       31              3               39      [bufs]          \
-1977\               63              63.79           64              0.41
-<anchor xml:id="dbdoclet.50438271_pgfId-1290950" xreflabel=""/>ost_write  30              3               38      [bytes]         10284679\
-\   15019           315325.16       910694          197776.51
-<anchor xml:id="dbdoclet.50438271_pgfId-1290951" xreflabel=""/> 
-<anchor xml:id="dbdoclet.50438271_pgfId-1290952" xreflabel=""/>/proc/fs/lustre/ost/OSS/ost_io/stats @ 1181074123.325560
-<anchor xml:id="dbdoclet.50438271_pgfId-1290953" xreflabel=""/>Name               Cur.Count       Cur.Rate        #Events Unit            \
-\last               min             avg             max             stddev
-<anchor xml:id="dbdoclet.50438271_pgfId-1290954" xreflabel=""/>req_waittime       21              2               60      [usec]          \
-14970\              34              784.32          12245           1878.66
-<anchor xml:id="dbdoclet.50438271_pgfId-1290955" xreflabel=""/>req_qdepth 21              2               60      [reqs]          0\      \
-    0               0.02            1               0.13
-<anchor xml:id="dbdoclet.50438271_pgfId-1290956" xreflabel=""/>req_active 21              2               60      [reqs]          33\     \
-            1               1.70            3               0.70
-<anchor xml:id="dbdoclet.50438271_pgfId-1290957" xreflabel=""/>reqbuf_avail       21              2               60      [bufs]          \
-1341\               63              63.82           64              0.39
-<anchor xml:id="dbdoclet.50438271_pgfId-1290958" xreflabel=""/>ost_write  21              2               59      [bytes]         7648424\\
-    15019           332725.08       910694          180397.87
-</screen>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1292572" xreflabel=""/>Where:</para>
+  </section>
+  <section remap="h3">
+    <title><indexterm>
+        <primary>proc</primary>
+        <secondary>LNet</secondary>
+      </indexterm><indexterm>
+        <primary>LNet</primary>
+        <secondary>proc</secondary>
+      </indexterm>Monitoring LNet</title>
+    <para>LNet information is available via <literal>lctl get_param</literal>
+      through these parameters:
+      <itemizedlist>
+        <listitem>
+          <para><literal>peers</literal> - Shows all NIDs known to this node
+            and provides information on the queue state.</para>
+          <para>Example:</para>
+          <screen># lctl get_param peers
+nid                refs   state  max  rtr  min   tx    min   queue
+0@lo               1      ~rtr   0    0    0     0     0     0
+192.168.10.35@tcp  1      ~rtr   8    8    8     8     6     0
+192.168.10.36@tcp  1      ~rtr   8    8    8     8     6     0
+192.168.10.37@tcp  1      ~rtr   8    8    8     8     6     0</screen>
+          <para>The fields are explained in the table below:</para>
           <informaltable frame="all">
             <tgroup cols="2">
-              <colspec colname="c1" colwidth="50*"/>
-              <colspec colname="c2" colwidth="50*"/>
+              <colspec colname="c1" colwidth="30*"/>
+              <colspec colname="c2" colwidth="80*"/>
               <thead>
                 <row>
-                  <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1292580" xreflabel=""/>Parameter</emphasis></para></entry>
-                  <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1292582" xreflabel=""/>Description</emphasis></para></entry>
+                  <entry>
+                    <para><emphasis role="bold">Field</emphasis></para>
+                  </entry>
+                  <entry>
+                    <para><emphasis role="bold">Description</emphasis></para>
+                  </entry>
                 </row>
               </thead>
               <tbody>
                 <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292631" xreflabel=""/><emphasis role="bold">Cur. Count</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292633" xreflabel=""/>Number of events of each type sent in the last interval (in this example, 10s)</para></entry>
-                </row>
-                <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292635" xreflabel=""/><emphasis role="bold">Cur. Rate</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292637" xreflabel=""/>Number of events per second in the last interval</para></entry>
+                  <entry>
+                    <para>
+                      <literal>refs</literal>
+                    </para>
+                  </entry>
+                  <entry>
+                    <para>A reference count. </para>
+                  </entry>
                 </row>
                 <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292639" xreflabel=""/><emphasis role="bold">#Events</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292641" xreflabel=""/>Total number of such events since the system started</para></entry>
+                  <entry>
+                    <para>
+                      <literal>state</literal>
+                    </para>
+                  </entry>
+                  <entry>
+                    <para>If the node is a router, indicates the state of the router. Possible
+                      values are:</para>
+                    <itemizedlist>
+                      <listitem>
+                        <para><literal>NA</literal> - Indicates the node is not a router.</para>
+                      </listitem>
+                      <listitem>
+                        <para><literal>up/down</literal>- Indicates if the node (router) is up or
+                          down.</para>
+                      </listitem>
+                    </itemizedlist>
+                  </entry>
                 </row>
                 <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292643" xreflabel=""/><emphasis role="bold">Unit</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292645" xreflabel=""/>Unit of measurement for that statistic (microseconds, requests, buffers)</para></entry>
+                  <entry>
+                    <para>
+                      <literal>max </literal></para>
+                  </entry>
+                  <entry>
+                    <para>Maximum number of concurrent sends from this peer.</para>
+                  </entry>
                 </row>
                 <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292647" xreflabel=""/><emphasis role="bold">last</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292649" xreflabel=""/>Average rate of these events (in units/event) for the last interval during which they arrived. For instance, in the above mentioned case of ost_destroy it took an average of 736 microseconds per destroy for the 400 object destroys in the previous 10 seconds.</para></entry>
+                  <entry>
+                    <para>
+                      <literal>rtr </literal></para>
+                  </entry>
+                  <entry>
+                    <para>Number of available routing buffer credits.</para>
+                  </entry>
                 </row>
                 <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292651" xreflabel=""/><emphasis role="bold">min</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292653" xreflabel=""/>Minimum rate (in units/events) since the service started</para></entry>
+                  <entry>
+                    <para>
+                      <literal>min </literal></para>
+                  </entry>
+                  <entry>
+                    <para>Minimum number of routing buffer credits seen.</para>
+                  </entry>
                 </row>
                 <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292655" xreflabel=""/><emphasis role="bold">avg</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292657" xreflabel=""/>Average rate</para></entry>
+                  <entry>
+                    <para>
+                      <literal>tx </literal></para>
+                  </entry>
+                  <entry>
+                    <para>Number of available send credits.</para>
+                  </entry>
                 </row>
                 <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292659" xreflabel=""/><emphasis role="bold">max</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292661" xreflabel=""/>Maximum rate</para></entry>
+                  <entry>
+                    <para>
+                      <literal>min </literal></para>
+                  </entry>
+                  <entry>
+                    <para>Minimum number of send credits seen.</para>
+                  </entry>
                 </row>
                 <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292663" xreflabel=""/><emphasis role="bold">stddev</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292665" xreflabel=""/>Standard deviation (not measured in all cases)</para></entry>
+                  <entry>
+                    <para>
+                      <literal>queue </literal></para>
+                  </entry>
+                  <entry>
+                    <para>Total bytes in active/queued sends.</para>
+                  </entry>
                 </row>
               </tbody>
             </tgroup>
           </informaltable>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1292739" xreflabel=""/>The events common to all services are:</para>
+          <para>Credits are initialized to allow a certain number of operations
+            (eight in the example above, as shown in the
+            <literal>max</literal> column). LNet keeps track of the minimum
+            number of credits ever seen over time, showing the peak congestion
+            that has occurred during the time monitored. Fewer available credits
+            indicate a more congested resource.</para>
+          <para>The number of credits currently available is shown in the
+            <literal>tx</literal> column. The maximum number of send credits is
+            shown in the <literal>max</literal> column and never changes. The
+            number of currently active transmits can be derived by
+            <literal>(max - tx)</literal>, as long as
+            <literal>tx</literal> is greater than or equal to 0. Once
+            <literal>tx</literal> is less than 0, it indicates the number of
+            transmits on that peer which have been queued for lack of credits.
+          </para>
+          <para>The number of router buffer credits available for consumption
+            by a peer is shown in the <literal>rtr</literal> column. The number
+            of routing credits can be configured separately at the LND level or
+            at the LNet level by using the
+            <literal>peer_buffer_credits</literal> module parameter for the
+            appropriate module. If the routing credits are not set explicitly,
+            they default to the maximum transmit credits defined by the
+            <literal>peer_credits</literal> module parameter.
+            Whenever a gateway routes a message from a peer, it decrements the
+            number of available routing credits for that peer. If that value
+            goes to zero, then messages will be queued. Negative values show the
+            number of queued messages waiting to be routed. The number of
+            messages which are currently being routed from a peer can be derived
+            by <literal>(max_rtr_credits - rtr)</literal>.</para>
+          <para>LNet also limits the number of concurrent sends and router
+            buffers allocated to a single peer so that no peer can occupy all
+            of the resources. A sketch for sampling these counters over time
+            follows this list.
+          </para>
+        </listitem>
+        <listitem>
+          <para><literal>nis</literal> - Shows current queue health on the node.
+          </para>
+          <para>Example:</para>
+          <screen># lctl get_param nis
+nid                    refs   peer    max   tx    min
+0@lo                   3      0       0     0     0
+192.168.10.34@tcp      4      8       256   256   252
+</screen>
+          <para> The fields are explained in the table below.</para>
           <informaltable frame="all">
             <tgroup cols="2">
-              <colspec colname="c1" colwidth="50*"/>
-              <colspec colname="c2" colwidth="50*"/>
+              <colspec colname="c1" colwidth="30*"/>
+              <colspec colname="c2" colwidth="80*"/>
               <thead>
                 <row>
-                  <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1292673" xreflabel=""/>Parameter</emphasis></para></entry>
-                  <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1292675" xreflabel=""/>Description</emphasis></para></entry>
+                  <entry>
+                    <para><emphasis role="bold">Field</emphasis></para>
+                  </entry>
+                  <entry>
+                    <para><emphasis role="bold">Description</emphasis></para>
+                  </entry>
                 </row>
               </thead>
               <tbody>
                 <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292720" xreflabel=""/><emphasis role="bold">req_waittime</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292722" xreflabel=""/>Amount of time a request waited in the queue before being handled by an available server thread.</para></entry>
+                  <entry>
+                    <para>
+                      <literal> nid </literal></para>
+                  </entry>
+                  <entry>
+                    <para>Network interface.</para>
+                  </entry>
                 </row>
                 <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292724" xreflabel=""/><emphasis role="bold">req_qdepth</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292726" xreflabel=""/>Number of requests waiting to be handled in the queue for this service.</para></entry>
+                  <entry>
+                    <para>
+                      <literal> refs </literal></para>
+                  </entry>
+                  <entry>
+                    <para>Internal reference counter.</para>
+                  </entry>
                 </row>
                 <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292728" xreflabel=""/><emphasis role="bold">req_active</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292730" xreflabel=""/>Number of requests currently being handled.</para></entry>
+                  <entry>
+                    <para>
+                      <literal> peer </literal></para>
+                  </entry>
+                  <entry>
+                    <para>Number of peer-to-peer send credits on this NID. Credits are used to size
+                      buffer pools.</para>
+                  </entry>
                 </row>
                 <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292732" xreflabel=""/><emphasis role="bold">reqbuf_avail</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292734" xreflabel=""/>Number of unsolicited lnet request buffers for this service.</para></entry>
+                  <entry>
+                    <para>
+                      <literal> max </literal></para>
+                  </entry>
+                  <entry>
+                    <para>Total number of send credits on this NID.</para>
+                  </entry>
                 </row>
-              </tbody>
-            </tgroup>
-          </informaltable>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1291033" xreflabel=""/>Some service-specific events of interest are:</para>
-          <informaltable frame="all">
-            <tgroup cols="2">
-              <colspec colname="c1" colwidth="50*"/>
-              <colspec colname="c2" colwidth="50*"/>
-              <thead>
                 <row>
-                  <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1292473" xreflabel=""/>Parameter</emphasis></para></entry>
-                  <entry><para><emphasis role="bold"><anchor xml:id="dbdoclet.50438271_pgfId-1292475" xreflabel=""/>Description</emphasis></para></entry>
+                  <entry>
+                    <para>
+                      <literal> tx </literal></para>
+                  </entry>
+                  <entry>
+                    <para>Current number of send credits available on this NID.</para>
+                  </entry>
                 </row>
-              </thead>
-              <tbody>
                 <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292507" xreflabel=""/><emphasis role="bold">ldlm_enqueue</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292509" xreflabel=""/>Time it takes to enqueue a lock (this includes file open on the MDS)</para></entry>
+                  <entry>
+                    <para>
+                      <literal> min </literal></para>
+                  </entry>
+                  <entry>
+                    <para>Lowest number of send credits available on this NID.</para>
+                  </entry>
                 </row>
                 <row>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292511" xreflabel=""/><emphasis role="bold">mds_reint</emphasis></para></entry>
-                  <entry><para> <anchor xml:id="dbdoclet.50438271_pgfId-1292513" xreflabel=""/>Time it takes to process an MDS modification record (includes create, mkdir, unlink, rename and setattr)</para></entry>
+                  <entry>
+                    <para>
+                      <literal> queue </literal></para>
+                  </entry>
+                  <entry>
+                    <para>Total bytes in active/queued sends.</para>
+                  </entry>
                 </row>
               </tbody>
             </tgroup>
           </informaltable>
-        </section>
-        <section remap="h4">
-          <title><anchor xml:id="dbdoclet.50438271_pgfId-1291461" xreflabel=""/>31.3.1.2 Interpreting MDT Statistics</title>
-          <note><para>See also <xref linkend="dbdoclet.50438219_84890"/> (llobdstat) and <xref linkend="dbdoclet.50438273_80593"/> (CollectL).</para></note>
-          <para><anchor xml:id="dbdoclet.50438271_pgfId-1297382" xreflabel=""/>The MDT .../stats files can be used to track MDT statistics for the MDS. Here is sample output for an MDT stats file:</para>
-          <screen><anchor xml:id="dbdoclet.50438271_pgfId-1297438" xreflabel=""/># cat /proc/fs/lustre/mds/*-MDT0000/stats 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297466" xreflabel=""/>snapshot_time                              1244832003.676892 secs.usecs 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297469" xreflabel=""/>open                                       2 samples [reqs] 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297472" xreflabel=""/>close                                      1 samples [reqs] 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297475" xreflabel=""/>getxattr                           3 samples [reqs] 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297478" xreflabel=""/>process_config                             1 samples [reqs] 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297481" xreflabel=""/>connect                                    2 samples [reqs] 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297484" xreflabel=""/>disconnect                         2 samples [reqs] 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297487" xreflabel=""/>statfs                                     3 samples [reqs] 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297490" xreflabel=""/>setattr                                    1 samples [reqs] 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297493" xreflabel=""/>getattr                                    3 samples [reqs] 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297496" xreflabel=""/>llog_init                          6 samples [reqs] 
-<anchor xml:id="dbdoclet.50438271_pgfId-1297499" xreflabel=""/>notify                                     16 samples [reqs]
+          <para><emphasis role="bold"><emphasis role="italic">Analysis:</emphasis></emphasis></para>
+          <para>Subtracting <literal>tx</literal> from <literal>max</literal>
+              (<literal>max</literal> - <literal>tx</literal>) yields the number of sends currently
+            active. A large or increasing number of active sends may indicate a problem.</para>
+        </listitem>
+      </itemizedlist></para>
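+    <para>As a simple way to watch for congestion over time, these counters
+      can be sampled periodically.  The following is a minimal sketch (the
+      10-second interval and the <literal>@tcp</literal> filter are arbitrary
+      choices) that repeatedly prints the <literal>peers</literal> lines for
+      TCP NIDs so that drops in the <literal>tx</literal> and
+      <literal>rtr</literal> columns can be spotted:</para>
+    <screen># while true; do date; lctl get_param peers | grep @tcp; sleep 10; done</screen>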
+  </section>
+  <section remap="h3" xml:id="dbdoclet.balancing_free_space">
+    <title><indexterm>
+        <primary>proc</primary>
+        <secondary>free space</secondary>
+      </indexterm>Allocating Free Space on OSTs</title>
+    <para>Free space is allocated using either a round-robin or a weighted
+    algorithm. The allocation method is determined by the maximum amount of
+    free-space imbalance between the OSTs. When free space is relatively
+    balanced across OSTs, the faster round-robin allocator is used, which
+    maximizes network balancing. The weighted allocator is used when any two
+    OSTs are out of balance by more than a specified threshold.</para>
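+    <para>The current free space and usage of each OST, which determine which
+      allocator is in use, can be checked from a client with
+      <literal>lfs df</literal> (a sketch;
+      <literal>/mnt/testfs</literal> is a hypothetical mount point and the
+      per-OST output is omitted here):</para>
+    <screen>$ lfs df -h /mnt/testfs</screen>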
+    <para>Free space distribution can be tuned using the following
+    tunable parameters; a sketch of inspecting and adjusting them follows
+    the list:
+    <itemizedlist>
+      <listitem>
+        <para><literal>lod.*.qos_threshold_rr</literal> - The threshold at which
+        the allocation method switches from round-robin to weighted is set
+        in this file. The default is to switch to the weighted algorithm when
+        any two OSTs are out of balance by more than 17 percent.</para>
+      </listitem>
+      <listitem>
+        <para><literal>lod.*.qos_prio_free</literal> - The weighting priority
+        used by the weighted allocator can be adjusted in this file. Increasing
+        the value of <literal>qos_prio_free</literal> puts more weighting on the
+        amount of free space available on each OST and less on how stripes are
+        distributed across OSTs. The default value is 91 percent weighting for
+        free space rebalancing and 9 percent for OST balancing. When the
+        free space priority is set to 100, weighting is based entirely on free
+        space and location is no longer used by the striping algorithm.</para>
+      </listitem>
+      <listitem>
+        <para condition="l29"><literal>osp.*.reserved_mb_low</literal>
+          - The low watermark used to stop object allocation if available space
+          is less than this. The default is 0.1% of total OST size.</para>
+      </listitem>
+       <listitem>
+        <para condition="l29"><literal>osp.*.reserved_mb_high</literal>
+          - The high watermark used to start object allocation if available
+          space is more than this. The default is 0.2% of total OST size.</para>
+      </listitem>
+    </itemizedlist>
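+    <para>For example, the allocator tunables can be inspected and adjusted
+      with <literal>lctl</literal> on the MDS.  The following sketch (the
+      value of 25 percent is illustrative only) checks the current settings
+      and raises the imbalance threshold at which the weighted allocator
+      takes over:</para>
+    <screen># lctl get_param lod.*.qos_threshold_rr lod.*.qos_prio_free
+# lctl set_param lod.*.qos_threshold_rr=25</screen>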
+    <para>For more information about monitoring and managing free space, see <xref
+        xmlns:xlink="http://www.w3.org/1999/xlink" linkend="dbdoclet.50438209_10424"/>.</para>
+  </section>
+  <section remap="h3">
+    <title><indexterm>
+        <primary>proc</primary>
+        <secondary>locking</secondary>
+      </indexterm>Configuring Locking</title>
+    <para>The <literal>lru_size</literal> parameter is used to control the
+      number of client-side locks in the LRU cached locks queue. LRU size is
+      normally dynamic, based on load to optimize the number of locks cached
+      on nodes that have different workloads (e.g., login/build nodes vs.
+      compute nodes vs. backup nodes).</para>
+    <para>The total number of locks available is a function of the server RAM.
+      The default limit is 50 locks/1 MB of RAM. If memory pressure is too high,
+      the LRU size is shrunk. The number of locks on the server is limited to
+      <replaceable>num_osts_per_oss * num_clients * lru_size</replaceable>
+      as follows: </para>
+    <itemizedlist>
+      <listitem>
+        <para>To enable automatic LRU sizing, set the
+        <literal>lru_size</literal> parameter to 0. In this case, the
+        <literal>lru_size</literal> parameter shows the current number of locks
+        being used on the client. Dynamic LRU resizing is enabled by default.
+        </para>
+      </listitem>
+      <listitem>
+        <para>To specify a maximum number of locks, set the
+        <literal>lru_size</literal> parameter to a value other than zero.
+        A good default value for compute nodes is around
+        <literal>100 * <replaceable>num_cpus</replaceable></literal>,
+        as shown in the sketch following this list.
+        It is recommended that you only set <literal>lru_size</literal>
+        to be significantly larger on a few login nodes where multiple
+        users access the file system interactively.</para>
+      </listitem>
+    </itemizedlist>
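+    <para>For example, on a hypothetical 32-core compute node, the following
+      sketch fixes the LRU size at 3200 locks
+      (<literal>100 * num_cpus</literal>) for all OSC namespaces:</para>
+    <screen># lctl set_param ldlm.namespaces.*osc*.lru_size=3200</screen>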
+    <para>To clear the LRU on a single client, and, as a result, flush client
+      cache without changing the <literal>lru_size</literal> value, run:</para>
+    <screen># lctl set_param ldlm.namespaces.<replaceable>osc_name|mdc_name</replaceable>.lru_size=clear</screen>
+    <para>If the LRU size is set lower than the number of existing locks,
+      <emphasis>unused</emphasis> locks are canceled immediately. Use
+      <literal>clear</literal> to cancel all locks without changing the value.
+    </para>
+    <note>
+      <para>The <literal>lru_size</literal> parameter can only be set
+        temporarily using <literal>lctl set_param</literal>; it cannot be set
+        permanently.</para>
+    </note>
+    <para>To disable dynamic LRU resizing on the clients by setting a fixed
+      LRU size, run, for example:
+    </para>
+    <screen># lctl set_param ldlm.namespaces.*osc*.lru_size=5000</screen>
+    <para>To determine the number of locks being granted with dynamic LRU
+      resizing, run:</para>
+    <screen>$ lctl get_param ldlm.namespaces.*.pool.limit</screen>
+    <para>The <literal>lru_max_age</literal> parameter is used to control the
+      age of client-side locks in the LRU cached locks queue. This limits how
+      long unused locks are cached on the client, and prevents idle clients
+      from holding locks for an excessive time, which reduces memory usage on
+      both the client and server, as well as reducing work during server
+      recovery.
+    </para>
+    <para>The <literal>lru_max_age</literal> is set and printed in milliseconds,
+      and by default is 3900000 ms (65 minutes).</para>
+    <para condition='l2B'>Since Lustre 2.11, in addition to setting the
+      maximum lock age in milliseconds, it can also be set using a suffix of
+      <literal>s</literal> or <literal>ms</literal> to indicate seconds or
+      milliseconds, respectively.  For example to set the client's maximum
+      lock age to 15 minutes (900s) run:
+    </para>
+    <screen>
+# lctl set_param ldlm.namespaces.*MDT*.lru_max_age=900s
+# lctl get_param ldlm.namespaces.*MDT*.lru_max_age
+ldlm.namespaces.myth-MDT0000-mdc-ffff8804296c2800.lru_max_age=900000
+    </screen>
+  </section>
+  <section xml:id="dbdoclet.50438271_87260">
+    <title><indexterm>
+        <primary>proc</primary>
+        <secondary>thread counts</secondary>
+      </indexterm>Setting MDS and OSS Thread Counts</title>
+    <para>MDS and OSS thread count tunables can be used to set the minimum and
+      maximum thread counts, or to get the current number of running threads,
+      for the services listed in the table below.</para>
+    <informaltable frame="all">
+      <tgroup cols="2">
+        <colspec colname="c1" colwidth="50*"/>
+        <colspec colname="c2" colwidth="50*"/>
+        <tbody>
+          <row>
+            <entry>
+              <para>
+                <emphasis role="bold">Service</emphasis></para>
+            </entry>
+            <entry>
+              <para>
+                <emphasis role="bold">Description</emphasis></para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <literal> mds.MDS.mdt </literal>
+            </entry>
+            <entry>
+              <para>Main metadata operations service</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <literal> mds.MDS.mdt_readpage </literal>
+            </entry>
+            <entry>
+              <para>Metadata <literal>readdir</literal> service</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <literal> mds.MDS.mdt_setattr </literal>
+            </entry>
+            <entry>
+              <para>Metadata <literal>setattr/close</literal> operations service </para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <literal> ost.OSS.ost </literal>
+            </entry>
+            <entry>
+              <para>Main data operations service</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <literal> ost.OSS.ost_io </literal>
+            </entry>
+            <entry>
+              <para>Bulk data I/O services</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <literal> ost.OSS.ost_create </literal>
+            </entry>
+            <entry>
+              <para>OST object pre-creation service</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <literal> ldlm.services.ldlm_canceld </literal>
+            </entry>
+            <entry>
+              <para>DLM lock cancel service</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <literal> ldlm.services.ldlm_cbd </literal>
+            </entry>
+            <entry>
+              <para>DLM lock grant service</para>
+            </entry>
+          </row>
+        </tbody>
+      </tgroup>
+    </informaltable>
+    <para>For each service, the tunable parameters shown below are available.
+    </para>
+    <itemizedlist>
+      <listitem>
+        <para>To temporarily set these tunables, run:</para>
+        <screen># lctl set_param <replaceable>service</replaceable>.threads_<replaceable>min|max|started</replaceable>=<replaceable>num</replaceable></screen>
+        </listitem>
+      <listitem>
+        <para>To permanently set this tunable, run the following command on
+        the MGS:
+        <screen>mgs# lctl set_param -P <replaceable>service</replaceable>.threads_<replaceable>min|max|started</replaceable></screen></para>
+        <para condition='l25'>For Lustre 2.5 or earlier, run:
+        <screen>mgs# lctl conf_param <replaceable>obdname|fsname.obdtype</replaceable>.threads_<replaceable>min|max|started</replaceable></screen>
+        </para>
+      </listitem>
+    </itemizedlist>
+      <para>The following examples show how to set thread counts and get the
+        number of running threads for the service <literal>ost_io</literal>
+        using the tunable
+        <literal><replaceable>service</replaceable>.threads_<replaceable>min|max|started</replaceable></literal>.</para>
+    <itemizedlist>
+      <listitem>
+        <para>To get the number of running threads, run:</para>
+        <screen># lctl get_param ost.OSS.ost_io.threads_started
+ost.OSS.ost_io.threads_started=128</screen>
+      </listitem>
+      <listitem>
+        <para>To get the maximum number of threads (512 in this example), run:</para>
+        <screen># lctl get_param ost.OSS.ost_io.threads_max
+ost.OSS.ost_io.threads_max=512</screen>
+      </listitem>
+      <listitem>
+        <para>To set the maximum thread count to 256 instead of 512 (for
+          example, to avoid overloading the storage array with requests), run:</para>
+        <screen># lctl set_param ost.OSS.ost_io.threads_max=256
+ost.OSS.ost_io.threads_max=256</screen>
+      </listitem>
+      <listitem>
+        <para>To set the maximum thread count to 256 instead of 512 permanently, run:</para>
+        <screen># lctl conf_param testfs.ost.ost_io.threads_max=256</screen>
+        <para condition='l25'>For version 2.5 or later, run:
+        <screen># lctl set_param -P ost.OSS.ost_io.threads_max=256
+ost.OSS.ost_io.threads_max=256 </screen> </para>
+      </listitem>
+      <listitem>
+        <para> To check if the <literal>threads_max</literal> setting is active, run:</para>
+        <screen># lctl get_param ost.OSS.ost_io.threads_max
+ost.OSS.ost_io.threads_max=256</screen>
+      </listitem>
+    </itemizedlist>
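+    <para>Because wildcards are accepted in parameter names, the thread counts
+      for several services can also be surveyed with a single command.  A
+      sketch (the output, which varies by configuration, is omitted):</para>
+    <screen># lctl get_param ost.OSS.*.threads_started ost.OSS.*.threads_max</screen>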
+    <note>
+      <para>If the number of service threads is changed while the file system is running, the change
+        may not take effect until the file system is stopped and restarted. If the number of service
+        threads in use exceeds the new <literal>threads_max</literal> setting, service threads
+        that are already running will not be stopped.</para>
+    </note>
+    <para>See also <xref xmlns:xlink="http://www.w3.org/1999/xlink" linkend="lustretuning"/></para>
+  </section>
+  <section xml:id="dbdoclet.50438271_83523">
+    <title><indexterm>
+        <primary>proc</primary>
+        <secondary>debug</secondary>
+      </indexterm>Enabling and Interpreting Debugging Logs</title>
+    <para>By default, a detailed log of all operations is generated to aid in
+      debugging. Flags that control debugging are found via
+      <literal>lctl get_param debug</literal>.</para>
+    <para>The overhead of debugging can affect the performance of the Lustre
+      file system. Therefore, to minimize the impact on performance, the debug
+      level can be lowered, which affects the amount of debugging information
+      kept in the internal log buffer but does not alter the amount of
+      information that goes to syslog. You can raise the debug level when you
+      need to collect logs to debug problems. </para>
+    <para>The debugging mask can be set using &quot;symbolic names&quot;. The
+      symbolic format is shown in the examples below.
+      <itemizedlist>
+        <listitem>
+          <para>To verify the debug level used, examine the parameter that
+            controls debugging by running:</para>
+          <screen># lctl get_param debug 
+debug=
+ioctl neterror warning error emerg ha config console</screen>
+        </listitem>
+        <listitem>
+          <para>To turn off debugging except for network error debugging, run
+          the following command on all nodes concerned:</para>
+          <screen># lctl set_param debug=&quot;neterror&quot;
+debug=neterror</screen>
+        </listitem>
+      </itemizedlist>
+      <itemizedlist>
+        <listitem>
+          <para>To turn off debugging completely (except for the minimum error
+            reporting to the console), run the following command on all nodes
+            concerned:</para>
+          <screen># lctl set_param debug=0 
+debug=0</screen>
+        </listitem>
+        <listitem>
+          <para>To set an appropriate debug level for a production environment,
+            run:</para>
+          <screen># lctl set_param debug=&quot;warning dlmtrace error emerg ha rpctrace vfstrace&quot; 
+debug=warning dlmtrace error emerg ha rpctrace vfstrace</screen>
+          <para>The flags shown in this example collect enough high-level
+            information to aid debugging, but they do not cause any serious
+            performance impact.</para>
+        </listitem>
+      </itemizedlist>
+      <itemizedlist>
+        <listitem>
+          <para>To add new flags to flags that have already been set,
+            precede each one with a &quot;<literal>+</literal>&quot;:</para>
+          <screen># lctl set_param debug=&quot;+neterror +ha&quot; 
+debug=+neterror +ha
+# lctl get_param debug 
+debug=neterror warning error emerg ha console</screen>
+        </listitem>
+        <listitem>
+          <para>To remove individual flags, precede them with a
+            &quot;<literal>-</literal>&quot;:</para>
+          <screen># lctl set_param debug=&quot;-ha&quot; 
+debug=-ha
+# lctl get_param debug 
+debug=neterror warning error emerg console</screen>
+        </listitem>
+      </itemizedlist>
+    </para>
+    <para>Debugging parameters include:</para>
+    <itemizedlist>
+      <listitem>
+        <para><literal>subsystem_debug</literal> - Controls the debug logs for subsystems.</para>
+      </listitem>
+      <listitem>
+        <para><literal>debug_path</literal> - Indicates the location where the debug log is dumped
+          when triggered automatically or manually. The default path is
+            <literal>/tmp/lustre-log</literal>.</para>
+      </listitem>
+    </itemizedlist>
+    <para>These parameters can also be set using:<screen>sysctl -w lnet.debug={value}</screen></para>
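+    <para>For example, the current values of these parameters can be viewed
+      with <literal>lctl</literal> (a sketch; the output varies by node and is
+      omitted here):</para>
+    <screen># lctl get_param subsystem_debug debug_path</screen>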
+    <para>Additional useful parameters: <itemizedlist>
+        <listitem>
+          <para><literal>panic_on_lbug</literal> - Causes &apos;&apos;panic&apos;&apos; to be called
+            when the Lustre software detects an internal problem (an <literal>LBUG</literal> log
+            entry); panic crashes the node. This is particularly useful when a kernel crash dump
+            utility is configured. The crash dump is triggered when the internal inconsistency is
+            detected by the Lustre software. </para>
+        </listitem>
+        <listitem>
+          <para><literal>upcall</literal> - Allows you to specify the path to the binary which will
+            be invoked when an <literal>LBUG</literal> log entry is encountered. This binary is
+            called with four parameters:</para>
+          <para> - The string &apos;&apos;<literal>LBUG</literal>&apos;&apos;.</para>
+          <para> - The file where the <literal>LBUG</literal> occurred.</para>
+          <para> - The function name.</para>
+          <para> - The line number in the file.</para>
+        </listitem>
+      </itemizedlist></para>
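+    <para>For example, on a node where a kernel crash dump utility is
+      configured, <literal>panic_on_lbug</literal> can be enabled with the
+      following sketch (it can also be set with <literal>sysctl</literal>,
+      as noted above):</para>
+    <screen># lctl set_param panic_on_lbug=1</screen>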
+    <section>
+      <title>Interpreting OST Statistics</title>
+      <note>
+        <para>See also
+            <xref linkend="dbdoclet.50438273_80593"/> (<literal>collectl</literal>).</para>
+      </note>
+      <para>OST <literal>stats</literal> files can be used to provide statistics showing activity
+        for each OST. For example:</para>
+      <screen># lctl get_param osc.testfs-OST0000-osc.stats 
+snapshot_time                      1189732762.835363
+ost_create                 1
+ost_get_info               1
+ost_connect                1
+ost_set_info               1
+obd_ping                   212</screen>
+      <para>Use the <literal>llstat</literal> utility to monitor statistics over time.</para>
+      <para>To clear the statistics, use the <literal>-c</literal> option to
+        <literal>llstat</literal>. To specify how frequently the statistics
+        should be reported (in seconds), use the <literal>-i</literal> option.
+        In the example below, the <literal>-c</literal> option clears the
+        statistics and <literal>-i10</literal> option reports statistics every
+        10 seconds:</para>
+<screen role="smaller">$ llstat -c -i10 ost_io
+/usr/bin/llstat: STATS on 06/06/07 
+        /proc/fs/lustre/ost/OSS/ost_io/ stats on 192.168.16.35@tcp
+snapshot_time                              1181074093.276072
+/proc/fs/lustre/ost/OSS/ost_io/stats @ 1181074103.284895
+Name        Cur.  Cur. #
+            Count Rate Events Unit  last   min    avg       max    stddev
+req_waittime 8    0    8    [usec]  2078   34     259.75    868    317.49
+req_qdepth   8    0    8    [reqs]  1      0      0.12      1      0.35
+req_active   8    0    8    [reqs]  11     1      1.38      2      0.52
+reqbuf_avail 8    0    8    [bufs]  511    63     63.88     64     0.35
+ost_write    8    0    8    [bytes] 169767 72914  212209.62 387579 91874.29
+/proc/fs/lustre/ost/OSS/ost_io/stats @ 1181074113.290180
+Name        Cur.  Cur. #
+            Count Rate Events Unit  last    min   avg       max    stddev
+req_waittime 31   3    39   [usec]  30011   34    822.79    12245  2047.71
+req_qdepth   31   3    39   [reqs]  0       0     0.03      1      0.16
+req_active   31   3    39   [reqs]  58      1     1.77      3      0.74
+reqbuf_avail 31   3    39   [bufs]  1977    63    63.79     64     0.41
+ost_write    30   3    38   [bytes] 1028467 15019 315325.16 910694 197776.51
+/proc/fs/lustre/ost/OSS/ost_io/stats @ 1181074123.325560
+Name        Cur.  Cur. #
+            Count Rate Events Unit  last    min    avg       max    stddev
+req_waittime 21   2    60   [usec]  14970   34     784.32    12245  1878.66
+req_qdepth   21   2    60   [reqs]  0       0      0.02      1      0.13
+req_active   21   2    60   [reqs]  33      1      1.70      3      0.70
+reqbuf_avail 21   2    60   [bufs]  1341    63     63.82     64     0.39
+ost_write    21   2    59   [bytes] 7648424 15019  332725.08 910694 180397.87
 </screen>
-        </section>
-      </section>
+      <para>The columns in this example are described in the table below.</para>
+      <informaltable frame="all">
+        <tgroup cols="2">
+          <colspec colname="c1" colwidth="50*"/>
+          <colspec colname="c2" colwidth="50*"/>
+          <thead>
+            <row>
+              <entry>
+                <para><emphasis role="bold">Parameter</emphasis></para>
+              </entry>
+              <entry>
+                <para><emphasis role="bold">Description</emphasis></para>
+              </entry>
+            </row>
+          </thead>
+          <tbody>
+            <row>
+              <entry><literal>Name</literal></entry>
+              <entry>Name of the service event.  See the tables below for descriptions of service
+                events that are tracked.</entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>Cur. Count </literal></para>
+              </entry>
+              <entry>
+                <para>Number of events of each type sent in the last interval.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal>Cur. Rate </literal></para>
+              </entry>
+              <entry>
+                <para>Number of events per second in the last interval.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal> # Events </literal></para>
+              </entry>
+              <entry>
+                <para>Total number of events of this type since the service started or the
+                  counters were last cleared.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal> Unit </literal></para>
+              </entry>
+              <entry>
+                <para>Unit of measurement for that statistic (microseconds, requests,
+                  buffers, bytes).</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal> last </literal></para>
+              </entry>
+              <entry>
+                <para>Sum of the values for these events (in the given units) during the last
+                  interval. Dividing <literal>last</literal> by <literal>Cur. Count</literal>
+                  gives the average value per event for that interval. For example, in the
+                  first interval of the output above, <literal>req_waittime</literal> shows a
+                  <literal>last</literal> value of 2078 microseconds for 8 requests, an average
+                  of about 260 microseconds per request.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal> min </literal></para>
+              </entry>
+              <entry>
+                <para>Minimum value observed for a single event (in the given units) since the
+                  service started.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal> avg </literal></para>
+              </entry>
+              <entry>
+                <para>Average value per event (in the given units) since the service
+                  started.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal> max </literal></para>
+              </entry>
+              <entry>
+                <para>Maximum value observed for a single event since the service
+                  started.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal> stddev </literal></para>
+              </entry>
+              <entry>
+                <para>Standard deviation of the per-event values (not measured in some
+                  cases).</para>
+              </entry>
+            </row>
+          </tbody>
+        </tgroup>
+      </informaltable>
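+      <para>The <literal>llstat</literal> output above is derived from the raw
+      running counters in the <literal>ost.OSS.ost_io.stats</literal>
+      parameter: each line of that file carries the event count, minimum,
+      maximum, and running sum of the values, and <literal>llstat</literal>
+      reports the differences between successive snapshots.  As a rough
+      illustration of how the <literal>Cur. Count</literal> and
+      <literal>last</literal> columns relate to those counters, the sketch
+      below samples the raw stats file twice by hand and computes the same
+      per-interval values for <literal>ost_write</literal>.  The field
+      positions used in the <literal>awk</literal> script (event count in the
+      second field, running sum of bytes in the seventh) are assumptions about
+      the raw stats file layout, and the output line shown corresponds to the
+      last interval of the <literal>llstat</literal> example above.</para>
+<screen role="smaller">oss# lctl get_param -n ost.OSS.ost_io.stats > /tmp/ost_io.1
+oss# sleep 10
+oss# lctl get_param -n ost.OSS.ost_io.stats > /tmp/ost_io.2
+oss# awk 'NR==FNR {if ($1=="ost_write") {n=$2; s=$7}; next}
+     $1=="ost_write" {printf "Cur. Count %d  last %d bytes  (%.2f MB/s)\n",
+     $2-n, $7-s, ($7-s)/10/1048576}' /tmp/ost_io.1 /tmp/ost_io.2
+Cur. Count 21  last 7648424 bytes  (0.73 MB/s)</screen>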
+      <para>Events common to all services are shown in the table below.</para>
+      <informaltable frame="all">
+        <tgroup cols="2">
+          <colspec colname="c1" colwidth="50*"/>
+          <colspec colname="c2" colwidth="50*"/>
+          <thead>
+            <row>
+              <entry>
+                <para><emphasis role="bold">Parameter</emphasis></para>
+              </entry>
+              <entry>
+                <para><emphasis role="bold">Description</emphasis></para>
+              </entry>
+            </row>
+          </thead>
+          <tbody>
+            <row>
+              <entry>
+                <para>
+                  <literal> req_waittime </literal></para>
+              </entry>
+              <entry>
+                <para>Amount of time a request waited in the queue before being handled by an
+                  available server thread.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal> req_qdepth </literal></para>
+              </entry>
+              <entry>
+                <para>Number of requests waiting to be handled in the queue for this service.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal> req_active </literal></para>
+              </entry>
+              <entry>
+                <para>Number of requests currently being handled.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal> reqbuf_avail </literal></para>
+              </entry>
+              <entry>
+                <para>Number of unsolicited LNet request buffers available for this
+                  service.</para>
+              </entry>
+            </row>
+          </tbody>
+        </tgroup>
+      </informaltable>
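+      <para>These common counters appear in the <literal>stats</literal> file
+      of every request service.  The <literal>lctl list_param</literal>
+      command can be used to see which request services, and therefore which
+      <literal>stats</literal> files, exist on a given server.  For example,
+      on an OSS (the exact list of services varies with the Lustre release
+      and configuration):</para>
+<screen>oss# lctl list_param ost.OSS.*.stats
+ost.OSS.ost.stats
+ost.OSS.ost_create.stats
+ost.OSS.ost_io.stats</screen>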
+      <para>Some service-specific events of interest are described in the table below.</para>
+      <informaltable frame="all">
+        <tgroup cols="2">
+          <colspec colname="c1" colwidth="50*"/>
+          <colspec colname="c2" colwidth="50*"/>
+          <thead>
+            <row>
+              <entry>
+                <para><emphasis role="bold">Parameter</emphasis></para>
+              </entry>
+              <entry>
+                <para><emphasis role="bold">Description</emphasis></para>
+              </entry>
+            </row>
+          </thead>
+          <tbody>
+            <row>
+              <entry>
+                <para>
+                  <literal> ldlm_enqueue </literal></para>
+              </entry>
+              <entry>
+                <para>Time it takes to enqueue a lock (this includes file open on the MDS).</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para>
+                  <literal> mds_reint </literal></para>
+              </entry>
+              <entry>
+                <para>Time it takes to process an MDS modification record (includes
+                    <literal>create</literal>, <literal>mkdir</literal>, <literal>unlink</literal>,
+                    <literal>rename</literal>, and <literal>setattr</literal>).</para>
+              </entry>
+            </row>
+          </tbody>
+        </tgroup>
+      </informaltable>
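+      <para>These service-specific counters are found in the
+      <literal>stats</literal> files of the corresponding request services on
+      the MDS.  For example, the lock enqueue and metadata update statistics
+      can be read directly from the MDT request service (the
+      <literal>mds.MDS.mdt</literal> service name used here is typical but
+      should be confirmed with <literal>lctl list_param mds.MDS.*</literal>
+      on the MDS in question):</para>
+<screen>mds# lctl get_param mds.MDS.mdt.stats | egrep "ldlm_enqueue|mds_reint"</screen>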
+    </section>
+    <section>
+      <title>Interpreting MDT Statistics</title>
+      <note>
+        <para>See also
+            <xref linkend="dbdoclet.50438273_80593"/> (<literal>collectl</literal>).</para>
+      </note>
+      <para>MDT <literal>stats</literal> files report the number of requests
+      of each type handled by an MDT on the MDS. The example below shows
+      sample output from an MDT <literal>stats</literal> file.</para>
+      <screen># lctl get_param mds.*-MDT0000.stats
+snapshot_time                   1244832003.676892 secs.usecs 
+open                            2 samples [reqs]
+close                           1 samples [reqs]
+getxattr                        3 samples [reqs]
+process_config                  1 samples [reqs]
+connect                         2 samples [reqs]
+disconnect                      2 samples [reqs]
+statfs                          3 samples [reqs]
+setattr                         1 samples [reqs]
+getattr                         3 samples [reqs]
+llog_init                       6 samples [reqs] 
+notify                          16 samples [reqs]</screen>
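+      <para>The counters in this file accumulate from the time the MDT
+      service started, so they are most useful when compared across two
+      snapshots or after being reset.  A minimal sketch (assuming, as is
+      usual for Lustre statistics parameters, that writing a value to the
+      parameter clears its counters) that isolates the metadata operations
+      generated by a particular workload is shown below.  The counts reported
+      by the final command then reflect only the requests handled while the
+      workload ran.</para>
+<screen># lctl set_param mds.*-MDT0000.stats=0
+# <replaceable>run the workload to be measured</replaceable>
+# lctl get_param mds.*-MDT0000.stats</screen>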
     </section>
+  </section>
 </chapter>
+<!--
+  vim:expandtab:shiftwidth=2:tabstop=8:
+  -->