LUDOC-445 fix minor typo

diff --git a/BenchmarkingTests.xml b/BenchmarkingTests.xml

index 0f10348..3830830 100644
--- a/BenchmarkingTests.xml
+++ b/BenchmarkingTests.xml
@@ -1,7 +1,8 @@
-<?xml version='1.0' encoding='UTF-8'?>
-<!-- This document was created with Syntext Serna Free. --><chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="benchmarkingtests">
-  <title xml:id="benchmarkingtests.title">Benchmarking Lustre Performance (Lustre I/O Kit)</title>
-  <para>This chapter describes the Lustre I/O kit, a collection of I/O benchmarking tools for a Lustre cluster, and PIOS, a parallel I/O simulator for Linux and Solaris. It includes:</para>
+<?xml version='1.0' encoding='UTF-8'?><chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="benchmarkingtests">
+  <title xml:id="benchmarkingtests.title">Benchmarking Lustre File System Performance (Lustre I/O
+    Kit)</title>
+  <para>This chapter describes the Lustre I/O kit, a collection of I/O
+  benchmarking tools for a Lustre cluster.  It includes:</para>
    <itemizedlist>
      <listitem>
        <para><xref linkend="dbdoclet.50438212_44437"/></para>
@@ -30,23 +31,33 @@
            <indexterm><primary>performance</primary><see>benchmarking</see></indexterm>
            
            Using Lustre I/O Kit Tools</title>
-    <para>The tools in the Lustre I/O Kit are used to benchmark Lustre hardware and validate that it is working as expected before you install the Lustre software. It can also be used to to validate the performance of the various hardware and software layers in the cluster and also to find and troubleshoot I/O issues.</para>
+    <para>The tools in the Lustre I/O Kit are used to benchmark Lustre file system hardware and
+      validate that it is working as expected before you install the Lustre software. They can also
+      be used to validate the performance of the various hardware and software layers in the
+      cluster and to find and troubleshoot I/O issues.</para>
      <para>Typically, performance is measured starting with single raw devices and then proceeding to groups of devices. Once raw performance has been established, other software layers are then added incrementally and tested.</para>
      <section remap="h3">
        <title>Contents of the Lustre I/O Kit</title>
-      <para>The I/O kit contains three tests, each of which tests a progressively higher layer in the Lustre stack:</para>
+      <para>The I/O kit contains three tests, each of which tests a progressively higher layer in
+        the Lustre software stack:</para>
        <itemizedlist>
          <listitem>
-          <para><literal>sgpdd_survey</literal>  - Measure basic &apos;bare metal&apos; performance of devices while bypassing the kernel block device layers, buffer cache, and file system.</para>
+          <para><literal>sgpdd-survey</literal> - Measure basic &apos;bare metal&apos; performance
+            of devices while bypassing the kernel block device layers, buffer cache, and file
+            system.</para>
          </listitem>
          <listitem>
-          <para><literal>obdfilter_survey</literal>  - Measure the performance of one or more OSTs directly on the OSS node or alternately over the network from a Lustre client.</para>
+          <para><literal>obdfilter-survey</literal> - Measure the performance of one or more OSTs
+            directly on the OSS node or alternately over the network from a Lustre client.</para>
          </listitem>
          <listitem>
-          <para><literal>ost_survey</literal>  - Performs I/O against OSTs individually to allow performance comparisons to detect if an OST is performing suboptimally due to hardware issues.</para>
+          <para><literal>ost-survey</literal> - Performs I/O against OSTs individually to allow
+            performance comparisons to detect if an OST is performing sub-optimally due to hardware
+            issues.</para>
          </listitem>
        </itemizedlist>
-      <para>Typically with these tests, Lustre should deliver 85-90% of the raw device performance.</para>
+      <para>Typically with these tests, a Lustre file system should deliver 85-90% of the raw device
+        performance.</para>
        <para>A utility <literal>stats-collect</literal> is also provided to collect application profiling information from Lustre clients and servers. See <xref linkend="dbdoclet.50438212_58201"/> for more information.</para>
      </section>
      <section remap="h3">
@@ -57,7 +68,8 @@
            <para>Password-free remote access to nodes in the system (provided by <literal>ssh</literal> or <literal>rsh</literal>).</para>
          </listitem>
          <listitem>
-          <para>LNET self-test completed to test that Lustre Networking has been properly installed and configured. See <xref linkend="lnetselftest"/>.</para>
+          <para>LNet self-test completed to test that Lustre networking has been properly installed
+            and configured. See <xref linkend="lnetselftest"/>.</para>
          </listitem>
          <listitem>
            <para>Lustre file system software installed.</para>
@@ -71,8 +83,15 @@
      </section>
    </section>
    <section xml:id="dbdoclet.50438212_51053">
-    <title><indexterm><primary>benchmarking</primary><secondary>raw hardware with sgpdd_survey</secondary></indexterm>Testing I/O Performance of Raw Hardware (<literal>sgpdd_survey</literal>)</title>
-    <para>The <literal>sgpdd_survey</literal> tool is used to test bare metal I/O performance of the raw hardware, while bypassing as much of the kernel as possible. This survey may be used to characterize the performance of a SCSI device by simulating an OST serving multiple stripe files. The data gathered by this survey can help set expectations for the performance of a Lustre OST using this device.</para>
+    <title><indexterm>
+        <primary>benchmarking</primary>
+        <secondary>raw hardware with sgpdd-survey</secondary>
+      </indexterm>Testing I/O Performance of Raw Hardware (<literal>sgpdd-survey</literal>)</title>
+    <para>The <literal>sgpdd-survey</literal> tool is used to test bare metal I/O performance of the
+      raw hardware, while bypassing as much of the kernel as possible. This survey may be used to
+      characterize the performance of a SCSI device by simulating an OST serving multiple stripe
+      files. The data gathered by this survey can help set expectations for the performance of a
+      Lustre OST using this device.</para>
      <para>The script uses <literal>sgp_dd</literal> to carry out raw sequential disk I/O. It runs with variable numbers of <literal>sgp_dd</literal> threads to show how performance varies with different request queue depths.</para>
      <para>The script spawns variable numbers of <literal>sgp_dd</literal> instances, each reading or writing a separate area of the disk to demonstrate performance variance within a number of concurrent stripe files.</para>
      <para>Several tips and insights for disk performance measurement are described below. Some of this information is specific to RAID arrays and/or the Linux RAID implementation.</para>
@@ -87,7 +106,8 @@
        </listitem>
      </itemizedlist>
      <caution>
-      <para>The <literal>sgpdd_survey</literal> script overwrites the device being tested, which results in the <emphasis>
+      <para>The <literal>sgpdd-survey</literal> script overwrites the device being tested, which
+        results in the <emphasis>
            <emphasis role="bold">LOSS OF ALL DATA</emphasis>
          </emphasis> on that device. Exercise caution when selecting the device to be tested.</para>
      </caution>
@@ -114,7 +134,9 @@
      </itemizedlist>
      <para>Raw and SCSI devices cannot be mixed in the test specification.</para>
      <note>
-      <para>If you need to create raw devices to use the <literal>sgpdd_survey</literal> tool, note that raw device 0 cannot be used due to a bug in certain versions of the &quot;raw&quot; utility (including that shipped with RHEL4U4.)</para>
+      <para>If you need to create raw devices to use the <literal>sgpdd-survey</literal> tool, note
+        that raw device 0 cannot be used due to a bug in certain versions of the &quot;raw&quot;
+        utility (including the version shipped with Red Hat Enterprise Linux 4U4).</para>
      </note>
      <section remap="h3">
        <title><indexterm><primary>benchmarking</primary><secondary>tuning storage</secondary></indexterm>Tuning Linux Storage Devices</title>
@@ -124,24 +146,36 @@
  /proc/scsi/sg/allow_dio = 1
  /sys/module/ib_srp/parameters/srp_sg_tablesize = 255
  /sys/block/sdN/queue/scheduler</screen>
+      <note>
+        <para>Recommended schedulers are <emphasis role="bold">deadline</emphasis> and <emphasis
+            role="bold">noop</emphasis>. The scheduler is set by default to <emphasis role="bold"
+            >deadline</emphasis>, unless it has already been set to <emphasis role="bold"
+            >noop</emphasis>.</para>
+      </note>
      </section>
      <section remap="h3">
-      <title>Running sgpdd_survey</title>
-      <para>The <literal>sgpdd_survey</literal> script must be customized for the particular device being tested and for the location where the script saves its working and result files (by specifying the <literal>${rslt}</literal> variable). Customization variables are described at the beginning of the script.</para>
-      <para>When the <literal>sgpdd_survey</literal> script runs, it creates a number of working files and a pair of result files. The names of all the files created start with the prefix defined in the variable <literal>${rslt}</literal>. (The default value is <literal>/tmp</literal>.) The files include:</para>
+      <title>Running sgpdd-survey</title>
+      <para>The <literal>sgpdd-survey</literal> script must be customized for the particular device
+        being tested and for the location where the script saves its working and result files (by
+        specifying the <literal>${rslt}</literal> variable). Customization variables are described
+        at the beginning of the script.</para>
+      <para>When the <literal>sgpdd-survey</literal> script runs, it creates a number of working
+        files and a pair of result files. The names of all the files created start with the prefix
+        defined in the variable <literal>${rslt}</literal>. (The default value is
+          <literal>/tmp</literal>.) The files include:</para>
        <itemizedlist>
          <listitem>
            <para>File containing standard output data (same as <literal>stdout</literal>)</para>
-          <screen>${rslt}_<emphasis>&lt;date/time&gt;</emphasis>.summary</screen>
+          <screen><replaceable>rslt_date_time</replaceable>.summary</screen>
          </listitem>
          <listitem>
            <para>Temporary (tmp) files</para>
-          <screen>${rslt}_<emphasis>&lt;date/time&gt;</emphasis>_*
+          <screen><replaceable>rslt_date_time</replaceable>_*
  </screen>
          </listitem>
          <listitem>
            <para>Collected tmp files for post-mortem</para>
-          <screen>${rslt}_<emphasis>&lt;date/time&gt;</emphasis>.detail
+          <screen><replaceable>rslt_date_time</replaceable>.detail
  </screen>
          </listitem>
        </itemizedlist>
@@ -175,47 +209,81 @@
      </section>
    </section>
    <section xml:id="dbdoclet.50438212_26516">
-    <title><indexterm><primary>benchmarking</primary><secondary>OST performance</secondary></indexterm>Testing OST Performance (<literal>obdfilter_survey</literal>)</title>
-    <para>The <literal>obdfilter_survey</literal> script generates sequential I/O from varying numbers of threads and objects (files) to simulate the I/O patterns of a Lustre client.</para>
-    <para>The <literal>obdfilter_survey</literal> script can be run directly on the OSS node to measure the OST storage performance without any intervening network, or it can be run remotely on a Lustre client to measure the OST performance including network overhead.</para>
-    <para>The <literal>obdfilter_survey</literal> is used to characterize the performance of the following:</para>
+    <title><indexterm>
+        <primary>benchmarking</primary>
+        <secondary>OST performance</secondary>
+      </indexterm>Testing OST Performance (<literal>obdfilter-survey</literal>)</title>
+    <para>The <literal>obdfilter-survey</literal> script generates sequential I/O from varying
+      numbers of threads and objects (files) to simulate the I/O patterns of a Lustre client.</para>
+    <para>The <literal>obdfilter-survey</literal> script can be run directly on the OSS node to
+      measure the OST storage performance without any intervening network, or it can be run remotely
+      on a Lustre client to measure the OST performance including network overhead.</para>
+    <para>The <literal>obdfilter-survey</literal> is used to characterize the performance of the
+      following:</para>
      <itemizedlist>
        <listitem>
-        <para><emphasis role="bold">Local file system</emphasis>  - In this mode, the <literal>obdfilter_survey</literal> script exercises one or more instances of the obdfilter directly. The script may run on one or more OSS nodes, for example, when the OSSs are all attached to the same multi-ported disk subsystem.</para>
+        <para><emphasis role="bold">Local file system</emphasis> - In this mode, the
+            <literal>obdfilter-survey</literal> script exercises one or more instances of the
+          obdfilter directly. The script may run on one or more OSS nodes, for example, when the
+          OSSs are all attached to the same multi-ported disk subsystem.</para>
          <para>Run the script using the <literal>case=disk</literal> parameter to run the test against all the local OSTs. The script automatically detects all local OSTs and includes them in the survey.</para>
-        <para>To run the test against only specific OSTs, run the script using the <literal>target=parameter</literal> to list the OSTs to be tested explicitly. If some OSTs are on remote nodes, specify their hostnames in addition to the OST name (for example, <literal>oss2:lustre-OST0004</literal>).</para>
+        <para>To run the test against only specific OSTs, run the script using the <literal>targets=parameter</literal> to list the OSTs to be tested explicitly. If some OSTs are on remote nodes, specify their hostnames in addition to the OST name (for example, <literal>oss2:lustre-OST0004</literal>).</para>
          <para>All <literal>obdfilter</literal> instances are driven directly. The script automatically loads the <literal>obdecho</literal> module (if required) and creates one instance of <literal>echo_client</literal> for each <literal>obdfilter</literal> instance in order to generate I/O requests directly to the OST.</para>
          <para>For more details, see <xref linkend="dbdoclet.50438212_59319"/>.</para>
        </listitem>
        <listitem>
          <para><emphasis role="bold">Network</emphasis>  - In this mode, the Lustre client generates I/O requests over the network but these requests are not sent to the OST file system. The OSS node runs the obdecho server to receive the requests but discards them before they are sent to the disk.</para>
-        <para>Pass the parameters <literal>case=network</literal> and <literal>target=<replaceable>&lt;hostname</replaceable>|<replaceable>IP_of_server&gt;</replaceable></literal> to the script. For each network case, the script does the required setup.</para>
+        <para>Pass the parameters <literal>case=network</literal> and <literal>targets=<replaceable>hostname|IP_of_server</replaceable></literal> to the script. For each network case, the script does the required setup.</para>
          <para>For more details, see <xref linkend="dbdoclet.50438212_36037"/></para>
        </listitem>
        <listitem>
-        <para><emphasis role="bold">Remote file system over the network</emphasis>  - In this mode the <literal>obdfilter_survey</literal> script generates I/O from a Lustre client to a remote OSS to write the data to the file system.</para>
+        <para><emphasis role="bold">Remote file system over the network</emphasis> - In this mode
+          the <literal>obdfilter-survey</literal> script generates I/O from a Lustre client to a
+          remote OSS to write the data to the file system.</para>
         <para>To run the test against all the local OSCs, pass the parameter <literal>case=netdisk</literal> to the script. Alternately you can pass the <literal>targets=</literal> parameter with one or more OSC devices (e.g., <literal>lustre-OST0000-osc-ffff88007754bc00</literal>) against which the tests are to be run.</para>
          <para>For more details, see <xref linkend="dbdoclet.50438212_62662"/>.</para>
        </listitem>
      </itemizedlist>
      <caution>
-      <para>The <literal>obdfilter_survey</literal> script is destructive and should not be run on devices that containing existing data that needs to be preserved. Thus, tests using <literal>obdfilter_survey</literal> should be run before the Lustre file system is placed in production.</para>
+      <para>The <literal>obdfilter-survey</literal> script is potentially destructive and there is a
+        small risk that data may be lost. To reduce this risk, <literal>obdfilter-survey</literal>
+        should not be run on devices that contain data that needs to be preserved. Thus, the best
+        time to run <literal>obdfilter-survey</literal> is before the Lustre file system is put into
+        production. The <literal>obdfilter-survey</literal> script may be safe to run on a
+        production file system because it creates objects with object sequence 2, whereas normal
+        file system objects are typically created with object sequence 0.</para>
      </caution>
      <note>
-      <para>If the <literal>obdfilter_survey</literal> test is terminated before it completes, some small amount of space is leaked. you can either ignore it or reformat the file system.</para>
+      <para>If the <literal>obdfilter-survey</literal> test is terminated before it completes, some
+        small amount of space is leaked. You can either ignore it or reformat the file
+        system.</para>
      </note>
      <note>
-      <para>The <literal>obdfilter_survey</literal> script is <emphasis>NOT</emphasis> scalable beyond tens of OSTs since it is only intended to measure the I/O performance of individual storage subsystems, not the scalability of the entire system.</para>
+      <para>The <literal>obdfilter-survey</literal> script is <emphasis>NOT</emphasis> scalable
+        beyond tens of OSTs since it is only intended to measure the I/O performance of individual
+        storage subsystems, not the scalability of the entire system.</para>
      </note>
      <note>
-      <para>The <literal>obdfilter_survey</literal> script must be customized, depending on the components under test and where the script&apos;s working files should be kept. Customization variables are described at the beginning of the <literal>obdfilter_survey</literal> script. In particular, pay attention to the listed maximum values listed for each parameter in the script.</para>
+      <para>The <literal>obdfilter-survey</literal> script must be customized, depending on the
+        components under test and where the script&apos;s working files should be kept.
+        Customization variables are described at the beginning of the
+          <literal>obdfilter-survey</literal> script. In particular, pay attention to the
+        maximum values listed for each parameter in the script.</para>
      </note>
      <section xml:id="dbdoclet.50438212_59319">
        <title><indexterm><primary>benchmarking</primary><secondary>local disk</secondary></indexterm>Testing Local Disk Performance</title>
-      <para>The <literal>obdfilter_survey</literal> script can be run automatically or manually against a local disk. This script profiles the overall throughput of storage hardware, including the file system and RAID layers managing the storage, by sending workloads to the OSTs that vary in thread count, object count, and I/O size.</para>
-      <para>When the <literal>obdfilter_survey</literal> script is run, it provides information about the performance abilities of the storage hardware and shows the saturation points.</para>
-      <para>The <literal>plot-obdfilter</literal> script generates from the output of the <literal>obdfilter_survey</literal> a CSV file and parameters for importing into a spreadsheet or gnuplot to visualize the data.</para>
-      <para>To run the <literal>obdfilter_survey</literal> script, create a standard Lustre configuration; no special setup is needed.</para>
+      <para>The <literal>obdfilter-survey</literal> script can be run automatically or manually
+        against a local disk. This script profiles the overall throughput of storage hardware,
+        including the file system and RAID layers managing the storage, by sending workloads to the
+        OSTs that vary in thread count, object count, and I/O size.</para>
+      <para>When the <literal>obdfilter-survey</literal> script is run, it provides information
+        about the performance abilities of the storage hardware and shows the saturation
+        points.</para>
+      <para>The <literal>plot-obdfilter</literal> script generates from the output of the
+          <literal>obdfilter-survey</literal> a CSV file and parameters for importing into a
+        spreadsheet or gnuplot to visualize the data.</para>
+      <para>To run the <literal>obdfilter-survey</literal> script, create a standard Lustre file
+        system configuration; no special setup is needed.</para>
        <para><emphasis role="bold">To perform an automatic run:</emphasis></para>
        <orderedlist>
          <listitem>
@@ -227,10 +295,34 @@
            <screen>modprobe obdecho</screen>
          </listitem>
          <listitem>
-          <para>Run the <literal>obdfilter_survey</literal> script with the parameter <literal>case=disk</literal>.</para>
+          <para>Run the <literal>obdfilter-survey</literal> script with the parameter
+              <literal>case=disk</literal>.</para>
            <para>For example, to run a local test with up to two objects (nobjhi), up to two threads (thrhi), and 1024 MB transfer size (size):</para>
            <screen>$ nobjhi=2 thrhi=2 size=1024 case=disk sh obdfilter-survey</screen>
          </listitem>
+        <listitem>
+               <para>Performance measurements for write, rewrite, read, etc. are provided below:</para>
+               <screen># example output
+Fri Sep 25 11:14:03 EDT 2015 Obdfilter-survey for case=disk from hds1fnb6123
+ost 10 sz 167772160K rsz 1024K obj   10 thr   10 write 10982.73 [ 601.97,2912.91] rewrite 15696.54 [1160.92,3450.85] read 12358.60 [ 938.96,2634.87] 
+...</screen>
+               <para>The file <literal>./lustre-iokit/obdfilter-survey/README.obdfilter-survey</literal>
+               provides an explanation of the output as follows:</para>
+               <screen>ost 10          is the total number of OSTs under test.
+sz 167772160K   is the total amount of data read or written (in bytes).
+rsz 1024K       is the record size (size of each echo_client I/O, in bytes).
+obj    10       is the total number of objects over all OSTs
+thr    10       is the total number of threads over all OSTs and objects
+write           is the test name.  If more tests have been specified they
+           all appear on the same line.
+10982.73        is the aggregate bandwidth over all OSTs measured by
+           dividing the total number of MB by the elapsed time.
+[601.97,2912.91] are the minimum and maximum instantaneous bandwidths seen on
+           any individual OST.
+Note that although the numbers of threads and objects are specified per-OST
+in the customization section of the script, results are reported aggregated
+over all OSTs.</screen>
+        </listitem>
        </orderedlist>
        <para><emphasis role="italic">To perform a manual run:</emphasis></para>
        <orderedlist>
@@ -252,12 +344,13 @@
          </listitem>
          <listitem>
            <para>List all OSTs you want to test.</para>
-          <para>Use the <literal>target=parameter</literal> to list the OSTs separated by spaces. List the individual OSTs by name using the format <emphasis>
-              <literal>&lt;fsname&gt;-&lt;OSTnumber&gt;</literal>
-            </emphasis> (for example, lustre-OST0001). You do not have to specify an MDS or LOV.</para>
+          <para>Use the <literal>targets=parameter</literal> to list the OSTs separated by spaces. List the individual OSTs by name using the format
+              <literal><replaceable>fsname</replaceable>-<replaceable>OSTnumber</replaceable></literal>
+            (for example, <literal>lustre-OST0001</literal>). You do not have to specify an MDS or LOV.</para>
          </listitem>
          <listitem>
-          <para>Run the <literal>obdfilter_survey</literal> script with the <literal>target=parameter</literal>.</para>
+          <para>Run the <literal>obdfilter-survey</literal> script with the
+              <literal>targets=parameter</literal>.</para>
           <para>For example, to run a local test with up to two objects (<literal>nobjhi</literal>), up to two threads (<literal>thrhi</literal>), and 1024 MB (size) transfer size:</para>
            <screen>$ nobjhi=2 thrhi=2 size=1024 targets=&quot;lustre-OST0001 \
            lustre-OST0002&quot; sh obdfilter-survey</screen>
@@ -266,8 +359,10 @@
      </section>
      <section xml:id="dbdoclet.50438212_36037">
        <title><indexterm><primary>benchmarking</primary><secondary>network</secondary></indexterm>Testing Network Performance</title>
-      <para>The <literal>obdfilter_survey</literal> script can only be run automatically against a network; no manual test is provided.</para>
-      <para>To run the network test, a specific Lustre setup is needed. Make sure that these configuration requirements have been met.</para>
+      <para>The <literal>obdfilter-survey</literal> script can only be run automatically against a
+        network; no manual test is provided.</para>
+      <para>To run the network test, a specific Lustre file system setup is needed. Make sure that
+        these configuration requirements have been met.</para>
        <para><emphasis role="bold">To perform an automatic run:</emphasis></para>
        <orderedlist>
          <listitem>
@@ -283,22 +378,26 @@
            <screen>lctl dl</screen>
          </listitem>
          <listitem>
-          <para>Run the <literal>obdfilter_survey</literal> script with the parameters <literal>case=network</literal> and <literal>targets=<replaceable>&lt;hostname</replaceable>|<replaceable>ip_of_server&gt;</replaceable></literal>. For example:</para>
+          <para>Run the <literal>obdfilter-survey</literal> script with the parameters
+              <literal>case=network</literal> and
+                <literal>targets=<replaceable>hostname|ip_of_server</replaceable></literal>. For
+            example:</para>
            <screen>$ nobjhi=2 thrhi=2 size=1024 targets=&quot;oss0 oss1&quot; \
-          case=network sh odbfilter-survey</screen>
+          case=network sh obdfilter-survey</screen>
          </listitem>
          <listitem>
            <para>On the server side, view the statistics at:</para>
-          <screen>/proc/fs/lustre/obdecho/<emphasis>&lt;echo_srv&gt;</emphasis>/stats</screen>
-          <para>where <emphasis>
-              <literal>&lt;echo_srv&gt;</literal>
-            </emphasis> is the <literal>obdecho</literal> server created by the script.</para>
+          <screen>lctl get_param obdecho.<replaceable>echo_srv</replaceable>.stats</screen>
+          <para>where <literal><replaceable>echo_srv</replaceable></literal>
+            is the <literal>obdecho</literal> server created by the script.</para>
          </listitem>
        </orderedlist>
      </section>
      <section xml:id="dbdoclet.50438212_62662">
        <title><indexterm><primary>benchmarking</primary><secondary>remote disk</secondary></indexterm>Testing Remote Disk Performance</title>
-      <para>The <literal>obdfilter_survey</literal> script can be run automatically or manually against a network disk. To run the network disk test, start with a standard Lustre configuration. No special setup is needed.</para>
+      <para>The <literal>obdfilter-survey</literal> script can be run automatically or manually
+        against a network disk. To run the network disk test, start with a standard Lustre
+        configuration. No special setup is needed.</para>
        <para><emphasis role="bold">To perform an automatic run:</emphasis></para>
        <orderedlist>
          <listitem>
@@ -310,7 +409,8 @@
            <screen>modprobe obdecho</screen>
          </listitem>
          <listitem>
-          <para>Run the <literal>obdfilter_survey</literal> script with the parameter <literal>case=netdisk</literal>. For example:</para>
+          <para>Run the <literal>obdfilter-survey</literal> script with the parameter
+              <literal>case=netdisk</literal>. For example:</para>
            <screen>$ nobjhi=2 thrhi=2 size=1024 case=netdisk sh obdfilter-survey
  </screen>
          </listitem>
@@ -338,10 +438,12 @@
          </listitem>
          <listitem>
            <para>List all OSCs you want to test.</para>
-          <para>Use the <literal>target=parameter</literal> to list the OSCs separated by spaces. List the individual OSCs by name separated by spaces using the format <literal><replaceable>&lt;fsname&gt;-&lt;OST_name&gt;</replaceable>-osc-<replaceable>&lt;OSC_number&gt;</replaceable></literal> (for example, <literal>lustre-OST0000-osc-ffff88007754bc00</literal>). You <replaceable role="bold">do not have to specify an MDS or LOV.</replaceable></para>
+          <para>Use the <literal>targets=parameter</literal> to list the OSCs separated by spaces. List the individual OSCs by name using the format <literal><replaceable>fsname</replaceable>-<replaceable>OST_name</replaceable>-osc-<replaceable>instance</replaceable></literal> (for example, <literal>lustre-OST0000-osc-ffff88007754bc00</literal>). You <emphasis>do not have to specify an MDS or LOV.</emphasis></para>
          </listitem>
          <listitem>
-          <para>Run the <literal><replaceable role="bold">o</replaceable>bdfilter_survey</literal> script with the <literal>target=parameter</literal> and <literal>case=netdisk</literal>.</para>
+          <para>Run the <literal>obdfilter-survey</literal> script with the
+                <literal>targets=<replaceable>osc</replaceable></literal> and
+              <literal>case=netdisk</literal> parameters.</para>
           <para>An example of a local test run with up to two objects (<literal>nobjhi</literal>), up to two threads (<literal>thrhi</literal>), and 1024 MB (size) transfer size is shown below:</para>
            <screen>$ nobjhi=2 thrhi=2 size=1024 \
             targets=&quot;lustre-OST0000-osc-ffff88007754bc00 \
@@ -352,7 +454,9 @@
      </section>
      <section remap="h3">
        <title>Output Files</title>
-      <para>When the <literal>obdfilter_survey</literal> script runs, it creates a number of working files and a pair of result files. All files start with the prefix defined in the variable <literal>${rslt}</literal>.</para>
+      <para>When the <literal>obdfilter-survey</literal> script runs, it creates a number of working
+        files and a pair of result files. All files start with the prefix defined in the variable
+          <literal>${rslt}</literal>.</para>
        <informaltable frame="all">
          <tgroup cols="2">
            <colspec colname="c1" colwidth="50*"/>
@@ -403,13 +507,20 @@
            </tbody>
          </tgroup>
        </informaltable>
-      <para>The <literal>obdfilter_survey</literal> script iterates over the given number of threads and objects performing the specified tests and checks that all test processes have completed successfully.</para>
+      <para>The <literal>obdfilter-survey</literal> script iterates over the given number of threads
+        and objects performing the specified tests and checks that all test processes have completed
+        successfully.</para>
        <note>
-        <para>The <literal>obdfilter_survey</literal> script may not clean up properly if it is aborted or if it encounters an unrecoverable error. In this case, a manual cleanup may be required, possibly including killing any running instances of <literal>lctl</literal> (local or remote), removing <literal>echo_client</literal> instances created by the script and unloading <literal>obdecho</literal>.</para>
+        <para>The <literal>obdfilter-survey</literal> script may not clean up properly if it is
+          aborted or if it encounters an unrecoverable error. In this case, a manual cleanup may be
+          required, possibly including killing any running instances of <literal>lctl</literal>
+          (local or remote), removing <literal>echo_client</literal> instances created by the script
+          and unloading <literal>obdecho</literal>.</para>
        </note>
        <section remap="h4">
          <title>Script Output</title>
-        <para>The <literal>.summary</literal> file and <literal>stdout</literal> of the <literal>obdfilter_survey</literal> script contain lines like:</para>
+        <para>The <literal>.summary</literal> file and <literal>stdout</literal> of the
+            <literal>obdfilter-survey</literal> script contain lines like:</para>
          <screen>ost 8 sz 67108864K rsz 1024 obj 8 thr 8 write 613.54 [ 64.00, 82.00]
  </screen>
          <para>Where:</para>
@@ -501,45 +612,69 @@
        </section>
        <section remap="h4">
          <title>Visualizing Results</title>
-        <para>It is useful to import the <literal>obdfilter_survey</literal> script summary data (it is fixed width) into Excel (or any graphing package) and graph the bandwidth versus the number of threads for varying numbers of concurrent regions. This shows how the OSS performs for a given number of concurrently-accessed objects (files) with varying numbers of I/Os in flight.</para>
-        <para>It is also useful to monitor and record average disk I/O sizes during each test using the &apos;disk io size&apos; histogram in the file <literal>/proc/fs/lustre/obdfilter/</literal> (see <xref linkend="dbdoclet.50438271_55057"/> for details). These numbers help identify problems in the system when full-sized I/Os are not submitted to the underlying disk. This may be caused by problems in the device driver or Linux block layer.</para>
-        <screen> */brw_stats</screen>
-        <para>The <literal>plot-obdfilter</literal> script included in the I/O toolkit is an example of processing output files to a .csv format and plotting a graph using <literal>gnuplot</literal>.</para>
+        <para>It is useful to import the <literal>obdfilter-survey</literal>
+        script summary data (it is fixed width) into Excel (or any graphing
+        package) and graph the bandwidth versus the number of threads for
+        varying numbers of concurrent regions. This shows how the OSS performs
+        for a given number of concurrently-accessed objects (files) with varying
+        numbers of I/Os in flight.</para>
+        <para>It is also useful to monitor and record average disk I/O sizes
+        during each test using the &apos;disk io size&apos; histogram in the
+        output of <literal>lctl get_param obdfilter.*.brw_stats</literal>
+        (see <xref linkend="dbdoclet.50438271_55057"/> for details). These
+        numbers help identify problems in the system when full-sized I/Os are
+        not submitted to the underlying disk. This may be caused by problems in
+        the device driver or Linux block layer.</para>
+        <para>The <literal>plot-obdfilter</literal> script included in the I/O
+        toolkit is an example of processing output files to a .csv format and
+        plotting a graph using <literal>gnuplot</literal>.</para>
        </section>
      </section>
    </section>
    <section xml:id="dbdoclet.50438212_85136">
-      <title><indexterm><primary>benchmarking</primary><secondary>OST I/O</secondary></indexterm>Testing OST I/O Performance (<literal>ost_survey</literal>)</title>
-    <para>The <literal>ost_survey</literal> tool is a shell script that uses <literal>lfs setstripe</literal> to perform I/O against a single OST. The script writes a file (currently using <literal>dd</literal>) to each OST in the Lustre file system, and compares read and write speeds. The <literal>ost_survey</literal> tool is used to detect anomalies between otherwise identical disk subsystems.</para>
+      <title><indexterm>
+        <primary>benchmarking</primary>
+        <secondary>OST I/O</secondary>
+      </indexterm>Testing OST I/O Performance (<literal>ost-survey</literal>)</title>
+    <para>The <literal>ost-survey</literal> tool is a shell script that uses <literal>lfs
+        setstripe</literal> to perform I/O against a single OST. The script writes a file (currently
+      using <literal>dd</literal>) to each OST in the Lustre file system, and compares read and
+      write speeds. The <literal>ost-survey</literal> tool is used to detect anomalies between
+      otherwise identical disk subsystems.</para>
      <note>
-      <para>We have frequently discovered wide performance variations across all LUNs in a cluster. This may be caused by faulty disks, RAID parity reconstruction during the test, or faulty network hardware.</para>
+      <para>We have frequently discovered wide performance variations across all LUNs in a cluster.
+        This may be caused by faulty disks, RAID parity reconstruction during the test, or faulty
+        network hardware.</para>
      </note>
-    <para>To run the <literal>ost_survey</literal> script, supply a file size (in KB) and the Lustre mount point. For example, run:</para>
-    <screen>$ ./ost-survey.sh 10 /mnt/lustre
+    <para>To run the <literal>ost-survey</literal> script, supply a file size (in KB) and the Lustre
+      file system mount point. For example, run:</para>
+    <screen>$ ./ost-survey.sh -s 10 /mnt/lustre
  </screen>
      <para>Typical output is:</para>
      <screen>
-Average read Speed:                  6.73
-Average write Speed:                 5.41
-read - Worst OST indx 0              5.84 MB/s
-write - Worst OST indx 0             3.77 MB/s
-read - Best OST indx 1               7.38 MB/s
-write - Best OST indx 1              6.31 MB/s
-3 OST devices found
-Ost index 0 Read speed               5.84         Write speed     3.77
-Ost index 0 Read time                0.17         Write time      0.27
-Ost index 1 Read speed               7.38         Write speed     6.31
-Ost index 1 Read time                0.14         Write time      0.16
-Ost index 2 Read speed               6.98         Write speed     6.16
-Ost index 2 Read time                0.14         Write time      0.16 
+Number of Active OST devices : 4
+Worst  Read OST indx: 2 speed: 2835.272725
+Best   Read OST indx: 3 speed: 2872.889668
+Read Average: 2852.508999 +/- 16.444792 MB/s
+Worst  Write OST indx: 3 speed: 17.705545
+Best   Write OST indx: 2 speed: 128.172576
+Write Average: 95.437735 +/- 45.518117 MB/s
+Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
+----------------------------------------------------
+0     2837.440       126.918        0.035      0.788
+1     2864.433       108.954        0.035      0.918
+2     2835.273       128.173        0.035      0.780
+3     2872.890       17.706        0.035      5.648
  </screen>
    </section>
    <section xml:id="mds_survey_ref">
      <title><indexterm><primary>benchmarking</primary><secondary>MDS
  performance</secondary></indexterm>Testing MDS Performance (<literal>mds-survey</literal>)</title>
-    <para>The <literal>mds-survey</literal> script tests the local metadata
-performance using the echo_client to drive different layers of the MDS stack:
-mdd, mdt, osd (current lustre version only supports mdd stack). It can be used with the following classes of operations:</para>
+       <para><literal>mds-survey</literal> is available in Lustre software release 2.2 and beyond. The
+        <literal>mds-survey</literal> script tests the local metadata performance using the
+      echo_client to drive different layers of the MDS stack: mdd, mdt, osd (the Lustre software
+      only supports the mdd stack). It can be used with the following classes of operations:</para>
+
      <itemizedlist>
        <listitem>
          <para><literal>Open-create/mkdir/create</literal></para>
@@ -575,40 +710,46 @@ mdd, mdt, osd (current lustre version only supports mdd stack). It can be used w
            <para>The script must be customized according to the components under test and where it should keep its working files. Customization variables are described as follows:</para>
            <itemizedlist>
              <listitem>
-              <para>thrlo - threads to start testing. skipped if less than dir_count</para>
+              <para><literal>thrlo</literal> - number of threads to start testing with; skipped if less than
+                <literal>dir_count</literal></para>
              </listitem>
              <listitem>
-              <para>thrhi - maximum number of threads to test</para>
+              <para><literal>thrhi</literal> - maximum number of threads to test</para>
              </listitem>
              <listitem>
-              <para>targets - MDT instance</para>
+              <para><literal>targets</literal> - MDT instance</para>
              </listitem>
              <listitem>
-              <para>file_count - number of files per thread to test</para>
+              <para><literal>file_count</literal> - number of files per thread to test</para>
              </listitem>
              <listitem>
-              <para>dir_count - total number of directories to test. Must be less than thrhi</para>
+              <para><literal>dir_count</literal> - total number of directories to test. Must be less
+              than or equal to <literal>thrhi</literal></para>
              </listitem>
              <listitem>
-              <para>stripe_count - number stripe on OST objects</para>
+              <para><literal>stripe_count</literal> - number of stripes on OST objects</para>
              </listitem>
              <listitem>
-              <para>tests_str - test operations. Must have at least "create" and "destroy"</para>
+              <para><literal>tests_str</literal> - test operations. Must have at least "create" and
+              "destroy"</para>
              </listitem>
              <listitem>
-              <para>start_number - base number for each thread to prevent name collisions</para>
+              <para><literal>start_number</literal> - base number for each thread to prevent name
+              collisions</para>
              </listitem>
              <listitem>
-              <para>layer - MDS stack's layer to be tested</para>
+              <para><literal>layer</literal> - MDS stack's layer to be tested</para>
              </listitem>
            </itemizedlist>
            <para>Run without OST objects creation:</para>
            <para>Set up the Lustre MDS without an OST mounted. Then invoke the <literal>mds-survey</literal> script:</para>
            <screen>$ thrhi=64 file_count=200000 sh mds-survey</screen>
            <para>Run with OST objects creation:</para>
-          <para>Setup the Lustre MDS with at least one OST mounted. Then invoke the <literal>mds-survey</literal> script with stripe_count parameter</para>
+          <para>Set up the Lustre MDS with at least one OST mounted. Then invoke the
+            <literal>mds-survey</literal> script with the <literal>stripe_count</literal>
+          parameter</para>
            <screen>$ thrhi=64 file_count=200000 stripe_count=2 sh mds-survey</screen>
-          <para>Note: a specific mdt instance can be specified using targets variable.</para>
+          <para>Note: a specific MDT instance can be specified using the <literal>targets</literal> variable.</para>
            <screen>$ targets=lustre-MDT0000 thrhi=64 file_count=200000 stripe_count=2 sh mds-survey</screen>
          </listitem>
        </orderedlist>
@@ -771,7 +912,7 @@ mdd, mdt, osd (current lustre version only supports mdd stack). It can be used w
      <para>The <literal>stats-collect</literal> utility requires:</para>
      <itemizedlist>
        <listitem>
-        <para>Lustre to be installed and set up on your cluster</para>
+        <para>Lustre software to be installed and set up on your cluster</para>
        </listitem>
        <listitem>
          <para>SSH and SCP access to these nodes without requiring a password</para>
@@ -780,7 +921,7 @@ mdd, mdt, osd (current lustre version only supports mdd stack). It can be used w
      <section remap="h3">
        <title>Using <literal>stats-collect</literal></title>
        <para>The stats-collect utility is configured by including profiling configuration variables in the config.sh script. Each configuration variable takes the following form, where 0 indicates statistics are to be collected only when the script starts and stops and <emphasis>n</emphasis> indicates the interval in seconds at which statistics are to be collected:</para>
-      <screen><emphasis>&lt;statistic&gt;</emphasis>_INTERVAL=<emphasis>[</emphasis>0<emphasis>|n]</emphasis></screen>
+      <screen><replaceable>statistic</replaceable>_INTERVAL=0|<replaceable>n</replaceable></screen>
        <para>Statistics that can be collected include:</para>
        <itemizedlist>
          <listitem>
@@ -790,7 +931,7 @@ mdd, mdt, osd (current lustre version only supports mdd stack). It can be used w
            <para><literal>SERVICE</literal>  - Lustre OST and MDT RPC service statistics</para>
          </listitem>
          <listitem>
-          <para><literal>BRW</literal>  - OST block read/write statistics (brw_stats)</para>
+          <para><literal>BRW</literal> - OST bulk read/write statistics (brw_stats)</para>
          </listitem>
          <listitem>
            <para><literal>SDIO</literal>  - SCSI disk IO statistics (sd_iostats)</para>
@@ -822,10 +963,9 @@ mdd, mdt, osd (current lustre version only supports mdd stack). It can be used w
          <listitem>
            <para>Stop collecting statistics on each node, clean up the temporary file, and create a profiling tarball.</para>
            <para>Enter:</para>
-          <screen>sh gather_stats_everywhere.sh config.sh stop <emphasis>&lt;log_name.tgz&gt;</emphasis></screen>
-          <para>When <emphasis>
-              <literal>&lt;log_name.tgz&gt;</literal>
-            </emphasis> is specified, a profile tarball <literal>/tmp/<replaceable>&lt;log_name.tgz&gt;</replaceable></literal> is created.</para>
+          <screen>sh gather_stats_everywhere.sh config.sh stop <replaceable>log_name</replaceable>.tgz</screen>
+          <para>When <literal><replaceable>log_name</replaceable>.tgz</literal>
+            is specified, a profile tarball <literal>/tmp/<replaceable>log_name</replaceable>.tgz</literal> is created.</para>
          </listitem>
          <listitem>
            <para>Analyze the collected statistics and create a csv tarball for the specified profiling data.</para>