LUDOC-307 example: example output for obdfilter-survey.

diff --git a/BenchmarkingTests.xml b/BenchmarkingTests.xml
index 61683b6..6e7a823 100644
--- a/BenchmarkingTests.xml
+++ b/BenchmarkingTests.xml
@@ -1,5 +1,6 @@
  <?xml version='1.0' encoding='UTF-8'?><chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="benchmarkingtests">
-  <title xml:id="benchmarkingtests.title">Benchmarking Lustre Performance (Lustre I/O Kit)</title>
+  <title xml:id="benchmarkingtests.title">Benchmarking Lustre File System Performance (Lustre I/O
+    Kit)</title>
    <para>This chapter describes the Lustre I/O kit, a collection of I/O benchmarking tools for a
      Lustre cluster, and PIOS, a parallel I/O simulator for Linux and
        Solaris<superscript>*</superscript> operating systems. It includes:</para>
@@ -31,11 +32,15 @@
            <indexterm><primary>performance</primary><see>benchmarking</see></indexterm>
            
            Using Lustre I/O Kit Tools</title>
-    <para>The tools in the Lustre I/O Kit are used to benchmark Lustre hardware and validate that it is working as expected before you install the Lustre software. It can also be used to to validate the performance of the various hardware and software layers in the cluster and also to find and troubleshoot I/O issues.</para>
+    <para>The tools in the Lustre I/O Kit are used to benchmark Lustre file system hardware and
+      validate that it is working as expected before you install the Lustre software. They can also
+      be used to validate the performance of the various hardware and software layers in the
+      cluster and to find and troubleshoot I/O issues.</para>
      <para>Typically, performance is measured starting with single raw devices and then proceeding to groups of devices. Once raw performance has been established, other software layers are then added incrementally and tested.</para>
      <section remap="h3">
        <title>Contents of the Lustre I/O Kit</title>
-      <para>The I/O kit contains three tests, each of which tests a progressively higher layer in the Lustre stack:</para>
+      <para>The I/O kit contains three tests, each of which tests a progressively higher layer in
+        the Lustre software stack:</para>
        <itemizedlist>
          <listitem>
            <para><literal>sgpdd-survey</literal> - Measure basic &apos;bare metal&apos; performance
@@ -52,7 +57,8 @@
              issues.</para>
          </listitem>
        </itemizedlist>
-      <para>Typically with these tests, Lustre should deliver 85-90% of the raw device performance.</para>
+      <para>Typically with these tests, a Lustre file system should deliver 85-90% of the raw device
+        performance.</para>
        <para>A utility <literal>stats-collect</literal> is also provided to collect application profiling information from Lustre clients and servers. See <xref linkend="dbdoclet.50438212_58201"/> for more information.</para>
      </section>
      <section remap="h3">
@@ -63,7 +69,8 @@
            <para>Password-free remote access to nodes in the system (provided by <literal>ssh</literal> or <literal>rsh</literal>).</para>
          </listitem>
          <listitem>
-          <para>LNET self-test completed to test that Lustre Networking has been properly installed and configured. See <xref linkend="lnetselftest"/>.</para>
+          <para>LNET self-test completed to test that Lustre networking has been properly installed
+            and configured. See <xref linkend="lnetselftest"/>.</para>
          </listitem>
          <listitem>
            <para>Lustre file system software installed.</para>
@@ -221,13 +228,13 @@
            obdfilter directly. The script may run on one or more OSS nodes, for example, when the
            OSSs are all attached to the same multi-ported disk subsystem.</para>
          <para>Run the script using the <literal>case=disk</literal> parameter to run the test against all the local OSTs. The script automatically detects all local OSTs and includes them in the survey.</para>
-        <para>To run the test against only specific OSTs, run the script using the <literal>target=parameter</literal> to list the OSTs to be tested explicitly. If some OSTs are on remote nodes, specify their hostnames in addition to the OST name (for example, <literal>oss2:lustre-OST0004</literal>).</para>
+        <para>To run the test against only specific OSTs, run the script using the <literal>targets=</literal> parameter to list the OSTs to be tested explicitly. If some OSTs are on remote nodes, specify their hostnames in addition to the OST name (for example, <literal>oss2:lustre-OST0004</literal>).</para>
          <para>All <literal>obdfilter</literal> instances are driven directly. The script automatically loads the <literal>obdecho</literal> module (if required) and creates one instance of <literal>echo_client</literal> for each <literal>obdfilter</literal> instance in order to generate I/O requests directly to the OST.</para>
          <para>For more details, see <xref linkend="dbdoclet.50438212_59319"/>.</para>
        </listitem>
        <listitem>
          <para><emphasis role="bold">Network</emphasis>  - In this mode, the Lustre client generates I/O requests over the network but these requests are not sent to the OST file system. The OSS node runs the obdecho server to receive the requests but discards them before they are sent to the disk.</para>
-        <para>Pass the parameters <literal>case=network</literal> and <literal>target=<replaceable>hostname|IP_of_server</replaceable></literal> to the script. For each network case, the script does the required setup.</para>
+        <para>Pass the parameters <literal>case=network</literal> and <literal>targets=<replaceable>hostname|IP_of_server</replaceable></literal> to the script. For each network case, the script does the required setup.</para>
          <para>For more details, see <xref linkend="dbdoclet.50438212_36037"/></para>
        </listitem>
        <listitem>
@@ -276,8 +283,8 @@
        <para>The <literal>plot-obdfilter</literal> script generates from the output of the
            <literal>obdfilter-survey</literal> a CSV file and parameters for importing into a
          spreadsheet or gnuplot to visualize the data.</para>
-      <para>To run the <literal>obdfilter-survey</literal> script, create a standard Lustre
-        configuration; no special setup is needed.</para>
+      <para>To run the <literal>obdfilter-survey</literal> script, create a standard Lustre file
+        system configuration; no special setup is needed.</para>
        <para><emphasis role="bold">To perform an automatic run:</emphasis></para>
        <orderedlist>
          <listitem>
@@ -294,6 +301,29 @@
            <para>For example, to run a local test with up to two objects (nobjhi), up to two threads (thrhi), and 1024 MB transfer size (size):</para>
            <screen>$ nobjhi=2 thrhi=2 size=1024 case=disk sh obdfilter-survey</screen>
          </listitem>
+        <listitem>
+               <para>Performance measurements for write, rewrite, read, etc. are provided below:</para>
+               <screen># example output
+Fri Sep 25 11:14:03 EDT 2015 Obdfilter-survey for case=disk from hds1fnb6123
+ost 10 sz 167772160K rsz 1024K obj   10 thr   10 write 10982.73 [ 601.97,2912.91] rewrite 15696.54 [1160.92,3450.85] read 12358.60 [ 938.96,2634.87]
+...</screen>
+               <para>The file <literal>./lustre-iokit/obdfilter-survey/README.obdfilter-survey</literal>
+               provides an explanation of the output as follows:</para>
+               <screen>ost 10          is the total number of OSTs under test.
+sz 167772160K   is the total amount of data read or written (in bytes).
+rsz 1024K       is the record size (size of each echo_client I/O, in bytes).
+obj    10       is the total number of objects over all OSTs
+thr    10       is the total number of threads over all OSTs and objects
+write           is the test name.  If more tests have been specified they
+                all appear on the same line.
+10982.73        is the aggregate bandwidth over all OSTs measured by
+                dividing the total number of MB by the elapsed time.
+[601.97,2912.91] are the minimum and maximum instantaneous bandwidths seen on
+                any individual OST.
+Note that although the numbers of threads and objects are specified per-OST
+in the customization section of the script, results are reported aggregated
+over all OSTs.</screen>
+        </listitem>
        </orderedlist>
        <para><emphasis role="italic">To perform a manual run:</emphasis></para>
        <orderedlist>
@@ -315,13 +345,13 @@
          </listitem>
          <listitem>
            <para>List all OSTs you want to test.</para>
-          <para>Use the <literal>target=parameter</literal> to list the OSTs separated by spaces. List the individual OSTs by name using the format
+          <para>Use the <literal>targets=</literal> parameter to list the OSTs separated by spaces. List the individual OSTs by name using the format
                <literal><replaceable>fsname</replaceable>-<replaceable>OSTnumber</replaceable></literal>
              (for example, <literal>lustre-OST0001</literal>). You do not have to specify an MDS or LOV.</para>
          </listitem>
          <listitem>
            <para>Run the <literal>obdfilter-survey</literal> script with the
-              <literal>target=parameter</literal>.</para>
+              <literal>targets=</literal> parameter.</para>
            <para>For example, to run a local test with up to two objects (<literal>nobjhi</literal>), up to two threads (<literal>thrhi</literal>), and 1024 Mb (size) transfer size:</para>
            <screen>$ nobjhi=2 thrhi=2 size=1024 targets=&quot;lustre-OST0001 \
            lustre-OST0002&quot; sh obdfilter-survey</screen>
@@ -332,7 +362,8 @@
        <title><indexterm><primary>benchmarking</primary><secondary>network</secondary></indexterm>Testing Network Performance</title>
        <para>The <literal>obdfilter-survey</literal> script can only be run automatically against a
          network; no manual test is provided.</para>
-      <para>To run the network test, a specific Lustre setup is needed. Make sure that these configuration requirements have been met.</para>
+      <para>To run the network test, a specific Lustre file system setup is needed. Make sure that
+        these configuration requirements have been met.</para>
        <para><emphasis role="bold">To perform an automatic run:</emphasis></para>
        <orderedlist>
          <listitem>
@@ -353,7 +384,7 @@
                  <literal>targets=<replaceable>hostname|ip_of_server</replaceable></literal>. For
              example:</para>
            <screen>$ nobjhi=2 thrhi=2 size=1024 targets=&quot;oss0 oss1&quot; \
-          case=network sh odbfilter-survey</screen>
+          case=network sh obdfilter-survey</screen>
          </listitem>
          <listitem>
            <para>On the server side, view the statistics at:</para>
@@ -408,11 +439,11 @@
          </listitem>
          <listitem>
            <para>List all OSCs you want to test.</para>
-          <para>Use the <literal>target=parameter</literal> to list the OSCs separated by spaces. List the individual OSCs by name separated by spaces using the format <literal><replaceable>fsname</replaceable>-<replaceable>OST_name</replaceable>-osc-<replaceable>instance</replaceable></literal> (for example, <literal>lustre-OST0000-osc-ffff88007754bc00</literal>). You <emphasis>do not have to specify an MDS or LOV.</emphasis></para>
+          <para>Use the <literal>targets=</literal> parameter to list the OSCs separated by spaces. List the individual OSCs by name using the format <literal><replaceable>fsname</replaceable>-<replaceable>OST_name</replaceable>-osc-<replaceable>instance</replaceable></literal> (for example, <literal>lustre-OST0000-osc-ffff88007754bc00</literal>). You <emphasis>do not have to specify an MDS or LOV.</emphasis></para>
          </listitem>
          <listitem>
            <para>Run the <literal>obdfilter-survey</literal> script with the
-                <literal>target=<replaceable>osc</replaceable></literal> and
+                <literal>targets=<replaceable>osc</replaceable></literal> and
                <literal>case=netdisk</literal>.</para>
            <para>An example of a local test run with up to two objects (<literal>nobjhi</literal>), up to two threads (<literal>thrhi</literal>), and 1024 Mb (size) transfer size is shown below:</para>
            <screen>$ nobjhi=2 thrhi=2 size=1024 \
@@ -609,30 +640,33 @@
          network hardware.</para>
      </note>
      <para>To run the <literal>ost-survey</literal> script, supply a file size (in KB) and the Lustre
-      mount point. For example, run:</para>
-    <screen>$ ./ost-survey.sh 10 /mnt/lustre
+      file system mount point. For example, run:</para>
+    <screen>$ ./ost-survey.sh -s 10 /mnt/lustre
  </screen>
      <para>Typical output is:</para>
      <screen>
-Average read Speed:                  6.73
-Average write Speed:                 5.41
-read - Worst OST indx 0              5.84 MB/s
-write - Worst OST indx 0             3.77 MB/s
-read - Best OST indx 1               7.38 MB/s
-write - Best OST indx 1              6.31 MB/s
-3 OST devices found
-Ost index 0 Read speed               5.84         Write speed     3.77
-Ost index 0 Read time                0.17         Write time      0.27
-Ost index 1 Read speed               7.38         Write speed     6.31
-Ost index 1 Read time                0.14         Write time      0.16
-Ost index 2 Read speed               6.98         Write speed     6.16
-Ost index 2 Read time                0.14         Write time      0.16 
+Number of Active OST devices : 4
+Worst  Read OST indx: 2 speed: 2835.272725
+Best   Read OST indx: 3 speed: 2872.889668
+Read Average: 2852.508999 +/- 16.444792 MB/s
+Worst  Write OST indx: 3 speed: 17.705545
+Best   Write OST indx: 2 speed: 128.172576
+Write Average: 95.437735 +/- 45.518117 MB/s
+Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
+----------------------------------------------------
+0     2837.440       126.918        0.035      0.788
+1     2864.433       108.954        0.035      0.918
+2     2835.273       128.173        0.035      0.780
+3     2872.890       17.706        0.035      5.648
  </screen>
    </section>
    <section xml:id="mds_survey_ref">
      <title><indexterm><primary>benchmarking</primary><secondary>MDS
  performance</secondary></indexterm>Testing MDS Performance (<literal>mds-survey</literal>)</title>
-       <para><literal>mds-survey</literal> is available in Lustre 2.2 and beyond.  The <literal>mds-survey</literal> script tests the local metadata performance using the echo_client to drive different layers of the MDS stack: mdd, mdt, osd (current lustre version only supports mdd stack). It can be used with the following classes of operations:</para>
+       <para><literal>mds-survey</literal> is available in Lustre software release 2.2 and beyond. The
+        <literal>mds-survey</literal> script tests the local metadata performance using the
+      echo_client to drive different layers of the MDS stack: mdd, mdt, osd (the Lustre software
+      currently supports only the mdd stack). It can be used with the following classes of operations:</para>
  
      <itemizedlist>
        <listitem>
@@ -871,7 +905,7 @@ performance</secondary></indexterm>Testing MDS Performance (<literal>mds-survey<
      <para>The <literal>stats-collect</literal> utility requires:</para>
      <itemizedlist>
        <listitem>
-        <para>Lustre to be installed and set up on your cluster</para>
+        <para>Lustre software to be installed and set up on your cluster</para>
        </listitem>
        <listitem>
          <para>SSH and SCP access to these nodes without requiring a password</para>