LUDOC-479 lnet: Clarify transmit and routing credits
[doc/manual.git] / LustreMonitoring.xml
index 28fa9c6..c91dec3 100644 (file)
-<?xml version="1.0" encoding="UTF-8"?>
-<chapter version="5.0" xml:lang="en-US" xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" xml:id='lustremonitoring'>
-  <info>
-    <title xml:id='lustremonitoring.title'>Lustre Monitoring</title>
-  </info>
-  <para>This chapter provides information on monitoring Lustre and includes the following sections:</para>
+<?xml version='1.0' encoding='UTF-8'?>
+<chapter xmlns="http://docbook.org/ns/docbook"
+ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
+ xml:id="lustremonitoring">
+  <title xml:id="lustremonitoring.title">Monitoring a Lustre File System</title>
+  <para>This chapter provides information on monitoring a Lustre file system and includes the
+    following sections:</para>
+  <itemizedlist>
+    <listitem>
+      <para><xref linkend="dbdoclet.50438273_18711"/>Lustre Changelogs</para>
+    </listitem>
+    <listitem>
+      <para><xref linkend="dbdoclet.jobstats"/>Lustre Jobstats</para>
+    </listitem>
+    <listitem>
+      <para><xref linkend="dbdoclet.50438273_81684"/>Lustre Monitoring Tool</para>
+    </listitem>
+    <listitem>
+      <para><xref linkend="dbdoclet.50438273_80593"/>CollectL</para>
+    </listitem>
+    <listitem>
+      <para><xref linkend="dbdoclet.50438273_44185"/>Other Monitoring Options</para>
+    </listitem>
+  </itemizedlist>
+  <section xml:id="dbdoclet.50438273_18711">
+      <title><indexterm><primary>change logs</primary><see>monitoring</see></indexterm>
+<indexterm><primary>monitoring</primary></indexterm>
+<indexterm><primary>monitoring</primary><secondary>change logs</secondary></indexterm>
 
-  <itemizedlist><listitem>
-          <para><xref linkend='dbdoclet.50438273_18711'/>Lustre Changelogs</para>
+Lustre Changelogs</title>
+    <para>The changelogs feature records events that change the file system
+    namespace or file metadata. Changes such as file creation, deletion,
+    renaming, attribute changes, etc. are recorded with the target and parent
+    file identifiers (FIDs), the name of the target, a timestamp, and user
+    information. These records can be used for a variety of purposes:</para>
+    <itemizedlist>
+      <listitem>
+        <para>Capture recent changes to feed into an archiving system.</para>
       </listitem>
       <listitem>
-          <para><xref linkend="dbdoclet.50438273_81684"/>Lustre Monitoring Tool</para>
+        <para>Use changelog entries to exactly replicate changes in a file
+       system mirror.</para>
       </listitem>
       <listitem>
-          <para><xref linkend="dbdoclet.50438273_80593"/>CollectL</para>
+        <para>Set up &quot;watch scripts&quot; that take action on certain
+       events or directories.</para>
       </listitem>
       <listitem>
-          <para><xref linkend="dbdoclet.50438273_44185"/>Other Monitoring Options</para>
-          </listitem>
-      </itemizedlist>
-
-    <section xml:id="dbdoclet.50438273_18711">
-      <title>12.1 Lustre <anchor xml:id="dbdoclet.50438273_marker-1297383" xreflabel=""/>Changelogs</title>
-      <para>The changelogs feature records events that change the file system namespace or file metadata. Changes such as file creation, deletion, renaming, attribute changes, etc. are recorded with the target and parent file identifiers (FIDs), the name of the target, and a timestamp. These records can be used for a variety of purposes:</para>
-      <itemizedlist><listitem>
-          <para> Capture recent changes to feed into an archiving system.</para>
-        </listitem>
-<listitem>
-          <para> Use changelog entries to exactly replicate changes in a file system mirror.</para>
-        </listitem>
-<listitem>
-          <para> Set up &quot;watch scripts&quot; that take action on certain events or directories.</para>
-        </listitem>
-<listitem>
-          <para> Maintain a rough audit trail (file/directory changes with timestamps, but no user information).</para>
-        </listitem>
-</itemizedlist>
-      <para>Changelogs record types are:</para>
-      <informaltable frame="all">
-        <tgroup cols="2">
-          <colspec colname="c1" colwidth="50*"/>
-          <colspec colname="c2" colwidth="50*"/>
-          <thead>
-            <row>
-              <entry><para><emphasis role="bold">Value</emphasis></para></entry>
-              <entry><para><emphasis role="bold">Description</emphasis></para></entry>
-            </row>
-          </thead>
-          <tbody>
-            <row>
-              <entry><para> MARK</para></entry>
-              <entry><para> Internal recordkeeping</para></entry>
-            </row>
-            <row>
-              <entry><para> CREAT</para></entry>
-              <entry><para> Regular file creation</para></entry>
-            </row>
-            <row>
-              <entry><para> MKDIR</para></entry>
-              <entry><para> Directory creation</para></entry>
-            </row>
-            <row>
-              <entry><para> HLINK</para></entry>
-              <entry><para> Hard link</para></entry>
-            </row>
-            <row>
-              <entry><para> SLINK</para></entry>
-              <entry><para> Soft link</para></entry>
-            </row>
-            <row>
-              <entry><para> MKNOD</para></entry>
-              <entry><para> Other file creation</para></entry>
-            </row>
-            <row>
-              <entry><para> UNLNK</para></entry>
-              <entry><para> Regular file removal</para></entry>
-            </row>
-            <row>
-              <entry><para> RMDIR</para></entry>
-              <entry><para> Directory removal</para></entry>
-            </row>
-            <row>
-              <entry><para> RNMFM</para></entry>
-              <entry><para> Rename, original</para></entry>
-            </row>
-            <row>
-              <entry><para> RNMTO</para></entry>
-              <entry><para> Rename, final</para></entry>
-            </row>
-            <row>
-              <entry><para> IOCTL</para></entry>
-              <entry><para> ioctl on file or directory</para></entry>
-            </row>
-            <row>
-              <entry><para> TRUNC</para></entry>
-              <entry><para> Regular file truncated</para></entry>
-            </row>
-            <row>
-              <entry><para> SATTR</para></entry>
-              <entry><para> Attribute change</para></entry>
-            </row>
-            <row>
-              <entry><para> XATTR</para></entry>
-              <entry><para> Extended attribute change</para></entry>
-            </row>
-            <row>
-              <entry><para> UNKNW</para></entry>
-              <entry><para> Unknown operation</para></entry>
-            </row>
-          </tbody>
-        </tgroup>
-      </informaltable>
-      <para>FID-to-full-pathname and pathname-to-FID functions are also included to map target and parent FIDs into the file system namespace.</para>
-      <section remap="h3">
-        <title>12.1.1 Working with Changelogs</title>
-        <para>Several commands are available to work with changelogs.</para>
-        <section remap="h5">
-          <title>lctl changelog_register</title>
-          <para>Because changelog records take up space on the MDT, the system administration must register changelog users. The registrants specify which records they are &quot;done with&quot;, and the system purges up to the greatest common record.</para>
-          <para>To register a new changelog user, run:</para>
-          <screen>lctl --device &lt;mdt_device&gt; changelog_register
-</screen>
-          <para>Changelog entries are not purged beyond a registered user's set point (see lfs changelog_clear).</para>
-        </section>
-        <section remap="h5">
-          <title>lfs changelog</title>
-          <para>To display the metadata changes on an MDT (the changelog records), run:</para>
-          <screen>lfs changelog &lt;MDT name&gt; [startrec [endrec]] 
-</screen>
-          <para>It is optional whether to specify the start and end records.</para>
-          <para>These are sample changelog records:</para>
-          <screen>2 02MKDIR 4298396676 0x0 t=[0x200000405:0x15f9:0x0] p=[0x13:0x15e5a7a3:0x0]\
- pics 
-3 01CREAT 4298402264 0x0 t=[0x200000405:0x15fa:0x0] p=[0x200000405:0x15f9:0\
-x0] chloe.jpg 
-4 06UNLNK 4298404466 0x0 t=[0x200000405:0x15fa:0x0] p=[0x200000405:0x15f9:0\
-x0] chloe.jpg 
-5 07RMDIR 4298405394 0x0 t=[0x200000405:0x15f9:0x0] p=[0x13:0x15e5a7a3:0x0]\
- pics 
-</screen>
-        </section>
-        <section remap="h5">
-          <title>lfs changelog_clear</title>
-          <para>To clear old changelog records for a specific user (records that the user no longer needs), run:</para>
-          <screen>lfs changelog_clear &lt;MDT name&gt; &lt;user ID&gt; &lt;endrec&gt;
-</screen>
-          <para>The changelog_clear command indicates that changelog records previous to &lt;endrec&gt; are no longer of interest to a particular user &lt;user ID&gt;, potentially allowing the MDT to free up disk space. An &lt;endrec&gt; value of 0 indicates the current last record. To run changelog_clear, the changelog user must be registered on the MDT node using lctl.</para>
-          <para>When all changelog users are done with records &lt; X, the records are deleted.</para>
-        </section>
-        <section remap="h5">
-          <title>lctl changelog_deregister</title>
-          <para>To deregister (unregister) a changelog user, run:</para>
-          <screen>lctl --device &lt;mdt_device&gt; changelog_deregister &lt;user ID&gt;                
+        <para>Audit activity on a Lustre file system, using the user
+       information recorded with each timestamped file or directory
+       change.</para>
+      </listitem>
+    </itemizedlist>
+    <para>Changelog record types are:</para>
+    <informaltable frame="all">
+      <tgroup cols="2">
+        <colspec colname="c1" colwidth="50*"/>
+        <colspec colname="c2" colwidth="50*"/>
+        <thead>
+          <row>
+            <entry>
+              <para><emphasis role="bold">Value</emphasis></para>
+            </entry>
+            <entry>
+              <para><emphasis role="bold">Description</emphasis></para>
+            </entry>
+          </row>
+        </thead>
+        <tbody>
+          <row>
+            <entry>
+              <para> MARK</para>
+            </entry>
+            <entry>
+              <para> Internal recordkeeping</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para> CREAT</para>
+            </entry>
+            <entry>
+              <para> Regular file creation</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para> MKDIR</para>
+            </entry>
+            <entry>
+              <para> Directory creation</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para> HLINK</para>
+            </entry>
+            <entry>
+              <para> Hard link</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para> SLINK</para>
+            </entry>
+            <entry>
+              <para> Soft link</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para> MKNOD</para>
+            </entry>
+            <entry>
+              <para> Other file creation</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para> UNLNK</para>
+            </entry>
+            <entry>
+              <para> Regular file removal</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para> RMDIR</para>
+            </entry>
+            <entry>
+              <para> Directory removal</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para> RENME</para>
+            </entry>
+            <entry>
+              <para> Rename, original</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para> RNMTO</para>
+            </entry>
+            <entry>
+              <para> Rename, final</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para> OPEN *</para>
+            </entry>
+            <entry>
+              <para> Open</para>
+            </entry>
+          </row>
+         <row>
+            <entry>
+              <para> CLOSE</para>
+            </entry>
+            <entry>
+              <para> Close</para>
+            </entry>
+          </row>
+         <row>
+            <entry>
+              <para> LYOUT</para>
+            </entry>
+            <entry>
+              <para> Layout change</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para> TRUNC</para>
+            </entry>
+            <entry>
+              <para> Regular file truncated</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para> SATTR</para>
+            </entry>
+            <entry>
+              <para> Attribute change</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para> XATTR</para>
+            </entry>
+            <entry>
+              <para> Extended attribute change (setxattr)</para>
+            </entry>
+          </row>
+         <row>
+            <entry>
+              <para> HSM</para>
+            </entry>
+            <entry>
+              <para> HSM specific event</para>
+            </entry>
+          </row>
+         <row>
+            <entry>
+              <para> MTIME</para>
+            </entry>
+            <entry>
+              <para> MTIME change</para>
+            </entry>
+          </row>
+         <row>
+            <entry>
+              <para> CTIME</para>
+            </entry>
+            <entry>
+              <para> CTIME change</para>
+            </entry>
+          </row>
+         <row>
+            <entry>
+              <para> ATIME *</para>
+            </entry>
+            <entry>
+              <para> ATIME change</para>
+            </entry>
+          </row>
+         <row>
+            <entry>
+              <para> MIGRT</para>
+            </entry>
+            <entry>
+              <para> Migration event</para>
+            </entry>
+          </row>
+         <row>
+            <entry>
+              <para> FLRW</para>
+            </entry>
+            <entry>
+              <para> File Level Replication: file initially written</para>
+            </entry>
+          </row>
+         <row>
+            <entry>
+              <para> RESYNC</para>
+            </entry>
+            <entry>
+              <para> File Level Replication: file re-synced</para>
+            </entry>
+          </row>
+         <row>
+            <entry>
+              <para> GXATR *</para>
+            </entry>
+            <entry>
+              <para> Extended attribute access (getxattr)</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para> NOPEN *</para>
+            </entry>
+            <entry>
+              <para> Denied open</para>
+            </entry>
+          </row>
+        </tbody>
+      </tgroup>
+    </informaltable>
+    <note><para>Event types marked with * are not recorded by default. Refer to
+    <xref linkend="dbdoclet.modifyChangelogMask" /> for instructions on
+    modifying the Changelogs mask.</para></note>
+    <para>FID-to-full-pathname and pathname-to-FID functions are also included
+    to map target and parent FIDs into the file system namespace.</para>
+    <section remap="h3">
+      <title><indexterm><primary>monitoring</primary><secondary>change logs
+    </secondary></indexterm>
+Working with Changelogs</title>
+      <para>Several commands are available to work with changelogs.</para>
+      <section remap="h5">
+        <title>
+          <literal>lctl changelog_register</literal>
+        </title>
+        <para>Because changelog records take up space on the MDT, the system
+       administrator must register changelog users. As soon as a changelog
+       user is registered, the Changelogs feature is enabled. Registered
+       users specify which records they are &quot;done with&quot;, and the
+       system purges up to the greatest common record.</para>
+       <para>To register a new changelog user, run:</para>
+        <screen>mds# lctl --device <replaceable>fsname</replaceable>-<replaceable>MDTnumber</replaceable> changelog_register
 </screen>
-          <para> Changelog_deregister cl1 effectively does a changelog_clear cl10 as it deregisters.</para>
-        </section>
+        <para>Changelog entries are not purged beyond a registered user&apos;s
+       set point (see <literal>lfs changelog_clear</literal>).</para>
       </section>
-      <section remap="h3">
-        <title>12.1.2 Changelog Examples</title>
-        <para>This section provides examples of different changelog commands.</para>
-        <section remap="h5">
-          <title>Registering a Changelog User</title>
-          <para>To register a new changelog user for a device (lustre-MDT0000):</para>
-          <screen># lctl --device lustre-MDT0000 changelog_register
-lustre-MDT0000: Registered changelog userid &apos;cl1&apos;
-</screen>
-        </section>
-        <section remap="h5">
-          <title>Displaying Changelog Records</title>
-          <para>To display changelog records on an MDT (lustre-MDT0000):</para>
-          <screen>$ lfs changelog lustre-MDT0000
-1 00MARK  19:08:20.890432813 2010.03.24 0x0 t=[0x10001:0x0:0x0] p=[0:0x0:0x\
-0] mdd_obd-lustre-MDT0000-0 
-2 02MKDIR 19:10:21.509659173 2010.03.24 0x0 t=[0x200000420:0x3:0x0] p=[0x61\
-b4:0xca2c7dde:0x0] mydir 
-3 14SATTR 19:10:27.329356533 2010.03.24 0x0 t=[0x200000420:0x3:0x0] 
-4 01CREAT 19:10:37.113847713 2010.03.24 0x0 t=[0x200000420:0x4:0x0] p=[0x20\
-0000420:0x3:0x0] hosts 
-</screen>
-          <para>Changelog records include this information:</para>
-          <screen>rec# 
+      <section remap="h5">
+        <title>
+          <literal>lfs changelog</literal>
+        </title>
+        <para>To display the metadata changes on an MDT (the changelog records),
+       run:</para>
+        <screen>lfs changelog <replaceable>fsname</replaceable>-<replaceable>MDTnumber</replaceable> [startrec [endrec]] </screen>
+        <para>The start and end records are optional.</para>
+        <para>These are sample changelog records:</para>
+        <screen>1 02MKDIR 15:15:21.977666834 2018.01.09 0x0 t=[0x200000402:0x1:0x0] j=mkdir.500 ef=0xf \
+u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
+2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] j=cp.500 ef=0xf \
+u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
+3 06UNLNK 15:15:41.305116815 2018.01.09 0x1 t=[0x200000402:0x2:0x0] j=rm.500 ef=0xf \
+u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
+4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] j=rmdir.500 ef=0xf \
+u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics </screen>
+      </section>
+      <section remap="h5">
+        <title>
+          <literal>lfs changelog_clear</literal>
+        </title>
+        <para>To clear old changelog records for a specific user (records that
+       the user no longer needs), run:</para>
+        <screen>lfs changelog_clear <replaceable>mdt_name</replaceable> <replaceable>userid</replaceable> <replaceable>endrec</replaceable></screen>
+        <para>The <literal>changelog_clear</literal> command indicates that
+       changelog records previous to <replaceable>endrec</replaceable> are no
+       longer of interest to a particular user
+       <replaceable>userid</replaceable>, potentially allowing the MDT to free
+       up disk space. An <literal><replaceable>endrec</replaceable></literal>
+       value of 0 indicates the current last record. To run
+       <literal>changelog_clear</literal>, the changelog user must be
+       registered on the MDT node using <literal>lctl</literal>.</para>
+        <para>When all changelog users are done with records &lt; X, the records
+       are deleted.</para>
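The purge rule described above ("when all changelog users are done with records &lt; X, the records are deleted") can be illustrated with a small model. This is only a sketch under stated assumptions, not the MDT implementation: the function names and the dictionary of per-user set points are hypothetical, and a set point is treated as inclusive, matching the "up to and including" wording used in the examples.

```python
def purge_point(user_set_points):
    """Return the highest record index that every registered user has cleared.

    user_set_points maps a changelog user ID (for example "cl1") to the
    last record index that user declared it no longer needs.
    Returns None when no user is registered (assumption: nothing to purge).
    """
    if not user_set_points:
        return None
    # The slowest consumer decides how far the log may be purged.
    return min(user_set_points.values())


def purgeable(record_indices, user_set_points):
    """List the record indices that could be deleted under this model."""
    point = purge_point(user_set_points)
    if point is None:
        return []
    return [i for i in record_indices if i <= point]


if __name__ == "__main__":
    # Hypothetical state: cl1 has cleared up to record 3, cl2 only up to 1.
    set_points = {"cl1": 3, "cl2": 1}
    print(purge_point(set_points))              # 1
    print(purgeable([1, 2, 3, 4], set_points))  # [1]
```

In this model a single idle user holds back purging for everyone, which is why stale changelog users should be deregistered.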
+      </section>
+      <section remap="h5">
+        <title>
+          <literal>lctl changelog_deregister</literal>
+        </title>
+        <para>To deregister (unregister) a changelog user, run:</para>
+        <screen>mds# lctl --device <replaceable>mdt_device</replaceable> changelog_deregister <replaceable>userid</replaceable>       </screen>
+        <para><literal>changelog_deregister cl1</literal> effectively does a
+       <literal>lfs changelog_clear <replaceable>mdt_name</replaceable> cl1 0</literal>
+       as it deregisters.</para>
+      </section>
+    </section>
+    <section remap="h3">
+      <title>Changelog Examples</title>
+      <para>This section provides examples of different changelog
+      commands.</para>
+      <section remap="h5">
+        <title>Registering a Changelog User</title>
+        <para>To register a new changelog user for a device
+       (<literal>lustre-MDT0000</literal>):</para>
+        <screen>mds# lctl --device lustre-MDT0000 changelog_register
+lustre-MDT0000: Registered changelog userid &apos;cl1&apos;</screen>
+      </section>
+      <section remap="h5">
+        <title>Displaying Changelog Records</title>
+        <para>To display changelog records on an MDT
+       (<literal>lustre-MDT0000</literal>):</para>
+        <screen>$ lfs changelog lustre-MDT0000
+1 02MKDIR 15:15:21.977666834 2018.01.09 0x0 t=[0x200000402:0x1:0x0] ef=0xf \
+u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
+2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] ef=0xf \
+u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
+3 06UNLNK 15:15:41.305116815 2018.01.09 0x1 t=[0x200000402:0x2:0x0] ef=0xf \
+u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
+4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] ef=0xf \
+u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics</screen>
+        <para>Changelog records include this information:</para>
+        <screen>rec# 
 operation_type(numerical/text) 
 timestamp 
 datestamp 
 flags 
 t=target_FID 
+ef=extended_flags
+u=uid:gid
+nid=client_NID
 p=parent_FID 
-target_name
-</screen>
-          <para>Displayed in this format:</para>
-          <screen>rec# operation_type(numerical/text) timestamp datestamp flags t=target_FID \
-p=parent_FID target_name
-</screen>
-          <para>For example:</para>
-          <screen>4 01CREAT 19:10:37.113847713 2010.03.24 0x0 t=[0x200000420:0x4:0x0] p=[0x20\
-0000420:0x3:0x0] hosts
-</screen>
-        </section>
-        <section remap="h5">
-          <title>Clearing Changelog Records</title>
-          <para>To notify a device that a specific user (cl1) no longer needs records (up to and including 3):</para>
-          <screen>$ lfs changelog_clear  lustre-MDT0000 cl1 3
-</screen>
-          <para>To confirm that the changelog_clear operation was successful, run lfs changelog; only records after id-3 are listed:</para>
-          <screen>$ lfs changelog lustre-MDT0000
-4 01CREAT 19:10:37.113847713 2010.03.24 0x0 t=[0x200000420:0x4:0x0] p=[0x20\
-0000420:0x3:0x0] hosts
-</screen>
-        </section>
-        <section remap="h5">
-          <title>Deregistering a Changelog User</title>
-          <para>To deregister a changelog user (cl1) for a specific device (lustre-MDT0000):</para>
-          <screen># lctl --device lustre-MDT0000 changelog_deregister cl1
-lustre-MDT0000: Deregistered changelog user &apos;cl1&apos;
-</screen>
-          <para>The deregistration operation clears all changelog records for the specified user (cli).</para>
-          <screen>$ lfs changelog lustre-MDT0000
-5 00MARK  19:13:40.858292517 2010.03.24 0x0 t=[0x40001:0x0:0x0] p=[0:0x0:0x\
-0] mdd_obd-lustre-MDT0000-0 
+target_name</screen>
+        <para>Displayed in this format:</para>
+        <screen>rec# operation_type(numerical/text) timestamp datestamp flags t=target_FID \
+ef=extended_flags u=uid:gid nid=client_NID p=parent_FID target_name</screen>
+        <para>For example:</para>
+        <screen>2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] ef=0xf \
+u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg</screen>
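The record layout above lends itself to simple scripted parsing. The following is a minimal sketch, assuming the record has already been joined onto a single line (no trailing backslash continuation), that the operation type always carries a two-digit numeric prefix as in the samples, and that the target name does not itself begin with a key=value token. The function name and the dictionary keys are illustrative choices, not part of any Lustre tool.

```python
import re

# Matches key=value fields such as t=[...], ef=0xf, u=500:500, nid=..., p=[...]
FIELD_RE = re.compile(r"^(\w+)=(.*)$")


def parse_changelog_record(line):
    """Split one 'lfs changelog' output line into a dictionary of fields."""
    tokens = line.split()
    rec = {
        "index": int(tokens[0]),          # rec#
        "type_num": int(tokens[1][:2]),   # numerical operation type
        "type": tokens[1][2:],            # text operation type, e.g. CREAT
        "time": tokens[2],                # timestamp
        "date": tokens[3],                # datestamp
        "flags": int(tokens[4], 16),      # flags, printed as hex
    }
    name_parts = []
    for tok in tokens[5:]:
        m = FIELD_RE.match(tok)
        if m and not name_parts:
            # Still in the key=value part of the record.
            rec[m.group(1)] = m.group(2)
        else:
            # Everything after the last key=value field is the target name.
            name_parts.append(tok)
    rec["name"] = " ".join(name_parts)
    return rec


if __name__ == "__main__":
    sample = ("2 01CREAT 15:15:36.687592024 2018.01.09 0x0 "
              "t=[0x200000402:0x2:0x0] ef=0xf u=500:500 "
              "nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg")
    rec = parse_changelog_record(sample)
    print(rec["type"], rec["u"], rec["nid"], rec["name"])
```

Because unknown key=value fields are stored under their own key, optional fields such as the <literal>j=</literal> job ID shown in some of the samples are picked up without special handling.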
+      </section>
+      <section remap="h5">
+        <title>Clearing Changelog Records</title>
+        <para>To notify a device that a specific user (<literal>cl1</literal>)
+       no longer needs records (up to and including 3):</para>
+        <screen>$ lfs changelog_clear  lustre-MDT0000 cl1 3</screen>
+        <para>To confirm that the <literal>changelog_clear</literal> operation
+       was successful, run <literal>lfs changelog</literal>; only records with
+       an index greater than 3 are listed:</para>
+        <screen>$ lfs changelog lustre-MDT0000
+4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] ef=0xf \
+u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics</screen>
+      </section>
+      <section remap="h5">
+        <title>Deregistering a Changelog User</title>
+        <para>To deregister a changelog user (<literal>cl1</literal>) for a
+       specific device (<literal>lustre-MDT0000</literal>):</para>
+        <screen>mds# lctl --device lustre-MDT0000 changelog_deregister cl1
+lustre-MDT0000: Deregistered changelog user &apos;cl1&apos;</screen>
+        <para>The deregistration operation clears all changelog records for the
+       specified user (<literal>cl1</literal>).</para>
+        <screen>$ lfs changelog lustre-MDT0000
+5 00MARK  15:56:39.603643887 2018.01.09 0x0 t=[0x20001:0x0:0x0] ef=0xf \
+u=500:500 nid=0@&lt;0:0&gt; p=[0:0x50:0xb] mdd_obd-lustre-MDT0000-0
 </screen>
-          <informaltable frame="none">
-            <tgroup cols="1">
-              <colspec colname="c1" colwidth="100*"/>
-              <tbody>
-                <row>
-                  <entry><para><emphasis role="bold">Note -</emphasis>MARK records typically indicate changelog recording status changes.</para></entry>
-                </row>
-              </tbody>
-            </tgroup>
-          </informaltable>
-        </section>
-        <section remap="h5">
-          <title>Displaying the Changelog Index and Registered Users</title>
-          <para>To display the current, maximum changelog index and registered changelog users for a specific device (lustre-MDT0000):</para>
-          <screen># lctl get_param  mdd.lustre-MDT0000.changelog_users 
+        <note>
+          <para>MARK records typically indicate changelog recording status
+         changes.</para>
+        </note>
+      </section>
+      <section remap="h5">
+        <title>Displaying the Changelog Index and Registered Users</title>
+        <para>To display the current, maximum changelog index and registered
+       changelog users for a specific device
+       (<literal>lustre-MDT0000</literal>):</para>
+        <screen>mds# lctl get_param  mdd.lustre-MDT0000.changelog_users 
 mdd.lustre-MDT0000.changelog_users=current index: 8 
-ID    index 
-cl2   8
+ID    index (idle seconds)
+cl2   8 (180)
 </screen>
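For monitoring scripts, the output above can be turned into a per-user backlog (how many records each user has yet to clear). The sketch below assumes the exact layout shown in this example, with user IDs of the form <literal>clN</literal> and an optional idle time in parentheses; other Lustre versions may print this parameter differently. The function name and returned structure are hypothetical.

```python
import re

INDEX_RE = re.compile(r"current index:\s*(\d+)")
# A user line looks like "cl2   8 (180)"; the idle time may be absent.
USER_RE = re.compile(r"^(cl\d+)\s+(\d+)(?:\s+\((\d+)\))?$")


def parse_changelog_users(text):
    """Return (current_index, users) parsed from changelog_users output."""
    current = None
    users = {}
    for line in text.splitlines():
        m = INDEX_RE.search(line)
        if m:
            current = int(m.group(1))
            continue
        m = USER_RE.match(line.strip())
        if m:
            idle = int(m.group(3)) if m.group(3) is not None else None
            users[m.group(1)] = {"index": int(m.group(2)),
                                 "idle_seconds": idle}
    return current, users


if __name__ == "__main__":
    output = ("mdd.lustre-MDT0000.changelog_users=current index: 8 \n"
              "ID    index (idle seconds)\n"
              "cl2   8 (180)\n")
    current, users = parse_changelog_users(output)
    for uid, info in users.items():
        # Backlog: records written since this user last cleared.
        print(uid, "backlog:", current - info["index"],
              "idle:", info["idle_seconds"])
```

A user with a large backlog and a long idle time is a candidate for deregistration, since it prevents old records from being purged.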
-        </section>
-        <section remap="h5">
-          <title>Displaying the Changelog Mask</title>
-          <para>To show the current changelog mask on a specific device (lustre-MDT0000):</para>
-          <screen># lctl get_param  mdd.lustre-MDT0000.changelog_mask 
+      </section>
+      <section remap="h5">
+        <title>Displaying the Changelog Mask</title>
+        <para>To show the current changelog mask on a specific device
+       (<literal>lustre-MDT0000</literal>):</para>
+        <screen>mds# lctl get_param  mdd.lustre-MDT0000.changelog_mask 
+
 mdd.lustre-MDT0000.changelog_mask= 
-MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RNMFM RNMTO OPEN CLOSE IOCTL\
- TRUNC SATTR XATTR HSM 
+MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT \
+TRUNC SATTR XATTR HSM MTIME CTIME MIGRT
 </screen>
-        </section>
-        <section remap="h5">
-          <title>Setting the Changelog Mask</title>
-          <para>To set the current changelog mask on a specific device (lustre-MDT0000):</para>
-          <screen># lctl set_param mdd.lustre-MDT0000.changelog_mask=HLINK 
+      </section>
+      <section xml:id="dbdoclet.modifyChangelogMask" remap="h5">
+        <title>Setting the Changelog Mask</title>
+        <para>To set the current changelog mask on a specific device
+       (<literal>lustre-MDT0000</literal>):</para>
+        <screen>mds# lctl set_param mdd.lustre-MDT0000.changelog_mask=HLINK 
 mdd.lustre-MDT0000.changelog_mask=HLINK 
 $ lfs changelog_clear lustre-MDT0000 cl1 0 
 $ mkdir /mnt/lustre/mydir/foo
 $ cp /etc/hosts /mnt/lustre/mydir/foo/file
 $ ln /mnt/lustre/mydir/foo/file /mnt/lustre/mydir/myhardlink
 </screen>
-          <para> Only item types that are in the mask show up in the changelog.</para>
-          <screen>$ lfs changelog lustre-MDT0000
-9 03HLINK 19:19:35.171867477 2010.03.24 0x0 t=[0x200000420:0x6:0x0] p=[0x20\
-0000420:0x3:0x0] myhardlink
+        <para>Only item types that are in the mask show up in the
+       changelog.</para>
+        <screen>$ lfs changelog lustre-MDT0000
+9 03HLINK 16:06:35.291636498 2018.01.09 0x0 t=[0x200000402:0x4:0x0] ef=0xf \
+u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x3:0x0] myhardlink
 </screen>
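The effect of the mask in the example above can be modeled as a simple set-membership test. This is an illustrative sketch only: the helper names are hypothetical, and the default mask string is copied from the <literal>changelog_mask</literal> output shown earlier in this section rather than queried from a live system.

```python
# Default mask as displayed by "lctl get_param mdd.lustre-MDT0000.changelog_mask"
# in the example above (copied by hand, not read from a live MDT).
DEFAULT_MASK_TEXT = (
    "MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT "
    "TRUNC SATTR XATTR HSM MTIME CTIME MIGRT"
)


def parse_mask(mask_text):
    """Turn the displayed mask into a set of record type names."""
    # Drop any backslash line continuations before splitting on whitespace.
    return set(mask_text.replace("\\", " ").split())


def recorded(event_types, mask):
    """Keep only the event types that the mask allows into the changelog."""
    return [t for t in event_types if t in mask]


if __name__ == "__main__":
    # mkdir, cp and ln from the example produce these event types.
    events = ["MKDIR", "CREAT", "HLINK"]
    print(recorded(events, parse_mask("HLINK")))            # ['HLINK']
    print(recorded(events, parse_mask(DEFAULT_MASK_TEXT)))  # all three
    # OPEN is one of the types marked with * and is absent by default.
    print("OPEN" in parse_mask(DEFAULT_MASK_TEXT))          # False
```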
-        </section>
+        <para></para>
       </section>
     </section>
-    <section xml:id="dbdoclet.50438273_81684">
-      <title>12.2 Lustre <anchor xml:id="dbdoclet.50438273_marker-1297386" xreflabel=""/>Monitoring Tool</title>
-      <para>The Lustre Monitoring Tool (LMT) is a Python-based, distributed system developed and maintained by Lawrence Livermore National Lab (LLNL)). It provides a &apos;&apos;top&apos;&apos; like display of activity on server-side nodes (MDS, OSS and portals routers) on one or more Lustre file systems. It does not provide support for monitoring clients. For more information on LMT, including the setup procedure, see:</para>
-      <para><link xl:href="http://code.google.com/p/lmt/">http://code.google.com/p/lmt/</link></para>
-      <para>LMT questions can be directed to:</para>
-      <para><link xl:href="mailto:lmt-discuss@googlegroups.com">lmt-discuss@googlegroups.com</link></para>
+    <section remap="h3" condition='l2B'>
+      <title><indexterm><primary>audit</primary>
+      <secondary>change logs</secondary></indexterm>
+Audit with Changelogs</title>
+      <para>A specific use case for Lustre Changelogs is audit. According to a
+      definition found on <link xmlns:xlink="http://www.w3.org/1999/xlink"
+      xlink:href="https://en.wikipedia.org/wiki/Information_technology_audit">
+      Wikipedia</link>, information technology audits are used to evaluate an
+      organization's ability to protect its information assets and to properly
+      dispense information to authorized parties. In practice, an audit
+      verifies that all data accesses complied with the access control policy
+      in place, usually by analyzing access logs.</para>
+      <para>An audit can serve as proof that security measures are in place.
+      It can also be required to comply with regulations.</para>
+      <para>Lustre Changelogs are a good mechanism for audit because they are a
+      centralized facility designed to be transactional. Changelog
+      records contain all information necessary for auditing purposes:</para>
+      <itemizedlist>
+       <listitem>
+          <para>ability to identify object of action thanks to file identifiers
+         (FIDs) and name of targets</para>
+       </listitem>
+       <listitem>
+         <para>ability to identify subject of action thanks to UID/GID and NID
+         information</para>
+       </listitem>
+       <listitem>
+         <para>ability to identify time of action thanks to timestamp</para>
+       </listitem>
+      </itemizedlist>
+      <section remap="h5">
+        <title>Enabling Audit</title>
+       <para>To have a fully functional Changelogs-based audit facility, some
+       additional Changelog record types must be enabled, to be able to record
+       events such as OPEN, ATIME, GETXATTR and DENIED OPEN. Note that
+       enabling these record types may affect performance. For instance,
+       recording OPEN and GETXATTR events generates writes to the Changelog
+       for operations that are reads from a file system
+       standpoint.</para>
+       <para>Being able to record events such as OPEN or DENIED OPEN is
+       important from an audit perspective. For instance, if a Lustre file
+       system is used to store medical records on a system dedicated to Life
+       Sciences, data privacy is crucial. Administrators may need to know which
+       doctors accessed, or tried to access, a given medical record and when.
+       Conversely, they might need to know which medical records a given doctor
+       accessed.</para>
+       <para>To enable all changelog entry types, run:</para>
+        <screen>mds# lctl set_param mdd.lustre-MDT0000.changelog_mask=ALL
+mdd.lustre-MDT0000.changelog_mask=ALL</screen>
+       <para>Once all required record types have been enabled, just register a
+       Changelogs user and the audit facility is operational.</para>
+       <para>Note, however, that the <literal>audit_mode</literal> flag on
+       nodemap entries controls which Lustre client nodes can trigger the
+       recording of file system access events to the Changelogs. Disabling
+       audit on a per-nodemap basis prevents some nodes (e.g. backup or HSM
+       agent nodes) from flooding the audit logs. When the
+       <literal>audit_mode</literal> flag is set to 1 on a nodemap entry, a
+       client belonging to this nodemap is able to record file system access
+       events to the Changelogs, provided Changelogs are otherwise activated.
+       When it is set to 0, events are not logged into the Changelogs,
+       regardless of whether Changelogs are activated. By default, the
+       <literal>audit_mode</literal> flag is set to 1 in newly created
+       nodemap entries, and it is also set to 1 in the 'default'
+       nodemap.</para>
+       <para>To prevent nodes belonging to a nodemap from generating Changelog
+       entries, run:</para>
+       <screen>
+mgs# lctl nodemap_modify --name nm1 --property audit_mode --value 0</screen>
+      </section>
+      <section remap="h5">
+        <title>Audit examples</title>
+       <section remap="h5">
+          <title>
+           <literal>OPEN</literal>
+         </title>
+         <para>An OPEN changelog entry is in the form:</para>
+         <screen>
+7 10OPEN  13:38:51.510728296 2017.07.25 0x242 t=[0x200000401:0x2:0x0] \
+ef=0x7 u=500:500 nid=10.128.11.159@tcp m=-w-</screen>
+         <para>It includes information about the open mode, in the form
+         m=rwx.</para>
+         <para>OPEN entries are recorded only once per UID/GID for a given
+         open mode, as long as the file is not closed by this UID/GID. This
+         avoids flooding the Changelogs when, for instance, an MPI job opens
+         the same file thousands of times from different threads. It
+         reduces the Changelog load significantly without significantly
+         affecting the audit information. Similarly, only the last CLOSE per
+         UID/GID is recorded.</para>
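+         <para>For audit tooling, a record such as the one above can be split
+         into its fields. The following is a minimal Python sketch, not part
+         of Lustre: the function name and the dictionary keys
+         (<literal>index</literal>, <literal>type_id</literal>,
+         <literal>type_name</literal>, <literal>uid</literal>,
+         <literal>gid</literal>) are assumptions chosen for illustration, and
+         the field layout is inferred only from the sample record shown
+         here.</para>

```python
import re

# Leading fixed columns: record number, "<type-id><TYPE>", time, date, flags.
_HEADER = re.compile(
    r"^(?P<index>\d+)\s+(?P<type_id>\d+)(?P<type_name>[A-Z_]+)\s+"
    r"(?P<time>\S+)\s+(?P<date>\S+)\s+(?P<flags>0x[0-9a-fA-F]+)\s*(?P<rest>.*)$"
)


def parse_changelog_record(line):
    """Split one changelog record into a dict (hypothetical helper)."""
    # Drop the backslash used to wrap long records and collapse whitespace.
    text = " ".join(line.replace("\\", " ").split())
    match = _HEADER.match(text)
    if match is None:
        raise ValueError(f"unrecognized changelog record: {line!r}")
    record = match.groupdict()
    rest = record.pop("rest")
    # Remaining tokens are key=value pairs such as t=, ef=, u=, nid=, m=.
    for token in rest.split():
        key, sep, value = token.partition("=")
        if sep:
            record[key] = value
    # u=<uid>:<gid> identifies the subject of the action.
    if "u" in record:
        uid, _, gid = record["u"].partition(":")
        record["uid"], record["gid"] = int(uid), int(gid)
    return record
```

+         <para>Applied to the sample OPEN record, this sketch yields the
+         record type, the target FID from <literal>t=</literal>, the client
+         NID from <literal>nid=</literal> and the open mode from
+         <literal>m=</literal> as separate dictionary entries.</para>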
+       </section>
+       <section remap="h5">
+          <title>
+           <literal>GETXATTR</literal>
+         </title>
+         <para>A GETXATTR changelog entry is in the form:</para>
+         <screen>
+8 23GXATR 09:22:55.886793012 2017.07.27 0x0 t=[0x200000402:0x1:0x0] \
+ef=0xf u=500:500 nid=10.128.11.159@tcp x=user.name0</screen>
+         <para>It includes information about the name of the extended attribute
+         being accessed, in the form <literal>x=&lt;xattr name&gt;</literal>.
+         </para>
+       </section>
+       <section remap="h5">
+          <title>
+           <literal>SETXATTR</literal>
+         </title>
+         <para>A SETXATTR changelog entry is in the form:</para>
+         <screen>
+4 15XATTR 09:41:36.157333594 2018.01.10 0x0 t=[0x200000402:0x1:0x0] \
+ef=0xf u=500:500 nid=10.128.11.159@tcp x=user.name0</screen>
+         <para>It includes information about the name of the extended attribute
+         being modified, in the form <literal>x=&lt;xattr name&gt;</literal>.
+         </para>
+       </section>
+       <section remap="h5">
+          <title>
+           <literal>DENIED OPEN</literal>
+         </title>
+         <para>A DENIED OPEN changelog entry is in the form:</para>
+         <screen>
+4 24NOPEN 15:45:44.947406626 2017.08.31 0x2 t=[0x200000402:0x1:0x0] \
+ef=0xf u=500:500 nid=10.128.11.158@tcp m=-w-</screen>
+         <para>It has the same information as a regular OPEN entry. In order to
+         avoid flooding the Changelogs, DENIED OPEN entries are rate limited:
+         no more than one entry is recorded per user per file per time
+         interval. The interval (in seconds) is configurable via
+         <literal>mdd.&lt;mdtname&gt;.changelog_deniednext</literal>
+         (the default value is 60 seconds).</para>
+         <screen>
+mds# lctl set_param mdd.lustre-MDT0000.changelog_deniednext=120
+mdd.lustre-MDT0000.changelog_deniednext=120
+mds# lctl get_param mdd.lustre-MDT0000.changelog_deniednext
+mdd.lustre-MDT0000.changelog_deniednext=120</screen>
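+         <para>The rate-limiting rule described above (at most one entry per
+         user per file per interval) can be illustrated with a short Python
+         sketch. This is only a model of the stated behavior, not the Lustre
+         implementation; the class and method names are hypothetical.</para>

```python
class DeniedOpenLimiter:
    """Model of 'one DENIED OPEN entry per user per file per interval'."""

    def __init__(self, interval_seconds=60):
        # 60 seconds mirrors the documented default of changelog_deniednext.
        self.interval = interval_seconds
        self._last_logged = {}  # (uid, fid) -> time of the last logged entry

    def should_log(self, uid, fid, now):
        """Return True if a denied open at time `now` should be recorded."""
        key = (uid, fid)
        last = self._last_logged.get(key)
        if last is not None and now - last < self.interval:
            return False  # still inside the interval: suppress the entry
        self._last_logged[key] = now
        return True
```

+         <para>With a 60-second interval, repeated denied opens of the same
+         file by the same user within one minute produce a single entry,
+         while a different user or a different file is tracked
+         independently.</para>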
+       </section>
+      </section>
+    </section>
+  </section>
+  <section xml:id="dbdoclet.jobstats">
+      <title><indexterm><primary>jobstats</primary><see>monitoring</see></indexterm>
+<indexterm><primary>monitoring</primary></indexterm>
+<indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
+
+Lustre Jobstats</title>
+    <para>The Lustre jobstats feature collects file system operation statistics
+      for user processes running on Lustre clients, and exposes them on the
+      server using the unique Job Identifier (JobID) provided by the job
+      scheduler for each job. Job schedulers known to work with jobstats
+      include: SLURM, SGE, LSF, Loadleveler, PBS and Maui/MOAB.</para>
+    <para>Since jobstats is implemented in a scheduler-agnostic manner, it is
+    likely to work with other schedulers as well, and also in environments
+    that do not use a job scheduler, by storing custom format strings in
+    <literal>jobid_name</literal>.</para>
+    <section remap="h3">
+      <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
+      How Jobstats Works</title>
+      <para>The Lustre jobstats code on the client extracts the unique JobID
+      from an environment variable within the user process, and sends this
+      JobID to the server with the I/O operation.  The server tracks
+      statistics for operations whose JobID is given, indexed by that
+      ID.</para>
+      
+      <para>A Lustre setting on the client, <literal>jobid_var</literal>,
+      specifies which environment variable holds the JobID for that process.
+      Any environment variable can be specified.  For example, SLURM sets the
+      <literal>SLURM_JOB_ID</literal> environment variable with the unique
+      job ID on each client when the job is first launched on a node, and
+      the <literal>SLURM_JOB_ID</literal> will be inherited by all child
+      processes started below that process.</para>
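+      <para>The inheritance that jobstats relies on is ordinary process
+      environment inheritance. The following Python sketch, which does not
+      involve Lustre at all, starts a child process with a job ID variable set
+      and reads it back from inside the child. The helper name and the job ID
+      value used with it are made up for illustration.</para>

```python
import os
import subprocess
import sys


def read_jobid_in_child(var_name, job_id):
    """Launch a child with var_name set and return the value the child sees."""
    env = dict(os.environ)
    env[var_name] = job_id  # what a scheduler does when it launches a job
    child_code = f"import os; print(os.environ.get({var_name!r}, ''))"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

+      <para>Calling the helper with <literal>SLURM_JOB_ID</literal> and an
+      arbitrary value returns that same value, because the child inherits the
+      environment of the process that started it.</para>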
+      
+      <para>Lustre can be configured to generate a synthetic JobID from
+      the client's process name and numeric UID, by setting
+      <literal>jobid_var=procname_uid</literal>.  This will generate a
+      uniform JobID when running the same binary across multiple client
+      nodes, but cannot distinguish whether the binary is part of a single
+      distributed process or multiple independent processes.
+      </para>
+      
+      <para condition="l28">In Lustre 2.8 and later it is possible to set
+      <literal>jobid_var=nodelocal</literal> and then also set
+      <literal>jobid_name=</literal><replaceable>name</replaceable>, which
+      <emphasis>all</emphasis> processes on that client node will use.  This
+      is useful if only a single job is run on a client at one time, but if
+      multiple jobs are run on a client concurrently, the per-session JobID
+      should be used.
+      </para>
+
+      <para condition="l2C">In Lustre 2.12 and later, it is possible to
+      specify more complex JobID values for <literal>jobid_name</literal>
+      by using a string that contains format codes that are evaluated for
+      each process, in order to generate a site- or node-specific JobID string.
+      </para>
+      <itemizedlist>
+        <listitem>
+         <para><emphasis>%e</emphasis> print executable name</para>
+        </listitem>
+        <listitem>
+         <para><emphasis>%g</emphasis> print group ID number</para>
+        </listitem>
+        <listitem>
+         <para><emphasis>%h</emphasis> print hostname</para>
+        </listitem>
+        <listitem>
+         <para><emphasis>%j</emphasis> print JobID from process environment
+         variable named by the <emphasis>jobid_var</emphasis> parameter
+         </para>
+        </listitem>
+        <listitem>
+         <para><emphasis>%p</emphasis> print numeric process ID</para>
+        </listitem>
+        <listitem>
+         <para><emphasis>%u</emphasis> print user ID number</para>
+        </listitem>
+      </itemizedlist>
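+      <para>How the format codes listed above combine into a JobID string can
+      be sketched in Python. This is an illustration only, not the Lustre
+      implementation: the function name and its arguments are hypothetical,
+      and substituting an empty string for an unset environment variable is
+      an assumption.</para>

```python
import re


def expand_jobid_name(fmt, *, exe, gid, host, pid, uid, env, jobid_var):
    """Expand the %e %g %h %j %p %u codes in a jobid_name-style string."""
    values = {
        "e": exe,                     # executable name
        "g": str(gid),                # group ID number
        "h": host,                    # hostname
        "j": env.get(jobid_var, ""),  # JobID from the variable named by jobid_var
        "p": str(pid),                # numeric process ID
        "u": str(uid),                # user ID number
    }
    # Only the six documented codes are replaced; other text is kept as-is.
    return re.sub(r"%([eghjpu])", lambda m: values[m.group(1)], fmt)
```

+      <para>For example, the format <literal>%e.%u</literal> evaluated for a
+      <literal>bash</literal> process run by UID 0 gives
+      <literal>bash.0</literal>, the same shape as the
+      <literal>procname_uid</literal> JobIDs that appear in the jobstats
+      output in this chapter.</para>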
+
+      <para condition="l2D">In Lustre 2.13 and later, it is possible to
+      set a per-session JobID by setting the
+      <literal>jobid_this_session</literal> parameter.  This will be
+      inherited by all processes that are started in this login session,
+      but there can be a different JobID for each login session.
+      </para>
+
+      <para>The setting of <literal>jobid_var</literal> need not be the same
+      on all clients.  For example, one could use
+      <literal>SLURM_JOB_ID</literal> on all clients managed by SLURM, and
+      use <literal>procname_uid</literal> on clients not managed by SLURM,
+      such as interactive login nodes.</para>
+      
+      <para>It is not possible to have different
+      <literal>jobid_var</literal> settings on a single node, since it is
+      unlikely that multiple job schedulers are active on one client.
+      However, the actual JobID value is local to each process environment
+      and it is possible for multiple jobs with different JobIDs to be
+      active on a single client at one time.</para>
+    </section>
+
+    <section remap="h3">
+      <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
+Enable/Disable Jobstats</title>
+      <para>Jobstats are disabled by default.  The current state of jobstats
+      can be verified by checking <literal>lctl get_param jobid_var</literal>
+      on a client:</para>
+      <screen>
+$ lctl get_param jobid_var
+jobid_var=disable
+      </screen>
+      <para>
+      To enable jobstats on the <literal>testfs</literal> file system with SLURM:</para>
+      <screen># lctl conf_param testfs.sys.jobid_var=SLURM_JOB_ID</screen>
+      <para>The <literal>lctl conf_param</literal> command to enable or disable
+      jobstats should be run on the MGS as root. The change is persistent, and
+      will be propagated to the MDS, OSS, and client nodes automatically when
+      it is set on the MGS and for each new client mount.</para>
+      <para>To temporarily enable jobstats on a client, or to use a different
+      jobid_var on a subset of nodes, such as nodes in a remote cluster that
+      use a different job scheduler, or interactive login nodes that do not
+      use a job scheduler at all, run the <literal>lctl set_param</literal>
+      command directly on the client node(s) after the filesystem is mounted.
+      For example, to enable the <literal>procname_uid</literal> synthetic
+      JobID on a login node run:
+      <screen># lctl set_param jobid_var=procname_uid</screen>
+      The <literal>lctl set_param</literal> setting is not persistent, and will
+      be reset if the global <literal>jobid_var</literal> is set on the MGS or
+      if the filesystem is unmounted.</para>
+      <para>The following table shows the environment variables which are set
+      by various job schedulers.  Set <literal>jobid_var</literal> to the value
+      for your job scheduler to collect statistics on a per job basis.</para>
+    <informaltable frame="all">
+      <tgroup cols="2">
+        <colspec colname="c1" colwidth="50*"/>
+        <colspec colname="c2" colwidth="50*"/>
+        <thead>
+          <row>
+            <entry>
+              <para><emphasis role="bold">Job Scheduler</emphasis></para>
+            </entry>
+            <entry>
+              <para><emphasis role="bold">Environment Variable</emphasis></para>
+            </entry>
+          </row>
+        </thead>
+        <tbody>
+          <row>
+            <entry>
+              <para>Simple Linux Utility for Resource Management (SLURM)</para>
+            </entry>
+            <entry>
+              <para>SLURM_JOB_ID</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para>Sun Grid Engine (SGE)</para>
+            </entry>
+            <entry>
+              <para>JOB_ID</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para>Load Sharing Facility (LSF)</para>
+            </entry>
+            <entry>
+              <para>LSB_JOBID</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para>Loadleveler</para>
+            </entry>
+            <entry>
+              <para>LOADL_STEP_ID</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para>Portable Batch System (PBS)/MAUI</para>
+            </entry>
+            <entry>
+              <para>PBS_JOBID</para>
+            </entry>
+          </row>
+          <row>
+            <entry>
+              <para>Cray Application Level Placement Scheduler (ALPS)</para>
+            </entry>
+            <entry>
+              <para>ALPS_APP_ID</para>
+            </entry>
+          </row>
+        </tbody>
+      </tgroup>
+    </informaltable>
+    <para>There are two special values for <literal>jobid_var</literal>:
+    <literal>disable</literal> and <literal>procname_uid</literal>. To disable
+    jobstats, specify <literal>jobid_var</literal> as <literal>disable</literal>:</para>
+    <screen># lctl conf_param testfs.sys.jobid_var=disable</screen>
+    <para>To track job stats per process name and user ID (for debugging, or
+    if no job scheduler is in use on some nodes such as login nodes), specify
+    <literal>jobid_var</literal> as <literal>procname_uid</literal>:</para>
+    <screen># lctl conf_param testfs.sys.jobid_var=procname_uid</screen>
+    </section>
+    <section remap="h3">
+      <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
+Check Job Stats</title>
+    <para>Metadata operation statistics are collected on MDTs. These statistics can be accessed for
+        all file systems and all jobs on the MDT via <literal>lctl get_param
+          mdt.*.job_stats</literal>. For example, with clients running with
+          <literal>jobid_var=procname_uid</literal>:</para>
+    <screen>
+# lctl get_param mdt.*.job_stats
+job_stats:
+- job_id:          bash.0
+  snapshot_time:   1352084992
+  open:            { samples:     2, unit:  reqs }
+  close:           { samples:     2, unit:  reqs }
+  mknod:           { samples:     0, unit:  reqs }
+  link:            { samples:     0, unit:  reqs }
+  unlink:          { samples:     0, unit:  reqs }
+  mkdir:           { samples:     0, unit:  reqs }
+  rmdir:           { samples:     0, unit:  reqs }
+  rename:          { samples:     0, unit:  reqs }
+  getattr:         { samples:     3, unit:  reqs }
+  setattr:         { samples:     0, unit:  reqs }
+  getxattr:        { samples:     0, unit:  reqs }
+  setxattr:        { samples:     0, unit:  reqs }
+  statfs:          { samples:     0, unit:  reqs }
+  sync:            { samples:     0, unit:  reqs }
+  samedir_rename:  { samples:     0, unit:  reqs }
+  crossdir_rename: { samples:     0, unit:  reqs }
+- job_id:          mythbackend.0
+  snapshot_time:   1352084996
+  open:            { samples:    72, unit:  reqs }
+  close:           { samples:    73, unit:  reqs }
+  mknod:           { samples:     0, unit:  reqs }
+  link:            { samples:     0, unit:  reqs }
+  unlink:          { samples:    22, unit:  reqs }
+  mkdir:           { samples:     0, unit:  reqs }
+  rmdir:           { samples:     0, unit:  reqs }
+  rename:          { samples:     0, unit:  reqs }
+  getattr:         { samples:   778, unit:  reqs }
+  setattr:         { samples:    22, unit:  reqs }
+  getxattr:        { samples:     0, unit:  reqs }
+  setxattr:        { samples:     0, unit:  reqs }
+  statfs:          { samples: 19840, unit:  reqs }
+  sync:            { samples: 33190, unit:  reqs }
+  samedir_rename:  { samples:     0, unit:  reqs }
+  crossdir_rename: { samples:     0, unit:  reqs }
+    </screen>
+    <para>Data operation statistics are collected on OSTs. Data operations
+    statistics can be accessed via
+    <literal>lctl get_param obdfilter.*.job_stats</literal>, for example:</para>
+    <screen>
+$ lctl get_param obdfilter.*.job_stats
+obdfilter.myth-OST0000.job_stats=
+job_stats:
+- job_id:          mythcommflag.0
+  snapshot_time:   1429714922
+  read:    { samples: 974, unit: bytes, min: 4096, max: 1048576, sum: 91530035 }
+  write:   { samples:   0, unit: bytes, min:    0, max:       0, sum:        0 }
+  setattr: { samples:   0, unit:  reqs }
+  punch:   { samples:   0, unit:  reqs }
+  sync:    { samples:   0, unit:  reqs }
+obdfilter.myth-OST0001.job_stats=
+job_stats:
+- job_id:          mythbackend.0
+  snapshot_time:   1429715270
+  read:    { samples:   0, unit: bytes, min:     0, max:      0, sum:        0 }
+  write:   { samples:   1, unit: bytes, min: 96899, max:  96899, sum:    96899 }
+  setattr: { samples:   0, unit:  reqs }
+  punch:   { samples:   1, unit:  reqs }
+  sync:    { samples:   0, unit:  reqs }
+obdfilter.myth-OST0002.job_stats=job_stats:
+obdfilter.myth-OST0003.job_stats=job_stats:
+obdfilter.myth-OST0004.job_stats=
+job_stats:
+- job_id:          mythfrontend.500
+  snapshot_time:   1429692083
+  read:    { samples:   9, unit: bytes, min: 16384, max: 1048576, sum: 4444160 }
+  write:   { samples:   0, unit: bytes, min:     0, max:       0, sum:       0 }
+  setattr: { samples:   0, unit:  reqs }
+  punch:   { samples:   0, unit:  reqs }
+  sync:    { samples:   0, unit:  reqs }
+- job_id:          mythbackend.500
+  snapshot_time:   1429692129
+  read:    { samples:   0, unit: bytes, min:     0, max:       0, sum:       0 }
+  write:   { samples:   1, unit: bytes, min: 56231, max:   56231, sum:   56231 }
+  setattr: { samples:   0, unit:  reqs }
+  punch:   { samples:   1, unit:  reqs }
+  sync:    { samples:   0, unit:  reqs }
+    </screen>
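+    <para>Monitoring scripts usually need this output in structured form. The
+    following Python sketch parses text shaped like the
+    <literal>job_stats</literal> listings above into nested dictionaries. It
+    is based only on the sample output shown in this section, so other
+    layouts may need adjustments; the function name and the use of an empty
+    string as the key when no parameter line is present are
+    assumptions.</para>

```python
import re

_TARGET = re.compile(r"^(?P<param>\S+\.job_stats)=")
_JOB = re.compile(r"^- job_id:\s+(?P<job_id>\S+)")
_COUNTER = re.compile(r"^(?P<op>\w+):\s+\{(?P<body>[^}]*)\}")
_SCALAR = re.compile(r"^(?P<key>\w+):\s+(?P<value>\d+)$")


def parse_job_stats(text):
    """Return {parameter: {job_id: {operation: {field: value}}}}."""
    stats = {}
    jobs = None  # jobs of the parameter currently being read
    job = None   # counters of the job currently being read
    for raw in text.splitlines():
        line = raw.strip()
        m = _TARGET.match(line)
        if m:  # e.g. "obdfilter.myth-OST0000.job_stats="
            jobs = stats.setdefault(m.group("param"), {})
            job = None
            continue
        m = _JOB.match(line)
        if m:  # e.g. "- job_id: mythbackend.0"
            if jobs is None:
                jobs = stats.setdefault("", {})
            job = jobs.setdefault(m.group("job_id"), {})
            continue
        if job is None:
            continue
        m = _COUNTER.match(line)
        if m:  # e.g. "read: { samples: 974, unit: bytes, ... }"
            fields = {}
            for item in m.group("body").split(","):
                key, _, value = item.partition(":")
                value = value.strip()
                fields[key.strip()] = int(value) if value.isdigit() else value
            job[m.group("op")] = fields
            continue
        m = _SCALAR.match(line)
        if m:  # e.g. "snapshot_time: 1429715270"
            job[m.group("key")] = int(m.group("value"))
    return stats
```

+    <para>A caller can then sum the <literal>sum</literal> field of the
+    <literal>read</literal> and <literal>write</literal> counters per
+    <literal>job_id</literal> across all OSTs to obtain the total bytes
+    transferred by each job.</para>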
     </section>
-    <section xml:id="dbdoclet.50438273_80593">
-      <title>12.3 Collect<anchor xml:id="dbdoclet.50438273_marker-1297391" xreflabel=""/>L</title>
-      <para>CollectL is another tool that can be used to monitor Lustre. You can run CollectL on a Lustre system that has any combination of MDSs, OSTs and clients. The collected data can be written to a file for continuous logging and played back at a later time. It can also be converted to a format suitable for plotting.</para>
-      <para>For more information about CollectL, see:</para>
-      <para><link xl:href="http://collectl.sourceforge.net">http://collectl.sourceforge.net</link></para>
-      <para>Lustre-specific documentation is also available. See:</para>
-      <para><link xl:href="http://collectl.sourceforge.net/Tutorial-Lustre.html">http://collectl.sourceforge.net/Tutorial-Lustre.html</link></para>
+    <section remap="h3">
+      <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
+Clear Job Stats</title>
+    <para>Accumulated job statistics can be reset by writing to the <literal>job_stats</literal> proc file.</para>
+    <para>Clear statistics for all jobs on all OSTs on the local node:</para>
+    <screen># lctl set_param obdfilter.*.job_stats=clear</screen>
+    <para>Clear statistics only for job 'bash.0' on lustre-MDT0000:</para>
+    <screen># lctl set_param mdt.lustre-MDT0000.job_stats=bash.0</screen>
     </section>
-    <section xml:id="dbdoclet.50438273_44185">
-      <title>12.4 Other Monitoring Options</title>
-      <para>A variety of standard tools are available publically.</para>
-      <para>Another option is to script a simple monitoring solution that looks at various reports from ipconfig, as well as the procfs files generated by Lustre.</para>
-      <para><anchor xml:id="dbdoclet.50438273_67514" xreflabel=""/> </para>
+    <section remap="h3">
+      <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
+Configure Auto-cleanup Interval</title>
+    <para>By default, if a job is inactive for 600 seconds (10 minutes), the statistics for this job are dropped. This expiration value can be changed temporarily via:</para>
+    <screen># lctl set_param *.*.job_cleanup_interval={max_age}</screen>
+    <para>It can also be changed permanently, for example to 700 seconds via:</para>
+    <screen># lctl conf_param testfs.mdt.job_cleanup_interval=700</screen>
+    <para>The <literal>job_cleanup_interval</literal> can be set as 0 to disable the auto-cleanup. Note that if auto-cleanup of Jobstats is disabled, then all statistics will be kept in memory forever, which may eventually consume all memory on the servers. In this case, any monitoring tool should explicitly clear individual job statistics as they are processed, as shown above.</para>
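+    <para>A monitoring tool that runs with auto-cleanup disabled could build
+    the per-job clear commands once it has exported each job's counters. The
+    following Python sketch only constructs the argument lists, using the
+    same command form as
+    <literal>lctl set_param mdt.lustre-MDT0000.job_stats=bash.0</literal>
+    shown earlier; the helper name is hypothetical, and actually running the
+    commands (as root on the server) is left to the caller.</para>

```python
def clear_commands(processed):
    """Build one 'lctl set_param <param>=<job_id>' command per processed job.

    `processed` is an iterable of (param, job_id) pairs, for example
    ("mdt.lustre-MDT0000.job_stats", "bash.0").
    """
    commands = []
    for param, job_id in processed:
        # Writing a job ID to job_stats clears the statistics of that job only.
        commands.append(["lctl", "set_param", f"{param}={job_id}"])
    return commands
```

+    <para>Each returned list can be passed to a process launcher on the
+    server. Clearing only the jobs that have already been processed keeps
+    memory use bounded without discarding statistics that have not yet been
+    collected.</para>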
     </section>
+  </section>
+  <section xml:id="dbdoclet.50438273_81684">
+    <title><indexterm>
+        <primary>monitoring</primary>
+        <secondary>Lustre Monitoring Tool</secondary>
+      </indexterm> Lustre Monitoring Tool (LMT)</title>
+    <para>The Lustre Monitoring Tool (LMT) is a Python-based, distributed system that provides a
+        <literal>top</literal>-like display of activity on server-side nodes (MDS, OSS and portals
+      routers) on one or more Lustre file systems. It does not provide support for monitoring
+      clients. For more information on LMT, including the setup procedure, see:</para>
+    <para><link xl:href="https://github.com/chaos/lmt/wiki"
+      >https://github.com/chaos/lmt/wiki</link></para>
+    <para>LMT questions can be directed to:</para>
+    <para><link xl:href="mailto:lmt-discuss@googlegroups.com">lmt-discuss@googlegroups.com</link></para>
+  </section>
+  <section xml:id="dbdoclet.50438273_80593">
+    <title>
+      <literal>CollectL</literal>
+    </title>
+    <para><literal>CollectL</literal> is another tool that can be used to monitor a Lustre file
+      system. You can run <literal>CollectL</literal> on a Lustre system that has any combination of
+      MDSs, OSTs and clients. The collected data can be written to a file for continuous logging and
+      played back at a later time. It can also be converted to a format suitable for
+      plotting.</para>
+    <para>For more information about <literal>CollectL</literal>, see:</para>
+    <para><link xl:href="http://collectl.sourceforge.net">http://collectl.sourceforge.net</link></para>
+    <para>Lustre-specific documentation is also available. See:</para>
+    <para><link xl:href="http://collectl.sourceforge.net/Tutorial-Lustre.html">http://collectl.sourceforge.net/Tutorial-Lustre.html</link></para>
+  </section>
+  <section xml:id="dbdoclet.50438273_44185">
+    <title><indexterm><primary>monitoring</primary><secondary>additional tools</secondary></indexterm>
+Other Monitoring Options</title>
+    <para>A variety of standard tools are available publicly, including the following:<itemizedlist>
+        <listitem>
+          <para><literal>lltop</literal> - Lustre load monitor with batch scheduler integration.
+              <link xmlns:xlink="http://www.w3.org/1999/xlink"
+              xlink:href="https://github.com/jhammond/lltop"
+              >https://github.com/jhammond/lltop</link></para>
+        </listitem>
+        <listitem>
+          <para><literal>tacc_stats</literal> - A job-oriented system monitoring, analysis, and
+            visualization tool that probes Lustre interfaces and collects statistics. <link
+              xmlns:xlink="http://www.w3.org/1999/xlink"
+              xlink:href="https://github.com/jhammond/tacc_stats"/></para>
+        </listitem>
+        <listitem>
+          <para><literal>xltop</literal> - A continuous Lustre monitor with batch scheduler
+            integration. <link xmlns:xlink="http://www.w3.org/1999/xlink"
+              xlink:href="https://github.com/jhammond/xltop"/></para>
+        </listitem>
+      </itemizedlist></para>
+    <para>Another option is to script a simple monitoring solution that looks at various reports
+      from <literal>ifconfig</literal>, as well as the <literal>procfs</literal> files generated by
+      the Lustre software.</para>
+  </section>
 </chapter>
+<!--
+  vim:expandtab:shiftwidth=2:tabstop=8:
+  -->