LustreMonitoring.xml

   1 <?xml version='1.0' encoding='UTF-8'?>
   2 <chapter xmlns="http://docbook.org/ns/docbook"
   3  xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
   4  xml:id="lustremonitoring">
   5   <title xml:id="lustremonitoring.title">Monitoring a Lustre File System</title>
   6   <para>This chapter provides information on monitoring a Lustre file system and includes the
   7     following sections:</para>
   8   <itemizedlist>
   9     <listitem>
  10       <para><xref linkend="dbdoclet.50438273_18711"/>Lustre Changelogs</para>
  11     </listitem>
  12     <listitem>
  13       <para><xref linkend="dbdoclet.jobstats"/>Lustre Jobstats</para>
  14     </listitem>
  15     <listitem>
  16       <para><xref linkend="dbdoclet.50438273_81684"/>Lustre Monitoring Tool</para>
  17     </listitem>
  18     <listitem>
  19       <para><xref linkend="dbdoclet.50438273_80593"/>CollectL</para>
  20     </listitem>
  21     <listitem>
  22       <para><xref linkend="dbdoclet.50438273_44185"/>Other Monitoring Options</para>
  23     </listitem>
  24   </itemizedlist>
  25   <section xml:id="dbdoclet.50438273_18711">
  26       <title><indexterm><primary>change logs</primary><see>monitoring</see></indexterm>
  27 <indexterm><primary>monitoring</primary></indexterm>
  28 <indexterm><primary>monitoring</primary><secondary>change logs</secondary></indexterm>
  29
  30 Lustre Changelogs</title>
  31     <para>The changelogs feature records events that change the file system
  32     namespace or file metadata. Changes such as file creation, deletion,
  33     renaming, attribute changes, etc. are recorded with the target and parent
  34     file identifiers (FIDs), the name of the target, a timestamp, and user
  35     information. These records can be used for a variety of purposes:</para>
  36     <itemizedlist>
  37       <listitem>
  38         <para>Capture recent changes to feed into an archiving system.</para>
  39       </listitem>
  40       <listitem>
  41         <para>Use changelog entries to exactly replicate changes in a file
  42         system mirror.</para>
  43       </listitem>
  44       <listitem>
  45         <para>Set up &quot;watch scripts&quot; that take action on certain
  46         events or directories.</para>
  47       </listitem>
  48       <listitem>
  49         <para>Audit activity on a Lustre file system, using the user
  50         information and timestamps recorded with each file or directory change.</para>
  51       </listitem>
  52     </itemizedlist>
  53     <para>Changelog record types are:</para>
  54     <informaltable frame="all">
  55       <tgroup cols="2">
  56         <colspec colname="c1" colwidth="50*"/>
  57         <colspec colname="c2" colwidth="50*"/>
  58         <thead>
  59           <row>
  60             <entry>
  61               <para><emphasis role="bold">Value</emphasis></para>
  62             </entry>
  63             <entry>
  64               <para><emphasis role="bold">Description</emphasis></para>
  65             </entry>
  66           </row>
  67         </thead>
  68         <tbody>
  69           <row>
  70             <entry>
  71               <para> MARK</para>
  72             </entry>
  73             <entry>
  74               <para> Internal recordkeeping</para>
  75             </entry>
  76           </row>
  77           <row>
  78             <entry>
  79               <para> CREAT</para>
  80             </entry>
  81             <entry>
  82               <para> Regular file creation</para>
  83             </entry>
  84           </row>
  85           <row>
  86             <entry>
  87               <para> MKDIR</para>
  88             </entry>
  89             <entry>
  90               <para> Directory creation</para>
  91             </entry>
  92           </row>
  93           <row>
  94             <entry>
  95               <para> HLINK</para>
  96             </entry>
  97             <entry>
  98               <para> Hard link</para>
  99             </entry>
 100           </row>
 101           <row>
 102             <entry>
 103               <para> SLINK</para>
 104             </entry>
 105             <entry>
 106               <para> Soft link</para>
 107             </entry>
 108           </row>
 109           <row>
 110             <entry>
 111               <para> MKNOD</para>
 112             </entry>
 113             <entry>
 114               <para> Other file creation</para>
 115             </entry>
 116           </row>
 117           <row>
 118             <entry>
 119               <para> UNLNK</para>
 120             </entry>
 121             <entry>
 122               <para> Regular file removal</para>
 123             </entry>
 124           </row>
 125           <row>
 126             <entry>
 127               <para> RMDIR</para>
 128             </entry>
 129             <entry>
 130               <para> Directory removal</para>
 131             </entry>
 132           </row>
 133           <row>
 134             <entry>
 135               <para> RENME</para>
 136             </entry>
 137             <entry>
 138               <para> Rename, original</para>
 139             </entry>
 140           </row>
 141           <row>
 142             <entry>
 143               <para> RNMTO</para>
 144             </entry>
 145             <entry>
 146               <para> Rename, final</para>
 147             </entry>
 148           </row>
 149           <row>
 150             <entry>
 151               <para> OPEN *</para>
 152             </entry>
 153             <entry>
 154               <para> Open</para>
 155             </entry>
 156           </row>
 157           <row>
 158             <entry>
 159               <para> CLOSE</para>
 160             </entry>
 161             <entry>
 162               <para> Close</para>
 163             </entry>
 164           </row>
 165           <row>
 166             <entry>
 167               <para> LYOUT</para>
 168             </entry>
 169             <entry>
 170               <para> Layout change</para>
 171             </entry>
 172           </row>
 173           <row>
 174             <entry>
 175               <para> TRUNC</para>
 176             </entry>
 177             <entry>
 178               <para> Regular file truncated</para>
 179             </entry>
 180           </row>
 181           <row>
 182             <entry>
 183               <para> SATTR</para>
 184             </entry>
 185             <entry>
 186               <para> Attribute change</para>
 187             </entry>
 188           </row>
 189           <row>
 190             <entry>
 191               <para> XATTR</para>
 192             </entry>
 193             <entry>
 194               <para> Extended attribute change (setxattr)</para>
 195             </entry>
 196           </row>
 197           <row>
 198             <entry>
 199               <para> HSM</para>
 200             </entry>
 201             <entry>
 202               <para> HSM specific event</para>
 203             </entry>
 204           </row>
 205           <row>
 206             <entry>
 207               <para> MTIME</para>
 208             </entry>
 209             <entry>
 210               <para> MTIME change</para>
 211             </entry>
 212           </row>
 213           <row>
 214             <entry>
 215               <para> CTIME</para>
 216             </entry>
 217             <entry>
 218               <para> CTIME change</para>
 219             </entry>
 220           </row>
 221           <row>
 222             <entry>
 223               <para> ATIME *</para>
 224             </entry>
 225             <entry>
 226               <para> ATIME change</para>
 227             </entry>
 228           </row>
 229           <row>
 230             <entry>
 231               <para> MIGRT</para>
 232             </entry>
 233             <entry>
 234               <para> Migration event</para>
 235             </entry>
 236           </row>
 237           <row>
 238             <entry>
 239               <para> FLRW</para>
 240             </entry>
 241             <entry>
 242               <para> File Level Replication: file initially written</para>
 243             </entry>
 244           </row>
 245           <row>
 246             <entry>
 247               <para> RESYNC</para>
 248             </entry>
 249             <entry>
 250               <para> File Level Replication: file re-synced</para>
 251             </entry>
 252           </row>
 253           <row>
 254             <entry>
 255               <para> GXATR *</para>
 256             </entry>
 257             <entry>
 258               <para> Extended attribute access (getxattr)</para>
 259             </entry>
 260           </row>
 261           <row>
 262             <entry>
 263               <para> NOPEN *</para>
 264             </entry>
 265             <entry>
 266               <para> Denied open</para>
 267             </entry>
 268           </row>
 269         </tbody>
 270       </tgroup>
 271     </informaltable>
 272     <note><para>Event types marked with * are not recorded by default. Refer to
 273     <xref linkend="dbdoclet.modifyChangelogMask" /> for instructions on
 274     modifying the Changelogs mask.</para></note>
 275     <para>FID-to-full-pathname and pathname-to-FID functions are also included
 276     to map target and parent FIDs into the file system namespace.</para>
 277     <section remap="h3">
 278       <title><indexterm><primary>monitoring</primary><secondary>change logs
 279     </secondary></indexterm>
 280 Working with Changelogs</title>
 281       <para>Several commands are available to work with changelogs.</para>
 282       <section remap="h5">
 283         <title>
 284           <literal>lctl changelog_register</literal>
 285         </title>
 286         <para>Because changelog records take up space on the MDT, the system
 287         administrator must register changelog users. As soon as a changelog
 288         user is registered, the Changelogs feature is enabled. Each registered
 289         user specifies which records it is &quot;done with&quot;, and the system
 290         purges records up to the highest record that all users have cleared.</para>
 291         <para>To register a new changelog user, run:</para>
 292         <screen>mds# lctl --device <replaceable>fsname</replaceable>-<replaceable>MDTnumber</replaceable> changelog_register
 293 </screen>
 294         <para>Changelog entries are not purged beyond a registered user&apos;s
 295         set point (see <literal>lfs changelog_clear</literal>).</para>
 296       </section>
 297       <section remap="h5">
 298         <title>
 299           <literal>lfs changelog</literal>
 300         </title>
 301         <para>To display the metadata changes on an MDT (the changelog records),
 302         run:</para>
 303         <screen>lfs changelog <replaceable>fsname</replaceable>-<replaceable>MDTnumber</replaceable> [startrec [endrec]] </screen>
 304         <para>Specifying the start and end record numbers is
 305         optional.</para>
 306         <para>These are sample changelog records:</para>
 307         <screen>1 02MKDIR 15:15:21.977666834 2018.01.09 0x0 t=[0x200000402:0x1:0x0] j=mkdir.500 ef=0xf \
 308 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
 309 2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] j=cp.500 ef=0xf \
 310 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
 311 3 06UNLNK 15:15:41.305116815 2018.01.09 0x1 t=[0x200000402:0x2:0x0] j=rm.500 ef=0xf \
 312 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
 313 4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] j=rmdir.500 ef=0xf \
 314 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics </screen>
 315       </section>
 316       <section remap="h5">
 317         <title>
 318           <literal>lfs changelog_clear</literal>
 319         </title>
 320         <para>To clear old changelog records for a specific user (records that
 321         the user no longer needs), run:</para>
 322         <screen>lfs changelog_clear <replaceable>mdt_name</replaceable> <replaceable>userid</replaceable> <replaceable>endrec</replaceable></screen>
 323         <para>The <literal>changelog_clear</literal> command indicates that
 324         changelog records up to and including <replaceable>endrec</replaceable> are no
 325         longer of interest to a particular user
 326         <replaceable>userid</replaceable>, potentially allowing the MDT to free
 327         up disk space. An <literal><replaceable>endrec</replaceable></literal>
 328         value of 0 indicates the current last record. To run
 329         <literal>changelog_clear</literal>, the changelog user must be
 330         registered on the MDT node using <literal>lctl</literal>.</para>
 331         <para>Once every registered changelog user has cleared a record, the
 332         record is deleted.</para>
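        <para>As a rough illustration of this purge rule, the following Python
        sketch computes the highest record that can be freed. It is not Lustre
        code: the function name and the per-user dictionary are hypothetical,
        and the sketch assumes that each value is the last record the
        corresponding user has cleared.</para>
        <screen>def purgeable_up_to(cleared_by_user):
    """Return the highest record number that every user has cleared."""
    if not cleared_by_user:
        return None  # no registered users in this sketch
    # A record can be freed only once the slowest user has cleared it.
    return min(cleared_by_user.values())

# cl1 has cleared through record 3 and cl2 through record 8, so only
# records up to 3 can be freed; cl1 still needs records 4 to 8.
print(purgeable_up_to({"cl1": 3, "cl2": 8}))</screen>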
 333       </section>
 334       <section remap="h5">
 335         <title>
 336           <literal>lctl changelog_deregister</literal>
 337         </title>
 338         <para>To deregister (unregister) a changelog user, run:</para>
 339         <screen>mds# lctl --device <replaceable>mdt_device</replaceable> changelog_deregister <replaceable>userid</replaceable></screen>
 340         <para>Deregistering a user, for example <literal>changelog_deregister cl1</literal>,
 341         effectively performs <literal>lfs changelog_clear cl1 0</literal> as well.</para>
 342       </section>
 343     </section>
 344     <section remap="h3">
 345       <title>Changelog Examples</title>
 346       <para>This section provides examples of different changelog
 347       commands.</para>
 348       <section remap="h5">
 349         <title>Registering a Changelog User</title>
 350         <para>To register a new changelog user for a device
 351         (<literal>lustre-MDT0000</literal>):</para>
 352         <screen>mds# lctl --device lustre-MDT0000 changelog_register
 353 lustre-MDT0000: Registered changelog userid &apos;cl1&apos;</screen>
 354       </section>
 355       <section remap="h5">
 356         <title>Displaying Changelog Records</title>
 357         <para>To display changelog records on an MDT
 358         (<literal>lustre-MDT0000</literal>):</para>
 359         <screen>$ lfs changelog lustre-MDT0000
 360 1 02MKDIR 15:15:21.977666834 2018.01.09 0x0 t=[0x200000402:0x1:0x0] ef=0xf \
 361 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
 362 2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] ef=0xf \
 363 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
 364 3 06UNLNK 15:15:41.305116815 2018.01.09 0x1 t=[0x200000402:0x2:0x0] ef=0xf \
 365 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
 366 4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] ef=0xf \
 367 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics</screen>
 368         <para>Changelog records include this information:</para>
 369         <screen>rec#
 370 operation_type(numerical/text)
 371 timestamp
 372 datestamp
 373 flags
 374 t=target_FID
 375 ef=extended_flags
 376 u=uid:gid
 377 nid=client_NID
 378 p=parent_FID
 379 target_name</screen>
 380         <para>Displayed in this format:</para>
 381         <screen>rec# operation_type(numerical/text) timestamp datestamp flags t=target_FID \
 382 ef=extended_flags u=uid:gid nid=client_NID p=parent_FID target_name</screen>
 383         <para>For example:</para>
 384         <screen>2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] ef=0xf \
 385 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg</screen>
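        <para>A watch script typically needs these fields in structured form.
        The following Python sketch splits one record into a dictionary. It is
        an illustration only: <literal>parse_record</literal> is a hypothetical
        helper, not part of Lustre, and it assumes that the record has been
        joined onto a single line (no <literal>\</literal> continuation) and
        that the target name contains neither whitespace nor
        <literal>=</literal>.</para>
        <screen>def parse_record(line):
    """Split one line of 'lfs changelog' output into a dict of fields."""
    try:
        # The first five fields are positional; the rest are key=value
        # tokens followed by the target name.
        rec, op, timestamp, datestamp, flags, *rest = line.split()
    except ValueError:
        raise ValueError("unrecognized changelog record: %r" % line) from None
    fields = {
        "rec": int(rec),
        "type_num": int(op[:2]),   # e.g. 1 for 01CREAT
        "type_name": op[2:],       # e.g. CREAT
        "timestamp": timestamp,
        "datestamp": datestamp,
        "flags": int(flags, 16),
    }
    for token in rest:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value    # t, ef, u, nid, p, ...
        else:
            fields["name"] = token
    return fields

record = parse_record(
    "2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] "
    "ef=0xf u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg"
)
print(record["type_name"], record["u"], record["name"])</screen>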
 386       </section>
 387       <section remap="h5">
 388         <title>Clearing Changelog Records</title>
 389         <para>To notify a device that a specific user (<literal>cl1</literal>)
 390         no longer needs records (up to and including 3):</para>
 391         <screen>$ lfs changelog_clear  lustre-MDT0000 cl1 3</screen>
 392         <para>To confirm that the <literal>changelog_clear</literal> operation
 393         was successful, run <literal>lfs changelog</literal>; only records after
 394         record 3 are listed:</para>
 395         <screen>$ lfs changelog lustre-MDT0000
 396 4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] ef=0xf \
 397 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics</screen>
 398       </section>
 399       <section remap="h5">
 400         <title>Deregistering a Changelog User</title>
 401         <para>To deregister a changelog user (<literal>cl1</literal>) for a
 402         specific device (<literal>lustre-MDT0000</literal>):</para>
 403         <screen>mds# lctl --device lustre-MDT0000 changelog_deregister cl1
 404 lustre-MDT0000: Deregistered changelog user &apos;cl1&apos;</screen>
 405         <para>The deregistration operation clears all changelog records for the
 406         specified user (<literal>cl1</literal>).</para>
 407         <screen>$ lfs changelog lustre-MDT0000
 408 5 00MARK  15:56:39.603643887 2018.01.09 0x0 t=[0x20001:0x0:0x0] ef=0xf \
 409 u=500:500 nid=0@&lt;0:0&gt; p=[0:0x50:0xb] mdd_obd-lustre-MDT0000-0
 410 </screen>
 411         <note>
 412           <para>MARK records typically indicate changelog recording status
 413           changes.</para>
 414         </note>
 415       </section>
 416       <section remap="h5">
 417         <title>Displaying the Changelog Index and Registered Users</title>
 418         <para>To display the current maximum changelog index and the registered
 419         changelog users for a specific device
 420         (<literal>lustre-MDT0000</literal>):</para>
 421         <screen>mds# lctl get_param  mdd.lustre-MDT0000.changelog_users
 422 mdd.lustre-MDT0000.changelog_users=current index: 8
 423 ID    index (idle seconds)
 424 cl2   8 (180)
 425 </screen>
 426       </section>
 427       <section remap="h5">
 428         <title>Displaying the Changelog Mask</title>
 429         <para>To show the current changelog mask on a specific device
 430         (<literal>lustre-MDT0000</literal>):</para>
 431         <screen>mds# lctl get_param  mdd.lustre-MDT0000.changelog_mask
 432
 433 mdd.lustre-MDT0000.changelog_mask=
 434 MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT \
 435 TRUNC SATTR XATTR HSM MTIME CTIME MIGRT
 436 </screen>
 437       </section>
 438       <section xml:id="dbdoclet.modifyChangelogMask" remap="h5">
 439         <title>Setting the Changelog Mask</title>
 440         <para>To set the current changelog mask on a specific device
 441         (<literal>lustre-MDT0000</literal>):</para>
 442         <screen>mds# lctl set_param mdd.lustre-MDT0000.changelog_mask=HLINK
 443 mdd.lustre-MDT0000.changelog_mask=HLINK
 444 $ lfs changelog_clear lustre-MDT0000 cl1 0
 445 $ mkdir /mnt/lustre/mydir/foo
 446 $ cp /etc/hosts /mnt/lustre/mydir/foo/file
 447 $ ln /mnt/lustre/mydir/foo/file /mnt/lustre/mydir/myhardlink
 448 </screen>
 449         <para>Only record types that are in the mask appear in the
 450         changelog.</para>
 451         <screen>$ lfs changelog lustre-MDT0000
 452 9 03HLINK 16:06:35.291636498 2018.01.09 0x0 t=[0x200000402:0x4:0x0] ef=0xf \
 453 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x3:0x0] myhardlink
 454 </screen>
 456       </section>
 457     </section>
 458     <section remap="h3" condition='l2B'>
 459       <title><indexterm><primary>audit</primary>
 460       <secondary>change logs</secondary></indexterm>
 461 Audit with Changelogs</title>
 462       <para>A specific use case for Lustre Changelogs is audit. According to a
 463       definition found on <link xmlns:xlink="http://www.w3.org/1999/xlink"
 464       xlink:href="https://en.wikipedia.org/wiki/Information_technology_audit">
 465       Wikipedia</link>, information technology audits are used to evaluate the
 466       organization's ability to protect its information assets and to properly
 467       dispense information to authorized parties. In practice, an audit
 468       verifies that every data access complied with the access control
 469       policy in place, which is usually done by analyzing access
 470       logs.</para>
 471       <para>An audit can serve as proof that security measures are in place,
 472       and it may also be required in order to comply with regulations.</para>
 473       <para>Lustre Changelogs are well suited to auditing because they are a
 474       centralized facility that is designed to be transactional. Changelog
 475       records contain all the information necessary for auditing purposes:</para>
 476       <itemizedlist>
 477         <listitem>
 478           <para>ability to identify object of action thanks to file identifiers
 479           (FIDs) and name of targets</para>
 480         </listitem>
 481         <listitem>
 482           <para>ability to identify subject of action thanks to UID/GID and NID
 483           information</para>
 484         </listitem>
 485         <listitem>
 486           <para>ability to identify time of action thanks to timestamp</para>
 487         </listitem>
 488       </itemizedlist>
 489       <section remap="h5">
 490         <title>Enabling Audit</title>
 491         <para>A fully functional Changelogs-based audit facility requires some
 492         additional Changelog record types to be enabled, in order to record
 493         events such as OPEN, ATIME, GETXATTR and DENIED OPEN. Note that
 494         enabling these record types may affect performance. For
 495         instance, recording OPEN and GETXATTR events generates writes to the
 496         Changelog for what is, from a file system standpoint, a read
 497         operation.</para>
 498         <para>Being able to record events such as OPEN or DENIED OPEN is
 499         important from an audit perspective. For instance, if a Lustre file system
 500         is used to store medical records on a system dedicated to Life Sciences,
 501         data privacy is crucial. Administrators may need to know which doctors
 502         accessed, or tried to access, a given medical record and when.
 503         Conversely, they might need to know which medical records a given doctor
 504         accessed.</para>
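        <para>Such questions reduce to grouping audit records by target FID or
        by user. The following Python sketch shows the first case. It is only
        an illustration: the function name, the dictionary keys and the sample
        records are made up for this example, and a real script would build the
        records from <literal>lfs changelog</literal> output.</para>
        <screen>from collections import defaultdict

def users_by_target(records, types=("OPEN", "NOPEN")):
    """Map each target FID to the uid:gid strings that opened it,
    or were denied when trying to open it."""
    result = defaultdict(set)
    for rec in records:
        if rec["type"] in types:
            result[rec["t"]].add(rec["u"])
    return dict(result)

# Made-up sample records, for illustration only.
records = [
    {"type": "OPEN", "t": "[0x200000401:0x2:0x0]", "u": "500:500"},
    {"type": "NOPEN", "t": "[0x200000401:0x2:0x0]", "u": "501:501"},
    {"type": "CLOSE", "t": "[0x200000401:0x2:0x0]", "u": "502:502"},
]
print(users_by_target(records))</screen>
        <para>Swapping the roles of <literal>t</literal> and
        <literal>u</literal> in the loop answers the converse question of which
        files a given user accessed.</para>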
 505         <para>To enable all changelog entry types, run:</para>
 506         <screen>mds# lctl set_param mdd.lustre-MDT0000.changelog_mask=ALL
 507 mdd.lustre-MDT0000.changelog_mask=ALL</screen>
 508         <para>Once all required record types have been enabled, just register a
 509         Changelogs user and the audit facility is operational.</para>
 510         <para>It is also possible to control which Lustre client
 511         nodes can trigger the recording of file system access events to the
 512         Changelogs, using the <literal>audit_mode</literal> flag on nodemap
 513         entries. Disabling audit on a per-nodemap basis
 514         prevents some nodes (e.g. backup or HSM agent nodes) from flooding the
 515         audit logs. When the <literal>audit_mode</literal> flag is
 516         set to 1 on a nodemap entry, a client belonging to this nodemap is
 517         able to record file system access events to the Changelogs, if
 518         Changelogs are otherwise activated. When it is set to 0, events are not
 519         logged to the Changelogs, whether or not Changelogs are activated. By
 520         default, the <literal>audit_mode</literal> flag is set to 1 in newly created
 521         nodemap entries, and it is also set to 1 in the 'default' nodemap.</para>
 522         <para>To prevent nodes belonging to a nodemap from generating Changelog
 523         entries, run:</para>
 524         <screen>
 525 mgs# lctl nodemap_modify --name nm1 --property audit_mode --value 0</screen>
 526       </section>
 527       <section remap="h5">
 528         <title>Audit examples</title>
 529         <section remap="h5">
 530           <title>
 531             <literal>OPEN</literal>
 532           </title>
 533           <para>An OPEN changelog entry is in the form:</para>
 534           <screen>
 535 7 10OPEN  13:38:51.510728296 2017.07.25 0x242 t=[0x200000401:0x2:0x0] \
 536 ef=0x7 u=500:500 nid=10.128.11.159@tcp m=-w-</screen>
 537           <para>It includes information about the open mode, in the form
 538           m=rwx.</para>
 539           <para>OPEN entries are recorded only once per UID/GID, for a given
 540           open mode, as long as the file is not closed by this UID/GID. This
 541           avoids flooding the Changelogs when, for instance, an MPI job
 542           opens the same file thousands of times from different threads. It
 543           reduces the Changelog load significantly without materially
 544           affecting the audit information. Similarly, only the last CLOSE per
 545           UID/GID is recorded.</para>
 546         </section>
 547         <section remap="h5">
 548           <title>
 549             <literal>GETXATTR</literal>
 550           </title>
 551           <para>A GETXATTR changelog entry is in the form:</para>
 552           <screen>
 553 8 23GXATR 09:22:55.886793012 2017.07.27 0x0 t=[0x200000402:0x1:0x0] \
 554 ef=0xf u=500:500 nid=10.128.11.159@tcp x=user.name0</screen>
 555           <para>It includes information about the name of the extended attribute
 556           being accessed, in the form <literal>x=&lt;xattr name&gt;</literal>.
 557           </para>
 558         </section>
 559         <section remap="h5">
 560           <title>
 561             <literal>SETXATTR</literal>
 562           </title>
 563           <para>A SETXATTR changelog entry is in the form:</para>
 564           <screen>
 565 4 15XATTR 09:41:36.157333594 2018.01.10 0x0 t=[0x200000402:0x1:0x0] \
 566 ef=0xf u=500:500 nid=10.128.11.159@tcp x=user.name0</screen>
 567           <para>It includes information about the name of the extended attribute
 568           being modified, in the form <literal>x=&lt;xattr name&gt;</literal>.
 569           </para>
 570         </section>
 571         <section remap="h5">
 572           <title>
 573             <literal>DENIED OPEN</literal>
 574           </title>
 575           <para>A DENIED OPEN changelog entry is in the form:</para>
 576           <screen>
 577 4 24NOPEN 15:45:44.947406626 2017.08.31 0x2 t=[0x200000402:0x1:0x0] \
 578 ef=0xf u=500:500 nid=10.128.11.158@tcp m=-w-</screen>
 579           <para>It has the same information as a regular OPEN entry. In order to
 580           avoid flooding the Changelogs, DENIED OPEN entries are rate limited:
 581           no more than one entry per user per file per time interval, this time
 582           interval (in seconds) being configurable via
 583           <literal>mdd.&lt;mdtname&gt;.changelog_deniednext</literal>
 584           (default value is 60 seconds).</para>
 585           <screen>
 586 mds# lctl set_param mdd.lustre-MDT0000.changelog_deniednext=120
 587 mdd.lustre-MDT0000.changelog_deniednext=120
 588 mds# lctl get_param mdd.lustre-MDT0000.changelog_deniednext
 589 mdd.lustre-MDT0000.changelog_deniednext=120</screen>
 590         </section>
 591       </section>
 592     </section>
 593   </section>
 594   <section xml:id="dbdoclet.jobstats">
 595       <title><indexterm><primary>jobstats</primary><see>monitoring</see></indexterm>
 596 <indexterm><primary>monitoring</primary></indexterm>
 597 <indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
 598
 599 Lustre Jobstats</title>
 600     <para>The Lustre jobstats feature collects file system operation statistics
 601       for user processes running on Lustre clients, and exposes them on the
 602       server using the unique Job Identifier (JobID) provided by the job scheduler
 603       for each job. Job schedulers known to work with jobstats include
 604       SLURM, SGE, LSF, Loadleveler, PBS and Maui/MOAB.</para>
 605     <para>Since jobstats is implemented in a scheduler-agnostic manner, it is
 606     likely to work with other schedulers as well, and also
 607     in environments that do not use a job scheduler, by storing custom format
 608     strings in the <literal>jobid_name</literal> parameter.</para>
 609     <section remap="h3">
 610       <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
 611       How Jobstats Works</title>
 612       <para>The Lustre jobstats code on the client extracts the unique JobID
 613       from an environment variable within the user process, and sends this
 614       JobID to the server with the I/O operation.  The server tracks
 615       statistics for operations whose JobID is given, indexed by that
 616       ID.</para>
 617
 618       <para>A Lustre setting on the client, <literal>jobid_var</literal>,
 619       specifies which environment variable holds the JobID for that process.
 620       Any environment variable can be specified.  For example, SLURM sets the
 621       <literal>SLURM_JOB_ID</literal> environment variable with the unique
 622       job ID on each client when the job is first launched on a node, and
 623       the <literal>SLURM_JOB_ID</literal> will be inherited by all child
 624       processes started below that process.</para>
 625
 626       <para>Lustre can be configured to generate a synthetic JobID from
 627       the client's process name and numeric UID, by setting
 628       <literal>jobid_var=procname_uid</literal>.  This will generate a
 629       uniform JobID when running the same binary across multiple client
 630       nodes, but cannot distinguish whether the binary is part of a single
 631       distributed process or multiple independent processes.
 632       </para>
 633
 634       <para condition="l28">In Lustre 2.8 and later it is possible to set
 635       <literal>jobid_var=nodelocal</literal> and then also set
 636       <literal>jobid_name=</literal><replaceable>name</replaceable>, which
 637       <emphasis>all</emphasis> processes on that client node will use.  This
 638       is useful if only a single job is run on a client at one time, but if
 639       multiple jobs are run on a client concurrently, the per-session JobID
 640       should be used.
 641       </para>
 642
 643       <para condition="l2C">In Lustre 2.12 and later, it is possible to
 644       specify more complex JobID values for <literal>jobid_name</literal>
 645       by using a string that contains format codes that are evaluated for
 646       each process, in order to generate a site- or node-specific JobID string.
 647       </para>
 648       <itemizedlist>
 649         <listitem>
 650           <para><emphasis>%e</emphasis> print executable name</para>
 651         </listitem>
 652         <listitem>
 653           <para><emphasis>%g</emphasis> print group ID number</para>
 654         </listitem>
 655         <listitem>
 656           <para><emphasis>%h</emphasis> print hostname</para>
 657         </listitem>
 658         <listitem>
 659           <para><emphasis>%j</emphasis> print JobID from process environment
 660           variable named by the <emphasis>jobid_var</emphasis> parameter
 661           </para>
 662         </listitem>
 663         <listitem>
 664           <para><emphasis>%p</emphasis> print numeric process ID</para>
 665         </listitem>
 666         <listitem>
 667           <para><emphasis>%u</emphasis> print user ID number</para>
 668         </listitem>
 669       </itemizedlist>
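      <para>To make the effect of these format codes concrete, the following
      Python sketch expands a format string using explicitly supplied values.
      It only mimics the substitution described above and is not the Lustre
      implementation: the function name and arguments are hypothetical, and
      leaving unknown codes unchanged is a choice made for this sketch, not
      documented Lustre behavior.</para>
      <screen>def expand_jobid_name(fmt, exe, gid, host, jobid, pid, uid):
    """Expand the %e %g %h %j %p %u codes in fmt."""
    codes = {"e": exe, "g": gid, "h": host, "j": jobid, "p": pid, "u": uid}
    out = []
    chars = iter(fmt)
    for ch in chars:
        if ch == "%":
            code = next(chars, "")
            # Unknown codes are left as-is (assumption of this sketch).
            out.append(str(codes.get(code, "%" + code)))
        else:
            out.append(ch)
    return "".join(out)

# Executable name, user ID and hostname combined into one JobID string.
print(expand_jobid_name("%e.%u@%h", exe="cp", gid=500, host="client01",
                        jobid="1234", pid=4321, uid=500))</screen>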
 670
 671       <para condition="l2D">In Lustre 2.13 and later, it is possible to
 672       set a per-session JobID by setting the
 673       <literal>jobid_this_session</literal> parameter.  This will be
 674       inherited by all processes that are started in this login session,
 675       but there can be a different JobID for each login session.
 676       </para>
 677
 678       <para>The setting of <literal>jobid_var</literal> need not be the same
 679       on all clients.  For example, one could use
 680       <literal>SLURM_JOB_ID</literal> on all clients managed by SLURM, and
 681       use <literal>procname_uid</literal> on clients not managed by SLURM,
 682       such as interactive login nodes.</para>
 683
 684       <para>Only one <literal>jobid_var</literal> setting can be active on
 685       a single node at a time.  This is rarely a limitation, since it is
 686       unlikely that multiple job schedulers are active on one client.
 687       However, the actual JobID value is local to each process environment
 688       and it is possible for multiple jobs with different JobIDs to be
 689       active on a single client at one time.</para>
 690     </section>
 691
 692     <section remap="h3">
 693       <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
 694 Enable/Disable Jobstats</title>
 695       <para>Jobstats are disabled by default.  The current state of jobstats
 696       can be verified by checking <literal>lctl get_param jobid_var</literal>
 697       on a client:</para>
 698       <screen>
 699 $ lctl get_param jobid_var
 700 jobid_var=disable
 701       </screen>
 702       <para>
 703       To enable jobstats on the <literal>testfs</literal> file system with SLURM:</para>
 704       <screen># lctl conf_param testfs.sys.jobid_var=SLURM_JOB_ID</screen>
 705       <para>The <literal>lctl conf_param</literal> command to enable or disable
 706       jobstats should be run on the MGS as root. The change is persistent, and
 707       will be propagated to the MDS, OSS, and client nodes automatically when
 708       it is set on the MGS and for each new client mount.</para>
 709       <para>To temporarily enable jobstats on a client, or to use a different
 710       <literal>jobid_var</literal> on a subset of nodes, such as nodes in a
 711       remote cluster that use a different job scheduler, or interactive login
 712       nodes that do not use a job scheduler at all, run the
 713       <literal>lctl set_param</literal> command directly on the client node(s)
 714       after the file system is mounted.  For example, to enable the
 715       <literal>procname_uid</literal> synthetic JobID on a login node, run:
 716       <screen># lctl set_param jobid_var=procname_uid</screen>
 717       The <literal>lctl set_param</literal> setting is not persistent, and will
 718       be reset if the global <literal>jobid_var</literal> is set on the MGS or
 719       if the file system is unmounted.</para>
 720       <para>The following table shows the environment variables that are set
 721       by various job schedulers.  Set <literal>jobid_var</literal> to the value
 722       for your job scheduler to collect statistics on a per-job basis.</para>
 723     <informaltable frame="all">
 724       <tgroup cols="2">
 725         <colspec colname="c1" colwidth="50*"/>
 726         <colspec colname="c2" colwidth="50*"/>
 727         <thead>
 728           <row>
 729             <entry>
 730               <para><emphasis role="bold">Job Scheduler</emphasis></para>
 731             </entry>
 732             <entry>
 733               <para><emphasis role="bold">Environment Variable</emphasis></para>
 734             </entry>
 735           </row>
 736         </thead>
 737         <tbody>
 738           <row>
 739             <entry>
 740               <para>Simple Linux Utility for Resource Management (SLURM)</para>
 741             </entry>
 742             <entry>
 743               <para>SLURM_JOB_ID</para>
 744             </entry>
 745           </row>
 746           <row>
 747             <entry>
 748               <para>Sun Grid Engine (SGE)</para>
 749             </entry>
 750             <entry>
 751               <para>JOB_ID</para>
 752             </entry>
 753           </row>
 754           <row>
 755             <entry>
 756               <para>Load Sharing Facility (LSF)</para>
 757             </entry>
 758             <entry>
 759               <para>LSB_JOBID</para>
 760             </entry>
 761           </row>
 762           <row>
 763             <entry>
 764               <para>LoadLeveler</para>
 765             </entry>
 766             <entry>
 767               <para>LOADL_STEP_ID</para>
 768             </entry>
 769           </row>
 770           <row>
 771             <entry>
 772               <para>Portable Batch System (PBS)/MAUI</para>
 773             </entry>
 774             <entry>
 775               <para>PBS_JOBID</para>
 776             </entry>
 777           </row>
 778           <row>
 779             <entry>
 780               <para>Cray Application Level Placement Scheduler (ALPS)</para>
 781             </entry>
 782             <entry>
 783               <para>ALPS_APP_ID</para>
 784             </entry>
 785           </row>
 786         </tbody>
 787       </tgroup>
 788     </informaltable>
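    <para>The table above can also be used programmatically.  The following
    Python sketch, given as an illustration under stated assumptions, returns
    the first scheduler variable from the table that is set in a given
    environment and falls back to <literal>procname_uid</literal> otherwise.
    The helper name <literal>suggest_jobid_var</literal> and the sample
    environment are hypothetical.</para>
    <programlisting language="python"><![CDATA[
```python
# Scheduler variables taken from the table above, in table order.
SCHEDULER_VARS = [
    "SLURM_JOB_ID",   # SLURM
    "JOB_ID",         # SGE
    "LSB_JOBID",      # LSF
    "LOADL_STEP_ID",  # LoadLeveler
    "PBS_JOBID",      # PBS/MAUI
    "ALPS_APP_ID",    # ALPS
]


def suggest_jobid_var(environ):
    """Return the first scheduler variable that is set in environ.

    Hypothetical helper: falls back to procname_uid, the special value
    intended for nodes without a job scheduler.
    """
    for name in SCHEDULER_VARS:
        if environ.get(name):
            return name
    return "procname_uid"


# Example with a made-up environment from inside a SLURM job.
print(suggest_jobid_var({"SLURM_JOB_ID": "1234", "HOME": "/root"}))
```
]]></programlisting>
    <para>In practice such a function would be called with the environment
    of a running job, and the result used as the value of
    <literal>jobid_var</literal>.</para>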
 789     <para>There are two special values for <literal>jobid_var</literal>:
 790     <literal>disable</literal> and <literal>procname_uid</literal>. To disable
 791     jobstats, specify <literal>jobid_var</literal> as <literal>disable</literal>:</para>
 792     <screen># lctl conf_param testfs.sys.jobid_var=disable</screen>
 793     <para>To track job stats per process name and user ID (for debugging, or
 794     if no job scheduler is in use on some nodes such as login nodes), specify
 795     <literal>jobid_var</literal> as <literal>procname_uid</literal>:</para>
 796     <screen># lctl conf_param testfs.sys.jobid_var=procname_uid</screen>
 797     </section>
 798     <section remap="h3">
 799       <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
 800 Check Job Stats</title>
 801     <para>Metadata operation statistics are collected on MDTs. These statistics can be accessed for
 802         all file systems and all jobs on the MDT via the <literal>lctl get_param
 803           mdt.*.job_stats</literal> command. For example, with clients running with
 804           <literal>jobid_var=procname_uid</literal>:</para>
 805     <screen>
 806 # lctl get_param mdt.*.job_stats
 807 job_stats:
 808 - job_id:          bash.0
 809   snapshot_time:   1352084992
 810   open:            { samples:     2, unit:  reqs }
 811   close:           { samples:     2, unit:  reqs }
 812   mknod:           { samples:     0, unit:  reqs }
 813   link:            { samples:     0, unit:  reqs }
 814   unlink:          { samples:     0, unit:  reqs }
 815   mkdir:           { samples:     0, unit:  reqs }
 816   rmdir:           { samples:     0, unit:  reqs }
 817   rename:          { samples:     0, unit:  reqs }
 818   getattr:         { samples:     3, unit:  reqs }
 819   setattr:         { samples:     0, unit:  reqs }
 820   getxattr:        { samples:     0, unit:  reqs }
 821   setxattr:        { samples:     0, unit:  reqs }
 822   statfs:          { samples:     0, unit:  reqs }
 823   sync:            { samples:     0, unit:  reqs }
 824   samedir_rename:  { samples:     0, unit:  reqs }
 825   crossdir_rename: { samples:     0, unit:  reqs }
 826 - job_id:          mythbackend.0
 827   snapshot_time:   1352084996
 828   open:            { samples:    72, unit:  reqs }
 829   close:           { samples:    73, unit:  reqs }
 830   mknod:           { samples:     0, unit:  reqs }
 831   link:            { samples:     0, unit:  reqs }
 832   unlink:          { samples:    22, unit:  reqs }
 833   mkdir:           { samples:     0, unit:  reqs }
 834   rmdir:           { samples:     0, unit:  reqs }
 835   rename:          { samples:     0, unit:  reqs }
 836   getattr:         { samples:   778, unit:  reqs }
 837   setattr:         { samples:    22, unit:  reqs }
 838   getxattr:        { samples:     0, unit:  reqs }
 839   setxattr:        { samples:     0, unit:  reqs }
 840   statfs:          { samples: 19840, unit:  reqs }
 841   sync:            { samples: 33190, unit:  reqs }
 842   samedir_rename:  { samples:     0, unit:  reqs }
 843   crossdir_rename: { samples:     0, unit:  reqs }
 844     </screen>
 845     <para>Data operation statistics are collected on OSTs. These
 846     statistics can be accessed via
 847     <literal>lctl get_param obdfilter.*.job_stats</literal>, for example:</para>
 848     <screen>
 849 $ lctl get_param obdfilter.*.job_stats
 850 obdfilter.myth-OST0000.job_stats=
 851 job_stats:
 852 - job_id:          mythcommflag.0
 853   snapshot_time:   1429714922
 854   read:    { samples: 974, unit: bytes, min: 4096, max: 1048576, sum: 91530035 }
 855   write:   { samples:   0, unit: bytes, min:    0, max:       0, sum:        0 }
 856   setattr: { samples:   0, unit:  reqs }
 857   punch:   { samples:   0, unit:  reqs }
 858   sync:    { samples:   0, unit:  reqs }
 859 obdfilter.myth-OST0001.job_stats=
 860 job_stats:
 861 - job_id:          mythbackend.0
 862   snapshot_time:   1429715270
 863   read:    { samples:   0, unit: bytes, min:     0, max:      0, sum:        0 }
 864   write:   { samples:   1, unit: bytes, min: 96899, max:  96899, sum:    96899 }
 865   setattr: { samples:   0, unit:  reqs }
 866   punch:   { samples:   1, unit:  reqs }
 867   sync:    { samples:   0, unit:  reqs }
 868 obdfilter.myth-OST0002.job_stats=job_stats:
 869 obdfilter.myth-OST0003.job_stats=job_stats:
 870 obdfilter.myth-OST0004.job_stats=
 871 job_stats:
 872 - job_id:          mythfrontend.500
 873   snapshot_time:   1429692083
 874   read:    { samples:   9, unit: bytes, min: 16384, max: 1048576, sum: 4444160 }
 875   write:   { samples:   0, unit: bytes, min:     0, max:       0, sum:       0 }
 876   setattr: { samples:   0, unit:  reqs }
 877   punch:   { samples:   0, unit:  reqs }
 878   sync:    { samples:   0, unit:  reqs }
 879 - job_id:          mythbackend.500
 880   snapshot_time:   1429692129
 881   read:    { samples:   0, unit: bytes, min:     0, max:       0, sum:       0 }
 882   write:   { samples:   1, unit: bytes, min: 56231, max:   56231, sum:   56231 }
 883   setattr: { samples:   0, unit:  reqs }
 884   punch:   { samples:   1, unit:  reqs }
 885   sync:    { samples:   0, unit:  reqs }
 886     </screen>
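    <para>The per-job output shown above is regular enough to be
    post-processed with a short script.  The following Python sketch, offered
    as an illustration rather than a supported tool, adds up the
    <literal>sum</literal> field of the <literal>read</literal> and
    <literal>write</literal> lines for each <literal>job_id</literal> across
    OSTs.  The function name <literal>bytes_per_job</literal> is hypothetical,
    and the embedded sample is an excerpt of the output above.</para>
    <programlisting language="python"><![CDATA[
```python
import re
from collections import defaultdict

# Excerpt of the "lctl get_param obdfilter.*.job_stats" output shown above.
SAMPLE = """\
obdfilter.myth-OST0000.job_stats=
job_stats:
- job_id:          mythcommflag.0
  snapshot_time:   1429714922
  read:    { samples: 974, unit: bytes, min: 4096, max: 1048576, sum: 91530035 }
  write:   { samples:   0, unit: bytes, min:    0, max:       0, sum:        0 }
obdfilter.myth-OST0001.job_stats=
job_stats:
- job_id:          mythbackend.0
  snapshot_time:   1429715270
  read:    { samples:   0, unit: bytes, min:     0, max:      0, sum:        0 }
  write:   { samples:   1, unit: bytes, min: 96899, max:  96899, sum:    96899 }
"""

JOB_RE = re.compile(r"^- job_id:\s+(\S+)")
IO_RE = re.compile(r"^\s+(read|write):\s+\{.*\bsum:\s*(\d+)\s*\}")


def bytes_per_job(text):
    """Return {job_id: {"read": bytes, "write": bytes}} summed over all OSTs."""
    totals = defaultdict(lambda: {"read": 0, "write": 0})
    job = None
    for line in text.splitlines():
        m = JOB_RE.match(line)
        if m:
            job = m.group(1)      # start of a new per-job record
            continue
        m = IO_RE.match(line)
        if m and job is not None:
            totals[job][m.group(1)] += int(m.group(2))
    return dict(totals)


for job, io in sorted(bytes_per_job(SAMPLE).items()):
    print(job, io["read"], io["write"])
```
]]></programlisting>
    <para>Only the <literal>read</literal> and <literal>write</literal> lines
    are considered; request counters such as <literal>setattr</literal> and
    <literal>punch</literal> are ignored by this sketch.</para>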
 887     </section>
 888     <section remap="h3">
 889       <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
 890 Clear Job Stats</title>
 891     <para>Accumulated job statistics can be reset by writing to the <literal>job_stats</literal> proc file.</para>
 892     <para>Clear statistics for all jobs on all OSTs on the local node:</para>
 893     <screen># lctl set_param obdfilter.*.job_stats=clear</screen>
 894     <para>Clear statistics only for job 'bash.0' on lustre-MDT0000:</para>
 895     <screen># lctl set_param mdt.lustre-MDT0000.job_stats=bash.0</screen>
 896     </section>
 897     <section remap="h3">
 898       <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
 899 Configure Auto-cleanup Interval</title>
 900     <para>By default, if a job is inactive for 600 seconds (10 minutes), statistics for this job will be dropped. This expiration value can be changed temporarily via:</para>
 901     <screen># lctl set_param *.*.job_cleanup_interval={max_age}</screen>
 902     <para>It can also be changed permanently, for example to 700 seconds via:</para>
 903     <screen># lctl conf_param testfs.mdt.job_cleanup_interval=700</screen>
 904     <para>The <literal>job_cleanup_interval</literal> can be set to 0 to disable the auto-cleanup. Note that if auto-cleanup of jobstats is disabled, then all statistics will be kept in memory indefinitely, which may eventually consume all memory on the servers. In this case, any monitoring tool should explicitly clear individual job statistics as they are processed, as shown above.</para>
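    <para>A monitoring tool that clears statistics as it processes them needs
    one <literal>lctl set_param</literal> invocation per job.  The following
    Python sketch only builds those command lines and does not run them.  The
    helper name <literal>clear_job_commands</literal> is hypothetical, and the
    target and job names are taken from the examples in this section.</para>
    <programlisting language="python"><![CDATA[
```python
def clear_job_commands(target, job_ids):
    """Build one 'lctl set_param' argument list per processed job.

    Hypothetical helper: 'target' is a parameter prefix such as
    'mdt.lustre-MDT0000'.  The commands are only built, not executed.
    """
    return [["lctl", "set_param", f"{target}.job_stats={job}"]
            for job in job_ids]


for argv in clear_job_commands("mdt.lustre-MDT0000", ["bash.0", "mythbackend.0"]):
    print(" ".join(argv))
```
]]></programlisting>
    <para>Each argument list could then be executed as root on the server
    that hosts the target, once the corresponding statistics have been
    stored elsewhere.</para>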
 905     </section>
 906   </section>
 907   <section xml:id="dbdoclet.50438273_81684">
 908     <title><indexterm>
 909         <primary>monitoring</primary>
 910         <secondary>Lustre Monitoring Tool</secondary>
 911       </indexterm> Lustre Monitoring Tool (LMT)</title>
 912     <para>The Lustre Monitoring Tool (LMT) is a Python-based, distributed system that provides a
 913         <literal>top</literal>-like display of activity on server-side nodes (MDS, OSS and portals
 914       routers) on one or more Lustre file systems. It does not provide support for monitoring
 915       clients. For more information on LMT, including the setup procedure, see:</para>
 916     <para><link xl:href="https://github.com/chaos/lmt/wiki"
 917       >https://github.com/chaos/lmt/wiki</link></para>
 918     <para>LMT questions can be directed to:</para>
 919     <para><link xl:href="mailto:lmt-discuss@googlegroups.com">lmt-discuss@googlegroups.com</link></para>
 920   </section>
 921   <section xml:id="dbdoclet.50438273_80593">
 922     <title>
 923       <literal>CollectL</literal>
 924     </title>
 925     <para><literal>CollectL</literal> is another tool that can be used to monitor a Lustre file
 926       system. You can run <literal>CollectL</literal> on a Lustre system that has any combination of
 927       MDSs, OSTs and clients. The collected data can be written to a file for continuous logging and
 928       played back at a later time. It can also be converted to a format suitable for
 929       plotting.</para>
 930     <para>For more information about <literal>CollectL</literal>, see:</para>
 931     <para><link xl:href="http://collectl.sourceforge.net">http://collectl.sourceforge.net</link></para>
 932     <para>Lustre-specific documentation is also available. See:</para>
 933     <para><link xl:href="http://collectl.sourceforge.net/Tutorial-Lustre.html">http://collectl.sourceforge.net/Tutorial-Lustre.html</link></para>
 934   </section>
 935   <section xml:id="dbdoclet.50438273_44185">
 936     <title><indexterm><primary>monitoring</primary><secondary>additional tools</secondary></indexterm>
 937 Other Monitoring Options</title>
 938     <para>A variety of standard tools are publicly available, including the following:<itemizedlist>
 939         <listitem>
 940           <para><literal>lltop</literal> - Lustre load monitor with batch scheduler integration.
 941               <link xmlns:xlink="http://www.w3.org/1999/xlink"
 942               xlink:href="https://github.com/jhammond/lltop"
 943               >https://github.com/jhammond/lltop</link></para>
 944         </listitem>
 945         <listitem>
 946           <para><literal>tacc_stats</literal> - A job-oriented system monitor, analysis, and
 947             visualization tool that probes Lustre interfaces and collects statistics. <link
 948               xmlns:xlink="http://www.w3.org/1999/xlink"
 949               xlink:href="https://github.com/jhammond/tacc_stats"/></para>
 950         </listitem>
 951         <listitem>
 952           <para><literal>xltop</literal> - A continuous Lustre monitor with batch scheduler
 953             integration. <link xmlns:xlink="http://www.w3.org/1999/xlink"
 954               xlink:href="https://github.com/jhammond/xltop"/></para>
 955         </listitem>
 956       </itemizedlist></para>
 957     <para>Another option is to script a simple monitoring solution that looks at various reports
 958       from <literal>ifconfig</literal>, as well as the <literal>procfs</literal> files generated by
 959       the Lustre software.</para>
 960   </section>
 961 </chapter>
 962 <!--
 963   vim:expandtab:shiftwidth=2:tabstop=8:
 964   -->