LustreMonitoring.xml

   1 <?xml version='1.0' encoding='UTF-8'?>
   2 <chapter xmlns="http://docbook.org/ns/docbook"
   3  xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
   4  xml:id="lustremonitoring">
   5   <title xml:id="lustremonitoring.title">Monitoring a Lustre File System</title>
   6   <para>This chapter provides information on monitoring a Lustre file system and includes the
   7     following sections:</para>
   8   <itemizedlist>
   9     <listitem>
  10       <para><xref linkend="lustre_changelogs"/>Lustre Changelogs</para>
  11     </listitem>
  12     <listitem>
  13       <para><xref linkend="jobstats"/>Lustre Jobstats</para>
  14     </listitem>
  15     <listitem>
  16       <para><xref linkend="lmt"/>Lustre Monitoring Tool</para>
  17     </listitem>
  18     <listitem>
  19       <para><xref linkend="collectl"/>CollectL</para>
  20     </listitem>
  21     <listitem>
  22       <para><xref linkend="other_monitoring_options"/>Other Monitoring Options</para>
  23     </listitem>
  24   </itemizedlist>
  25   <section xml:id="lustre_changelogs">
  26       <title><indexterm><primary>change logs</primary><see>monitoring</see></indexterm>
  27 <indexterm><primary>monitoring</primary></indexterm>
  28 <indexterm><primary>monitoring</primary><secondary>change logs</secondary></indexterm>
  29
  30 Lustre Changelogs</title>
  31     <para>The changelogs feature records events that change the file system
  32     namespace or file metadata. Changes such as file creation, deletion,
  33     renaming, attribute changes, etc. are recorded with the target and parent
  34     file identifiers (FIDs), the name of the target, a timestamp, and user
  35     information. These records can be used for a variety of purposes:</para>
  36     <itemizedlist>
  37       <listitem>
  38         <para>Capture recent changes to feed into an archiving system.</para>
  39       </listitem>
  40       <listitem>
  41         <para>Use changelog entries to exactly replicate changes in a file
  42         system mirror.</para>
  43       </listitem>
  44       <listitem>
  45         <para>Set up &quot;watch scripts&quot; that take action on certain
  46         events or directories.</para>
  47       </listitem>
  48       <listitem>
  49         <para>Audit activity on a Lustre file system, using the user
  50         information and timestamps associated with file and directory changes.</para>
  51       </listitem>
  52     </itemizedlist>
  53     <para>Changelogs record types are:</para>
  54     <informaltable frame="all">
  55       <tgroup cols="2">
  56         <colspec colname="c1" colwidth="50*"/>
  57         <colspec colname="c2" colwidth="50*"/>
  58         <thead>
  59           <row>
  60             <entry>
  61               <para><emphasis role="bold">Value</emphasis></para>
  62             </entry>
  63             <entry>
  64               <para><emphasis role="bold">Description</emphasis></para>
  65             </entry>
  66           </row>
  67         </thead>
  68         <tbody>
  69           <row>
  70             <entry>
  71               <para> MARK</para>
  72             </entry>
  73             <entry>
  74               <para> Internal recordkeeping</para>
  75             </entry>
  76           </row>
  77           <row>
  78             <entry>
  79               <para> CREAT</para>
  80             </entry>
  81             <entry>
  82               <para> Regular file creation</para>
  83             </entry>
  84           </row>
  85           <row>
  86             <entry>
  87               <para> MKDIR</para>
  88             </entry>
  89             <entry>
  90               <para> Directory creation</para>
  91             </entry>
  92           </row>
  93           <row>
  94             <entry>
  95               <para> HLINK</para>
  96             </entry>
  97             <entry>
  98               <para> Hard link</para>
  99             </entry>
 100           </row>
 101           <row>
 102             <entry>
 103               <para> SLINK</para>
 104             </entry>
 105             <entry>
 106               <para> Soft link</para>
 107             </entry>
 108           </row>
 109           <row>
 110             <entry>
 111               <para> MKNOD</para>
 112             </entry>
 113             <entry>
 114               <para> Other file creation</para>
 115             </entry>
 116           </row>
 117           <row>
 118             <entry>
 119               <para> UNLNK</para>
 120             </entry>
 121             <entry>
 122               <para> Regular file removal</para>
 123             </entry>
 124           </row>
 125           <row>
 126             <entry>
 127               <para> RMDIR</para>
 128             </entry>
 129             <entry>
 130               <para> Directory removal</para>
 131             </entry>
 132           </row>
 133           <row>
 134             <entry>
 135               <para> RENME</para>
 136             </entry>
 137             <entry>
 138               <para> Rename, original</para>
 139             </entry>
 140           </row>
 141           <row>
 142             <entry>
 143               <para> RNMTO</para>
 144             </entry>
 145             <entry>
 146               <para> Rename, final</para>
 147             </entry>
 148           </row>
 149           <row>
 150             <entry>
 151               <para> OPEN *</para>
 152             </entry>
 153             <entry>
 154               <para> Open</para>
 155             </entry>
 156           </row>
 157           <row>
 158             <entry>
 159               <para> CLOSE</para>
 160             </entry>
 161             <entry>
 162               <para> Close</para>
 163             </entry>
 164           </row>
 165           <row>
 166             <entry>
 167               <para> LYOUT</para>
 168             </entry>
 169             <entry>
 170               <para> Layout change</para>
 171             </entry>
 172           </row>
 173           <row>
 174             <entry>
 175               <para> TRUNC</para>
 176             </entry>
 177             <entry>
 178               <para> Regular file truncated</para>
 179             </entry>
 180           </row>
 181           <row>
 182             <entry>
 183               <para> SATTR</para>
 184             </entry>
 185             <entry>
 186               <para> Attribute change</para>
 187             </entry>
 188           </row>
 189           <row>
 190             <entry>
 191               <para> XATTR</para>
 192             </entry>
 193             <entry>
 194               <para> Extended attribute change (setxattr)</para>
 195             </entry>
 196           </row>
 197           <row>
 198             <entry>
 199               <para> HSM</para>
 200             </entry>
 201             <entry>
 202               <para> HSM specific event</para>
 203             </entry>
 204           </row>
 205           <row>
 206             <entry>
 207               <para> MTIME</para>
 208             </entry>
 209             <entry>
 210               <para> MTIME change</para>
 211             </entry>
 212           </row>
 213           <row>
 214             <entry>
 215               <para> CTIME</para>
 216             </entry>
 217             <entry>
 218               <para> CTIME change</para>
 219             </entry>
 220           </row>
 221           <row>
 222             <entry>
 223               <para> ATIME *</para>
 224             </entry>
 225             <entry>
 226               <para> ATIME change</para>
 227             </entry>
 228           </row>
 229           <row>
 230             <entry>
 231               <para> MIGRT</para>
 232             </entry>
 233             <entry>
 234               <para> Migration event</para>
 235             </entry>
 236           </row>
 237           <row>
 238             <entry>
 239               <para> FLRW</para>
 240             </entry>
 241             <entry>
 242               <para> File Level Replication: file initially written</para>
 243             </entry>
 244           </row>
 245           <row>
 246             <entry>
 247               <para> RESYNC</para>
 248             </entry>
 249             <entry>
 250               <para> File Level Replication: file re-synced</para>
 251             </entry>
 252           </row>
 253           <row>
 254             <entry>
 255               <para> GXATR *</para>
 256             </entry>
 257             <entry>
 258               <para> Extended attribute access (getxattr)</para>
 259             </entry>
 260           </row>
 261           <row>
 262             <entry>
 263               <para> NOPEN *</para>
 264             </entry>
 265             <entry>
 266               <para> Denied open</para>
 267             </entry>
 268           </row>
 269         </tbody>
 270       </tgroup>
 271     </informaltable>
 272     <note><para>Event types marked with * are not recorded by default. Refer to
 273     <xref linkend="modifyChangelogMask" /> for instructions on
 274     modifying the Changelogs mask.</para></note>
 275     <para>FID-to-full-pathname and pathname-to-FID functions are also included
 276     to map target and parent FIDs into the file system namespace.</para>
 277     <section remap="h3">
 278       <title><indexterm><primary>monitoring</primary><secondary>change logs
 279     </secondary></indexterm>
 280 Working with Changelogs</title>
 281       <para>Several commands are available to work with changelogs.</para>
 282       <section remap="h5">
 283         <title>
 284           <literal>lctl changelog_register</literal>
 285         </title>
 286         <para>Because changelog records take up space on the MDT, the system
 287         administrator must register changelog users. As soon as a changelog
 288         user is registered, the Changelogs feature is enabled. The registrants
 289         specify which records they are &quot;done with&quot;, and the system
 290         purges up to the greatest common record.</para>
 291         <para>To register a new changelog user, run:</para>
 292 <screen>
 293 mds# lctl --device <replaceable>fsname</replaceable>-<replaceable>MDTnumber</replaceable> changelog_register
 294 </screen>
 295         <para>Changelog entries are not purged beyond a registered user&apos;s
 296         set point (see <literal>lfs changelog_clear</literal>).</para>
 297       </section>
 298       <section remap="h5">
 299         <title>
 300           <literal>lfs changelog</literal>
 301         </title>
 302         <para>To display the metadata changes on an MDT (the changelog records),
 303         run:</para>
 304 <screen>
 305 client# lfs changelog <replaceable>fsname</replaceable>-<replaceable>MDTnumber</replaceable> [startrec [endrec]]
 306 </screen>
 307         <para>The start and end record numbers are optional. If they are
 308         omitted, all available records are displayed.</para>
 309         <para>These are sample changelog records:</para>
 310 <screen>
 311 1 02MKDIR 15:15:21.977666834 2018.01.09 0x0 t=[0x200000402:0x1:0x0] j=mkdir.500 ef=0xf \
 312 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
 313 2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] j=cp.500 ef=0xf \
 314 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
 315 3 06UNLNK 15:15:41.305116815 2018.01.09 0x1 t=[0x200000402:0x2:0x0] j=rm.500 ef=0xf \
 316 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
 317 4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] j=rmdir.500 ef=0xf \
 318 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
 319 </screen>
 320       </section>
 321       <section remap="h5">
 322         <title>
 323           <literal>lfs changelog_clear</literal>
 324         </title>
 325         <para>To clear old changelog records for a specific user (records that
 326         the user no longer needs), run:</para>
 327 <screen>
 328 client# lfs changelog_clear <replaceable>mdt_name</replaceable> <replaceable>userid</replaceable> <replaceable>endrec</replaceable>
 329 </screen>
 330         <para>The <literal>changelog_clear</literal> command indicates that
 331         changelog records previous to <replaceable>endrec</replaceable> are no
 332         longer of interest to a particular user
 333         <replaceable>userid</replaceable>, potentially allowing the MDT to free
 334         up disk space. An <literal><replaceable>endrec</replaceable></literal>
 335         value of 0 indicates the current last record. To run
 336         <literal>changelog_clear</literal>, the changelog user must be
 337         registered on the MDT node using <literal>lctl</literal>.</para>
 338         <para>When all changelog users are done with records &lt; X, the records
 339         are deleted.</para>
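        <para>The read-then-clear cycle described above can be sketched as a
        small shell script. This is only an illustration under stated
        assumptions: the MDT name and user ID are the example values used in
        this chapter, the script name and variable names are hypothetical,
        and the processing step is a placeholder that just prints the
        records.</para>
<screen>
#!/bin/sh
# Hypothetical changelog consumer; the MDT name and user ID are the
# example values used in this chapter.
MDT=lustre-MDT0000
CL_USER=cl1

# Read the pending records once, so that only records that were
# actually seen are cleared afterwards.
records=$(lfs changelog "$MDT")

# The record number is the first field of the last line.
last=$(printf '%s\n' "$records" | awk '{ n = $1 } END { print n }')

if [ -n "$last" ]; then
    # Placeholder for real processing (archive, mirror, audit, ...).
    printf '%s\n' "$records"

    # Tell the MDT that this user is done with records up to $last.
    lfs changelog_clear "$MDT" "$CL_USER" "$last"
fi
</screen>
        <para>Clearing up to the last record that was read, rather than
        passing 0, avoids discarding records that arrived after the read
        and were never processed.</para>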
 340       </section>
 341       <section remap="h5">
 342         <title>
 343           <literal>lctl changelog_deregister</literal>
 344         </title>
 345         <para>To deregister (unregister) a changelog user, run:</para>
 346 <screen>
 347 mds# lctl --device <replaceable>mdt_device</replaceable> changelog_deregister <replaceable>userid</replaceable>
 348 </screen>
 349         <para><literal>changelog_deregister cl1</literal> effectively performs an
 350         <literal>lfs changelog_clear cl1 0</literal> as it deregisters.</para>
 351       </section>
 352     </section>
 353     <section remap="h3">
 354       <title>Changelog Examples</title>
 355       <para>This section provides examples of different changelog
 356       commands.</para>
 357       <section remap="h5">
 358         <title>Registering a Changelog User</title>
 359         <para>To register a new changelog user for a device
 360         (<literal>lustre-MDT0000</literal>):</para>
 361 <screen>
 362 mds# lctl --device lustre-MDT0000 changelog_register
 363 lustre-MDT0000: Registered changelog userid &apos;cl1&apos;
 364 </screen>
 365       </section>
 366       <section remap="h5">
 367         <title>Displaying Changelog Records</title>
 368         <para>To display changelog records for an MDT
 369         (e.g. <literal>lustre-MDT0000</literal>):</para>
 370 <screen>
 371 client# lfs changelog lustre-MDT0000
 372 1 02MKDIR 15:15:21.977666834 2018.01.09 0x0 t=[0x200000402:0x1:0x0] ef=0xf \
 373 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
 374 2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] ef=0xf \
 375 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
 376 3 06UNLNK 15:15:41.305116815 2018.01.09 0x1 t=[0x200000402:0x2:0x0] ef=0xf \
 377 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
 378 4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] ef=0xf \
 379 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
 380 </screen>
 381         <para>Changelog records include this information:</para>
 382 <screen>
 383 rec# operation_type(numerical/text) timestamp datestamp flags
 384 t=target_FID ef=extended_flags u=uid:gid nid=client_NID p=parent_FID target_name
 385 </screen>
 386         <para>Displayed in this format:</para>
 387 <screen>
 388 rec# operation_type(numerical/text) timestamp datestamp flags t=target_FID \
 389 ef=extended_flags u=uid:gid nid=client_NID p=parent_FID target_name
 390 </screen>
 391         <para>For example:</para>
 392 <screen>
 393 2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] ef=0xf \
 394 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
 395 </screen>
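        <para>Because the fields are separated by whitespace, records can be
        summarized with standard text tools. The following is a minimal
        sketch, not part of Lustre itself: it prints the record number, the
        textual operation type (the type field with its two-digit numeric
        prefix removed) and the last field, which is the target name in the
        sample records above. It assumes one record per line and target
        names without spaces.</para>
<screen>
client# lfs changelog lustre-MDT0000 | awk '{ print $1, substr($2, 3), $NF }'
</screen>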
 396       </section>
 397       <section remap="h5">
 398         <title>Clearing Changelog Records</title>
 399         <para>To notify a device that a specific user (<literal>cl1</literal>)
 400         no longer needs records (up to and including 3):</para>
 401 <screen>
 402 # lfs changelog_clear  lustre-MDT0000 cl1 3
 403 </screen>
 404         <para>To confirm that the <literal>changelog_clear</literal> operation
 405         was successful, run <literal>lfs changelog</literal>; only records after
 406         id-3 are listed:</para>
 407 <screen>
 408 # lfs changelog lustre-MDT0000
 409 4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] ef=0xf \
 410 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
 411 </screen>
 412       </section>
 413       <section remap="h5">
 414         <title>Deregistering a Changelog User</title>
 415         <para>To deregister a changelog user (<literal>cl1</literal>) for a
 416         specific device (<literal>lustre-MDT0000</literal>):</para>
 417 <screen>
 418 mds# lctl --device lustre-MDT0000 changelog_deregister cl1
 419 lustre-MDT0000: Deregistered changelog user &apos;cl1&apos;
 420 </screen>
 421         <para>The deregistration operation clears all changelog records for the
 422         specified user (<literal>cl1</literal>).</para>
 423 <screen>
 424 client# lfs changelog lustre-MDT0000
 425 5 00MARK  15:56:39.603643887 2018.01.09 0x0 t=[0x20001:0x0:0x0] ef=0xf \
 426 u=500:500 nid=0@&lt;0:0&gt; p=[0:0x50:0xb] mdd_obd-lustre-MDT0000-0
 427 </screen>
 428         <note>
 429           <para>MARK records typically indicate changelog recording status
 430           changes.</para>
 431         </note>
 432       </section>
 433       <section remap="h5">
 434         <title>Displaying the Changelog Index and Registered Users</title>
 435         <para>To display the current, maximum changelog index and registered
 436         changelog users for a specific device
 437         (<literal>lustre-MDT0000</literal>):</para>
 438 <screen>
 439 mds# lctl get_param  mdd.lustre-MDT0000.changelog_users
 440 mdd.lustre-MDT0000.changelog_users=current index: 8
 441 ID    index (idle seconds)
 442 cl2   8 (180)
 443 </screen>
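        <para>Since changelog entries are not purged beyond a registered
        user&apos;s set point, a user that has stopped consuming records
        keeps them on the MDT. As a rough sketch, assuming the output layout
        shown above (ID, index, idle seconds in parentheses), the idle users
        can be listed as follows. The 86400-second threshold is an arbitrary
        value chosen for illustration.</para>
<screen>
mds# lctl get_param mdd.lustre-MDT0000.changelog_users | \
    awk '$1 ~ /^cl[0-9]+$/ {
        idle = $3
        gsub(/[()]/, "", idle)        # strip the parentheses around the value
        if (idle + 0 >= 86400) print $1, $2, idle
    }'
</screen>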
 444       </section>
 445       <section remap="h5">
 446         <title>Displaying the Changelog Mask</title>
 447         <para>To show the current changelog mask on a specific device
 448         (<literal>lustre-MDT0000</literal>):</para>
 449 <screen>
 450 mds# lctl get_param  mdd.lustre-MDT0000.changelog_mask
 451
 452 mdd.lustre-MDT0000.changelog_mask=
 453 MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT \
 454 TRUNC SATTR XATTR HSM MTIME CTIME MIGRT
 455 </screen>
 456       </section>
 457       <section xml:id="modifyChangelogMask" remap="h5">
 458         <title>Setting the Changelog Mask</title>
 459         <para>To set the current changelog mask on a specific device
 460         (<literal>lustre-MDT0000</literal>):</para>
 461 <screen>
 462 mds# lctl set_param mdd.lustre-MDT0000.changelog_mask=HLINK
 463 mdd.lustre-MDT0000.changelog_mask=HLINK
 464 $ lfs changelog_clear lustre-MDT0000 cl1 0
 465 $ mkdir /mnt/lustre/mydir/foo
 466 $ cp /etc/hosts /mnt/lustre/mydir/foo/file
 467 $ ln /mnt/lustre/mydir/foo/file /mnt/lustre/mydir/myhardlink
 468 </screen>
 469         <para>Only item types that are in the mask show up in the
 470         changelog.</para>
 471 <screen>
 472 # lfs changelog lustre-MDT0000
 473 9 03HLINK 16:06:35.291636498 2018.01.09 0x0 t=[0x200000402:0x4:0x0] ef=0xf \
 474 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x3:0x0] myhardlink
 475 </screen>
 476         <para></para>
 477       </section>
 478     </section>
 479     <section remap="h3" condition='l2B'>
 480       <title><indexterm><primary>audit</primary>
 481       <secondary>change logs</secondary></indexterm>
 482 Audit with Changelogs</title>
 483       <para>A specific use case for Lustre Changelogs is audit. According to a
 484       definition found on <link xmlns:xlink="http://www.w3.org/1999/xlink"
 485       xlink:href="https://en.wikipedia.org/wiki/Information_technology_audit">
 486       Wikipedia</link>, information technology audits are used to evaluate an
 487       organization's ability to protect its information assets and to properly
 488       dispense information to authorized parties. In practice, an audit
 489       verifies that all data accesses complied with the access control policy
 490       in place. This is usually done by analyzing access
 491       logs.</para>
 492       <para>An audit can serve as proof that security measures are in place. It
 493       can also be required to comply with regulations.</para>
 494       <para>Lustre Changelogs are a good mechanism for audit because they are a
 495       centralized facility designed to be transactional. Changelog
 496       records contain all information necessary for auditing purposes:</para>
 497       <itemizedlist>
 498         <listitem>
 499           <para>the object of an action can be identified from the file
 500           identifiers (FIDs) and the name of the target</para>
 501         </listitem>
 502         <listitem>
 503           <para>the subject of an action can be identified from the UID/GID and
 504           NID information</para>
 505         </listitem>
 506         <listitem>
 507           <para>the time of an action can be identified from the timestamp</para>
 508         </listitem>
 509       </itemizedlist>
 510       <section remap="h5">
 511         <title>Enabling Audit</title>
 512         <para>To have a fully functional Changelogs-based audit facility, some
 513         additional Changelog record types must be enabled in order to record
 514         events such as OPEN, ATIME, GETXATTR and DENIED OPEN. Note that
 515         enabling these record types may have some performance impact. For
 516         instance, recording OPEN and GETXATTR events generates writes to the
 517         Changelog for what is, from a file system standpoint, a read
 518         operation.</para>
 519         <para>Being able to record events such as OPEN or DENIED OPEN is
 520         important from an audit perspective. For instance, if a Lustre file system
 521         is used to store medical records on a system dedicated to Life Sciences,
 522         data privacy is crucial. Administrators may need to know which doctors
 523         accessed, or tried to access, a given medical record and when.
 524         Conversely, they might need to know which medical records a given doctor
 525         accessed.</para>
 526         <para>To enable all changelog entry types, run:</para>
 527 <screen>
 528 mds# lctl set_param mdd.lustre-MDT0000.changelog_mask=ALL
 529 mdd.lustre-MDT0000.changelog_mask=ALL
 530 </screen>
 531         <para>Once all required record types have been enabled, register a
 532         Changelogs user and the audit facility is operational.</para>
 533         <para>It is possible to control which Lustre client
 534         nodes can trigger the recording of file system access events to the
 535         Changelogs, using the <literal>audit_mode</literal> flag on nodemap
 536         entries. The reason to disable audit on a per-nodemap basis is to
 537         prevent some nodes (e.g. backup, HSM agent nodes) from flooding the
 538         audit logs. When the <literal>audit_mode</literal> flag is
 539         set to 1 on a nodemap entry, a client belonging to this nodemap can
 540         record file system access events to the Changelogs, if
 541         Changelogs are otherwise activated. When set to 0, events are not logged
 542         to the Changelogs, whether or not Changelogs are activated. By
 543         default, the <literal>audit_mode</literal> flag is set to 1 in newly created
 544         nodemap entries, and also in the 'default' nodemap.</para>
 545         <para>To prevent nodes belonging to a nodemap from generating Changelog
 546         entries, run:</para>
 547 <screen>
 548 mgs# lctl nodemap_modify --name nm1 --property audit_mode --value 0
 549 </screen>
 550       </section>
 551       <section remap="h5">
 552         <title>Audit examples</title>
 553         <section remap="h5">
 554           <title>
 555             <literal>OPEN</literal>
 556           </title>
 557           <para>An OPEN changelog entry is in the form:</para>
 558 <screen>
 559 7 10OPEN  13:38:51.510728296 2017.07.25 0x242 t=[0x200000401:0x2:0x0] \
 560 ef=0x7 u=500:500 nid=10.128.11.159@tcp m=-w-
 561 </screen>
 562           <para>It includes information about the open mode, in the form
 563           <literal>m=rwx</literal>.</para>
 564           <para>OPEN entries are recorded only once per UID/GID, for a given
 565           open mode, as long as the file is not closed by this UID/GID. This
 566           avoids flooding the Changelogs, for instance when an MPI job
 567           opens the same file thousands of times from different threads. It
 568           reduces the Changelog load significantly, without significantly
 569           affecting the audit information. Similarly, only the last CLOSE per
 570           UID/GID is recorded.</para>
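          <para>As an illustration of the audit question of who opened, or
          tried to open, a given file, the following sketch filters the
          changelog for OPEN and NOPEN records whose target FID matches. It
          is not a Lustre tool: the FID is the one from the sample record
          above, and the filter assumes one record per line with the first
          five fields laid out as in that sample. The remaining fields are
          matched by prefix rather than position, because some records
          carry an additional <literal>j=</literal> field.</para>
<screen>
client# lfs changelog lustre-MDT0000 | \
    awk -v t='t=[0x200000401:0x2:0x0]' '$2 ~ /OPEN$/ {
        hit = 0; who = ""
        # Fields 1-5 are rec#, type, time, date and flags; scan the rest.
        for (i = NF; i > 5; i--) {
            if ($i == t) hit = 1
            if ($i ~ /^(u|nid|m)=/) who = $i " " who
        }
        # Print date, time, then the u=, nid= and m= fields.
        if (hit) print $4, $3, who
    }'
</screen>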
 571         </section>
 572         <section remap="h5">
 573           <title>
 574             <literal>GETXATTR</literal>
 575           </title>
 576           <para>A GETXATTR changelog entry is in the form:</para>
 577 <screen>
 578 8 23GXATR 09:22:55.886793012 2017.07.27 0x0 t=[0x200000402:0x1:0x0] \
 579 ef=0xf u=500:500 nid=10.128.11.159@tcp x=user.name0
 580 </screen>
 581           <para>It includes information about the name of the extended attribute
 582           being accessed, in the form <literal>x=&lt;xattr name&gt;</literal>.
 583           </para>
 584         </section>
 585         <section remap="h5">
 586           <title>
 587             <literal>SETXATTR</literal>
 588           </title>
 589           <para>A SETXATTR changelog entry is in the form:</para>
 590 <screen>
 591 4 15XATTR 09:41:36.157333594 2018.01.10 0x0 t=[0x200000402:0x1:0x0] \
 592 ef=0xf u=500:500 nid=10.128.11.159@tcp x=user.name0
 593 </screen>
 594           <para>It includes information about the name of the extended attribute
 595           being modified, in the form <literal>x=&lt;xattr name&gt;</literal>.
 596           </para>
 597         </section>
 598         <section remap="h5">
 599           <title>
 600             <literal>DENIED OPEN</literal>
 601           </title>
 602           <para>A DENIED OPEN changelog entry is in the form:</para>
 603 <screen>
 604 4 24NOPEN 15:45:44.947406626 2017.08.31 0x2 t=[0x200000402:0x1:0x0] \
 605 ef=0xf u=500:500 nid=10.128.11.158@tcp m=-w-
 606 </screen>
 607           <para>It has the same information as a regular OPEN entry. In order to
 608           avoid flooding the Changelogs, DENIED OPEN entries are rate limited:
 609           no more than one entry per user per file per time interval, this time
 610           interval (in seconds) being configurable via
 611           <literal>mdd.&lt;mdtname&gt;.changelog_deniednext</literal>
 612           (default value is 60 seconds).</para>
 613 <screen>
 614 mds# lctl set_param mdd.lustre-MDT0000.changelog_deniednext=120
 615 mdd.lustre-MDT0000.changelog_deniednext=120
 616 mds# lctl get_param mdd.lustre-MDT0000.changelog_deniednext
 617 mdd.lustre-MDT0000.changelog_deniednext=120
 618 </screen>
 619         </section>
 620       </section>
 621     </section>
 622   </section>
 623   <section xml:id="jobstats">
 624       <title><indexterm><primary>jobstats</primary><see>monitoring</see></indexterm>
 625 <indexterm><primary>monitoring</primary></indexterm>
 626 <indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
 627
 628 Lustre Jobstats</title>
 629     <para>The Lustre jobstats feature collects file system operation statistics
 630       for user processes running on Lustre clients, and exposes them on the
 631       server using the unique Job Identifier (JobID) provided by the job scheduler for
 632       each job. Job schedulers known to be able to work with jobstats include:
 633       SLURM, SGE, LSF, LoadLeveler, PBS and Maui/MOAB.</para>
 634     <para>Since jobstats is implemented in a scheduler-agnostic manner, it is
 635     likely to work with other schedulers as well, and also
 636     in environments that do not use a job scheduler, by storing custom format
 637     strings in the <literal>jobid_name</literal> parameter.</para>
 638     <section remap="h3">
 639       <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
 640       How Jobstats Works</title>
 641       <para>The Lustre jobstats code on the client extracts the unique JobID
 642       from an environment variable within the user process, and sends this
 643       JobID to the server with all RPCs.  This allows the server to track
 644       statistics for operations specific to each application/command running
 645       on the client, and can be useful to identify the source of high I/O load.
 646       </para>
 647
 648       <para>A Lustre setting on the client, <literal>jobid_var</literal>,
 649       specifies an environment variable or other client-local source that
 650       holds a (relatively) unique JobID for the running application.
 651       Any environment variable can be specified.  For example, SLURM sets the
 652       <literal>SLURM_JOB_ID</literal> environment variable with the unique
 653       JobID for all clients running a particular job launched on one or
 654       more nodes, and
 655       the <literal>SLURM_JOB_ID</literal> will be inherited by all child
 656       processes started below that process.</para>
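      <para>As a minimal sketch, assuming a client whose jobs are launched
      by SLURM, the setting could be applied and checked on that client as
      follows. The change made this way applies only to the client where
      the command is run.</para>
<screen>
client# lctl set_param jobid_var=SLURM_JOB_ID
client# lctl get_param jobid_var
</screen>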
 657
 658       <para>There are several reserved values for <literal>jobid_var</literal>:
 659       <itemizedlist>
 660         <listitem>
 661           <para><literal>disable</literal> - disables sending a JobID from
 662             this client</para>
 663         </listitem>
 664         <listitem>
 665           <para><literal>procname_uid</literal> - uses the process name and UID,
 666             equivalent to setting <literal>jobid_name=%e.%u</literal></para>
 667         </listitem>
 668         <listitem>
 669           <para><literal>nodelocal</literal> - uses only the JobID format from
 670             <literal>jobid_name</literal></para>
 671         </listitem>
 672         <listitem>
 673           <para><literal>session</literal> - extracts the JobID from
 674             <literal>jobid_this_session</literal></para>
 675         </listitem>
 676       </itemizedlist>
 677       </para>
 678
 679       <para>Lustre can also be configured to generate a synthetic JobID from
 680       the client's process name and numeric UID, by setting
 681       <literal>jobid_var=procname_uid</literal>.  This will generate a
 682       uniform JobID when running the same binary across multiple client
 683       nodes, but cannot distinguish whether the binary is part of a single
 684       distributed process or multiple independent processes.  This can be
 685       useful on login nodes where interactive commands are run.
 686       </para>
 687
 688       <para condition="l28">In Lustre 2.8 and later it is possible to set
 689       <literal>jobid_var=nodelocal</literal> and then also set
 690       <literal>jobid_name=</literal><replaceable>name</replaceable>, which
 691       <emphasis>all</emphasis> processes on that client node will use.  This
 692       is useful if only a single job is run on a client at one time, but if
 693       multiple jobs are run on a client concurrently, the
 694       <literal>session</literal> JobID should be used.
 695       </para>
 696
 697       <para condition="l2C">In Lustre 2.12 and later, it is possible to
 698       specify more complex JobID values for <literal>jobid_name</literal>
 699       by using a string that contains format codes that are evaluated for
 700       each process, in order to generate a site- or node-specific JobID string.
 701       </para>
 702       <itemizedlist>
 703         <listitem>
 704           <para><emphasis>%e</emphasis> print executable name</para>
 705         </listitem>
 706         <listitem>
 707           <para><emphasis>%g</emphasis> print group ID number</para>
 708         </listitem>
 709         <listitem>
 710           <para><emphasis>%h</emphasis> print fully-qualified hostname</para>
 711         </listitem>
 712         <listitem>
 713           <para><emphasis>%H</emphasis> print short hostname</para>
 714         </listitem>
 715         <listitem>
 716           <para><emphasis>%j</emphasis> print JobID from the source named
 717             by the <emphasis>jobid_var</emphasis> parameter
 718           </para>
 719         </listitem>
 720         <listitem>
 721           <para><emphasis>%p</emphasis> print numeric process ID</para>
 722         </listitem>
 723         <listitem>
 724           <para><emphasis>%u</emphasis> print user ID number</para>
 725         </listitem>
 726       </itemizedlist>
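      <para condition="l2C">As a minimal sketch of how the format codes above
      can be combined (the second format string is only an illustration, not
      a recommended setting), the first command below reproduces the
      <literal>procname_uid</literal> JobID by way of
      <literal>jobid_name</literal>, and the second appends the short
      hostname to the JobID taken from a SLURM-managed environment:</para>
<screen>
client# lctl set_param jobid_var=nodelocal jobid_name=%e.%u
client# lctl set_param jobid_var=SLURM_JOB_ID jobid_name=%j.%H
</screen>
      <para condition="l2C">With the first setting, a <literal>bash</literal>
      process run by root would be expected to report the JobID
      <literal>bash.0</literal>.</para>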
 727
 728       <para condition="l2D">In Lustre 2.13 and later, it is possible to
 729       set a per-session JobID via the <literal>jobid_this_session</literal>
 730       parameter <emphasis>instead</emphasis> of getting the JobID from an
 731       environment variable.  This session ID will be
 732       inherited by all processes that are started in this login session,
 733       though there can be a different JobID for each login session.  This
 734       is enabled by setting <literal>jobid_var=session</literal> instead
 735       of setting it to an environment variable.  The session ID will be
 736       substituted for <literal>%j</literal> in <literal>jobid_name</literal>.
 737       </para>
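      <para condition="l2D">For illustration, assuming a login session that
      should be accounted under the hypothetical JobID
      <literal>interactive-1234</literal>, the session JobID could be set
      from within that session as sketched below.  The JobID string is a
      placeholder, and the exact behavior should be checked against the
      installed Lustre version:</para>
<screen>
client# lctl set_param jobid_var=session jobid_name=%j
client# lctl set_param jobid_this_session=interactive-1234
</screen>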
 738
 739       <para>The setting of <literal>jobid_var</literal> need not be the same
 740       on all clients.  For example, one could use
 741       <literal>SLURM_JOB_ID</literal> on all clients managed by SLURM, and
 742       use <literal>procname_uid</literal> on clients not managed by SLURM,
 743       such as interactive login nodes.</para>
 744
 745       <para>Only one <literal>jobid_var</literal> setting can be active
 746       on a node at a time.  This is rarely a limitation, since it is
 747       unlikely that multiple job schedulers are active on one client.
 748       However, the actual JobID value is local to each process environment,
 749       so multiple jobs with different JobIDs can be
 750       active on a single client at one time.</para>
 751     </section>
 752
 753     <section remap="h3">
 754       <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
 755 Enable/Disable Jobstats</title>
 756       <para>Jobstats are disabled by default.  The current state of jobstats
 757       can be verified by checking <literal>lctl get_param jobid_var</literal>
 758       on a client:</para>
 759 <screen>
 760 client# lctl get_param jobid_var
 761 jobid_var=disable
 762 </screen>
 763       <para>
 764       To enable jobstats on all clients for SLURM:</para>
 765 <screen>
 766 mgs# lctl set_param -P jobid_var=SLURM_JOB_ID
 767 </screen>
 768       <para>The <literal>lctl set_param</literal> command to enable or disable
 769       jobstats should be run on the MGS as root. The change is persistent, and
 770       will be propagated to the MDS, OSS, and client nodes automatically when
 771       it is set on the MGS and for each new client mount.</para>
 772       <para>To temporarily enable jobstats on a client, or to use a different
 773       <literal>jobid_var</literal> on a subset of nodes, such as nodes in a remote cluster that
 774       use a different job scheduler, or interactive login nodes that do not
 775       use a job scheduler at all, run the <literal>lctl set_param</literal>
 776       command directly on the client node(s) after the filesystem is mounted.
 777       For example, to enable the <literal>procname_uid</literal> synthetic
 778       JobID locally on a login node run:
 779 <screen>
 780 client# lctl set_param jobid_var=procname_uid
 781 </screen>
 782       The <literal>lctl set_param</literal> setting is not persistent, and will
 783       be reset if the global <literal>jobid_var</literal> is set on the MGS or
 784       if the filesystem is unmounted.</para>
 785       <para>The following table shows the environment variables which are set
 786       by various job schedulers.  Set <literal>jobid_var</literal> to the value
 787       for your job scheduler to collect statistics on a per-job basis.</para>
 788     <informaltable frame="all">
 789       <tgroup cols="2">
 790         <colspec colname="c1" colwidth="50*"/>
 791         <colspec colname="c2" colwidth="50*"/>
 792         <thead>
 793           <row>
 794             <entry>
 795               <para><emphasis role="bold">Job Scheduler</emphasis></para>
 796             </entry>
 797             <entry>
 798               <para><emphasis role="bold">Environment Variable</emphasis></para>
 799             </entry>
 800           </row>
 801         </thead>
 802         <tbody>
 803           <row>
 804             <entry>
 805               <para>Simple Linux Utility for Resource Management (SLURM)</para>
 806             </entry>
 807             <entry>
 808               <para>SLURM_JOB_ID</para>
 809             </entry>
 810           </row>
 811           <row>
 812             <entry>
 813               <para>Sun Grid Engine (SGE)</para>
 814             </entry>
 815             <entry>
 816               <para>JOB_ID</para>
 817             </entry>
 818           </row>
 819           <row>
 820             <entry>
 821               <para>Load Sharing Facility (LSF)</para>
 822             </entry>
 823             <entry>
 824               <para>LSB_JOBID</para>
 825             </entry>
 826           </row>
 827           <row>
 828             <entry>
 829               <para>LoadLeveler</para>
 830             </entry>
 831             <entry>
 832               <para>LOADL_STEP_ID</para>
 833             </entry>
 834           </row>
 835           <row>
 836             <entry>
 837               <para>Portable Batch System (PBS)/MAUI</para>
 838             </entry>
 839             <entry>
 840               <para>PBS_JOBID</para>
 841             </entry>
 842           </row>
 843           <row>
 844             <entry>
 845               <para>Cray Application Level Placement Scheduler (ALPS)</para>
 846             </entry>
 847             <entry>
 848               <para>ALPS_APP_ID</para>
 849             </entry>
 850           </row>
 851         </tbody>
 852       </tgroup>
 853     </informaltable>
    <para>To disable jobstats, specify <literal>jobid_var</literal> as
    <literal>disable</literal>:</para>
 854 <screen>
 855 mgs# lctl set_param -P jobid_var=disable
 856 </screen>
 857     <para>To track job stats per process name and user ID (for debugging, or
 858     if no job scheduler is in use on some nodes such as login nodes), specify
 859     <literal>jobid_var</literal> as <literal>procname_uid</literal>:</para>
 860 <screen>
 861 client# lctl set_param jobid_var=procname_uid
 862 </screen>
 863     </section>
 864     <section remap="h3">
 865       <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
 866 Check Job Stats</title>
 867     <para>Metadata operation statistics are collected on MDTs. These statistics
 868       can be accessed for all file systems and all jobs on the MDT via
 869       <literal>lctl get_param mdt.*.job_stats</literal>. For example, with clients
 870       running with <literal>jobid_var=procname_uid</literal>:
 871     </para>
 872 <screen>
 873 mds# lctl get_param mdt.*.job_stats
 874 job_stats:
 875 - job_id:          bash.0
 876   snapshot_time:   1352084992
 877   open:            { samples:     2, unit:  reqs }
 878   close:           { samples:     2, unit:  reqs }
 879   getattr:         { samples:     3, unit:  reqs }
 880 - job_id:          mythbackend.0
 881   snapshot_time:   1352084996
 882   open:            { samples:    72, unit:  reqs }
 883   close:           { samples:    73, unit:  reqs }
 884   unlink:          { samples:    22, unit:  reqs }
 885   getattr:         { samples:   778, unit:  reqs }
 886   setattr:         { samples:    22, unit:  reqs }
 887   statfs:          { samples: 19840, unit:  reqs }
 888   sync:            { samples: 33190, unit:  reqs }
 889 </screen>
 890     <para>Data operation statistics are collected on OSTs. These
 891     statistics can be accessed via
 892     <literal>lctl get_param obdfilter.*.job_stats</literal>, for example:</para>
 893 <screen>
 894 oss# lctl get_param obdfilter.*.job_stats
 895 obdfilter.myth-OST0000.job_stats=
 896 job_stats:
 897 - job_id:          mythcommflag.0
 898   snapshot_time:   1429714922
 899   read:    { samples: 974, unit: bytes, min: 4096, max: 1048576, sum: 91530035 }
 900   write:   { samples:   0, unit: bytes, min:    0, max:       0, sum:        0 }
 901 obdfilter.myth-OST0001.job_stats=
 902 job_stats:
 903 - job_id:          mythbackend.0
 904   snapshot_time:   1429715270
 905   read:    { samples:   0, unit: bytes, min:     0, max:      0, sum:        0 }
 906   write:   { samples:   1, unit: bytes, min: 96899, max:  96899, sum:    96899 }
 907   punch:   { samples:   1, unit:  reqs }
 908 obdfilter.myth-OST0002.job_stats=job_stats:
 909 obdfilter.myth-OST0003.job_stats=job_stats:
 910 obdfilter.myth-OST0004.job_stats=
 911 job_stats:
 912 - job_id:          mythfrontend.500
 913   snapshot_time:   1429692083
 914   read:    { samples:   9, unit: bytes, min: 16384, max: 1048576, sum: 4444160 }
 915   write:   { samples:   0, unit: bytes, min:     0, max:       0, sum:       0 }
 916 - job_id:          mythbackend.500
 917   snapshot_time:   1429692129
 918   read:    { samples:   0, unit: bytes, min:     0, max:       0, sum:       0 }
 919   write:   { samples:   1, unit: bytes, min: 56231, max:   56231, sum:   56231 }
 920   punch:   { samples:   1, unit:  reqs }
 921 </screen>
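    <para>To inspect a single job rather than the full listing, the output
    can be filtered with standard tools. The following is only a sketch: it
    assumes the <literal>procname_uid</literal> JobID
    <literal>bash.0</literal> from the MDT example above, and the number of
    lines printed after the match (<literal>-A 4</literal>) is an arbitrary
    choice that may need adjusting to the number of operations recorded for
    the job:</para>
<screen>
mds# lctl get_param -n mdt.*.job_stats | grep -A 4 'job_id: *bash\.0'
</screen>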
 922     </section>
 923     <section remap="h3">
 924       <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
 925 Clear Job Stats</title>
 926     <para>Accumulated job statistics can be reset by writing to the
 927       <literal>job_stats</literal> parameter file.</para>
 928     <para>Clear statistics for all jobs on all OSTs of the local OSS:</para>
 929 <screen>
 930 oss# lctl set_param obdfilter.*.job_stats=clear
 931 </screen>
 932     <para>Clear statistics only for job 'bash.0' on lustre-MDT0000:</para>
 933 <screen>
 934 mds# lctl set_param mdt.lustre-MDT0000.job_stats=bash.0
 935 </screen>
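    <para>The same two forms apply to either target type. As a sketch,
    assuming the job names from the examples above, statistics for all jobs
    on the local MDTs, or for a single job on all local OSTs, could be
    cleared with:</para>
<screen>
mds# lctl set_param mdt.*.job_stats=clear
oss# lctl set_param obdfilter.*.job_stats=mythbackend.0
</screen>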
 936     </section>
 937     <section remap="h3">
 938       <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
 939 Configure Auto-cleanup Interval</title>
 940     <para>By default, if a job is inactive for 600 seconds (10 minutes),
 941       statistics for this job will be dropped. This expiration value
 942       can be changed temporarily via:
 943     </para>
 944 <screen>
 945 mds# lctl set_param *.*.job_cleanup_interval={max_age}
 946 </screen>
 947     <para>It can also be changed permanently, for example to 700 seconds via:
 948     </para>
 949 <screen>
 950 mgs# lctl set_param -P mdt.testfs-*.job_cleanup_interval=700
 951 </screen>
 952     <para>The <literal>job_cleanup_interval</literal> can be set
 953       to 0 to disable auto-cleanup. Note that if auto-cleanup of
 954       Jobstats is disabled, then all statistics will be kept in memory
 955       forever, which may eventually consume all memory on the servers.
 956       In this case, any monitoring tool should explicitly clear
 957       individual job statistics as they are processed, as shown above.
 958     </para>
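    <para>As a sketch of the settings described above, the current interval
    can be read back with <literal>lctl get_param</literal>, and auto-cleanup
    can be disabled temporarily by setting the interval to 0. The
    <literal>mdt.*</literal> wildcard is an assumption about which targets
    are of interest:</para>
<screen>
mds# lctl get_param mdt.*.job_cleanup_interval
mds# lctl set_param mdt.*.job_cleanup_interval=0
</screen>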
 959     </section>
 960     <section remap="h3" condition='l2E'>
 961       <title><indexterm><primary>monitoring</primary><secondary>lljobstat</secondary></indexterm>
 962 Identifying Top Jobs</title>
 963       <para>Since Lustre 2.15 the <literal>lljobstat</literal>
 964         utility can be used to monitor and identify the top JobIDs generating
 965         load on a particular server.  This allows the administrator to quickly
 966         see which applications/users/clients (depending on how the JobID is
 967         configured) are generating the most filesystem RPCs and take appropriate
 968         action if needed.
 969       </para>
 970 <screen>
 971 mds# lljobstat -c 10
 972 ---
 973     timestamp: 1665984678
 974     top_jobs:
 975     - ls.500:          {ops: 64, ga: 64}
 976     - touch.500:       {ops: 6, op: 1, cl: 1, mn: 1, ga: 1, sa: 2}
 977     - bash.0:          {ops: 3, ga: 3}
 978     ...
 979 </screen>
 980       <para>It is possible to specify the number of top jobs to monitor as
 981         well as the refresh interval, among other options.</para>
 982     </section>
 983   </section>
 984   <section xml:id="lmt">
 985     <title><indexterm>
 986         <primary>monitoring</primary>
 987         <secondary>Lustre Monitoring Tool</secondary>
 988       </indexterm> Lustre Monitoring Tool (LMT)</title>
 989     <para>The Lustre Monitoring Tool (LMT) is a Python-based, distributed
 990       system that provides a <literal>top</literal>-like display of activity
 991       on server-side nodes (MDS, OSS and portals routers) on one or more
 992       Lustre file systems. It does not provide support for monitoring
 993       clients. For more information on LMT, including the setup procedure,
 994       see:</para>
 995     <para><link xl:href="https://github.com/chaos/lmt/wiki">
 996       https://github.com/chaos/lmt/wiki</link></para>
 997   </section>
 998   <section xml:id="collectl">
 999     <title>
1000       <literal>CollectL</literal>
1001     </title>
1002     <para><literal>CollectL</literal> is another tool that can be used to monitor a Lustre file
1003       system. You can run <literal>CollectL</literal> on a Lustre system that has any combination of
1004       MDSs, OSTs and clients. The collected data can be written to a file for continuous logging and
1005       played back at a later time. It can also be converted to a format suitable for
1006       plotting.</para>
1007     <para>For more information about <literal>CollectL</literal>, see:</para>
1008     <para><link xl:href="http://collectl.sourceforge.net">
1009     http://collectl.sourceforge.net</link></para>
1010     <para>Lustre-specific documentation is also available. See:</para>
1011     <para><link xl:href="http://collectl.sourceforge.net/Tutorial-Lustre.html">
1012       http://collectl.sourceforge.net/Tutorial-Lustre.html</link></para>
1013   </section>
1014   <section xml:id="other_monitoring_options">
1015     <title><indexterm><primary>monitoring</primary><secondary>additional tools</secondary></indexterm>
1016 Other Monitoring Options</title>
1017     <para>A variety of other tools are publicly available, including the following:<itemizedlist>
1018         <listitem>
1019           <para><literal>lltop</literal> - Lustre load monitor with batch scheduler integration.
1020               <link xmlns:xlink="http://www.w3.org/1999/xlink"
1021               xlink:href="https://github.com/jhammond/lltop"
1022               >https://github.com/jhammond/lltop</link></para>
1023         </listitem>
1024         <listitem>
1025           <para><literal>tacc_stats</literal> - A job-oriented system monitoring, analysis, and
1026             visualization tool that probes Lustre interfaces and collects statistics. <link
1027               xmlns:xlink="http://www.w3.org/1999/xlink"
1028               xlink:href="https://github.com/jhammond/tacc_stats"/></para>
1029         </listitem>
1030         <listitem>
1031           <para><literal>xltop</literal> - A continuous Lustre monitor with batch scheduler
1032             integration. <link xmlns:xlink="http://www.w3.org/1999/xlink"
1033               xlink:href="https://github.com/jhammond/xltop"/></para>
1034         </listitem>
1035       </itemizedlist></para>
1036     <para>Another option is to script a simple monitoring solution that looks at various reports
1037       from <literal>ifconfig</literal>, as well as the <literal>procfs</literal> files generated by
1038       the Lustre software.</para>
1039   </section>
1040 </chapter>
1041 <!--
1042   vim:expandtab:shiftwidth=2:tabstop=8:
1043   -->