1 <?xml version='1.0' encoding='UTF-8'?>
2 <chapter xmlns="http://docbook.org/ns/docbook"
3 xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
4 xml:id="lustremonitoring">
5 <title xml:id="lustremonitoring.title">Monitoring a Lustre File System</title>
6 <para>This chapter provides information on monitoring a Lustre file system and includes the
7 following sections:</para>
10 <para><xref linkend="lustre_changelogs"/></para>
13 <para><xref linkend="jobstats"/></para>
16 <para><xref linkend="lmt"/></para>
19 <para><xref linkend="collectl"/></para>
22 <para><xref linkend="other_monitoring_options"/></para>
25 <section xml:id="lustre_changelogs">
26 <title><indexterm><primary>change logs</primary><see>monitoring</see></indexterm>
27 <indexterm><primary>monitoring</primary></indexterm>
28 <indexterm><primary>monitoring</primary><secondary>change logs</secondary></indexterm>
30 Lustre Changelogs</title>
31 <para>The changelogs feature records events that change the file system
32 namespace or file metadata. Changes such as file creation, deletion,
33 renaming, attribute changes, etc. are recorded with the target and parent
34 file identifiers (FIDs), the name of the target, a timestamp, and user
35 information. These records can be used for a variety of purposes:</para>
38 <para>Capture recent changes to feed into an archiving system.</para>
41 <para>Use changelog entries to exactly replicate changes in a file
system mirror.</para>
45 <para>Set up "watch scripts" that take action on certain
46 events or directories.</para>
49 <para>Audit activity on Lustre, using the user information and timestamps
50 associated with file and directory changes.</para>
53 <para>Changelog record types are:</para>
54 <informaltable frame="all">
56 <colspec colname="c1" colwidth="50*"/>
57 <colspec colname="c2" colwidth="50*"/>
61 <para><emphasis role="bold">Value</emphasis></para>
64 <para><emphasis role="bold">Description</emphasis></para>
74 <para> Internal recordkeeping</para>
82 <para> Regular file creation</para>
90 <para> Directory creation</para>
98 <para> Hard link</para>
106 <para> Soft link</para>
114 <para> Other file creation</para>
122 <para> Regular file removal</para>
130 <para> Directory removal</para>
138 <para> Rename, original</para>
146 <para> Rename, final</para>
170 <para> Layout change</para>
178 <para> Regular file truncated</para>
186 <para> Attribute change</para>
194 <para> Extended attribute change (setxattr)</para>
202 <para> HSM specific event</para>
210 <para> MTIME change</para>
218 <para> CTIME change</para>
223 <para> ATIME *</para>
226 <para> ATIME change</para>
234 <para> Migration event</para>
242 <para> File Level Replication: file initially written</para>
250 <para> File Level Replication: file re-synced</para>
255 <para> GXATR *</para>
258 <para> Extended attribute access (getxattr)</para>
263 <para> NOPEN *</para>
266 <para> Denied open</para>
272 <note><para>Event types marked with * are not recorded by default. Refer to
273 <xref linkend="modifyChangelogMask" /> for instructions on
274 modifying the Changelogs mask.</para></note>
275 <para>FID-to-full-pathname and pathname-to-FID functions are also included
276 to map target and parent FIDs into the file system namespace.</para>
278 <title><indexterm><primary>monitoring</primary><secondary>change logs
279 </secondary></indexterm>
280 Working with Changelogs</title>
281 <para>Several commands are available to work with changelogs.</para>
284 <literal>lctl changelog_register</literal>
286 <para>Because changelog records take up space on the MDT, the system
287 administrator must register changelog users. As soon as a changelog
288 user is registered, the Changelogs feature is enabled. The registrants
289 specify which records they are "done with", and the system
290 purges up to the greatest common record.</para>
291 <para>To register a new changelog user, run:</para>
293 mds# lctl --device <replaceable>fsname</replaceable>-<replaceable>MDTnumber</replaceable> changelog_register
295 <para>Changelog entries are not purged beyond a registered user's
296 set point (see <literal>lfs changelog_clear</literal>).</para>
300 <literal>lfs changelog</literal>
302 <para>To display the metadata changes on an MDT (the changelog records), run:</para>
305 client# lfs changelog <replaceable>fsname</replaceable>-<replaceable>MDTnumber</replaceable> [startrec [endrec]]
307 <para>Specifying the start and end records is optional.</para>
309 <para>These are sample changelog records:</para>
311 1 02MKDIR 15:15:21.977666834 2018.01.09 0x0 t=[0x200000402:0x1:0x0] j=mkdir.500 ef=0xf \
312 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
313 2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] j=cp.500 ef=0xf \
314 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
315 3 06UNLNK 15:15:41.305116815 2018.01.09 0x1 t=[0x200000402:0x2:0x0] j=rm.500 ef=0xf \
316 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
317 4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] j=rmdir.500 ef=0xf \
318 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
323 <literal>lfs changelog_clear</literal>
325 <para>To clear old changelog records for a specific user (records that
326 the user no longer needs), run:</para>
328 client# lfs changelog_clear <replaceable>mdt_name</replaceable> <replaceable>userid</replaceable> <replaceable>endrec</replaceable>
330 <para>The <literal>changelog_clear</literal> command indicates that
331 changelog records previous to <replaceable>endrec</replaceable> are no
332 longer of interest to a particular user
333 <replaceable>userid</replaceable>, potentially allowing the MDT to free
334 up disk space. An <literal><replaceable>endrec</replaceable></literal>
335 value of 0 indicates the current last record. To run
336 <literal>changelog_clear</literal>, the changelog user must be
337 registered on the MDT node using <literal>lctl</literal>.</para>
338 <para>When all changelog users are done with records &lt; X, the records
can be purged.</para>
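<para>The purge rule described above can be sketched as follows; this is an illustration of the "greatest common record" logic with hypothetical names, not Lustre code:</para>

```python
# Sketch of the changelog purge rule: the MDT can free records only up to
# the minimum "cleared through" index across all registered users.
# An endrec of 0 means "the current last record" (i.e. clear everything).
def purge_point(user_endrecs, last_record):
    cleared = [last_record if e == 0 else e for e in user_endrecs.values()]
    return min(cleared)

# Three registered users have cleared through different points:
users = {"cl1": 42, "cl2": 10, "cl3": 0}
print(purge_point(users, last_record=100))  # -> 10: records 1..10 can go
```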
343 <literal>lctl changelog_deregister</literal>
345 <para>To deregister (unregister) a changelog user, run:</para>
347 mds# lctl --device <replaceable>mdt_device</replaceable> changelog_deregister <replaceable>userid</replaceable>
349 <para> <literal>changelog_deregister cl1</literal> effectively does a
350 <literal>lfs changelog_clear cl1 0</literal> as it deregisters.</para>
354 <title>Changelog Examples</title>
355 <para>This section provides examples of different changelog commands.</para>
358 <title>Registering a Changelog User</title>
359 <para>To register a new changelog user for a device
360 (<literal>lustre-MDT0000</literal>):</para>
362 mds# lctl --device lustre-MDT0000 changelog_register
363 lustre-MDT0000: Registered changelog userid 'cl1'
367 <title>Displaying Changelog Records</title>
368 <para>To display changelog records for an MDT
369 (e.g. <literal>lustre-MDT0000</literal>):</para>
371 client# lfs changelog lustre-MDT0000
372 1 02MKDIR 15:15:21.977666834 2018.01.09 0x0 t=[0x200000402:0x1:0x0] ef=0xf \
373 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
374 2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] ef=0xf \
375 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
376 3 06UNLNK 15:15:41.305116815 2018.01.09 0x1 t=[0x200000402:0x2:0x0] ef=0xf \
377 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
378 4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] ef=0xf \
379 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
381 <para>Changelog records include this information:</para>
383 rec# operation_type(numerical/text) timestamp datestamp flags
384 t=target_FID ef=extended_flags u=uid:gid nid=client_NID p=parent_FID target_name
391 <para>For example:</para>
393 2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] ef=0xf \
394 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
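<para>A record in this format can be post-processed with a short script. The following is a minimal, hypothetical sketch (not part of Lustre) that splits a single-line record into named fields; records carrying a JobID field parse the same way.</para>

```python
# Minimal sketch: parse one Lustre changelog record of the form
#   rec# operation_type timestamp datestamp flags t=target_FID
#   ef=extended_flags u=uid:gid nid=client_NID p=parent_FID target_name
# Assumes continuation lines have already been joined into one line.
def parse_changelog_record(line):
    tokens = line.split()
    rec = {
        "index": int(tokens[0]),
        "type": tokens[1][2:],  # strip numeric prefix: "01CREAT" -> "CREAT"
        "time": tokens[2],
        "date": tokens[3],
        "flags": tokens[4],
    }
    for tok in tokens[5:]:
        if "=" in tok:
            key, _, value = tok.partition("=")
            rec[key] = value
        else:
            rec["name"] = tok  # trailing bare token is the target name
    return rec

record = parse_changelog_record(
    "2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] "
    "ef=0xf u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg"
)
print(record["type"], record["name"], record["u"])  # -> CREAT chloe.jpg 500:500
```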
398 <title>Clearing Changelog Records</title>
399 <para>To notify a device that a specific user (<literal>cl1</literal>)
400 no longer needs records (up to and including 3):</para>
402 # lfs changelog_clear lustre-MDT0000 cl1 3
404 <para>To confirm that the <literal>changelog_clear</literal> operation
405 was successful, run <literal>lfs changelog</literal>; only records after
406 id-3 are listed:</para>
408 # lfs changelog lustre-MDT0000
409 4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] ef=0xf \
410 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
414 <title>Deregistering a Changelog User</title>
415 <para>To deregister a changelog user (<literal>cl1</literal>) for a
416 specific device (<literal>lustre-MDT0000</literal>):</para>
418 mds# lctl --device lustre-MDT0000 changelog_deregister cl1
419 lustre-MDT0000: Deregistered changelog user 'cl1'
421 <para>The deregistration operation clears all changelog records for the
422 specified user (<literal>cl1</literal>).</para>
424 client# lfs changelog lustre-MDT0000
425 5 00MARK 15:56:39.603643887 2018.01.09 0x0 t=[0x20001:0x0:0x0] ef=0xf \
426 u=500:500 nid=0@<0:0> p=[0:0x50:0xb] mdd_obd-lustre-MDT0000-0
429 <para>MARK records typically indicate changelog recording status
changes.</para>
434 <title>Displaying the Changelog Index and Registered Users</title>
435 <para>To display the current, maximum changelog index and registered
436 changelog users for a specific device
437 (<literal>lustre-MDT0000</literal>):</para>
439 mds# lctl get_param mdd.lustre-MDT0000.changelog_users
440 mdd.lustre-MDT0000.changelog_users=current index: 8
441 ID index (idle seconds)
446 <title>Displaying the Changelog Mask</title>
447 <para>To show the current changelog mask on a specific device
448 (<literal>lustre-MDT0000</literal>):</para>
450 mds# lctl get_param mdd.lustre-MDT0000.changelog_mask
452 mdd.lustre-MDT0000.changelog_mask=
453 MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT \
454 TRUNC SATTR XATTR HSM MTIME CTIME MIGRT
457 <section xml:id="modifyChangelogMask" remap="h5">
458 <title>Setting the Changelog Mask</title>
459 <para>To set the current changelog mask on a specific device
460 (<literal>lustre-MDT0000</literal>):</para>
462 mds# lctl set_param mdd.lustre-MDT0000.changelog_mask=HLINK
463 mdd.lustre-MDT0000.changelog_mask=HLINK
464 $ lfs changelog_clear lustre-MDT0000 cl1 0
465 $ mkdir /mnt/lustre/mydir/foo
466 $ cp /etc/hosts /mnt/lustre/mydir/foo/file
467 $ ln /mnt/lustre/mydir/foo/file /mnt/lustre/mydir/myhardlink
469 <para>Only item types that are in the mask show up in the
changelog.</para>
472 # lfs changelog lustre-MDT0000
473 9 03HLINK 16:06:35.291636498 2018.01.09 0x0 t=[0x200000402:0x4:0x0] ef=0xf \
474 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x3:0x0] myhardlink
479 <section remap="h3" condition='l2B'>
480 <title><indexterm><primary>audit</primary>
481 <secondary>change logs</secondary></indexterm>
482 Audit with Changelogs</title>
483 <para>A specific use case for Lustre Changelogs is audit. According to a
484 definition found on <link xmlns:xlink="http://www.w3.org/1999/xlink"
485 xlink:href="https://en.wikipedia.org/wiki/Information_technology_audit">
486 Wikipedia</link>, information technology audits are used to evaluate the
487 organization's ability to protect its information assets and to properly
488 dispense information to authorized parties. In essence, auditing verifies
489 that all data accesses conformed to the access control policy in place,
490 usually by analyzing access logs.</para>
492 <para>Audit can serve as proof that security measures are in place, and
493 it may also be required for regulatory compliance.</para>
494 <para>Lustre Changelogs are a good mechanism for audit because they are a
495 centralized facility designed to be transactional. Changelog
496 records contain all information necessary for auditing purposes:</para>
499 <para>the ability to identify the object of an action, via file
500 identifiers (FIDs) and target names</para>
503 <para>the ability to identify the subject of an action, via UID/GID and
NID information</para>
507 <para>the ability to identify the time of an action, via the record
timestamp</para>
511 <title>Enabling Audit</title>
512 <para>To have a fully functional Changelogs-based audit facility, some
513 additional Changelog record types must be enabled, to be able to record
514 events such as OPEN, ATIME, GETXATTR and DENIED OPEN. Please note that
515 enabling these record types may have some performance impact. For
516 instance, recording OPEN and GETXATTR events generates Changelog writes
517 for what is, from the file system standpoint, a read operation.</para>
519 <para>Being able to record events such as OPEN or DENIED OPEN is
520 important from an audit perspective. For instance, if a Lustre file system
521 is used to store medical records on a system dedicated to Life Sciences,
522 data privacy is crucial. Administrators may need to know which doctors
523 accessed, or tried to access, a given medical record and when. And
524 conversely, they might need to know which medical records a given doctor
accessed.</para>
526 <para>To enable all changelog entry types, run:</para>
528 mds# lctl set_param mdd.lustre-MDT0000.changelog_mask=ALL
529 mdd.lustre-MDT0000.changelog_mask=ALL
531 <para>Once all required record types have been enabled, just register a
532 Changelogs user and the audit facility is operational.</para>
533 <para>Note, however, that it is possible to control which Lustre client
534 nodes can trigger the recording of file system access events to the
535 Changelogs, via the <literal>audit_mode</literal> flag on nodemap
536 entries. The reason to disable audit on a per-nodemap basis is to
537 prevent some nodes (e.g. backup, HSM agent nodes) from flooding the
538 audit logs. When the <literal>audit_mode</literal> flag is
539 set to 1 on a nodemap entry, a client pertaining to this nodemap will be
540 able to record file system access events to the Changelogs, if
541 Changelogs are otherwise activated. When set to 0, events are not logged
542 into the Changelogs, regardless of whether Changelogs are activated. By
543 default, the <literal>audit_mode</literal> flag is set to 1 in newly
544 created nodemap entries, as well as in the 'default' nodemap.</para>
545 <para>To prevent nodes in a nodemap from generating Changelog
records, run:</para>
548 mgs# lctl nodemap_modify --name nm1 --property audit_mode --value 0
552 <title>Audit examples</title>
555 <literal>OPEN</literal>
557 <para>An OPEN changelog entry is in the form:</para>
559 7 10OPEN 13:38:51.510728296 2017.07.25 0x242 t=[0x200000401:0x2:0x0] \
560 ef=0x7 u=500:500 nid=10.128.11.159@tcp m=-w-
562 <para>It includes information about the open mode, in the form
<literal>m=rwx</literal>.</para>
564 <para>OPEN entries are recorded only once per UID/GID, for a given
565 open mode, as long as the file is not closed by this UID/GID. This
566 avoids flooding the Changelogs if, for instance, an MPI job opens
567 the same file thousands of times from different threads, and it
568 reduces the Changelog load significantly without materially
569 affecting the audit information. Similarly, only the last CLOSE per
570 UID/GID is recorded.</para>
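<para>The deduplication behavior described above can be sketched as follows; this is an illustrative model with hypothetical names, not the Lustre implementation:</para>

```python
# Sketch of OPEN-entry deduplication: record an OPEN only once per
# (fid, uid, gid, mode) while the file is held open by that uid/gid.
class OpenDedup:
    def __init__(self):
        self.live = set()  # (fid, uid, gid, mode) tuples currently open

    def should_record_open(self, fid, uid, gid, mode):
        key = (fid, uid, gid, mode)
        if key in self.live:
            return False   # already recorded for this uid/gid and mode
        self.live.add(key)
        return True

    def close(self, fid, uid, gid, mode):
        # once the file is closed, a later open is recorded again
        self.live.discard((fid, uid, gid, mode))

d = OpenDedup()
print(d.should_record_open("fid1", 500, 500, "-w-"))  # True: first open
print(d.should_record_open("fid1", 500, 500, "-w-"))  # False: duplicate
```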
574 <literal>GETXATTR</literal>
576 <para>A GETXATTR changelog entry is in the form:</para>
578 8 23GXATR 09:22:55.886793012 2017.07.27 0x0 t=[0x200000402:0x1:0x0] \
579 ef=0xf u=500:500 nid=10.128.11.159@tcp x=user.name0
581 <para>It includes information about the name of the extended attribute
582 being accessed, in the form <literal>x=&lt;xattr name&gt;</literal>.
587 <literal>SETXATTR</literal>
589 <para>A SETXATTR changelog entry is in the form:</para>
591 4 15XATTR 09:41:36.157333594 2018.01.10 0x0 t=[0x200000402:0x1:0x0] \
592 ef=0xf u=500:500 nid=10.128.11.159@tcp x=user.name0
594 <para>It includes information about the name of the extended attribute
595 being modified, in the form <literal>x=&lt;xattr name&gt;</literal>.
600 <literal>DENIED OPEN</literal>
602 <para>A DENIED OPEN changelog entry is in the form:</para>
604 4 24NOPEN 15:45:44.947406626 2017.08.31 0x2 t=[0x200000402:0x1:0x0] \
605 ef=0xf u=500:500 nid=10.128.11.158@tcp m=-w-
607 <para>It has the same information as a regular OPEN entry. In order to
608 avoid flooding the Changelogs, DENIED OPEN entries are rate limited:
609 no more than one entry per user per file per time interval, where the
610 interval (in seconds) is configurable via
611 <literal>mdd.&lt;mdtname&gt;.changelog_deniednext</literal>
612 (default value is 60 seconds).</para>
614 mds# lctl set_param mdd.lustre-MDT0000.changelog_deniednext=120
615 mdd.lustre-MDT0000.changelog_deniednext=120
616 mds# lctl get_param mdd.lustre-MDT0000.changelog_deniednext
617 mdd.lustre-MDT0000.changelog_deniednext=120
623 <section xml:id="jobstats">
624 <title><indexterm><primary>jobstats</primary><see>monitoring</see></indexterm>
625 <indexterm><primary>monitoring</primary></indexterm>
626 <indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
628 Lustre Jobstats</title>
629 <para>The Lustre jobstats feature collects file system operation statistics
630 for user processes running on Lustre clients, and exposes them on the
631 servers keyed by the unique Job Identifier (JobID) provided by the job
632 scheduler for each job. Job schedulers known to work with jobstats include:
633 SLURM, SGE, LSF, Loadleveler, PBS and Maui/MOAB.</para>
634 <para>Since jobstats is implemented in a scheduler-agnostic manner, it is
635 likely to work with other schedulers as well, and also
636 in environments that do not use a job scheduler, by storing custom format
637 strings in <literal>jobid_name</literal>.</para>
639 <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
640 How Jobstats Works</title>
641 <para>The Lustre jobstats code on the client extracts the unique JobID
642 from an environment variable within the user process, and sends this
643 JobID to the server with all RPCs. This allows the server to track
644 statistics for operations specific to each application/command running
645 on the client, and can be useful to identify the source of high I/O
load.</para>
648 <para>A Lustre setting on the client, <literal>jobid_var</literal>,
649 specifies an environment variable or other client-local source that
650 holds a (relatively) unique JobID for the running application.
651 Any environment variable can be specified. For example, SLURM sets the
652 <literal>SLURM_JOB_ID</literal> environment variable with the unique
653 JobID for all clients running a particular job launched on one or
654 more nodes, and <literal>SLURM_JOB_ID</literal> will be inherited by all
655 child processes started below that process.</para>
658 <para>There are several reserved values for <literal>jobid_var</literal>:
661 <para><literal>disable</literal> - disables sending a JobID from
this client</para>
665 <para><literal>procname_uid</literal> - uses the process name and UID,
666 equivalent to setting <literal>jobid_name=%e.%u</literal></para>
669 <para><literal>nodelocal</literal> - use only the JobID format from
670 <literal>jobid_name</literal></para>
673 <para><literal>session</literal> - extract the JobID from
674 <literal>jobid_this_session</literal></para>
679 <para>Lustre can also be configured to generate a synthetic JobID from
680 the client's process name and numeric UID, by setting
681 <literal>jobid_var=procname_uid</literal>. This will generate a
682 uniform JobID when running the same binary across multiple client
683 nodes, but cannot distinguish whether the binary is part of a single
684 distributed process or multiple independent processes. This can be
685 useful on login nodes where interactive commands are run.
688 <para condition="l28">In Lustre 2.8 and later it is possible to set
689 <literal>jobid_var=nodelocal</literal> and then also set
690 <literal>jobid_name=</literal><replaceable>name</replaceable>, which
691 <emphasis>all</emphasis> processes on that client node will use. This
692 is useful if only a single job is run on a client at one time, but if
693 multiple jobs are run on a client concurrently, the
694 <literal>session</literal> JobID should be used.
697 <para condition="l2C">In Lustre 2.12 and later, it is possible to
698 specify more complex JobID values for <literal>jobid_name</literal>
699 by using a string that contains format codes that are evaluated for
700 each process, in order to generate a site- or node-specific JobID string.
704 <para><emphasis>%e</emphasis> print executable name</para>
707 <para><emphasis>%g</emphasis> print group ID number</para>
710 <para><emphasis>%h</emphasis> print fully-qualified hostname</para>
713 <para><emphasis>%H</emphasis> print short hostname</para>
716 <para><emphasis>%j</emphasis> print JobID from the source named
717 by the <emphasis>jobid_var</emphasis> parameter
721 <para><emphasis>%p</emphasis> print numeric process ID</para>
724 <para><emphasis>%u</emphasis> print user ID number</para>
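<para>As an illustration of how these format codes combine, the sketch below expands them against example process attributes; it is not the in-kernel implementation, and the attribute values are hypothetical:</para>

```python
# Illustrative sketch of jobid_name format-code expansion (not kernel code).
# The process attributes below are example values, not read from a process.
def expand_jobid_name(fmt, proc):
    codes = {
        "%e": proc["exe"],                  # executable name
        "%g": str(proc["gid"]),             # group ID number
        "%h": proc["fqdn"],                 # fully-qualified hostname
        "%H": proc["fqdn"].split(".")[0],   # short hostname
        "%j": proc["jobid"],                # JobID from the jobid_var source
        "%p": str(proc["pid"]),             # numeric process ID
        "%u": str(proc["uid"]),             # user ID number
    }
    out = fmt
    for code, value in codes.items():
        out = out.replace(code, value)
    return out

proc = {"exe": "dd", "uid": 500, "gid": 500, "pid": 1234,
        "fqdn": "client1.example.com", "jobid": "42"}
print(expand_jobid_name("%e.%u", proc))  # -> dd.500 (the procname_uid form)
print(expand_jobid_name("%j-%H", proc))  # -> 42-client1
```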
728 <para condition="l2D">In Lustre 2.13 and later, it is possible to
729 set a per-session JobID via the <literal>jobid_this_session</literal>
730 parameter <emphasis>instead</emphasis> of getting the JobID from an
731 environment variable. This session ID will be
732 inherited by all processes that are started in this login session,
733 though there can be a different JobID for each login session. This
734 is enabled by setting <literal>jobid_var=session</literal> instead
735 of setting it to an environment variable. The session ID will be
736 substituted for <literal>%j</literal> in <literal>jobid_name</literal>.
739 <para>The setting of <literal>jobid_var</literal> need not be the same
740 on all clients. For example, one could use
741 <literal>SLURM_JOB_ID</literal> on all clients managed by SLURM, and
742 use <literal>procname_uid</literal> on clients not managed by SLURM,
743 such as interactive login nodes.</para>
745 <para>It is not possible to have different
746 <literal>jobid_var</literal> settings on a single node, since it is
747 unlikely that multiple job schedulers are active on one client.
748 However, the actual JobID value is local to each process environment
749 and it is possible for multiple jobs with different JobIDs to be
750 active on a single client at one time.</para>
754 <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
755 Enable/Disable Jobstats</title>
756 <para>Jobstats are disabled by default. The current state of jobstats
757 can be verified by checking <literal>lctl get_param jobid_var</literal>
760 client# lctl get_param jobid_var
764 To enable jobstats on all clients for SLURM:</para>
766 mgs# lctl set_param -P jobid_var=SLURM_JOB_ID
768 <para>The <literal>lctl set_param</literal> command to enable or disable
769 jobstats should be run on the MGS as root. The change is persistent, and
770 will be propagated to the MDS, OSS, and client nodes automatically when
771 it is set on the MGS and for each new client mount.</para>
772 <para>To temporarily enable jobstats on a client, or to use a different
773 jobid_var on a subset of nodes, such as nodes in a remote cluster that
774 use a different job scheduler, or interactive login nodes that do not
775 use a job scheduler at all, run the <literal>lctl set_param</literal>
776 command directly on the client node(s) after the filesystem is mounted.
777 For example, to enable the <literal>procname_uid</literal> synthetic
778 JobID locally on a login node run:
780 client# lctl set_param jobid_var=procname_uid
782 The <literal>lctl set_param</literal> setting is not persistent, and will
783 be reset if the global <literal>jobid_var</literal> is set on the MGS or
784 if the filesystem is unmounted.</para>
785 <para>The following table shows the environment variables which are set
786 by various job schedulers. Set <literal>jobid_var</literal> to the value
787 for your job scheduler to collect statistics on a per job basis.</para>
788 <informaltable frame="all">
790 <colspec colname="c1" colwidth="50*"/>
791 <colspec colname="c2" colwidth="50*"/>
795 <para><emphasis role="bold">Job Scheduler</emphasis></para>
798 <para><emphasis role="bold">Environment Variable</emphasis></para>
805 <para>Simple Linux Utility for Resource Management (SLURM)</para>
808 <para>SLURM_JOB_ID</para>
813 <para>Sun Grid Engine (SGE)</para>
821 <para>Load Sharing Facility (LSF)</para>
824 <para>LSB_JOBID</para>
829 <para>Loadleveler</para>
832 <para>LOADL_STEP_ID</para>
837 <para>Portable Batch Scheduler (PBS)/MAUI</para>
840 <para>PBS_JOBID</para>
845 <para>Cray Application Level Placement Scheduler (ALPS)</para>
848 <para>ALPS_APP_ID</para>
855 mgs# lctl set_param -P jobid_var=disable
857 <para>To track job stats per process name and user ID (for debugging, or
858 if no job scheduler is in use on some nodes such as login nodes), specify
859 <literal>jobid_var</literal> as <literal>procname_uid</literal>:</para>
861 client# lctl set_param jobid_var=procname_uid
865 <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
866 Check Job Stats</title>
867 <para>Metadata operation statistics are collected on MDTs. These statistics
868 can be accessed for all file systems and all jobs on the MDT via
869 <literal>lctl get_param mdt.*.job_stats</literal>. For example, for clients
870 running with <literal>jobid_var=procname_uid</literal>:
873 mds# lctl get_param mdt.*.job_stats
876 snapshot_time: 1352084992
877 open: { samples: 2, unit: reqs }
878 close: { samples: 2, unit: reqs }
879 getattr: { samples: 3, unit: reqs }
880 - job_id: mythbackend.0
881 snapshot_time: 1352084996
882 open: { samples: 72, unit: reqs }
883 close: { samples: 73, unit: reqs }
884 unlink: { samples: 22, unit: reqs }
885 getattr: { samples: 778, unit: reqs }
886 setattr: { samples: 22, unit: reqs }
887 statfs: { samples: 19840, unit: reqs }
888 sync: { samples: 33190, unit: reqs }
890 <para>Data operation statistics are collected on OSTs and can be
891 accessed via
892 <literal>lctl get_param obdfilter.*.job_stats</literal>, for example:</para>
894 oss# lctl get_param obdfilter.*.job_stats
895 obdfilter.myth-OST0000.job_stats=
897 - job_id: mythcommflag.0
898 snapshot_time: 1429714922
899 read: { samples: 974, unit: bytes, min: 4096, max: 1048576, sum: 91530035 }
900 write: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0 }
901 obdfilter.myth-OST0001.job_stats=
903 - job_id: mythbackend.0
904 snapshot_time: 1429715270
905 read: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0 }
906 write: { samples: 1, unit: bytes, min: 96899, max: 96899, sum: 96899 }
907 punch: { samples: 1, unit: reqs }
908 obdfilter.myth-OST0002.job_stats=job_stats:
909 obdfilter.myth-OST0003.job_stats=job_stats:
910 obdfilter.myth-OST0004.job_stats=
912 - job_id: mythfrontend.500
913 snapshot_time: 1429692083
914 read: { samples: 9, unit: bytes, min: 16384, max: 1048576, sum: 4444160 }
915 write: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0 }
916 - job_id: mythbackend.500
917 snapshot_time: 1429692129
918 read: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0 }
919 write: { samples: 1, unit: bytes, min: 56231, max: 56231, sum: 56231 }
920 punch: { samples: 1, unit: reqs }
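<para>Because the output shown above is YAML-like, it is easy to post-process. The following hypothetical sketch totals per-job read and write bytes from such output with a hand-rolled parser (a YAML library would also work):</para>

```python
import re

# Sketch: aggregate per-job read/write byte totals from "job_stats" output
# of the form shown above, summed across all targets in the text.
def aggregate_job_stats(text):
    totals = {}
    job = None
    for line in text.splitlines():
        m = re.search(r"-\s*job_id:\s*(\S+)", line)
        if m:
            job = m.group(1)
            totals.setdefault(job, {"read": 0, "write": 0})
            continue
        m = re.search(r"(read|write):.*sum:\s*(\d+)", line)
        if m and job:
            totals[job][m.group(1)] += int(m.group(2))
    return totals

sample = """\
- job_id: mythcommflag.0
  read: { samples: 974, unit: bytes, min: 4096, max: 1048576, sum: 91530035 }
  write: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0 }
- job_id: mythbackend.0
  read: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0 }
  write: { samples: 1, unit: bytes, min: 96899, max: 96899, sum: 96899 }
"""
print(aggregate_job_stats(sample)["mythbackend.0"]["write"])  # -> 96899
```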
924 <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
925 Clear Job Stats</title>
926 <para>Accumulated job statistics can be reset by writing to the proc file
927 <literal>job_stats</literal>.</para>
928 <para>Clear statistics for all jobs on the local node:</para>
930 oss# lctl set_param obdfilter.*.job_stats=clear
932 <para>Clear statistics only for job 'bash.0' on lustre-MDT0000:</para>
934 mds# lctl set_param mdt.lustre-MDT0000.job_stats=bash.0
938 <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
939 Configure Auto-cleanup Interval</title>
940 <para>By default, if a job is inactive for 600 seconds (10 minutes),
941 statistics for this job will be dropped. This expiration value
942 can be changed temporarily via:
945 mds# lctl set_param *.*.job_cleanup_interval={max_age}
947 <para>It can also be changed permanently, for example to 700 seconds via:
950 mgs# lctl set_param -P mdt.testfs-*.job_cleanup_interval=700
952 <para>The <literal>job_cleanup_interval</literal> can be set
953 to 0 to disable auto-cleanup. Note that if auto-cleanup of
954 Jobstats is disabled, all statistics will be kept in memory
955 indefinitely, which may eventually consume all memory on the servers.
956 In this case, any monitoring tool should explicitly clear
957 individual job statistics as they are processed, as shown above.
960 <section remap="h3" condition='l2E'>
961 <title><indexterm><primary>monitoring</primary><secondary>lljobstat</secondary></indexterm>
962 Identifying Top Jobs</title>
963 <para>Since Lustre 2.15 the <literal>lljobstat</literal>
964 utility can be used to monitor and identify the top JobIDs generating
965 load on a particular server. This allows the administrator to quickly
966 see which applications/users/clients (depending on how the JobID is
967 configured) are generating the most filesystem RPCs and take appropriate
action.</para>
973 timestamp: 1665984678
975 - ls.500: {ops: 64, ga: 64}
976 - touch.500: {ops: 6, op: 1, cl: 1, mn: 1, ga: 1, sa: 2}
977 - bash.0: {ops: 3, ga: 3}
980 <para>It is possible to specify the number of top jobs to monitor as
981 well as the refresh interval, among other options.</para>
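<para>Output in this form can also be post-processed, for example to rank jobs by their total operation count. The sketch below is illustrative, uses the sample lines above, and treats the abbreviated counters as opaque:</para>

```python
import re

# Sketch: rank jobs from lljobstat-style "top jobs" lines by their
# total "ops" counter.
def top_jobs(lines):
    jobs = []
    for line in lines:
        m = re.match(r"-\s*(\S+?):\s*\{.*?ops:\s*(\d+)", line)
        if m:
            jobs.append((m.group(1), int(m.group(2))))
    return sorted(jobs, key=lambda j: j[1], reverse=True)

lines = [
    "- ls.500: {ops: 64, ga: 64}",
    "- touch.500: {ops: 6, op: 1, cl: 1, mn: 1, ga: 1, sa: 2}",
    "- bash.0: {ops: 3, ga: 3}",
]
print(top_jobs(lines)[0])  # -> ('ls.500', 64)
```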
984 <section xml:id="lmt">
986 <primary>monitoring</primary>
987 <secondary>Lustre Monitoring Tool</secondary>
988 </indexterm> Lustre Monitoring Tool (LMT)</title>
989 <para>The Lustre Monitoring Tool (LMT) is a Python-based, distributed
990 system that provides a <literal>top</literal>-like display of activity
991 on server-side nodes (MDS, OSS and portals routers) on one or more
992 Lustre file systems. It does not provide support for monitoring
993 clients. For more information on LMT, including the setup procedure,
see:</para>
995 <para><link xl:href="https://github.com/chaos/lmt/wiki">
996 https://github.com/chaos/lmt/wiki</link></para>
998 <section xml:id="collectl">
1000 <literal>CollectL</literal>
1002 <para><literal>CollectL</literal> is another tool that can be used to monitor a Lustre file
1003 system. You can run <literal>CollectL</literal> on a Lustre system that has any combination of
1004 MDSs, OSTs and clients. The collected data can be written to a file for continuous logging and
1005 played back at a later time. It can also be converted to a format
suitable for plotting.</para>
1007 <para>For more information about <literal>CollectL</literal>, see:</para>
1008 <para><link xl:href="http://collectl.sourceforge.net">
1009 http://collectl.sourceforge.net</link></para>
1010 <para>Lustre-specific documentation is also available. See:</para>
1011 <para><link xl:href="http://collectl.sourceforge.net/Tutorial-Lustre.html">
1012 http://collectl.sourceforge.net/Tutorial-Lustre.html</link></para>
1014 <section xml:id="other_monitoring_options">
1015 <title><indexterm><primary>monitoring</primary><secondary>additional tools</secondary></indexterm>
1016 Other Monitoring Options</title>
1017 <para>A variety of standard tools are available publicly including the following:<itemizedlist>
1019 <para><literal>lltop</literal> - Lustre load monitor with batch scheduler integration.
1020 <link xmlns:xlink="http://www.w3.org/1999/xlink"
1021 xlink:href="https://github.com/jhammond/lltop"
1022 >https://github.com/jhammond/lltop</link></para>
1025 <para><literal>tacc_stats</literal> - A job-oriented system monitoring, analysis, and
1026 visualization tool that probes Lustre interfaces and collects statistics. <link
1027 xmlns:xlink="http://www.w3.org/1999/xlink"
1028 xlink:href="https://github.com/jhammond/tacc_stats"/></para>
1031 <para><literal>xltop</literal> - A continuous Lustre monitor with batch scheduler
1032 integration. <link xmlns:xlink="http://www.w3.org/1999/xlink"
1033 xlink:href="https://github.com/jhammond/xltop"/></para>
1035 </itemizedlist></para>
1036 <para>Another option is to script a simple monitoring solution that looks at various reports
1037 from <literal>ifconfig</literal>, as well as the <literal>procfs</literal> files generated by
1038 the Lustre software.</para>