1 <?xml version='1.0' encoding='UTF-8'?>
2 <chapter xmlns="http://docbook.org/ns/docbook"
3 xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
4 xml:id="lustremonitoring">
5 <title xml:id="lustremonitoring.title">Monitoring a Lustre File System</title>
6 <para>This chapter provides information on monitoring a Lustre file system and includes the
7 following sections:</para>
10 <para><xref linkend="dbdoclet.50438273_18711"/>Lustre Changelogs</para>
13 <para><xref linkend="dbdoclet.jobstats"/>Lustre Jobstats</para>
16 <para><xref linkend="dbdoclet.50438273_81684"/>Lustre Monitoring Tool</para>
19 <para><xref linkend="dbdoclet.50438273_80593"/>CollectL</para>
22 <para><xref linkend="dbdoclet.50438273_44185"/>Other Monitoring Options</para>
25 <section xml:id="dbdoclet.50438273_18711">
26 <title><indexterm><primary>change logs</primary><see>monitoring</see></indexterm>
27 <indexterm><primary>monitoring</primary></indexterm>
28 <indexterm><primary>monitoring</primary><secondary>change logs</secondary></indexterm>
30 Lustre Changelogs</title>
31 <para>The changelogs feature records events that change the file system
32 namespace or file metadata. Changes such as file creation, deletion,
33 renaming, attribute changes, etc. are recorded with the target and parent
34 file identifiers (FIDs), the name of the target, a timestamp, and user
35 information. These records can be used for a variety of purposes:</para>
38 <para>Capture recent changes to feed into an archiving system.</para>
41 <para>Use changelog entries to exactly replicate changes in a file
45 <para>Set up "watch scripts" that take action on certain
46 events or directories.</para>
<para>Audit activity on a Lustre file system, using the user information
and timestamps recorded with each file or directory change.</para>
<para>Changelog record types are:</para>
54 <informaltable frame="all">
56 <colspec colname="c1" colwidth="50*"/>
57 <colspec colname="c2" colwidth="50*"/>
61 <para><emphasis role="bold">Value</emphasis></para>
64 <para><emphasis role="bold">Description</emphasis></para>
74 <para> Internal recordkeeping</para>
82 <para> Regular file creation</para>
90 <para> Directory creation</para>
98 <para> Hard link</para>
106 <para> Soft link</para>
114 <para> Other file creation</para>
122 <para> Regular file removal</para>
130 <para> Directory removal</para>
138 <para> Rename, original</para>
146 <para> Rename, final</para>
170 <para> Layout change</para>
178 <para> Regular file truncated</para>
186 <para> Attribute change</para>
194 <para> Extended attribute change (setxattr)</para>
202 <para> HSM specific event</para>
210 <para> MTIME change</para>
218 <para> CTIME change</para>
223 <para> ATIME *</para>
226 <para> ATIME change</para>
234 <para> Migration event</para>
242 <para> File Level Replication: file initially written</para>
250 <para> File Level Replication: file re-synced</para>
255 <para> GXATR *</para>
258 <para> Extended attribute access (getxattr)</para>
263 <para> NOPEN *</para>
266 <para> Denied open</para>
272 <note><para>Event types marked with * are not recorded by default. Refer to
273 <xref linkend="dbdoclet.modifyChangelogMask" /> for instructions on
274 modifying the Changelogs mask.</para></note>
275 <para>FID-to-full-pathname and pathname-to-FID functions are also included
276 to map target and parent FIDs into the file system namespace.</para>
278 <title><indexterm><primary>monitoring</primary><secondary>change logs
279 </secondary></indexterm>
280 Working with Changelogs</title>
281 <para>Several commands are available to work with changelogs.</para>
284 <literal>lctl changelog_register</literal>
<para>Because changelog records take up space on the MDT, the system
administrator must register changelog users. As soon as a changelog
user is registered, the Changelogs feature is enabled. Registered users
specify which records they are "done with", and the system
purges records up to the highest record that every registered user is done with.</para>
291 <para>To register a new changelog user, run:</para>
292 <screen>mds# lctl --device <replaceable>fsname</replaceable>-<replaceable>MDTnumber</replaceable> changelog_register
294 <para>Changelog entries are not purged beyond a registered user's
295 set point (see <literal>lfs changelog_clear</literal>).</para>
299 <literal>lfs changelog</literal>
301 <para>To display the metadata changes on an MDT (the changelog records),
303 <screen>lfs changelog <replaceable>fsname</replaceable>-<replaceable>MDTnumber</replaceable> [startrec [endrec]] </screen>
304 <para>It is optional whether to specify the start and end
306 <para>These are sample changelog records:</para>
307 <screen>1 02MKDIR 15:15:21.977666834 2018.01.09 0x0 t=[0x200000402:0x1:0x0] j=mkdir.500 ef=0xf \
308 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
309 2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] j=cp.500 ef=0xf \
310 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
311 3 06UNLNK 15:15:41.305116815 2018.01.09 0x1 t=[0x200000402:0x2:0x0] j=rm.500 ef=0xf \
312 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
313 4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] j=rmdir.500 ef=0xf \
314 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics </screen>
318 <literal>lfs changelog_clear</literal>
320 <para>To clear old changelog records for a specific user (records that
321 the user no longer needs), run:</para>
322 <screen>lfs changelog_clear <replaceable>mdt_name</replaceable> <replaceable>userid</replaceable> <replaceable>endrec</replaceable></screen>
323 <para>The <literal>changelog_clear</literal> command indicates that
324 changelog records previous to <replaceable>endrec</replaceable> are no
325 longer of interest to a particular user
326 <replaceable>userid</replaceable>, potentially allowing the MDT to free
327 up disk space. An <literal><replaceable>endrec</replaceable></literal>
328 value of 0 indicates the current last record. To run
329 <literal>changelog_clear</literal>, the changelog user must be
330 registered on the MDT node using <literal>lctl</literal>.</para>
<para>When all changelog users are done with records &lt; X, the records
336 <literal>lctl changelog_deregister</literal>
338 <para>To deregister (unregister) a changelog user, run:</para>
339 <screen>mds# lctl --device <replaceable>mdt_device</replaceable> changelog_deregister <replaceable>userid</replaceable> </screen>
340 <para> <literal>changelog_deregister cl1</literal> effectively does a
341 <literal>lfs changelog_clear cl1 0</literal> as it deregisters.</para>
345 <title>Changelog Examples</title>
346 <para>This section provides examples of different changelog
349 <title>Registering a Changelog User</title>
350 <para>To register a new changelog user for a device
351 (<literal>lustre-MDT0000</literal>):</para>
352 <screen>mds# lctl --device lustre-MDT0000 changelog_register
353 lustre-MDT0000: Registered changelog userid 'cl1'</screen>
356 <title>Displaying Changelog Records</title>
357 <para>To display changelog records on an MDT
358 (<literal>lustre-MDT0000</literal>):</para>
359 <screen>$ lfs changelog lustre-MDT0000
360 1 02MKDIR 15:15:21.977666834 2018.01.09 0x0 t=[0x200000402:0x1:0x0] ef=0xf \
361 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
362 2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] ef=0xf \
363 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
364 3 06UNLNK 15:15:41.305116815 2018.01.09 0x1 t=[0x200000402:0x2:0x0] ef=0xf \
365 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
366 4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] ef=0xf \
367 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics</screen>
368 <para>Changelog records include this information:</para>
370 operation_type(numerical/text)
380 <para>Displayed in this format:</para>
381 <screen>rec# operation_type(numerical/text) timestamp datestamp flags t=target_FID \
382 ef=extended_flags u=uid:gid nid=client_NID p=parent_FID target_name</screen>
383 <para>For example:</para>
384 <screen>2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] ef=0xf \
385 u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg</screen>
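<para>As an illustration of this record layout, the following Python sketch
splits one line of <literal>lfs changelog</literal> output into named fields.
The function name <literal>parse_changelog_record</literal> and the set of
recognized <literal>key=value</literal> prefixes are assumptions made for this
example, based only on the sample records shown in this chapter; they are not
part of any Lustre API.</para>
<programlisting language="python"><![CDATA[
```python
import re

# key= prefixes seen in the sample records of this chapter (an assumed set,
# not an exhaustive list of what Lustre can emit).
KNOWN_KEYS = {"t", "j", "ef", "u", "nid", "p", "m", "x"}


def parse_changelog_record(line):
    """Split one changelog record into a dict of named fields."""
    # Join display-wrapped lines ("\" followed by a newline) before splitting.
    tokens = line.replace("\\\n", " ").split()
    if len(tokens) < 5:
        raise ValueError("not a changelog record: %r" % line)
    index, op, time, date, flags = tokens[:5]
    match = re.fullmatch(r"(\d+)(\D\w*)", op)
    if match is None:
        raise ValueError("unexpected operation field: %r" % op)
    record = {
        "index": int(index),
        "type_num": int(match.group(1)),
        "type": match.group(2),
        "time": time,
        "date": date,
        "flags": flags,
        "name": None,
    }
    rest = tokens[5:]
    for pos, token in enumerate(rest):
        key, sep, value = token.partition("=")
        if sep and key in KNOWN_KEYS:
            record[key] = value
        else:
            # The first token that is not a known key=value pair starts the
            # target name; names may contain spaces, so keep the remainder.
            record["name"] = " ".join(rest[pos:])
            break
    return record


sample = (
    "2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] "
    "ef=0xf u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg"
)
record = parse_changelog_record(sample)
print(record["type"], record["u"], record["name"])
```
]]></programlisting>
<para>Keeping the recognized keys in one set makes it easy to extend the
sketch if other fields appear in the records of a given Lustre release.</para>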
388 <title>Clearing Changelog Records</title>
389 <para>To notify a device that a specific user (<literal>cl1</literal>)
390 no longer needs records (up to and including 3):</para>
391 <screen>$ lfs changelog_clear lustre-MDT0000 cl1 3</screen>
392 <para>To confirm that the <literal>changelog_clear</literal> operation
393 was successful, run <literal>lfs changelog</literal>; only records after
394 id-3 are listed:</para>
395 <screen>$ lfs changelog lustre-MDT0000
396 4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] ef=0xf \
397 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics</screen>
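<para>The read-then-clear cycle shown above could be driven by a small script.
The following Python sketch is one possible shape for such a consumer. It
assumes that <literal>lfs changelog</literal> and
<literal>lfs changelog_clear</literal> behave as shown in this section and that
each record starts with its numeric index; the function name
<literal>consume_changelog</literal>, the <literal>handle</literal> callback and
the injectable <literal>run</literal> parameter are hypothetical.</para>
<programlisting language="python"><![CDATA[
```python
import subprocess


def consume_changelog(mdt, user, handle, run=subprocess.check_output):
    """Pass every pending record to handle(), then clear up to the last one.

    mdt  -- MDT name, for example "lustre-MDT0000"
    user -- registered changelog user, for example "cl1"
    run  -- callable used to execute commands (replaceable for testing)
    """
    output = run(["lfs", "changelog", mdt], text=True)
    last_index = 0
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and anything that does not start with an index.
        if not fields or not fields[0].isdigit():
            continue
        handle(line)
        last_index = int(fields[0])
    if last_index:
        # Tell the MDT that this user no longer needs the handled records.
        run(["lfs", "changelog_clear", mdt, user, str(last_index)], text=True)
    return last_index
```
]]></programlisting>
<para>Because the clear command is issued only after
<literal>handle</literal> has returned for every record, an exception raised
by the handler leaves the records in place for the next run.</para>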
400 <title>Deregistering a Changelog User</title>
401 <para>To deregister a changelog user (<literal>cl1</literal>) for a
402 specific device (<literal>lustre-MDT0000</literal>):</para>
403 <screen>mds# lctl --device lustre-MDT0000 changelog_deregister cl1
404 lustre-MDT0000: Deregistered changelog user 'cl1'</screen>
405 <para>The deregistration operation clears all changelog records for the
406 specified user (<literal>cl1</literal>).</para>
<screen>$ lfs changelog lustre-MDT0000
5 00MARK 15:56:39.603643887 2018.01.09 0x0 t=[0x20001:0x0:0x0] ef=0xf \
u=500:500 nid=0@&lt;0:0&gt; p=[0:0x50:0xb] mdd_obd-lustre-MDT0000-0
412 <para>MARK records typically indicate changelog recording status
417 <title>Displaying the Changelog Index and Registered Users</title>
418 <para>To display the current, maximum changelog index and registered
419 changelog users for a specific device
420 (<literal>lustre-MDT0000</literal>):</para>
421 <screen>mds# lctl get_param mdd.lustre-MDT0000.changelog_users
422 mdd.lustre-MDT0000.changelog_users=current index: 8
423 ID index (idle seconds)
428 <title>Displaying the Changelog Mask</title>
429 <para>To show the current changelog mask on a specific device
430 (<literal>lustre-MDT0000</literal>):</para>
431 <screen>mds# lctl get_param mdd.lustre-MDT0000.changelog_mask
433 mdd.lustre-MDT0000.changelog_mask=
434 MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT \
435 TRUNC SATTR XATTR HSM MTIME CTIME MIGRT
438 <section xml:id="dbdoclet.modifyChangelogMask" remap="h5">
439 <title>Setting the Changelog Mask</title>
440 <para>To set the current changelog mask on a specific device
441 (<literal>lustre-MDT0000</literal>):</para>
442 <screen>mds# lctl set_param mdd.lustre-MDT0000.changelog_mask=HLINK
443 mdd.lustre-MDT0000.changelog_mask=HLINK
444 $ lfs changelog_clear lustre-MDT0000 cl1 0
445 $ mkdir /mnt/lustre/mydir/foo
446 $ cp /etc/hosts /mnt/lustre/mydir/foo/file
447 $ ln /mnt/lustre/mydir/foo/file /mnt/lustre/mydir/myhardlink
449 <para>Only item types that are in the mask show up in the
451 <screen>$ lfs changelog lustre-MDT0000
452 9 03HLINK 16:06:35.291636498 2018.01.09 0x0 t=[0x200000402:0x4:0x0] ef=0xf \
453 u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x3:0x0] myhardlink
458 <section remap="h3" condition='l2B'>
459 <title><indexterm><primary>audit</primary>
460 <secondary>change logs</secondary></indexterm>
461 Audit with Changelogs</title>
<para>A specific use case for Lustre Changelogs is auditing. According to a
definition found on <link xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:href="https://en.wikipedia.org/wiki/Information_technology_audit">
Wikipedia</link>, information technology audits are used to evaluate an
organization's ability to protect its information assets and to properly
dispense information to authorized parties. In practice, an audit
verifies that all data accesses complied with the access
control policy in place. This is usually done by analyzing access
<para>An audit can serve as proof that security measures are in place. It can
also be required in order to comply with regulations.</para>
<para>Lustre Changelogs are well suited to auditing because they are a
centralized facility and are designed to be transactional. Changelog
records contain all information necessary for auditing purposes:</para>
478 <para>ability to identify object of action thanks to file identifiers
479 (FIDs) and name of targets</para>
482 <para>ability to identify subject of action thanks to UID/GID and NID
486 <para>ability to identify time of action thanks to timestamp</para>
490 <title>Enabling Audit</title>
<para>A fully functional Changelogs-based audit facility requires
some additional Changelog record types to be enabled, so that
events such as OPEN, ATIME, GETXATTR and DENIED OPEN are recorded. Note that
enabling these record types may affect performance. For
instance, recording OPEN and GETXATTR events generates writes to the
Changelog records for a read operation from a file-system
<para>Being able to record events such as OPEN or DENIED OPEN is
important from an audit perspective. For instance, if a Lustre file system
is used to store medical records on a system dedicated to Life Sciences,
data privacy is crucial. Administrators may need to know which doctors
accessed, or tried to access, a given medical record and when.
Conversely, they might need to know which medical records a given doctor
<para>To enable all changelog entry types, run:</para>
<screen>mds# lctl set_param mdd.lustre-MDT0000.changelog_mask=ALL
mdd.lustre-MDT0000.changelog_mask=ALL</screen>
508 <para>Once all required record types have been enabled, just register a
509 Changelogs user and the audit facility is operational.</para>
<para>It is also possible to control which Lustre client
nodes can trigger the recording of file system access events to the
Changelogs, using the <literal>audit_mode</literal> flag on nodemap
entries. Disabling audit on a per-nodemap basis
prevents some nodes (e.g. backup or HSM agent nodes) from flooding the
audit logs. When the <literal>audit_mode</literal> flag is
set to 1 on a nodemap entry, a client belonging to this nodemap
can record file system access events to the Changelogs, provided that
Changelogs are otherwise activated. When it is set to 0, events are not logged
to the Changelogs, whether or not Changelogs are activated. By
default, the <literal>audit_mode</literal> flag is set to 1 in newly created
nodemap entries, and it is also set to 1 in the 'default' nodemap.</para>
<para>To prevent nodes belonging to a nodemap from generating Changelog
525 mgs# lctl nodemap_modify --name nm1 --property audit_mode --value 0</screen>
528 <title>Audit examples</title>
531 <literal>OPEN</literal>
533 <para>An OPEN changelog entry is in the form:</para>
535 7 10OPEN 13:38:51.510728296 2017.07.25 0x242 t=[0x200000401:0x2:0x0] \
536 ef=0x7 u=500:500 nid=10.128.11.159@tcp m=-w-</screen>
537 <para>It includes information about the open mode, in the form
<para>OPEN entries are recorded only once per UID/GID for a given
open mode, as long as the file is not closed by this UID/GID. This
avoids flooding the Changelogs when, for instance, an MPI job
opens the same file thousands of times from different threads. It
reduces the Changelog load significantly without significantly
affecting the audit information. Similarly, only the last CLOSE per
UID/GID is recorded.</para>
549 <literal>GETXATTR</literal>
551 <para>A GETXATTR changelog entry is in the form:</para>
553 8 23GXATR 09:22:55.886793012 2017.07.27 0x0 t=[0x200000402:0x1:0x0] \
554 ef=0xf u=500:500 nid=10.128.11.159@tcp x=user.name0</screen>
<para>It includes information about the name of the extended attribute
being accessed, in the form <literal>x=&lt;xattr name&gt;</literal>.
561 <literal>SETXATTR</literal>
563 <para>A SETXATTR changelog entry is in the form:</para>
565 4 15XATTR 09:41:36.157333594 2018.01.10 0x0 t=[0x200000402:0x1:0x0] \
566 ef=0xf u=500:500 nid=10.128.11.159@tcp x=user.name0</screen>
<para>It includes information about the name of the extended attribute
being modified, in the form <literal>x=&lt;xattr name&gt;</literal>.
573 <literal>DENIED OPEN</literal>
575 <para>A DENIED OPEN changelog entry is in the form:</para>
577 4 24NOPEN 15:45:44.947406626 2017.08.31 0x2 t=[0x200000402:0x1:0x0] \
578 ef=0xf u=500:500 nid=10.128.11.158@tcp m=-w-</screen>
<para>It has the same information as a regular OPEN entry. To
avoid flooding the Changelogs, DENIED OPEN entries are rate limited:
no more than one entry is recorded per user per file per time interval. This
interval (in seconds) is configurable via
<literal>mdd.&lt;mdtname&gt;.changelog_deniednext</literal>
(the default value is 60 seconds).</para>
mds# lctl set_param mdd.lustre-MDT0000.changelog_deniednext=120
mdd.lustre-MDT0000.changelog_deniednext=120
mds# lctl get_param mdd.lustre-MDT0000.changelog_deniednext
mdd.lustre-MDT0000.changelog_deniednext=120</screen>
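<para>The audit question raised earlier, namely who accessed or tried to access
a given file, can be answered by filtering OPEN and DENIED OPEN records on the
target FID. The following Python sketch assumes the record layout shown in the
examples above, with each record on a single line; the function name
<literal>accesses_to</literal> and the keys of the returned dictionaries are
hypothetical.</para>
<programlisting language="python"><![CDATA[
```python
def accesses_to(records, target_fid):
    """Return OPEN and NOPEN (denied open) events for one target FID.

    records    -- iterable of changelog record lines
    target_fid -- FID as printed in the t= field, brackets included
    """
    hits = []
    for line in records:
        tokens = line.split()
        if len(tokens) < 5:
            continue
        # "10OPEN" -> "OPEN", "24NOPEN" -> "NOPEN"
        op = tokens[1].lstrip("0123456789")
        if op not in ("OPEN", "NOPEN"):
            continue
        attrs = dict(tok.split("=", 1) for tok in tokens[5:] if "=" in tok)
        if attrs.get("t") != target_fid:
            continue
        hits.append({
            "denied": op == "NOPEN",
            "user": attrs.get("u"),      # uid:gid
            "nid": attrs.get("nid"),     # client NID
            "mode": attrs.get("m"),      # open mode
            "when": tokens[3] + " " + tokens[2],
        })
    return hits
```
]]></programlisting>
<para>The same loop can be keyed on the <literal>u=</literal> field instead of
<literal>t=</literal> to list the files that a given user opened.</para>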
594 <section xml:id="dbdoclet.jobstats">
595 <title><indexterm><primary>jobstats</primary><see>monitoring</see></indexterm>
596 <indexterm><primary>monitoring</primary></indexterm>
597 <indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
599 Lustre Jobstats</title>
<para>The Lustre jobstats feature collects file system operation statistics
for user processes running on Lustre clients, and exposes them on the server
using the unique Job Identifier (JobID) provided by the job scheduler for
each job. Job schedulers known to work with jobstats include
SLURM, SGE, LSF, Loadleveler, PBS and Maui/MOAB.</para>
<para>Since jobstats is implemented in a scheduler-agnostic manner, it is
likely to work with other schedulers as well, and also
in environments that do not use a job scheduler, by storing custom format
strings in <literal>jobid_name</literal>.</para>
610 <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
611 How Jobstats Works</title>
612 <para>The Lustre jobstats code on the client extracts the unique JobID
613 from an environment variable within the user process, and sends this
614 JobID to the server with the I/O operation. The server tracks
615 statistics for operations whose JobID is given, indexed by that
<para>A Lustre setting on the client, <literal>jobid_var</literal>,
specifies which environment variable holds the JobID for that process.
Any environment variable can be specified. For example, SLURM sets the
<literal>SLURM_JOB_ID</literal> environment variable to the unique
job ID on each client when the job is first launched on a node, and
<literal>SLURM_JOB_ID</literal> is inherited by all child
processes started below that process.</para>
626 <para>Lustre can be configured to generate a synthetic JobID from
627 the client's process name and numeric UID, by setting
628 <literal>jobid_var=procname_uid</literal>. This will generate a
629 uniform JobID when running the same binary across multiple client
630 nodes, but cannot distinguish whether the binary is part of a single
631 distributed process or multiple independent processes.
634 <para condition="l28">In Lustre 2.8 and later it is possible to set
635 <literal>jobid_var=nodelocal</literal> and then also set
636 <literal>jobid_name=</literal><replaceable>name</replaceable>, which
637 <emphasis>all</emphasis> processes on that client node will use. This
638 is useful if only a single job is run on a client at one time, but if
639 multiple jobs are run on a client concurrently, the per-session JobID
643 <para condition="l2C">In Lustre 2.12 and later, it is possible to
644 specify more complex JobID values for <literal>jobid_name</literal>
645 by using a string that contains format codes that are evaluated for
646 each process, in order to generate a site- or node-specific JobID string.
650 <para><emphasis>%e</emphasis> print executable name</para>
653 <para><emphasis>%g</emphasis> print group ID number</para>
656 <para><emphasis>%h</emphasis> print hostname</para>
659 <para><emphasis>%j</emphasis> print JobID from process environment
660 variable named by the <emphasis>jobid_var</emphasis> parameter
664 <para><emphasis>%p</emphasis> print numeric process ID</para>
667 <para><emphasis>%u</emphasis> print user ID number</para>
671 <para condition="l2D">In Lustre 2.13 and later, it is possible to
672 set a per-session JobID by setting the
673 <literal>jobid_this_session</literal> parameter. This will be
674 inherited by all processes that are started in this login session,
675 but there can be a different JobID for each login session.
678 <para>The setting of <literal>jobid_var</literal> need not be the same
679 on all clients. For example, one could use
680 <literal>SLURM_JOB_ID</literal> on all clients managed by SLURM, and
681 use <literal>procname_uid</literal> on clients not managed by SLURM,
682 such as interactive login nodes.</para>
684 <para>It is not possible to have different
685 <literal>jobid_var</literal> settings on a single node, since it is
686 unlikely that multiple job schedulers are active on one client.
687 However, the actual JobID value is local to each process environment
688 and it is possible for multiple jobs with different JobIDs to be
689 active on a single client at one time.</para>
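<para>To illustrate how the <literal>jobid_name</literal> format codes listed
above combine into a JobID string, the following Python sketch expands a format
string using values supplied by the caller. It only mimics the documented
codes and is not the Lustre implementation: the function name and its
parameters are hypothetical, and how Lustre treats unknown codes or overly
long results is not modeled.</para>
<programlisting language="python"><![CDATA[
```python
def expand_jobid_name(fmt, exe, uid, gid, pid, host, jobid=""):
    """Expand the documented %e %g %h %j %p %u codes found in fmt."""
    values = {
        "e": exe,        # executable name
        "g": str(gid),   # group ID number
        "h": host,       # hostname
        "j": jobid,      # JobID from the variable named by jobid_var
        "p": str(pid),   # numeric process ID
        "u": str(uid),   # user ID number
    }
    out = []
    i = 0
    while i < len(fmt):
        code = fmt[i + 1:i + 2]
        if fmt[i] == "%" and code in values:
            out.append(values[code])
            i += 2
        else:
            # Any other character is copied through unchanged (an assumption).
            out.append(fmt[i])
            i += 1
    return "".join(out)


print(expand_jobid_name("%e.%u", exe="cp", uid=500, gid=500,
                        pid=1234, host="login1"))
```
]]></programlisting>
<para>With these inputs the format <literal>%e.%u</literal> yields a string of
the same shape as the <literal>procname_uid</literal> JobIDs that appear in
the changelog samples, such as <literal>cp.500</literal>.</para>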
693 <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
694 Enable/Disable Jobstats</title>
695 <para>Jobstats are disabled by default. The current state of jobstats
696 can be verified by checking <literal>lctl get_param jobid_var</literal>
699 $ lctl get_param jobid_var
703 To enable jobstats on the <literal>testfs</literal> file system with SLURM:</para>
704 <screen># lctl conf_param testfs.sys.jobid_var=SLURM_JOB_ID</screen>
705 <para>The <literal>lctl conf_param</literal> command to enable or disable
706 jobstats should be run on the MGS as root. The change is persistent, and
707 will be propagated to the MDS, OSS, and client nodes automatically when
708 it is set on the MGS and for each new client mount.</para>
709 <para>To temporarily enable jobstats on a client, or to use a different
710 jobid_var on a subset of nodes, such as nodes in a remote cluster that
711 use a different job scheduler, or interactive login nodes that do not
712 use a job scheduler at all, run the <literal>lctl set_param</literal>
713 command directly on the client node(s) after the filesystem is mounted.
714 For example, to enable the <literal>procname_uid</literal> synthetic
715 JobID on a login node run:
716 <screen># lctl set_param jobid_var=procname_uid</screen>
717 The <literal>lctl set_param</literal> setting is not persistent, and will
718 be reset if the global <literal>jobid_var</literal> is set on the MGS or
719 if the filesystem is unmounted.</para>
720 <para>The following table shows the environment variables which are set
721 by various job schedulers. Set <literal>jobid_var</literal> to the value
722 for your job scheduler to collect statistics on a per job basis.</para>
723 <informaltable frame="all">
725 <colspec colname="c1" colwidth="50*"/>
726 <colspec colname="c2" colwidth="50*"/>
730 <para><emphasis role="bold">Job Scheduler</emphasis></para>
733 <para><emphasis role="bold">Environment Variable</emphasis></para>
740 <para>Simple Linux Utility for Resource Management (SLURM)</para>
743 <para>SLURM_JOB_ID</para>
748 <para>Sun Grid Engine (SGE)</para>
756 <para>Load Sharing Facility (LSF)</para>
759 <para>LSB_JOBID</para>
764 <para>Loadleveler</para>
767 <para>LOADL_STEP_ID</para>
<para>Portable Batch System (PBS)/MAUI</para>
775 <para>PBS_JOBID</para>
780 <para>Cray Application Level Placement Scheduler (ALPS)</para>
783 <para>ALPS_APP_ID</para>
789 <para>There are two special values for <literal>jobid_var</literal>:
790 <literal>disable</literal> and <literal>procname_uid</literal>. To disable
791 jobstats, specify <literal>jobid_var</literal> as <literal>disable</literal>:</para>
792 <screen># lctl conf_param testfs.sys.jobid_var=disable</screen>
793 <para>To track job stats per process name and user ID (for debugging, or
794 if no job scheduler is in use on some nodes such as login nodes), specify
795 <literal>jobid_var</literal> as <literal>procname_uid</literal>:</para>
796 <screen># lctl conf_param testfs.sys.jobid_var=procname_uid</screen>
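<para>On clients where the scheduler is not known in advance, a helper script
could choose a <literal>jobid_var</literal> value by checking which scheduler
variable is present in the environment of a running job. The Python sketch
below covers a subset of the schedulers in the table above and falls back to
<literal>procname_uid</literal>; the function names are hypothetical, and the
script only builds the <literal>lctl set_param</literal> command line rather
than executing it.</para>
<programlisting language="python"><![CDATA[
```python
import os

# Scheduler environment variables taken from the table above (a subset).
SCHEDULER_VARS = [
    ("SLURM", "SLURM_JOB_ID"),
    ("LSF", "LSB_JOBID"),
    ("Loadleveler", "LOADL_STEP_ID"),
    ("PBS/MAUI", "PBS_JOBID"),
    ("ALPS", "ALPS_APP_ID"),
]


def suggest_jobid_var(env=None):
    """Return the first known scheduler variable set in env."""
    env = os.environ if env is None else env
    for _scheduler, variable in SCHEDULER_VARS:
        if variable in env:
            return variable
    # No scheduler variable found: use the synthetic JobID instead.
    return "procname_uid"


def set_param_command(jobid_var):
    """Build the non-persistent, client-side command as an argument list."""
    return ["lctl", "set_param", "jobid_var=" + jobid_var]


print(" ".join(set_param_command(suggest_jobid_var())))
```
]]></programlisting>
<para>The detection is only meaningful when run from inside a job, since the
scheduler variables are set in the job's process environment.</para>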
799 <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
800 Check Job Stats</title>
<para>Metadata operation statistics are collected on MDTs. These statistics can be accessed for
all file systems and all jobs on the MDT via <literal>lctl get_param
mdt.*.job_stats</literal>. For example, for clients running with
<literal>jobid_var=procname_uid</literal>:</para>
806 # lctl get_param mdt.*.job_stats
809 snapshot_time: 1352084992
810 open: { samples: 2, unit: reqs }
811 close: { samples: 2, unit: reqs }
812 mknod: { samples: 0, unit: reqs }
813 link: { samples: 0, unit: reqs }
814 unlink: { samples: 0, unit: reqs }
815 mkdir: { samples: 0, unit: reqs }
816 rmdir: { samples: 0, unit: reqs }
817 rename: { samples: 0, unit: reqs }
818 getattr: { samples: 3, unit: reqs }
819 setattr: { samples: 0, unit: reqs }
820 getxattr: { samples: 0, unit: reqs }
821 setxattr: { samples: 0, unit: reqs }
822 statfs: { samples: 0, unit: reqs }
823 sync: { samples: 0, unit: reqs }
824 samedir_rename: { samples: 0, unit: reqs }
825 crossdir_rename: { samples: 0, unit: reqs }
826 - job_id: mythbackend.0
827 snapshot_time: 1352084996
828 open: { samples: 72, unit: reqs }
829 close: { samples: 73, unit: reqs }
830 mknod: { samples: 0, unit: reqs }
831 link: { samples: 0, unit: reqs }
832 unlink: { samples: 22, unit: reqs }
833 mkdir: { samples: 0, unit: reqs }
834 rmdir: { samples: 0, unit: reqs }
835 rename: { samples: 0, unit: reqs }
836 getattr: { samples: 778, unit: reqs }
837 setattr: { samples: 22, unit: reqs }
838 getxattr: { samples: 0, unit: reqs }
839 setxattr: { samples: 0, unit: reqs }
840 statfs: { samples: 19840, unit: reqs }
841 sync: { samples: 33190, unit: reqs }
842 samedir_rename: { samples: 0, unit: reqs }
843 crossdir_rename: { samples: 0, unit: reqs }
845 <para>Data operation statistics are collected on OSTs. Data operations
846 statistics can be accessed via
847 <literal>lctl get_param obdfilter.*.job_stats</literal>, for example:</para>
849 $ lctl get_param obdfilter.*.job_stats
850 obdfilter.myth-OST0000.job_stats=
852 - job_id: mythcommflag.0
853 snapshot_time: 1429714922
854 read: { samples: 974, unit: bytes, min: 4096, max: 1048576, sum: 91530035 }
855 write: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0 }
856 setattr: { samples: 0, unit: reqs }
857 punch: { samples: 0, unit: reqs }
858 sync: { samples: 0, unit: reqs }
859 obdfilter.myth-OST0001.job_stats=
861 - job_id: mythbackend.0
862 snapshot_time: 1429715270
863 read: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0 }
864 write: { samples: 1, unit: bytes, min: 96899, max: 96899, sum: 96899 }
865 setattr: { samples: 0, unit: reqs }
866 punch: { samples: 1, unit: reqs }
867 sync: { samples: 0, unit: reqs }
868 obdfilter.myth-OST0002.job_stats=job_stats:
869 obdfilter.myth-OST0003.job_stats=job_stats:
870 obdfilter.myth-OST0004.job_stats=
872 - job_id: mythfrontend.500
873 snapshot_time: 1429692083
874 read: { samples: 9, unit: bytes, min: 16384, max: 1048576, sum: 4444160 }
875 write: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0 }
876 setattr: { samples: 0, unit: reqs }
877 punch: { samples: 0, unit: reqs }
878 sync: { samples: 0, unit: reqs }
879 - job_id: mythbackend.500
880 snapshot_time: 1429692129
881 read: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0 }
882 write: { samples: 1, unit: bytes, min: 56231, max: 56231, sum: 56231 }
883 setattr: { samples: 0, unit: reqs }
884 punch: { samples: 1, unit: reqs }
885 sync: { samples: 0, unit: reqs }
889 <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
890 Clear Job Stats</title>
<para>Accumulated job statistics can be reset by writing to the <literal>job_stats</literal> proc file.</para>
892 <para>Clear statistics for all jobs on the local node:</para>
893 <screen># lctl set_param obdfilter.*.job_stats=clear</screen>
894 <para>Clear statistics only for job 'bash.0' on lustre-MDT0000:</para>
895 <screen># lctl set_param mdt.lustre-MDT0000.job_stats=bash.0</screen>
898 <title><indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm>
899 Configure Auto-cleanup Interval</title>
<para>By default, if a job is inactive for 600 seconds (10 minutes), the statistics for this job are dropped. This expiration value can be changed temporarily via:</para>
901 <screen># lctl set_param *.*.job_cleanup_interval={max_age}</screen>
902 <para>It can also be changed permanently, for example to 700 seconds via:</para>
903 <screen># lctl conf_param testfs.mdt.job_cleanup_interval=700</screen>
<para>The <literal>job_cleanup_interval</literal> can be set to 0 to disable auto-cleanup. Note that if auto-cleanup of jobstats is disabled, all statistics are kept in memory indefinitely, which may eventually consume all memory on the servers. In this case, any monitoring tool should explicitly clear individual job statistics as they are processed, as shown above.</para>
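<para>A monitoring tool that clears statistics as it goes might first parse the
<literal>job_stats</literal> output and then clear each job it has processed.
The Python sketch below shows one way to do this for the output of a single
target, assuming the text layout shown in the examples earlier in this section.
<literal>parse_job_stats</literal> and <literal>clear_command</literal> are
hypothetical helper names, and the parser uses regular expressions rather than
a full YAML parser.</para>
<programlisting language="python"><![CDATA[
```python
import re

_JOB = re.compile(r"^\s*-\s*job_id:\s*(\S+)")
_STAT = re.compile(r"^\s*(\w+):\s*\{(.*)\}\s*$")


def parse_job_stats(text):
    """Return {job_id: {operation: {field: value}}} for one target."""
    jobs = {}
    current = None
    for line in text.splitlines():
        match = _JOB.match(line)
        if match:
            current = jobs.setdefault(match.group(1), {})
            continue
        match = _STAT.match(line)
        if match and current is not None:
            fields = {}
            for item in match.group(2).split(","):
                key, _, value = item.partition(":")
                value = value.strip()
                # Counters become integers; "unit" stays a string.
                fields[key.strip()] = int(value) if value.isdigit() else value
            current[match.group(1)] = fields
    return jobs


def clear_command(target, job_id):
    """Build the command that clears one job, e.g. on mdt.lustre-MDT0000."""
    return ["lctl", "set_param", "%s.job_stats=%s" % (target, job_id)]
```
]]></programlisting>
<para>Lines without braces, such as <literal>snapshot_time</literal> and the
parameter name itself, are ignored by this sketch. When several targets are
queried at once, the output would need to be split per target before
parsing.</para>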
907 <section xml:id="dbdoclet.50438273_81684">
909 <primary>monitoring</primary>
910 <secondary>Lustre Monitoring Tool</secondary>
911 </indexterm> Lustre Monitoring Tool (LMT)</title>
912 <para>The Lustre Monitoring Tool (LMT) is a Python-based, distributed system that provides a
913 <literal>top</literal>-like display of activity on server-side nodes (MDS, OSS and portals
914 routers) on one or more Lustre file systems. It does not provide support for monitoring
915 clients. For more information on LMT, including the setup procedure, see:</para>
<para><link xl:href="https://github.com/chaos/lmt/wiki"
>https://github.com/chaos/lmt/wiki</link></para>
918 <para>LMT questions can be directed to:</para>
919 <para><link xl:href="mailto:lmt-discuss@googlegroups.com">lmt-discuss@googlegroups.com</link></para>
921 <section xml:id="dbdoclet.50438273_80593">
923 <literal>CollectL</literal>
925 <para><literal>CollectL</literal> is another tool that can be used to monitor a Lustre file
926 system. You can run <literal>CollectL</literal> on a Lustre system that has any combination of
927 MDSs, OSTs and clients. The collected data can be written to a file for continuous logging and
928 played back at a later time. It can also be converted to a format suitable for
930 <para>For more information about <literal>CollectL</literal>, see:</para>
931 <para><link xl:href="http://collectl.sourceforge.net">http://collectl.sourceforge.net</link></para>
932 <para>Lustre-specific documentation is also available. See:</para>
933 <para><link xl:href="http://collectl.sourceforge.net/Tutorial-Lustre.html">http://collectl.sourceforge.net/Tutorial-Lustre.html</link></para>
935 <section xml:id="dbdoclet.50438273_44185">
936 <title><indexterm><primary>monitoring</primary><secondary>additional tools</secondary></indexterm>
937 Other Monitoring Options</title>
938 <para>A variety of standard tools are available publicly including the following:<itemizedlist>
940 <para><literal>lltop</literal> - Lustre load monitor with batch scheduler integration.
941 <link xmlns:xlink="http://www.w3.org/1999/xlink"
942 xlink:href="https://github.com/jhammond/lltop"
943 >https://github.com/jhammond/lltop</link></para>
<para><literal>tacc_stats</literal> - A job-oriented system monitoring, analysis, and
visualization tool that probes Lustre interfaces and collects statistics. <link
xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:href="https://github.com/jhammond/tacc_stats"/></para>
952 <para><literal>xltop</literal> - A continuous Lustre monitor with batch scheduler
953 integration. <link xmlns:xlink="http://www.w3.org/1999/xlink"
954 xlink:href="https://github.com/jhammond/xltop"/></para>
956 </itemizedlist></para>
<para>Another option is to script a simple monitoring solution that examines reports
from <literal>ifconfig</literal>, as well as the <literal>procfs</literal> files generated by
the Lustre software.</para>