From e66c06f5b663e49f6e1fbde96a4138eb9521685a Mon Sep 17 00:00:00 2001 From: Sebastien Buisson Date: Wed, 10 Jan 2018 14:41:34 +0100 Subject: [PATCH] LUDOC-391 audit: doc for Lustre Audit with Changelogs This patch adds the documentation for the Lustre Audit with Changelogs feature created in LU-9727. This doc is added under the Lustre Monitoring section, along with the Changelogs documentation. Signed-off-by: Sebastien Buisson Change-Id: Ief72f277f148d226b462f18a06efe59e832fc065 Reviewed-on: https://review.whamcloud.com/30821 Tested-by: Jenkins Reviewed-by: James Nunez Reviewed-by: Joseph Gmitter --- LustreMonitoring.xml | 401 ++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 332 insertions(+), 69 deletions(-) diff --git a/LustreMonitoring.xml b/LustreMonitoring.xml index 939e2c9..89af577 100644 --- a/LustreMonitoring.xml +++ b/LustreMonitoring.xml @@ -25,19 +25,26 @@ monitoringchange logs Lustre Changelogs - The changelogs feature records events that change the file system namespace or file metadata. Changes such as file creation, deletion, renaming, attribute changes, etc. are recorded with the target and parent file identifiers (FIDs), the name of the target, and a timestamp. These records can be used for a variety of purposes: + The changelogs feature records events that change the file system + namespace or file metadata. Changes such as file creation, deletion, + renaming, attribute changes, etc. are recorded with the target and parent + file identifiers (FIDs), the name of the target, a timestamp, and user + information. These records can be used for a variety of purposes: Capture recent changes to feed into an archiving system. - Use changelog entries to exactly replicate changes in a file system mirror. + Use changelog entries to exactly replicate changes in a file + system mirror. - Set up "watch scripts" that take action on certain events or directories. + Set up "watch scripts" that take action on certain + events or directories. 
- Maintain a rough audit trail (file/directory changes with timestamps, but no user information). + Audit activity on Lustre, thanks to user information associated to + file/directory changes with timestamps. Changelogs record types are: @@ -122,7 +129,7 @@ Lustre Changelogs - RNMFM + RENME Rename, original @@ -138,10 +145,26 @@ Lustre Changelogs - IOCTL + OPEN * - ioctl on file or directory + Open + + + + + CLOSE + + + Close + + + + + LYOUT + + + Layout change @@ -165,90 +188,180 @@ Lustre Changelogs XATTR - Extended attribute change + Extended attribute change (setxattr) + + + + + HSM + + + HSM specific event + + + + + MTIME + + + MTIME change + + + + + CTIME + + + CTIME change + + + + + ATIME * + + + ATIME change + + + + + MIGRT + + + Migration event + + + + + FLRW + + + File Level Replication: file initially written + + + + + RESYNC + + + File Level Replication: file re-synced + + + + + GXATR * + + + Extended attribute access (getxattr) - UNKNW + NOPEN * - Unknown operation + Denied open - FID-to-full-pathname and pathname-to-FID functions are also included to map target and parent FIDs into the file system namespace. + Event types marked with * are not recorded by default. Refer to + for instructions on + modifying the Changelogs mask. + FID-to-full-pathname and pathname-to-FID functions are also included + to map target and parent FIDs into the file system namespace.
- <indexterm><primary>monitoring</primary><secondary>change logs</secondary></indexterm> + <title><indexterm><primary>monitoring</primary><secondary>change logs + </secondary></indexterm> Working with Changelogs Several commands are available to work with changelogs.
<literal>lctl changelog_register</literal> - Because changelog records take up space on the MDT, the system administration must register changelog users. The registrants specify which records they are "done with", and the system purges up to the greatest common record. - To register a new changelog user, run: - lctl --device fsname-MDTnumber changelog_register + Because changelog records take up space on the MDT, the system + administrator must register changelog users. As soon as a changelog + user is registered, the Changelogs feature is enabled. The registrants + specify which records they are "done with", and the system + purges up to the greatest common record. + To register a new changelog user, run: + mds# lctl --device fsname-MDTnumber changelog_register - Changelog entries are not purged beyond a registered user's set point (see lfs changelog_clear). + Changelog entries are not purged beyond a registered user's + set point (see lfs changelog_clear).
<literal>lfs changelog</literal> - To display the metadata changes on an MDT (the changelog records), run: + To display the metadata changes on an MDT (the changelog records), + run: lfs changelog fsname-MDTnumber [startrec [endrec]] - It is optional whether to specify the start and end records. + It is optional whether to specify the start and end + records. These are sample changelog records: - 2 02MKDIR 4298396676 0x0 t=[0x200000405:0x15f9:0x0] p=[0x13:0x15e5a7a3:0x0]\ - pics -3 01CREAT 4298402264 0x0 t=[0x200000405:0x15fa:0x0] p=[0x200000405:0x15f9:0\ -x0] chloe.jpg -4 06UNLNK 4298404466 0x0 t=[0x200000405:0x15fa:0x0] p=[0x200000405:0x15f9:0\ -x0] chloe.jpg -5 07RMDIR 4298405394 0x0 t=[0x200000405:0x15f9:0x0] p=[0x13:0x15e5a7a3:0x0]\ - pics + 1 02MKDIR 15:15:21.977666834 2018.01.09 0x0 t=[0x200000402:0x1:0x0] j=mkdir.500 ef=0xf \ +u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics +2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] j=cp.500 ef=0xf \ +u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg +3 06UNLNK 15:15:41.305116815 2018.01.09 0x1 t=[0x200000402:0x2:0x0] j=rm.500 ef=0xf \ +u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg +4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] j=rmdir.500 ef=0xf \ +u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
<literal>lfs changelog_clear</literal> - To clear old changelog records for a specific user (records that the user no longer needs), run: + To clear old changelog records for a specific user (records that + the user no longer needs), run: lfs changelog_clear mdt_name userid endrec - The changelog_clear command indicates that changelog records previous to endrec are no longer of interest to a particular user userid, potentially allowing the MDT to free up disk space. An endrec value of 0 indicates the current last record. To run changelog_clear, the changelog user must be registered on the MDT node using lctl. - When all changelog users are done with records < X, the records are deleted. + The changelog_clear command indicates that + changelog records previous to endrec are no + longer of interest to a particular user + userid, potentially allowing the MDT to free + up disk space. An endrec + value of 0 indicates the current last record. To run + changelog_clear, the changelog user must be + registered on the MDT node using lctl. + When all changelog users are done with records < X, the records + are deleted.
<literal>lctl changelog_deregister</literal> To deregister (unregister) a changelog user, run: - lctl --device mdt_device changelog_deregister userid - changelog_deregister cl1 effectively does a changelog_clear cl1 0 as it deregisters. + mds# lctl --device mdt_device changelog_deregister userid + changelog_deregister cl1 effectively does a + lfs changelog_clear cl1 0 as it deregisters.
Changelog Examples - This section provides examples of different changelog commands. + This section provides examples of different changelog + commands.
Registering a Changelog User - To register a new changelog user for a device (lustre-MDT0000): - # lctl --device lustre-MDT0000 changelog_register + To register a new changelog user for a device + (lustre-MDT0000): + mds# lctl --device lustre-MDT0000 changelog_register lustre-MDT0000: Registered changelog userid 'cl1'
Displaying Changelog Records - To display changelog records on an MDT (lustre-MDT0000): + To display changelog records on an MDT + (lustre-MDT0000): $ lfs changelog lustre-MDT0000 -1 00MARK 19:08:20.890432813 2010.03.24 0x0 t=[0x10001:0x0:0x0] p=[0:0x0:0x\ -0] mdd_obd-lustre-MDT0000-0 -2 02MKDIR 19:10:21.509659173 2010.03.24 0x0 t=[0x200000420:0x3:0x0] p=[0x61\ -b4:0xca2c7dde:0x0] mydir -3 14SATTR 19:10:27.329356533 2010.03.24 0x0 t=[0x200000420:0x3:0x0] -4 01CREAT 19:10:37.113847713 2010.03.24 0x0 t=[0x200000420:0x4:0x0] p=[0x20\ -0000420:0x3:0x0] hosts +1 02MKDIR 15:15:21.977666834 2018.01.09 0x0 t=[0x200000402:0x1:0x0] ef=0xf \ +u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics +2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] ef=0xf \ +u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg +3 06UNLNK 15:15:41.305116815 2018.01.09 0x1 t=[0x200000402:0x2:0x0] ef=0xf \ +u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg +4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] ef=0xf \ +u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics Changelog records include this information: rec# operation_type(numerical/text) @@ -256,72 +369,222 @@ timestamp datestamp flags t=target_FID +ef=extended_flags +u=uid:gid +nid=client_NID p=parent_FID target_name Displayed in this format: rec# operation_type(numerical/text) timestamp datestamp flags t=target_FID \ -p=parent_FID target_name +ef=extended_flags u=uid:gid nid=client_NID p=parent_FID target_name For example: - 4 01CREAT 19:10:37.113847713 2010.03.24 0x0 t=[0x200000420:0x4:0x0] p=[0x20\ -0000420:0x3:0x0] hosts + 2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] ef=0xf \ +u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg
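The record layout described above can be split into named fields with a few lines of Python. This is only a minimal sketch, not part of the patch: the function name `parse_changelog_record` and the returned dictionary keys are hypothetical. It assumes the two-digit numeric type prefix seen in the sample records, and it treats any token that is not `key=value` as part of the target name, so a file name that itself looks like `t=...` or `p=...` would be misread.

```python
import re

# Prefixes taken from the sample records in this patch (j, m and x only
# appear on some record types, so every key=value field is optional).
FIELD_RE = re.compile(r"^(t|ef|u|nid|p|j|m|x)=(.*)$")


def parse_changelog_record(line):
    """Split one `lfs changelog` output line into a dict of named fields."""
    # The samples wrap long records with a trailing backslash; undo that.
    tokens = line.replace("\\\n", " ").split()
    rec, op, timestamp, datestamp, flags = tokens[:5]
    record = {
        "rec": int(rec),
        "type_num": int(op[:2]),   # assumes a two-digit prefix, e.g. "01"
        "type": op[2:],            # e.g. "CREAT"
        "timestamp": timestamp,
        "datestamp": datestamp,
        "flags": int(flags, 16),
    }
    name_parts = []
    for token in tokens[5:]:
        match = FIELD_RE.match(token)
        if match:
            record[match.group(1)] = match.group(2)
        else:
            name_parts.append(token)
    if name_parts:
        record["name"] = " ".join(name_parts)
    if "u" in record:
        uid, gid = record["u"].split(":")
        record["uid"], record["gid"] = int(uid), int(gid)
    return record
```

Called on the CREAT record shown above, the sketch yields the record number, the type `CREAT`, the UID/GID pair, the client NID and the target name `chloe.jpg` as separate dictionary entries.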
Clearing Changelog Records - To notify a device that a specific user (cl1) no longer needs records (up to and including 3): + To notify a device that a specific user (cl1) + no longer needs records (up to and including 3): $ lfs changelog_clear lustre-MDT0000 cl1 3 - To confirm that the changelog_clear operation was successful, run lfs changelog; only records after id-3 are listed: + To confirm that the changelog_clear operation + was successful, run lfs changelog; only records after + id-3 are listed: $ lfs changelog lustre-MDT0000 -4 01CREAT 19:10:37.113847713 2010.03.24 0x0 t=[0x200000420:0x4:0x0] p=[0x20\ -0000420:0x3:0x0] hosts +4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] ef=0xf \ +u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
Deregistering a Changelog User - To deregister a changelog user (cl1) for a specific device (lustre-MDT0000): - # lctl --device lustre-MDT0000 changelog_deregister cl1 + To deregister a changelog user (cl1) for a + specific device (lustre-MDT0000): + mds# lctl --device lustre-MDT0000 changelog_deregister cl1 lustre-MDT0000: Deregistered changelog user 'cl1' - The deregistration operation clears all changelog records for the specified user (cl1). + The deregistration operation clears all changelog records for the + specified user (cl1). $ lfs changelog lustre-MDT0000 -5 00MARK 19:13:40.858292517 2010.03.24 0x0 t=[0x40001:0x0:0x0] p=[0:0x0:0x\ -0] mdd_obd-lustre-MDT0000-0 +5 00MARK 15:56:39.603643887 2018.01.09 0x0 t=[0x20001:0x0:0x0] ef=0xf \ +u=500:500 nid=0@<0:0> p=[0:0x50:0xb] mdd_obd-lustre-MDT0000-0 - MARK records typically indicate changelog recording status changes. + MARK records typically indicate changelog recording status + changes.
Displaying the Changelog Index and Registered Users - To display the current, maximum changelog index and registered changelog users for a specific device (lustre-MDT0000): - # lctl get_param mdd.lustre-MDT0000.changelog_users + To display the current, maximum changelog index and registered + changelog users for a specific device + (lustre-MDT0000): + mds# lctl get_param mdd.lustre-MDT0000.changelog_users mdd.lustre-MDT0000.changelog_users=current index: 8 -ID index -cl2 8 +ID index (idle seconds) +cl2 8 (180)
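Because records are only purged up to the point every registered user has cleared, it can be useful to see how far each user is behind the current index. The following Python sketch parses the `changelog_users` output shown above. It is an illustration only: the function name, the returned keys and the derived `lag` value are hypothetical, and it assumes user IDs of the form `cl<number>` as in the examples.

```python
import re


def parse_changelog_users(output):
    """Parse `lctl get_param mdd.*.changelog_users` output.

    Returns (current_index, users), where each user is a dict with the
    ID, the last cleared index, the idle time and the derived lag.
    """
    current = int(re.search(r"current index:\s*(\d+)", output).group(1))
    users = []
    # Assumption: user lines look like "cl2 8 (180)". The idle time in
    # parentheses is optional so the older two-column layout also parses.
    pattern = re.compile(r"^(cl\d+)\s+(\d+)(?:\s+\((\d+)\))?\s*$", re.MULTILINE)
    for match in pattern.finditer(output):
        index = int(match.group(2))
        idle = match.group(3)
        users.append({
            "id": match.group(1),
            "index": index,
            "idle_seconds": int(idle) if idle is not None else None,
            "lag": current - index,  # records this user has not cleared yet
        })
    return current, users
```

A user with a large `lag` and a large idle time is a candidate for review, since its set point holds back purging for everyone.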
Displaying the Changelog Mask - To show the current changelog mask on a specific device (lustre-MDT0000): - # lctl get_param mdd.lustre-MDT0000.changelog_mask + To show the current changelog mask on a specific device + (lustre-MDT0000): + mds# lctl get_param mdd.lustre-MDT0000.changelog_mask mdd.lustre-MDT0000.changelog_mask= -MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RNMFM RNMTO OPEN CLOSE IOCTL\ - TRUNC SATTR XATTR HSM +MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT \ +TRUNC SATTR XATTR HSM MTIME CTIME MIGRT
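The mask output above can be checked against the record types that this patch marks with * as not recorded by default (OPEN, ATIME, GXATR and NOPEN). The Python sketch below is an illustration under that assumption. The function name and the `AUDIT_TYPES` constant are hypothetical, and the input is expected to be the text printed by `lctl get_param`.

```python
# Record types marked with * in the table of changelog record types,
# i.e. those that are not recorded by default.
AUDIT_TYPES = {"OPEN", "ATIME", "GXATR", "NOPEN"}


def missing_audit_types(get_param_output):
    """Return the sorted audit-related types absent from a changelog mask."""
    # Keep what follows the first "=", then drop line-continuation
    # backslashes before splitting the mask into type names.
    _, _, value = get_param_output.partition("=")
    enabled = set(value.replace("\\", " ").split())
    return sorted(AUDIT_TYPES - enabled)
```

An empty result means that every type in `AUDIT_TYPES` is already present in the mask.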
-
+
Setting the Changelog Mask - To set the current changelog mask on a specific device (lustre-MDT0000): - # lctl set_param mdd.lustre-MDT0000.changelog_mask=HLINK + To set the current changelog mask on a specific device + (lustre-MDT0000): + mds# lctl set_param mdd.lustre-MDT0000.changelog_mask=HLINK mdd.lustre-MDT0000.changelog_mask=HLINK $ lfs changelog_clear lustre-MDT0000 cl1 0 $ mkdir /mnt/lustre/mydir/foo $ cp /etc/hosts /mnt/lustre/mydir/foo/file $ ln /mnt/lustre/mydir/foo/file /mnt/lustre/mydir/myhardlink - Only item types that are in the mask show up in the changelog. + Only item types that are in the mask show up in the + changelog. $ lfs changelog lustre-MDT0000 -9 03HLINK 19:19:35.171867477 2010.03.24 0x0 t=[0x200000420:0x6:0x0] p=[0x20\ -0000420:0x3:0x0] myhardlink +9 03HLINK 16:06:35.291636498 2018.01.09 0x0 t=[0x200000402:0x4:0x0] ef=0xf \ +u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x3:0x0] myhardlink + +
+
+
+ <indexterm><primary>audit</primary> + <secondary>change logs</secondary></indexterm> +Audit with Changelogs + A specific use case for Lustre Changelogs is audit. According to a + definition found on + Wikipedia, information technology audits are used to evaluate the + organization's ability to protect its information assets and to properly + dispense information to authorized parties. Basically, audit consists in + controlling that all data accesses made were done according to the access + control policy in place. And usually, this is done by analyzing access + logs. + Audit can be used as a proof of security in place. But Audit can + also be a requirement to comply with regulations. + Lustre Changelogs are a good mechanism for audit, because this is a + centralized facility, and it is designed to be transactional. Changelog + records contain all information necessary for auditing purposes: + + + ability to identify object of action thanks to file identifiers + (FIDs) and name of targets + + + ability to identify subject of action thanks to UID/GID and NID + information + + + ability to identify time of action thanks to timestamp + + +
+ Enabling Audit + To have a fully functional Changelogs-based audit facility, some + additional Changelog record types must be enabled in order to record + events such as OPEN, ATIME, GETXATTR and DENIED OPEN. Please note that + enabling these record types may have some performance impact. For + instance, recording OPEN and GETXATTR events generates writes to the + Changelog for what is, from a file system standpoint, a read + operation. + Being able to record events such as OPEN or DENIED OPEN is + important from an audit perspective. For instance, if a Lustre file system + is used to store medical records on a system dedicated to Life Sciences, + data privacy is crucial. Administrators may need to know which doctors + accessed, or tried to access, a given medical record and when. + Conversely, they might need to know which medical records a given doctor + accessed. + To enable all changelog entry types, run: + mds# lctl set_param mdd.lustre-MDT0000.changelog_mask=ALL +mdd.lustre-MDT0000.changelog_mask=ALL + Once all required record types have been enabled, just register a + Changelogs user and the audit facility is operational. + Note, however, that it is possible to control which Lustre client + nodes can trigger the recording of file system access events to the + Changelogs, using the audit_mode flag on nodemap + entries. The reason to disable audit on a per-nodemap basis is to + prevent some nodes (e.g. backup, HSM agent nodes) from flooding the + audit logs. When the audit_mode flag is + set to 1 on a nodemap entry, a client belonging to this nodemap can + record file system access events to the Changelogs, if + Changelogs are otherwise activated. When it is set to 0, events are not logged + into the Changelogs, whether or not Changelogs are activated. By + default, the audit_mode flag is set to 1 in newly created + nodemap entries. It is also set to 1 in the 'default' nodemap.
+ To prevent nodes belonging to a nodemap from generating Changelog + entries, run: + +mgs# lctl nodemap_modify --name nm1 --property audit_mode --value 0 +
+
+ Audit examples +
+ + <literal>OPEN</literal> + + An OPEN changelog entry is in the form: + +7 10OPEN 13:38:51.510728296 2017.07.25 0x242 t=[0x200000401:0x2:0x0] \ +ef=0x7 u=500:500 nid=10.128.11.159@tcp m=-w- + It includes information about the open mode, in the form + m=rwx. + OPEN entries are recorded only once per UID/GID, for a given + open mode, as long as the file is not closed by this UID/GID. This + avoids flooding the Changelogs, for instance when an MPI job + opens the same file thousands of times from different threads. It + reduces the Changelog load significantly, without significantly + affecting the audit information. Similarly, only the last CLOSE per + UID/GID is recorded.
+
+ + <literal>GETXATTR</literal> + + A GETXATTR changelog entry is in the form: + +8 23GXATR 09:22:55.886793012 2017.07.27 0x0 t=[0x200000402:0x1:0x0] \ +ef=0xf u=500:500 nid=10.128.11.159@tcp x=user.name0 + It includes information about the name of the extended attribute + being accessed, in the form x=<xattr name>. + +
+
+ + <literal>SETXATTR</literal> + + A SETXATTR changelog entry is in the form: + +4 15XATTR 09:41:36.157333594 2018.01.10 0x0 t=[0x200000402:0x1:0x0] \ +ef=0xf u=500:500 nid=10.128.11.159@tcp x=user.name0 + It includes information about the name of the extended attribute + being modified, in the form x=<xattr name>. + +
+
+ + <literal>DENIED OPEN</literal> + + A DENIED OPEN changelog entry is in the form: + +4 24NOPEN 15:45:44.947406626 2017.08.31 0x2 t=[0x200000402:0x1:0x0] \ +ef=0xf u=500:500 nid=10.128.11.158@tcp m=-w- + It has the same information as a regular OPEN entry. In order to + avoid flooding the Changelogs, DENIED OPEN entries are rate limited: + no more than one entry is recorded per user per file per time interval. + This interval (in seconds) is configurable via + mdd.<mdtname>.changelog_deniednext + (default value is 60 seconds). + +mds# lctl set_param mdd.lustre-MDT0000.changelog_deniednext=120 +mdd.lustre-MDT0000.changelog_deniednext=120 +mds# lctl get_param mdd.lustre-MDT0000.changelog_deniednext +mdd.lustre-MDT0000.changelog_deniednext=120
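The rate limiting described above (at most one DENIED OPEN entry per user per file per interval) can be pictured with a small Python model. This is a sketch of the stated behaviour only, not the MDD implementation. The class name `DeniedOpenLimiter` is hypothetical, and it assumes that the key is the (UID, FID) pair and that the interval is measured from the last recorded entry.

```python
class DeniedOpenLimiter:
    """Toy model of per-user, per-file rate limiting of DENIED OPEN entries."""

    def __init__(self, interval=60):
        # Mirrors the documented default of 60 seconds.
        self.interval = interval
        self._last_recorded = {}

    def should_record(self, uid, fid, now):
        """Return True if a denied open at time `now` (seconds) is recorded."""
        key = (uid, fid)
        last = self._last_recorded.get(key)
        if last is not None and now - last < self.interval:
            return False  # still inside the interval: suppress the entry
        self._last_recorded[key] = now
        return True
```

With the default interval, a second denied open by the same user on the same file 30 seconds after a recorded one is suppressed, while a denied open by a different user on that file is still recorded.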
-- 1.8.3.1