From: Andreas Dilger
Date: Fri, 29 Sep 2023 00:42:10 +0000 (-0600)
Subject: LU-7004 utils: prefer 'set_param -P' over 'conf_param'
X-Git-Url: https://git.whamcloud.com/?a=commitdiff_plain;h=19c2cce1fa5a52aa2536cc75833bc8c0d4593d7d;p=doc%2Fmanual.git

LU-7004 utils: prefer 'set_param -P' over 'conf_param'

Replace examples using "lctl conf_param" with "lctl set_param -P"
since this has been available since Lustre 2.5 and is the easier
interface to use.

Signed-off-by: Andreas Dilger
Change-Id: I9d534355d476a27d1bfba388a8269d3f4b3ebbe5
Reviewed-on: https://review.whamcloud.com/c/doc/manual/+/52553
Tested-by: jenkins
---

diff --git a/DataOnMDT.xml b/DataOnMDT.xml
index 330fd2e..de77c39 100644
--- a/DataOnMDT.xml
+++ b/DataOnMDT.xml
@@ -428,19 +428,25 @@ error: setstripe: create composite file '/mnt/lustre/dom' failed: Invalid argument
 Persistent Set Command
- To persistently set the value of the parameter, the
- lctl conf_param command is used:
- lctl conf_param <fsname>-MDT<index>.lod.dom_stripesize=<value>
-
+ To persistently set the value of the parameter on a
+ specific MDT, the
+ lctl set_param -P command is used:
+
+lctl set_param -P lod.fsname-MDTindex.dom_stripesize=value
+
+ This can also use a wildcard '*' for the
+ index to apply to all MDTs.
+
 Persistent Set Examples
- The new value of the parameter is saved in config log
- permanently:
- mgs# lctl conf_param lustre-MDT0000.lod.dom_stripesize=512K
+ The new value of the parameter is saved in the MGS
+ parameters log permanently:
+
+mgs# lctl set_param -P lod.lustre-MDT0000.dom_stripesize=512K
 mds# lctl get_param -n lod.*MDT0000*.dom_stripesize
-524288
- New settings are applied in few seconds and saved persistently in
- server config.
+524288
+
+ New values are applied on the matching MDTs within a few seconds.
@@ -453,10 +459,10 @@ mds# lctl get_param -n lod.*MDT0000*.dom_stripesize dom disabledom Disable DoM - When lctl set_param or - lctl conf_param sets + When lctl set_param (whether with + -P or not) sets dom_stripesize to 0, DoM - component creation will be disabled on the selected server, and + component creation will be disabled on the specified server(s), and any new layouts with a specified DoM component will have that component removed from the file layout. Existing files and layouts with DoM components on that MDT are not changed. diff --git a/LustreMaintenance.xml b/LustreMaintenance.xml index 01ed082..162e71a 100644 --- a/LustreMaintenance.xml +++ b/LustreMaintenance.xml @@ -161,9 +161,10 @@ Regenerating Lustre Configuration Logs If the MGS still holds any configuration logs, it may be - possible to dump these logs to save any parameters stored with - lctl conf_param by dumping the config logs on - the MGS and saving the output: + possible to dump these logs to save any parameters stored with + lctl conf_param by dumping the config logs on + the MGS and saving the output (once for each MDT and OST device): + mgs# lctl --device MGS llog_print fsname-client mgs# lctl --device MGS llog_print fsname-MDT0000 @@ -401,7 +402,7 @@ mds# mount –t lustre /dev/mdt4_blockdevice /mnt/mdt client# lfs mkdir -i 3 /mnt/testfs/new_dir_on_mdt3 client# lfs mkdir -i 4 /mnt/testfs/new_dir_on_mdt4 -client# lfs mkdir -c 4 /mnt/testfs/new_directory_striped_across_4_mdts +client# lfs mkdir -c 4 /mnt/testfs/project/new_large_dir_striped_over_4_mdts @@ -572,7 +573,8 @@ client$ lfs getstripe --mdt-index /mnt/lustre/local_dir0 located there, and migrating files on the OST would fail. - Do not use lctl conf_param to + Do not use lctl set_param -P or + lctl conf_param to deactivate the OST if it is still working, as this immediately and permanently deactivates it in the file system configuration on both the MDS and all clients. 
diff --git a/LustreMonitoring.xml b/LustreMonitoring.xml index 43814a5..3e17de1 100644 --- a/LustreMonitoring.xml +++ b/LustreMonitoring.xml @@ -284,61 +284,70 @@ Working with Changelogs lctl changelog_register Because changelog records take up space on the MDT, the system - administration must register changelog users. As soon as a changelog - user is registered, the Changelogs feature is enabled. The registrants - specify which records they are "done with", and the system - purges up to the greatest common record. - To register a new changelog user, run: - mds# lctl --device fsname-MDTnumber changelog_register + administration must register changelog users. As soon as a changelog + user is registered, the Changelogs feature is enabled. The registrants + specify which records they are "done with", and the system + purges up to the greatest common record. + To register a new changelog user, run: + +mds# lctl --device fsname-MDTnumber changelog_register Changelog entries are not purged beyond a registered user's - set point (see lfs changelog_clear). + set point (see lfs changelog_clear).
<literal>lfs changelog</literal> To display the metadata changes on an MDT (the changelog records), - run: - lfs changelog fsname-MDTnumber [startrec [endrec]] + run: + +client# lfs changelog fsname-MDTnumber [startrec [endrec]] + It is optional whether to specify the start and end - records. + records. These are sample changelog records: - 1 02MKDIR 15:15:21.977666834 2018.01.09 0x0 t=[0x200000402:0x1:0x0] j=mkdir.500 ef=0xf \ + +1 02MKDIR 15:15:21.977666834 2018.01.09 0x0 t=[0x200000402:0x1:0x0] j=mkdir.500 ef=0xf \ u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics 2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] j=cp.500 ef=0xf \ u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg 3 06UNLNK 15:15:41.305116815 2018.01.09 0x1 t=[0x200000402:0x2:0x0] j=rm.500 ef=0xf \ u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg 4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] j=rmdir.500 ef=0xf \ -u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics +u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics +
<literal>lfs changelog_clear</literal> To clear old changelog records for a specific user (records that - the user no longer needs), run: - lfs changelog_clear mdt_name userid endrec + the user no longer needs), run: + +client# lfs changelog_clear mdt_name userid endrec + The changelog_clear command indicates that - changelog records previous to endrec are no - longer of interest to a particular user - userid, potentially allowing the MDT to free - up disk space. An endrec - value of 0 indicates the current last record. To run - changelog_clear, the changelog user must be - registered on the MDT node using lctl. + changelog records previous to endrec are no + longer of interest to a particular user + userid, potentially allowing the MDT to free + up disk space. An endrec + value of 0 indicates the current last record. To run + changelog_clear, the changelog user must be + registered on the MDT node using lctl. When all changelog users are done with records < X, the records - are deleted. + are deleted.
<literal>lctl changelog_deregister</literal> To deregister (unregister) a changelog user, run: - mds# lctl --device mdt_device changelog_deregister userid + +mds# lctl --device mdt_device changelog_deregister userid + changelog_deregister cl1 effectively does a - lfs changelog_clear cl1 0 as it deregisters. + lfs changelog_clear cl1 0 as it deregisters.
@@ -348,15 +357,18 @@ u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics
Registering a Changelog User To register a new changelog user for a device - (lustre-MDT0000): - mds# lctl --device lustre-MDT0000 changelog_register -lustre-MDT0000: Registered changelog userid 'cl1' + (lustre-MDT0000): + +mds# lctl --device lustre-MDT0000 changelog_register +lustre-MDT0000: Registered changelog userid 'cl1' +
Displaying Changelog Records - To display changelog records on an MDT - (lustre-MDT0000): - $ lfs changelog lustre-MDT0000 + To display changelog records for an MDT + (e.g. lustre-MDT0000): + +client# lfs changelog lustre-MDT0000 1 02MKDIR 15:15:21.977666834 2018.01.09 0x0 t=[0x200000402:0x1:0x0] ef=0xf \ u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics 2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] ef=0xf \ @@ -364,62 +376,68 @@ u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg 3 06UNLNK 15:15:41.305116815 2018.01.09 0x1 t=[0x200000402:0x2:0x0] ef=0xf \ u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg 4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] ef=0xf \ -u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics +u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics + Changelog records include this information: - rec# -operation_type(numerical/text) -timestamp -datestamp -flags -t=target_FID -ef=extended_flags -u=uid:gid -nid=client_NID -p=parent_FID -target_name + +rec# operation_type(numerical/text) timestamp datestamp flags +t=target_FID ef=extended_flags u=uid:gid nid=client_NID p=parent_FID target_name + Displayed in this format: - rec# operation_type(numerical/text) timestamp datestamp flags t=target_FID \ -ef=extended_flags u=uid:gid nid=client_NID p=parent_FID target_name + +rec# operation_type(numerical/text) timestamp datestamp flags t=target_FID \ +ef=extended_flags u=uid:gid nid=client_NID p=parent_FID target_name + For example: - 2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] ef=0xf \ -u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg + +2 01CREAT 15:15:36.687592024 2018.01.09 0x0 t=[0x200000402:0x2:0x0] ef=0xf \ +u=500:500 nid=10.128.11.159@tcp p=[0x200000402:0x1:0x0] chloe.jpg +
Clearing Changelog Records To notify a device that a specific user (cl1) - no longer needs records (up to and including 3): - $ lfs changelog_clear lustre-MDT0000 cl1 3 + no longer needs records (up to and including 3): + +# lfs changelog_clear lustre-MDT0000 cl1 3 + To confirm that the changelog_clear operation - was successful, run lfs changelog; only records after - id-3 are listed: - $ lfs changelog lustre-MDT0000 + was successful, run lfs changelog; only records after + id-3 are listed: + +# lfs changelog lustre-MDT0000 4 07RMDIR 15:15:46.468790091 2018.01.09 0x1 t=[0x200000402:0x1:0x0] ef=0xf \ -u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics +u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x1:0x0] pics +
Deregistering a Changelog User To deregister a changelog user (cl1) for a - specific device (lustre-MDT0000): - mds# lctl --device lustre-MDT0000 changelog_deregister cl1 -lustre-MDT0000: Deregistered changelog user 'cl1' + specific device (lustre-MDT0000): + +mds# lctl --device lustre-MDT0000 changelog_deregister cl1 +lustre-MDT0000: Deregistered changelog user 'cl1' + The deregistration operation clears all changelog records for the - specified user (cl1). - $ lfs changelog lustre-MDT0000 + specified user (cl1). + +client# lfs changelog lustre-MDT0000 5 00MARK 15:56:39.603643887 2018.01.09 0x0 t=[0x20001:0x0:0x0] ef=0xf \ u=500:500 nid=0@<0:0> p=[0:0x50:0xb] mdd_obd-lustre-MDT0000-0 MARK records typically indicate changelog recording status - changes. + changes.
Displaying the Changelog Index and Registered Users To display the current, maximum changelog index and registered - changelog users for a specific device - (lustre-MDT0000): - mds# lctl get_param mdd.lustre-MDT0000.changelog_users -mdd.lustre-MDT0000.changelog_users=current index: 8 + changelog users for a specific device + (lustre-MDT0000): + +mds# lctl get_param mdd.lustre-MDT0000.changelog_users +mdd.lustre-MDT0000.changelog_users=current index: 8 ID index (idle seconds) cl2 8 (180) @@ -427,8 +445,9 @@ cl2 8 (180)
Displaying the Changelog Mask To show the current changelog mask on a specific device - (lustre-MDT0000): - mds# lctl get_param mdd.lustre-MDT0000.changelog_mask + (lustre-MDT0000): + +mds# lctl get_param mdd.lustre-MDT0000.changelog_mask mdd.lustre-MDT0000.changelog_mask= MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT \ @@ -438,17 +457,19 @@ TRUNC SATTR XATTR HSM MTIME CTIME MIGRT
Setting the Changelog Mask To set the current changelog mask on a specific device - (lustre-MDT0000): - mds# lctl set_param mdd.lustre-MDT0000.changelog_mask=HLINK -mdd.lustre-MDT0000.changelog_mask=HLINK -$ lfs changelog_clear lustre-MDT0000 cl1 0 + (lustre-MDT0000): + +mds# lctl set_param mdd.lustre-MDT0000.changelog_mask=HLINK +mdd.lustre-MDT0000.changelog_mask=HLINK +$ lfs changelog_clear lustre-MDT0000 cl1 0 $ mkdir /mnt/lustre/mydir/foo $ cp /etc/hosts /mnt/lustre/mydir/foo/file $ ln /mnt/lustre/mydir/foo/file /mnt/lustre/mydir/myhardlink Only item types that are in the mask show up in the - changelog. - $ lfs changelog lustre-MDT0000 + changelog. + +# lfs changelog lustre-MDT0000 9 03HLINK 16:06:35.291636498 2018.01.09 0x0 t=[0x200000402:0x4:0x0] ef=0xf \ u=500:500 nid=10.128.11.159@tcp p=[0x200000007:0x3:0x0] myhardlink @@ -474,120 +495,128 @@ Audit with Changelogs centralized facility, and it is designed to be transactional. Changelog records contain all information necessary for auditing purposes: - + ability to identify object of action thanks to file identifiers - (FIDs) and name of targets - - - ability to identify subject of action thanks to UID/GID and NID - information - - - ability to identify time of action thanks to timestamp - + (FIDs) and name of targets + + + ability to identify subject of action thanks to UID/GID and NID + information + + + ability to identify time of action thanks to timestamp +
Enabling Audit - To have a fully functional Changelogs-based audit facility, some - additional Changelog record types must be enabled, to be able to record - events such as OPEN, ATIME, GETXATTR and DENIED OPEN. Please note that - enabling these record types may have some performance impact. For - instance, recording OPEN and GETXATTR events generate writes in the - Changelog records for a read operation from a file-system - standpoint. - Being able to record events such as OPEN or DENIED OPEN is - important from an audit perspective. For instance, if Lustre file system - is used to store medical records on a system dedicated to Life Sciences, - data privacy is crucial. Administrators may need to know which doctors - accessed, or tried to access, a given medical record and when. And - conversely, they might need to know which medical records a given doctor - accessed. - To enable all changelog entry types, do: - mds# lctl set_param mdd.lustre-MDT0000.changelog_mask=ALL -mdd.seb-MDT0000.changelog_mask=ALL - Once all required record types have been enabled, just register a - Changelogs user and the audit facility is operational. - Note that, however, it is possible to control which Lustre client - nodes can trigger the recording of file system access events to the - Changelogs, thanks to the audit_mode flag on nodemap - entries. The reason to disable audit on a per-nodemap basis is to - prevent some nodes (e.g. backup, HSM agent nodes) from flooding the - audit logs. When audit_mode flag is - set to 1 on a nodemap entry, a client pertaining to this nodemap will be - able to record file system access events to the Changelogs, if - Changelogs are otherwise activated. When set to 0, events are not logged - into the Changelogs, no matter if Changelogs are activated or not. By - default, audit_mode flag is set to 1 in newly created - nodemap entries. And it is also set to 1 in 'default' nodemap. 
- To prevent nodes pertaining to a nodemap to generate Changelog - entries, do: - -mgs# lctl nodemap_modify --name nm1 --property audit_mode --value 0 + To have a fully functional Changelogs-based audit facility, some + additional Changelog record types must be enabled, to be able to record + events such as OPEN, ATIME, GETXATTR and DENIED OPEN. Please note that + enabling these record types may have some performance impact. For + instance, recording OPEN and GETXATTR events generate writes in the + Changelog records for a read operation from a file-system + standpoint. + Being able to record events such as OPEN or DENIED OPEN is + important from an audit perspective. For instance, if Lustre file system + is used to store medical records on a system dedicated to Life Sciences, + data privacy is crucial. Administrators may need to know which doctors + accessed, or tried to access, a given medical record and when. And + conversely, they might need to know which medical records a given doctor + accessed. + To enable all changelog entry types, do: + +mds# lctl set_param mdd.lustre-MDT0000.changelog_mask=ALL +mdd.seb-MDT0000.changelog_mask=ALL + + Once all required record types have been enabled, just register a + Changelogs user and the audit facility is operational. + Note that, however, it is possible to control which Lustre client + nodes can trigger the recording of file system access events to the + Changelogs, thanks to the audit_mode flag on nodemap + entries. The reason to disable audit on a per-nodemap basis is to + prevent some nodes (e.g. backup, HSM agent nodes) from flooding the + audit logs. When audit_mode flag is + set to 1 on a nodemap entry, a client pertaining to this nodemap will be + able to record file system access events to the Changelogs, if + Changelogs are otherwise activated. When set to 0, events are not logged + into the Changelogs, no matter if Changelogs are activated or not. 
By + default, audit_mode flag is set to 1 in newly created + nodemap entries. And it is also set to 1 in 'default' nodemap. + To prevent nodes pertaining to a nodemap to generate Changelog + entries, do: + +mgs# lctl nodemap_modify --name nm1 --property audit_mode --value 0 +
Audit examples -
+
- <literal>OPEN</literal> - - An OPEN changelog entry is in the form: - + OPEN + + An OPEN changelog entry is in the form: + 7 10OPEN 13:38:51.510728296 2017.07.25 0x242 t=[0x200000401:0x2:0x0] \ -ef=0x7 u=500:500 nid=10.128.11.159@tcp m=-w- - It includes information about the open mode, in the form - m=rwx. - OPEN entries are recorded only once per UID/GID, for a given - open mode, as long as the file is not closed by this UID/GID. It - avoids flooding the Changelogs for instance if there is an MPI job - opening the same file thousands of times from different threads. It - reduces the ChangeLog load significantly, without significantly - affecting the audit information. Similarly, only the last CLOSE per - UID/GID is recorded. -
-
+ef=0x7 u=500:500 nid=10.128.11.159@tcp m=-w- + + It includes information about the open mode, in the form + m=rwx. + OPEN entries are recorded only once per UID/GID, for a given + open mode, as long as the file is not closed by this UID/GID. It + avoids flooding the Changelogs for instance if there is an MPI job + opening the same file thousands of times from different threads. It + reduces the ChangeLog load significantly, without significantly + affecting the audit information. Similarly, only the last CLOSE per + UID/GID is recorded. +
+
- <literal>GETXATTR</literal> - - A GETXATTR changelog entry is in the form: - + GETXATTR + + A GETXATTR changelog entry is in the form: + 8 23GXATR 09:22:55.886793012 2017.07.27 0x0 t=[0x200000402:0x1:0x0] \ -ef=0xf u=500:500 nid=10.128.11.159@tcp x=user.name0 - It includes information about the name of the extended attribute - being accessed, in the form x=<xattr name>. - -
-
+ef=0xf u=500:500 nid=10.128.11.159@tcp x=user.name0 + + It includes information about the name of the extended attribute + being accessed, in the form x=<xattr name>. + +
+
- <literal>SETXATTR</literal> - - A SETXATTR changelog entry is in the form: - + SETXATTR + + A SETXATTR changelog entry is in the form: + 4 15XATTR 09:41:36.157333594 2018.01.10 0x0 t=[0x200000402:0x1:0x0] \ -ef=0xf u=500:500 nid=10.128.11.159@tcp x=user.name0 - It includes information about the name of the extended attribute - being modified, in the form x=<xattr name>. - -
-
+ef=0xf u=500:500 nid=10.128.11.159@tcp x=user.name0 + + It includes information about the name of the extended attribute + being modified, in the form x=<xattr name>. + +
+
- <literal>DENIED OPEN</literal> - - A DENIED OPEN changelog entry is in the form: - + DENIED OPEN + + A DENIED OPEN changelog entry is in the form: + 4 24NOPEN 15:45:44.947406626 2017.08.31 0x2 t=[0x200000402:0x1:0x0] \ -ef=0xf u=500:500 nid=10.128.11.158@tcp m=-w- - It has the same information as a regular OPEN entry. In order to - avoid flooding the Changelogs, DENIED OPEN entries are rate limited: - no more than one entry per user per file per time interval, this time - interval (in seconds) being configurable via - mdd.<mdtname>.changelog_deniednext - (default value is 60 seconds). - +ef=0xf u=500:500 nid=10.128.11.158@tcp m=-w- + + It has the same information as a regular OPEN entry. In order to + avoid flooding the Changelogs, DENIED OPEN entries are rate limited: + no more than one entry per user per file per time interval, this time + interval (in seconds) being configurable via + mdd.<mdtname>.changelog_deniednext + (default value is 60 seconds). + mds# lctl set_param mdd.lustre-MDT0000.changelog_deniednext=120 mdd.seb-MDT0000.changelog_deniednext=120 mds# lctl get_param mdd.lustre-MDT0000.changelog_deniednext -mdd.seb-MDT0000.changelog_deniednext=120 -
+mdd.seb-MDT0000.changelog_deniednext=120 + +
@@ -614,7 +643,7 @@ Lustre Jobstats JobID to the server with the I/O operation. The server tracks statistics for operations whose JobID is given, indexed by that ID. - + A Lustre setting on the client, jobid_var, specifies which environment variable to holds the JobID for that process Any environment variable can be specified. For example, SLURM sets the @@ -622,7 +651,7 @@ Lustre Jobstats job ID on each client when the job is first launched on a node, and the SLURM_JOB_ID will be inherited by all child processes started below that process. - + Lustre can be configured to generate a synthetic JobID from the client's process name and numeric UID, by setting jobid_var=procname_uid. This will generate a @@ -630,7 +659,7 @@ Lustre Jobstats nodes, but cannot distinguish whether the binary is part of a single distributed process or multiple independent processes. - + In Lustre 2.8 and later it is possible to set jobid_var=nodelocal and then also set jobid_name=name, which @@ -683,7 +712,7 @@ Lustre Jobstats SLURM_JOB_ID on all clients managed by SLURM, and use procname_uid on clients not managed by SLURM, such as interactive login nodes. - + It is not possible to have different jobid_var settings on a single node, since it is unlikely that multiple job schedulers are active on one client. @@ -698,14 +727,16 @@ Enable/Disable Jobstats Jobstats are disabled by default. The current state of jobstats can be verified by checking lctl get_param jobid_var on a client: - -$ lctl get_param jobid_var + +client# lctl get_param jobid_var jobid_var=disable - + To enable jobstats on the testfs file system with SLURM: - # lctl conf_param testfs.sys.jobid_var=SLURM_JOB_ID - The lctl conf_param command to enable or disable + +mgs# lctl set_param -P jobid_var=SLURM_JOB_ID + + The lctl set_param command to enable or disable jobstats should be run on the MGS as root. 
The change is persistent, and will be propagated to the MDS, OSS, and client nodes automatically when it is set on the MGS and for each new client mount. @@ -715,8 +746,10 @@ jobid_var=disable use a job scheduler at all, run the lctl set_param command directly on the client node(s) after the filesystem is mounted. For example, to enable the procname_uid synthetic - JobID on a login node run: - # lctl set_param jobid_var=procname_uid + JobID locally on a login node run: + +client# lctl set_param jobid_var=procname_uid + The lctl set_param setting is not persistent, and will be reset if the global jobid_var is set on the MGS or if the filesystem is unmounted. @@ -792,11 +825,15 @@ jobid_var=disable There are two special values for jobid_var: disable and procname_uid. To disable jobstats, specify jobid_var as disable: - # lctl conf_param testfs.sys.jobid_var=disable + +mgs# lctl set_param -P jobid_var=disable + To track job stats per process name and user ID (for debugging, or if no job scheduler is in use on some nodes such as login nodes), specify jobid_var as procname_uid: - # lctl conf_param testfs.sys.jobid_var=procname_uid + +client# lctl set_param jobid_var=procname_uid +
<indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm> @@ -805,8 +842,8 @@ Check Job Stats all file systems and all jobs on the MDT via the lctl get_param mdt.*.job_stats. For example, clients running with jobid_var=procname_uid: - -# lctl get_param mdt.*.job_stats + +mds# lctl get_param mdt.*.job_stats job_stats: - job_id: bash.0 snapshot_time: 1352084992 @@ -844,12 +881,12 @@ job_stats: sync: { samples: 33190, unit: reqs } samedir_rename: { samples: 0, unit: reqs } crossdir_rename: { samples: 0, unit: reqs } - + Data operation statistics are collected on OSTs. Data operations statistics can be accessed via lctl get_param obdfilter.*.job_stats, for example: - -$ lctl get_param obdfilter.*.job_stats + +oss# lctl get_param obdfilter.*.job_stats obdfilter.myth-OST0000.job_stats= job_stats: - job_id: mythcommflag.0 @@ -886,24 +923,32 @@ job_stats: setattr: { samples: 0, unit: reqs } punch: { samples: 1, unit: reqs } sync: { samples: 0, unit: reqs } - +
<indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm> Clear Job Stats Accumulated job statistics can be reset by writing proc file job_stats. Clear statistics for all jobs on the local node: - # lctl set_param obdfilter.*.job_stats=clear + +oss# lctl set_param obdfilter.*.job_stats=clear + Clear statistics only for job 'bash.0' on lustre-MDT0000: - # lctl set_param mdt.lustre-MDT0000.job_stats=bash.0 + +mds# lctl set_param mdt.lustre-MDT0000.job_stats=bash.0 +
<indexterm><primary>monitoring</primary><secondary>jobstats</secondary></indexterm> Configure Auto-cleanup Interval By default, if a job is inactive for 600 seconds (10 minutes) statistics for this job will be dropped. This expiration value can be changed temporarily via: - # lctl set_param *.*.job_cleanup_interval={max_age} + +mds# lctl set_param *.*.job_cleanup_interval={max_age} + It can also be changed permanently, for example to 700 seconds via: - # lctl conf_param testfs.mdt.job_cleanup_interval=700 + +mgs# lctl set_param -P mdt.testfs-*.job_cleanup_interval=700 + The job_cleanup_interval can be set as 0 to disable the auto-cleanup. Note that if auto-cleanup of Jobstats is disabled, then all statistics will be kept in memory forever, which may eventually consume all memory on the servers. In this case, any monitoring tool should explicitly clear individual job statistics as they are processed, as shown above.
diff --git a/LustreOperations.xml b/LustreOperations.xml index 44d737f..a554b2e 100644 --- a/LustreOperations.xml +++ b/LustreOperations.xml @@ -18,24 +18,22 @@ The file system name is limited to 8 characters. We have encoded the file system and target information in the disk label, so you can mount by label. This allows system administrators to move disks around without - worrying about issues such as SCSI disk reordering or getting the + worrying about issues such as SCSI disk reordering or getting the /dev/device wrong for a shared target. Soon, file system naming will be made as fail-safe as possible. Currently, Linux disk labels are limited to 16 characters. To identify the target within the file system, 8 characters are reserved, leaving 8 characters for the file system name: - -fsname-MDT0000 or + +fsname-MDT0000 or fsname-OST0a19 To mount by label, use this command: - -mount -t lustre -L -file_system_label -/mount_point + +mount -t lustre -L file_system_label /mount_point This is an example of mount-by-label: - + mds# mount -t lustre -L testfs-MDT0000 /mnt/mdt @@ -46,9 +44,8 @@ mds# mount -t lustre -L testfs-MDT0000 /mnt/mdt Although the file system name is internally limited to 8 characters, you can mount the clients at any mount point, so file system users are not subjected to short names. Here is an example: - -client# mount -t lustre mds0@tcp0:/short -/dev/long_mountpoint_name + +client# mount -t lustre mds0@tcp0:/short /dev/long_mountpoint_name
@@ -88,20 +85,20 @@ client# mount -t lustre mds0@tcp0:/short mounting Mounting a Server Starting a Lustre server is straightforward and only involves the - mount command. Lustre servers can be added to - /etc/fstab: - + mount command. Lustre servers can be added to /etc/fstab: + + mount -t lustre The mount command generates output similar to this: - + /dev/sda1 on /mnt/test/mdt type lustre (rw) /dev/sda2 on /mnt/test/ost0 type lustre (rw) 192.168.0.21@tcp:/testfs on /mnt/testfs type lustre (rw) In this example, the MDT, an OST (ost0) and file system (testfs) are mounted. - + LABEL=testfs-MDT0000 /mnt/test/mdt lustre defaults,_netdev,noauto 0 0 LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0 @@ -110,22 +107,19 @@ LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0 not using failover, make sure that networking has been started before mounting a Lustre server. If you are running Red Hat Enterprise Linux, SUSE Linux Enterprise Server, Debian operating system (and perhaps others), use - the - _netdev flag to ensure that these disks are mounted after - the network is up, unless you are using systemd 232 or greater, which + the _netdev flag to ensure that these disks are mounted + after the network is up, unless you are using systemd 232 or greater, which recognize lustre as a network filesystem. If you are using lnet.service, use x-systemd.requires=lnet.service regardless of systemd version. We are mounting by disk label here. The label of a device can be read - with - e2label. The label of a newly-formatted Lustre server - may end in - FFFF if the - --index option is not specified to + with e2label. The label of a newly-formatted Lustre + server may end in FFFF if the + --index option is not specified to mkfs.lustre, meaning that it has yet to be assigned. The assignment takes place when the server is first started, and the disk label - is updated. It is recommended that the + is updated. 
It is recommended that the --index option always be used, which will also ensure that the label is set at format time. @@ -158,29 +152,38 @@ LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0 umount -a -t lustre The example below shows the unmount of the testfs filesystem on a client node: - [root@client1 ~]# mount |grep testfs + + +[root@client1 ~]# mount -t lustre XXX.XXX.0.11@tcp:/testfs on /mnt/testfs type lustre (rw,lazystatfs) [root@client1 ~]# umount -a -t lustre -[154523.177714] Lustre: Unmounted testfs-client +[154523.177714] Lustre: Unmounted testfs-client + + - Unmount the MDT and MGT - On the MGS and MDS node(s), run the + + Unmount the MDT and MGT + On the MGS and MDS node(s), run the umount command: - umount -a -t lustre - The example below shows the unmount of the MDT and MGT for + umount -a -t lustre + The example below shows the unmount of the MDT and MGT for the testfs filesystem on a combined MGS/MDS: - - [root@mds1 ~]# mount |grep lustre + + + +[root@mds1 ~]# mount -t lustre /dev/sda on /mnt/mgt type lustre (ro) /dev/sdb on /mnt/mdt type lustre (ro) [root@mds1 ~]# umount -a -t lustre [155263.566230] Lustre: Failing over testfs-MDT0000 [155263.775355] Lustre: server umount testfs-MDT0000 complete -[155269.843862] Lustre: server umount MGS complete - For a seperate MGS and MDS, the same command is used, first on - the MDS and then followed by the MGS. +[155269.843862] Lustre: server umount MGS complete + + + For a separate MGS and MDS, the same command is used, first on + the MDS and then followed by the MGS. 
Unmount all the OSTs On each OSS node, use the umount command: @@ -190,14 +193,18 @@ XXX.XXX.0.11@tcp:/testfs on /mnt/testfs type lustre (rw,lazystatfs) testfs filesystem on server OSS1: - [root@oss1 ~]# mount |grep lustre + + +[root@oss1 ~]# mount |grep lustre /dev/sda on /mnt/ost0 type lustre (ro) /dev/sdb on /mnt/ost1 type lustre (ro) /dev/sdc on /mnt/ost2 type lustre (ro) [root@oss1 ~]# umount -a -t lustre -[155336.491445] Lustre: Failing over testfs-OST0002 -[155336.556752] Lustre: server umount testfs-OST0002 complete +Lustre: Failing over testfs-OST0002 +Lustre: server umount testfs-OST0002 complete + + For unmount command syntax for a single OST, MDT, or MGT target @@ -210,15 +217,17 @@ XXX.XXX.0.11@tcp:/testfs on /mnt/testfs type lustre (rw,lazystatfs) unmounting Unmounting a Specific Target on a Server To stop a Lustre OST, MDT, or MGT , use the - umount + umount /mount_point command. The example below stops an OST, ost0, on mount point /mnt/ost0 for the testfs filesystem: - [root@oss1 ~]# umount /mnt/ost0 -[ 385.142264] Lustre: Failing over testfs-OST0000 -[ 385.210810] Lustre: server umount testfs-OST0000 complete - Gracefully stopping a server with the + +[root@oss1 ~]# umount /mnt/ost0 +Lustre: Failing over testfs-OST0000 +Lustre: server umount testfs-OST0000 complete + + Gracefully stopping a server with the umount command preserves the state of the connected clients. The next time the server is started, it waits for clients to reconnect, and then goes through the recovery procedure. @@ -228,7 +237,7 @@ XXX.XXX.0.11@tcp:/testfs on /mnt/testfs type lustre (rw,lazystatfs) recovery. Any currently connected clients receive I/O errors until they reconnect. - If you are using loopback devices, use the + If you are using loopback devices, use the -d flag. This flag cleans up loop devices and can always be safely specified. 
@@ -244,53 +253,44 @@ XXX.XXX.0.11@tcp:/testfs on /mnt/testfs type lustre (rw,lazystatfs) of two ways: - In - failout mode, Lustre clients immediately receive - errors (EIOs) after a timeout, instead of waiting for the OST to - recover. + In failout mode, Lustre clients immediately + receive errors (EIOs) after a timeout, instead of waiting for the OST + to recover. - In - failover mode, Lustre clients wait for the OST to - recover. + In failover mode, Lustre clients wait for the + OST to recover. - By default, the Lustre file system uses - failover mode for OSTs. To specify - failout mode instead, use the + By default, the Lustre file system uses + failover mode for OSTs. To specify + failout mode instead, use the --param="failover.mode=failout" option as shown below (entered on one line): - -oss# mkfs.lustre --fsname= -fsname --mgsnode= -mgs_NID --param=failover.mode=failout - --ost --index= -ost_index -/dev/ost_block_device - - In the example below, - failout mode is specified for the OSTs on the MGS - mds0 in the file system + +oss# mkfs.lustre --fsname=fsname --mgsnode=mgs_NID \ + --param=failover.mode=failout --ost --index=ost_index /dev/ost_block_device + + In the example below, + failout mode is specified for the OSTs on the MGS + mds0 in the file system testfs(entered on one line). - -oss# mkfs.lustre --fsname=testfs --mgsnode=mds0 --param=failover.mode=failout - --ost --index=3 /dev/sdb + +oss# mkfs.lustre --fsname=testfs --mgsnode=mds0 --param=failover.mode=failout \ + --ost --index=3 /dev/sdb Before running this command, unmount all OSTs that will be affected - by a change in - failover/ - failout mode. + by a change in failover/failout mode. + - After initial file system configuration, use the + After initial file system configuration, use the tunefs.lustre utility to change the mode. 
For example, - to set the - failout mode, run: + to set the failout mode, run: - -$ tunefs.lustre --param failover.mode=failout -/dev/ost_device + +# tunefs.lustre --param failover.mode=failout /dev/ost_device @@ -308,23 +308,23 @@ $ tunefs.lustre --param failover.mode=failout avoid a global performance slowdown due to a degraded OST, the MDS can avoid the OST for new object allocation if it is notified of the degraded state. - A parameter for each OST, called + A parameter for each OST, called degraded, specifies whether the OST is running in degraded mode or not. To mark the OST as degraded, use: - -lctl set_param obdfilter.{OST_name}.degraded=1 + +oss# lctl set_param obdfilter.{OST_name}.degraded=1 To mark that the OST is back in normal operation, use: - -lctl set_param obdfilter.{OST_name}.degraded=0 + +oss# lctl set_param obdfilter.{OST_name}.degraded=0 To determine if OSTs are currently in degraded mode, use: - -lctl get_param obdfilter.*.degraded + +oss# lctl get_param obdfilter.*.degraded If the OST is remounted due to a reboot or other condition, the flag - resets to + resets to 0. It is recommended that this be implemented by an automated script that monitors the status of individual RAID devices, such as MD-RAID's @@ -337,9 +337,9 @@ lctl get_param obdfilter.*.degraded operations multiple file systems Running Multiple Lustre File Systems - Lustre supports multiple file systems provided the combination of + Lustre supports multiple file systems provided the combination of NID:fsname is unique. Each file system must be allocated - a unique name during creation with the + a unique name during creation with the --fsname parameter. Unique names for file systems are enforced if a single MGS is present. If multiple MGSs are present (for example if you have an MGS on every MDS) the administrator is responsible @@ -353,58 +353,52 @@ lctl get_param obdfilter.*.degraded available. 
With multiple MGSs additional care must be taken to ensure file system names are unique. Each file system should have a unique fsname among all systems that may interoperate in the future. - By default, the - mkfs.lustre command creates a file system named + By default, the + mkfs.lustre command creates a file system named lustre. To specify a different file system name (limited - to 8 characters) at format time, use the + to 8 characters) at format time, use the --fsname option: - -mkfs.lustre --fsname= -file_system_name + +oss# mkfs.lustre --fsname=file_system_name The MDT, OSTs and clients in the new file system must use the same file system name (prepended to the device name). For example, for a new - file system named - foo, the MDT and two OSTs would be named - foo-MDT0000, - foo-OST0000, and + file system named foo, the MDT and two OSTs would be + named foo-MDT0000, + foo-OST0000, and foo-OST0001. To mount a client on the file system, run: - -client# mount -t lustre -mgsnode: -/new_fsname -/mount_point + +client# mount -t lustre mgsnode:/new_fsname /mount_point For example, to mount a client on file system foo at mount point /mnt/foo, run: - + client# mount -t lustre mgsnode:/foo /mnt/foo If a client(s) will be mounted on several file systems, add the - following line to - /etc/xattr.conf file to avoid problems when files are - moved between the file systems: + following line to /etc/xattr.conf file to avoid + problems when files are moved between the file systems: lustre.* skip To ensure that a new MDT is added to an existing MGS create the MDT - by specifying: - --mdt --mgsnode= - mgs_NID. + by specifying: + --mdt --mgsnode=mgs_NID. + A Lustre installation with two file systems ( - foo and - bar) could look like this, where the MGS node is - mgsnode@tcp0 and the mount points are - /mnt/foo and + foo and + bar) could look like this, where the MGS node is + mgsnode@tcp0 and the mount points are + /mnt/foo and /mnt/bar. 
- + mgsnode# mkfs.lustre --mgs /dev/sda mdtfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --mdt --index=0 /dev/sdb @@ -419,14 +413,15 @@ ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=0 ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=1 /dev/sdd - To mount a client on file system foo at mount point - /mnt/foo, run: - + To mount a client on file system foo at mount point + /mnt/foo, run: + + client# mount -t lustre mgsnode@tcp0:/foo /mnt/foo - To mount a client on file system bar at mount point + To mount a client on file system bar at mount point /mnt/bar, run: - + client# mount -t lustre mgsnode@tcp0:/bar /mnt/bar
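The naming rule used in the two-file-system example above (an fsname of at most 8 characters, prepended to each target name such as foo-MDT0000 or foo-OST0001) can be sketched as a small shell helper. This is only an illustration: target_name is a hypothetical function, not a Lustre utility, and it assumes the index is written as four hexadecimal digits, which matches the names shown in the text but is not stated there.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): build a target name from a
# file system name, a target type (MDT or OST) and a numeric index.
# Assumption: the index is printed as four hexadecimal digits.
target_name() {
    fsname=$1
    type=$2
    index=$3
    # The text limits file system names to 8 characters.
    if [ "${#fsname}" -gt 8 ]; then
        echo "fsname '$fsname' is longer than 8 characters" >&2
        return 1
    fi
    printf '%s-%s%04x\n' "$fsname" "$type" "$index"
}

target_name foo MDT 0    # prints foo-MDT0000
target_name foo OST 1    # prints foo-OST0001
```

Checking the name length before running mkfs.lustre is one way to catch a name that is too long before a device is formatted with it.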
@@ -441,7 +436,7 @@ client# mount -t lustre mgsnode@tcp0:/bar /mnt/bar a sub-directory on a given MDT use the command: -client# lfs mkdir -i mdt_index /mount_point/remote_dir +client$ lfs mkdir -i mdt_index /mount_point/remote_dir This command will allocate the sub-directory remote_dir onto the MDT with index @@ -456,12 +451,19 @@ client# lfs mkdir -i mdt_index /mount_po default it is only possible to create remote sub-directories off MDT0000. To relax this restriction and enable remote sub-directories off any MDT, an administrator must issue the following command on the MGS: - mgs# lctl conf_param fsname.mdt.enable_remote_dir=1 + +mgs# lctl set_param -P mdt.fsname-MDT*.enable_remote_dir=1 + For Lustre filesystem 'scratch', the command executed is: - mgs# lctl conf_param scratch.mdt.enable_remote_dir=1 + +mgs# lctl set_param -P mdt.scratch-*.enable_remote_dir=1 + To verify the configuration setting execute the following command on any MDS: - mds# lctl get_param mdt.*.enable_remote_dir + +mds# lctl get_param mdt.*.enable_remote_dir + + With Lustre software version 2.8, a new tunable is available to allow users with a specific group ID to create @@ -472,11 +474,17 @@ client# lfs mkdir -i mdt_index /mount_po parameter to -1 on MDT0000 to permanently allow any non-root users create and delete remote and striped directories. On the MGS execute the following command: - mgs# lctl conf_param fsname.mdt.enable_remote_dir_gid=-1 + +mgs# lctl set_param -P mdt.fsname-*.enable_remote_dir_gid=-1 + For the Lustre filesystem 'scratch', the commands expands to: - mgs# lctl conf_param scratch.mdt.enable_remote_dir_gid=-1. + +mgs# lctl set_param -P mdt.scratch-*.enable_remote_dir_gid=-1 + The change can be verified by executing the following command on every MDS: - mds# lctl get_param mdt.*.enable_remote_dir_gid + +mds# lctl get_param mdt.*.enable_remote_dir_gid +
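The set_param -P commands in the hunk above all follow one pattern: mdt.fsname-MDT*.parameter=value, issued on the MGS. A minimal sketch of that pattern is shown below. The function mdt_param_cmd is hypothetical and only prints the command line so that it can be reviewed first; it does not call lctl itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): print, but do not run, the
# persistent set_param command for one MDT parameter on all MDTs of a
# file system. The printed command is meant to be run on the MGS.
mdt_param_cmd() {
    fsname=$1
    param=$2
    value=$3
    # '*' is kept literal here; lctl expands it against the MDT names.
    printf 'lctl set_param -P mdt.%s-MDT*.%s=%s\n' "$fsname" "$param" "$value"
}

mdt_param_cmd scratch enable_remote_dir 1
# prints: lctl set_param -P mdt.scratch-MDT*.enable_remote_dir=1
mdt_param_cmd scratch enable_remote_dir_gid -1
# prints: lctl set_param -P mdt.scratch-MDT*.enable_remote_dir_gid=-1
```

When the printed line is pasted into a shell, the parameter may need quoting if a file in the current directory could match the wildcard.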
@@ -511,7 +519,7 @@ client# lfs mkdir -i mdt_index /mount_po This command to stripe a directory over mdt_count MDTs is: -client# lfs mkdir -c mdt_count /mount_point/new_directory +client$ lfs mkdir -c mdt_count /mount_point/new_directory The striped directory feature is most useful for distributing @@ -524,12 +532,14 @@ client# lfs mkdir -c mdt_count /mount_po this directory and its stripes will be distributed on MDTs by space usage. For example the following will create a new directory on an MDT preferring one that has less space usage: - lfs mkdir -c 1 -i -1 dir1 + +client$ lfs mkdir -c 1 -i -1 dir1 + Alternatively, if a default directory stripe is set on a directory, the subsequent use of mkdir for subdirectories in dir1 will have the same effect: -client# lfs setdirstripe -D -c 1 -i -1 dir1 +client$ lfs setdirstripe -D -c 1 -i -1 dir1 The policy is: @@ -556,19 +566,19 @@ client# lfs setdirstripe -D -c 1 -i -1 dir1 To set max_mdt_stripecount, on all MDSes of file system, run: - + mgs# lctl set_param -P lod.$fsname-MDTxxxx-mdtlov.max_mdt_stripecount=<N> - + To check max_mdt_stripecount, run: - + mds# lctl get_param lod.$fsname-MDTxxxx-mdtlov.max_mdt_stripecount - + To reset max_mdt_stripecount, run: - + mgs# lctl set_param -P -d lod.$fsname-MDTxxxx-mdtlov.max_mdt_stripecount - +
@@ -593,13 +603,15 @@ mgs# lctl set_param -P -d lod.$fsname-MDTxxxx-mdtlov.max_mdt_stripecount If administrator wants to change this default filesystem-wide directory striping, run the following command to limit this striping to the top level below the root directory: - lfs setdirstripe -D -i -1 -c 1 --max-inherit 0 <mountpoint> - + +client$ lfs setdirstripe -D -i -1 -c 1 --max-inherit 0 <mountpoint> + To revert to the pre-2.15 behavior of all directories being created only on MDT0000 by default (deleting this striping won't work because it will be recreated if missing): - lfs setdirstripe -D -i 0 -c 1 --max-inherit 0 <mountpoint> - + +client$ lfs setdirstripe -D -i 0 -c 1 --max-inherit 0 <mountpoint> +
@@ -610,7 +622,7 @@ mgs# lctl set_param -P -d lod.$fsname-MDTxxxx-mdtlov.max_mdt_stripecount Default Dir Stripe Policy If default dir stripe policy is set to a directory, it will be applied to sub directories created later. For example: - + $ mkdir testdir1 $ lfs setdirstripe testdir1 -D -c 2 $ lfs getdirstripe testdir1 -D @@ -621,7 +633,8 @@ lmv_stripe_count: 2 lmv_stripe_offset: 0 lmv_hash_type: crush mdtidx FID[seq:oid:ver] 0 [0x200000400:0x2:0x0] 1 [0x240000401:0x2:0x0] - + + Default dir stripe can be inherited by sub directory. This behavior is controlled by lmv_max_inherit parameter. If lmv_max_inherit is 0 or 1, sub @@ -630,10 +643,10 @@ mdtidx FID[seq:oid:ver] lmv_max_inherit and uses it as its own lmv_max_inherit. -1 is special because it means unlimited. For example: - + $ lfs getdirstripe testdir1/subdir1 -D lmv_stripe_count: 2 lmv_stripe_offset: -1 lmv_hash_type: none lmv_max_inherit: 2 lmv_max_inherit_rr: 0 - + lmv_max_inherit can be set explicitly with --max-inherit option in @@ -654,107 +667,99 @@ lmv_stripe_count: 2 lmv_stripe_offset: -1 lmv_hash_type: none lmv_max_inherit: 2 Lustre: - When creating a file system, use mkfs.lustre. See + When creating a file system, use mkfs.lustre. See below. - When a server is stopped, use tunefs.lustre. See + When a server is stopped, use tunefs.lustre. See below. When the file system is running, use lctl to set or retrieve - Lustre parameters. See - and + Lustre parameters. See + and below.
- Setting Tunable Parameters with + <title>Setting Tunable Parameters with <literal>mkfs.lustre</literal> When the file system is first formatted, parameters can simply be - added as a - --param option to the + added as a --param option to the mkfs.lustre command. For example: - + mds# mkfs.lustre --mdt --param="sys.timeout=50" /dev/sda - For more details about creating a file system,see - . For more details about - mkfs.lustre, see + For more details about creating a file system,see + . For more details about + mkfs.lustre, see .
- Setting Parameters with + <title>Setting Parameters with <literal>tunefs.lustre</literal> If a server (OSS or MDS) is stopped, parameters can be added to an - existing file system using the - --param option to the + existing file system using the + --param option to the tunefs.lustre command. For example: - + oss# tunefs.lustre --param=failover.node=192.168.0.13@tcp0 /dev/sda - With - tunefs.lustre, parameters are + With tunefs.lustre, parameters are additive-- new parameters are specified in addition - to old parameters, they do not replace them. To erase all old + to old parameters, they do not replace them. To erase all old tunefs.lustre parameters and just use newly-specified parameters, run: - -mds# tunefs.lustre --erase-params --param= -new_parameters + +mds# tunefs.lustre --erase-params --param=new_parameters The tunefs.lustre command can be used to set any parameter settable via lctl conf_param and that has its own OBD device, - so it can be specified as + so it can be specified as obdname|fsname. obdtype. proc_file_name= value. For example: - + mds# tunefs.lustre --param mdt.identity_upcall=NONE /dev/sda1 - For more details about - tunefs.lustre, see + For more details about tunefs.lustre, see .
- Setting Parameters with + <title>Setting Parameters with <literal>lctl</literal> - When the file system is running, the + When the file system is running, the lctl command can be used to set parameters (temporary or permanent) and report current parameter values. Temporary parameters are active as long as the server or client is not shut down. Permanent parameters live through server and client reboots. The lctl list_param command enables users to - list all parameters that can be set. See + list all parameters that can be set. See . - For more details about the + For more details about the lctl command, see the examples in the sections below - and + and .
Setting Temporary Parameters - Use + Use lctl set_param to set temporary parameters on the node where it is run. These parameters internally map to corresponding items in the kernel /proc/{fs,sys}/{lnet,lustre} and /sys/{fs,kernel/debug}/lustre virtual filesystems. However, since the mapping between a particular parameter name and the underlying virtual pathname may change, it is not - recommended to access the virtual pathname directly. The + recommended to access the virtual pathname directly. The lctl set_param command uses this syntax: - -lctl set_param [-n] [-P] -obdtype. -obdname. -proc_file_name= -value + +# lctl set_param [-n] [-P] obdtype.obdname.proc_file_name=value For example: - + # lctl set_param osc.*.max_dirty_mb=1024 osc.myth-OST0000-osc.max_dirty_mb=32 osc.myth-OST0001-osc.max_dirty_mb=32 @@ -767,34 +772,32 @@ osc.myth-OST0004-osc.max_dirty_mb=32 Setting Permanent Parameters Use lctl set_param -P or lctl conf_param command to set permanent parameters. - In general, the - lctl conf_param command can be used to specify any - settable parameter with its own OBD device. The - lctl conf_param command uses the following syntax - (the same as the mkfs.lustre and + In general, the set_param -P command is preferred + for new parameters, as this isolates the parameter settings from the + MDT and OST device configuration, and is consistent with the common + lctl get_param and lctl set_param + commands. The lctl conf_param command + was previously used to specify settable parameter, with the following + syntax (the same as the mkfs.lustre and tunefs.lustre commands): - -obdname|fsname. -obdtype. -proc_file_name= -value) + +obdname|fsname.obdtype.proc_file_name=value) The lctl conf_param and lctl set_param syntax is not the same. 
- Here are a few examples of + Here are a few examples of lctl conf_param commands: - + mgs# lctl conf_param testfs-MDT0000.sys.timeout=40 -$ lctl conf_param testfs-MDT0000.mdt.identity_upcall=NONE -$ lctl conf_param testfs.llite.max_read_ahead_mb=16 -$ lctl conf_param testfs-MDT0000.lov.stripesize=2M -$ lctl conf_param testfs-OST0000.osc.max_dirty_mb=29.15 -$ lctl conf_param testfs-OST0000.ost.client_cache_seconds=15 -$ lctl conf_param testfs.sys.timeout=40 +mgs# lctl conf_param testfs-MDT0000.mdt.identity_upcall=NONE +mgs# lctl conf_param testfs.llite.max_read_ahead_mb=16 +mgs# lctl conf_param testfs-OST0000.osc.max_dirty_mb=29.15 +mgs# lctl conf_param testfs-OST0000.ost.client_cache_seconds=15 +mgs# lctl conf_param testfs.sys.timeout=40 - Parameters specified with the + Parameters specified with the lctl conf_param command are set permanently in the file system's configuration file on the MGS. @@ -804,38 +807,29 @@ $ lctl conf_param testfs.sys.timeout=40 The lctl set_param -P command can also set parameters permanently using the same syntax as lctl set_param and lctl - get_param commands. This command must be issued on the MGS. - The given parameter is set on every host using + get_param commands. Permanent parameter settings must be + issued on the MGS. The given parameter is set on every host using lctl upcall. The lctl set_param command uses the following syntax: - -lctl set_param -P -obdtype. -obdname. 
-proc_file_name= -value + +lctl set_param -P obdtype.obdname.proc_file_name=value For example: - -# lctl set_param -P osc.*.max_dirty_mb=1024 -osc.myth-OST0000-osc.max_dirty_mb=32 -osc.myth-OST0001-osc.max_dirty_mb=32 -osc.myth-OST0002-osc.max_dirty_mb=32 -osc.myth-OST0003-osc.max_dirty_mb=32 -osc.myth-OST0004-osc.max_dirty_mb=32 + +mgs# lctl set_param -P timeout=40 +mgs# lctl set_param -P mdt.testfs-MDT*.identity_upcall=NONE +mgs# lctl set_param -P llite.testfs-*.max_read_ahead_mb=16 +mgs# lctl set_param -P osc.testfs-OST*.max_dirty_mb=29.15 +mgs# lctl set_param -P ost.testfs-OST*.client_cache_seconds=15 - Use - -d(only with -P) option to delete permanent - parameter. Syntax: - -lctl set_param -P -d -obdtype. -obdname. -parameter_name + Use the -P -d option to delete permanent + parameters. Syntax: + +lctl set_param -P -d obdtype.obdname.parameter_name For example: - -# lctl set_param -P -d osc.*.max_dirty_mb + +mgs# lctl set_param -P -d osc.*.max_dirty_mb Starting in Lustre 2.12, there is lctl get_param command can provide @@ -845,17 +839,25 @@ lctl set_param -P -d provides an interactive list of available parameters.
+
+ Listing Persistent Parameters + To list tunable parameters stored in the params + log file by lctl set_param -P and applied to nodes at + mount, run the lctl --device MGS llog_print params + command on the MGS. For example: + +mgs# lctl --device MGS llog_print params +- { index: 2, event: set_param, device: general, parameter: osc.*.max_dirty_mb, value: 1024 } + +
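The llog_print params output shown above can be reduced to plain parameter=value pairs for review. The sketch below assumes every record has exactly the shape of the single sample line in the text (index, event, device, parameter, value, in that order). Real output may contain other record types, which this filter silently skips. The function name params_to_assignments is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter (not part of Lustre): read llog_print-style records
# on stdin and print "parameter=value" for each set_param record.
# Assumption: records look exactly like the sample line in the manual text.
params_to_assignments() {
    sed -n 's/^- { index: [0-9]*, event: set_param, device: [^,]*, parameter: \([^,]*\), value: \(.*\) }$/\1=\2/p'
}

# Feed the sample record from the text through the filter.
printf '%s\n' \
    '- { index: 2, event: set_param, device: general, parameter: osc.*.max_dirty_mb, value: 1024 }' \
    | params_to_assignments
# prints: osc.*.max_dirty_mb=1024
```

On an MGS the same filter could be placed after the lctl --device MGS llog_print params command in a pipeline.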
- Listing Parameters + Listing All Tunable Parameters To list Lustre or LNet parameters that are available to set, use - the - lctl list_param command. For example: - -lctl list_param [-FR] -obdtype. -obdname + the lctl list_param command. For example: + +lctl list_param [-FR] obdtype.obdname - The following arguments are available for the + The following arguments are available for the lctl list_param command. -F Add ' @@ -867,19 +869,16 @@ lctl list_param [-FR] -R Recursively lists all parameters under the specified path For example: - -oss# lctl list_param obdfilter.lustre-OST0000 + +oss# lctl list_param obdfilter.lustre-OST0000
Reporting Current Parameter Values - To report current Lustre parameter values, use the + To report current Lustre parameter values, use the lctl get_param command with this syntax: - -lctl get_param [-n] -obdtype. -obdname. -proc_file_name + +lctl get_param [-n] obdtype.obdname.proc_file_name Starting in Lustre 2.12, there is lctl get_param command can provide @@ -889,13 +888,13 @@ lctl get_param [-n] provides an interactive list of available parameters. This example reports data on RPC service times. - + oss# lctl get_param -n ost.*.ost_io.timeouts -service : cur 1 worst 30 (at 1257150393, 85d23h58m54s ago) 1 1 1 1 +service : cur 1 worst 30 (at 1257150393, 85d23h58m54s ago) 1 1 1 1 This example reports the amount of space this client has reserved for writeback cache with each OST: - + client# lctl get_param osc.*.cur_grant_bytes osc.myth-OST0000-osc-ffff8800376bdc00.cur_grant_bytes=2097152 osc.myth-OST0001-osc-ffff8800376bdc00.cur_grant_bytes=33890304 @@ -918,24 +917,23 @@ osc.myth-OST0004-osc-ffff8800376bdc00.cur_grant_bytes=33808384 a list delimited by commas ( ,). However, when failover nodes are specified, the NIDs are delimited by a colon ( - :) or by repeating a keyword such as - --mgsnode= or + :) or by repeating a keyword such as + --mgsnode= or --servicenode=). To display the NIDs of all servers in networks configured to work with the Lustre file system, run (while LNet is running): - -lctl list_nids + +# lctl list_nids - In the example below, - mds0 and + In the example below, + mds0 and mds1 are configured as a combined MGS/MDT failover pair - and - oss0 and + and oss0 and oss1 are configured as an OST failover pair. The Ethernet - address for - mds0 is 192.168.10.1, and for - mds1 is 192.168.10.2. The Ethernet addresses for - oss0 and + address for + mds0 is 192.168.10.1, and for + mds1 is 192.168.10.2. The Ethernet addresses for + oss0 and oss1 are 192.168.10.20 and 192.168.10.21 respectively. 
@@ -954,26 +952,26 @@ mds0# umount /mnt/mdt mds1# mount -t lustre /dev/sda1 /mnt/test/mdt mds1# lctl get_param mdt.testfs-MDT0000.recovery_status - Where multiple NIDs are specified separated by commas (for example, + Where multiple NIDs are specified separated by commas (for example, 10.67.73.200@tcp,192.168.10.1@tcp), the two NIDs refer - to the same host, and the Lustre software chooses the + to the same host, and the Lustre software chooses the best one for communication. When a pair of NIDs is - separated by a colon (for example, + separated by a colon (for example, 10.67.73.200@tcp:10.67.73.201@tcp), the two NIDs refer to two different hosts and are treated as a failover pair (the Lustre software tries the first one, and if that fails, it tries the second one.) - Two options to + Two options to mkfs.lustre can be used to specify failover nodes. The --servicenode option is used to specify all service NIDs, - including those for primary nodes and failover nodes. When the + including those for primary nodes and failover nodes. When the --servicenode option is used, the first service node to load the target device becomes the primary service node, while nodes corresponding to the other specified NIDs become failover locations for the target device. An older option, --failnode, specifies - just the NIDs of failover nodes. For more information about the - --servicenode and - --failnode options, see + just the NIDs of failover nodes. For more information about the + --servicenode and + --failnode options, see .
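The comma and colon rules described above can be made concrete with a short shell sketch that splits a NID specification into hosts and the NIDs of each host. It is illustrative only: describe_nids is a hypothetical function, it handles only the two separators described in the text, and it does not validate the NIDs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): show how a NID specification
# is grouped. ':' separates different hosts (failover alternatives),
# ',' separates NIDs that belong to the same host.
# The body runs in a subshell so IFS and 'set -f' do not leak out.
describe_nids() (
    set -f          # no pathname expansion on the unquoted splits below
    host=1
    IFS=:
    for group in $1; do
        IFS=,
        for nid in $group; do
            printf 'host %d: %s\n' "$host" "$nid"
        done
        host=$((host + 1))
    done
)

describe_nids '10.67.73.200@tcp,192.168.10.1@tcp'
# prints:
#   host 1: 10.67.73.200@tcp
#   host 1: 192.168.10.1@tcp
describe_nids '10.67.73.200@tcp:10.67.73.201@tcp'
# prints:
#   host 1: 10.67.73.200@tcp
#   host 2: 10.67.73.201@tcp
```

In the first call both NIDs are reported for the same host, matching the case where the Lustre software chooses the best one. In the second call the NIDs are reported as two hosts, matching the failover pair.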
@@ -985,40 +983,36 @@ mds1# lctl get_param mdt.testfs-MDT0000.recovery_status Erasing a File System If you want to erase a file system and permanently delete all the data in the file system, run this command on your targets: - -$ "mkfs.lustre --reformat" + +# mkfs.lustre --reformat If you are using a separate MGS and want to keep other file systems - defined on that MGS, then set the - writeconf flag on the MDT for that file system. The + defined on that MGS, then set the + writeconf flag on the MDT for that file system. The writeconf flag causes the configuration logs to be erased; they are regenerated the next time the servers start. - To set the - writeconf flag on the MDT: + To set the writeconf flag on the MDT: Unmount all clients/servers using this file system, run: -$ umount /mnt/lustre +client# umount /mnt/lustre Permanently erase the file system and, presumably, replace it with another file system, run: - -$ mkfs.lustre --reformat --fsname spfs --mgs --mdt --index=0 /dev/ -{mdsdev} + +mgs# mkfs.lustre --reformat --fsname spfs --mgs --mdt --index=0 /dev/mdsdev If you have a separate MGS (that you do not want to reformat), - then add the - --writeconf flag to + then add the --writeconf flag to mkfs.lustre on the MDT, run: - -$ mkfs.lustre --reformat --writeconf --fsname spfs --mgsnode= -mgs_nid --mdt --index=0 -/dev/mds_device + +mgs# mkfs.lustre --reformat --writeconf --fsname spfs --mgsnode=mgs_nid \ + --mdt --index=0 /dev/mds_device @@ -1041,9 +1035,8 @@ $ mkfs.lustre --reformat --writeconf --fsname spfs --mgsnode= space to avoid file system fragmentation. In order to reclaim this space, run the following command on your OSS for each OST in the file system:
- -tune2fs [-m reserved_blocks_percent] /dev/ -{ostdev} + +# tune2fs [-m reserved_blocks_percent] /dev/ostdev You do not need to shut down Lustre before running this command or restart it afterwards. @@ -1063,10 +1056,10 @@ tune2fs [-m reserved_blocks_percent] /dev/ replacing an OST or MDS Replacing an Existing OST or MDT To copy the contents of an existing OST to a new OST (or an old MDT - to a new MDT), follow the process for either OST/MDT backups in - or + to a new MDT), follow the process for either OST/MDT backups in + or . - For more information on removing a MDT, see + For more information on removing a MDT, see .
@@ -1079,17 +1072,17 @@ tune2fs [-m reserved_blocks_percent] /dev/ a given OST. - On the OST (as root), run + On the OST (as root), run debugfs to display the file identifier ( FID) of the file associated with the object. - For example, if the object is - 34976 on - /dev/lustre/ost_test2, the debug command is: - -# debugfs -c -R "stat /O/0/d$((34976 % 32))/34976" /dev/lustre/ost_test2 + For example, if the object is + 34976 on + /dev/lustre/ost_test2, the debug command is: + +# debugfs -c -R "stat /O/0/d$((34976 % 32))/34976" /dev/lustre/ost_test2 - The command output is: - + The command output is: + debugfs 1.45.6.wc1 (20-Mar-2020) /dev/lustre/ost_test2: catastrophic mode - not reading inode or group bitmaps Inode: 352365 Type: regular Mode: 0666 Flags: 0x80000 @@ -1109,7 +1102,8 @@ Extended attributes stored in inode body: fid: objid=34976 seq=0 parent=[0x200000400:0x122:0x0] stripe=1 EXTENTS: (0-64):4620544-4620607 - + + The parent FID will be of the form @@ -1120,24 +1114,19 @@ EXTENTS: In cases of an upgraded 1.x inode (if the first part of the - FID is below 0x200000400), the MDT inode number is - 0x24dab9 and generation + FID is below 0x200000400), the MDT inode number is + 0x24dab9 and generation 0x3f0dfa6a and the pathname can also be resolved - using - debugfs. + using debugfs. 
- On the MDS (as root), use + On the MDS (as root), use debugfs to find the file associated with the inode: - -# debugfs -c -R "ncheck 0x24dab9" /dev/lustre/mdt_test - - Here is the command output: - + +# debugfs -c -R "ncheck 0x24dab9" /dev/lustre/mdt_test debugfs 1.42.3.wc3 (15-Aug-2012) -/dev/lustre/mdt_test: catastrophic mode - not reading inode or group bitmap\ -s +/dev/lustre/mdt_test: catastrophic mode - not reading inode or group bitmaps Inode Pathname 2415289 /ROOT/brian-laptop-guest/clients/client11/~dmtmp/PWRPNT/ZD16.BMP @@ -1152,7 +1141,7 @@ Inode Pathname To find the Lustre file from a disk LBA, follow the steps listed in - the document at this URL: + the document at this URL: https://www.smartmontools.org/wiki/BadBlockHowto. Then, follow the steps above to resolve the Lustre filename. diff --git a/index.xml b/index.xml index 6b49efb..8e45cb2 100644 --- a/index.xml +++ b/index.xml @@ -18,7 +18,7 @@ Manual.) - Notwithstanding Intel’s ownership of the copyright in the modifications to the original + Notwithstanding Intel's ownership of the copyright in the modifications to the original version of this Operations Manual, as between Intel and Oracle, Oracle and/or its affiliates retain sole ownership of the copyright in the unmodified portions of this Operations Manual.