<?xml version='1.0' encoding='utf-8'?>
<chapter xmlns="http://docbook.org/ns/docbook"
-xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
-xml:id="lustreoperations">
+ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
+ xml:id="lustreoperations">
<title xml:id="lustreoperations.title">Lustre Operations</title>
<para>Once you have the Lustre file system up and running, you can use the
procedures in this section to perform these basic Lustre administration
<listitem>
<para>Mount the MDT.</para>
<note>
- <para condition='l24'>Mount all MDTs if multiple MDTs are
- present.</para>
+ <para>Mount all MDTs if multiple MDTs are present.</para>
</note>
</listitem>
<listitem>
environment.</para>
</caution>
</section>
- <section xml:id="dbdoclet.50438194_69255">
+ <section xml:id="dbdoclet.shutdownLustre">
+ <title>
+ <indexterm>
+ <primary>operations</primary>
+ <secondary>shutdownLustre</secondary>
+ </indexterm>Stopping the Filesystem</title>
+ <para>To shut down a Lustre filesystem completely, unmount all clients
+ and servers in the order shown below. Unmounting a block device causes
+ the Lustre software to be shut down on that node.
+ </para>
+ <note><para>The <literal>-a -t lustre</literal> options in the commands
+ below do not name a filesystem; they tell <literal>umount</literal> to
+ unmount all entries in <literal>/etc/mtab</literal> that are of type
+ <literal>lustre</literal>.</para></note>
+ <orderedlist>
+ <listitem><para>Unmount the clients</para>
+ <para>On each client node, unmount the filesystem on that client
+ using the <literal>umount</literal> command:</para>
+ <para><literal>umount -a -t lustre</literal></para>
+ <para>The example below shows the unmount of the
+ <literal>testfs</literal> filesystem on a client node:</para>
+ <para><screen>[root@client1 ~]# mount |grep testfs
+XXX.XXX.0.11@tcp:/testfs on /mnt/testfs type lustre (rw,lazystatfs)
+
+[root@client1 ~]# umount -a -t lustre
+[154523.177714] Lustre: Unmounted testfs-client</screen></para>
+ </listitem>
+ <listitem><para>Unmount the MDT and MGT</para>
+ <para>On the MGS and MDS node(s), run the
+ <literal>umount</literal> command:</para>
+ <para><literal>umount -a -t lustre</literal></para>
+ <para>The example below shows the unmount of the MDT and MGT for
+ the <literal>testfs</literal> filesystem on a combined MGS/MDS:
+ </para>
+ <para><screen>[root@mds1 ~]# mount |grep lustre
+/dev/sda on /mnt/mgt type lustre (ro)
+/dev/sdb on /mnt/mdt type lustre (ro)
+
+[root@mds1 ~]# umount -a -t lustre
+[155263.566230] Lustre: Failing over testfs-MDT0000
+[155263.775355] Lustre: server umount testfs-MDT0000 complete
+[155269.843862] Lustre: server umount MGS complete</screen></para>
+ <para>For a separate MGS and MDS, the same command is used, first on
+ the MDS and then on the MGS.</para>
+ </listitem>
+ <listitem><para>Unmount all the OSTs</para>
+ <para>On each OSS node, use the <literal>umount</literal> command:
+ </para>
+ <para><literal>umount -a -t lustre</literal></para>
+ <para>The example below shows the unmount of all OSTs for the
+ <literal>testfs</literal> filesystem on server
+ <literal>OSS1</literal>:
+ </para>
+ <para><screen>[root@oss1 ~]# mount |grep lustre
+/dev/sda on /mnt/ost0 type lustre (ro)
+/dev/sdb on /mnt/ost1 type lustre (ro)
+/dev/sdc on /mnt/ost2 type lustre (ro)
+
+[root@oss1 ~]# umount -a -t lustre
+[155336.491445] Lustre: Failing over testfs-OST0002
+[155336.556752] Lustre: server umount testfs-OST0002 complete</screen></para>
+ </listitem>
+ </orderedlist>
+ <para>For the unmount command syntax for a single OST, MDT, or MGT
+ target, refer to <xref linkend="dbdoclet.umountTarget"/>.</para>
+ </section>
+ <section xml:id="dbdoclet.umountTarget">
<title>
<indexterm>
<primary>operations</primary>
<secondary>unmounting</secondary>
- </indexterm>Unmounting a Server</title>
- <para>To stop a Lustre server, use the
+ </indexterm>Unmounting a Specific Target on a Server</title>
+ <para>To stop a Lustre OST, MDT, or MGT, use the
<literal>umount
- <replaceable>/mount</replaceable>
- <replaceable>point</replaceable></literal> command.</para>
- <para>For example, to stop
- <literal>ost0</literal> on mount point
- <literal>/mnt/test</literal>, run:</para>
- <screen>
-$ umount /mnt/test
-</screen>
+ <replaceable>/mount_point</replaceable></literal> command.</para>
+ <para>The example below stops an OST, <literal>ost0</literal>, on mount
+ point <literal>/mnt/ost0</literal> for the <literal>testfs</literal>
+ filesystem:</para>
+ <screen>[root@oss1 ~]# umount /mnt/ost0
+[ 385.142264] Lustre: Failing over testfs-OST0000
+[ 385.210810] Lustre: server umount testfs-OST0000 complete</screen>
<para>Gracefully stopping a server with the
<literal>umount</literal> command preserves the state of the connected
clients. The next time the server is started, it waits for clients to
always be safely specified.</para>
</note>
</section>
- <section xml:id="dbdoclet.50438194_57420">
+ <section xml:id="failover_ost">
<title>
<indexterm>
<primary>operations</primary>
</para>
</note>
</section>
- <section xml:id="dbdoclet.50438194_54138">
+ <section xml:id="dbdoclet.degraded_ost">
<title>
<indexterm>
<primary>operations</primary>
resets to
<literal>0</literal>.</para>
<para>It is recommended that this be implemented by an automated script
- that monitors the status of individual RAID devices.</para>
+ that monitors the status of individual RAID devices, such as MD-RAID's
+ <literal>mdadm(8)</literal> command with the <literal>--monitor</literal>
+ option to mark an affected device degraded or restored.</para>
</section>
- <section xml:id="dbdoclet.50438194_88063">
+ <section xml:id="dbdoclet.lustre_configure_multiple_fs">
<title>
<indexterm>
<primary>operations</primary>
client# mount -t lustre mgsnode@tcp0:/bar /mnt/bar
</screen>
</section>
- <section xml:id="dbdoclet.lfsmkdir" condition='l24'>
+ <section xml:id="dbdoclet.lfsmkdir">
<title>
<indexterm>
<primary>operations</primary>
<secondary>remote directory</secondary>
- </indexterm>Creating a sub-directory on a given MDT</title>
- <para>Lustre 2.4 enables individual sub-directories to be serviced by
- unique MDTs. An administrator can allocate a sub-directory to a given MDT
- using the command:</para>
+ </indexterm>Creating a sub-directory on a specific MDT</title>
+ <para>Individual directories, along with their files and
+ sub-directories, can be stored on specific MDTs. To create a
+ sub-directory on a given MDT, use the command:
+ </para>
<screen>
client# lfs mkdir –i
<replaceable>mdt_index</replaceable>
<warning>
<para>An administrator can allocate remote sub-directories to separate
MDTs. Creating remote sub-directories in parent directories not hosted on
- MDT0 is not recommended. This is because the failure of the parent MDT
+ MDT0000 is not recommended. This is because the failure of the parent MDT
will leave the namespace below it inaccessible. For this reason, by
- default it is only possible to create remote sub-directories off MDT0. To
- relax this restriction and enable remote sub-directories off any MDT, an
- administrator must issue the command
- <literal>lctl set_param mdt.*.enable_remote_dir=1</literal>.</para>
+ default it is only possible to create remote sub-directories off MDT0000.
+ To relax this restriction and enable remote sub-directories off any MDT,
+ an administrator must issue the following command on the MGS:
+ <screen>mgs# lctl conf_param <replaceable>fsname</replaceable>.mdt.enable_remote_dir=1</screen>
+ For Lustre filesystem 'scratch', the command executed is:
+ <screen>mgs# lctl conf_param scratch.mdt.enable_remote_dir=1</screen>
+ To verify the configuration setting, execute the following command on any
+ MDS:
+ <screen>mds# lctl get_param mdt.*.enable_remote_dir</screen></para>
</warning>
<para condition='l28'>With Lustre software version 2.8, a new
tunable is available to allow users with a specific group ID to create
<literal>enable_remote_dir_gid</literal>. For example, setting this
parameter to the 'wheel' or 'admin' group ID allows users with that GID
to create and delete remote and striped directories. Setting this
- parameter to <literal>-1</literal> on MDT0 to permanently allow any
- non-root users create and delete remote and striped directories. For
- example:
- <screen>lctl set_param -P mdt.*.enable_remote_dir_gid=-1</screen>
+ parameter to <literal>-1</literal> on MDT0000 permanently allows any
+ non-root user to create and delete remote and striped directories.
+ On the MGS, execute the following command:
+ <screen>mgs# lctl conf_param <replaceable>fsname</replaceable>.mdt.enable_remote_dir_gid=-1</screen>
+ For the Lustre filesystem 'scratch', the command expands to:
+ <screen>mgs# lctl conf_param scratch.mdt.enable_remote_dir_gid=-1</screen>
+ The change can be verified by executing the following command on every MDS:
+ <screen>mds# lctl get_param mdt.<replaceable>*</replaceable>.enable_remote_dir_gid</screen>
</para>
</section>
<section xml:id="dbdoclet.lfsmkdirdne2" condition='l28'>
<para>The Lustre 2.8 DNE feature enables individual files in a given
directory to store their metadata on separate MDTs (a <emphasis>striped
directory</emphasis>) once additional MDTs have been added to the
- filesystem, see <xref linkend="dbdoclet.addingamdt"/>.
+ filesystem, see <xref linkend="lustremaint.adding_new_mdt"/>.
The result of this is that metadata requests for
files in a striped directory are serviced by multiple MDTs and metadata
service load is distributed over all the MDTs that service a given
<para>The striped directory feature is most useful for distributing
single large directories (50k entries or more) across multiple MDTs,
since it incurs more overhead than non-striped directories.</para>
+ <section xml:id="dbdoclet.lfsmkdirbyspace" condition='l2D'>
+ <title>Directory creation by space/inode usage</title>
+ <para>If the starting MDT is not specified when creating a new directory,
+ the directory and its stripes are distributed across MDTs by space usage.
+ For example, the following creates a directory and its stripes on MDTs
+ with balanced space usage:</para>
+ <screen>lfs mkdir -c 2 &lt;dir1&gt;</screen>
+ <para>Alternatively, if a default directory stripe is set on a directory,
+ subsequent <literal>mkdir</literal> system calls under
+ <literal>&lt;dir1&gt;</literal> will have the same effect:
+ <screen>lfs setdirstripe -D -c 2 &lt;dir1&gt;</screen></para>
+ <para>The policy is:</para>
+ <itemizedlist>
+ <listitem><para>If free inodes/blocks on all MDTs are almost the same,
+ i.e. <literal>max_inodes_avail * 84% &lt; min_inodes_avail</literal> and
+ <literal>max_blocks_avail * 84% &lt; min_blocks_avail</literal>, then
+ choose MDTs in round-robin order.</para></listitem>
+ <listitem><para>Otherwise, create more subdirectories on MDTs with more
+ free inodes/blocks.</para></listitem>
+ </itemizedlist>
+ </section>
</section>
<section xml:id="dbdoclet.50438194_88980">
<title>
<replaceable>new_parameters</replaceable>
</screen>
<para>The tunefs.lustre command can be used to set any parameter settable
- in a /proc/fs/lustre file and that has its own OBD device, so it can be
- specified as
+ via <literal>lctl conf_param</literal> and that has its own OBD device,
+ so it can be specified as
<literal>
<replaceable>obdname|fsname</replaceable>.
<replaceable>obdtype</replaceable>.
are active as long as the server or client is not shut down. Permanent
parameters live through server and client reboots.</para>
<note>
- <para>The lctl list_param command enables users to list all parameters
- that can be set. See
+ <para>The <literal>lctl list_param</literal> command enables users to
+ list all parameters that can be set. See
<xref linkend="dbdoclet.50438194_88217" />.</para>
</note>
<para>For more details about the
<title>Setting Temporary Parameters</title>
<para>Use
<literal>lctl set_param</literal> to set temporary parameters on the
- node where it is run. These parameters map to items in
- <literal>/proc/{fs,sys}/{lnet,lustre}</literal>. The
+ node where it is run. These parameters internally map to corresponding
+ items in the kernel <literal>/proc/{fs,sys}/{lnet,lustre}</literal> and
+ <literal>/sys/{fs,kernel/debug}/lustre</literal> virtual filesystems.
+ However, since the mapping between a particular parameter name and the
+ underlying virtual pathname may change, it is <emphasis>not</emphasis>
+ recommended to access the virtual pathname directly. The
<literal>lctl set_param</literal> command uses this syntax:</para>
<screen>
-lctl set_param [-n]
+lctl set_param [-n] [-P]
<replaceable>obdtype</replaceable>.
<replaceable>obdname</replaceable>.
<replaceable>proc_file_name</replaceable>=
</section>
<section xml:id="dbdoclet.50438194_64195">
<title>Setting Permanent Parameters</title>
- <para>Use the
+ <para>Use the <literal>lctl set_param -P</literal> or
<literal>lctl conf_param</literal> command to set permanent parameters.
In general, the
<literal>lctl conf_param</literal> command can be used to specify any
- parameter settable in a
- <literal>/proc/fs/lustre</literal> file, with its own OBD device. The
- <literal>lctl conf_param</literal> command uses this syntax (same as the
-
- <literal>mkfs.lustre</literal> and
+ settable parameter with its own OBD device. The
+ <literal>lctl conf_param</literal> command uses the following syntax
+ (the same as the <literal>mkfs.lustre</literal> and
<literal>tunefs.lustre</literal> commands):</para>
<screen>
<replaceable>obdname|fsname</replaceable>.
<replaceable>proc_file_name</replaceable>=
<replaceable>value</replaceable>)
</screen>
+ <note><para>The <literal>lctl conf_param</literal> and
+ <literal>lctl set_param</literal> syntax is <emphasis>not</emphasis>
+ the same.</para></note>
<para>Here are a few examples of
<literal>lctl conf_param</literal> commands:</para>
<screen>
</section>
<section xml:id="dbdoclet.setparamp" condition='l25'>
<title>Setting Permanent Parameters with lctl set_param -P</title>
- <para>Use the
- <literal>lctl set_param -P</literal> to set parameters permanently. This
- command must be issued on the MGS. The given parameter is set on every
- host using
- <literal>lctl</literal> upcall. Parameters map to items in
- <literal>/proc/{fs,sys}/{lnet,lustre}</literal>. The
- <literal>lctl set_param</literal> command uses this syntax:</para>
+ <para>The <literal>lctl set_param -P</literal> command can also
+ set parameters permanently, using the same syntax as the
+ <literal>lctl set_param</literal> and
+ <literal>lctl get_param</literal> commands. This command must be
+ issued on the MGS. The given parameter is set on every host using an
+ <literal>lctl</literal> upcall. The
+ <literal>lctl set_param -P</literal> command uses the following
+ syntax:</para>
<screen>
lctl set_param -P
<replaceable>obdtype</replaceable>.
lctl set_param -P -d
<replaceable>obdtype</replaceable>.
<replaceable>obdname</replaceable>.
-<replaceable>proc_file_name</replaceable>
+<replaceable>parameter_name</replaceable>
</screen>
<para>For example:</para>
<screen>
# lctl set_param -P -d osc.*.max_dirty_mb
</screen>
+ <note condition='l2c'><para>Starting in Lustre 2.12, the
+ <literal>lctl get_param</literal> command can provide
+ <emphasis>tab completion</emphasis> when using an interactive shell
+ with <literal>bash-completion</literal> installed. This simplifies
+ the use of <literal>get_param</literal> significantly, since it
+ provides an interactive list of available parameters.
+ </para></note>
</section>
<section xml:id="dbdoclet.50438194_88217">
<title>Listing Parameters</title>
- <para>To list Lustre or LNET parameters that are available to set, use
+ <para>To list Lustre or LNet parameters that are available to set, use
the
<literal>lctl list_param</literal> command. For example:</para>
<screen>
<replaceable>obdname</replaceable>.
<replaceable>proc_file_name</replaceable>
</screen>
+ <note condition='l2c'><para>Starting in Lustre 2.12, the
+ <literal>lctl get_param</literal> command can provide
+ <emphasis>tab completion</emphasis> when using an interactive shell
+ with <literal>bash-completion</literal> installed. This simplifies
+ the use of <literal>get_param</literal> significantly, since it
+ provides an interactive list of available parameters.
+ </para></note>
<para>This example reports data on RPC service times.</para>
<screen>
oss# lctl get_param -n ost.*.ost_io.timeouts
</section>
</section>
</section>
- <section xml:id="dbdoclet.50438194_41817">
+ <section xml:id="failover_nids">
<title>
<indexterm>
<primary>operations</primary>
<literal>--mgsnode=</literal> or
<literal>--servicenode=</literal>).</para>
<para>To display the NIDs of all servers in networks configured to work
- with the Lustre file system, run (while LNET is running):</para>
+ with the Lustre file system, run (while LNet is running):</para>
<screen>
lctl list_nids
</screen>
<para>Where multiple NIDs are specified separated by commas (for example,
<literal>10.67.73.200@tcp,192.168.10.1@tcp</literal>), the two NIDs refer
to the same host, and the Lustre software chooses the
- <emphasis>best</emphasis>one for communication. When a pair of NIDs is
+ <emphasis>best</emphasis> one for communication. When a pair of NIDs is
separated by a colon (for example,
<literal>10.67.73.200@tcp:10.67.73.201@tcp</literal>), the two NIDs refer
to two different hosts and are treated as a failover pair (the Lustre
software tries the first one, and if that fails, it tries the second
one.)</para>
<para>Two options to
- <literal>mkfs.lustre</literal> can be used to specify failover nodes.
- Introduced in Lustre software release 2.0, the
+ <literal>mkfs.lustre</literal> can be used to specify failover nodes. The
<literal>--servicenode</literal> option is used to specify all service NIDs,
including those for primary nodes and failover nodes. When the
<literal>--servicenode</literal> option is used, the first service node to
load the target device becomes the primary service node, while nodes
corresponding to the other specified NIDs become failover locations for the
- target device. An older option,
- <literal>--failnode</literal>, specifies just the NIDS of failover nodes.
- For more information about the
+ target device. An older option, <literal>--failnode</literal>, specifies
+ just the NIDs of failover nodes. For more information about the
<literal>--servicenode</literal> and
<literal>--failnode</literal> options, see
<xref xmlns:xlink="http://www.w3.org/1999/xlink"
<para>To copy the contents of an existing OST to a new OST (or an old MDT
to a new MDT), follow the process for either OST/MDT backups in
<xref linkend='dbdoclet.backup_device' />or
- <xref linkend='dbdoclet.backup_target_filesystem' />.
+ <xref linkend='backup_fs_level' />.
For more information on removing a MDT, see
- <xref linkend='dbdoclet.rmremotedir' />.</para>
+ <xref linkend='lustremaint.rmremotedir' />.</para>
</section>
<section xml:id="dbdoclet.50438194_30872">
<title>
</screen></para>
<para>The command output is:
<screen>
-debugfs 1.42.3.wc3 (15-Aug-2012)
+debugfs 1.45.6.wc1 (20-Mar-2020)
/dev/lustre/ost_test2: catastrophic mode - not reading inode or group bitmaps
Inode: 352365 Type: regular Mode: 0666 Flags: 0x80000
Generation: 2393149953 Version: 0x0000002a:00005f81
Extended attributes stored in inode body:
fid = "b9 da 24 00 00 00 00 00 6a fa 0d 3f 01 00 00 00 eb 5b 0b 00 00 00 0000
00 00 00 00 00 00 00 00 " (32)
- fid: objid=34976 seq=0 parent=[0x24dab9:0x3f0dfa6a:0x0] stripe=1
+ fid: objid=34976 seq=0 parent=[0x200000400:0x122:0x0] stripe=1
EXTENTS:
(0-64):4620544-4620607
</screen></para>
</listitem>
<listitem>
- <para>For Lustre software release 2.x file systems, the parent FID will
- be of the form [0x200000400:0x122:0x0] and can be resolved directly
- using the
- <literal>lfs fid2path [0x200000404:0x122:0x0]
- /mnt/lustre</literal> command on any Lustre client, and the process is
+ <para>The parent FID will be of the form
+ <literal>[0x200000400:0x122:0x0]</literal> and can be resolved directly
+ using the command <literal>lfs fid2path [0x200000400:0x122:0x0]
+ /mnt/lustre</literal> on any Lustre client, and the process is
complete.</para>
</listitem>
<listitem>
- <para>In this example the parent inode FID is an upgraded 1.x inode
- (due to the first part of the FID being below 0x200000400), the MDT
- inode number is
+ <para>In cases of an upgraded 1.x inode (if the first part of the
+ FID is below 0x200000400), the MDT inode number is
<literal>0x24dab9</literal> and generation
- <literal>0x3f0dfa6a</literal> and the pathname needs to be resolved
+ <literal>0x3f0dfa6a</literal> and the pathname can also be resolved
using
<literal>debugfs</literal>.</para>
</listitem>
</screen>
<para>Here is the command output:</para>
<screen>
-debugfs 1.42.3.wc2 (15-Aug-2012)
+debugfs 1.42.3.wc3 (15-Aug-2012)
/dev/lustre/mdt_test: catastrophic mode - not reading inode or group bitmap\
s
Inode Pathname
<note>
<para>To find the Lustre file from a disk LBA, follow the steps listed in
the document at this URL:
- <link xl:href="http://smartmontools.sourceforge.net/badblockhowto.html">
- http://smartmontools.sourceforge.net/badblockhowto.html</link>. Then,
+ <link xl:href="https://www.smartmontools.org/wiki/BadBlockHowto">
+ https://www.smartmontools.org/wiki/BadBlockHowto</link>. Then,
follow the steps above to resolve the Lustre filename.</para>
</note>
</section>
</chapter>
+<!--
+ vim:expandtab:shiftwidth=2:tabstop=8:
+ -->