-<?xml version='1.0' encoding='UTF-8'?><chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="lustremaintenance">
+<?xml version='1.0' encoding='UTF-8'?>
+<chapter xmlns="http://docbook.org/ns/docbook"
+ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
+ xml:id="lustremaintenance">
<title xml:id="lustremaintenance.title">Lustre Maintenance</title>
<para>Once you have the Lustre file system up and running, you can use the procedures in this section to perform these basic Lustre maintenance tasks:</para>
<itemizedlist>
<listitem>
- <para><xref linkend="dbdoclet.50438199_42877"/></para>
+ <para><xref linkend="lustremaint.inactiveOST"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438199_15240"/></para>
+ <para><xref linkend="lustremaint.findingNodes"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438199_26070"/></para>
+ <para><xref linkend="lustremaint.mountingServerWithoutLustre"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438199_54623"/></para>
+ <para><xref linkend="lustremaint.regenerateConfigLogs"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.changingservernid"/></para>
+ <para><xref linkend="lustremaint.changingservernid"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.adding_new_mdt"/></para>
+ <para><xref linkend="lustremaint.clear_conf"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.adding_new_ost"/></para>
+ <para><xref linkend="lustremaint.adding_new_mdt"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.deactivating_mdt_ost"/></para>
+ <para><xref linkend="lustremaint.adding_new_ost"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.rmremotedir"/></para>
+ <para><xref linkend="lustremaint.deactivating_mdt_ost"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.inactivemdt"/></para>
+ <para><xref linkend="lustremaint.rmremotedir"/></para>
</listitem>
<listitem>
- <para><xref xmlns:xlink="http://www.w3.org/1999/xlink" linkend="section_remove_ost"/></para>
+ <para><xref linkend="lustremaint.inactivemdt"/></para>
</listitem>
<listitem>
- <para><xref xmlns:xlink="http://www.w3.org/1999/xlink" linkend="section_ydg_pgt_tl"/></para>
+ <para><xref linkend="lustremaint.remove_ost"/></para>
</listitem>
<listitem>
- <para><xref xmlns:xlink="http://www.w3.org/1999/xlink" linkend="section_restore_ost"/></para>
+ <para><xref linkend="lustremaint.ydg_pgt_tl"/></para>
</listitem>
<listitem>
- <para><xref xmlns:xlink="http://www.w3.org/1999/xlink" linkend="section_ucf_qgt_tl"/></para>
+ <para><xref linkend="lustremaint.restore_ost"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438199_77819"/></para>
+ <para><xref linkend="lustremaint.ucf_qgt_tl"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438199_12607"/></para>
+ <para><xref linkend="lustremaint.abortRecovery"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438199_62333"/></para>
+ <para><xref linkend="lustremaint.determineOST"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438199_62545"/></para>
+ <para><xref linkend="lustremaint.ChangeAddrFailoverNode"/></para>
+ </listitem>
+ <listitem>
+ <para><xref linkend="lustremaint.seperateCombinedMGSMDT"/></para>
+ </listitem>
+ <listitem>
+ <para><xref linkend="lustremaint.setMDTReadonly"/></para>
+ </listitem>
+ <listitem>
+ <para><xref linkend="lustremaint.tunefallocate"/></para>
</listitem>
</itemizedlist>
- <section xml:id="dbdoclet.50438199_42877">
+ <section xml:id="lustremaint.inactiveOST">
<title>
<indexterm><primary>maintenance</primary></indexterm>
<indexterm><primary>maintenance</primary><secondary>inactive OSTs</secondary></indexterm>
<literal>exclude=testfs-OST0000:testfs-OST0001</literal>.</para>
</note>
</section>
- <section xml:id="dbdoclet.50438199_15240">
+ <section xml:id="lustremaint.findingNodes">
<title><indexterm><primary>maintenance</primary><secondary>finding nodes</secondary></indexterm>
Finding Nodes in the Lustre File System</title>
<para>There may be situations in which you need to find all nodes in
0: testfs-OST0000_UUID ACTIVE
1: testfs-OST0001_UUID ACTIVE </screen>
</section>
- <section xml:id="dbdoclet.50438199_26070">
+ <section xml:id="lustremaint.mountingServerWithoutLustre">
<title><indexterm><primary>maintenance</primary><secondary>mounting a server</secondary></indexterm>
Mounting a Server Without Lustre Service</title>
<para>If you are using a combined MGS/MDT, but you only want to start the MGS and not the MDT, run this command:</para>
<para>In this example, the combined MGS/MDT is <literal>testfs-MDT0000</literal> and the mount point is <literal>/mnt/test/mdt</literal>.</para>
<screen>$ mount -t lustre -L testfs-MDT0000 -o nosvc /mnt/test/mdt</screen>
</section>
- <section xml:id="dbdoclet.50438199_54623">
+ <section xml:id="lustremaint.regenerateConfigLogs">
<title><indexterm><primary>maintenance</primary><secondary>regenerating config logs</secondary></indexterm>
Regenerating Lustre Configuration Logs</title>
<para>If the Lustre file system configuration logs are in a state where
run, the configuration logs are re-generated as servers connect to the
MGS.</para>
</section>
- <section xml:id="dbdoclet.changingservernid">
+ <section xml:id="lustremaint.changingservernid">
<title><indexterm><primary>maintenance</primary><secondary>changing a NID</secondary></indexterm>
Changing a Server NID</title>
- <para>In Lustre software release 2.3 or earlier, the <literal>tunefs.lustre
- --writeconf</literal> command is used to rewrite all of the configuration files.</para>
- <para condition="l24">If you need to change the NID on the MDT or OST, a new
- <literal>replace_nids</literal> command was added in Lustre software release 2.4 to simplify
- this process. The <literal>replace_nids</literal> command differs from <literal>tunefs.lustre
- --writeconf</literal> in that it does not erase the entire configuration log, precluding the
- need the need to execute the <literal>writeconf</literal> command on all servers and
- re-specify all permanent parameter settings. However, the <literal>writeconf</literal> command
- can still be used if desired.</para>
+    <para>To completely rewrite the Lustre configuration, the
+    <literal>tunefs.lustre --writeconf</literal> command is used to
+    regenerate all of the configuration files.</para>
+    <para>If you need to change only the NID of the MDT or OST, the
+    <literal>replace_nids</literal> command can simplify this process.
+    The <literal>replace_nids</literal> command differs from
+    <literal>tunefs.lustre --writeconf</literal> in that it does not
+    erase the entire configuration log, precluding the need to
+    execute the <literal>writeconf</literal> command on all servers and
+    re-specify all permanent parameter settings. However, the
+    <literal>writeconf</literal> command can still be used if desired.
+    </para>
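+    <para>For example (the target name and NID below are illustrative
+    and must match your configuration), with the target stopped, the
+    new NID can be recorded on the MGS:</para>
+    <screen>mgs# lctl replace_nids testfs-OST0000 192.168.5.55@tcp</screen>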
<para>Change a server NID in these situations:</para>
<itemizedlist>
<listitem>
<note><para>The previous configuration log is backed up on the MGS
disk with the suffix <literal>'.bak'</literal>.</para></note>
</section>
- <section xml:id="dbdoclet.clear_conf" condition="l2B">
+ <section xml:id="lustremaint.clear_conf" condition="l2B">
<title><indexterm>
<primary>maintenance</primary>
<secondary>Clearing a config</secondary>
</listitem>
</orderedlist>
</section>
- <section xml:id="dbdoclet.adding_new_mdt" condition='l24'>
+ <section xml:id="lustremaint.adding_new_mdt">
<title><indexterm>
<primary>maintenance</primary>
<secondary>adding an MDT</secondary>
user or application workloads from other users of the filesystem. It
is possible to have multiple remote sub-directories reference the
same MDT. However, the root directory will always be located on
- MDT0. To add a new MDT into the file system:</para>
+ MDT0000. To add a new MDT into the file system:</para>
<orderedlist>
<listitem>
<para>Discover the maximum MDT index. Each MDT must have unique index.</para>
</listitem>
</orderedlist>
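+   <para>For example, the MDT indexes currently in use can be listed
+   from a client with <literal>lctl dl</literal> (output abbreviated
+   and illustrative; here the next available index would be 2):</para>
+   <screen>client$ lctl dl | grep mdc
+36 UP mdc testfs-MDT0000-mdc-ffff88... testfs-clilov_UUID 5
+37 UP mdc testfs-MDT0001-mdc-ffff88... testfs-clilov_UUID 5</screen>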
</section>
- <section xml:id="dbdoclet.adding_new_ost">
+ <section xml:id="lustremaint.adding_new_ost">
<title><indexterm><primary>maintenance</primary><secondary>adding a OST</secondary></indexterm>
Adding a New OST to a Lustre File System</title>
<para>A new OST can be added to existing Lustre file system on either
<listitem>
<para> Add a new OST by using <literal>mkfs.lustre</literal> as when
the filesystem was first formatted, see
- <xref linkend="dbdoclet.format_ost" /> for details. Each new OST
+ <xref linkend="format_ost" /> for details. Each new OST
must have a unique index number, use <literal>lctl dl</literal> to
see a list of all OSTs. For example, to add a new OST at index 12
to the <literal>testfs</literal> filesystem run following commands
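+        <para>A sketch of those commands (the MGS NID and block device
+        are examples and must match your site):</para>
+        <screen>oss# mkfs.lustre --fsname=testfs --mgsnode=mds16@tcp0 --ost --index=12 /dev/sda
+oss# mkdir -p /mnt/testfs/ost12
+oss# mount -t lustre /dev/sda /mnt/testfs/ost12</screen>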
system on <literal>OST0004</literal> that are larger than 4GB in
size to other OSTs, enter:</para>
<screen>client# lfs find /test --ost test-OST0004 -size +4G | lfs_migrate -y</screen>
- <para>See <xref linkend="dbdoclet.lfs_migrate"/> for details.</para>
+ <para>See <xref linkend="lfs_migrate"/> for details.</para>
</listitem>
</orderedlist>
</section>
- <section xml:id="dbdoclet.deactivating_mdt_ost">
+ <section xml:id="lustremaint.deactivating_mdt_ost">
<title><indexterm><primary>maintenance</primary><secondary>restoring an OST</secondary></indexterm>
<indexterm><primary>maintenance</primary><secondary>removing an OST</secondary></indexterm>
Removing and Restoring MDTs and OSTs</title>
<para>A hard drive has failed and a RAID resync/rebuild is underway,
though the OST can also be marked <emphasis>degraded</emphasis> by
the RAID system to avoid allocating new files on the slow OST which
- can reduce performance, see <xref linkend='dbdoclet.degraded_ost' />
+ can reduce performance, see <xref linkend='degraded_ost' />
for more details.
</para>
</listitem>
<listitem>
<para>OST is nearing its space capacity, though the MDS will already
try to avoid allocating new files on overly-full OSTs if possible,
- see <xref linkend='dbdoclet.balancing_free_space' /> for details.
+ see <xref linkend='balancing_free_space' /> for details.
</para>
</listitem>
<listitem>
desire to continue using the filesystem before it is repaired.</para>
</listitem>
</itemizedlist>
- <section condition="l24" xml:id="dbdoclet.rmremotedir">
+ <section xml:id="lustremaint.rmremotedir">
<title><indexterm><primary>maintenance</primary><secondary>removing an MDT</secondary></indexterm>Removing an MDT from the File System</title>
<para>If the MDT is permanently inaccessible,
<literal>lfs rm_entry {directory}</literal> can be used to delete the
<para>The <literal>lfs getstripe --mdt-index</literal> command
returns the index of the MDT that is serving the given directory.</para>
</section>
- <section xml:id="dbdoclet.inactivemdt" condition='l24'>
+ <section xml:id="lustremaint.inactivemdt">
<title>
<indexterm><primary>maintenance</primary></indexterm>
<indexterm><primary>maintenance</primary><secondary>inactive MDTs</secondary></indexterm>Working with Inactive MDTs</title>
the MDT is activated again. Clients accessing an inactive MDT will receive
an EIO error.</para>
</section>
- <section remap="h3" xml:id="section_remove_ost">
+ <section remap="h3" xml:id="lustremaint.remove_ost">
<title><indexterm>
<primary>maintenance</primary>
<secondary>removing an OST</secondary>
</listitem>
<listitem>
<para>If there is not expected to be a replacement for this OST in
- the near future, permanently deactivate it on all clients and
- the MDS by running the following command on the MGS:
- <screen>mgs# lctl conf_param <replaceable>ost_name</replaceable>.osc.active=0</screen></para>
+ the near future, permanently deactivate it on all clients and
+ the MDS by running the following command on the MGS:
+ <screen>mgs# lctl conf_param <replaceable>ost_name</replaceable>.osc.active=0</screen></para>
<note><para>A deactivated OST still appears in the file system
- configuration, though a replacement OST can be created using the
+ configuration, though a replacement OST can be created that
+ re-uses the same OST index with the
<literal>mkfs.lustre --replace</literal> option, see
- <xref linkend="section_restore_ost"/>.
+ <xref linkend="lustremaint.restore_ost"/>.
</para></note>
+     <para>To completely remove the OST from the filesystem
+     configuration, find the OST configuration records in the startup
+     logs by running the command
+     "<literal>lctl --device MGS llog_print <replaceable>fsname</replaceable>-client</literal>"
+     on the MGS (and also
+     "<literal>... <replaceable>$fsname</replaceable>-MDT<replaceable>xxxx</replaceable></literal>"
+     for all the MDTs) to list all <literal>attach</literal>,
+ <literal>setup</literal>, <literal>add_osc</literal>,
+ <literal>add_pool</literal>, and other records related to the
+ removed OST(s). Once the <literal>index</literal> value is
+ known for each configuration record, the command
+ "<literal>lctl --device MGS llog_cancel <replaceable>llog_name</replaceable> -i <replaceable>index</replaceable> </literal>"
+ will drop that record from the configuration log
+ <replaceable>llog_name</replaceable> for each of the
+ <literal><replaceable>fsname</replaceable>-client</literal> and
+ <literal><replaceable>fsname</replaceable>-MDTxxxx</literal>
+ configuration logs so that new mounts will no longer process it.
+      If a whole OSS is being removed, the <literal>add_uuid</literal>
+ records for the OSS should similarly be canceled.
+ <screen>
+mgs# lctl --device MGS llog_print testfs-client | egrep "192.168.10.99@tcp|OST0003"
+- { index: 135, event: add_uuid, nid: 192.168.10.99@tcp(0x20000c0a80a63), node: 192.168.10.99@tcp }
+- { index: 136, event: attach, device: testfs-OST0003-osc, type: osc, UUID: testfs-clilov_UUID }
+- { index: 137, event: setup, device: testfs-OST0003-osc, UUID: testfs-OST0003_UUID, node: 192.168.10.99@tcp }
+- { index: 138, event: add_osc, device: testfs-clilov, ost: testfs-OST0003_UUID, index: 3, gen: 1 }
+mgs# lctl --device MGS llog_cancel testfs-client -i 138
+mgs# lctl --device MGS llog_cancel testfs-client -i 137
+mgs# lctl --device MGS llog_cancel testfs-client -i 136
+ </screen>
+ </para>
</listitem>
</orderedlist>
</listitem>
</orderedlist>
</section>
- <section remap="h3" xml:id="section_ydg_pgt_tl">
+ <section remap="h3" xml:id="lustremaint.ydg_pgt_tl">
<title><indexterm>
<primary>maintenance</primary>
<secondary>backing up OST config</secondary>
</listitem>
</orderedlist>
</section>
- <section xml:id="section_restore_ost">
+ <section xml:id="lustremaint.restore_ost">
<title><indexterm>
<primary>maintenance</primary>
<secondary>restoring OST config</secondary>
</indexterm> Restoring OST Configuration Files</title>
<para>If the original OST is still available, it is best to follow the
OST backup and restore procedure given in either
- <xref linkend="dbdoclet.backup_device"/>, or
+ <xref linkend="backup_device"/>, or
<xref linkend="backup_fs_level"/> and
<xref linkend="backup_fs_level.restore"/>.</para>
<para>To replace an OST that was removed from service due to corruption
<listitem>
<para>Recreate the OST configuration files, if unavailable. </para>
<para>Follow the procedure in
- <xref linkend="dbdoclet.repair_ost_lastid"/> to recreate the LAST_ID
+ <xref linkend="repair_ost_lastid"/> to recreate the LAST_ID
file for this OST index. The <literal>last_rcvd</literal> file
will be recreated when the OST is first mounted using the default
parameters, which are normally correct for all file systems. The
</listitem>
</orderedlist>
</section>
- <section xml:id="section_ucf_qgt_tl">
+ <section xml:id="lustremaint.ucf_qgt_tl">
<title><indexterm>
<primary>maintenance</primary>
<secondary>reintroducing an OSTs</secondary>
client# lctl set_param osc.<replaceable>fsname</replaceable>-OST<replaceable>number</replaceable>-*.active=1</screen></para>
</section>
</section>
- <section xml:id="dbdoclet.50438199_77819">
+ <section xml:id="lustremaint.abortRecovery">
<title><indexterm><primary>maintenance</primary><secondary>aborting recovery</secondary></indexterm>
<indexterm><primary>backup</primary><secondary>aborting recovery</secondary></indexterm>
Aborting Recovery</title>
<para>The recovery process is blocked until all OSTs are available. </para>
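+    <para>As an illustrative sketch (the device and mount point are
+    examples), recovery can be skipped when restarting a target by
+    mounting it with the <literal>abort_recov</literal> option:</para>
+    <screen>oss# mount -t lustre -o abort_recov /dev/sda /mnt/testfs/ost0</screen>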
</note>
</section>
- <section xml:id="dbdoclet.50438199_12607">
+ <section xml:id="lustremaint.determineOST">
<title><indexterm><primary>maintenance</primary><secondary>identifying OST host</secondary></indexterm>
Determining Which Machine is Serving an OST </title>
<para>In the course of administering a Lustre file system, you may need to determine which
osc.testfs-OST0003-osc-f1579000.ost_conn_uuid=192.168.20.1@tcp
osc.testfs-OST0004-osc-f1579000.ost_conn_uuid=192.168.20.1@tcp</screen></para>
</section>
- <section xml:id="dbdoclet.50438199_62333">
+ <section xml:id="lustremaint.ChangeAddrFailoverNode">
<title><indexterm><primary>maintenance</primary><secondary>changing failover node address</secondary></indexterm>
Changing the Address of a Failover Node</title>
<para>To change the address of a failover node (e.g, to use node X instead of node Y), run
<literal>--failnode</literal> options, see <xref xmlns:xlink="http://www.w3.org/1999/xlink"
linkend="configuringfailover"/>.</para>
</section>
- <section xml:id="dbdoclet.50438199_62545">
+ <section xml:id="lustremaint.seperateCombinedMGSMDT">
<title><indexterm><primary>maintenance</primary><secondary>separate a
combined MGS/MDT</secondary></indexterm>
Separate a combined MGS/MDT</title>
<para>These instructions assume the MGS node will be the same as the MDS
node. For instructions on how to move MGS to a different node, see
- <xref linkend="dbdoclet.changingservernid"/>.</para>
+ <xref linkend="lustremaint.changingservernid"/>.</para>
<para>These instructions are for doing the split without shutting down
other servers and clients.</para>
<orderedlist>
<screen>mds# cp -r <replaceable>/mdt_mount_point</replaceable>/CONFIGS/<replaceable>filesystem_name</replaceable>-* <replaceable>/mgs_mount_point</replaceable>/CONFIGS/. </screen>
<screen>mds# umount <replaceable>/mgs_mount_point</replaceable></screen>
<screen>mds# umount <replaceable>/mdt_mount_point</replaceable></screen>
- <para>See <xref linkend="dbdoclet.50438199_54623"/> for alternative method.</para>
+    <para>See <xref linkend="lustremaint.regenerateConfigLogs"/> for an alternative method.</para>
</listitem>
<listitem>
<para>Start the MGS.</para>
</listitem>
</orderedlist>
</section>
+ <section xml:id="lustremaint.setMDTReadonly" condition="l2D">
+ <title><indexterm><primary>maintenance</primary>
+ <secondary>set an MDT to readonly</secondary></indexterm>
+ Set an MDT to read-only</title>
+ <para>It is sometimes desirable to be able to mark the filesystem
+ read-only directly on the server, rather than remounting the clients and
+   setting the option there. This can be useful if there is a rogue client
+   that is deleting files, or when decommissioning a system to prevent
+   already-mounted clients from making further modifications.</para>
+ <para>Set the <literal>mdt.*.readonly</literal> parameter to
+ <literal>1</literal> to immediately set the MDT to read-only. All future
+ MDT access will immediately return a "Read-only file system" error
+ (<literal>EROFS</literal>) until the parameter is set to
+ <literal>0</literal> again.</para>
+ <para>Example of setting the <literal>readonly</literal> parameter to
+ <literal>1</literal>, verifying the current setting, accessing from a
+ client, and setting the parameter back to <literal>0</literal>:</para>
+ <screen>mds# lctl set_param mdt.fs-MDT0000.readonly=1
+mdt.fs-MDT0000.readonly=1
+
+mds# lctl get_param mdt.fs-MDT0000.readonly
+mdt.fs-MDT0000.readonly=1
+
+client$ touch test_file
+touch: cannot touch ‘test_file’: Read-only file system
+
+mds# lctl set_param mdt.fs-MDT0000.readonly=0
+mdt.fs-MDT0000.readonly=0</screen>
+ </section>
+ <section xml:id="lustremaint.tunefallocate" condition="l2E">
+ <title><indexterm><primary>maintenance</primary>
+ <secondary>Tune fallocate</secondary></indexterm>
+ Tune Fallocate for ldiskfs</title>
+ <para>This section shows how to tune/enable/disable fallocate for
+ ldiskfs OSTs.</para>
+ <para>The default <literal>mode=0</literal> is the standard
+ "allocate unwritten extents" behavior used by ext4. This is by far the
+ fastest for space allocation, but requires the unwritten extents to be
+ split and/or zeroed when they are overwritten.</para>
+   <para>The OST fallocate mode can also be set to
+   <literal>mode=1</literal> to use "zeroed extents", which may be
+   handled by "WRITE SAME", "TRIM zeroes data", or other low-level
+   functionality in the underlying block device.</para>
+ <para><literal>mode=-1</literal> completely disables fallocate.</para>
+   <para>Example: to completely disable fallocate:</para>
+   <screen>lctl set_param osd-ldiskfs.*.fallocate_zero_blocks=-1</screen>
+   <para>Example: to enable fallocate to use 'zeroed extents':</para>
+   <screen>lctl set_param osd-ldiskfs.*.fallocate_zero_blocks=1</screen>
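+   <para>The currently configured mode can be checked on each OST with
+   <literal>lctl get_param</literal>:</para>
+   <screen>lctl get_param osd-ldiskfs.*.fallocate_zero_blocks</screen>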
+ </section>
</chapter>
+<!--
+ vim:expandtab:shiftwidth=2:tabstop=8:
+ -->