As subject.
Change-Id: Ib5b0ba11b154be11e62626dba47653b98b72a20c
Signed-off-by: Richard Henwood <richard.henwood@intel.com>
Reviewed-on: http://review.whamcloud.com/17393
Tested-by: Jenkins
-<?xml version='1.0' encoding='UTF-8'?><chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="configuringlustre">
- <title xml:id="configuringlustre.title">Configuring a Lustre File System</title>
- <para>This chapter shows how to configure a simple Lustre file system comprised of a combined
- MGS/MDT, an OST and a client. It includes:</para>
+<?xml version='1.0' encoding='utf-8'?>
+<chapter xmlns="http://docbook.org/ns/docbook"
+xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
+xml:id="configuringlustre">
+ <title xml:id="configuringlustre.title">Configuring a Lustre File
+ System</title>
+  <para>This chapter shows how to configure a simple Lustre file system
+  composed of a combined MGS/MDT, an OST, and a client. It includes:</para>
<itemizedlist>
<listitem>
- <para><xref linkend="dbdoclet.50438267_50692"/>
- </para>
+ <para>
+ <xref linkend="dbdoclet.50438267_50692" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438267_76752"/>
- </para>
+ <para>
+ <xref linkend="dbdoclet.50438267_76752" />
+ </para>
</listitem>
</itemizedlist>
<section xml:id="dbdoclet.50438267_50692">
- <title>
- <indexterm><primary>Lustre</primary><secondary>configuring</secondary></indexterm>
- Configuring a Simple Lustre File System</title>
- <para>A Lustre file system can be set up in a variety of configurations by using the
- administrative utilities provided with the Lustre software. The procedure below shows how to
- configure a simple Lustre file system consisting of a combined MGS/MDS, one OSS with two OSTs,
- and a client. For an overview of the entire Lustre installation procedure, see <xref
- linkend="installoverview"/>.</para>
- <para>This configuration procedure assumes you have completed the following:</para>
+ <title>
+ <indexterm>
+ <primary>Lustre</primary>
+ <secondary>configuring</secondary>
+ </indexterm>Configuring a Simple Lustre File System</title>
+ <para>A Lustre file system can be set up in a variety of configurations by
+ using the administrative utilities provided with the Lustre software. The
+ procedure below shows how to configure a simple Lustre file system
+ consisting of a combined MGS/MDS, one OSS with two OSTs, and a client. For
+ an overview of the entire Lustre installation procedure, see
+ <xref linkend="installoverview" />.</para>
+ <para>This configuration procedure assumes you have completed the
+ following:</para>
<itemizedlist>
<listitem>
- <para><emphasis>
- <emphasis role="bold">Set up and configured your hardware</emphasis>
- </emphasis>. For more information about hardware requirements, see <xref linkend="settinguplustresystem"/>.</para>
+ <para>
+ <emphasis>
+ <emphasis role="bold">Set up and configured your hardware</emphasis>
+ </emphasis>. For more information about hardware requirements, see
+ <xref linkend="settinguplustresystem" />.</para>
</listitem>
<listitem>
- <para><emphasis role="bold">Downloaded and installed the Lustre software.</emphasis> For more information about preparing for and installing the Lustre software, see <xref linkend="installinglustre"/>.</para>
+        <para>
+        <emphasis role="bold">Downloaded and installed the Lustre
+        software.</emphasis> For more information about preparing for and
+        installing the Lustre software, see
+        <xref linkend="installinglustre" />.</para>
</listitem>
</itemizedlist>
- <para>The following optional steps should also be completed, if needed, before the Lustre software is configured:</para>
+ <para>The following optional steps should also be completed, if needed,
+ before the Lustre software is configured:</para>
<itemizedlist>
<listitem>
- <para><emphasis>Set up a hardware or software RAID on block devices to be used as OSTs or MDTs.</emphasis> For information about setting up RAID, see the documentation for your RAID controller or <xref linkend="configuringstorage"/>.</para>
+        <para>
+        <emphasis>Set up a hardware or software RAID on block devices to be
+        used as OSTs or MDTs.</emphasis> For information about setting up
+        RAID, see the documentation for your RAID controller or
+        <xref linkend="configuringstorage" />.</para>
</listitem>
<listitem>
- <para><emphasis>Set up network interface bonding on Ethernet interfaces.</emphasis> For information about setting up network interface bonding, see <xref linkend="settingupbonding"/>.</para>
+        <para>
+        <emphasis>Set up network interface bonding on Ethernet
+        interfaces.</emphasis> For information about setting up network
+        interface bonding, see
+        <xref linkend="settingupbonding" />.</para>
</listitem>
<listitem>
- <para><emphasis>Set</emphasis> lnet <emphasis>module parameters to specify how Lustre
- Networking (LNET) is to be configured to work with a Lustre file system and test the
- LNET configuration.</emphasis> LNET will, by default, use the first TCP/IP interface it
- discovers on a system. If this network configuration is sufficient, you do not need to
- configure LNET. LNET configuration is required if you are using InfiniBand or multiple
- Ethernet interfaces.</para>
+        <para>
+        <emphasis>Set</emphasis> lnet
+        <emphasis>module parameters to specify how Lustre Networking (LNET) is
+        to be configured to work with a Lustre file system and test the LNET
+        configuration.</emphasis> LNET will, by default, use the first TCP/IP
+        interface it discovers on a system. If this network configuration is
+        sufficient, you do not need to configure LNET. LNET configuration is
+        required if you are using InfiniBand or multiple Ethernet
+        interfaces.</para>
</listitem>
</itemizedlist>
- <para>For information about configuring LNET, see <xref linkend="configuringlnet"/>. For information about testing LNET, see <xref linkend="lnetselftest"/>.</para>
+    <para>For information about configuring LNET, see
+    <xref linkend="configuringlnet" />. For information about testing LNET,
+    see
+    <xref linkend="lnetselftest" />.</para>
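For orientation, LNET module parameters of this kind are normally placed in a modprobe configuration file. The sketch below is illustrative only: the interface names `eth1` and `ib0` are assumptions, not values from this chapter, and it writes to a scratch path instead of `/etc/modprobe.d/lustre.conf` so it is safe to run as-is.

```shell
#!/bin/sh
# Illustrative sketch: pin LNET to a specific interface via the "networks"
# module option. "eth1" and "ib0" are placeholder interface names. A scratch
# file is used instead of /etc/modprobe.d/lustre.conf so nothing is changed.
conf=${TMPDIR:-/tmp}/lustre.conf.example
cat > "$conf" <<'EOF'
# Ethernet: use eth1 instead of the first TCP/IP interface LNET discovers
options lnet networks=tcp0(eth1)
# InfiniBand alternative (commented out):
# options lnet networks=o2ib0(ib0)
EOF
cat "$conf"
```

The `networks=type(interface)` syntax follows LNET's module-parameter format; see the LNET configuration chapter referenced above for the authoritative syntax.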
<itemizedlist>
<listitem>
- <para><emphasis>Run the benchmark script <literal>sgpdd-survey</literal> to determine
- baseline performance of your hardware.</emphasis> Benchmarking your hardware will
- simplify debugging performance issues that are unrelated to the Lustre software and ensure
- you are getting the best possible performance with your installation. For information
- about running <literal>sgpdd-survey</literal>, see <xref linkend="benchmarkingtests"
- />.</para>
+        <para>
+        <emphasis>Run the benchmark script
+        <literal>sgpdd-survey</literal> to determine baseline performance of
+        your hardware.</emphasis> Benchmarking your hardware will simplify
+        debugging performance issues that are unrelated to the Lustre software
+        and ensure you are getting the best possible performance with your
+        installation. For information about running
+        <literal>sgpdd-survey</literal>, see
+        <xref linkend="benchmarkingtests" />.</para>
</listitem>
</itemizedlist>
<note>
- <para>The <literal>sgpdd-survey</literal> script overwrites the device being tested so it must
- be run before the OSTs are configured.</para>
+      <para>The
+      <literal>sgpdd-survey</literal> script overwrites the device being
+      tested, so it must be run before the OSTs are configured.</para>
</note>
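As a concrete sketch of the kind of invocation involved: `sgpdd-survey` is driven by environment variables in the lustre-iokit style; the variable names below follow that convention, and `/dev/sdX` is a placeholder. The snippet only assembles and prints the command line rather than executing it, because the real survey destroys the data on the listed devices.

```shell
#!/bin/sh
# Sketch only: sgpdd-survey (from lustre-iokit) is parameterized through
# environment variables. The device is a placeholder and the command is
# printed, not executed, because the survey overwrites the raw device.
scsidevs="/dev/sdX"   # device(s) under test -- contents would be DESTROYED
size=1024             # per-device transfer size in MiB; value is illustrative
echo "scsidevs=\"$scsidevs\" size=$size sgpdd-survey"
```

See the benchmarking chapter referenced above for the authoritative list of `sgpdd-survey` parameters.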
- <para>To configure a simple Lustre file system, complete these steps:</para>
+ <para>To configure a simple Lustre file system, complete these
+ steps:</para>
<orderedlist>
<listitem>
- <para>Create a combined MGS/MDT file system on a block device. On the MDS node, run:</para>
- <screen>mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgs --mdt --index=0 <replaceable>/dev/block_device</replaceable></screen>
- <para>The default file system name (<literal>fsname</literal>) is <literal>lustre</literal>.</para>
+ <para>Create a combined MGS/MDT file system on a block device. On the
+ MDS node, run:</para>
+      <screen>mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgs --mdt --index=0 <replaceable>/dev/block_device</replaceable></screen>
+      <para>The default file system name
+      (<literal>fsname</literal>) is
+      <literal>lustre</literal>.</para>
<note>
- <para>If you plan to create multiple file systems, the MGS should be created separately on its own dedicated block device, by running:</para>
- <screen>mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgs <replaceable>/dev/block_device</replaceable></screen>
- <para>See <xref linkend="dbdoclet.50438194_88063"/> for more details.</para>
+ <para>If you plan to create multiple file systems, the MGS should be
+ created separately on its own dedicated block device, by
+ running:</para>
+        <screen>mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgs <replaceable>/dev/block_device</replaceable></screen>
+        <para>See
+        <xref linkend="dbdoclet.50438194_88063" /> for more details.</para>
</note>
</listitem>
<listitem xml:id="dbdoclet.addmdtindex">
- <para>Optional for Lustre software release 2.4 and later. Add in additional MDTs.</para>
- <screen>mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgsnode=<replaceable>nid</replaceable> --mdt --index=1 <replaceable>/dev/block_device</replaceable></screen>
- <note><para>Up to 4095 additional MDTs can be added.</para></note>
+        <para>Optional for Lustre software release 2.4 and later. Add
+        additional MDTs.</para>
+      <screen>mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgsnode=<replaceable>nid</replaceable> --mdt --index=1 <replaceable>/dev/block_device</replaceable></screen>
+ <note>
+ <para>Up to 4095 additional MDTs can be added.</para>
+ </note>
</listitem>
<listitem>
- <para>Mount the combined MGS/MDT file system on the block device. On the MDS node, run:</para>
- <screen>mount -t lustre <replaceable>/dev/block_device</replaceable> <replaceable>/mount_point</replaceable></screen>
+ <para>Mount the combined MGS/MDT file system on the block device. On
+ the MDS node, run:</para>
+      <screen>mount -t lustre <replaceable>/dev/block_device</replaceable> <replaceable>/mount_point</replaceable></screen>
<note>
- <para>If you have created an MGS and an MDT on separate block devices, mount them both.</para>
+ <para>If you have created an MGS and an MDT on separate block
+ devices, mount them both.</para>
</note>
</listitem>
<listitem xml:id="dbdoclet.50438267_pgfId-1290915">
<para>Create the OST. On the OSS node, run:</para>
- <screen>mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgsnode=<replaceable>MGS_NID</replaceable> --ost --index=<replaceable>OST_index</replaceable> <replaceable>/dev/block_device</replaceable></screen>
- <para>When you create an OST, you are formatting a <literal>ldiskfs</literal> or <literal>ZFS</literal> file system on a block storage device like you would with any local file system.</para>
- <para>You can have as many OSTs per OSS as the hardware or drivers allow. For more information about storage and memory requirements for a Lustre file system, see <xref linkend="settinguplustresystem"/>.</para>
- <para>You can only configure one OST per block device. You should create an OST that uses the raw block device and does not use partitioning.</para>
- <para>You should specify the OST index number at format time in order to simplify translating the OST number in error messages or file striping to the OSS node and block device later on.</para>
- <para>If you are using block devices that are accessible from multiple OSS nodes, ensure that you mount the OSTs from only one OSS node at at time. It is strongly recommended that multiple-mount protection be enabled for such devices to prevent serious data corruption. For more information about multiple-mount protection, see <xref linkend="managingfailover"/>.</para>
+      <screen>mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgsnode=<replaceable>MGS_NID</replaceable> --ost --index=<replaceable>OST_index</replaceable> <replaceable>/dev/block_device</replaceable></screen>
+        <para>When you create an OST, you are formatting an
+        <literal>ldiskfs</literal> or
+        <literal>ZFS</literal> file system on a block storage device like you
+        would with any local file system.</para>
+ <para>You can have as many OSTs per OSS as the hardware or drivers
+ allow. For more information about storage and memory requirements for a
+ Lustre file system, see
+ <xref linkend="settinguplustresystem" />.</para>
+ <para>You can only configure one OST per block device. You should
+ create an OST that uses the raw block device and does not use
+ partitioning.</para>
+ <para>You should specify the OST index number at format time in order
+ to simplify translating the OST number in error messages or file
+ striping to the OSS node and block device later on.</para>
+        <para>If you are using block devices that are accessible from multiple
+        OSS nodes, ensure that you mount the OSTs from only one OSS node at a
+        time. It is strongly recommended that multiple-mount protection be
+        enabled for such devices to prevent serious data corruption. For more
+        information about multiple-mount protection, see
+        <xref linkend="managingfailover" />.</para>
<note>
- <para>The Lustre software currently supports block devices up to 128 TB on Red Hat
- Enterprise Linux 5 and 6 (up to 8 TB on other distributions). If the device size is only
- slightly larger that 16 TB, it is recommended that you limit the file system size to 16
- TB at format time. We recommend that you not place DOS partitions on top of RAID 5/6
- block devices due to negative impacts on performance, but instead format the whole disk
- for the file system.</para>
+          <para>The Lustre software currently supports block devices up to 128
+          TB on Red Hat Enterprise Linux 5 and 6 (up to 8 TB on other
+          distributions). If the device size is only slightly larger than 16
+          TB, it is recommended that you limit the file system size to 16 TB
+          at format time. We recommend that you not place DOS partitions on
+          top of RAID 5/6 block devices due to negative impacts on
+          performance, but instead format the whole disk for the file
+          system.</para>
</note>
</listitem>
<listitem xml:id="dbdoclet.50438267_pgfId-1293955">
- <para>Mount the OST. On the OSS node where the OST was created, run:</para>
- <screen>mount -t lustre <replaceable>/dev/block_device</replaceable> <replaceable>/mount_point</replaceable></screen>
+ <para>Mount the OST. On the OSS node where the OST was created,
+ run:</para>
+      <screen>mount -t lustre <replaceable>/dev/block_device</replaceable> <replaceable>/mount_point</replaceable></screen>
<note>
- <para>
- To create additional OSTs, repeat Step <xref linkend="dbdoclet.50438267_pgfId-1290915"/> and Step <xref linkend="dbdoclet.50438267_pgfId-1293955"/>, specifying the next higher OST index number.</para>
+          <para>To create additional OSTs, repeat Step
+          <xref linkend="dbdoclet.50438267_pgfId-1290915" /> and Step
+          <xref linkend="dbdoclet.50438267_pgfId-1293955" />, specifying the
+          next higher OST index number.</para>
</note>
</listitem>
<listitem xml:id="dbdoclet.50438267_pgfId-1290934">
- <para>Mount the Lustre file system on the client. On the client node, run:</para>
- <screen>mount -t lustre <replaceable>MGS_node</replaceable>:/<replaceable>fsname</replaceable> <replaceable>/mount_point</replaceable>
+ <para>Mount the Lustre file system on the client. On the client node,
+ run:</para>
+        <screen>mount -t lustre <replaceable>MGS_node</replaceable>:/<replaceable>fsname</replaceable> <replaceable>/mount_point</replaceable>
</screen>
<note>
- <para>To create additional clients, repeat Step <xref linkend="dbdoclet.50438267_pgfId-1290934"/>.</para>
+ <para>To create additional clients, repeat Step
+ <xref linkend="dbdoclet.50438267_pgfId-1290934" />.</para>
</note>
<note>
- <para>If you have a problem mounting the file system, check the syslogs on the client and all the servers for errors and also check the network settings. A common issue with newly-installed systems is that <literal>hosts.deny</literal> or firewall rules may prevent connections on port 988.</para>
+          <para>If you have a problem mounting the file system, check the
+          syslogs on the client and all the servers for errors and also check
+          the network settings. A common issue with newly installed systems is
+          that
+          <literal>hosts.deny</literal> or firewall rules may prevent
+          connections on port 988.</para>
</note>
</listitem>
<listitem>
- <para>Verify that the file system started and is working correctly. Do this by running <literal>lfs df</literal>, <literal>dd</literal> and <literal>ls</literal> commands on the client node.</para>
+ <para>Verify that the file system started and is working correctly. Do
+ this by running
+ <literal>lfs df</literal>,
+ <literal>dd</literal> and
+ <literal>ls</literal> commands on the client node.</para>
</listitem>
<listitem>
- <para><emphasis>(Optional)</emphasis> Run benchmarking tools to validate the performance of hardware and software layers in the cluster. Available tools include:</para>
+        <para>
+        <emphasis>(Optional)</emphasis> Run benchmarking tools to validate the
+        performance of hardware and software layers in the cluster. Available
+        tools include:</para>
<itemizedlist>
<listitem>
- <para><literal>obdfilter-survey</literal> - Characterizes the storage performance of a
- Lustre file system. For details, see <xref linkend="dbdoclet.50438212_26516"/>.</para>
+            <para>
+            <literal>obdfilter-survey</literal> - Characterizes the storage
+            performance of a Lustre file system. For details, see
+            <xref linkend="dbdoclet.50438212_26516" />.</para>
</listitem>
<listitem>
- <para><literal>ost-survey</literal> - Performs I/O against OSTs to detect anomalies
- between otherwise identical disk subsystems. For details, see <xref
- linkend="dbdoclet.50438212_85136"/>.</para>
+            <para>
+            <literal>ost-survey</literal> - Performs I/O against OSTs to
+            detect anomalies between otherwise identical disk subsystems. For
+            details, see
+            <xref linkend="dbdoclet.50438212_85136" />.</para>
</listitem>
</itemizedlist>
</listitem>
</orderedlist>
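Taken together, steps 1 through 7 reduce to a short command sequence. The sketch below is a dry run: the device names, mount points, and MGS NID are placeholders (assumptions, not prescribed values), and each command is echoed rather than executed, since <literal>mkfs.lustre</literal> reformats its target device.

```shell
#!/bin/sh
# Dry-run sketch of the configuration steps above. All devices, mount
# points, and the MGS NID are placeholders; "run" echoes instead of
# executing so nothing is actually formatted or mounted.
run() { echo "# $*"; }   # replace the echo with "$@" to execute for real

FSNAME=lustre            # the default file system name
MGS_NID=10.2.0.1@tcp0    # placeholder NID of the MGS node

# Steps 1 and 3: create the combined MGS/MDT and mount it (on the MDS node)
run mkfs.lustre --fsname=$FSNAME --mgs --mdt --index=0 /dev/sdb
run mount -t lustre /dev/sdb /mnt/mdt

# Steps 4-5: create one OST per block device and mount it (on the OSS node)
run mkfs.lustre --fsname=$FSNAME --mgsnode=$MGS_NID --ost --index=0 /dev/sdc
run mount -t lustre /dev/sdc /mnt/ost0

# Step 6: mount the file system on the client node
run mount -t lustre $MGS_NID:/$FSNAME /mnt/lustre

# Step 7: basic sanity check from the client
run lfs df -h
```

To add further OSTs, repeat the OST pair of commands with the next higher `--index` value, matching the note in step 5.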
<section remap="h3">
- <title>
- <indexterm><primary>Lustre</primary><secondary>configuring</secondary><tertiary>simple example</tertiary></indexterm>
- Simple Lustre Configuration Example</title>
- <para>To see the steps to complete for a simple Lustre file system configuration, follow this
- example in which a combined MGS/MDT and two OSTs are created to form a file system called
- <literal>temp</literal>. Three block devices are used, one for the combined MGS/MDS node
- and one for each OSS node. Common parameters used in the example are listed below, along
- with individual node parameters.</para>
+ <title>
+ <indexterm>
+ <primary>Lustre</primary>
+ <secondary>configuring</secondary>
+ <tertiary>simple example</tertiary>
+ </indexterm>Simple Lustre Configuration Example</title>
+ <para>To see the steps to complete for a simple Lustre file system
+ configuration, follow this example in which a combined MGS/MDT and two
+ OSTs are created to form a file system called
+ <literal>temp</literal>. Three block devices are used, one for the
+ combined MGS/MDS node and one for each OSS node. Common parameters used
+ in the example are listed below, along with individual node
+ parameters.</para>
<informaltable frame="all">
<tgroup cols="4">
- <colspec colname="c1" colwidth="2*"/>
- <colspec colname="c2" colwidth="25*"/>
- <colspec colname="c3" colwidth="25*"/>
- <colspec colname="c4" colwidth="25*"/>
+ <colspec colname="c1" colwidth="2*" />
+ <colspec colname="c2" colwidth="25*" />
+ <colspec colname="c3" colwidth="25*" />
+ <colspec colname="c4" colwidth="25*" />
<thead>
<row>
<entry nameend="c2" namest="c1">
- <para><emphasis role="bold">Common Parameters</emphasis></para>
+ <para>
+ <emphasis role="bold">Common Parameters</emphasis>
+ </para>
</entry>
<entry>
- <para><emphasis role="bold">Value</emphasis></para>
+ <para>
+ <emphasis role="bold">Value</emphasis>
+ </para>
</entry>
<entry>
- <para><emphasis role="bold">Description</emphasis></para>
+ <para>
+ <emphasis role="bold">Description</emphasis>
+ </para>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
- <para>  </para>
+ <para> </para>
</entry>
<entry>
- <para> <emphasis role="bold">MGS node</emphasis></para>
+ <para>
+ <emphasis role="bold">MGS node</emphasis>
+ </para>
</entry>
<entry>
- <para> <literal>10.2.0.1@tcp0</literal></para>
+ <para>
+ <literal>10.2.0.1@tcp0</literal>
+ </para>
</entry>
<entry>
            <para>Node for the combined MGS/MDS</para>
          </entry>
        </row>
<row>
<entry>
- <para>  </para>
+ <para> </para>
</entry>
<entry>
- <para> <emphasis role="bold">file system</emphasis></para>
+ <para>
+ <emphasis role="bold">file system</emphasis>
+ </para>
</entry>
<entry>
- <para><literal> temp</literal></para>
+ <para>
+ <literal>temp</literal>
+ </para>
</entry>
<entry>
            <para>Name of the Lustre file system</para>
          </entry>
        </row>
<row>
<entry>
- <para>  </para>
+ <para> </para>
</entry>
<entry>
- <para> <emphasis role="bold">network type</emphasis></para>
+ <para>
+ <emphasis role="bold">network type</emphasis>
+ </para>
</entry>
<entry>
- <para> <literal>TCP/IP</literal></para>
+ <para>
+ <literal>TCP/IP</literal>
+ </para>
</entry>
<entry>
- <para>Network type used for Lustre file system <literal>temp</literal></para>
+ <para>Network type used for Lustre file system
+ <literal>temp</literal></para>
</entry>
</row>
</tbody>
</informaltable>
<informaltable frame="all">
<tgroup cols="4">
- <colspec colname="c1" colwidth="25*"/>
- <colspec colname="c2" colwidth="25*"/>
- <colspec colname="c3" colwidth="25*"/>
- <colspec colname="c4" colwidth="25*"/>
+ <colspec colname="c1" colwidth="25*" />
+ <colspec colname="c2" colwidth="25*" />
+ <colspec colname="c3" colwidth="25*" />
+ <colspec colname="c4" colwidth="25*" />
<thead>
<row>
<entry nameend="c2" namest="c1">
- <para><emphasis role="bold">Node Parameters</emphasis></para>
+ <para>
+ <emphasis role="bold">Node Parameters</emphasis>
+ </para>
</entry>
<entry>
- <para><emphasis role="bold">Value</emphasis></para>
+ <para>
+ <emphasis role="bold">Value</emphasis>
+ </para>
</entry>
<entry>
- <para><emphasis role="bold">Description</emphasis></para>
+ <para>
+ <emphasis role="bold">Description</emphasis>
+ </para>
</entry>
</row>
</thead>
<tbody>
<row>
<entry nameend="c4" namest="c1">
- <para> MGS/MDS node</para>
+ <para>MGS/MDS node</para>
</entry>
</row>
<row>
<entry>
- <para>  </para>
+ <para> </para>
</entry>
<entry>
- <para> <emphasis role="bold">MGS/MDS node</emphasis></para>
+ <para>
+ <emphasis role="bold">MGS/MDS node</emphasis>
+ </para>
</entry>
<entry>
- <para> <literal>mdt0</literal></para>
+ <para>
+ <literal>mdt0</literal>
+ </para>
</entry>
<entry>
- <para>MDS in Lustre file system <literal>temp</literal></para>
+ <para>MDS in Lustre file system
+ <literal>temp</literal></para>
</entry>
</row>
<row>
<entry>
- <para>  </para>
+ <para> </para>
</entry>
<entry>
- <para> <emphasis role="bold">block device</emphasis></para>
+ <para>
+ <emphasis role="bold">block device</emphasis>
+ </para>
</entry>
<entry>
- <para> <literal>/dev/sdb</literal></para>
+ <para>
+ <literal>/dev/sdb</literal>
+ </para>
</entry>
<entry>
            <para>Block device for the combined MGS/MDS node</para>
          </entry>
        </row>
<row>
<entry>
- <para>  </para>
+ <para> </para>
</entry>
<entry>
- <para> <emphasis role="bold">mount point</emphasis></para>
+ <para>
+ <emphasis role="bold">mount point</emphasis>
+ </para>
</entry>
<entry>
- <para> <literal>/mnt/mdt</literal></para>
+ <para>
+ <literal>/mnt/mdt</literal>
+ </para>
</entry>
<entry>
- <para>Mount point for the <literal>mdt0</literal> block device (<literal>/dev/sdb</literal>) on the MGS/MDS node</para>
+ <para>Mount point for the
+ <literal>mdt0</literal> block device (
+ <literal>/dev/sdb</literal>) on the MGS/MDS node</para>
</entry>
</row>
<row>
<entry nameend="c4" namest="c1">
- <para> First OSS node</para>
+ <para>First OSS node</para>
</entry>
</row>
<row>
<entry>
- <para>  </para>
+ <para> </para>
</entry>
<entry>
- <para> <emphasis role="bold">OSS node</emphasis></para>
+ <para>
+ <emphasis role="bold">OSS node</emphasis>
+ </para>
</entry>
<entry>
- <para><literal> oss0</literal></para>
+ <para>
+ <literal>oss0</literal>
+ </para>
</entry>
<entry>
- <para>First OSS node in Lustre file system <literal>temp</literal></para>
+ <para>First OSS node in Lustre file system
+ <literal>temp</literal></para>
</entry>
</row>
<row>
<entry>
- <para>  </para>
+ <para> </para>
</entry>
<entry>
- <para> <emphasis role="bold">OST</emphasis></para>
+ <para>
+ <emphasis role="bold">OST</emphasis>
+ </para>
</entry>
<entry>
- <para><literal>ost0</literal></para>
+ <para>
+ <literal>ost0</literal>
+ </para>
</entry>
<entry>
- <para>First OST in Lustre file system <literal>temp</literal></para>
+ <para>First OST in Lustre file system
+ <literal>temp</literal></para>
</entry>
</row>
<row>
<entry>
- <para>  </para>
+ <para> </para>
</entry>
<entry>
- <para> <emphasis role="bold">block device</emphasis></para>
+ <para>
+ <emphasis role="bold">block device</emphasis>
+ </para>
</entry>
<entry>
- <para> <literal>/dev/sdc</literal></para>
+ <para>
+ <literal>/dev/sdc</literal>
+ </para>
</entry>
<entry>
- <para>Block device for the first OSS node (<literal>oss0</literal>)</para>
+            <para>Block device for the first OSS node
+            (<literal>oss0</literal>)</para>
</entry>
</row>
<row>
<entry>
- <para>  </para>
+ <para> </para>
</entry>
<entry>
- <para> <emphasis role="bold">mount point</emphasis></para>
+ <para>
+ <emphasis role="bold">mount point</emphasis>
+ </para>
</entry>
<entry>
- <para> <literal>/mnt/ost0</literal></para>
+ <para>
+ <literal>/mnt/ost0</literal>
+ </para>
</entry>
<entry>
- <para> Mount point for the <literal>ost0</literal> block device (<literal>/dev/sdc</literal>) on the <literal>oss1</literal> node</para>
+            <para>Mount point for the
+            <literal>ost0</literal> block device
+            (<literal>/dev/sdc</literal>) on the
+            <literal>oss0</literal> node</para>
</entry>
</row>
<row>
<entry nameend="c4" namest="c1">
- <para> Second OSS node</para>
+ <para>Second OSS node</para>
</entry>
</row>
<row>
<entry>
- <para> </para>
+ <para></para>
</entry>
<entry>
- <para> <emphasis role="bold">OSS node</emphasis></para>
+ <para>
+ <emphasis role="bold">OSS node</emphasis>
+ </para>
</entry>
<entry>
- <para><literal>oss1</literal></para>
+ <para>
+ <literal>oss1</literal>
+ </para>
</entry>
<entry>
- <para>Second OSS node in Lustre file system <literal>temp</literal></para>
+ <para>Second OSS node in Lustre file system
+ <literal>temp</literal></para>
</entry>
</row>
<row>
<entry>
- <para> </para>
+ <para></para>
</entry>
<entry>
- <para> <emphasis role="bold">OST</emphasis></para>
+ <para>
+ <emphasis role="bold">OST</emphasis>
+ </para>
</entry>
<entry>
- <para> <literal>ost1</literal></para>
+ <para>
+ <literal>ost1</literal>
+ </para>
</entry>
<entry>
- <para>Second OST in Lustre file system <literal>temp</literal></para>
+ <para>Second OST in Lustre file system
+ <literal>temp</literal></para>
</entry>
</row>
<row>
- <entry/>
+ <entry />
<entry>
- <para> <emphasis role="bold">block device</emphasis></para>
+ <para>
+ <emphasis role="bold">block device</emphasis>
+ </para>
</entry>
<entry>
- <para><literal>/dev/sdd</literal></para>
+ <para>
+ <literal>/dev/sdd</literal>
+ </para>
</entry>
<entry>
            <para>Block device for the second OSS node (oss1)</para>
          </entry>
        </row>
<row>
<entry>
- <para> </para>
+ <para></para>
</entry>
<entry>
- <para> <emphasis role="bold">mount point</emphasis></para>
+ <para>
+ <emphasis role="bold">mount point</emphasis>
+ </para>
</entry>
<entry>
- <para><literal>/mnt/ost1</literal></para>
+ <para>
+ <literal>/mnt/ost1</literal>
+ </para>
</entry>
<entry>
- <para> Mount point for the <literal>ost1</literal> block device (<literal>/dev/sdd</literal>) on the <literal>oss1</literal> node</para>
+            <para>Mount point for the
+            <literal>ost1</literal> block device
+            (<literal>/dev/sdd</literal>) on the
+            <literal>oss1</literal> node</para>
</entry>
</row>
<row>
<entry nameend="c4" namest="c1">
- <para> Client node</para>
+ <para>Client node</para>
</entry>
</row>
<row>
<entry>
- <para> </para>
+ <para></para>
</entry>
<entry>
- <para> <emphasis role="bold">client node</emphasis></para>
+ <para>
+ <emphasis role="bold">client node</emphasis>
+ </para>
</entry>
<entry>
- <para> <literal>client1</literal></para>
+ <para>
+ <literal>client1</literal>
+ </para>
</entry>
<entry>
- <para>Client in Lustre file system <literal>temp</literal></para>
+ <para>Client in Lustre file system
+ <literal>temp</literal></para>
</entry>
</row>
<row>
<entry>
- <para> </para>
+ <para></para>
</entry>
<entry>
- <para> <emphasis role="bold">mount point</emphasis></para>
+ <para>
+ <emphasis role="bold">mount point</emphasis>
+ </para>
</entry>
<entry>
- <para> <literal>/lustre</literal></para>
+ <para>
+ <literal>/lustre</literal>
+ </para>
</entry>
<entry>
- <para>Mount point for Lustre file system <literal>temp</literal> on the
- <literal>client1</literal> node</para>
+ <para>Mount point for Lustre file system
+ <literal>temp</literal> on the
+ <literal>client1</literal> node</para>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<note>
- <para>We recommend that you use 'dotted-quad' notation for IP addresses rather than host names to make it easier to read debug logs and debug configurations with multiple interfaces.</para>
+ <para>We recommend that you use 'dotted-quad' notation for IP addresses
+ rather than host names to make it easier to read debug logs and debug
+ configurations with multiple interfaces.</para>
</note>
<para>For this example, complete the steps below:</para>
<orderedlist>
<listitem>
- <para>Create a combined MGS/MDT file system on the block device. On the MDS node, run:</para>
- <screen>[root@mds /]# mkfs.lustre --fsname=temp --mgs --mdt --index=0 /dev/sdb</screen>
+ <para>Create a combined MGS/MDT file system on the block device. On
+ the MDS node, run:</para>
+        <screen>[root@mds /]# mkfs.lustre --fsname=temp --mgs --mdt --index=0 /dev/sdb</screen>
<para>This command generates this output:</para>
- <screen> Permanent disk data:
+        <screen> Permanent disk data:
Target: temp-MDT0000
Index: 0
Lustre FS: temp
options -i 4096 -I 512 -q -O dir_index,uninit_groups -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L temp-MDTffff -i 4096 -I 512 -q -O
dir_index,uninit_groups -F /dev/sdb
-Writing CONFIGS/mountdata </screen>
+Writing CONFIGS/mountdata</screen>
</listitem>
<listitem>
- <para>Mount the combined MGS/MDT file system on the block device. On the MDS node, run:</para>
- <screen>[root@mds /]# mount -t lustre /dev/sdb /mnt/mdt</screen>
+ <para>Mount the combined MGS/MDT file system on the block device. On
+ the MDS node, run:</para>
+        <screen>[root@mds /]# mount -t lustre /dev/sdb /mnt/mdt</screen>
<para>This command generates this output:</para>
- <screen>Lustre: temp-MDT0000: new disk, initializing
+        <screen>Lustre: temp-MDT0000: new disk, initializing
Lustre: 3009:0:(lproc_mds.c:262:lprocfs_wr_identity_upcall()) temp-MDT0000:
group upcall set to /usr/sbin/l_getidentity
Lustre: temp-MDT0000.mdt: set parameter identity_upcall=/usr/sbin/l_getidentity
-Lustre: Server temp-MDT0000 on device /dev/sdb has started </screen>
+Lustre: Server temp-MDT0000 on device /dev/sdb has started</screen>
</listitem>
<listitem xml:id="dbdoclet.50438267_pgfId-1291170">
- <para>Create and mount <literal>ost0</literal>.</para>
- <para>In this example, the OSTs (<literal>ost0</literal> and <literal>ost1</literal>) are being created on different OSS nodes (<literal>oss0</literal> and <literal>oss1</literal> respectively).</para>
+ <para>Create and mount
+ <literal>ost0</literal>.</para>
+        <para>In this example, the OSTs
+        (<literal>ost0</literal> and
+        <literal>ost1</literal>) are being created on different OSS nodes
+        (<literal>oss0</literal> and
+        <literal>oss1</literal>, respectively).</para>
<orderedlist>
<listitem>
- <para>Create <literal>ost0</literal>. On <literal>oss0</literal> node, run:</para>
- <screen>[root@oss0 /]# mkfs.lustre --fsname=temp --mgsnode=10.2.0.1@tcp0 --ost --index=0 /dev/sdc</screen>
+ <para>Create
+ <literal>ost0</literal>. On
+ <literal>oss0</literal> node, run:</para>
+ <screen>
+[root@oss0 /]# mkfs.lustre --fsname=temp --mgsnode=10.2.0.1@tcp0 \
+  --ost --index=0 /dev/sdc
+</screen>
<para>The command generates this output:</para>
- <screen> Permanent disk data:
+ <screen>
+ Permanent disk data:
Target: temp-OST0000
Index: 0
Lustre FS: temp
options -I 256 -q -O dir_index,uninit_groups -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L temp-OST0000 -I 256 -q -O
dir_index,uninit_groups -F /dev/sdc
-Writing CONFIGS/mountdata </screen>
+Writing CONFIGS/mountdata
+</screen>
</listitem>
<listitem>
- <para>Mount ost0 on the OSS on which it was created. On <literal>oss0</literal> node, run:</para>
- <screen>root@oss0 /] mount -t lustre /dev/sdc /mnt/ost0</screen>
+ <para>Mount ost0 on the OSS on which it was created. On
+ <literal>oss0</literal> node, run:</para>
+ <screen>
+[root@oss0 /]# mount -t lustre /dev/sdc /mnt/ost0
+</screen>
<para>The command generates this output:</para>
- <screen>LDISKFS-fs: file extents enabled
+ <screen>
+LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
Lustre: temp-OST0000: new disk, initializing
-Lustre: Server temp-OST0000 on device /dev/sdb has started</screen>
+Lustre: Server temp-OST0000 on device /dev/sdc has started
+</screen>
<para>Shortly afterwards, this output appears:</para>
- <screen>Lustre: temp-OST0000: received MDS connection from 10.2.0.1@tcp0
-Lustre: MDS temp-MDT0000: temp-OST0000_UUID now active, resetting orphans </screen>
+ <screen>
+Lustre: temp-OST0000: received MDS connection from 10.2.0.1@tcp0
+Lustre: MDS temp-MDT0000: temp-OST0000_UUID now active, resetting orphans
+</screen>
</listitem>
</orderedlist>
</listitem>
<listitem>
- <para>Create and mount <literal>ost1</literal>.</para>
+ <para>Create and mount
+ <literal>ost1</literal>.</para>
<orderedlist>
<listitem>
- <para>Create ost1. On <literal>oss1</literal> node, run:</para>
- <screen>[root@oss1 /]# mkfs.lustre --fsname=temp --mgsnode=10.2.0.1@tcp0 \
- --ost --index=1 /dev/sdd</screen>
+ <para>Create ost1. On
+ <literal>oss1</literal> node, run:</para>
+ <screen>
+[root@oss1 /]# mkfs.lustre --fsname=temp --mgsnode=10.2.0.1@tcp0 \
+ --ost --index=1 /dev/sdd
+</screen>
<para>The command generates this output:</para>
- <screen> Permanent disk data:
+ <screen>
+ Permanent disk data:
Target: temp-OST0001
Index: 1
Lustre FS: temp
options -I 256 -q -O dir_index,uninit_groups -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L temp-OST0001 -I 256 -q -O
dir_index,uninit_groups -F /dev/sdc
-Writing CONFIGS/mountdata </screen>
+Writing CONFIGS/mountdata
+</screen>
</listitem>
<listitem>
- <para>Mount ost1 on the OSS on which it was created. On <literal>oss1</literal> node, run:</para>
- <screen>root@oss1 /] mount -t lustre /dev/sdd /mnt/ost1 </screen>
+ <para>Mount ost1 on the OSS on which it was created. On
+ <literal>oss1</literal> node, run:</para>
+ <screen>
+[root@oss1 /]# mount -t lustre /dev/sdd /mnt/ost1
+</screen>
<para>The command generates this output:</para>
- <screen>LDISKFS-fs: file extents enabled
+ <screen>
+LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
Lustre: temp-OST0001: new disk, initializing
-Lustre: Server temp-OST0001 on device /dev/sdb has started</screen>
+Lustre: Server temp-OST0001 on device /dev/sdd has started
+</screen>
<para>Shortly afterwards, this output appears:</para>
- <screen>Lustre: temp-OST0001: received MDS connection from 10.2.0.1@tcp0
-Lustre: MDS temp-MDT0000: temp-OST0001_UUID now active, resetting orphans </screen>
+ <screen>
+Lustre: temp-OST0001: received MDS connection from 10.2.0.1@tcp0
+Lustre: MDS temp-MDT0000: temp-OST0001_UUID now active, resetting orphans
+</screen>
</listitem>
</orderedlist>
</listitem>
<listitem>
- <para>Mount the Lustre file system on the client. On the client node, run:</para>
- <screen>root@client1 /] mount -t lustre 10.2.0.1@tcp0:/temp /lustre </screen>
+ <para>Mount the Lustre file system on the client. On the client node,
+ run:</para>
+ <screen>
+[root@client1 /]# mount -t lustre 10.2.0.1@tcp0:/temp /lustre
+</screen>
<para>This command generates this output:</para>
- <screen>Lustre: Client temp-client has started</screen>
+ <screen>
+Lustre: Client temp-client has started
+</screen>
</listitem>
<listitem>
- <para>Verify that the file system started and is working by running the <literal>df</literal>, <literal>dd</literal> and <literal>ls</literal> commands on the client node.</para>
+ <para>Verify that the file system started and is working by running
+ the
+ <literal>df</literal>,
+ <literal>dd</literal> and
+ <literal>ls</literal> commands on the client node.</para>
<orderedlist>
<listitem>
- <para>Run the <literal>lfs df -h</literal> command:</para>
- <screen>[root@client1 /] lfs df -h </screen>
- <para>The <literal>lfs df -h</literal> command lists space usage per OST and the MDT in human-readable format. This command generates output similar to this:</para>
+ <para>Run the
+ <literal>lfs df -h</literal> command:</para>
+ <screen>
+[root@client1 /]# lfs df -h
+</screen>
+ <para>The
+ <literal>lfs df -h</literal> command lists space usage per OST and
+ the MDT in human-readable format. This command generates output
+ similar to this:</para>
<screen>
UUID bytes Used Available Use% Mounted on
temp-MDT0000_UUID 8.0G 400.0M 7.6G 0% /lustre[MDT:0]
temp-OST0000_UUID 800.0G 400.0M 799.6G 0% /lustre[OST:0]
temp-OST0001_UUID 800.0G 400.0M 799.6G 0% /lustre[OST:1]
-filesystem summary: 1.6T 800.0M 1.6T 0% /lustre</screen>
+filesystem summary: 1.6T 800.0M 1.6T 0% /lustre
+</screen>
</listitem>
<listitem>
- <para>Run the <literal>lfs df -ih</literal> command.</para>
- <screen>[root@client1 /] lfs df -ih</screen>
- <para>The <literal>lfs df -ih</literal> command lists inode usage per OST and the MDT. This command generates output similar to this:</para>
+ <para>Run the
+ <literal>lfs df -ih</literal> command.</para>
+ <screen>
+[root@client1 /]# lfs df -ih
+</screen>
+ <para>The
+ <literal>lfs df -ih</literal> command lists inode usage per OST
+ and the MDT. This command generates output similar to
+ this:</para>
<screen>
UUID Inodes IUsed IFree IUse% Mounted on
temp-MDT0000_UUID 2.5M 32 2.5M 0% /lustre[MDT:0]
temp-OST0000_UUID 5.5M 54 5.5M 0% /lustre[OST:0]
temp-OST0001_UUID 5.5M 54 5.5M 0% /lustre[OST:1]
-filesystem summary: 2.5M 32 2.5M 0% /lustre</screen>
+filesystem summary: 2.5M 32 2.5M 0% /lustre
+</screen>
</listitem>
<listitem>
- <para>Run the <literal>dd</literal> command:</para>
- <screen>[root@client1 /] cd /lustre
-[root@client1 /lustre] dd if=/dev/zero of=/lustre/zero.dat bs=4M count=2</screen>
- <para>The <literal>dd</literal> command verifies write functionality by creating a file containing all zeros (<literal>0</literal>s). In this command, an 8 MB file is created. This command generates output similar to this:</para>
- <screen>2+0 records in
+ <para>Run the
+ <literal>dd</literal> command:</para>
+ <screen>
+[root@client1 /]# cd /lustre
+[root@client1 /lustre]# dd if=/dev/zero of=/lustre/zero.dat bs=4M count=2
+</screen>
+          <para>The
+          <literal>dd</literal> command verifies write functionality by
+          creating a file containing all zeros
+          (<literal>0</literal>s). In this command, an 8 MB file is created.
+          This command generates output similar to this:</para>
+ <screen>
+2+0 records in
2+0 records out
-8388608 bytes (8.4 MB) copied, 0.159628 seconds, 52.6 MB/s</screen>
+8388608 bytes (8.4 MB) copied, 0.159628 seconds, 52.6 MB/s
+</screen>
</listitem>
<listitem>
- <para>Run the <literal>ls</literal> command:</para>
- <screen>[root@client1 /lustre] ls -lsah</screen>
- <para>The <literal>ls -lsah</literal> command lists files and directories in the current working directory. This command generates output similar to this:</para>
- <screen>total 8.0M
+ <para>Run the
+ <literal>ls</literal> command:</para>
+ <screen>
+[root@client1 /lustre] ls -lsah
+</screen>
+ <para>The
+ <literal>ls -lsah</literal> command lists files and directories in
+ the current working directory. This command generates output
+ similar to this:</para>
+ <screen>
+total 8.0M
4.0K drwxr-xr-x 2 root root 4.0K Oct 16 15:27 .
8.0K drwxr-xr-x 25 root root 4.0K Oct 16 15:27 ..
-8.0M -rw-r--r-- 1 root root 8.0M Oct 16 15:27 zero.dat</screen>
+8.0M -rw-r--r-- 1 root root 8.0M Oct 16 15:27 zero.dat
+</screen>
        </listitem>
</orderedlist>
</listitem>
</orderedlist>
- <para>Once the Lustre file system is configured, it is ready for use.</para>
+ <para>Once the Lustre file system is configured, it is ready for
+ use.</para>
</section>
</section>
<section xml:id="dbdoclet.50438267_76752">
- <title>
- <indexterm><primary>Lustre</primary><secondary>configuring</secondary><tertiary>additional options</tertiary></indexterm>
- Additional Configuration Options</title>
- <para>This section describes how to scale the Lustre file system or make configuration changes using the Lustre configuration utilities.</para>
+ <title>
+ <indexterm>
+ <primary>Lustre</primary>
+ <secondary>configuring</secondary>
+ <tertiary>additional options</tertiary>
+ </indexterm>Additional Configuration Options</title>
+ <para>This section describes how to scale the Lustre file system or make
+ configuration changes using the Lustre configuration utilities.</para>
<section remap="h3">
- <title>
- <indexterm><primary>Lustre</primary><secondary>configuring</secondary><tertiary>for scale</tertiary></indexterm>
- Scaling the Lustre File System</title>
- <para>A Lustre file system can be scaled by adding OSTs or clients. For instructions on creating additional OSTs repeat Step <xref linkend="dbdoclet.50438267_pgfId-1291170"/> and Step <xref linkend="dbdoclet.50438267_pgfId-1293955"/> above. For mounting additional clients, repeat Step <xref linkend="dbdoclet.50438267_pgfId-1290934"/> for each client.</para>
+ <title>
+ <indexterm>
+ <primary>Lustre</primary>
+ <secondary>configuring</secondary>
+ <tertiary>for scale</tertiary>
+ </indexterm>Scaling the Lustre File System</title>
+    <para>A Lustre file system can be scaled by adding OSTs or clients. For
+    instructions on creating additional OSTs, repeat Step
+    <xref linkend="dbdoclet.50438267_pgfId-1291170" /> and Step
+    <xref linkend="dbdoclet.50438267_pgfId-1293955" /> above. For mounting
+    additional clients, repeat Step
+    <xref linkend="dbdoclet.50438267_pgfId-1290934" /> for each
+    client.</para>
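+    <para>As an illustrative sketch (the OSS node name and device are
+    assumptions, not part of the configuration above), a third OST could be
+    added to the
+    <literal>temp</literal> file system by formatting and mounting a new
+    target with the next unused index:</para>
+    <screen>
+[root@oss2 /]# mkfs.lustre --fsname=temp --mgsnode=10.2.0.1@tcp0 \
+  --ost --index=2 /dev/sde
+[root@oss2 /]# mount -t lustre /dev/sde /mnt/ost2
+</screen>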
</section>
<section remap="h3">
- <title>
- <indexterm><primary>Lustre</primary><secondary>configuring</secondary><tertiary>striping</tertiary></indexterm>
- Changing Striping Defaults</title>
- <para>The default settings for the file layout stripe pattern are shown in <xref linkend="configuringlustre.tab.stripe"/>.</para>
+ <title>
+ <indexterm>
+ <primary>Lustre</primary>
+ <secondary>configuring</secondary>
+ <tertiary>striping</tertiary>
+ </indexterm>Changing Striping Defaults</title>
+ <para>The default settings for the file layout stripe pattern are shown
+ in
+ <xref linkend="configuringlustre.tab.stripe" />.</para>
<table frame="none" xml:id="configuringlustre.tab.stripe">
<title>Default stripe pattern</title>
<tgroup cols="3">
- <colspec colname="c1" colwidth="13*"/>
- <colspec colname="c2" colwidth="13*"/>
- <colspec colname="c3" colwidth="13*"/>
+ <colspec colname="c1" colwidth="13*" />
+ <colspec colname="c2" colwidth="13*" />
+ <colspec colname="c3" colwidth="13*" />
<tbody>
<row>
<entry>
- <para><emphasis role="bold">File Layout Parameter</emphasis></para>
+ <para>
+ <emphasis role="bold">File Layout Parameter</emphasis>
+ </para>
</entry>
<entry>
- <para><emphasis role="bold">Default</emphasis></para>
+ <para>
+ <emphasis role="bold">Default</emphasis>
+ </para>
</entry>
<entry>
- <para><emphasis role="bold">Description</emphasis></para>
+ <para>
+ <emphasis role="bold">Description</emphasis>
+ </para>
</entry>
</row>
<row>
<entry>
- <para> <literal>stripe_size</literal></para>
+ <para>
+ <literal>stripe_size</literal>
+ </para>
</entry>
<entry>
- <para> 1 MB</para>
+ <para>1 MB</para>
</entry>
<entry>
- <para> Amount of data to write to one OST before moving to the next OST.</para>
+ <para>Amount of data to write to one OST before moving to the
+ next OST.</para>
</entry>
</row>
<row>
<entry>
- <para> <literal>stripe_count</literal></para>
+ <para>
+ <literal>stripe_count</literal>
+ </para>
</entry>
<entry>
- <para> 1</para>
+ <para>1</para>
</entry>
<entry>
- <para> The number of OSTs to use for a single file.</para>
+ <para>The number of OSTs to use for a single file.</para>
</entry>
</row>
<row>
<entry>
- <para> <literal>start_ost</literal></para>
+ <para>
+ <literal>start_ost</literal>
+ </para>
</entry>
<entry>
- <para> -1</para>
+ <para>-1</para>
</entry>
<entry>
- <para> The first OST where objects are created for each file. The default -1 allows the MDS to choose the starting index based on available space and load balancing. <emphasis>It's strongly recommended not to change the default for this parameter to a value other than -1.</emphasis></para>
+ <para>The first OST where objects are created for each file.
+ The default -1 allows the MDS to choose the starting index
+ based on available space and load balancing.
+            <emphasis>It is strongly recommended that this parameter not
+            be changed from the default value of -1.</emphasis></para>
</entry>
</row>
</tbody>
</tgroup>
</table>
- <para>Use the <literal>lfs setstripe</literal> command described in <xref linkend="managingstripingfreespace"/> to change the file layout configuration.</para>
+    <para>Use the
+    <literal>lfs setstripe</literal> command described in
+    <xref linkend="managingstripingfreespace" /> to change the file layout
+    configuration.</para>
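+    <para>For example, the following commands (the directory name and
+    values are illustrative) set a 4 MB stripe size and a stripe count of
+    2 on a directory, then display the resulting layout:</para>
+    <screen>
+[root@client1 /]# lfs setstripe -S 4M -c 2 /lustre/dir1
+[root@client1 /]# lfs getstripe /lustre/dir1
+</screen>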
</section>
<section remap="h3">
- <title>
- <indexterm><primary>Lustre</primary><secondary>configuring</secondary><tertiary>utilities</tertiary></indexterm>
- Using the Lustre Configuration Utilities</title>
- <para>If additional configuration is necessary, several configuration utilities are available:</para>
+ <title>
+ <indexterm>
+ <primary>Lustre</primary>
+ <secondary>configuring</secondary>
+ <tertiary>utilities</tertiary>
+ </indexterm>Using the Lustre Configuration Utilities</title>
+ <para>If additional configuration is necessary, several configuration
+ utilities are available:</para>
<itemizedlist>
<listitem>
- <para><literal>mkfs.lustre</literal> - Use to format a disk for a Lustre service.</para>
+        <para>
+        <literal>mkfs.lustre</literal> - Use to format a disk for a Lustre
+        service.</para>
</listitem>
<listitem>
- <para><literal>tunefs.lustre</literal> - Use to modify configuration information on a Lustre target disk.</para>
+        <para>
+        <literal>tunefs.lustre</literal> - Use to modify configuration
+        information on a Lustre target disk.</para>
</listitem>
<listitem>
- <para><literal>lctl</literal> - Use to directly control Lustre features via an
- <literal>ioctl</literal> interface, allowing various configuration, maintenance and
- debugging features to be accessed.</para>
+        <para>
+        <literal>lctl</literal> - Use to directly control Lustre features
+        via an <literal>ioctl</literal> interface, allowing various
+        configuration, maintenance and debugging features to be
+        accessed.</para>
</listitem>
<listitem>
- <para><literal>mount.lustre</literal> - Use to start a Lustre client or target service.</para>
+        <para>
+        <literal>mount.lustre</literal> - Use to start a Lustre client or
+        target service.</para>
</listitem>
</itemizedlist>
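+    <para>For example, the configuration stored on a formatted target can
+    be inspected without modifying it by running
+    <literal>tunefs.lustre</literal> with the
+    <literal>--dryrun</literal> option (the device name is
+    illustrative):</para>
+    <screen>
+[root@mds /]# tunefs.lustre --dryrun /dev/sdb
+</screen>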
- <para>For examples using these utilities, see the topic <xref linkend="systemconfigurationutilities"/></para>
- <para>The <literal>lfs</literal> utility is useful for configuring and querying a variety of options related to files. For more information, see <xref linkend="userutilities"/>.</para>
+    <para>For examples using these utilities, see the topic
+    <xref linkend="systemconfigurationutilities" />.</para>
+ <para>The
+ <literal>lfs</literal> utility is useful for configuring and querying a
+ variety of options related to files. For more information, see
+ <xref linkend="userutilities" />.</para>
<note>
- <para>Some sample scripts are included in the directory where the Lustre software is
- installed. If you have installed the Lustre source code, the scripts are located in the
- <literal>lustre/tests</literal> sub-directory. These scripts enable quick setup of some
- simple standard Lustre configurations.</para>
+ <para>Some sample scripts are included in the directory where the
+ Lustre software is installed. If you have installed the Lustre source
+ code, the scripts are located in the
+ <literal>lustre/tests</literal> sub-directory. These scripts enable
+ quick setup of some simple standard Lustre configurations.</para>
</note>
</section>
</section>
-<?xml version="1.0" encoding="UTF-8"?>
-<glossary xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US">
+<?xml version="1.0" encoding="utf-8"?>
+<glossary xmlns="http://docbook.org/ns/docbook"
+xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US">
<title>Glossary</title>
<glossdiv>
<title>A</title>
<glossentry xml:id="acl">
<glossterm>ACL</glossterm>
<glossdef>
- <para>Access control list. An extended attribute associated with a file that contains
- enhanced authorization directives.</para>
+ <para>Access control list. An extended attribute associated with a file
+ that contains enhanced authorization directives.</para>
</glossdef>
</glossentry>
<glossentry xml:id="ostfail">
<glossterm>Administrative OST failure</glossterm>
<glossdef>
- <para>A manual configuration change to mark an OST as unavailable, so that operations
- intended for that OST fail immediately with an I/O error instead of waiting indefinitely
- for OST recovery to complete</para>
+      <para>A manual configuration change to mark an OST as unavailable, so
+      that operations intended for that OST fail immediately with an I/O
+      error instead of waiting indefinitely for OST recovery to
+      complete.</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>C</title>
<glossentry xml:id="completioncallback">
- <glossterm>Completion callback </glossterm>
+ <glossterm>Completion callback</glossterm>
<glossdef>
- <para>An RPC made by the lock server on an OST or MDT to another system, usually a client,
- to indicate that the lock is now granted.</para>
+ <para>An RPC made by the lock server on an OST or MDT to another
+ system, usually a client, to indicate that the lock is now
+ granted.</para>
</glossdef>
</glossentry>
<glossentry xml:id="changelog">
- <glossterm>configlog </glossterm>
+ <glossterm>configlog</glossterm>
<glossdef>
- <para>An llog file used in a node, or retrieved from a management server over the network
- with configuration instructions for the Lustre file system at startup time.</para>
+      <para>An llog file stored on a node, or retrieved from a management
+      server over the network, containing configuration instructions for
+      the Lustre file system at startup time.</para>
</glossdef>
</glossentry>
<glossentry xml:id="configlock">
- <glossterm>Configuration lock </glossterm>
+ <glossterm>Configuration lock</glossterm>
<glossdef>
- <para>A lock held by every node in the cluster to control configuration changes. When the
- configuration is changed on the MGS, it revokes this lock from all nodes. When the nodes
- receive the blocking callback, they quiesce their traffic, cancel and re-enqueue the lock
- and wait until it is granted again. They can then fetch the configuration updates and
- resume normal operation.</para>
+ <para>A lock held by every node in the cluster to control configuration
+ changes. When the configuration is changed on the MGS, it revokes this
+ lock from all nodes. When the nodes receive the blocking callback, they
+ quiesce their traffic, cancel and re-enqueue the lock and wait until it
+ is granted again. They can then fetch the configuration updates and
+ resume normal operation.</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>D</title>
<glossentry xml:id="defaultstrippattern">
- <glossterm>Default stripe pattern </glossterm>
+ <glossterm>Default stripe pattern</glossterm>
<glossdef>
- <para>Information in the LOV descriptor that describes the default stripe count, stripe
- size, and layout pattern used for new files in a file system. This can be amended by using
- a directory stripe descriptor or a per-file stripe descriptor.</para>
+ <para>Information in the LOV descriptor that describes the default
+ stripe count, stripe size, and layout pattern used for new files in a
+ file system. This can be amended by using a directory stripe descriptor
+ or a per-file stripe descriptor.</para>
</glossdef>
</glossentry>
<glossentry xml:id="directio">
- <glossterm>Direct I/O </glossterm>
+ <glossterm>Direct I/O</glossterm>
<glossdef>
- <para>A mechanism that can be used during read and write system calls to avoid memory cache
- overhead for large I/O requests. It bypasses the data copy between application and kernel
- memory, and avoids buffering the data in the client memory.</para>
+ <para>A mechanism that can be used during read and write system calls
+ to avoid memory cache overhead for large I/O requests. It bypasses the
+ data copy between application and kernel memory, and avoids buffering
+ the data in the client memory.</para>
</glossdef>
</glossentry>
<glossentry xml:id="dirstripdesc">
- <glossterm>Directory stripe descriptor </glossterm>
+ <glossterm>Directory stripe descriptor</glossterm>
<glossdef>
- <para>An extended attribute that describes the default stripe pattern for new files created
- within that directory. This is also inherited by new subdirectories at the time they are
- created.</para>
+ <para>An extended attribute that describes the default stripe pattern
+ for new files created within that directory. This is also inherited by
+ new subdirectories at the time they are created.</para>
</glossdef>
</glossentry>
<glossentry xml:id="DNE">
<glossterm>Distributed namespace (DNE)</glossterm>
<glossdef>
- <para>A collection of metadata targets implementing a single file system namespace.</para>
+ <para>A collection of metadata targets implementing a single file
+ system namespace.</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>E</title>
<glossentry xml:id="ea">
- <glossterm>EA
- </glossterm>
+ <glossterm>EA</glossterm>
<glossdef>
- <para>Extended attribute. A small amount of data that can be retrieved through a name (EA or
- attr) associated with a particular inode. A Lustre file system uses EAs to store striping
- information (indicating the location of file data on OSTs). Examples of extended
- attributes are ACLs, striping information, and the FID of the file.</para>
+ <para>Extended attribute. A small amount of data that can be retrieved
+ through a name (EA or attr) associated with a particular inode. A
+ Lustre file system uses EAs to store striping information (indicating
+ the location of file data on OSTs). Examples of extended attributes are
+ ACLs, striping information, and the FID of the file.</para>
</glossdef>
</glossentry>
<glossentry xml:id="eviction">
- <glossterm>Eviction
- </glossterm>
+ <glossterm>Eviction</glossterm>
<glossdef>
- <para>The process of removing a client's state from the server if the client is unresponsive
- to server requests after a timeout or if server recovery fails. If a client is still
- running, it is required to flush the cache associated with the server when it becomes
- aware that it has been evicted.</para>
+ <para>The process of removing a client's state from the server if the
+ client is unresponsive to server requests after a timeout or if server
+ recovery fails. If a client is still running, it is required to flush
+ the cache associated with the server when it becomes aware that it has
+ been evicted.</para>
</glossdef>
</glossentry>
<glossentry xml:id="export">
- <glossterm>Export
- </glossterm>
+ <glossterm>Export</glossterm>
<glossdef>
- <para>The state held by a server for a client that is sufficient to transparently recover all in-flight operations when a single failure occurs.</para>
+ <para>The state held by a server for a client that is sufficient to
+ transparently recover all in-flight operations when a single failure
+ occurs.</para>
</glossdef>
</glossentry>
<glossentry>
- <glossterm>Extent </glossterm>
+ <glossterm>Extent</glossterm>
<glossdef>
- <para>A range of contiguous bytes or blocks in a file that are addressed by a {start,
- length} tuple instead of individual block numbers.</para>
+ <para>A range of contiguous bytes or blocks in a file that are
+ addressed by a {start, length} tuple instead of individual block
+ numbers.</para>
</glossdef>
</glossentry>
<glossentry xml:id="extendloc">
- <glossterm>Extent lock </glossterm>
+ <glossterm>Extent lock</glossterm>
<glossdef>
- <para>An LDLM lock used by the OSC to protect an extent in a storage object for concurrent
- control of read/write, file size acquisition, and truncation operations.</para>
+ <para>An LDLM lock used by the OSC to protect an extent in a storage
+ object for concurrent control of read/write, file size acquisition, and
+ truncation operations.</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>F</title>
<glossentry xml:id="failback">
- <glossterm>Failback
- </glossterm>
+ <glossterm>Failback</glossterm>
<glossdef>
- <para> The failover process in which the default active server regains control from the
- backup server that had taken control of the service.</para>
+ <para>The failover process in which the default active server regains
+ control from the backup server that had taken control of the
+ service.</para>
</glossdef>
</glossentry>
<glossentry xml:id="failoutost">
- <glossterm>Failout OST
- </glossterm>
+ <glossterm>Failout OST</glossterm>
<glossdef>
- <para>An OST that is not expected to recover if it fails to answer client requests. A
- failout OST can be administratively failed, thereby enabling clients to return errors when
- accessing data on the failed OST without making additional network requests or waiting for
- OST recovery to complete.</para>
+ <para>An OST that is not expected to recover if it fails to answer
+ client requests. A failout OST can be administratively failed, thereby
+ enabling clients to return errors when accessing data on the failed OST
+ without making additional network requests or waiting for OST recovery
+ to complete.</para>
</glossdef>
</glossentry>
<glossentry xml:id="failover">
- <glossterm>Failover
- </glossterm>
+ <glossterm>Failover</glossterm>
<glossdef>
- <para>The process by which a standby computer server system takes over for an active computer server after a failure of the active node. Typically, the standby computer server gains exclusive access to a shared storage device between the two servers.</para>
+ <para>The process by which a standby computer server system takes over
+ for an active computer server after a failure of the active node.
+ Typically, the standby computer server gains exclusive access to a
+ shared storage device between the two servers.</para>
</glossdef>
</glossentry>
<glossentry xml:id="fid">
- <glossterm>FID
- </glossterm>
+ <glossterm>FID</glossterm>
<glossdef>
- <para> Lustre File Identifier. A 128-bit file system-unique identifier for a file or object
- in the file system. The FID structure contains a unique 64-bit sequence number (see
- <emphasis role="italic">FLDB</emphasis>), a 32-bit object ID (OID), and a 32-bit version
- number. The sequence number is unique across all Lustre targets (OSTs and MDTs).</para>
+ <para>Lustre File Identifier. A 128-bit file system-unique identifier
+ for a file or object in the file system. The FID structure contains a
+ unique 64-bit sequence number (see
+ <emphasis role="italic">FLDB</emphasis>), a 32-bit object ID (OID), and
+ a 32-bit version number. The sequence number is unique across all
+ Lustre targets (OSTs and MDTs).</para>
</glossdef>
</glossentry>
<glossentry xml:id="fileset">
- <glossterm>Fileset
- </glossterm>
+ <glossterm>Fileset</glossterm>
<glossdef>
- <para>A group of files that are defined through a directory that represents the start point
- of a file system.</para>
+ <para>A group of files that are defined through a directory that
+ represents the start point of a file system.</para>
</glossdef>
</glossentry>
<glossentry xml:id="fldb">
- <glossterm>FLDB
- </glossterm>
+ <glossterm>FLDB</glossterm>
<glossdef>
- <para>FID location database. This database maps a sequence of FIDs to a specific target (MDT
- or OST), which manages the objects within the sequence. The FLDB is cached by all clients
- and servers in the file system, but is typically only modified when new servers are added
- to the file system.</para>
+ <para>FID location database. This database maps a sequence of FIDs to a
+ specific target (MDT or OST), which manages the objects within the
+ sequence. The FLDB is cached by all clients and servers in the file
+ system, but is typically only modified when new servers are added to
+ the file system.</para>
</glossdef>
</glossentry>
<glossentry xml:id="flightgroup">
- <glossterm>Flight group </glossterm>
+ <glossterm>Flight group</glossterm>
<glossdef>
- <para>Group of I/O RPCs initiated by the OSC that are concurrently queued or processed at
- the OST. Increasing the number of RPCs in flight for high latency networks can increase
- throughput and reduce visible latency at the client.</para>
+ <para>Group of I/O RPCs initiated by the OSC that are concurrently
+ queued or processed at the OST. Increasing the number of RPCs in flight
+ for high latency networks can increase throughput and reduce visible
+ latency at the client.</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>G</title>
<glossentry xml:id="glimpsecallback">
- <glossterm>Glimpse callback </glossterm>
+ <glossterm>Glimpse callback</glossterm>
<glossdef>
- <para>An RPC made by an OST or MDT to another system (usually a client) to indicate that a
- held extent lock should be surrendered. If the system is using the lock, then the system
- should return the object size and timestamps in the reply to the glimpse callback instead
- of cancelling the lock. Glimpses are introduced to optimize the acquisition of file
- attributes without introducing contention on an active lock.</para>
+ <para>An RPC made by an OST or MDT to another system (usually a client)
+ to indicate that a held extent lock should be surrendered. If the
+ system is using the lock, then the system should return the object size
+ and timestamps in the reply to the glimpse callback instead of
+ cancelling the lock. Glimpses are introduced to optimize the
+ acquisition of file attributes without introducing contention on an
+ active lock.</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>I</title>
<glossentry xml:id="import">
- <glossterm>Import
- </glossterm>
+ <glossterm>Import</glossterm>
<glossdef>
- <para>The state held held by the client for each target that it is connected to. It holds
- server NIDs, connection state, and uncommitted RPCs needed to fully recover a transaction
- sequence after a server failure and restart.</para>
+      <para>The state held by the client for each target that it is
+ connected to. It holds server NIDs, connection state, and uncommitted
+ RPCs needed to fully recover a transaction sequence after a server
+ failure and restart.</para>
</glossdef>
</glossentry>
<glossentry xml:id="intentlock">
- <glossterm>Intent lock </glossterm>
+ <glossterm>Intent lock</glossterm>
<glossdef>
- <para>A special Lustre file system locking operation in the Linux kernel. An intent lock
- combines a request for a lock with the full information to perform the operation(s) for
- which the lock was requested. This offers the server the option of granting the lock or
- performing the operation and informing the client of the operation result without granting
- a lock. The use of intent locks enables metadata operations (even complicated ones) to be
- implemented with a single RPC from the client to the server.</para>
+ <para>A special Lustre file system locking operation in the Linux
+ kernel. An intent lock combines a request for a lock with the full
+ information to perform the operation(s) for which the lock was
+ requested. This offers the server the option of granting the lock or
+ performing the operation and informing the client of the operation
+ result without granting a lock. The use of intent locks enables
+ metadata operations (even complicated ones) to be implemented with a
+ single RPC from the client to the server.</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>L</title>
<glossentry xml:id="lbug">
- <glossterm>LBUG
- </glossterm>
+ <glossterm>LBUG</glossterm>
<glossdef>
- <para>A fatal error condition detected by the software that halts execution of the kernel
- thread to avoid potential further corruption of the system state. It is printed to the
- console log and triggers a dump of the internal debug log. The system must be rebooted to
- clear this state.</para>
+ <para>A fatal error condition detected by the software that halts
+ execution of the kernel thread to avoid potential further corruption of
+ the system state. It is printed to the console log and triggers a dump
+ of the internal debug log. The system must be rebooted to clear this
+ state.</para>
</glossdef>
</glossentry>
<glossentry xml:id="ldlm">
- <glossterm>LDLM
- </glossterm>
+ <glossterm>LDLM</glossterm>
<glossdef>
<para>Lustre Distributed Lock Manager.</para>
</glossdef>
</glossentry>
<glossentry xml:id="lfs">
- <glossterm>lfs
- </glossterm>
+ <glossterm>lfs</glossterm>
<glossdef>
- <para>The Lustre file system command-line utility that allows end users to interact with
- Lustre software features, such as setting or checking file striping or per-target free
- space. For more details, see <xref xmlns:xlink="http://www.w3.org/1999/xlink"
- linkend="dbdoclet.50438206_94597"/>.</para>
+ <para>The Lustre file system command-line utility that allows end users
+ to interact with Lustre software features, such as setting or checking
+ file striping or per-target free space. For more details, see
+ <xref xmlns:xlink="http://www.w3.org/1999/xlink"
+ linkend="dbdoclet.50438206_94597" />.</para>
</glossdef>
</glossentry>
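To make the <literal>lfs</literal> definition concrete, a few typical invocations are shown below. This is an illustrative sketch only (not part of the glossary source): the mount point <literal>/mnt/lustre</literal> and the directory name are placeholders, and the commands assume a mounted Lustre client.

```shell
# Illustrative lfs usage; /mnt/lustre is a placeholder for a mounted client.
lfs getstripe /mnt/lustre/somedir/file   # show a file's striping layout
lfs df -h                                # report per-MDT/per-OST free space
lfs setstripe -c 4 /mnt/lustre/somedir   # stripe new files over 4 OSTs
```

These subcommands correspond to the per-target free space and striping features the definition mentions; see the chapter referenced by the cross-link for the full option list.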
<glossentry xml:id="lfsck">
- <glossterm>LFSCK
- </glossterm>
+ <glossterm>LFSCK</glossterm>
<glossdef>
- <para>Lustre file system check. A distributed version of a disk file system checker.
- Normally, <literal>lfsck</literal> does not need to be run, except when file systems are
- damaged by events such as multiple disk failures and cannot be recovered using file system
- journal recovery.</para>
+ <para>Lustre file system check. A distributed version of a disk file
+ system checker. Normally,
+ <literal>lfsck</literal> does not need to be run, except when file
+ systems are damaged by events such as multiple disk failures and cannot
+ be recovered using file system journal recovery.</para>
</glossdef>
</glossentry>
<glossentry xml:id="llite">
- <glossterm>llite </glossterm>
+ <glossterm>llite</glossterm>
<glossdef>
- <para>Lustre lite. This term is in use inside code and in module names for code that is
- related to the Linux client VFS interface.</para>
+ <para>Lustre lite. This term is in use inside code and in module names
+ for code that is related to the Linux client VFS interface.</para>
</glossdef>
</glossentry>
<glossentry xml:id="llog">
- <glossterm>llog </glossterm>
+ <glossterm>llog</glossterm>
<glossdef>
- <para>Lustre log. An efficient log data structure used internally by the file system for
- storing configuration and distributed transaction records. An <literal>llog</literal> is
- suitable for rapid transactional appends of records and cheap cancellation of records
- through a bitmap.</para>
+ <para>Lustre log. An efficient log data structure used internally by
+ the file system for storing configuration and distributed transaction
+ records. An
+ <literal>llog</literal> is suitable for rapid transactional appends of
+ records and cheap cancellation of records through a bitmap.</para>
</glossdef>
</glossentry>
<glossentry xml:id="llogcatalog">
- <glossterm>llog catalog </glossterm>
+ <glossterm>llog catalog</glossterm>
<glossdef>
- <para>Lustre log catalog. An <literal>llog</literal> with records that each point at an
- <literal>llog</literal>. Catalogs were introduced to give <literal>llogs</literal>
- increased scalability. <literal>llogs</literal> have an originator which writes records
- and a replicator which cancels records when the records are no longer needed.</para>
+ <para>Lustre log catalog. An
+ <literal>llog</literal> with records that each point at an
+ <literal>llog</literal>. Catalogs were introduced to give
+ <literal>llogs</literal> increased scalability.
+ <literal>llogs</literal> have an originator which writes records and a
+ replicator which cancels records when the records are no longer
+ needed.</para>
</glossdef>
</glossentry>
<glossentry xml:id="lmv">
- <glossterm>LMV
- </glossterm>
+ <glossterm>LMV</glossterm>
<glossdef>
- <para>Logical metadata volume. A module that implements a DNE client-side abstraction
- device. It allows a client to work with many MDTs without changes to the llite module. The
- LMV code forwards requests to the correct MDT based on name or directory striping
- information and merges replies into a single result to pass back to the higher
- <literal>llite</literal> layer that connects the Lustre file system with Linux VFS,
- supports VFS semantics, and complies with POSIX interface specifications.</para>
+ <para>Logical metadata volume. A module that implements a DNE
+ client-side abstraction device. It allows a client to work with many
+ MDTs without changes to the llite module. The LMV code forwards
+ requests to the correct MDT based on name or directory striping
+ information and merges replies into a single result to pass back to the
+ higher
+ <literal>llite</literal> layer that connects the Lustre file system with
+ Linux VFS, supports VFS semantics, and complies with POSIX interface
+ specifications.</para>
</glossdef>
</glossentry>
<glossentry xml:id="lnd">
- <glossterm>LND
- </glossterm>
+ <glossterm>LND</glossterm>
<glossdef>
- <para>Lustre network driver. A code module that enables LNET support over particular
- transports, such as TCP and various kinds of InfiniBand networks.</para>
+ <para>Lustre network driver. A code module that enables LNET support
+ over particular transports, such as TCP and various kinds of InfiniBand
+ networks.</para>
</glossdef>
</glossentry>
<glossentry xml:id="lnet">
- <glossterm>LNET
- </glossterm>
+ <glossterm>LNET</glossterm>
<glossdef>
- <para>Lustre networking. A message passing network protocol capable of running and routing
- through various physical layers. LNET forms the underpinning of LNETrpc.</para>
+ <para>Lustre networking. A message passing network protocol capable of
+ running and routing through various physical layers. LNET forms the
+ underpinning of PTLRPC.</para>
</glossdef>
</glossentry>
<glossentry xml:id="lockclient">
- <glossterm>Lock client </glossterm>
+ <glossterm>Lock client</glossterm>
<glossdef>
- <para>A module that makes lock RPCs to a lock server and handles revocations from the
- server.</para>
+ <para>A module that makes lock RPCs to a lock server and handles
+ revocations from the server.</para>
</glossdef>
</glossentry>
<glossentry xml:id="lockserver">
- <glossterm>Lock server </glossterm>
+ <glossterm>Lock server</glossterm>
<glossdef>
- <para>A service that is co-located with a storage target that manages locks on certain
- objects. It also issues lock callback requests, calls while servicing or, for objects that
- are already locked, completes lock requests.</para>
+ <para>A service that is co-located with a storage target and manages
+ locks on certain objects. It also issues lock callback requests while
+ servicing lock requests and, for objects that are already locked,
+ completes lock requests.</para>
</glossdef>
</glossentry>
<glossentry xml:id="lov">
- <glossterm>LOV
- </glossterm>
+ <glossterm>LOV</glossterm>
<glossdef>
- <para>Logical object volume. The object storage analog of a logical volume in a block device
- volume management system, such as LVM or EVMS. The LOV is primarily used to present a
- collection of OSTs as a single device to the MDT and client file system drivers.</para>
+ <para>Logical object volume. The object storage analog of a logical
+ volume in a block device volume management system, such as LVM or EVMS.
+ The LOV is primarily used to present a collection of OSTs as a single
+ device to the MDT and client file system drivers.</para>
</glossdef>
</glossentry>
<glossentry xml:id="lovdes">
- <glossterm>LOV descriptor
- </glossterm>
+ <glossterm>LOV descriptor</glossterm>
<glossdef>
- <para>A set of configuration directives which describes which nodes are OSS systems in the
- Lustre cluster and providing names for their OSTs.</para>
+ <para>A set of configuration directives that describes which nodes are
+ OSS systems in the Lustre cluster and provides names for their
+ OSTs.</para>
</glossdef>
</glossentry>
<glossentry xml:id="lustreclient">
- <glossterm>Lustre client
- </glossterm>
+ <glossterm>Lustre client</glossterm>
<glossdef>
<para>An operating instance with a mounted Lustre file system.</para>
</glossdef>
</glossentry>
<glossentry xml:id="lustrefile">
- <glossterm>Lustre file
- </glossterm>
+ <glossterm>Lustre file</glossterm>
<glossdef>
- <para>A file in the Lustre file system. The implementation of a Lustre file is through an
- inode on a metadata server that contains references to a storage object on OSSs.</para>
+ <para>A file in the Lustre file system. The implementation of a Lustre
+ file is through an inode on a metadata server that contains references
+ to a storage object on OSSs.</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>M</title>
<glossentry xml:id="mballoc">
- <glossterm>mballoc </glossterm>
+ <glossterm>mballoc</glossterm>
<glossdef>
- <para>Multi-block allocate. Functionality in ext4 that enables the
- <literal>ldiskfs</literal> file system to allocate multiple blocks with a single request
- to the block allocator. </para>
+ <para>Multi-block allocate. Functionality in ext4 that enables the
+ <literal>ldiskfs</literal> file system to allocate multiple blocks with
+ a single request to the block allocator.</para>
</glossdef>
</glossentry>
<glossentry xml:id="mdc">
- <glossterm>MDC
- </glossterm>
+ <glossterm>MDC</glossterm>
<glossdef>
- <para>Metadata client. A Lustre client component that sends metadata requests via RPC over
- LNET to the metadata target (MDT).</para>
+ <para>Metadata client. A Lustre client component that sends metadata
+ requests via RPC over LNET to the metadata target (MDT).</para>
</glossdef>
</glossentry>
<glossentry xml:id="mdd">
- <glossterm>MDD
- </glossterm>
+ <glossterm>MDD</glossterm>
<glossdef>
- <para>Metadata disk device. Lustre server component that interfaces with the underlying
- object storage device to manage the Lustre file system namespace (directories, file
- ownership, attributes).</para>
+ <para>Metadata disk device. Lustre server component that interfaces
+ with the underlying object storage device to manage the Lustre file
+ system namespace (directories, file ownership, attributes).</para>
</glossdef>
</glossentry>
<glossentry xml:id="mds">
- <glossterm>MDS
- </glossterm>
+ <glossterm>MDS</glossterm>
<glossdef>
- <para>Metadata server. The server node that is hosting the metadata target (MDT).</para>
+ <para>Metadata server. The server node that is hosting the metadata
+ target (MDT).</para>
</glossdef>
</glossentry>
<glossentry xml:id="mdt">
- <glossterm>MDT
- </glossterm>
+ <glossterm>MDT</glossterm>
<glossdef>
- <para>Metadata target. A storage device containing the file system namespace that is made
- available over the network to a client. It stores filenames, attributes, and the layout of
- OST objects that store the file data.</para>
+ <para>Metadata target. A storage device containing the file system
+ namespace that is made available over the network to a client. It
+ stores filenames, attributes, and the layout of OST objects that store
+ the file data.</para>
</glossdef>
</glossentry>
<glossentry xml:id="mdt0" condition='l24'>
- <glossterm>MDT0
- </glossterm>
+ <glossterm>MDT0</glossterm>
<glossdef>
- <para>The metadata target for the file system root. Since Lustre software release 2.4,
- multiple metadata targets are possible in the same file system. MDT0 is the root of the
- file system, which must be available for the file system to be accessible.</para>
+ <para>The metadata target for the file system root. Since Lustre
+ software release 2.4, multiple metadata targets are possible in the
+ same file system. MDT0 is the root of the file system, which must be
+ available for the file system to be accessible.</para>
</glossdef>
</glossentry>
<glossentry xml:id="mgs">
- <glossterm>MGS
- </glossterm>
+ <glossterm>MGS</glossterm>
<glossdef>
- <para>Management service. A software module that manages the startup configuration and
- changes to the configuration. Also, the server node on which this system runs.</para>
+ <para>Management service. A software module that manages the startup
+ configuration and changes to the configuration. Also, the server node
+ on which this system runs.</para>
</glossdef>
</glossentry>
<glossentry xml:id="mountconf">
- <glossterm>mountconf </glossterm>
+ <glossterm>mountconf</glossterm>
<glossdef>
- <para>The Lustre configuration protocol that formats disk file systems on servers with the
- <literal>mkfs.lustre</literal> program, and prepares them for automatic incorporation
- into a Lustre cluster. This allows clients to be configured and mounted with a simple
- <literal>mount</literal> command.</para>
+ <para>The Lustre configuration protocol that formats disk file systems
+ on servers with the
+ <literal>mkfs.lustre</literal> program, and prepares them for automatic
+ incorporation into a Lustre cluster. This allows clients to be
+ configured and mounted with a simple
+ <literal>mount</literal> command.</para>
</glossdef>
</glossentry>
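The mountconf workflow described above can be sketched with the commands below. This is illustrative only: device names, the file system name <literal>testfs</literal>, and the MGS NID are placeholders, and the commands require Lustre-formatted server nodes.

```shell
# Format a combined MGS/MDT and an OST, then mount them (placeholder devices).
mkfs.lustre --fsname=testfs --mgs --mdt --index=0 /dev/sdb
mkfs.lustre --fsname=testfs --ost --index=0 --mgsnode=mgs@tcp0 /dev/sdc

mount -t lustre /dev/sdb /mnt/mdt             # starting a target = mounting it
mount -t lustre /dev/sdc /mnt/ost0

# A client then needs only a simple mount command, as the definition states:
mount -t lustre mgs@tcp0:/testfs /mnt/lustre
```

The configuration written by <literal>mkfs.lustre</literal> is picked up automatically at mount time, which is what lets targets join the cluster without a separate configuration step.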
</glossdiv>
<glossdiv>
<title>N</title>
<glossentry xml:id="nid">
- <glossterm>NID
- </glossterm>
+ <glossterm>NID</glossterm>
<glossdef>
- <para>Network identifier. Encodes the type, network number, and network address of a network
- interface on a node for use by the Lustre file system.</para>
+ <para>Network identifier. Encodes the type, network number, and network
+ address of a network interface on a node for use by the Lustre file
+ system.</para>
</glossdef>
</glossentry>
<glossentry xml:id="nioapi">
- <glossterm>NIO API
- </glossterm>
+ <glossterm>NIO API</glossterm>
<glossdef>
- <para>A subset of the LNET RPC module that implements a library for sending large network requests, moving buffers with RDMA.</para>
+ <para>A subset of the LNET RPC module that implements a library for
+ sending large network requests, moving buffers with RDMA.</para>
</glossdef>
</glossentry>
<glossentry xml:id="nodeaffdef">
- <glossterm>Node affinity </glossterm>
+ <glossterm>Node affinity</glossterm>
<glossdef>
- <para>Node affinity describes the property of a multi-threaded application to behave
- sensibly on multiple cores. Without the property of node affinity, an operating scheduler
- may move application threads across processors in a sub-optimal way that significantly
- reduces performance of the application overall.</para>
+ <para>Node affinity describes the property of a multi-threaded
+ application to behave sensibly on multiple cores. Without the property
+ of node affinity, an operating system scheduler may move application
+ threads across processors in a sub-optimal way that significantly
+ reduces performance of the application overall.</para>
</glossdef>
</glossentry>
<glossentry xml:id="nrs">
- <glossterm>NRS
- </glossterm>
+ <glossterm>NRS</glossterm>
<glossdef>
- <para>Network request scheduler. A subcomponent of the PTLRPC layer, which specifies the
- order in which RPCs are handled at servers. This allows optimizing large numbers of
- incoming requests for disk access patterns, fairness between clients, and other
- administrator-selected policies.</para>
+ <para>Network request scheduler. A subcomponent of the PTLRPC layer,
+ which specifies the order in which RPCs are handled at servers. This
+ allows optimizing large numbers of incoming requests for disk access
+ patterns, fairness between clients, and other administrator-selected
+ policies.</para>
</glossdef>
</glossentry>
<glossentry xml:id="NUMAdef">
- <glossterm>NUMA
- </glossterm>
+ <glossterm>NUMA</glossterm>
<glossdef>
- <para>Non-uniform memory access. Describes a multi-processing architecture where the time
- taken to access given memory differs depending on memory location relative to a given
- processor. Typically machines with multiple sockets are NUMA architectures.</para>
+ <para>Non-uniform memory access. Describes a multi-processing
+ architecture where the time taken to access given memory differs
+ depending on memory location relative to a given processor. Typically
+ machines with multiple sockets are NUMA architectures.</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>O</title>
<glossentry xml:id="odb">
- <glossterm>OBD </glossterm>
+ <glossterm>OBD</glossterm>
<glossdef>
- <para>Object-based device. The generic term for components in the Lustre software stack that
- can be configured on the client or server. Examples include MDC, OSC, LOV, MDT, and
- OST.</para>
+ <para>Object-based device. The generic term for components in the
+ Lustre software stack that can be configured on the client or server.
+ Examples include MDC, OSC, LOV, MDT, and OST.</para>
</glossdef>
</glossentry>
<glossentry xml:id="odbapi">
- <glossterm>OBD API </glossterm>
+ <glossterm>OBD API</glossterm>
<glossdef>
- <para>The programming interface for configuring OBD devices. This was formerly also the API
- for accessing object IO and attribute methods on both the client and server, but has been
- replaced by the OSD API in most parts of the code.</para>
+ <para>The programming interface for configuring OBD devices. This was
+ formerly also the API for accessing object IO and attribute methods on
+ both the client and server, but has been replaced by the OSD API in
+ most parts of the code.</para>
</glossdef>
</glossentry>
<glossentry xml:id="odbtype">
- <glossterm>OBD type </glossterm>
+ <glossterm>OBD type</glossterm>
<glossdef>
- <para>Module that can implement the Lustre object or metadata APIs. Examples of OBD types
- include the LOV, OSC and OSD.</para>
+ <para>Module that can implement the Lustre object or metadata APIs.
+ Examples of OBD types include the LOV, OSC and OSD.</para>
</glossdef>
</glossentry>
<glossentry xml:id="odbfilter">
- <glossterm>Obdfilter </glossterm>
+ <glossterm>Obdfilter</glossterm>
<glossdef>
- <para>An older name for the OBD API data object operation device driver that sits between
- the OST and the OSD. In Lustre software release 2.4 this device has been renamed
- OFD."</para>
+ <para>An older name for the OBD API data object operation device driver
+ that sits between the OST and the OSD. In Lustre software release 2.4
+ this device has been renamed OFD.</para>
</glossdef>
</glossentry>
<glossentry xml:id="objectstorage">
- <glossterm>Object storage </glossterm>
+ <glossterm>Object storage</glossterm>
<glossdef>
- <para>Refers to a storage-device API or protocol involving storage objects. The two most
- well known instances of object storage are the T10 iSCSI storage object protocol and the
- Lustre object storage protocol (a network implementation of the Lustre object API). The
- principal difference between the Lustre protocol and T10 protocol is that the Lustre
- protocol includes locking and recovery control in the protocol and is not tied to a SCSI
- transport layer.</para>
+ <para>Refers to a storage-device API or protocol involving storage
+ objects. The two most well known instances of object storage are the
+ T10 iSCSI storage object protocol and the Lustre object storage
+ protocol (a network implementation of the Lustre object API). The
+ principal difference between the Lustre protocol and T10 protocol is
+ that the Lustre protocol includes locking and recovery control in the
+ protocol and is not tied to a SCSI transport layer.</para>
</glossdef>
</glossentry>
<glossentry xml:id="opencache">
- <glossterm>opencache </glossterm>
+ <glossterm>opencache</glossterm>
<glossdef>
- <para>A cache of open file handles. This is a performance enhancement for NFS.</para>
+ <para>A cache of open file handles. This is a performance enhancement
+ for NFS.</para>
</glossdef>
</glossentry>
<glossentry xml:id="orphanobjects">
- <glossterm>Orphan objects </glossterm>
+ <glossterm>Orphan objects</glossterm>
<glossdef>
- <para>Storage objects to which no Lustre file points. Orphan objects can arise from crashes
- and are automatically removed by an <literal>llog</literal> recovery between the MDT and
- OST. When a client deletes a file, the MDT unlinks it from the namespace. If this is the
- last link, it will atomically add the OST objects into a per-OST <literal>llog</literal>
- (if a crash has occurred) and then wait until the unlink commits to disk. (At this point,
- it is safe to destroy the OST objects. Once the destroy is committed, the MDT
- <literal>llog</literal> records can be cancelled.)</para>
+ <para>Storage objects to which no Lustre file points. Orphan objects
+ can arise from crashes and are automatically removed by an
+ <literal>llog</literal> recovery between the MDT and OST. When a client
+ deletes a file, the MDT unlinks it from the namespace. If this is the
+ last link, it will atomically add the OST objects into a per-OST
+ <literal>llog</literal> (if a crash has occurred) and then wait until
+ the unlink commits to disk. (At this point, it is safe to destroy the
+ OST objects. Once the destroy is committed, the MDT
+ <literal>llog</literal> records can be cancelled.)</para>
</glossdef>
</glossentry>
<glossentry xml:id="osc">
- <glossterm>OSC </glossterm>
+ <glossterm>OSC</glossterm>
<glossdef>
- <para>Object storage client. The client module communicating to an OST (via an OSS).</para>
+ <para>Object storage client. The client module communicating to an OST
+ (via an OSS).</para>
</glossdef>
</glossentry>
<glossentry xml:id="osd">
- <glossterm>OSD </glossterm>
+ <glossterm>OSD</glossterm>
<glossdef>
- <para>Object storage device. A generic, industry term for storage devices with a more
- extended interface than block-oriented devices such as disks. For the Lustre file system,
- this name is used to describe a software module that implements an object storage API in
- the kernel. It is also used to refer to an instance of an object storage device created by
- that driver. The OSD device is layered on a file system, with methods that mimic create,
- destroy and I/O operations on file inodes.</para>
+ <para>Object storage device. A generic, industry term for storage
+ devices with a more extended interface than block-oriented devices such
+ as disks. For the Lustre file system, this name is used to describe a
+ software module that implements an object storage API in the kernel. It
+ is also used to refer to an instance of an object storage device
+ created by that driver. The OSD device is layered on a file system,
+ with methods that mimic create, destroy and I/O operations on file
+ inodes.</para>
</glossdef>
</glossentry>
<glossentry xml:id="oss">
- <glossterm>OSS </glossterm>
+ <glossterm>OSS</glossterm>
<glossdef>
- <para>Object storage server. A server OBD that provides access to local OSTs.</para>
+ <para>Object storage server. A server OBD that provides access to local
+ OSTs.</para>
</glossdef>
</glossentry>
<glossentry xml:id="ost">
- <glossterm>OST </glossterm>
+ <glossterm>OST</glossterm>
<glossdef>
- <para>Object storage target. An OSD made accessible through a network protocol. Typically,
- an OST is associated with a unique OSD which, in turn is associated with a formatted disk
- file system on the server containing the data objects.</para>
+ <para>Object storage target. An OSD made accessible through a network
+ protocol. Typically, an OST is associated with a unique OSD which, in
+ turn, is associated with a formatted disk file system on the server
+ containing the data objects.</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>P</title>
<glossentry xml:id="pdirops">
- <glossterm>pdirops </glossterm>
+ <glossterm>pdirops</glossterm>
<glossdef>
- <para>A locking protocol in the Linux VFS layer that allows for directory operations
- performed in parallel.</para>
+ <para>A locking protocol in the Linux VFS layer that allows directory
+ operations to be performed in parallel.</para>
</glossdef>
</glossentry>
<glossentry xml:id="pool">
- <glossterm>Pool </glossterm>
+ <glossterm>Pool</glossterm>
<glossdef>
- <para>OST pools allows the administrator to associate a name with an arbitrary subset of
- OSTs in a Lustre cluster. A group of OSTs can be combined into a named pool with unique
- access permissions and stripe characteristics.</para>
+ <para>OST pools allow the administrator to associate a name with an
+ arbitrary subset of OSTs in a Lustre cluster. A group of OSTs can be
+ combined into a named pool with unique access permissions and stripe
+ characteristics.</para>
</glossdef>
</glossentry>
<glossentry xml:id="portal">
- <glossterm>Portal </glossterm>
+ <glossterm>Portal</glossterm>
<glossdef>
- <para>A service address on an LNET NID that binds requests to a specific request service,
- such as an MDS, MGS, OSS, or LDLM. Services may listen on multiple portals to ensure that
- high priority messages are not queued behind many slow requests on another portal.</para>
+ <para>A service address on an LNET NID that binds requests to a
+ specific request service, such as an MDS, MGS, OSS, or LDLM. Services
+ may listen on multiple portals to ensure that high priority messages
+ are not queued behind many slow requests on another portal.</para>
</glossdef>
</glossentry>
<glossentry xml:id="ptlrpc">
- <glossterm>PTLRPC
- </glossterm>
+ <glossterm>PTLRPC</glossterm>
<glossdef>
- <para>An RPC protocol layered on LNET. This protocol deals with stateful servers and has exactly-once semantics and built in support for recovery.</para>
+ <para>An RPC protocol layered on LNET. This protocol deals with
+ stateful servers and has exactly-once semantics and built-in support
+ for recovery.</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>R</title>
<glossentry xml:id="recovery">
- <glossterm>Recovery
- </glossterm>
+ <glossterm>Recovery</glossterm>
<glossdef>
- <para>The process that re-establishes the connection state when a client that was previously connected to a server reconnects after the server restarts.</para>
+ <para>The process that re-establishes the connection state when a
+ client that was previously connected to a server reconnects after the
+ server restarts.</para>
</glossdef>
</glossentry>
<glossentry xml:id="replay">
<glossterm>Replay request</glossterm>
<glossdef>
- <para>The concept of re-executing a server request after the server has lost information in
- its memory caches and shut down. The replay requests are retained by clients until the
- server(s) have confirmed that the data is persistent on disk. Only requests for which a
- client received a reply and were assigned a transaction number by the server are replayed.
- Requests that did not get a reply are resent.</para>
+ <para>The concept of re-executing a server request after the server has
+ lost information in its memory caches and shut down. The replay
+ requests are retained by clients until the server(s) have confirmed
+ that the data is persistent on disk. Only requests for which a client
+ received a reply and were assigned a transaction number by the server
+ are replayed. Requests that did not get a reply are resent.</para>
</glossdef>
</glossentry>
<glossentry xml:id="resent">
- <glossterm>Resent request </glossterm>
+ <glossterm>Resent request</glossterm>
<glossdef>
- <para>An RPC request sent from a client to a server that has not had a reply from the
- server. This might happen if the request was lost on the way to the server, if the reply
- was lost on the way back from the server, or if the server crashes before or after
- processing the request. During server RPC recovery processing, resent requests are
- processed after replayed requests, and use the client RPC XID to determine if the resent
- RPC request was already executed on the server. </para>
+ <para>An RPC request sent from a client to a server that has not had a
+ reply from the server. This might happen if the request was lost on the
+ way to the server, if the reply was lost on the way back from the
+ server, or if the server crashes before or after processing the
+ request. During server RPC recovery processing, resent requests are
+ processed after replayed requests, and use the client RPC XID to
+ determine if the resent RPC request was already executed on the
+ server.</para>
</glossdef>
</glossentry>
<glossentry xml:id="revocation">
- <glossterm>Revocation callback </glossterm>
+ <glossterm>Revocation callback</glossterm>
<glossdef>
- <para>Also called a "blocking callback". An RPC request made by the lock server (typically
- for an OST or MDT) to a lock client to revoke a granted DLM lock.</para>
+ <para>Also called a "blocking callback". An RPC request made by the
+ lock server (typically for an OST or MDT) to a lock client to revoke a
+ granted DLM lock.</para>
</glossdef>
</glossentry>
<glossentry xml:id="rootsquash">
- <glossterm>Root squash
- </glossterm>
+ <glossterm>Root squash</glossterm>
<glossdef>
- <para>A mechanism whereby the identity of a root user on a client system is mapped to a
- different identity on the server to avoid root users on clients from accessing or
- modifying root-owned files on the servers. This does not prevent root users on the client
- from assuming the identity of a non-root user, so should not be considered a robust
- security mechanism. Typically, for management purposes, at least one client system should
- not be subject to root squash.</para>
+ <para>A mechanism whereby the identity of a root user on a client
+ system is mapped to a different identity on the server to prevent root
+ users on clients from accessing or modifying root-owned files on the
+ servers. This does not prevent root users on the client from assuming
+ the identity of a non-root user, so should not be considered a robust
+ security mechanism. Typically, for management purposes, at least one
+ client system should not be subject to root squash.</para>
</glossdef>
</glossentry>
<glossentry xml:id="routing">
- <glossterm>Routing </glossterm>
+ <glossterm>Routing</glossterm>
<glossdef>
<para>LNET routing between different networks and LNDs.</para>
</glossdef>
</glossentry>
<glossentry xml:id="rpc">
- <glossterm>RPC
- </glossterm>
+ <glossterm>RPC</glossterm>
<glossdef>
<para>Remote procedure call. A network encoding of a request.</para>
 </glossdef>
+ </glossentry>
+ </glossdiv>
<glossdiv>
<title>S</title>
<glossentry xml:id="stride">
- <glossterm>Stripe </glossterm>
+ <glossterm>Stripe</glossterm>
<glossdef>
- <para>A contiguous, logical extent of a Lustre file written to a single OST. Used
- synonymously with a single OST data object that makes up part of a file visible to user
- applications.</para>
+ <para>A contiguous, logical extent of a Lustre file written to a single
+ OST. Used synonymously with a single OST data object that makes up part
+ of a file visible to user applications.</para>
</glossdef>
</glossentry>
<glossentry xml:id="stridesize">
- <glossterm>Stripe size </glossterm>
+ <glossterm>Stripe size</glossterm>
<glossdef>
- <para>The maximum number of bytes that will be written to an OST object before the next
- object in a file's layout is used when writing sequential data to a file. Once a full
- stripe has been written to each of the objects in the layout, the first object will be
- written to again in round-robin fashion.</para>
+ <para>The maximum number of bytes that will be written to an OST object
+ before the next object in a file's layout is used when writing
+ sequential data to a file. Once a full stripe has been written to each
+ of the objects in the layout, the first object will be written to again
+ in round-robin fashion.</para>
</glossdef>
</glossentry>
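The round-robin placement described above can be illustrated with a little arithmetic. This sketch is not Lustre code; the stripe size, stripe count, and file offset are assumed example values:

```shell
# For a RAID0 layout, the object (stripe index) holding a given file
# offset is: (offset / stripe_size) mod stripe_count.
stripe_size=$((1 * 1024 * 1024))   # assumed 1 MiB stripe size
stripe_count=4                     # assumed 4 OST objects in the layout
offset=$((5 * 1024 * 1024 + 17))   # an arbitrary file offset
echo $(( (offset / stripe_size) % stripe_count ))
```

With these values the offset falls in the sixth full stripe, which round-robin places on the second object (index 1).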
<glossentry xml:id="stripcount">
- <glossterm>Stripe count
- </glossterm>
+ <glossterm>Stripe count</glossterm>
<glossdef>
- <para>The number of OSTs holding objects for a RAID0-striped Lustre file.</para>
+ <para>The number of OSTs holding objects for a RAID0-striped Lustre
+ file.</para>
</glossdef>
</glossentry>
<glossentry xml:id="stripingmetadata">
- <glossterm>Striping metadata
- </glossterm>
+ <glossterm>Striping metadata</glossterm>
<glossdef>
- <para>The extended attribute associated with a file that describes how its data is
- distributed over storage objects. See also <emphasis role="italic">Default stripe
- pattern</emphasis>.</para>
+ <para>The extended attribute associated with a file that describes how
+ its data is distributed over storage objects. See also
+ <emphasis role="italic">Default stripe pattern</emphasis>.</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>T</title>
<glossentry xml:id="t10">
- <glossterm>T10 object protocol
- </glossterm>
+ <glossterm>T10 object protocol</glossterm>
<glossdef>
- <para>An object storage protocol tied to the SCSI transport layer. The Lustre file system
- does not use T10.</para>
+ <para>An object storage protocol tied to the SCSI transport layer. The
+ Lustre file system does not use T10.</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>W</title>
<glossentry xml:id="widestriping">
- <glossterm>Wide striping
- </glossterm>
- <glossdef>
- <para>Strategy of using many OSTs to store stripes of a single file. This obtains maximum
- bandwidth access to a single file through parallel utilization of many OSTs. For more
- information about wide striping, see <xref xmlns:xlink="http://www.w3.org/1999/xlink"
- linkend="section_syy_gcl_qk"/>.</para>
+ <glossterm>Wide striping</glossterm>
+ <glossdef>
+ <para>Strategy of using many OSTs to store stripes of a single file.
+ This obtains maximum bandwidth access to a single file through parallel
+ utilization of many OSTs. For more information about wide striping, see
+ <xref xmlns:xlink="http://www.w3.org/1999/xlink"
+ linkend="section_syy_gcl_qk" />.</para>
</glossdef>
</glossentry>
</glossdiv>
-<?xml version='1.0' encoding='UTF-8'?><chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="lustreoperations">
+<?xml version='1.0' encoding='utf-8'?>
+<chapter xmlns="http://docbook.org/ns/docbook"
+xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
+xml:id="lustreoperations">
<title xml:id="lustreoperations.title">Lustre Operations</title>
- <para>Once you have the Lustre file system up and running, you can use the procedures in this section to perform these basic Lustre administration tasks:</para>
+ <para>Once you have the Lustre file system up and running, you can use the
+ procedures in this section to perform these basic Lustre administration
+ tasks:</para>
<itemizedlist>
<listitem>
- <para><xref linkend="dbdoclet.50438194_42877"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438194_42877" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438194_24122"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438194_24122" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438194_84876"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438194_84876" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438194_69255"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438194_69255" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438194_57420"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438194_57420" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438194_54138"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438194_54138" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438194_88063"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438194_88063" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.lfsmkdir"/></para>
+ <para>
+ <xref linkend="dbdoclet.lfsmkdir" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438194_88980"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438194_88980" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438194_41817"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438194_41817" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438194_70905"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438194_70905" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438194_16954"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438194_16954" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438194_69998"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438194_69998" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438194_30872"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438194_30872" />
+ </para>
</listitem>
</itemizedlist>
<section xml:id="dbdoclet.50438194_42877">
- <title><indexterm><primary>operations</primary></indexterm>
-<indexterm><primary>operations</primary><secondary>mounting by label</secondary></indexterm>
-Mounting by Label</title>
- <para>The file system name is limited to 8 characters. We have encoded the file system and target information in the disk label, so you can mount by label. This allows system administrators to move disks around without worrying about issues such as SCSI disk reordering or getting the <literal>/dev/device</literal> wrong for a shared target. Soon, file system naming will be made as fail-safe as possible. Currently, Linux disk labels are limited to 16 characters. To identify the target within the file system, 8 characters are reserved, leaving 8 characters for the file system name:</para>
- <screen><replaceable>fsname</replaceable>-MDT0000 or <replaceable>fsname</replaceable>-OST0a19</screen>
+ <title>
+ <indexterm>
+ <primary>operations</primary>
+ </indexterm>
+ <indexterm>
+ <primary>operations</primary>
+ <secondary>mounting by label</secondary>
+ </indexterm>Mounting by Label</title>
+ <para>The file system name is limited to 8 characters. We have encoded the
+ file system and target information in the disk label, so you can mount by
+ label. This allows system administrators to move disks around without
+ worrying about issues such as SCSI disk reordering or getting the
+ <literal>/dev/device</literal> wrong for a shared target. Soon, file system
+ naming will be made as fail-safe as possible. Currently, Linux disk labels
+ are limited to 16 characters. To identify the target within the file
+ system, 8 characters are reserved, leaving 8 characters for the file system
+ name:</para>
+ <screen>
+<replaceable>fsname</replaceable>-MDT0000 or <replaceable>fsname</replaceable>-OST0a19
+</screen>
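The label layout (8-character fsname, hyphen, target type, hexadecimal index) can be sketched in shell. This is an illustration only; the fsname and index are assumed values, and the helper is not a Lustre utility:

```shell
# Compose a disk label from an fsname (max 8 characters), a target
# type, and a 4-digit hexadecimal target index, e.g. index 10 -> 000a.
fsname="testfs"
printf -v label '%s-OST%04x' "$fsname" 10
echo "$label"      # prints testfs-OST000a
echo "${#label}"   # stays within the 16-character Linux label limit
```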
<para>To mount by label, use this command:</para>
- <screen>mount -t lustre -L <replaceable>file_system_label</replaceable> <replaceable>/mount_point</replaceable></screen>
+ <screen>
+mount -t lustre -L <replaceable>file_system_label</replaceable> <replaceable>/mount_point</replaceable>
+</screen>
<para>This is an example of mount-by-label:</para>
- <screen>mds# mount -t lustre -L testfs-MDT0000 /mnt/mdt</screen>
+ <screen>
+mds# mount -t lustre -L testfs-MDT0000 /mnt/mdt
+</screen>
<caution>
- <para>Mount-by-label should NOT be used in a multi-path environment or when snapshots are being created of the device, since multiple block devices will have the same label.</para>
+ <para>Mount-by-label should NOT be used in a multi-path environment or
+ when snapshots are being created of the device, since multiple block
+ devices will have the same label.</para>
</caution>
- <para>Although the file system name is internally limited to 8 characters, you can mount the clients at any mount point, so file system users are not subjected to short names. Here is an example:</para>
- <screen>client# mount -t lustre mds0@tcp0:/short <replaceable>/dev/long_mountpoint_name</replaceable></screen>
+ <para>Although the file system name is internally limited to 8 characters,
+ you can mount the clients at any mount point, so file system users are not
+ subjected to short names. Here is an example:</para>
+ <screen>
+client# mount -t lustre mds0@tcp0:/short <replaceable>/dev/long_mountpoint_name</replaceable>
+</screen>
</section>
<section xml:id="dbdoclet.50438194_24122">
- <title><indexterm><primary>operations</primary><secondary>starting</secondary></indexterm>Starting Lustre</title>
- <para>On the first start of a Lustre file system, the components must be started in the following order:</para>
+ <title>
+ <indexterm>
+ <primary>operations</primary>
+ <secondary>starting</secondary>
+ </indexterm>Starting Lustre</title>
+ <para>On the first start of a Lustre file system, the components must be
+ started in the following order:</para>
<orderedlist>
<listitem>
<para>Mount the MGT.</para>
- <note><para>If a combined MGT/MDT is present, Lustre will correctly mount the MGT and MDT automatically.</para></note>
+ <note>
+ <para>If a combined MGT/MDT is present, Lustre will correctly mount
+ the MGT and MDT automatically.</para>
+ </note>
</listitem>
<listitem>
<para>Mount the MDT.</para>
- <note><para condition='l24'>Mount all MDTs if multiple MDTs are present.</para></note>
+ <note>
+ <para condition='l24'>Mount all MDTs if multiple MDTs are
+ present.</para>
+ </note>
</listitem>
<listitem>
<para>Mount the OST(s).</para>
</orderedlist>
</section>
<section xml:id="dbdoclet.50438194_84876">
- <title><indexterm><primary>operations</primary><secondary>mounting</secondary></indexterm>Mounting a Server</title>
- <para>Starting a Lustre server is straightforward and only involves the mount command. Lustre servers can be added to <literal>/etc/fstab</literal>:</para>
- <screen>mount -t lustre</screen>
+ <title>
+ <indexterm>
+ <primary>operations</primary>
+ <secondary>mounting</secondary>
+ </indexterm>Mounting a Server</title>
+ <para>Starting a Lustre server is straightforward and only involves the
+ mount command. Lustre servers can be added to
+ <literal>/etc/fstab</literal>:</para>
+ <screen>
+mount -t lustre
+</screen>
<para>The mount command generates output similar to this:</para>
- <screen>/dev/sda1 on /mnt/test/mdt type lustre (rw)
+ <screen>
+/dev/sda1 on /mnt/test/mdt type lustre (rw)
/dev/sda2 on /mnt/test/ost0 type lustre (rw)
-192.168.0.21@tcp:/testfs on /mnt/testfs type lustre (rw)</screen>
- <para>In this example, the MDT, an OST (ost0) and file system (testfs) are mounted.</para>
- <screen>LABEL=testfs-MDT0000 /mnt/test/mdt lustre defaults,_netdev,noauto 0 0
-LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0</screen>
- <para>In general, it is wise to specify noauto and let your high-availability (HA) package
- manage when to mount the device. If you are not using failover, make sure that networking has
- been started before mounting a Lustre server. If you are running Red Hat Enterprise Linux,
- SUSE Linux Enterprise Server, Debian operating system (and perhaps others), use the
- <literal>_netdev</literal> flag to ensure that these disks are mounted after the network is
- up.</para>
- <para>We are mounting by disk label here. The label of a device can be read with <literal>e2label</literal>. The label of a newly-formatted Lustre server may end in <literal>FFFF</literal> if the <literal>--index</literal> option is not specified to <literal>mkfs.lustre</literal>, meaning that it has yet to be assigned. The assignment takes place when the server is first started, and the disk label is updated. It is recommended that the <literal>--index</literal> option always be used, which will also ensure that the label is set at format time.</para>
+192.168.0.21@tcp:/testfs on /mnt/testfs type lustre (rw)
+</screen>
+ <para>In this example, the MDT, an OST (ost0) and file system (testfs) are
+ mounted.</para>
+ <screen>
+LABEL=testfs-MDT0000 /mnt/test/mdt lustre defaults,_netdev,noauto 0 0
+LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0
+</screen>
+ <para>In general, it is wise to specify noauto and let your
+ high-availability (HA) package manage when to mount the device. If you are
+ not using failover, make sure that networking has been started before
+ mounting a Lustre server. If you are running Red Hat Enterprise Linux, SUSE
+ Linux Enterprise Server, Debian operating system (and perhaps others), use
+ the
+ <literal>_netdev</literal> flag to ensure that these disks are mounted after
+ the network is up.</para>
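The presence of the _netdev option can also be verified mechanically. The sketch below is an assumed helper, not a Lustre tool; it scans fstab-style lines for lustre entries that lack _netdev (the second entry here deliberately omits it):

```shell
# Flag any lustre fstab entry that lacks the _netdev mount option.
fstab='LABEL=testfs-MDT0000 /mnt/test/mdt lustre defaults,_netdev,noauto 0 0
LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,noauto 0 0'
echo "$fstab" | awk '$3 == "lustre" && $4 !~ /_netdev/ {print $1 " is missing _netdev"}'
```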
+ <para>We are mounting by disk label here. The label of a device can be read
+ with
+ <literal>e2label</literal>. The label of a newly-formatted Lustre server
+ may end in
+ <literal>FFFF</literal> if the
+ <literal>--index</literal> option is not specified to
+ <literal>mkfs.lustre</literal>, meaning that it has yet to be assigned. The
+ assignment takes place when the server is first started, and the disk label
+ is updated. It is recommended that the
+ <literal>--index</literal> option always be used, which will also ensure
+ that the label is set at format time.</para>
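A monitoring or provisioning script might flag targets whose index is still unassigned. This is a sketch, with the label value standing in for real e2label output:

```shell
# A newly formatted target whose --index was not specified carries a
# label ending in FFFF until its first start assigns the real index.
label="testfs-OSTFFFF"   # stand-in for: label=$(e2label /dev/sdb)
case "$label" in
  *FFFF) echo "index not yet assigned" ;;
  *)     echo "index assigned" ;;
esac
```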
<caution>
- <para>Do not do this when the client and OSS are on the same node, as memory pressure between the client and OSS can lead to deadlocks.</para>
+ <para>Do not do this when the client and OSS are on the same node, as
+ memory pressure between the client and OSS can lead to deadlocks.</para>
</caution>
<caution>
- <para>Mount-by-label should NOT be used in a multi-path environment.</para>
+ <para>Mount-by-label should NOT be used in a multi-path
+ environment.</para>
</caution>
</section>
<section xml:id="dbdoclet.50438194_69255">
- <title><indexterm><primary>operations</primary><secondary>unmounting</secondary></indexterm>Unmounting a Server</title>
- <para>To stop a Lustre server, use the <literal>umount <replaceable>/mount</replaceable> <replaceable>point</replaceable></literal> command.</para>
- <para>For example, to stop <literal>ost0</literal> on mount point <literal>/mnt/test</literal>, run:</para>
- <screen>$ umount /mnt/test</screen>
- <para>Gracefully stopping a server with the <literal>umount</literal> command preserves the state of the connected clients. The next time the server is started, it waits for clients to reconnect, and then goes through the recovery procedure.</para>
- <para>If the force (<literal>-f</literal>) flag is used, then the server evicts all clients and stops WITHOUT recovery. Upon restart, the server does not wait for recovery. Any currently connected clients receive I/O errors until they reconnect.</para>
+ <title>
+ <indexterm>
+ <primary>operations</primary>
+ <secondary>unmounting</secondary>
+ </indexterm>Unmounting a Server</title>
+ <para>To stop a Lustre server, use the
+ <literal>umount <replaceable>/mount_point</replaceable></literal>
+ command.</para>
+ <para>For example, to stop
+ <literal>ost0</literal> on mount point
+ <literal>/mnt/test</literal>, run:</para>
+ <screen>
+$ umount /mnt/test
+</screen>
+ <para>Gracefully stopping a server with the
+ <literal>umount</literal> command preserves the state of the connected
+ clients. The next time the server is started, it waits for clients to
+ reconnect, and then goes through the recovery procedure.</para>
+ <para>If the force (<literal>-f</literal>) flag is used, then the server
+ evicts all clients and stops WITHOUT recovery. Upon restart, the server
+ does not wait for recovery. Any currently connected clients receive I/O
+ errors until they reconnect.</para>
<note>
- <para>If you are using loopback devices, use the <literal>-d</literal> flag. This flag cleans up loop devices and can always be safely specified.</para>
+ <para>If you are using loopback devices, use the
+ <literal>-d</literal> flag. This flag cleans up loop devices and can
+ always be safely specified.</para>
</note>
</section>
<section xml:id="dbdoclet.50438194_57420">
- <title><indexterm><primary>operations</primary><secondary>failover</secondary></indexterm>Specifying Failout/Failover Mode for OSTs</title>
- <para>In a Lustre file system, an OST that has become unreachable because it fails, is taken off
- the network, or is unmounted can be handled in one of two ways:</para>
+ <title>
+ <indexterm>
+ <primary>operations</primary>
+ <secondary>failover</secondary>
+ </indexterm>Specifying Failout/Failover Mode for OSTs</title>
+ <para>In a Lustre file system, an OST that has become unreachable because
+ it fails, is taken off the network, or is unmounted can be handled in one
+ of two ways:</para>
<itemizedlist>
<listitem>
- <para> In <literal>failout</literal> mode, Lustre clients immediately receive errors (EIOs)
- after a timeout, instead of waiting for the OST to recover.</para>
+ <para>In
+ <literal>failout</literal> mode, Lustre clients immediately receive
+ errors (EIOs) after a timeout, instead of waiting for the OST to
+ recover.</para>
</listitem>
<listitem>
- <para> In <literal>failover</literal> mode, Lustre clients wait for the OST to
- recover.</para>
+ <para>In
+ <literal>failover</literal> mode, Lustre clients wait for the OST to
+ recover.</para>
</listitem>
</itemizedlist>
- <para>By default, the Lustre file system uses <literal>failover</literal> mode for OSTs. To
- specify <literal>failout</literal> mode instead, use the
- <literal>--param="failover.mode=failout"</literal> option as shown below (entered
- on one line):</para>
- <screen>oss# mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgsnode=<replaceable>mgs_NID</replaceable> --param=failover.mode=failout
- --ost --index=<replaceable>ost_index</replaceable> <replaceable>/dev/ost_block_device</replaceable></screen>
- <para>In the example below, <literal>failout</literal> mode is specified for the OSTs on the MGS
- <literal>mds0</literal> in the file system <literal>testfs</literal> (entered on one
- line).</para>
- <screen>oss# mkfs.lustre --fsname=testfs --mgsnode=mds0 --param=failover.mode=failout
- --ost --index=3 /dev/sdb </screen>
+ <para>By default, the Lustre file system uses
+ <literal>failover</literal> mode for OSTs. To specify
+ <literal>failout</literal> mode instead, use the
+ <literal>--param="failover.mode=failout"</literal> option as shown below
+ (entered on one line):</para>
+ <screen>
+oss# mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgsnode=<replaceable>mgs_NID</replaceable> --param=failover.mode=failout
+  --ost --index=<replaceable>ost_index</replaceable> <replaceable>/dev/ost_block_device</replaceable>
+</screen>
+ <para>In the example below,
+ <literal>failout</literal> mode is specified for the OSTs on the MGS
+ <literal>mds0</literal> in the file system
+ <literal>testfs</literal> (entered on one line).</para>
+ <screen>
+oss# mkfs.lustre --fsname=testfs --mgsnode=mds0 --param=failover.mode=failout
+ --ost --index=3 /dev/sdb
+</screen>
<caution>
- <para>Before running this command, unmount all OSTs that will be affected by a change in
- <literal>failover</literal> / <literal>failout</literal> mode.</para>
+ <para>Before running this command, unmount all OSTs that will be affected
+ by a change in <literal>failover</literal> /
+ <literal>failout</literal> mode.</para>
</caution>
<note>
- <para>After initial file system configuration, use the <literal>tunefs.lustre</literal>
- utility to change the mode. For example, to set the <literal>failout</literal> mode,
- run:</para>
- <para><screen>$ tunefs.lustre --param failover.mode=failout <replaceable>/dev/ost_device</replaceable></screen></para>
+ <para>After initial file system configuration, use the
+ <literal>tunefs.lustre</literal> utility to change the mode. For example,
+ to set the
+ <literal>failout</literal> mode, run:</para>
+ <para>
+ <screen>
+$ tunefs.lustre --param failover.mode=failout <replaceable>/dev/ost_device</replaceable>
+</screen>
+ </para>
</note>
</section>
<section xml:id="dbdoclet.50438194_54138">
- <title><indexterm><primary>operations</primary><secondary>degraded OST RAID</secondary></indexterm>Handling Degraded OST RAID Arrays</title>
- <para>Lustre includes functionality that notifies Lustre if an external RAID array has degraded performance (resulting in reduced overall file system performance), either because a disk has failed and not been replaced, or because a disk was replaced and is undergoing a rebuild. To avoid a global performance slowdown due to a degraded OST, the MDS can avoid the OST for new object allocation if it is notified of the degraded state.</para>
- <para>A parameter for each OST, called <literal>degraded</literal>, specifies whether the OST is running in degraded mode or not.</para>
+ <title>
+ <indexterm>
+ <primary>operations</primary>
+ <secondary>degraded OST RAID</secondary>
+ </indexterm>Handling Degraded OST RAID Arrays</title>
+ <para>Lustre includes functionality that notifies Lustre if an external
+ RAID array has degraded performance (resulting in reduced overall file
+ system performance), either because a disk has failed and not been
+ replaced, or because a disk was replaced and is undergoing a rebuild. To
+ avoid a global performance slowdown due to a degraded OST, the MDS can
+ avoid the OST for new object allocation if it is notified of the degraded
+ state.</para>
+ <para>A parameter for each OST, called
+ <literal>degraded</literal>, specifies whether the OST is running in
+ degraded mode or not.</para>
<para>To mark the OST as degraded, use:</para>
- <screen>lctl set_param obdfilter.{OST_name}.degraded=1</screen>
+ <screen>
+lctl set_param obdfilter.{OST_name}.degraded=1
+</screen>
<para>To mark that the OST is back in normal operation, use:</para>
- <screen>lctl set_param obdfilter.{OST_name}.degraded=0
+ <screen>
+lctl set_param obdfilter.{OST_name}.degraded=0
</screen>
<para>To determine if OSTs are currently in degraded mode, use:</para>
- <screen>lctl get_param obdfilter.*.degraded
+ <screen>
+lctl get_param obdfilter.*.degraded
</screen>
- <para>If the OST is remounted due to a reboot or other condition, the flag resets to <literal>0</literal>.</para>
- <para>It is recommended that this be implemented by an automated script that monitors the status of individual RAID devices.</para>
+ <para>If the OST is remounted due to a reboot or other condition, the flag
+ resets to
+ <literal>0</literal>.</para>
+ <para>It is recommended that this be implemented by an automated script
+ that monitors the status of individual RAID devices.</para>
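Such a script might map a RAID state onto the parameter above. The sketch below only prints the lctl command it would run; the state names and OST name are assumptions for illustration:

```shell
# Print (rather than execute) the lctl command matching a RAID state.
degraded_cmd() {
  local ost=$1 state=$2 flag=0
  case "$state" in
    degraded|rebuilding) flag=1 ;;
  esac
  echo "lctl set_param obdfilter.${ost}.degraded=${flag}"
}
degraded_cmd testfs-OST0001 rebuilding
degraded_cmd testfs-OST0001 optimal
```

A real version would run the emitted command and would typically be driven from the RAID controller's event hooks.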
</section>
<section xml:id="dbdoclet.50438194_88063">
- <title><indexterm><primary>operations</primary><secondary>multiple file systems</secondary></indexterm>Running Multiple Lustre File Systems</title>
- <para>Lustre supports multiple file systems provided the combination of <literal>NID:fsname</literal> is unique. Each file system must be allocated a unique name during creation with the <literal>--fsname</literal> parameter. Unique names for file systems are enforced if a single MGS is present. If multiple MGSs are present (for example if you have an MGS on every MDS) the administrator is responsible for ensuring file system names are unique. A single MGS and unique file system names provides a single point of administration and allows commands to be issued against the file system even if it is not mounted.</para>
- <para>Lustre supports multiple file systems on a single MGS. With a single MGS fsnames are guaranteed to be unique. Lustre also allows multiple MGSs to co-exist. For example, multiple MGSs will be necessary if multiple file systems on different Lustre software versions are to be concurrently available. With multiple MGSs additional care must be taken to ensure file system names are unique. Each file system should have a unique fsname among all systems that may interoperate in the future.</para>
- <para>By default, the <literal>mkfs.lustre</literal> command creates a file system named <literal>lustre</literal>. To specify a different file system name (limited to 8 characters) at format time, use the <literal>--fsname</literal> option:</para>
- <para><screen>mkfs.lustre --fsname=<replaceable>file_system_name</replaceable></screen></para>
+ <title>
+ <indexterm>
+ <primary>operations</primary>
+ <secondary>multiple file systems</secondary>
+ </indexterm>Running Multiple Lustre File Systems</title>
+ <para>Lustre supports multiple file systems provided the combination of
+ <literal>NID:fsname</literal> is unique. Each file system must be allocated
+ a unique name during creation with the
+ <literal>--fsname</literal> parameter. Unique names for file systems are
+ enforced if a single MGS is present. If multiple MGSs are present (for
+ example if you have an MGS on every MDS) the administrator is responsible
+ for ensuring file system names are unique. A single MGS and unique file
+ system names provides a single point of administration and allows commands
+ to be issued against the file system even if it is not mounted.</para>
+ <para>Lustre supports multiple file systems on a single MGS. With a single
+ MGS fsnames are guaranteed to be unique. Lustre also allows multiple MGSs
+ to co-exist. For example, multiple MGSs will be necessary if multiple file
+ systems on different Lustre software versions are to be concurrently
+ available. With multiple MGSs additional care must be taken to ensure file
+ system names are unique. Each file system should have a unique fsname among
+ all systems that may interoperate in the future.</para>
+ <para>By default, the
+ <literal>mkfs.lustre</literal> command creates a file system named
+ <literal>lustre</literal>. To specify a different file system name (limited
+ to 8 characters) at format time, use the
+ <literal>--fsname</literal> option:</para>
+ <para>
+ <screen>
+mkfs.lustre --fsname=<replaceable>file_system_name</replaceable>
+</screen>
+ </para>
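Because the 8-character limit is easy to trip over, a format-time wrapper could validate the name first. This checker is an illustrative assumption, not part of the Lustre tools, and its allowed character set is likewise assumed:

```shell
# Reject fsnames that are empty, longer than 8 characters, or contain
# characters outside [A-Za-z0-9_].
valid_fsname() {
  case "$1" in
    '' | *[!A-Za-z0-9_]*) return 1 ;;
  esac
  [ "${#1}" -le 8 ]
}
valid_fsname testfs      && echo "testfs: ok"
valid_fsname toolongname || echo "toolongname: rejected"
```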
<note>
- <para>The MDT, OSTs and clients in the new file system must use the same file system name
- (prepended to the device name). For example, for a new file system named
- <literal>foo</literal>, the MDT and two OSTs would be named
- <literal>foo-MDT0000</literal>, <literal>foo-OST0000</literal>, and
- <literal>foo-OST0001</literal>.</para>
+ <para>The MDT, OSTs and clients in the new file system must use the same
+ file system name (prepended to the device name). For example, for a new
+ file system named
+ <literal>foo</literal>, the MDT and two OSTs would be named
+ <literal>foo-MDT0000</literal>,
+ <literal>foo-OST0000</literal>, and
+ <literal>foo-OST0001</literal>.</para>
</note>
<para>To mount a client on the file system, run:</para>
- <screen>client# mount -t lustre <replaceable>mgsnode</replaceable>:<replaceable>/new_fsname</replaceable> <replaceable>/mount_point</replaceable></screen>
- <para>For example, to mount a client on file system foo at mount point /mnt/foo, run:</para>
- <screen>client# mount -t lustre mgsnode:/foo /mnt/foo</screen>
+ <screen>
+client# mount -t lustre <replaceable>mgsnode</replaceable>:<replaceable>/new_fsname</replaceable> <replaceable>/mount_point</replaceable>
+</screen>
+ <para>For example, to mount a client on file system foo at mount point
+ /mnt/foo, run:</para>
+ <screen>
+client# mount -t lustre mgsnode:/foo /mnt/foo
+</screen>
<note>
- <para>If a client(s) will be mounted on several file systems, add the following line to <literal>/etc/xattr.conf</literal> file to avoid problems when files are moved between the file systems: <literal>lustre.* skip</literal></para>
+ <para>If clients will be mounted on several file systems, add the
+ following line to the
+ <literal>/etc/xattr.conf</literal> file to avoid problems when files are
+ moved between the file systems:
+ <literal>lustre.* skip</literal></para>
</note>
<note>
- <para>To ensure that a new MDT is added to an existing MGS create the MDT by specifying: <literal>--mdt --mgsnode=<replaceable>mgs_NID</replaceable></literal>.</para>
+ <para>To ensure that a new MDT is added to an existing MGS, create the
+ MDT by specifying:
+ <literal>--mdt --mgsnode=<replaceable>mgs_NID</replaceable></literal>.</para>
</note>
- <para>A Lustre installation with two file systems (<literal>foo</literal> and <literal>bar</literal>) could look like this, where the MGS node is <literal>mgsnode@tcp0</literal> and the mount points are <literal>/mnt/foo</literal> and <literal>/mnt/bar</literal>.</para>
- <screen>mgsnode# mkfs.lustre --mgs /dev/sda
-mdtfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --mdt --index=0 /dev/sdb
-ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=0 /dev/sda
-ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=1 /dev/sdb
-mdtbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --mdt --index=0 /dev/sda
-ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=0 /dev/sdc
-ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=1 /dev/sdd</screen>
- <para>To mount a client on file system foo at mount point <literal>/mnt/foo</literal>, run:</para>
- <screen>client# mount -t lustre mgsnode@tcp0:/foo /mnt/foo</screen>
- <para>To mount a client on file system bar at mount point <literal>/mnt/bar</literal>, run:</para>
- <screen>client# mount -t lustre mgsnode@tcp0:/bar /mnt/bar</screen>
+ <para>A Lustre installation with two file systems (<literal>foo</literal>
+ and <literal>bar</literal>) could look like this, where the MGS node is
+ <literal>mgsnode@tcp0</literal> and the mount points are
+ <literal>/mnt/foo</literal> and
+ <literal>/mnt/bar</literal>.</para>
+ <screen>
+mgsnode# mkfs.lustre --mgs /dev/sda
+mdtfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --mdt --index=0 /dev/sdb
+ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=0 /dev/sda
+ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=1 /dev/sdb
+mdtbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --mdt --index=0 /dev/sda
+ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=0 /dev/sdc
+ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=1 /dev/sdd
+</screen>
+ <para>To mount a client on file system foo at mount point
+ <literal>/mnt/foo</literal>, run:</para>
+ <screen>
+client# mount -t lustre mgsnode@tcp0:/foo /mnt/foo
+</screen>
+ <para>To mount a client on file system bar at mount point
+ <literal>/mnt/bar</literal>, run:</para>
+ <screen>
+client# mount -t lustre mgsnode@tcp0:/bar /mnt/bar
+</screen>
</section>
<section xml:id="dbdoclet.lfsmkdir" condition='l24'>
- <title><indexterm><primary>operations</primary><secondary>remote directory</secondary></indexterm>Creating a sub-directory on a given MDT</title>
- <para>Lustre 2.4 enables individual sub-directories to be serviced by unique MDTs. An administrator can allocate a sub-directory to a given MDT using the command:</para>
- <screen>client# lfs mkdir –i <replaceable>mdt_index</replaceable> <replaceable>/mount_point/remote_dir</replaceable>
- </screen>
- <para>This command will allocate the sub-directory <literal>remote_dir</literal> onto the MDT of index <literal>mdtindex</literal>. For more information on adding additional MDTs and <literal>mdtindex</literal> see <xref linkend='dbdoclet.addmdtindex'/>.</para>
- <warning><para>An administrator can allocate remote sub-directories to separate MDTs. Creating remote sub-directories in parent directories not hosted on MDT0 is not recommended. This is because the failure of the parent MDT will leave the namespace below it inaccessible. For this reason, by default it is only possible to create remote sub-directories off MDT0. To relax this restriction and enable remote sub-directories off any MDT, an administrator must issue the command <literal>lctl set_param mdd.*.enable_remote_dir=1</literal>.</para></warning>
+ <title>
+ <indexterm>
+ <primary>operations</primary>
+ <secondary>remote directory</secondary>
+ </indexterm>Creating a sub-directory on a given MDT</title>
+ <para>Lustre 2.4 enables individual sub-directories to be serviced by
+ unique MDTs. An administrator can allocate a sub-directory to a given MDT
+ using the command:</para>
+ <screen>
+client# lfs mkdir -i <replaceable>mdt_index</replaceable> <replaceable>/mount_point/remote_dir</replaceable>
+</screen>
+ <para>This command will allocate the sub-directory
+ <literal>remote_dir</literal> onto the MDT of index
+ <literal>mdt_index</literal>. For more information on adding additional
+ MDTs and
+ <literal>mdt_index</literal>, see
+ <xref linkend='dbdoclet.addmdtindex' />.</para>
+ <warning>
+ <para>An administrator can allocate remote sub-directories to separate
+ MDTs. Creating remote sub-directories in parent directories not hosted on
+ MDT0 is not recommended. This is because the failure of the parent MDT
+ will leave the namespace below it inaccessible. For this reason, by
+ default it is only possible to create remote sub-directories off MDT0. To
+ relax this restriction and enable remote sub-directories off any MDT, an
+ administrator must issue the command
+ <literal>lctl set_param mdd.*.enable_remote_dir=1</literal>.</para>
+ </warning>
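+ <para>As an illustrative sketch (assuming a file system mounted at
+ <literal>/mnt/testfs</literal> to which an MDT with index 1 has already
+ been added), a sub-directory can be placed on that MDT and its location
+ checked afterwards:</para>
+ <screen>
+client# lfs mkdir -i 1 /mnt/testfs/remote_dir
+client# lfs getstripe --mdt-index /mnt/testfs/remote_dir
+</screen>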
</section>
<section xml:id="dbdoclet.50438194_88980">
- <title><indexterm><primary>operations</primary><secondary>parameters</secondary></indexterm>Setting and Retrieving Lustre Parameters</title>
- <para>Several options are available for setting parameters in Lustre:</para>
+ <title>
+ <indexterm>
+ <primary>operations</primary>
+ <secondary>parameters</secondary>
+ </indexterm>Setting and Retrieving Lustre Parameters</title>
+ <para>Several options are available for setting parameters in
+ Lustre:</para>
<itemizedlist>
<listitem>
- <para> When creating a file system, use mkfs.lustre. See <xref linkend="dbdoclet.50438194_17237"/> below.</para>
+ <para>When creating a file system, use mkfs.lustre. See
+ <xref linkend="dbdoclet.50438194_17237" /> below.</para>
</listitem>
<listitem>
- <para> When a server is stopped, use tunefs.lustre. See <xref linkend="dbdoclet.50438194_55253"/> below.</para>
+ <para>When a server is stopped, use tunefs.lustre. See
+ <xref linkend="dbdoclet.50438194_55253" /> below.</para>
</listitem>
<listitem>
- <para> When the file system is running, use lctl to set or retrieve Lustre parameters. See <xref linkend="dbdoclet.50438194_51490"/> and <xref linkend="dbdoclet.50438194_63247"/> below.</para>
+ <para>When the file system is running, use lctl to set or retrieve
+ Lustre parameters. See
+ <xref linkend="dbdoclet.50438194_51490" /> and
+ <xref linkend="dbdoclet.50438194_63247" /> below.</para>
</listitem>
</itemizedlist>
<section xml:id="dbdoclet.50438194_17237">
- <title>Setting Tunable Parameters with <literal>mkfs.lustre</literal></title>
- <para>When the file system is first formatted, parameters can simply be added as a <literal>--param</literal> option to the <literal>mkfs.lustre</literal> command. For example:</para>
- <screen>mds# mkfs.lustre --mdt --param="sys.timeout=50" /dev/sda</screen>
- <para>For more details about creating a file system,see <xref linkend="configuringlustre"/>. For more details about <literal>mkfs.lustre</literal>, see <xref linkend="systemconfigurationutilities"/>.</para>
+ <title>Setting Tunable Parameters with
+ <literal>mkfs.lustre</literal></title>
+ <para>When the file system is first formatted, parameters can simply be
+ added as a
+ <literal>--param</literal> option to the
+ <literal>mkfs.lustre</literal> command. For example:</para>
+ <screen>
+mds# mkfs.lustre --mdt --param="sys.timeout=50" /dev/sda
+</screen>
+ <para>For more details about creating a file system, see
+ <xref linkend="configuringlustre" />. For more details about
+ <literal>mkfs.lustre</literal>, see
+ <xref linkend="systemconfigurationutilities" />.</para>
</section>
<section xml:id="dbdoclet.50438194_55253">
- <title>Setting Parameters with <literal>tunefs.lustre</literal></title>
- <para>If a server (OSS or MDS) is stopped, parameters can be added to an existing file system
- using the <literal>--param</literal> option to the <literal>tunefs.lustre</literal> command.
- For example:</para>
- <screen>oss# tunefs.lustre --param=failover.node=192.168.0.13@tcp0 /dev/sda</screen>
- <para>With <literal>tunefs.lustre</literal>, parameters are <emphasis>additive</emphasis> -- new parameters are specified in addition to old parameters, they do not replace them. To erase all old <literal>tunefs.lustre</literal> parameters and just use newly-specified parameters, run:</para>
- <screen>mds# tunefs.lustre --erase-params --param=<replaceable>new_parameters</replaceable> </screen>
- <para>The tunefs.lustre command can be used to set any parameter settable in a /proc/fs/lustre file and that has its own OBD device, so it can be specified as <literal><replaceable>obdname|fsname</replaceable>.<replaceable>obdtype</replaceable>.<replaceable>proc_file_name</replaceable>=<replaceable>value</replaceable></literal>. For example:</para>
- <screen>mds# tunefs.lustre --param mdt.identity_upcall=NONE /dev/sda1</screen>
- <para>For more details about <literal>tunefs.lustre</literal>, see <xref linkend="systemconfigurationutilities"/>.</para>
+ <title>Setting Parameters with
+ <literal>tunefs.lustre</literal></title>
+ <para>If a server (OSS or MDS) is stopped, parameters can be added to an
+ existing file system using the
+ <literal>--param</literal> option to the
+ <literal>tunefs.lustre</literal> command. For example:</para>
+ <screen>
+oss# tunefs.lustre --param=failover.node=192.168.0.13@tcp0 /dev/sda
+</screen>
+ <para>With
+ <literal>tunefs.lustre</literal>, parameters are
+ <emphasis>additive</emphasis> -- new parameters are specified in addition
+ to old parameters; they do not replace them. To erase all old
+ <literal>tunefs.lustre</literal> parameters and use only newly-specified
+ parameters, run:</para>
+ <screen>
+mds# tunefs.lustre --erase-params --param=<replaceable>new_parameters</replaceable>
+</screen>
+ <para>The
+ <literal>tunefs.lustre</literal> command can be used to set any parameter
+ that is settable in a
+ <literal>/proc/fs/lustre</literal> file and that has its own OBD device,
+ so it can be specified as
+ <literal><replaceable>obdname|fsname</replaceable>.<replaceable>obdtype</replaceable>.<replaceable>proc_file_name</replaceable>=<replaceable>value</replaceable></literal>.
+ For example:</para>
+ <screen>
+mds# tunefs.lustre --param mdt.identity_upcall=NONE /dev/sda1
+</screen>
+ <para>For more details about
+ <literal>tunefs.lustre</literal>, see
+ <xref linkend="systemconfigurationutilities" />.</para>
</section>
<section xml:id="dbdoclet.50438194_51490">
- <title>Setting Parameters with <literal>lctl</literal></title>
- <para>When the file system is running, the <literal>lctl</literal> command can be used to set parameters (temporary or permanent) and report current parameter values. Temporary parameters are active as long as the server or client is not shut down. Permanent parameters live through server and client reboots.</para>
+ <title>Setting Parameters with
+ <literal>lctl</literal></title>
+ <para>When the file system is running, the
+ <literal>lctl</literal> command can be used to set parameters (temporary
+ or permanent) and report current parameter values. Temporary parameters
+ are active as long as the server or client is not shut down. Permanent
+ parameters live through server and client reboots.</para>
<note>
- <para>The lctl list_param command enables users to list all parameters that can be set. See <xref linkend="dbdoclet.50438194_88217"/>.</para>
+ <para>The <literal>lctl list_param</literal> command enables users to
+ list all parameters that can be set. See
+ <xref linkend="dbdoclet.50438194_88217" />.</para>
</note>
- <para>For more details about the <literal>lctl</literal> command, see the examples in the sections below and <xref linkend="systemconfigurationutilities"/>.</para>
+ <para>For more details about the
+ <literal>lctl</literal> command, see the examples in the sections below
+ and
+ <xref linkend="systemconfigurationutilities" />.</para>
<section remap="h4">
<title>Setting Temporary Parameters</title>
- <para>Use <literal>lctl set_param</literal> to set temporary parameters on the node where it is run. These parameters map to items in <literal>/proc/{fs,sys}/{lnet,lustre}</literal>. The <literal>lctl set_param</literal> command uses this syntax:</para>
- <screen>lctl set_param [-n] <replaceable>obdtype</replaceable>.<replaceable>obdname</replaceable>.<replaceable>proc_file_name</replaceable>=<replaceable>value</replaceable></screen>
+ <para>Use
+ <literal>lctl set_param</literal> to set temporary parameters on the
+ node where it is run. These parameters map to items in
+ <literal>/proc/{fs,sys}/{lnet,lustre}</literal>. The
+ <literal>lctl set_param</literal> command uses this syntax:</para>
+ <screen>
+lctl set_param [-n] <replaceable>obdtype</replaceable>.<replaceable>obdname</replaceable>.<replaceable>proc_file_name</replaceable>=<replaceable>value</replaceable>
+</screen>
<para>For example:</para>
- <screen># lctl set_param osc.*.max_dirty_mb=1024
+ <screen>
+# lctl set_param osc.*.max_dirty_mb=1024
osc.myth-OST0000-osc.max_dirty_mb=32
osc.myth-OST0001-osc.max_dirty_mb=32
osc.myth-OST0002-osc.max_dirty_mb=32
osc.myth-OST0003-osc.max_dirty_mb=32
-osc.myth-OST0004-osc.max_dirty_mb=32</screen>
+osc.myth-OST0004-osc.max_dirty_mb=32
+</screen>
</section>
<section xml:id="dbdoclet.50438194_64195">
<title>Setting Permanent Parameters</title>
- <para>Use the <literal>lctl conf_param</literal> command to set permanent parameters. In general, the <literal>lctl conf_param</literal> command can be used to specify any parameter settable in a <literal>/proc/fs/lustre</literal> file, with its own OBD device. The <literal>lctl conf_param</literal> command uses this syntax (same as the <literal>mkfs.lustre</literal> and <literal>tunefs.lustre</literal> commands):</para>
- <screen><replaceable>obdname|fsname</replaceable>.<replaceable>obdtype</replaceable>.<replaceable>proc_file_name</replaceable>=<replaceable>value</replaceable>) </screen>
- <para>Here are a few examples of <literal>lctl conf_param</literal> commands:</para>
- <screen>mgs# lctl conf_param testfs-MDT0000.sys.timeout=40
+ <para>Use the
+ <literal>lctl conf_param</literal> command to set permanent parameters.
+ In general, the
+ <literal>lctl conf_param</literal> command can be used to specify any
+ parameter settable in a
+ <literal>/proc/fs/lustre</literal> file, with its own OBD device. The
+ <literal>lctl conf_param</literal> command uses this syntax (same as the
+ <literal>mkfs.lustre</literal> and
+ <literal>tunefs.lustre</literal> commands):</para>
+ <screen>
+<replaceable>obdname|fsname</replaceable>.<replaceable>obdtype</replaceable>.<replaceable>proc_file_name</replaceable>=<replaceable>value</replaceable>
+</screen>
+ <para>Here are a few examples of
+ <literal>lctl conf_param</literal> commands:</para>
+ <screen>
+mgs# lctl conf_param testfs-MDT0000.sys.timeout=40
$ lctl conf_param testfs-MDT0000.mdt.identity_upcall=NONE
$ lctl conf_param testfs.llite.max_read_ahead_mb=16
$ lctl conf_param testfs-MDT0000.lov.stripesize=2M
$ lctl conf_param testfs-OST0000.osc.max_dirty_mb=29.15
$ lctl conf_param testfs-OST0000.ost.client_cache_seconds=15
-$ lctl conf_param testfs.sys.timeout=40 </screen>
+$ lctl conf_param testfs.sys.timeout=40
+</screen>
<caution>
- <para>Parameters specified with the <literal>lctl conf_param</literal> command are set permanently in the file system's configuration file on the MGS.</para>
+ <para>Parameters specified with the
+ <literal>lctl conf_param</literal> command are set permanently in the
+ file system's configuration file on the MGS.</para>
</caution>
</section>
<section xml:id="dbdoclet.setparamp" condition='l25'>
- <title>Setting Permanent Parameters with lctl set_param -P</title>
- <para> Use the <literal>lctl set_param -P</literal> to set parameters permanently. This command must be issued on the MGS. The given parameter is set on every host using <literal>lctl</literal> upcall. Parameters map to items in <literal>/proc/{fs,sys}/{lnet,lustre}</literal>. The <literal>lctl set_param</literal> command uses this syntax:</para>
- <screen>lctl set_param -P <replaceable>obdtype</replaceable>.<replaceable>obdname</replaceable>.<replaceable>proc_file_name</replaceable>=<replaceable>value</replaceable></screen>
- <para>For example:</para>
- <screen># lctl set_param -P osc.*.max_dirty_mb=1024
+ <title>Setting Permanent Parameters with lctl set_param -P</title>
+ <para>Use the
+ <literal>lctl set_param -P</literal> command to set parameters
+ permanently. This command must be issued on the MGS. The given parameter
+ is set on every host using the
+ <literal>lctl</literal> upcall. Parameters map to items in
+ <literal>/proc/{fs,sys}/{lnet,lustre}</literal>. The
+ <literal>lctl set_param</literal> command uses this syntax:</para>
+ <screen>
+lctl set_param -P <replaceable>obdtype</replaceable>.<replaceable>obdname</replaceable>.<replaceable>proc_file_name</replaceable>=<replaceable>value</replaceable>
+</screen>
+ <para>For example:</para>
+ <screen>
+# lctl set_param -P osc.*.max_dirty_mb=1024
osc.myth-OST0000-osc.max_dirty_mb=32
osc.myth-OST0001-osc.max_dirty_mb=32
osc.myth-OST0002-osc.max_dirty_mb=32
osc.myth-OST0003-osc.max_dirty_mb=32
-osc.myth-OST0004-osc.max_dirty_mb=32 </screen>
- <para>Use <literal>-d </literal> (only with -P) option to delete permanent parameter. Syntax:</para>
- <screen>lctl set_param -P -d<replaceable>obdtype</replaceable>.<replaceable>obdname</replaceable>.<replaceable>proc_file_name</replaceable></screen>
- <para>For example:</para>
- <screen># lctl set_param -P -d osc.*.max_dirty_mb </screen>
+osc.myth-OST0004-osc.max_dirty_mb=32
+</screen>
+ <para>Use the
+ <literal>-d</literal> option (only with
+ <literal>-P</literal>) to delete a permanent parameter. Syntax:</para>
+ <screen>
+lctl set_param -P -d <replaceable>obdtype</replaceable>.<replaceable>obdname</replaceable>.<replaceable>proc_file_name</replaceable>
+</screen>
+ <para>For example:</para>
+ <screen>
+# lctl set_param -P -d osc.*.max_dirty_mb
+</screen>
</section>
<section xml:id="dbdoclet.50438194_88217">
<title>Listing Parameters</title>
- <para>To list Lustre or LNET parameters that are available to set, use the <literal>lctl list_param</literal> command. For example:</para>
- <screen>lctl list_param [-FR] <replaceable>obdtype</replaceable>.<replaceable>obdname</replaceable></screen>
- <para>The following arguments are available for the <literal>lctl list_param</literal> command.</para>
- <para><literal>-F</literal> Add '<literal>/</literal>', '<literal>@</literal>' or '<literal>=</literal>' for directories, symlinks and writeable files, respectively</para>
- <para><literal>-R</literal> Recursively lists all parameters under the specified path</para>
+ <para>To list Lustre or LNET parameters that are available to set, use
+ the
+ <literal>lctl list_param</literal> command. For example:</para>
+ <screen>
+lctl list_param [-FR] <replaceable>obdtype</replaceable>.<replaceable>obdname</replaceable>
+</screen>
+ <para>The following arguments are available for the
+ <literal>lctl list_param</literal> command.</para>
+ <para>
+ <literal>-F</literal> Add '<literal>/</literal>', '<literal>@</literal>'
+ or '<literal>=</literal>' for directories, symlinks and writeable files,
+ respectively</para>
+ <para>
+ <literal>-R</literal> Recursively lists all parameters under the
+ specified path</para>
<para>For example:</para>
- <screen>oss# lctl list_param obdfilter.lustre-OST0000 </screen>
+ <screen>
+oss# lctl list_param obdfilter.lustre-OST0000
+</screen>
</section>
<section xml:id="dbdoclet.50438194_63247">
<title>Reporting Current Parameter Values</title>
- <para>To report current Lustre parameter values, use the <literal>lctl get_param</literal> command with this syntax:</para>
- <screen>lctl get_param [-n] <replaceable>obdtype</replaceable>.<replaceable>obdname</replaceable>.<replaceable>proc_file_name</replaceable></screen>
+ <para>To report current Lustre parameter values, use the
+ <literal>lctl get_param</literal> command with this syntax:</para>
+ <screen>
+lctl get_param [-n] <replaceable>obdtype</replaceable>.<replaceable>obdname</replaceable>.<replaceable>proc_file_name</replaceable>
+</screen>
<para>This example reports data on RPC service times.</para>
- <screen>oss# lctl get_param -n ost.*.ost_io.timeouts
-service : cur 1 worst 30 (at 1257150393, 85d23h58m54s ago) 1 1 1 1 </screen>
- <para>This example reports the amount of space this client has reserved for writeback cache with each OST:</para>
- <screen>client# lctl get_param osc.*.cur_grant_bytes
+ <screen>
+oss# lctl get_param -n ost.*.ost_io.timeouts
+service : cur 1 worst 30 (at 1257150393, 85d23h58m54s ago) 1 1 1 1
+</screen>
+ <para>This example reports the amount of space this client has reserved
+ for writeback cache with each OST:</para>
+ <screen>
+client# lctl get_param osc.*.cur_grant_bytes
osc.myth-OST0000-osc-ffff8800376bdc00.cur_grant_bytes=2097152
osc.myth-OST0001-osc-ffff8800376bdc00.cur_grant_bytes=33890304
osc.myth-OST0002-osc-ffff8800376bdc00.cur_grant_bytes=35418112
osc.myth-OST0003-osc-ffff8800376bdc00.cur_grant_bytes=2097152
-osc.myth-OST0004-osc-ffff8800376bdc00.cur_grant_bytes=33808384</screen>
+osc.myth-OST0004-osc-ffff8800376bdc00.cur_grant_bytes=33808384
+</screen>
</section>
</section>
</section>
<section xml:id="dbdoclet.50438194_41817">
- <title><indexterm><primary>operations</primary><secondary>failover</secondary></indexterm>Specifying NIDs and Failover</title>
- <para>If a node has multiple network interfaces, it may have multiple NIDs, which must all be
- identified so other nodes can choose the NID that is appropriate for their network interfaces.
- Typically, NIDs are specified in a list delimited by commas (<literal>,</literal>). However,
- when failover nodes are specified, the NIDs are delimited by a colon (<literal>:</literal>) or
- by repeating a keyword such as <literal>--mgsnode=</literal> or
- <literal>--servicenode=</literal>). </para>
- <para>To display the NIDs of all servers in networks configured to work with the Lustre file
- system, run (while LNET is running):</para>
- <screen>lctl list_nids</screen>
- <para>In the example below, <literal>mds0</literal> and <literal>mds1</literal> are configured
- as a combined MGS/MDT failover pair and <literal>oss0</literal> and <literal>oss1</literal>
- are configured as an OST failover pair. The Ethernet address for <literal>mds0</literal> is
- 192.168.10.1, and for <literal>mds1</literal> is 192.168.10.2. The Ethernet addresses for
- <literal>oss0</literal> and <literal>oss1</literal> are 192.168.10.20 and 192.168.10.21
- respectively.</para>
- <screen>mds0# mkfs.lustre --fsname=testfs --mdt --mgs \
+ <title>
+ <indexterm>
+ <primary>operations</primary>
+ <secondary>failover</secondary>
+ </indexterm>Specifying NIDs and Failover</title>
+ <para>If a node has multiple network interfaces, it may have multiple NIDs,
+ which must all be identified so other nodes can choose the NID that is
+ appropriate for their network interfaces. Typically, NIDs are specified in
+ a list delimited by commas (<literal>,</literal>). However, when failover
+ nodes are specified, the NIDs are delimited by a colon
+ (<literal>:</literal>) or by repeating a keyword such as
+ <literal>--mgsnode=</literal> or
+ <literal>--servicenode=</literal>.</para>
+ <para>To display the NIDs of all servers in networks configured to work
+ with the Lustre file system, run (while LNET is running):</para>
+ <screen>
+lctl list_nids
+</screen>
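+ <para>The output is one NID per line for each network interface
+ configured on the node. For example, on a server with a single TCP
+ interface (the hostname and address shown are illustrative):</para>
+ <screen>
+oss0# lctl list_nids
+192.168.10.20@tcp
+</screen>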
+ <para>In the example below,
+ <literal>mds0</literal> and
+ <literal>mds1</literal> are configured as a combined MGS/MDT failover pair
+ and
+ <literal>oss0</literal> and
+ <literal>oss1</literal> are configured as an OST failover pair. The Ethernet
+ address for
+ <literal>mds0</literal> is 192.168.10.1, and for
+ <literal>mds1</literal> is 192.168.10.2. The Ethernet addresses for
+ <literal>oss0</literal> and
+ <literal>oss1</literal> are 192.168.10.20 and 192.168.10.21
+ respectively.</para>
+ <screen>
+mds0# mkfs.lustre --fsname=testfs --mdt --mgs \
--servicenode=192.168.10.2@tcp0 \
--servicenode=192.168.10.1@tcp0
mds0# mount -t lustre /dev/sda1 /mnt/test/mdt
/mnt/testfs
mds0# umount /mnt/mdt
mds1# mount -t lustre /dev/sda1 /mnt/test/mdt
-mds1# cat /proc/fs/lustre/mds/testfs-MDT0000/recovery_status</screen>
- <para>Where multiple NIDs are specified separated by commas (for example,
- <literal>10.67.73.200@tcp,192.168.10.1@tcp</literal>), the two NIDs refer to the same host,
- and the Lustre software chooses the <emphasis>best</emphasis> one for communication. When a
- pair of NIDs is separated by a colon (for example,
- <literal>10.67.73.200@tcp:10.67.73.201@tcp</literal>), the two NIDs refer to two different
- hosts and are treated as a failover pair (the Lustre software tries the first one, and if that
- fails, it tries the second one.)</para>
- <para>Two options to <literal>mkfs.lustre</literal> can be used to specify failover nodes.
- Introduced in Lustre software release 2.0, the <literal>--servicenode</literal> option is used
- to specify all service NIDs, including those for primary nodes and failover nodes. When the
- <literal>--servicenode</literal>option is used, the first service node to load the target
- device becomes the primary service node, while nodes corresponding to the other specified NIDs
- become failover locations for the target device. An older option,
- <literal>--failnode</literal>, specifies just the NIDS of failover nodes. For more
- information about the <literal>--servicenode</literal> and <literal>--failnode</literal>
- options, see <xref xmlns:xlink="http://www.w3.org/1999/xlink" linkend="configuringfailover"
- />.</para>
+mds1# cat /proc/fs/lustre/mds/testfs-MDT0000/recovery_status
+</screen>
+ <para>Where multiple NIDs are specified separated by commas (for example,
+ <literal>10.67.73.200@tcp,192.168.10.1@tcp</literal>), the two NIDs refer
+ to the same host, and the Lustre software chooses the
+ <emphasis>best</emphasis> one for communication. When a pair of NIDs is
+ separated by a colon (for example,
+ <literal>10.67.73.200@tcp:10.67.73.201@tcp</literal>), the two NIDs refer
+ to two different hosts and are treated as a failover pair (the Lustre
+ software tries the first one, and if that fails, it tries the second
+ one).</para>
+ <para>Two options to
+ <literal>mkfs.lustre</literal> can be used to specify failover nodes.
+ Introduced in Lustre software release 2.0, the
+ <literal>--servicenode</literal> option is used to specify all service NIDs,
+ including those for primary nodes and failover nodes. When the
+ <literal>--servicenode</literal> option is used, the first service node to
+ load the target device becomes the primary service node, while nodes
+ corresponding to the other specified NIDs become failover locations for the
+ target device. An older option,
+ <literal>--failnode</literal>, specifies just the NIDS of failover nodes.
+ For more information about the
+ <literal>--servicenode</literal> and
+ <literal>--failnode</literal> options, see
+ <xref xmlns:xlink="http://www.w3.org/1999/xlink"
+ linkend="configuringfailover" />.</para>
</section>
<section xml:id="dbdoclet.50438194_70905">
- <title><indexterm><primary>operations</primary><secondary>erasing a file system</secondary></indexterm>Erasing a File System</title>
- <para>If you want to erase a file system and permanently delete all the data in the file system,
- run this command on your targets:</para>
- <screen>$ "mkfs.lustre --reformat"</screen>
- <para>If you are using a separate MGS and want to keep other file systems defined on that MGS, then set the <literal>writeconf</literal> flag on the MDT for that file system. The <literal>writeconf</literal> flag causes the configuration logs to be erased; they are regenerated the next time the servers start.</para>
- <para>To set the <literal>writeconf</literal> flag on the MDT:</para>
+ <title>
+ <indexterm>
+ <primary>operations</primary>
+ <secondary>erasing a file system</secondary>
+ </indexterm>Erasing a File System</title>
+ <para>If you want to erase a file system and permanently delete all the
+ data in the file system, run this command on your targets:</para>
+ <screen>
+$ mkfs.lustre --reformat
+</screen>
+ <para>If you are using a separate MGS and want to keep other file systems
+ defined on that MGS, then set the
+ <literal>writeconf</literal> flag on the MDT for that file system. The
+ <literal>writeconf</literal> flag causes the configuration logs to be
+ erased; they are regenerated the next time the servers start.</para>
+ <para>To set the
+ <literal>writeconf</literal> flag on the MDT:</para>
<orderedlist>
<listitem>
<para>Unmount all clients/servers using this file system, run:</para>
- <screen>$ umount /mnt/lustre</screen>
+ <screen>
+$ umount /mnt/lustre
+</screen>
</listitem>
<listitem>
<para>Permanently erase the file system and, presumably, replace it
- with another file system, run:</para>
- <screen>$ mkfs.lustre --reformat --fsname spfs --mgs --mdt --index=0 /dev/<emphasis>{mdsdev}</emphasis></screen>
+ with another file system, run:</para>
+ <screen>
+$ mkfs.lustre --reformat --fsname spfs --mgs --mdt --index=0 /dev/<emphasis>{mdsdev}</emphasis>
+</screen>
</listitem>
<listitem>
- <para>If you have a separate MGS (that you do not want to reformat), then add the <literal>--writeconf</literal> flag to <literal>mkfs.lustre</literal> on the MDT, run:</para>
- <screen>$ mkfs.lustre --reformat --writeconf --fsname spfs --mgsnode=<replaceable>mgs_nid</replaceable> --mdt --index=0 <replaceable>/dev/mds_device</replaceable></screen>
+ <para>If you have a separate MGS (that you do not want to reformat),
+ then add the
+ <literal>--writeconf</literal> flag to
+ <literal>mkfs.lustre</literal> on the MDT, run:</para>
+ <screen>
+$ mkfs.lustre --reformat --writeconf --fsname spfs \
+  --mgsnode=<replaceable>mgs_nid</replaceable> --mdt --index=0 <replaceable>/dev/mds_device</replaceable>
+</screen>
</listitem>
</orderedlist>
<note>
- <para>If you have a combined MGS/MDT, reformatting the MDT reformats the MGS as well, causing all configuration information to be lost; you can start building your new file system. Nothing needs to be done with old disks that will not be part of the new file system, just do not mount them.</para>
+ <para>If you have a combined MGS/MDT, reformatting the MDT reformats the
+ MGS as well, causing all configuration information to be lost; you can
+ start building your new file system. Nothing needs to be done with old
+ disks that will not be part of the new file system, just do not mount
+ them.</para>
</note>
</section>
<section xml:id="dbdoclet.50438194_16954">
- <title><indexterm><primary>operations</primary><secondary>reclaiming space</secondary></indexterm>Reclaiming Reserved Disk Space</title>
- <para>All current Lustre installations run the ldiskfs file system internally on service nodes.
- By default, ldiskfs reserves 5% of the disk space to avoid file system fragmentation. In order
- to reclaim this space, run the following command on your OSS for each OST in the file
- system:</para>
- <screen>tune2fs [-m reserved_blocks_percent] /dev/<emphasis>{ostdev}</emphasis></screen>
- <para>You do not need to shut down Lustre before running this command or restart it afterwards.</para>
+ <title>
+ <indexterm>
+ <primary>operations</primary>
+ <secondary>reclaiming space</secondary>
+ </indexterm>Reclaiming Reserved Disk Space</title>
+ <para>All current Lustre installations run the ldiskfs file system
+ internally on service nodes. By default, ldiskfs reserves 5% of the disk
+ space to avoid file system fragmentation. In order to reclaim this space,
+ run the following command on your OSS for each OST in the file
+ system:</para>
+ <screen>
+tune2fs [-m reserved_blocks_percent] /dev/<emphasis>{ostdev}</emphasis>
+</screen>
+ <para>You do not need to shut down Lustre before running this command or
+ restart it afterwards.</para>
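+ <para>The current reservation can be inspected before changing it with
+ <literal>tune2fs -l</literal> (the device name here is
+ illustrative):</para>
+ <screen>
+oss# tune2fs -l /dev/sdb | grep -i "reserved block count"
+</screen>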
<warning>
- <para>Reducing the space reservation can cause severe performance degradation as the OST file
- system becomes more than 95% full, due to difficulty in locating large areas of contiguous
- free space. This performance degradation may persist even if the space usage drops below 95%
- again. It is recommended NOT to reduce the reserved disk space below 5%.</para>
+ <para>Reducing the space reservation can cause severe performance
+ degradation as the OST file system becomes more than 95% full, due to
+ difficulty in locating large areas of contiguous free space. This
+ performance degradation may persist even if the space usage drops below
+ 95% again. It is recommended NOT to reduce the reserved disk space below
+ 5%.</para>
</warning>
</section>
<section xml:id="dbdoclet.50438194_69998">
- <title><indexterm><primary>operations</primary><secondary>replacing an OST or MDS</secondary></indexterm>Replacing an Existing OST or MDT</title>
- <para>To copy the contents of an existing OST to a new OST (or an old MDT to a new MDT), follow the process for either OST/MDT backups in
- <xref linkend='dbdoclet.50438207_71633'/> or
- <xref linkend='dbdoclet.50438207_21638'/>. For more information on removing a MDT, see <xref linkend='dbdoclet.rmremotedir'/>.</para>
+ <title>
+ <indexterm>
+ <primary>operations</primary>
+ <secondary>replacing an OST or MDS</secondary>
+ </indexterm>Replacing an Existing OST or MDT</title>
+ <para>To copy the contents of an existing OST to a new OST (or an old MDT
+ to a new MDT), follow the process for either OST/MDT backups in
+ <xref linkend='dbdoclet.50438207_71633' /> or
+ <xref linkend='dbdoclet.50438207_21638' />. For more information on
+ removing an MDT, see
+ <xref linkend='dbdoclet.rmremotedir' />.</para>
</section>
<section xml:id="dbdoclet.50438194_30872">
- <title><indexterm><primary>operations</primary><secondary>identifying OSTs</secondary></indexterm>Identifying To Which Lustre File an OST Object Belongs</title>
- <para>Use this procedure to identify the file containing a given object on a given OST.</para>
+ <title>
+ <indexterm>
+ <primary>operations</primary>
+ <secondary>identifying OSTs</secondary>
+ </indexterm>Identifying To Which Lustre File an OST Object Belongs</title>
+ <para>Use this procedure to identify the file containing a given object on
+ a given OST.</para>
<orderedlist>
<listitem>
- <para>On the OST (as root), run <literal>debugfs</literal> to display the file identifier (<literal>FID</literal>) of the file associated with the object.</para>
- <para>For example, if the object is <literal>34976</literal> on <literal>/dev/lustre/ost_test2</literal>, the debug command is:
- <screen># debugfs -c -R "stat /O/0/d$((34976 % 32))/34976" /dev/lustre/ost_test2 </screen></para>
- <para>The command output is:
- <screen>debugfs 1.42.3.wc3 (15-Aug-2012)
+ <para>On the OST (as root), run
+ <literal>debugfs</literal> to display the file identifier (
+ <literal>FID</literal>) of the file associated with the object.</para>
+ <para>For example, if the object is
+ <literal>34976</literal> on
+ <literal>/dev/lustre/ost_test2</literal>, the debug command is:
+ <screen>
+# debugfs -c -R "stat /O/0/d$((34976 % 32))/34976" /dev/lustre/ost_test2
+</screen></para>
+ <para>The command output is:
+ <screen>
+debugfs 1.42.3.wc3 (15-Aug-2012)
/dev/lustre/ost_test2: catastrophic mode - not reading inode or group bitmaps
Inode: 352365 Type: regular Mode: 0666 Flags: 0x80000
Generation: 2393149953 Version: 0x0000002a:00005f81
crtime: 0x4a216b3c:975870dc -- Sat May 30 13:22:04 2009
Size of extra inode fields: 24
Extended attributes stored in inode body:
- fid = "b9 da 24 00 00 00 00 00 6a fa 0d 3f 01 00 00 00 eb 5b 0b 00 00 00 0000 00 00 00 00 00 00 00 00 " (32)
+ fid = "b9 da 24 00 00 00 00 00 6a fa 0d 3f 01 00 00 00 eb 5b 0b 00 00 00 00 00 00 00 00 00 00 00 00 00 " (32)
fid: objid=34976 seq=0 parent=[0x24dab9:0x3f0dfa6a:0x0] stripe=1
EXTENTS:
(0-64):4620544-4620607
</screen></para>
</listitem>
<listitem>
- <para>For Lustre software release 2.x file systems, the parent FID will be of the form
- [0x200000400:0x122:0x0] and can be resolved directly using the <literal>lfs fid2path
- [0x200000404:0x122:0x0] /mnt/lustre</literal> command on any Lustre client, and the
- process is complete.</para>
+ <para>For Lustre software release 2.x file systems, the parent FID will
+ be of the form [0x200000400:0x122:0x0] and can be resolved directly
+ using the
+ <literal>lfs fid2path [0x200000400:0x122:0x0]
+ /mnt/lustre</literal> command on any Lustre client, and the process is
+ complete.</para>
</listitem>
<listitem>
<para>In this example the parent inode FID is an upgraded 1.x inode
- (due to the first part of the FID being below 0x200000400), the
- MDT inode number is <literal>0x24dab9</literal> and generation <literal>0x3f0dfa6a</literal> and the pathname needs to be resolved using
- <literal>debugfs</literal>.</para>
+ (due to the first part of the FID being below 0x200000400). The MDT
+ inode number is
+ <literal>0x24dab9</literal> and the generation is
+ <literal>0x3f0dfa6a</literal>, so the pathname needs to be resolved
+ using
+ <literal>debugfs</literal>.</para>
</listitem>
<listitem>
- <para>On the MDS (as root), use <literal>debugfs</literal> to find the file associated with the inode:</para>
- <screen># debugfs -c -R "ncheck 0x24dab9" /dev/lustre/mdt_test </screen>
+ <para>On the MDS (as root), use
+ <literal>debugfs</literal> to find the file associated with the
+ inode:</para>
+ <screen>
+# debugfs -c -R "ncheck 0x24dab9" /dev/lustre/mdt_test
+</screen>
<para>Here is the command output:</para>
- <screen>debugfs 1.42.3.wc2 (15-Aug-2012)
+ <screen>
+debugfs 1.42.3.wc2 (15-Aug-2012)
/dev/lustre/mdt_test: catastrophic mode - not reading inode or group bitmap\
s
Inode Pathname
-2415289 /ROOT/brian-laptop-guest/clients/client11/~dmtmp/PWRPNT/ZD16.BMP</screen>
+2415289 /ROOT/brian-laptop-guest/clients/client11/~dmtmp/PWRPNT/ZD16.BMP
+</screen>
</listitem>
</orderedlist>
- <para>The command lists the inode and pathname associated with the object.</para>
+ <para>The command lists the inode and pathname associated with the
+ object.</para>
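The object pathname passed to `debugfs` in step 1 can be derived with a one-line shell calculation. This is a sketch that assumes the standard ldiskfs object layout of 32 subdirectories (`d0` through `d31`) under `/O/0`, as shown in the example above:

```shell
# Derive the on-disk object path that debugfs expects; assumes the
# /O/0/d(objid % 32)/objid layout used in the stat example above.
objid=34976
echo "/O/0/d$((objid % 32))/$objid"
```

For object 34976 this prints `/O/0/d0/34976`, matching the path used in the `stat` command above.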
<note>
- <para><literal>Debugfs</literal>' ''ncheck'' is a brute-force search that may take a long time to complete.</para>
+ <para>The
+ <literal>debugfs ncheck</literal> command is a brute-force search that
+ may take a long time to complete.</para>
</note>
<note>
- <para>To find the Lustre file from a disk LBA, follow the steps listed in the document at this URL: <link xl:href="http://smartmontools.sourceforge.net/badblockhowto.html">http://smartmontools.sourceforge.net/badblockhowto.html</link>. Then, follow the steps above to resolve the Lustre filename.</para>
+ <para>To find the Lustre file from a disk LBA, follow the steps listed in
+ the document at this URL:
+ <link xl:href="http://smartmontools.sourceforge.net/badblockhowto.html">
+ http://smartmontools.sourceforge.net/badblockhowto.html</link>. Then,
+ follow the steps above to resolve the Lustre filename.</para>
</note>
</section>
</chapter>
-<?xml version='1.0' encoding='UTF-8'?>
-<chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="managingfilesystemio">
- <title xml:id="managingfilesystemio.title">Managing the File System and I/O</title>
- <para>This chapter describes file striping and I/O options, and includes the following sections:</para>
+<?xml version='1.0' encoding='utf-8'?>
+<chapter xmlns="http://docbook.org/ns/docbook"
+xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
+xml:id="managingfilesystemio">
+ <title xml:id="managingfilesystemio.title">Managing the File System and
+ I/O</title>
+ <para>This chapter describes file striping and I/O options, and includes the
+ following sections:</para>
<itemizedlist>
<listitem>
- <para><xref linkend="dbdoclet.50438211_17536"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438211_17536" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438211_75549"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438211_75549" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438211_11204"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438211_11204" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438211_80295"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438211_80295" />
+ </para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438211_61024"/></para>
+ <para>
+ <xref linkend="dbdoclet.50438211_61024" />
+ </para>
</listitem>
</itemizedlist>
<section xml:id="dbdoclet.50438211_17536">
- <title><indexterm><primary>I/O</primary></indexterm>
- <indexterm><primary>I/O</primary><secondary>full OSTs</secondary></indexterm>
- Handling Full OSTs</title>
- <para>Sometimes a Lustre file system becomes unbalanced, often due to incorrectly-specified stripe settings, or when very large files are created that are not striped over all of the OSTs. If an OST is full and an attempt is made to write more information to the file system, an error occurs. The procedures below describe how to handle a full OST.</para>
- <para>The MDS will normally handle space balancing automatically at file creation time, and this procedure is normally not needed, but may be desirable in certain circumstances (e.g. when creating very large files that would consume more than the total free space of the full OSTs).</para>
+ <title>
+ <indexterm>
+ <primary>I/O</primary>
+ </indexterm>
+ <indexterm>
+ <primary>I/O</primary>
+ <secondary>full OSTs</secondary>
+ </indexterm>Handling Full OSTs</title>
+ <para>Sometimes a Lustre file system becomes unbalanced, often due to
+ incorrectly specified stripe settings, or when very large files are created
+ that are not striped over all of the OSTs. If an OST is full and an attempt
+ is made to write more information to the file system, an error occurs. The
+ procedures below describe how to handle a full OST.</para>
+ <para>The MDS will normally handle space balancing automatically at file
+ creation time, and this procedure is normally not needed, but may be
+ desirable in certain circumstances (e.g. when creating very large files
+ that would consume more than the total free space of the full OSTs).</para>
<section remap="h3">
- <title><indexterm><primary>I/O</primary><secondary>OST space usage</secondary></indexterm>Checking OST Space Usage</title>
+ <title>
+ <indexterm>
+ <primary>I/O</primary>
+ <secondary>OST space usage</secondary>
+ </indexterm>Checking OST Space Usage</title>
<para>The example below shows an unbalanced file system:</para>
- <screen>client# lfs df -h
+ <screen>
+client# lfs df -h
UUID bytes Used Available \
Use% Mounted on
lustre-MDT0000_UUID 4.4G 214.5M 3.9G \
36% /mnt/lustre[OST:5]
filesystem summary: 11.8G 5.4G 5.8G \
-45% /mnt/lustre</screen>
- <para>In this case, OST:2 is almost full and when an attempt is made to write additional information to the file system (even with uniform striping over all the OSTs), the write command fails as follows:</para>
- <screen>client# lfs setstripe /mnt/lustre 4M 0 -1
+45% /mnt/lustre
+</screen>
+ <para>In this case, OST:2 is almost full and when an attempt is made to
+ write additional information to the file system (even with uniform
+ striping over all the OSTs), the write command fails as follows:</para>
+ <screen>
+client# lfs setstripe -s 4M -o 0 -c -1 /mnt/lustre
client# dd if=/dev/zero of=/mnt/lustre/test_3 bs=10M count=100
-dd: writing '/mnt/lustre/test_3': No space left on device
+dd: writing '/mnt/lustre/test_3': No space left on device
98+0 records in
97+0 records out
-1017192448 bytes (1.0 GB) copied, 23.2411 seconds, 43.8 MB/s</screen>
+1017192448 bytes (1.0 GB) copied, 23.2411 seconds, 43.8 MB/s
+</screen>
</section>
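A quick way to spot nearly-full OSTs is to filter the `lfs df` output on the Use% column. The snippet below is a minimal sketch that parses sample output with awk; the OST names and figures are illustrative only, not taken from the manual:

```shell
# Flag OSTs at or above 80% usage; field 5 is the Use% column of 'lfs df -h'.
# The sample input below is illustrative only.
lfs_df_output='lustre-OST0000_UUID 2.0G 751.3M 1.1G 37% /mnt/lustre[OST:0]
lustre-OST0002_UUID 2.0G 1.7G 155.1M 86% /mnt/lustre[OST:2]'
echo "$lfs_df_output" | awk '$5+0 >= 80 {print $1 " is " $5 " full"}'
```

In practice the pipeline would start with `lfs df` itself, e.g. `lfs df /mnt/lustre | awk '…'`.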
<section remap="h3">
- <title><indexterm><primary>I/O</primary><secondary>taking OST offline</secondary></indexterm>Taking a Full OST Offline</title>
- <para>To avoid running out of space in the file system, if the OST usage is imbalanced and one or more OSTs are close to being full while there are others that have a lot of space, the full OSTs may optionally be deactivated at the MDS to prevent the MDS from allocating new objects there.</para>
+ <title>
+ <indexterm>
+ <primary>I/O</primary>
+ <secondary>taking OST offline</secondary>
+ </indexterm>Taking a Full OST Offline</title>
+ <para>To avoid running out of space in the file system, if the OST usage
+ is imbalanced and one or more OSTs are close to being full while there
+ are others that have a lot of space, the full OSTs may optionally be
+ deactivated at the MDS to prevent the MDS from allocating new objects
+ there.</para>
<orderedlist>
<listitem>
<para>Log into the MDS server:</para>
- <screen>client# ssh root@192.168.0.10
-root@192.168.0.10's password:
-Last login: Wed Nov 26 13:35:12 2008 from 192.168.0.6</screen>
+ <screen>
+client# ssh root@192.168.0.10
+root@192.168.0.10's password:
+Last login: Wed Nov 26 13:35:12 2008 from 192.168.0.6
+</screen>
</listitem>
<listitem>
- <para>Use the <literal>lctl dl</literal> command to show the status of all file system components:</para>
- <screen>mds# lctl dl
+ <para>Use the
+ <literal>lctl dl</literal> command to show the status of all file
+ system components:</para>
+ <screen>
+mds# lctl dl
0 UP mgs MGS MGS 9
1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-81655dd1e813 5
2 UP mdt MDS MDS_uuid 3
7 UP osc lustre-OST0002-osc lustre-mdtlov_UUID 5
8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
9 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
-10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5</screen>
+10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5
+</screen>
</listitem>
<listitem>
- <para>Use <literal>lctl</literal> deactivate to take the full OST offline:</para>
- <screen>mds# lctl --device 7 deactivate</screen>
+ <para>Use the
+ <literal>lctl deactivate</literal> command to take the full OST
+ offline:</para>
+ <screen>
+mds# lctl --device 7 deactivate
+</screen>
</listitem>
<listitem>
<para>Display the status of the file system components:</para>
- <screen>mds# lctl dl
+ <screen>
+mds# lctl dl
0 UP mgs MGS MGS 9
1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-81655dd1e813 5
2 UP mdt MDS MDS_uuid 3
7 IN osc lustre-OST0002-osc lustre-mdtlov_UUID 5
8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
9 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
-10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5</screen>
+10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5
+</screen>
</listitem>
</orderedlist>
- <para>The device list shows that OST0002 is now inactive. When new files are created in the file system, they will only use the remaining active OSTs. Either manual space rebalancing can be done by migrating data to other OSTs, as shown in the next section, or normal file deletion and creation can be allowed to passively rebalance the space usage.</para>
+ <para>The device list shows that OST0002 is now inactive. When new files
+ are created in the file system, they will only use the remaining active
+ OSTs. Either manual space rebalancing can be done by migrating data to
+ other OSTs, as shown in the next section, or normal file deletion and
+ creation can be allowed to passively rebalance the space usage.</para>
</section>
<section remap="h3">
- <title>
- <indexterm><primary>I/O</primary><secondary>migrating data</secondary></indexterm>
- <indexterm><primary>maintenance</primary><secondary>full OSTs</secondary></indexterm>
- Migrating Data within a File System</title>
- <para>As stripes cannot be moved within the file system, data must be migrated manually by copying and renaming the file, removing the original file, and renaming the new file with the original file name. The simplest way to do this is to use the <literal>lfs_migrate</literal> command (see <xref linkend="dbdoclet.50438206_42260"/>). However, the steps for migrating a file by hand are also shown here for reference.</para>
+ <title>
+ <indexterm>
+ <primary>I/O</primary>
+ <secondary>migrating data</secondary>
+ </indexterm>
+ <indexterm>
+ <primary>maintenance</primary>
+ <secondary>full OSTs</secondary>
+ </indexterm>Migrating Data within a File System</title>
+ <para>As stripes cannot be moved within the file system, data must be
+ migrated manually by copying and renaming the file, removing the original
+ file, and renaming the new file with the original file name. The simplest
+ way to do this is to use the
+ <literal>lfs_migrate</literal> command (see
+ <xref linkend="dbdoclet.50438206_42260" />). However, the steps for
+ migrating a file by hand are also shown here for reference.</para>
<orderedlist>
<listitem>
<para>Identify the file(s) to be moved.</para>
- <para>In the example below, output from the <literal>getstripe</literal> command indicates that the file <literal>test_2</literal> is located entirely on OST2:</para>
- <screen>client# lfs getstripe /mnt/lustre/test_2
+ <para>In the example below, output from the
+ <literal>getstripe</literal> command indicates that the file
+ <literal>test_2</literal> is located entirely on OST2:</para>
+ <screen>
+client# lfs getstripe /mnt/lustre/test_2
/mnt/lustre/test_2
obdidx objid objid group
- 2 8 0x8 0</screen>
+ 2 8 0x8 0
+</screen>
</listitem>
<listitem>
- <para>To move single object(s), create a new copy and remove the original. Enter:</para>
- <screen>client# cp -a /mnt/lustre/test_2 /mnt/lustre/test_2.tmp
-client# mv /mnt/lustre/test_2.tmp /mnt/lustre/test_2</screen>
+ <para>To move single object(s), create a new copy and remove the
+ original. Enter:</para>
+ <screen>
+client# cp -a /mnt/lustre/test_2 /mnt/lustre/test_2.tmp
+client# mv /mnt/lustre/test_2.tmp /mnt/lustre/test_2
+</screen>
</listitem>
<listitem>
<para>To migrate large files from one or more OSTs, enter:</para>
- <screen>client# lfs find --ost <replaceable>ost_name</replaceable> -size +1G | lfs_migrate -y</screen>
+ <screen>
+client# lfs find --ost <replaceable>ost_name</replaceable> -size +1G | lfs_migrate -y
+</screen>
</listitem>
<listitem>
<para>Check the file system balance.</para>
- <para>The <literal>df</literal> output in the example below shows a more balanced system compared to the <literal>df</literal> output in the example in <xref linkend="dbdoclet.50438211_17536"/>.</para>
- <screen>client# lfs df -h
+ <para>The
+ <literal>df</literal> output in the example below shows a more
+ balanced system compared to the
+ <literal>df</literal> output in the example in
+ <xref linkend="dbdoclet.50438211_17536" />.</para>
+ <screen>
+client# lfs df -h
UUID bytes Used Available Use% \
Mounted on
lustre-MDT0000_UUID 4.4G 214.5M 3.9G 4% \
/mnt/lustre[OST:5]
filesystem summary: 11.8G 7.3G 3.9G 61% \
-/mnt/lustre</screen>
+/mnt/lustre
+</screen>
</listitem>
</orderedlist>
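The copy-then-rename pattern in step 2 can be wrapped in a small helper; this sketch operates on an ordinary file to show the mechanics only (on Lustre, the copy is what picks up the new striping of the target directory or pool, and files being actively written must be skipped):

```shell
# Illustrative copy-then-rename migration of one file, as in step 2 above.
# The path used here is a scratch file, not a Lustre mount point.
migrate_file() {
  f=$1
  # Copy preserving attributes, then atomically replace the original name.
  cp -a "$f" "$f.tmp" && mv "$f.tmp" "$f"
}
echo "hello" > /tmp/test_2
migrate_file /tmp/test_2
cat /tmp/test_2
```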
</section>
<section remap="h3">
- <title>
- <indexterm><primary>I/O</primary><secondary>bringing OST online</secondary></indexterm>
- <indexterm><primary>maintenance</primary><secondary>bringing OST online</secondary></indexterm>
-
- Returning an Inactive OST Back Online</title>
- <para>Once the deactivated OST(s) no longer are severely imbalanced, due to either active or passive data redistribution, they should be reactivated so they will again have new files allocated on them.</para>
- <screen>[mds]# lctl --device 7 activate
+ <title>
+ <indexterm>
+ <primary>I/O</primary>
+ <secondary>bringing OST online</secondary>
+ </indexterm>
+ <indexterm>
+ <primary>maintenance</primary>
+ <secondary>bringing OST online</secondary>
+ </indexterm>Returning an Inactive OST Back Online</title>
+ <para>Once the deactivated OST(s) are no longer severely imbalanced, due
+ to either active or passive data redistribution, they should be
+ reactivated so that new files are again allocated on them.</para>
+ <screen>
+[mds]# lctl --device 7 activate
[mds]# lctl dl
0 UP mgs MGS MGS 9
1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-816dd1e813 5
</section>
</section>
<section xml:id="dbdoclet.50438211_75549">
- <title>
- <indexterm><primary>I/O</primary><secondary>pools</secondary></indexterm>
- <indexterm><primary>maintenance</primary><secondary>pools</secondary></indexterm>
- <indexterm><primary>pools</primary></indexterm>
- Creating and Managing OST Pools</title>
- <para>The OST pools feature enables users to group OSTs together to make object placement more flexible. A 'pool' is the name associated with an arbitrary subset of OSTs in a Lustre cluster.</para>
+ <title>
+ <indexterm>
+ <primary>I/O</primary>
+ <secondary>pools</secondary>
+ </indexterm>
+ <indexterm>
+ <primary>maintenance</primary>
+ <secondary>pools</secondary>
+ </indexterm>
+ <indexterm>
+ <primary>pools</primary>
+ </indexterm>Creating and Managing OST Pools</title>
+ <para>The OST pools feature enables users to group OSTs together to make
+ object placement more flexible. A 'pool' is the name associated with an
+ arbitrary subset of OSTs in a Lustre cluster.</para>
<para>OST pools follow these rules:</para>
<itemizedlist>
<listitem>
<para>No ordering of OSTs in a pool is defined or implied.</para>
</listitem>
<listitem>
- <para>Stripe allocation within a pool follows the same rules as the normal stripe allocator.</para>
+ <para>Stripe allocation within a pool follows the same rules as the
+ normal stripe allocator.</para>
</listitem>
<listitem>
- <para>OST membership in a pool is flexible, and can change over time.</para>
+ <para>OST membership in a pool is flexible, and can change over
+ time.</para>
</listitem>
</itemizedlist>
- <para>When an OST pool is defined, it can be used to allocate files. When file or directory striping is set to a pool, only OSTs in the pool are candidates for striping. If a stripe_index is specified which refers to an OST that is not a member of the pool, an error is returned.</para>
- <para>OST pools are used only at file creation. If the definition of a pool changes (an OST is added or removed or the pool is destroyed), already-created files are not affected.</para>
+ <para>When an OST pool is defined, it can be used to allocate files. When
+ file or directory striping is set to a pool, only OSTs in the pool are
+ candidates for striping. If a stripe_index is specified which refers to an
+ OST that is not a member of the pool, an error is returned.</para>
+ <para>OST pools are used only at file creation. If the definition of a pool
+ changes (an OST is added or removed or the pool is destroyed),
+ already-created files are not affected.</para>
<note>
- <para>An error (<literal>EINVAL</literal>) results if you create a file using an empty pool.</para>
+ <para>An error
+ (<literal>EINVAL</literal>) results if you create a file using an empty
+ pool.</para>
</note>
<note>
- <para>If a directory has pool striping set and the pool is subsequently removed, the new files created in this directory have the (non-pool) default striping pattern for that directory applied and no error is returned.</para>
+ <para>If a directory has pool striping set and the pool is subsequently
+ removed, the new files created in this directory have the (non-pool)
+ default striping pattern for that directory applied and no error is
+ returned.</para>
</note>
<section remap="h3">
<title>Working with OST Pools</title>
- <para>OST pools are defined in the configuration log on the MGS. Use the lctl command to:</para>
+ <para>OST pools are defined in the configuration log on the MGS. Use the
+ <literal>lctl</literal> command to:</para>
<itemizedlist>
<listitem>
<para>Create/destroy a pool</para>
<para>List pools and OSTs in a specific pool</para>
</listitem>
</itemizedlist>
- <para>The lctl command MUST be run on the MGS. Another requirement for managing OST pools is to either have the MDT and MGS on the same node or have a Lustre client mounted on the MGS node, if it is separate from the MDS. This is needed to validate the pool commands being run are correct.</para>
+ <para>The
+ <literal>lctl</literal> command MUST be run on the MGS. Another
+ requirement for managing OST pools is to either have the MDT and MGS on
+ the same node or have a Lustre client mounted on the MGS node, if it is
+ separate from the MDS. This is needed to validate that the pool commands
+ being run are correct.</para>
<caution>
- <para>Running the <literal>writeconf</literal> command on the MDS erases all pools information (as well as any other parameters set using <literal>lctl conf_param</literal>). We recommend that the pools definitions (and <literal>conf_param</literal> settings) be executed using a script, so they can be reproduced easily after a <literal>writeconf</literal> is performed.</para>
+ <para>Running the
+ <literal>writeconf</literal> command on the MDS erases all pools
+ information (as well as any other parameters set using
+ <literal>lctl conf_param</literal>). We recommend that the pools
+ definitions (and
+ <literal>conf_param</literal> settings) be executed using a script, so
+ they can be reproduced easily after a
+ <literal>writeconf</literal> is performed.</para>
</caution>
<para>To create a new pool, run:</para>
- <screen>mgs# lctl pool_new <replaceable>fsname</replaceable>.<replaceable>poolname</replaceable></screen>
+ <screen>
+mgs# lctl pool_new <replaceable>fsname</replaceable>.<replaceable>poolname</replaceable>
+</screen>
<note>
<para>The pool name is an ASCII string up to 16 characters.</para>
</note>
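The 16-character limit in the note above can be checked before running `pool_new`. This is an illustrative shell pre-check, not a Lustre utility, and the allowed character set shown is an assumption:

```shell
# Illustrative pre-check of a candidate pool name: non-empty, at most 16
# characters, limited to a conservative ASCII set (an assumption).
name="pool1"
if [ "${#name}" -ge 1 ] && [ "${#name}" -le 16 ] &&
   printf '%s' "$name" | LC_ALL=C grep -Eq '^[A-Za-z0-9_.-]+$'; then
  echo "valid"
else
  echo "invalid"
fi
```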
<para>To add the named OST to a pool, run:</para>
- <screen>mgs# lctl pool_add <replaceable>fsname</replaceable>.<replaceable>poolname</replaceable> <replaceable>ost_list</replaceable></screen>
+ <screen>
+mgs# lctl pool_add <replaceable>fsname</replaceable>.<replaceable>poolname</replaceable> <replaceable>ost_list</replaceable>
+</screen>
<para>Where:</para>
<itemizedlist>
<listitem>
- <para><literal><replaceable>ost_list</replaceable> is <replaceable>fsname</replaceable>-OST<replaceable>index_range</replaceable></literal></para>
+ <para>
+ <literal><replaceable>ost_list</replaceable> is
+ <replaceable>fsname</replaceable>-OST<replaceable>index_range</replaceable></literal>
+ </para>
</listitem>
<listitem>
- <para><literal><replaceable>index_range</replaceable> is <replaceable>ost_index_start</replaceable>-<replaceable>ost_index_end[,index_range]</replaceable></literal> or <literal><replaceable>ost_index_start</replaceable>-<replaceable>ost_index_end/step</replaceable></literal></para>
+ <para>
+ <literal><replaceable>index_range</replaceable> is
+ <replaceable>ost_index_start</replaceable>-<replaceable>ost_index_end[,index_range]</replaceable></literal>
+ or
+ <literal><replaceable>ost_index_start</replaceable>-<replaceable>ost_index_end/step</replaceable></literal></para>
</listitem>
</itemizedlist>
- <para>If the leading <literal><replaceable>fsname</replaceable></literal> and/or ending <literal>_UUID</literal> are missing, they are automatically added.</para>
- <para> For example, to add even-numbered OSTs to <literal>pool1</literal> on file system
- <literal>lustre</literal>, run a single command (<literal>pool_add</literal>) to add many
- OSTs to the pool at one time:</para>
- <para><screen>lctl pool_add lustre.pool1 OST[0-10/2]</screen></para>
+ <para>If the leading
+ <literal>
+ <replaceable>fsname</replaceable>
+ </literal> and/or ending
+ <literal>_UUID</literal> are missing, they are automatically added.</para>
+ <para>For example, to add even-numbered OSTs to
+ <literal>pool1</literal> on file system
+ <literal>lustre</literal>, run a single command
+ (<literal>pool_add</literal>) to add many OSTs to the pool at one
+ time:</para>
+ <para>
+ <screen>
+lctl pool_add lustre.pool1 OST[0-10/2]
+</screen>
+ </para>
<note>
- <para>Each time an OST is added to a pool, a new <literal>llog</literal> configuration record is created. For convenience, you can run a single command.</para>
+ <para>Each time an OST is added to a pool, a new
+ <literal>llog</literal> configuration record is created. For
+ convenience, you can run a single command.</para>
</note>
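The `OST[0-10/2]` range in the example above expands to the even indices 0 through 10. `seq` can be used to preview such a start-end/step expansion (a stand-in for illustration; `lctl` performs the expansion internally):

```shell
# Preview what the index range 0-10/2 (start-end/step) expands to.
seq -f 'OST%04g' 0 2 10
```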
<para>To remove a named OST from a pool, run:</para>
- <screen>mgs# lctl pool_remove <replaceable>fsname</replaceable>.<replaceable>poolname</replaceable> <replaceable>ost_list</replaceable></screen>
+ <screen>
+mgs# lctl pool_remove
+<replaceable>fsname</replaceable>.
+<replaceable>poolname</replaceable>
+<replaceable>ost_list</replaceable>
+</screen>
<para>To destroy a pool, run:</para>
- <screen>mgs# lctl pool_destroy <replaceable>fsname</replaceable>.<replaceable>poolname</replaceable></screen>
+ <screen>
+mgs# lctl pool_destroy
+<replaceable>fsname</replaceable>.
+<replaceable>poolname</replaceable>
+</screen>
<note>
- <para>All OSTs must be removed from a pool before it can be destroyed.</para>
+ <para>All OSTs must be removed from a pool before it can be
+ destroyed.</para>
</note>
<para>To list pools in the named file system, run:</para>
- <screen>mgs# lctl pool_list <replaceable>fsname|pathname</replaceable></screen>
+ <screen>
+mgs# lctl pool_list
+<replaceable>fsname|pathname</replaceable>
+</screen>
<para>To list OSTs in a named pool, run:</para>
- <screen>lctl pool_list <replaceable>fsname</replaceable>.<replaceable>poolname</replaceable></screen>
+ <screen>
+lctl pool_list
+<replaceable>fsname</replaceable>.
+<replaceable>poolname</replaceable>
+</screen>
<section remap="h4">
<title>Using the lfs Command with OST Pools</title>
- <para>Several lfs commands can be run with OST pools. Use the <literal>lfs setstripe</literal> command to associate a directory with an OST pool. This causes all new regular files and directories in the directory to be created in the pool. The lfs command can be used to list pools in a file system and OSTs in a named pool.</para>
- <para>To associate a directory with a pool, so all new files and directories will be created in the pool, run:</para>
- <screen>client# lfs setstripe --pool|-p pool_name <replaceable>filename|dirname</replaceable> </screen>
+ <para>Several
+ <literal>lfs</literal> commands can be run with OST pools. Use the
+ <literal>lfs setstripe</literal> command to associate a directory with
+ an OST pool. This causes all new regular files and directories in the
+ directory to be created in the pool. The
+ <literal>lfs</literal> command can be used to list pools in a file
+ system and OSTs in a named pool.</para>
+ <para>To associate a directory with a pool, so all new files and
+ directories will be created in the pool, run:</para>
+ <screen>
+client# lfs setstripe --pool|-p pool_name <replaceable>filename|dirname</replaceable>
+</screen>
<para>To set striping patterns, run:</para>
- <screen>client# lfs setstripe [--size|-s stripe_size] [--offset|-o start_ost]
+ <screen>
+client# lfs setstripe [--size|-s stripe_size] [--offset|-o start_ost]
[--count|-c stripe_count] [--pool|-p pool_name]
- <replaceable>dir|filename</replaceable></screen>
+<replaceable>dir|filename</replaceable>
+</screen>
<note>
- <para>If you specify striping with an invalid pool name, because the pool does not exist or the pool name was mistyped, <literal>lfs setstripe</literal> returns an error. Run <literal>lfs pool_list</literal> to make sure the pool exists and the pool name is entered correctly.</para>
+ <para>If you specify striping with an invalid pool name, because the
+ pool does not exist or the pool name was mistyped,
+ <literal>lfs setstripe</literal> returns an error. Run
+ <literal>lfs pool_list</literal> to make sure the pool exists and the
+ pool name is entered correctly.</para>
</note>
<note>
- <para>The <literal>--pool</literal> option for lfs setstripe is compatible with other modifiers. For example, you can set striping on a directory to use an explicit starting index.</para>
+ <para>The
+ <literal>--pool</literal> option for
+ <literal>lfs setstripe</literal> is compatible with other modifiers.
+ For example, you can set striping on a directory to use an explicit
+ starting index.</para>
</note>
</section>
</section>
<section remap="h3">
- <title><indexterm><primary>pools</primary><secondary>usage tips</secondary></indexterm>Tips for Using OST Pools</title>
+ <title>
+ <indexterm>
+ <primary>pools</primary>
+ <secondary>usage tips</secondary>
+ </indexterm>Tips for Using OST Pools</title>
<para>Here are several suggestions for using OST pools.</para>
<itemizedlist>
<listitem>
- <para>A directory or file can be given an extended attribute (EA), that restricts striping to a pool.</para>
+ <para>A directory or file can be given an extended attribute (EA)
+ that restricts striping to a pool.</para>
</listitem>
<listitem>
- <para>Pools can be used to group OSTs with the same technology or performance (slower or faster), or that are preferred for certain jobs. Examples are SATA OSTs versus SAS OSTs or remote OSTs versus local OSTs.</para>
+ <para>Pools can be used to group OSTs with the same technology or
+ performance (slower or faster), or that are preferred for certain
+ jobs. Examples are SATA OSTs versus SAS OSTs or remote OSTs versus
+ local OSTs.</para>
</listitem>
<listitem>
- <para>A file created in an OST pool tracks the pool by keeping the pool name in the file LOV EA.</para>
+ <para>A file created in an OST pool tracks the pool by keeping the
+ pool name in the file LOV EA.</para>
</listitem>
</itemizedlist>
</section>
</section>
<section xml:id="dbdoclet.50438211_11204">
- <title><indexterm><primary>I/O</primary><secondary>adding an OST</secondary></indexterm>Adding an OST to a Lustre File System</title>
+ <title>
+ <indexterm>
+ <primary>I/O</primary>
+ <secondary>adding an OST</secondary>
+ </indexterm>Adding an OST to a Lustre File System</title>
- <para>To add an OST to existing Lustre file system:</para>
+ <para>To add an OST to an existing Lustre file system:</para>
<orderedlist>
<listitem>
- <para>Add a new OST by passing on the following commands, run:</para>
+ <para>Add a new OST by running the following commands:</para>
- <screen>oss# mkfs.lustre --fsname=spfs --mgsnode=mds16@tcp0 --ost --index=12 /dev/sda
+ <screen>
+oss# mkfs.lustre --fsname=spfs --mgsnode=mds16@tcp0 --ost --index=12 /dev/sda
oss# mkdir -p /mnt/test/ost12
-oss# mount -t lustre /dev/sda /mnt/test/ost12</screen>
+oss# mount -t lustre /dev/sda /mnt/test/ost12
+</screen>
</listitem>
<listitem>
- <para>Migrate the data (possibly).</para>
+ <para>Migrate the data (if necessary).</para>
- <para>The file system is quite unbalanced when new empty OSTs are added. New file creations are automatically balanced. If this is a scratch file system or files are pruned at a regular interval, then no further work may be needed. Files existing prior to the expansion can be rebalanced with an in-place copy, which can be done with a simple script.</para>
- <para>The basic method is to copy existing files to a temporary file, then move the temp file over the old one. This should not be attempted with files which are currently being written to by users or applications. This operation redistributes the stripes over the entire set of OSTs.</para>
+ <para>The file system is quite unbalanced when new empty OSTs are
+ added. New file creations are automatically balanced. If this is a
+ scratch file system or files are pruned at a regular interval, then no
+ further work may be needed. Files existing prior to the expansion can
+ be rebalanced with an in-place copy, which can be done with a simple
+ script.</para>
+ <para>The basic method is to copy existing files to a temporary file,
+ then move the temp file over the old one. This should not be attempted
+ with files which are currently being written to by users or
+ applications. This operation redistributes the stripes over the entire
+ set of OSTs.</para>
<para>A very clever migration script would do the following:</para>
<itemizedlist>
<listitem>
- <para> Examine the current distribution of data.</para>
+ <para>Examine the current distribution of data.</para>
</listitem>
<listitem>
- <para> Calculate how much data should move from each full OST to the empty ones.</para>
+ <para>Calculate how much data should move from each full OST to the
+ empty ones.</para>
</listitem>
<listitem>
- <para> Search for files on a given full OST (using <literal>lfs getstripe</literal>).</para>
+ <para>Search for files on a given full OST (using
+ <literal>lfs getstripe</literal>).</para>
</listitem>
<listitem>
- <para> Force the new destination OST (using <literal>lfs setstripe</literal>).</para>
+ <para>Force the new destination OST (using
+ <literal>lfs setstripe</literal>).</para>
</listitem>
<listitem>
- <para> Copy only enough files to address the imbalance.</para>
+ <para>Copy only enough files to address the imbalance.</para>
</listitem>
</itemizedlist>
</listitem>
</orderedlist>
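The steps above can be sketched as a short script. This is a hypothetical sketch, not a supported tool: the mount point, OST index, and the stand-in file list are assumptions, and with `DRY_RUN=1` (the default here) it only prints the commands it would run instead of moving any data.

```shell
#!/bin/sh
# Hypothetical sketch of the rebalance steps above; not a supported tool.
# MNT and FULL_OST are example values. With DRY_RUN=1 (the default) the
# script only prints the commands it would run and uses a stand-in file
# list, so it can be exercised without a Lustre file system.
MNT=${MNT:-/mnt/testfs}      # assumed client mount point
FULL_OST=${FULL_OST:-0}      # index of the full OST to drain
DRY_RUN=${DRY_RUN:-1}

run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

# Find files with objects on the full OST (lfs find --ost; the
# lfs getstripe command mentioned above can verify individual files).
list_files() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$MNT/file-a" "$MNT/file-b"   # stand-in list
    else
        lfs find "$MNT" --ost "$FULL_OST"
    fi
}

list_files | while read -r f; do
    tmp="$f.migrate.$$"
    # Recreate the file; new objects are balanced toward emptier OSTs.
    run lfs setstripe -c 1 "$tmp"
    run cp "$f" "$tmp"      # copy the data to the new objects
    run mv "$tmp" "$f"      # replace the original in place
done
```

On a real file system, set `DRY_RUN=0`, never run it against files that are being written, and stop once the imbalance is addressed rather than migrating everything.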
- <para>If a Lustre file system administrator wants to explore this approach further, per-OST
- disk-usage statistics can be found under
- <literal>/proc/fs/lustre/osc/*/rpc_stats</literal></para>
+ <para>If a Lustre file system administrator wants to explore this approach
+ further, per-OST disk-usage statistics can be found under
+    <literal>/proc/fs/lustre/osc/*/rpc_stats</literal>.</para>
</section>
<section xml:id="dbdoclet.50438211_80295">
- <title><indexterm><primary>I/O</primary><secondary>direct</secondary></indexterm>Performing Direct I/O</title>
- <para>The Lustre software supports the <literal>O_DIRECT</literal> flag to open.</para>
- <para>Applications using the <literal>read()</literal> and <literal>write()</literal> calls must supply buffers aligned on a page boundary (usually 4 K). If the alignment is not correct, the call returns <literal>-EINVAL</literal>. Direct I/O may help performance in cases where the client is doing a large amount of I/O and is CPU-bound (CPU utilization 100%).</para>
+ <title>
+ <indexterm>
+ <primary>I/O</primary>
+ <secondary>direct</secondary>
+ </indexterm>Performing Direct I/O</title>
+    <para>The Lustre software supports the
+    <literal>O_DIRECT</literal> flag to <literal>open()</literal>.</para>
+ <para>Applications using the
+ <literal>read()</literal> and
+ <literal>write()</literal> calls must supply buffers aligned on a page
+ boundary (usually 4 K). If the alignment is not correct, the call returns
+ <literal>-EINVAL</literal>. Direct I/O may help performance in cases where
+ the client is doing a large amount of I/O and is CPU-bound (CPU utilization
+ 100%).</para>
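The alignment requirement can be demonstrated locally with `dd`, whose `oflag=direct` opens the output file with `O_DIRECT`. This is only a sketch under the assumption that the local file system supports direct I/O (ext4 does; tmpfs may not); on a Lustre client the same rule applies with page-sized buffers.

```shell
# Demonstration of the O_DIRECT alignment rule using dd (oflag=direct
# opens the output file O_DIRECT). The file name is an example; the
# guard skips the demo where the file system lacks O_DIRECT support.
f=./direct_io_demo.dat
if dd if=/dev/zero of="$f" bs=4096 count=4 oflag=direct 2>/dev/null; then
    # A transfer size that is not a multiple of the device block size
    # is rejected with EINVAL, as described above.
    if dd if=/dev/zero of="$f" bs=1000 count=1 oflag=direct 2>/dev/null; then
        echo "unexpected: misaligned direct write succeeded"
    else
        echo "misaligned direct write rejected (EINVAL)"
    fi
else
    echo "O_DIRECT unsupported here; demonstration skipped"
fi
rm -f "$f"
```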
<section remap="h3">
<title>Making File System Objects Immutable</title>
- <para>An immutable file or directory is one that cannot be modified, renamed or removed. To do this:</para>
- <screen>chattr +i <replaceable>file</replaceable></screen>
- <para>To remove this flag, use <literal>chattr -i</literal></para>
+ <para>An immutable file or directory is one that cannot be modified,
+ renamed or removed. To do this:</para>
+      <screen>
+chattr +i <replaceable>file</replaceable>
+</screen>
+ <para>To remove this flag, use
+      <literal>chattr -i</literal>.</para>
</section>
</section>
<section xml:id="dbdoclet.50438211_61024">
<title>Other I/O Options</title>
- <para>This section describes other I/O options, including checksums, and the ptlrpcd thread pool.</para>
+ <para>This section describes other I/O options, including checksums, and
+ the ptlrpcd thread pool.</para>
<section remap="h3">
<title>Lustre Checksums</title>
- <para>To guard against network data corruption, a Lustre client can perform two types of data checksums: in-memory (for data in client memory) and wire (for data sent over the network). For each checksum type, a 32-bit checksum of the data read or written on both the client and server is computed, to ensure that the data has not been corrupted in transit over the network. The <literal>ldiskfs</literal> backing file system does NOT do any persistent checksumming, so it does not detect corruption of data in the OST file system.</para>
- <para>The checksumming feature is enabled, by default, on individual client nodes. If the client or OST detects a checksum mismatch, then an error is logged in the syslog of the form:</para>
- <screen>LustreError: BAD WRITE CHECKSUM: changed in transit before arrival at OST: \
+ <para>To guard against network data corruption, a Lustre client can
+ perform two types of data checksums: in-memory (for data in client
+ memory) and wire (for data sent over the network). For each checksum
+ type, a 32-bit checksum of the data read or written on both the client
+ and server is computed, to ensure that the data has not been corrupted in
+ transit over the network. The
+ <literal>ldiskfs</literal> backing file system does NOT do any persistent
+ checksumming, so it does not detect corruption of data in the OST file
+ system.</para>
+ <para>The checksumming feature is enabled, by default, on individual
+ client nodes. If the client or OST detects a checksum mismatch, then an
+ error is logged in the syslog of the form:</para>
+ <screen>
+LustreError: BAD WRITE CHECKSUM: changed in transit before arrival at OST: \
from 192.168.1.1@tcp inum 8991479/2386814769 object 1127239/0 extent [10240\
-0-106495]</screen>
- <para>If this happens, the client will re-read or re-write the affected data up to five times to get a good copy of the data over the network. If it is still not possible, then an I/O error is returned to the application.</para>
+0-106495]
+</screen>
+ <para>If this happens, the client will re-read or re-write the affected
+ data up to five times to get a good copy of the data over the network. If
+ it is still not possible, then an I/O error is returned to the
+ application.</para>
<para>To enable both types of checksums (in-memory and wire), run:</para>
- <screen>lctl set_param llite.*.checksum_pages=1</screen>
- <para>To disable both types of checksums (in-memory and wire), run:</para>
- <screen>lctl set_param llite.*.checksum_pages=0</screen>
+ <screen>
+lctl set_param llite.*.checksum_pages=1
+</screen>
+ <para>To disable both types of checksums (in-memory and wire),
+ run:</para>
+ <screen>
+lctl set_param llite.*.checksum_pages=0
+</screen>
<para>To check the status of a wire checksum, run:</para>
- <screen>lctl get_param osc.*.checksums</screen>
+ <screen>
+lctl get_param osc.*.checksums
+</screen>
<section remap="h4">
<title>Changing Checksum Algorithms</title>
- <para>By default, the Lustre software uses the adler32 checksum algorithm, because it is
- robust and has a lower impact on performance than crc32. The Lustre file system
- administrator can change the checksum algorithm via <literal>lctl get_param</literal>,
- depending on what is supported in the kernel.</para>
- <para>To check which checksum algorithm is being used by the Lustre software, run:</para>
- <screen>$ lctl get_param osc.*.checksum_type</screen>
+ <para>By default, the Lustre software uses the adler32 checksum
+ algorithm, because it is robust and has a lower impact on performance
+ than crc32. The Lustre file system administrator can change the
+ checksum algorithm via
+        <literal>lctl set_param</literal>, depending on what is supported in
+ the kernel.</para>
+ <para>To check which checksum algorithm is being used by the Lustre
+ software, run:</para>
+ <screen>
+$ lctl get_param osc.*.checksum_type
+</screen>
<para>To change the wire checksum algorithm, run:</para>
- <screen>$ lctl set_param osc.*.checksum_type=<replaceable>algorithm</replaceable></screen>
+        <screen>
+$ lctl set_param osc.*.checksum_type=<replaceable>algorithm</replaceable>
+</screen>
<note>
- <para>The in-memory checksum always uses the adler32 algorithm, if available, and only falls back to crc32 if adler32 cannot be used.</para>
+ <para>The in-memory checksum always uses the adler32 algorithm, if
+ available, and only falls back to crc32 if adler32 cannot be
+ used.</para>
</note>
- <para>In the following example, the <literal>lctl get_param</literal> command is used to
- determine that the Lustre software is using the adler32 checksum algorithm. Then the
- <literal>lctl set_param</literal> command is used to change the checksum algorithm to
- crc32. A second <literal>lctl get_param</literal> command confirms that the crc32 checksum
- algorithm is now in use.</para>
- <screen>$ lctl get_param osc.*.checksum_type
+ <para>In the following example, the
+ <literal>lctl get_param</literal> command is used to determine that the
+ Lustre software is using the adler32 checksum algorithm. Then the
+ <literal>lctl set_param</literal> command is used to change the checksum
+ algorithm to crc32. A second
+ <literal>lctl get_param</literal> command confirms that the crc32
+ checksum algorithm is now in use.</para>
+ <screen>
+$ lctl get_param osc.*.checksum_type
osc.lustre-OST0000-osc-ffff81012b2c48e0.checksum_type=crc32 [adler]
$ lctl set_param osc.*.checksum_type=crc32
osc.lustre-OST0000-osc-ffff81012b2c48e0.checksum_type=crc32
$ lctl get_param osc.*.checksum_type
-osc.lustre-OST0000-osc-ffff81012b2c48e0.checksum_type=[crc32] adler</screen>
+osc.lustre-OST0000-osc-ffff81012b2c48e0.checksum_type=[crc32] adler
+</screen>
</section>
-</section>
-<section remap="h3">
-<title>Ptlrpc Thread Pool </title>
-<para>Releases prior to Lustre software release 2.2 used two portal RPC daemons for each
- client/server pair. One daemon handled all synchronous IO requests, and the second daemon
- handled all asynchronous (non-IO) RPCs. The increasing use of large SMP nodes for Lustre
- servers exposed some scaling issues. The lack of threads for large SMP nodes resulted in
- cases where a single CPU would be 100% utilized and other CPUs would be relativity idle.
- This is especially noticeable when a single client traverses a large directory. </para>
-<para>Lustre software release 2.2.x implements a ptlrpc thread pool, so that multiple threads can be
- created to serve asynchronous RPC requests. The number of threads spawned is controlled at
- module load time using module options. By default one thread is spawned per CPU, with a
- minimum of 2 threads spawned irrespective of module options. </para>
-<para>One of the issues with thread operations is the cost of moving a thread context from one CPU to another with the resulting loss of CPU cache warmth. To reduce this cost, ptlrpc threads can be bound to a CPU. However, if the CPUs are busy, a bound thread may not be able to respond quickly, as the bound CPU may be busy with other tasks and the thread must wait to schedule. </para>
- <para>Because of these considerations, the pool of ptlrpc threads can be a mixture of bound and unbound threads. The system operator can balance the thread mixture based on system size and workload. </para>
-<section>
+ </section>
+ <section remap="h3">
+ <title>Ptlrpc Thread Pool</title>
+ <para>Releases prior to Lustre software release 2.2 used two portal RPC
+ daemons for each client/server pair. One daemon handled all synchronous
+ IO requests, and the second daemon handled all asynchronous (non-IO)
+ RPCs. The increasing use of large SMP nodes for Lustre servers exposed
+ some scaling issues. The lack of threads for large SMP nodes resulted in
+ cases where a single CPU would be 100% utilized and other CPUs would be
+      relatively idle. This is especially noticeable when a single client
+ traverses a large directory.</para>
+ <para>Lustre software release 2.2.x implements a ptlrpc thread pool, so
+ that multiple threads can be created to serve asynchronous RPC requests.
+ The number of threads spawned is controlled at module load time using
+ module options. By default one thread is spawned per CPU, with a minimum
+ of 2 threads spawned irrespective of module options.</para>
+ <para>One of the issues with thread operations is the cost of moving a
+ thread context from one CPU to another with the resulting loss of CPU
+ cache warmth. To reduce this cost, ptlrpc threads can be bound to a CPU.
+ However, if the CPUs are busy, a bound thread may not be able to respond
+ quickly, as the bound CPU may be busy with other tasks and the thread
+ must wait to schedule.</para>
+ <para>Because of these considerations, the pool of ptlrpc threads can be
+ a mixture of bound and unbound threads. The system operator can balance
+ the thread mixture based on system size and workload.</para>
+ <section>
<title>ptlrpcd parameters</title>
- <para>These parameters should be set in <literal>/etc/modprobe.conf </literal> or in the <literal> etc/modprobe.d</literal> directory, as options for the ptlrpc module. <screen>options ptlrpcd max_ptlrpcds=XXX
+ <para>These parameters should be set in
+ <literal>/etc/modprobe.conf</literal> or in the
+        <literal>/etc/modprobe.d</literal> directory, as options for the ptlrpc
+ module.
+ <screen>
+options ptlrpcd max_ptlrpcds=XXX
</screen></para>
-<para>Sets the number of ptlrpcd threads created at module load time. The default if not specified is one thread per CPU, including hyper-threaded CPUs. The lower bound is 2 (old prlrpcd behaviour) <screen>options ptlrpcd ptlrpcd_bind_policy=[1-4]
+ <para>Sets the number of ptlrpcd threads created at module load time.
+ The default if not specified is one thread per CPU, including
+        hyper-threaded CPUs. The lower bound is 2 (old ptlrpcd behaviour).
+ <screen>
+options ptlrpcd ptlrpcd_bind_policy=[1-4]
</screen></para>
-<para> Controls the binding of threads to CPUs. There are four policy options. </para>
- <itemizedlist>
- <listitem>
- <para><literal role="bold">PDB_POLICY_NONE </literal>(ptlrpcd_bind_policy=1) All threads are unbound.</para>
- </listitem>
- <listitem>
- <para><literal role="bold">PDB_POLICY_FULL </literal>(ptlrpcd_bind_policy=2) All threads attempt to bind to a CPU. </para>
- </listitem>
- <listitem>
- <para><literal role="bold">PDB_POLICY_PAIR </literal>(ptlrpcd_bind_policy=3) This is the default policy. Threads are allocated as a bound/unbound pair. Each thread (bound or free) has a partner thread. The partnering is used by the ptlrpcd load policy, which determines how threads are allocated to CPUs. </para>
- </listitem>
- <listitem>
- <para><literal role="bold">PDB_POLICY_NEIGHBOR </literal>(ptlrpcd_bind_policy=4) Threads are allocated as a bound/unbound pair. Each thread (bound or free) has two partner threads. </para>
- </listitem>
- </itemizedlist>
-
-</section>
-</section>
+ <para>Controls the binding of threads to CPUs. There are four policy
+ options.</para>
+ <itemizedlist>
+ <listitem>
+ <para>
+            <literal role="bold">PDB_POLICY_NONE</literal>
+            (ptlrpcd_bind_policy=1) All threads are unbound.</para>
+ </listitem>
+ <listitem>
+ <para>
+            <literal role="bold">PDB_POLICY_FULL</literal>
+            (ptlrpcd_bind_policy=2) All threads attempt to bind to a
+            CPU.</para>
+ </listitem>
+ <listitem>
+ <para>
+            <literal role="bold">PDB_POLICY_PAIR</literal>
+            (ptlrpcd_bind_policy=3) This is the default policy. Threads are
+            allocated as a bound/unbound pair. Each thread (bound or free) has
+            a partner thread. The partnering is used by the ptlrpcd load
+            policy, which determines how threads are allocated to CPUs.</para>
+ </listitem>
+ <listitem>
+ <para>
+            <literal role="bold">PDB_POLICY_NEIGHBOR</literal>
+            (ptlrpcd_bind_policy=4) Threads are allocated as a bound/unbound
+            pair. Each thread (bound or free) has two partner threads.</para>
+ </listitem>
+ </itemizedlist>
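Putting the two module options together, a complete (hypothetical) configuration fragment might look like this; the values 8 and 3 are examples only and should be tuned to node size and workload:

```
# Example /etc/modprobe.d/lustre.conf fragment (values are examples):
# 8 ptlrpcd threads, allocated as bound/unbound pairs
# (PDB_POLICY_PAIR, the default policy).
options ptlrpcd max_ptlrpcds=8 ptlrpcd_bind_policy=3
```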
+ </section>
+ </section>
</section>
</chapter>
-<?xml version='1.0' encoding='UTF-8'?><chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="understandingfailover">
- <title xml:id="understandingfailover.title">Understanding Failover in a Lustre File System</title>
- <para>This chapter describes failover in a Lustre file system. It includes:</para>
+<?xml version='1.0' encoding='utf-8'?>
+<chapter xmlns="http://docbook.org/ns/docbook"
+xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
+xml:id="understandingfailover">
+ <title xml:id="understandingfailover.title">Understanding Failover in a
+ Lustre File System</title>
+ <para>This chapter describes failover in a Lustre file system. It
+ includes:</para>
<itemizedlist>
<listitem>
<para>
- <xref linkend="dbdoclet.50540653_59957"/>
- </para>
+ <xref linkend="dbdoclet.50540653_59957" />
+ </para>
</listitem>
<listitem>
<para>
- <xref linkend="dbdoclet.50540653_97944"/>
- </para>
+ <xref linkend="dbdoclet.50540653_97944" />
+ </para>
</listitem>
</itemizedlist>
<section xml:id="dbdoclet.50540653_59957">
- <title><indexterm><primary>failover</primary></indexterm>
- What is Failover?</title>
- <para>In a high-availability (HA) system, unscheduled downtime is minimized by using redundant
- hardware and software components and software components that automate recovery when a failure
- occurs. If a failure condition occurs, such as the loss of a server or storage device or a
- network or software fault, the system's services continue with minimal interruption.
- Generally, availability is specified as the percentage of time the system is required to be
- available.</para>
- <para>Availability is accomplished by replicating hardware and/or software so that when a
- primary server fails or is unavailable, a standby server can be switched into its place to run
- applications and associated resources. This process, called <emphasis role="italic"
- >failover</emphasis>, is automatic in an HA system and, in most cases, completely
- application-transparent.</para>
- <para>A failover hardware setup requires a pair of servers with a shared resource (typically a
- physical storage device, which may be based on SAN, NAS, hardware RAID, SCSI or Fibre Channel
- (FC) technology). The method of sharing storage should be essentially transparent at the
- device level; the same physical logical unit number (LUN) should be visible from both servers.
- To ensure high availability at the physical storage level, we encourage the use of RAID arrays
- to protect against drive-level failures.</para>
+ <title>
+ <indexterm>
+ <primary>failover</primary>
+ </indexterm>What is Failover?</title>
+  <para>In a high-availability (HA) system, unscheduled downtime is minimized
+  by using redundant hardware components and software that automates
+  recovery when a failure occurs. If a failure condition
+ occurs, such as the loss of a server or storage device or a network or
+ software fault, the system's services continue with minimal interruption.
+ Generally, availability is specified as the percentage of time the system
+ is required to be available.</para>
+ <para>Availability is accomplished by replicating hardware and/or software
+ so that when a primary server fails or is unavailable, a standby server can
+ be switched into its place to run applications and associated resources.
+ This process, called
+ <emphasis role="italic">failover</emphasis>, is automatic in an HA system
+ and, in most cases, completely application-transparent.</para>
+ <para>A failover hardware setup requires a pair of servers with a shared
+ resource (typically a physical storage device, which may be based on SAN,
+ NAS, hardware RAID, SCSI or Fibre Channel (FC) technology). The method of
+ sharing storage should be essentially transparent at the device level; the
+ same physical logical unit number (LUN) should be visible from both
+ servers. To ensure high availability at the physical storage level, we
+ encourage the use of RAID arrays to protect against drive-level
+ failures.</para>
<note>
- <para>The Lustre software does not provide redundancy for data; it depends exclusively on
- redundancy of backing storage devices. The backing OST storage should be RAID 5 or,
- preferably, RAID 6 storage. MDT storage should be RAID 1 or RAID 10.</para>
+ <para>The Lustre software does not provide redundancy for data; it
+ depends exclusively on redundancy of backing storage devices. The backing
+ OST storage should be RAID 5 or, preferably, RAID 6 storage. MDT storage
+ should be RAID 1 or RAID 10.</para>
</note>
<section remap="h3">
- <title><indexterm><primary>failover</primary><secondary>capabilities</secondary></indexterm>Failover Capabilities</title>
- <para>To establish a highly-available Lustre file system, power management software or hardware and high availability (HA) software are used to provide the following failover capabilities:</para>
+ <title>
+ <indexterm>
+ <primary>failover</primary>
+ <secondary>capabilities</secondary>
+ </indexterm>Failover Capabilities</title>
+ <para>To establish a highly-available Lustre file system, power
+ management software or hardware and high availability (HA) software are
+ used to provide the following failover capabilities:</para>
<itemizedlist>
<listitem>
- <para><emphasis role="bold">Resource fencing</emphasis> - Protects physical storage from simultaneous access by two nodes.</para>
+ <para>
+          <emphasis role="bold">Resource fencing</emphasis> - Protects physical
+ storage from simultaneous access by two nodes.</para>
</listitem>
<listitem>
- <para><emphasis role="bold">Resource management</emphasis> - Starts and stops the Lustre resources as a part of failover, maintains the cluster state, and carries out other resource management tasks.</para>
+ <para>
+          <emphasis role="bold">Resource management</emphasis> - Starts and
+ stops the Lustre resources as a part of failover, maintains the
+ cluster state, and carries out other resource management
+ tasks.</para>
</listitem>
<listitem>
- <para><emphasis role="bold">Health monitoring</emphasis> - Verifies the availability of
- hardware and network resources and responds to health indications provided by the Lustre
- software.</para>
+ <para>
+          <emphasis role="bold">Health monitoring</emphasis> - Verifies the
+ availability of hardware and network resources and responds to health
+ indications provided by the Lustre software.</para>
</listitem>
</itemizedlist>
- <para>These capabilities can be provided by a variety of software and/or hardware solutions.
- For more information about using power management software or hardware and high availability
- (HA) software with a Lustre file system, see <xref linkend="configuringfailover"/>.</para>
- <para>HA software is responsible for detecting failure of the primary Lustre server node and
- controlling the failover.The Lustre software works with any HA software that includes
- resource (I/O) fencing. For proper resource fencing, the HA software must be able to
- completely power off the failed server or disconnect it from the shared storage device. If
- two active nodes have access to the same storage device, data may be severely
- corrupted.</para>
+ <para>These capabilities can be provided by a variety of software and/or
+ hardware solutions. For more information about using power management
+ software or hardware and high availability (HA) software with a Lustre
+ file system, see
+ <xref linkend="configuringfailover" />.</para>
+ <para>HA software is responsible for detecting failure of the primary
+      Lustre server node and controlling the failover. The Lustre software
+      works
+ with any HA software that includes resource (I/O) fencing. For proper
+ resource fencing, the HA software must be able to completely power off
+ the failed server or disconnect it from the shared storage device. If two
+ active nodes have access to the same storage device, data may be severely
+ corrupted.</para>
</section>
<section remap="h3">
- <title><indexterm><primary>failover</primary><secondary>configuration</secondary></indexterm>Types of Failover Configurations</title>
- <para>Nodes in a cluster can be configured for failover in several ways. They are often configured in pairs (for example, two OSTs attached to a shared storage device), but other failover configurations are also possible. Failover configurations include:</para>
+ <title>
+ <indexterm>
+ <primary>failover</primary>
+ <secondary>configuration</secondary>
+ </indexterm>Types of Failover Configurations</title>
+ <para>Nodes in a cluster can be configured for failover in several ways.
+ They are often configured in pairs (for example, two OSTs attached to a
+ shared storage device), but other failover configurations are also
+ possible. Failover configurations include:</para>
<itemizedlist>
<listitem>
- <para><emphasis role="bold">Active/passive</emphasis> pair - In this configuration, the active node provides resources and serves data, while the passive node is usually standing by idle. If the active node fails, the passive node takes over and becomes active.</para>
+ <para>
+          <emphasis role="bold">Active/passive</emphasis> pair - In this
+ configuration, the active node provides resources and serves data,
+ while the passive node is usually standing by idle. If the active
+ node fails, the passive node takes over and becomes active.</para>
</listitem>
<listitem>
- <para><emphasis role="bold">Active/active</emphasis> pair - In this configuration, both nodes are active, each providing a subset of resources. In case of a failure, the second node takes over resources from the failed node.</para>
+ <para>
+          <emphasis role="bold">Active/active</emphasis> pair - In this
+ configuration, both nodes are active, each providing a subset of
+ resources. In case of a failure, the second node takes over resources
+ from the failed node.</para>
</listitem>
</itemizedlist>
- <para>In Lustre software releases previous to Lustre software release 2.4, MDSs can be
- configured as an active/passive pair, while OSSs can be deployed in an active/active
- configuration that provides redundancy without extra overhead. Often the standby MDS is the
- active MDS for another Lustre file system or the MGS, so no nodes are idle in the
- cluster.</para>
- <para condition="l24">Lustre software release 2.4 introduces metadata targets for individual
- sub-directories. Active-active failover configurations are available for MDSs that serve
- MDTs on shared storage.</para>
+ <para>In Lustre software releases previous to Lustre software release
+ 2.4, MDSs can be configured as an active/passive pair, while OSSs can be
+ deployed in an active/active configuration that provides redundancy
+ without extra overhead. Often the standby MDS is the active MDS for
+ another Lustre file system or the MGS, so no nodes are idle in the
+ cluster.</para>
+ <para condition="l24">Lustre software release 2.4 introduces metadata
+ targets for individual sub-directories. Active-active failover
+ configurations are available for MDSs that serve MDTs on shared
+ storage.</para>
</section>
</section>
<section xml:id="dbdoclet.50540653_97944">
- <title><indexterm>
- <primary>failover</primary>
- <secondary>and Lustre</secondary>
- </indexterm>Failover Functionality in a Lustre File System</title>
- <para>The failover functionality provided by the Lustre software can be used for the following
- failover scenario. When a client attempts to do I/O to a failed Lustre target, it continues to
- try until it receives an answer from any of the configured failover nodes for the Lustre
- target. A user-space application does not detect anything unusual, except that the I/O may
- take longer to complete.</para>
- <para>Failover in a Lustre file system requires that two nodes be configured as a failover pair,
- which must share one or more storage devices. A Lustre file system can be configured to
- provide MDT or OST failover.</para>
+ <title>
+ <indexterm>
+ <primary>failover</primary>
+ <secondary>and Lustre</secondary>
+ </indexterm>Failover Functionality in a Lustre File System</title>
+ <para>The failover functionality provided by the Lustre software can be
+ used for the following failover scenario. When a client attempts to do I/O
+ to a failed Lustre target, it continues to try until it receives an answer
+ from any of the configured failover nodes for the Lustre target. A
+ user-space application does not detect anything unusual, except that the
+ I/O may take longer to complete.</para>
+ <para>Failover in a Lustre file system requires that two nodes be
+ configured as a failover pair, which must share one or more storage
+ devices. A Lustre file system can be configured to provide MDT or OST
+ failover.</para>
<itemizedlist>
<listitem>
- <para>For MDT failover, two MDSs can be configured to serve the same MDT. Only one MDS node
- can serve an MDT at a time.</para>
- <para condition="l24">Lustresoftware release 2.4 allows multiple MDTs. By placing two or
- more MDT partitions on storage shared by two MDSs, one MDS can fail and the remaining MDS
- can begin serving the unserved MDT. This is described as an active/active failover
- pair.</para>
+ <para>For MDT failover, two MDSs can be configured to serve the same
+ MDT. Only one MDS node can serve an MDT at a time.</para>
+      <para condition="l24">Lustre software release 2.4 allows multiple MDTs.
+ By placing two or more MDT partitions on storage shared by two MDSs,
+ one MDS can fail and the remaining MDS can begin serving the unserved
+ MDT. This is described as an active/active failover pair.</para>
</listitem>
<listitem>
- <para>For OST failover, multiple OSS nodes can be configured to be able to serve the same
- OST. However, only one OSS node can serve the OST at a time. An OST can be moved between
- OSS nodes that have access to the same storage device using
- <literal>umount/mount</literal> commands.</para>
+ <para>For OST failover, multiple OSS nodes can be configured to be able
+ to serve the same OST. However, only one OSS node can serve the OST at
+ a time. An OST can be moved between OSS nodes that have access to the
+ same storage device using
+ <literal>umount/mount</literal> commands.</para>
</listitem>
</itemizedlist>
- <para>The <literal>--servicenode</literal> option is used to set up nodes in a Lustre file
- system for failover at creation time (using <literal>mkfs.lustre</literal>) or later when the
- Lustre file system is active (using <literal>tunefs.lustre</literal>). For explanations of
- these utilities, see <xref linkend="dbdoclet.50438219_75432"/> and <xref
- linkend="dbdoclet.50438219_39574"/>.</para>
- <para>Failover capability in a Lustre file system can be used to upgrade the Lustre software
- between successive minor versions without cluster downtime. For more information, see <xref
- linkend="upgradinglustre"/>.</para>
- <para>For information about configuring failover, see <xref linkend="configuringfailover"/>.</para>
+ <para>The
+ <literal>--servicenode</literal> option is used to set up nodes in a Lustre
+ file system for failover at creation time (using
+ <literal>mkfs.lustre</literal>) or later when the Lustre file system is
+ active (using
+  <literal>tunefs.lustre</literal>). For explanations of these utilities,
+  see <xref linkend="dbdoclet.50438219_75432" /> and
+ <xref linkend="dbdoclet.50438219_39574" />.</para>
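The invocation form can be sketched as follows. The file system name, host names, and device are assumptions for illustration, and the commands are only echoed here, since `mkfs.lustre` and `tunefs.lustre` must be run on the actual server against real storage:

```shell
# Hypothetical sketch of --servicenode usage; fsname, host names and
# device are examples. The commands are echoed, not executed.
FSNAME=testfs
DEV=/dev/sdb

# At format time: declare both OSS nodes that can serve this OST.
echo mkfs.lustre --fsname=$FSNAME --ost --index=0 \
    --mgsnode=mgs@tcp0 \
    --servicenode=oss1@tcp0 --servicenode=oss2@tcp0 $DEV

# Later, on an existing target: add a second service node.
echo tunefs.lustre --servicenode=oss2@tcp0 $DEV
```

Note that `--servicenode` is given once per node that may serve the target; at mount time the node actually mounting the storage becomes the active server.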
+ <para>Failover capability in a Lustre file system can be used to upgrade
+ the Lustre software between successive minor versions without cluster
+ downtime. For more information, see
+ <xref linkend="upgradinglustre" />.</para>
+ <para>For information about configuring failover, see
+ <xref linkend="configuringfailover" />.</para>
<note>
- <para>The Lustre software provides failover functionality only at the file system level. In a
- complete failover solution, failover functionality for system-level components, such as node
- failure detection or power control, must be provided by a third-party tool.</para>
+ <para>The Lustre software provides failover functionality only at the
+ file system level. In a complete failover solution, failover
+ functionality for system-level components, such as node failure detection
+ or power control, must be provided by a third-party tool.</para>
</note>
<caution>
- <para>OST failover functionality does not protect against corruption caused by a disk failure.
- If the storage media (i.e., physical disk) used for an OST fails, it cannot be recovered by
- functionality provided in the Lustre software. We strongly recommend that some form of RAID
- be used for OSTs. Lustre functionality assumes that the storage is reliable, so it adds no
- extra reliability features.</para>
+ <para>OST failover functionality does not protect against corruption
+ caused by a disk failure. If the storage media (i.e., physical disk) used
+ for an OST fails, it cannot be recovered by functionality provided in the
+ Lustre software. We strongly recommend that some form of RAID be used for
+ OSTs. Lustre functionality assumes that the storage is reliable, so it
+ adds no extra reliability features.</para>
</caution>
<section remap="h3">
- <title><indexterm><primary>failover</primary><secondary>MDT</secondary></indexterm>MDT Failover Configuration (Active/Passive)</title>
- <para>Two MDSs are typically configured as an active/passive failover pair as shown in <xref linkend="understandingfailover.fig.configmdt"/>. Note that both nodes must have access to shared storage for the MDT(s) and the MGS. The primary (active) MDS manages the Lustre system metadata resources. If the primary MDS fails, the secondary (passive) MDS takes over these resources and serves the MDTs and the MGS.</para>
+ <title>
+ <indexterm>
+ <primary>failover</primary>
+ <secondary>MDT</secondary>
+ </indexterm>MDT Failover Configuration (Active/Passive)</title>
+ <para>Two MDSs are typically configured as an active/passive failover
+ pair as shown in
+ <xref linkend="understandingfailover.fig.configmdt" />. Note that both
+ nodes must have access to shared storage for the MDT(s) and the MGS. The
+ primary (active) MDS manages the Lustre system metadata resources. If the
+ primary MDS fails, the secondary (passive) MDS takes over these resources
+ and serves the MDTs and the MGS.</para>
<note>
- <para>In an environment with multiple file systems, the MDSs can be configured in a quasi active/active configuration, with each MDS managing metadata for a subset of the Lustre file system.</para>
+ <para>In an environment with multiple file systems, the MDSs can be
+ configured in a quasi active/active configuration, with each MDS
+ managing metadata for a subset of the Lustre file system.</para>
</note>
<figure>
- <title xml:id="understandingfailover.fig.configmdt"> Lustre failover configuration for a active/passive MDT</title>
+ <title xml:id="understandingfailover.fig.configmdt">Lustre failover
+ configuration for an active/passive MDT</title>
<mediaobject>
<imageobject>
- <imagedata fileref="./figures/MDT_Failover.png"/>
+ <imagedata fileref="./figures/MDT_Failover.png" />
</imageobject>
<textobject>
- <phrase>Lustre failover configuration for an MDT </phrase>
+ <phrase>Lustre failover configuration for an MDT</phrase>
</textobject>
</mediaobject>
</figure>
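In such a pair, takeover amounts to mounting the shared MDT on the surviving node. A minimal sketch, assuming a shared device visible to both MDSs (the device and mount point names are hypothetical):

```shell
# Normal operation: the primary MDS serves the combined MGS/MDT.
[root@mds1]# mount -t lustre /dev/mdtdev /mnt/mdt

# After mds1 fails, mds2 (which also has access to /dev/mdtdev on the
# shared storage) mounts the same target and takes over serving the
# MDT and the MGS:
[root@mds2]# mount -t lustre /dev/mdtdev /mnt/mdt
```

The target must never be mounted on both nodes at once; in practice a high-availability framework performs the unmount/mount and fences the failed node.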
</section>
<section xml:id='dbdoclet.mdtactiveactive' condition='l24'>
- <title><indexterm><primary>failover</primary><secondary>MDT</secondary></indexterm>MDT Failover Configuration (Active/Active)</title>
- <para>Multiple MDTs became available with the advent of Lustre software release 2.4. MDTs can
- be setup as an active/active failover configuration. A failover cluster is built from two
- MDSs as shown in <xref linkend="understandingfailover.fig.configmdts"/>.</para>
+ <title>
+ <indexterm>
+ <primary>failover</primary>
+ <secondary>MDT</secondary>
+ </indexterm>MDT Failover Configuration (Active/Active)</title>
+ <para>Multiple MDTs became available with the advent of Lustre software
+ release 2.4. MDTs can be set up in an active/active failover
+ configuration. A failover cluster is built from two MDSs as shown in
+ <xref linkend="understandingfailover.fig.configmdts" />.</para>
<figure>
- <title xml:id="understandingfailover.fig.configmdts"> Lustre failover configuration for a active/active MDTs </title>
+ <title xml:id="understandingfailover.fig.configmdts">Lustre failover
+ configuration for active/active MDTs</title>
<mediaobject>
<imageobject>
- <imagedata scalefit="1" width="50%" fileref="figures/MDTs_Failover.png"/>
+ <imagedata scalefit="1" width="50%"
+ fileref="figures/MDTs_Failover.png" />
</imageobject>
<textobject>
<phrase>Lustre failover configuration for two MDTs</phrase>
          </textobject>
        </mediaobject>
      </figure>
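A minimal sketch of formatting two MDTs so that each MDS is primary for one target and a failover node for the other; the NIDs and device names are assumptions:

```shell
# MDT0000: normally mounted on mds1 (10.2.0.1), failover to mds2
mkfs.lustre --fsname=testfs --mdt --index=0 --mgsnode=10.2.0.1@tcp \
    --servicenode=10.2.0.1@tcp --servicenode=10.2.0.2@tcp /dev/sdc

# MDT0001: normally mounted on mds2 (10.2.0.2), failover to mds1
mkfs.lustre --fsname=testfs --mdt --index=1 --mgsnode=10.2.0.1@tcp \
    --servicenode=10.2.0.2@tcp --servicenode=10.2.0.1@tcp /dev/sdd
```

Which MDS is "primary" for a target is determined by where the target is mounted in normal operation, not by the order of the `--servicenode` options; if either MDS fails, the other mounts both MDTs.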
</section>
<section remap="h3">
- <title><indexterm><primary>failover</primary><secondary>OST</secondary></indexterm>OST Failover Configuration (Active/Active)</title>
- <para>OSTs are usually configured in a load-balanced, active/active failover configuration. A failover cluster is built from two OSSs as shown in <xref linkend="understandingfailover.fig.configost"/>.</para>
+ <title>
+ <indexterm>
+ <primary>failover</primary>
+ <secondary>OST</secondary>
+ </indexterm>OST Failover Configuration (Active/Active)</title>
+ <para>OSTs are usually configured in a load-balanced, active/active
+ failover configuration. A failover cluster is built from two OSSs as
+ shown in
+ <xref linkend="understandingfailover.fig.configost" />.</para>
<note>
- <para>OSSs configured as a failover pair must have shared disks/RAID.</para>
+ <para>OSSs configured as a failover pair must have shared
+ disks/RAID.</para>
</note>
<figure>
- <title xml:id="understandingfailover.fig.configost"> Lustre failover configuration for an OSTs </title>
+ <title xml:id="understandingfailover.fig.configost">Lustre failover
+ configuration for OSTs</title>
<mediaobject>
<imageobject>
- <imagedata scalefit="1" width="100%" fileref="./figures/OST_Failover.png"/>
+ <imagedata scalefit="1" width="100%"
+ fileref="./figures/OST_Failover.png" />
</imageobject>
<textobject>
- <phrase>Lustre failover configuration for an OSTs </phrase>
+ <phrase>Lustre failover configuration for OSTs</phrase>
</textobject>
</mediaobject>
</figure>
- <para>In an active configuration, 50% of the available OSTs are assigned to one OSS and the remaining OSTs are assigned to the other OSS. Each OSS serves as the primary node for half the OSTs and as a failover node for the remaining OSTs.</para>
- <para>In this mode, if one OSS fails, the other OSS takes over all of the failed OSTs. The clients attempt to connect to each OSS serving the OST, until one of them responds. Data on the OST is written synchronously, and the clients replay transactions that were in progress and uncommitted to disk before the OST failure.</para>
- <para>For more information about configuring failover, see <xref linkend="configuringfailover"
- />.</para>
+ <para>In an active configuration, 50% of the available OSTs are assigned
+ to one OSS and the remaining OSTs are assigned to the other OSS. Each OSS
+ serves as the primary node for half the OSTs and as a failover node for
+ the remaining OSTs.</para>
+ <para>In this mode, if one OSS fails, the other OSS takes over all of the
+ failed OSTs. The clients attempt to connect to each OSS serving the OST,
+ until one of them responds. Data on the OST is written synchronously, and
+ the clients replay transactions that were in progress and uncommitted to
+ disk before the OST failure.</para>
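The split described above might be realized as follows; the OSS NIDs and devices are hypothetical, and each OST lists both OSS nodes as service nodes:

```shell
# In normal operation OST0000 is mounted on oss1 (10.2.0.3) and
# OST0001 on oss2 (10.2.0.4); if one OSS fails, the surviving OSS
# mounts both targets.
mkfs.lustre --fsname=testfs --ost --index=0 --mgsnode=10.2.0.1@tcp \
    --servicenode=10.2.0.3@tcp --servicenode=10.2.0.4@tcp /dev/sdb
mkfs.lustre --fsname=testfs --ost --index=1 --mgsnode=10.2.0.1@tcp \
    --servicenode=10.2.0.3@tcp --servicenode=10.2.0.4@tcp /dev/sdc
```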
+ <para>For more information about configuring failover, see
+ <xref linkend="configuringfailover" />.</para>
</section>
</section>
</chapter>