-<?xml version='1.0' encoding='UTF-8'?><chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="managingstripingfreespace">
+<?xml version='1.0' encoding='UTF-8'?>
+<chapter xmlns="http://docbook.org/ns/docbook"
+ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
+ xml:id="managingstripingfreespace">
<title xml:id="managingstripingfreespace.title">Managing File Layout (Striping) and Free
Space</title>
<para>This chapter describes file layout (striping) and I/O options, and includes the following
sections:</para>
<itemizedlist>
<listitem>
- <para><xref linkend="dbdoclet.50438209_79324"/></para>
+ <para><xref linkend="file_striping.how_it_works"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438209_48033"/></para>
+ <para><xref linkend="file_striping.considerations"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438209_78664"/></para>
+ <para><xref linkend="file_striping.lfs_setstripe"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438209_44776"/></para>
+ <para><xref linkend="file_striping.lfs_getstripe"/></para>
</listitem>
<listitem>
- <para><xref linkend="dbdoclet.50438209_10424"/></para>
+ <para><xref linkend="file_striping.managing_free_space"/></para>
</listitem>
<listitem>
<para><xref xmlns:xlink="http://www.w3.org/1999/xlink" linkend="wide_striping"/></para>
</listitem>
</itemizedlist>
- <section xml:id="dbdoclet.50438209_79324">
+ <section xml:id="file_striping.how_it_works">
<title>
<indexterm>
<primary>space</primary>
default), the MDS then uses weighted random allocations with a preference for allocating
objects on OSTs with more free space. (This can reduce I/O performance until space usage is
rebalanced again.) For a more detailed description of how striping is allocated, see <xref
- linkend="dbdoclet.50438209_10424"/>.</para>
+ linkend="file_striping.managing_free_space"/>.</para>
<para>Files can only be striped over a finite number of OSTs, based on the
maximum size of the attributes that can be stored on the MDT. If the MDT
is ldiskfs-based without the <literal>ea_inode</literal> feature, a file
can be striped across at most 160 OSTs. With a ZFS-based MDT, or if the
- <literal>ea_inode</literal> feature is enabled for an ldiskfs-based MDT,
+ <literal>ea_inode</literal> feature is enabled for an ldiskfs-based MDT
+ (the default since Lustre 2.13.0),
a file can be striped across up to 2000 OSTs. For more information, see
<xref xmlns:xlink="http://www.w3.org/1999/xlink" linkend="wide_striping"/>.
</para>
</section>
- <section xml:id="dbdoclet.50438209_48033">
+ <section xml:id="file_striping.considerations">
<title><indexterm>
<primary>file layout</primary>
<secondary>See striping</secondary>
<para>In cases like these, a file can be striped over as many OSSs as it takes to achieve
the required peak aggregate bandwidth for that file. Striping across a larger number of
OSSs should only be used when the file size is very large and/or is accessed by many nodes
- at a time. Currently, Lustre files can be striped across up to 2000 OSTs, the maximum
- stripe count for an <literal>ldiskfs</literal> file system.</para>
+ at a time. Currently, Lustre files can be striped across up to 2000 OSTs.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Improving performance when OSS bandwidth is exceeded.</emphasis>
the I/O rate of the clients/jobs divided by the performance per OSS.</para>
</listitem>
<listitem>
+ <para condition="l2D"><emphasis role="bold">Matching stripes to I/O
+ pattern.</emphasis> When writing to a single file from multiple nodes,
+ having more than one client writing to a stripe can lead to issues
+ with lock exchange, where clients contend over writing to that stripe,
+ even if their I/Os do not overlap. This can be avoided if I/O can be
+ stripe aligned so that each stripe is accessed by only one client.
+ Since Lustre 2.13, the 'overstriping' feature is available, allowing more
+ than one stripe per OST. This is particularly helpful when the
+ thread count exceeds the OST count, making it possible to match stripe count
+ to thread count even in this case.</para>
+ </listitem>
+ <listitem>
<para><emphasis role="bold">Providing space for very large files.</emphasis> Striping is
useful when a single OST does not have enough free space to hold the entire file.</para>
</listitem>
</itemizedlist>
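+ <para>The stripe alignment described in the "Matching stripes to I/O
+ pattern" item above can be illustrated with a little arithmetic. The
+ following Python sketch assumes a plain round-robin (RAID-0 style) layout;
+ the function names are hypothetical and are not part of any Lustre tool or
+ API.</para>

```python
def stripe_index(offset: int, stripe_size: int, stripe_count: int) -> int:
    """Return the stripe holding the byte at `offset` in a round-robin layout."""
    return (offset // stripe_size) % stripe_count


def stripes_touched(start: int, length: int, stripe_size: int, stripe_count: int) -> set:
    """Return the stripe indexes covered by the byte range [start, start + length).

    `length` is assumed to be greater than zero.
    """
    first_chunk = start // stripe_size
    last_chunk = (start + length - 1) // stripe_size
    return {chunk % stripe_count for chunk in range(first_chunk, last_chunk + 1)}


MIB = 1024 * 1024

# Four hypothetical writers, each writing one aligned 1 MiB block to a file
# with a 1 MiB stripe size and a stripe count of 4: one stripe per writer.
for rank in range(4):
    print(rank, stripes_touched(rank * MIB, MIB, MIB, 4))

# A writer that starts half a stripe in touches two stripes and can
# therefore contend with its neighbour.
print(stripes_touched(MIB // 2, MIB, MIB, 4))
```

+ <para>In this model, matching the stripe count to the number of writers
+ (using overstriping when there are more writers than OSTs) keeps each
+ aligned writer on its own stripe.</para>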
</section>
</section>
- <section xml:id="dbdoclet.50438209_78664">
+ <section xml:id="file_striping.lfs_setstripe">
<title><indexterm>
<primary>striping</primary>
<secondary>configuration</secondary>
</indexterm>Setting the File Layout/Striping Configuration (<literal>lfs
setstripe</literal>)</title>
<para>Use the <literal>lfs setstripe</literal> command to create new files with a specific file layout (stripe pattern) configuration.</para>
- <screen>lfs setstripe [--size|-s stripe_size] [--count|-c stripe_count] \
+ <screen>lfs setstripe [--size|-s stripe_size] [--stripe-count|-c stripe_count] [--overstripe-count|-C stripe_count] \
[--index|-i start_ost] [--pool|-p pool_name] <replaceable>filename|dirname</replaceable> </screen>
<para><emphasis role="bold">
<literal>stripe_size</literal>
<literal>stripe_size</literal> of 0 causes the default stripe size to be used. Otherwise,
the <literal>stripe_size</literal> value must be a multiple of 64 KB.</para>
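+ <para>The two rules above (0 selects the default, any other value must be
+ a multiple of 64 KB) can be checked before calling
+ <literal>lfs setstripe</literal>. This is a minimal Python sketch; the
+ helper name <literal>is_valid_stripe_size</literal> is hypothetical and
+ not part of Lustre.</para>

```python
STRIPE_UNIT = 64 * 1024  # 64 KB, the granularity stated above


def is_valid_stripe_size(stripe_size: int) -> bool:
    """Return True if stripe_size is 0 (use the default) or a positive multiple of 64 KB."""
    if stripe_size == 0:
        return True
    return stripe_size > 0 and stripe_size % STRIPE_UNIT == 0
```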
<para><emphasis role="bold">
- <literal>stripe_count</literal>
+ <literal>stripe_count (--stripe-count, --overstripe-count)</literal>
</emphasis>
- </para>
- <para>The <literal>stripe_count</literal> indicates how many OSTs to use. The default <literal>stripe_count</literal> value is 1. Setting <literal>stripe_count</literal> to 0 causes the default stripe count to be used. Setting <literal>stripe_count</literal> to -1 means stripe over all available OSTs (full OSTs are skipped).</para>
+ </para>
+ <para>The <literal>stripe_count</literal> indicates how many stripes to use.
+ The default <literal>stripe_count</literal> value is 1. Setting
+ <literal>stripe_count</literal> to 0 causes the default stripe count to be
+ used. Setting <literal>stripe_count</literal> to -1 means stripe over all
+ available OSTs (full OSTs are skipped). When
+ <literal>--overstripe-count</literal> is used, more than one stripe is
+ placed per OST if necessary.</para>
<para><emphasis role="bold">
<literal>start_ost</literal>
</emphasis>
<literal>pool_name</literal>
</emphasis>
</para>
- <para>The <literal>pool_name</literal> specifies the OST pool to which the file will be written.
- This allows limiting the OSTs used to a subset of all OSTs in the file system. For more
- details about using OST pools, see <link xl:href="ManagingFileSystemIO.html#50438211_75549"
- >Creating and Managing OST Pools</link>.</para>
+ <para>The <literal>pool_name</literal> specifies the OST pool to which the
+ file will be written. This allows limiting the OSTs used to a subset of
+ all OSTs in the file system. For more details about using OST pools, see
+ <link xl:href="managingfilesystemio.managing_ost_pools">
+ Creating and Managing OST Pools
+ </link>.</para>
<section remap="h3">
<title>Specifying a File Layout (Striping Pattern) for a Single File</title>
<para>It is possible to specify the file layout when a new file is created using the command <literal>lfs setstripe</literal>. This allows users to override the file system default parameters to tune the file layout more optimally for their application. Execution of an <literal>lfs setstripe</literal> command fails if the file already exists.</para>
- <section xml:id="dbdoclet.50438209_60155">
+ <section xml:id="file_striping.stripe_size">
<title>Setting the Stripe Size</title>
<para>The command to create a new file with a specified stripe size is similar to:</para>
<screen>[client]# lfs setstripe -s 4M /mnt/lustre/new_file</screen>
<para>The command below creates a new file with a stripe count of <literal>-1</literal> to
specify striping over all available OSTs:</para>
<screen>[client]# lfs setstripe -c -1 /mnt/lustre/full_stripe</screen>
- <para>The example below indicates that the file <literal>full_stripe</literal> is striped
+ <para>The example below indicates that the file
+ <literal>full_stripe</literal> is striped
over all six active OSTs in the configuration:</para>
<screen>[client]# lfs getstripe /mnt/lustre/full_stripe
/mnt/lustre/full_stripe
3 5 0x5 0
4 4 0x4 0
5 2 0x2 0</screen>
- <para> This is in contrast to the output in <xref linkend="dbdoclet.50438209_60155"/>, which
- shows only a single object for the file.</para>
+ <para> This is in contrast to the output in
+ <xref linkend="file_striping.stripe_size"/>,
+ which shows only a single object for the file.</para>
</section>
</section>
<section remap="h3">
<para>You can use <literal>lfs setstripe</literal> to create a file on a specific OST. In the
following example, the file <literal>file1</literal> is created on the first OST (OST index
is 0).</para>
- <screen>$ lfs setstripe --count 1 --index 0 file1
+ <screen>$ lfs setstripe --stripe-count 1 --index 0 file1
$ dd if=/dev/zero of=file1 count=1 bs=100M
1+0 records in
1+0 records out
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
-lmm_stripe_offset: 0
- obdidx objid objid group
+lmm_stripe_offset: 0
+ obdidx objid objid group
0 37364 0x91f4 0</screen>
</section>
</section>
- <section xml:id="dbdoclet.50438209_44776">
+ <section xml:id="file_striping.lfs_getstripe">
<title><indexterm><primary>striping</primary><secondary>getting information</secondary></indexterm>Retrieving File Layout/Striping Information (<literal>getstripe</literal>)</title>
<para>The <literal>lfs getstripe</literal> command is used to display information that shows
over which OSTs a file is distributed. For each OST, the index and UUID is displayed, along
<primary>striping</primary>
<secondary>remote directories</secondary>
</indexterm>Locating the MDT for a remote directory</title>
- <para condition="l24">Lustre software release 2.4 can be configured with
- multiple MDTs in the same file system. Each sub-directory can have a
- different MDT. To identify on which MDT a given subdirectory is
- located, pass the <literal>getstripe [--mdt-index|-M]</literal>
- parameters to <literal>lfs</literal>. An example of this command is
- provided in the section <xref linkend="dbdoclet.rmremotedir"/>.</para>
+ <para>Lustre can be configured with multiple MDTs in the same file
+ system. Each directory and file could be located on a different MDT.
+ To identify the MDT on which a given subdirectory is located, pass the
+ <literal>getstripe [--mdt-index|-M]</literal> parameter to
+ <literal>lfs</literal>. An example of this command is provided in
+ the section <xref linkend="lustremaint.rmremotedir"/>.</para>
</section>
</section>
<section xml:id="pfl" condition='l2A'>
flag <literal>^init</literal> here.</para></note>
</section>
</section>
- <section xml:id="dbdoclet.50438209_10424">
+
+ <section xml:id="striping.sel" condition='l2D'>
+ <title>
+ <indexterm><primary>striping</primary><secondary>SEL</secondary>
+ </indexterm>Self-Extending Layout (SEL)</title>
+ <para>The Lustre Self-Extending Layout (SEL) feature is an extension of the
+ <xref linkend="pfl"/> feature, which allows the MDS to change the defined
+ PFL layout dynamically. With this feature, the MDS monitors the used space
+ on OSTs and swaps the OSTs for the current file when they are low on space.
+ This avoids <literal>ENOSPC</literal> problems for SEL files when
+ applications are writing to them.</para>
+ <para>Whereas PFL delays the instantiation of some components until an IO
+ operation occurs in their region, SEL allows splitting such non-instantiated
+ components into two parts: an “extendable” component and an “extension”
+ component. The extendable component is a regular PFL component that
+ initially covers just a small part of the region. The extension (or SEL)
+ component is a new component type which is always non-instantiated and
+ unassigned, covering the rest of the region. When a write reaches this
+ unassigned space and the client asks the MDS to instantiate it, the
+ MDS decides whether to grant additional space to the extendable
+ component. The granted region moves from the head of the extension
+ component to the tail of the extendable component, so the extendable
+ component grows and the SEL component shrinks. This allows the file
+ to continue on the same OSTs or, if space is low on one of
+ the current OSTs, to modify the layout to switch to a new component on new
+ OSTs. In particular, it lets IO automatically spill over to a large HDD OST
+ pool once a small SSD OST pool is getting low on space.</para>
+ <para>The default extension policy modifies the layout in the following
+ ways:</para>
+ <orderedlist numeration="arabic">
+ <listitem>
+ <para>Extension: continue on the same OSTs – used when none of the
+ OSTs of the current component is low on space; a particular extent is
+ granted to the extendable component.</para>
+ </listitem>
+ <listitem>
+ <para>Spill over: switch to the next component's OSTs – used only for
+ components other than the last one, when <emphasis>at least one</emphasis>
+ of the current OSTs is low on space; the whole region of the SEL
+ component moves to the next component, and the SEL component itself
+ is removed.</para>
+ </listitem>
+ <listitem>
+ <para>Repeating: create a new component with the same layout but on
+ free OSTs – used only for the last component, when <emphasis>at least
+ one</emphasis> of the current OSTs is low on space; the new
+ component has the same layout but is instantiated on different OSTs (from
+ the same pool) which have enough space.</para>
+ </listitem>
+ <listitem>
+ <para>Forced extension: continue with the current component OSTs despite
+ the low-on-space condition – used only for the last component, when
+ a repeating attempt also detected a low-on-space condition; spillover
+ is impossible and repeating would not help.</para>
+ </listitem>
+ </orderedlist>
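+ <para>The four cases listed above can be summarised as a small decision
+ function. The following Python sketch is a toy model of the list, not the
+ MDS implementation; the function name and its three boolean inputs are
+ assumptions made for illustration.</para>

```python
from enum import Enum


class Action(Enum):
    EXTENSION = "extension"
    SPILL_OVER = "spill over"
    REPEATING = "repeating"
    FORCED_EXTENSION = "forced extension"


def choose_action(current_osts_low: bool, is_last_component: bool,
                  free_osts_available: bool) -> Action:
    """Toy model of the four cases listed above.

    current_osts_low:    at least one OST of the current component is low on space
    is_last_component:   the extendable component is the last one in the layout
    free_osts_available: other OSTs with enough space exist for a repeat
    """
    if not current_osts_low:
        return Action.EXTENSION
    if not is_last_component:
        return Action.SPILL_OVER
    if free_osts_available:
        return Action.REPEATING
    return Action.FORCED_EXTENSION
```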
+ <note><para>The SEL feature does not require clients to understand the SEL
+ format of already created files; only MDS support is needed, which was
+ introduced in Lustre 2.13. However, old clients will have some limitations,
+ as their Lustre tools will not support it.</para></note>
+ <section>
+ <title><literal>lfs setstripe</literal></title>
+ <para>The <literal>lfs setstripe</literal> command is used to create files
+ with composite layouts, as well as add or delete components to or from an
+ existing file. It is extended to support SEL components.</para>
+ <section>
+ <title>Create a SEL file</title>
+ <para><emphasis role="bold">Command</emphasis></para>
+ <screen>lfs setstripe
+[--component-end|-E end1] [STRIPE_OPTIONS] ... <replaceable>filename</replaceable>
+
+STRIPE OPTIONS:
+--extension-size, --ext-size, -z &lt;ext_size&gt;</screen>
+ <para>The <literal>-z</literal> option is added to specify the size of
+ the region which is granted to the extendable component on each
+ iteration. When used while declaring a component, this option turns that
+ component into a pair of components: an extendable one and an extension
+ one.</para>
+ <para><emphasis role="bold">Example</emphasis></para>
+ <para>The following command creates 2 pairs of extendable and
+ extension components:
+ <screen># lfs setstripe -E 1G -z 64M -E -1 -z 256M /mnt/lustre/file</screen>
+ <figure xml:id="managinglayout.fig.sel_createfile">
+ <title>Example: create a SEL file</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata scalefit="1" depth="0.8in" align="center"
+ fileref="figures/SEL_Createfile.png" />
+ </imageobject>
+ <textobject>
+ <phrase>Example: create a SEL file</phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+ </para>
+ <note><para>As usual, only the first PFL component is instantiated at
+ creation time; it is therefore immediately extended to the extension
+ size (64M for the first component), whereas the third component is left
+ zero-length.</para></note>
+ <screen># lfs getstripe /mnt/lustre/file
+/mnt/lustre/file
+ lcm_layout_gen: 4
+ lcm_mirror_count: 1
+ lcm_entry_count: 4
+ lcme_id: 1
+ lcme_mirror_id: 0
+ lcme_flags: init
+ lcme_extent.e_start: 0
+ lcme_extent.e_end: 67108864
+ lmm_stripe_count: 1
+ lmm_stripe_size: 1048576
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: 0
+ lmm_objects:
+ - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x5:0x0] }
+
+ lcme_id: 2
+ lcme_mirror_id: 0
+ lcme_flags: extension
+ lcme_extent.e_start: 67108864
+ lcme_extent.e_end: 1073741824
+ lmm_stripe_count: 0
+ lmm_extension_size: 67108864
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: -1
+
+ lcme_id: 3
+ lcme_mirror_id: 0
+ lcme_flags: 0
+ lcme_extent.e_start: 1073741824
+ lcme_extent.e_end: 1073741824
+ lmm_stripe_count: 1
+ lmm_stripe_size: 1048576
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: -1
+
+ lcme_id: 4
+ lcme_mirror_id: 0
+ lcme_flags: extension
+ lcme_extent.e_start: 1073741824
+ lcme_extent.e_end: EOF
+ lmm_stripe_count: 0
+ lmm_extension_size: 268435456
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: -1</screen>
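+ <para>The extent and extension-size fields in the output above are
+ printed in bytes, while the command line used suffixed sizes. The
+ following Python sketch shows the conversion, assuming binary
+ (power-of-two) units; the helper <literal>to_bytes</literal> is
+ hypothetical and not part of Lustre.</para>

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(size: str) -> int:
    """Convert a size such as '64M' or '1G' to bytes, using binary units."""
    suffix = size[-1].upper()
    if suffix in UNITS:
        return int(size[:-1]) * UNITS[suffix]
    return int(size)  # no suffix: the value is already in bytes


# The three sizes used in the setstripe command above.
for arg in ("64M", "1G", "256M"):
    print(arg, to_bytes(arg))
# 64M 67108864
# 1G 1073741824
# 256M 268435456
```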
+ </section>
+ <section>
+ <title>Create a SEL layout template</title>
+ <para>Similar to PFL, it is possible to set a SEL layout template to
+ a directory. After that, all the files created under it will inherit this
+ layout by default.</para>
+ <screen># lfs setstripe -E 1G -z 64M -E -1 -z 256M /mnt/lustre/dir
+# lfs getstripe /mnt/lustre/dir
+/mnt/lustre/dir
+ lcm_layout_gen: 0
+ lcm_mirror_count: 1
+ lcm_entry_count: 4
+ lcme_id: N/A
+ lcme_mirror_id: N/A
+ lcme_flags: 0
+ lcme_extent.e_start: 0
+ lcme_extent.e_end: 67108864
+ stripe_count: 1 stripe_size: 1048576 pattern: raid0 stripe_offset: -1
+
+ lcme_id: N/A
+ lcme_mirror_id: N/A
+ lcme_flags: extension
+ lcme_extent.e_start: 67108864
+ lcme_extent.e_end: 1073741824
+ stripe_count: 1 extension_size: 67108864 pattern: raid0 stripe_offset: -1
+
+ lcme_id: N/A
+ lcme_mirror_id: N/A
+ lcme_flags: 0
+ lcme_extent.e_start: 1073741824
+ lcme_extent.e_end: 1073741824
+ stripe_count: 1 stripe_size: 1048576 pattern: raid0 stripe_offset: -1
+
+ lcme_id: N/A
+ lcme_mirror_id: N/A
+ lcme_flags: extension
+ lcme_extent.e_start: 1073741824
+ lcme_extent.e_end: EOF
+ stripe_count: 1 extension_size: 268435456 pattern: raid0 stripe_offset: -1
+ </screen>
+ </section>
+ </section>
+ <section>
+ <title><literal>lfs getstripe</literal></title>
+ <para><literal>lfs getstripe</literal> commands can be used to list the
+ striping/component information for a given SEL file. Here, only those parameters
+ new for SEL files are shown.</para>
+ <para><emphasis role="bold">Command</emphasis></para>
+ <screen>lfs getstripe
+[--extension-size|--ext-size|-z] <replaceable>filename</replaceable></screen>
+ <para>The <literal>-z</literal> option is added to print the extension
+ size in bytes. For composite files this is the extension size of the
+ first extension component. If a particular component is identified by
+ other options (<literal>--component-id, --component-start</literal>,
+ etc.), the extension size of that component is printed.</para>
+ <para><emphasis role="bold">Example 1: List SEL component information
+ </emphasis></para>
+ <para>Suppose we already have a composite file
+ <literal>/mnt/lustre/file</literal>, created by the following command:</para>
+ <screen># lfs setstripe -E 1G -z 64M -E -1 -z 256M /mnt/lustre/file</screen>
+ <para>The 2nd component could be listed with the following command:</para>
+ <screen># lfs getstripe -I2 /mnt/lustre/file
+/mnt/lustre/file
+ lcm_layout_gen: 4
+ lcm_mirror_count: 1
+ lcm_entry_count: 4
+ lcme_id: 2
+ lcme_mirror_id: 0
+ lcme_flags: extension
+ lcme_extent.e_start: 67108864
+ lcme_extent.e_end: 1073741824
+ lmm_stripe_count: 0
+ lmm_extension_size: 67108864
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: -1
+ </screen>
+ <note><para>As you can see, the SEL components are marked by the
+ <literal>extension</literal> flag, and the
+ <literal>lmm_extension_size</literal> field holds the specified
+ extension size.</para></note>
+ <para><emphasis role="bold">Example 2: List the extension size</emphasis></para>
+ <para>Having the same file as in the above example, the extension size of
+ the second component could be listed with:</para>
+ <screen># lfs getstripe -z -I2 /mnt/lustre/file
+67108864</screen>
+ <para><emphasis role="bold">Example 3: Extension</emphasis></para>
+ <para>Having the same file as in the above example, suppose there is a
+ write which crosses the end of the first component (64M), and then
+ another write which crosses the end of the first component (128M) again.
+ The layout changes as follows:</para>
+ <figure xml:id="managinglayout.fig.sel_extension">
+ <title>Example: an extension of a SEL file</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata scalefit="1" depth="3.5in" align="center"
+ fileref="figures/SEL_extension.png" />
+ </imageobject>
+ <textobject>
+ <phrase>Example: an extension of a SEL file</phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+ <para>The layout can be printed out by the following command:</para>
+ <screen># lfs getstripe /mnt/lustre/file
+/mnt/lustre/file
+ lcm_layout_gen: 6
+ lcm_mirror_count: 1
+ lcm_entry_count: 4
+ lcme_id: 1
+ lcme_mirror_id: 0
+ lcme_flags: init
+ lcme_extent.e_start: 0
+ lcme_extent.e_end: 201326592
+ lmm_stripe_count: 1
+ lmm_stripe_size: 1048576
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: 0
+ lmm_objects:
+ - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x5:0x0] }
+
+ lcme_id: 2
+ lcme_mirror_id: 0
+ lcme_flags: extension
+ lcme_extent.e_start: 201326592
+ lcme_extent.e_end: 1073741824
+ lmm_stripe_count: 0
+ lmm_extension_size: 67108864
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: -1
+
+ lcme_id: 3
+ lcme_mirror_id: 0
+ lcme_flags: 0
+ lcme_extent.e_start: 1073741824
+ lcme_extent.e_end: 1073741824
+ lmm_stripe_count: 1
+ lmm_stripe_size: 1048576
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: -1
+
+ lcme_id: 4
+ lcme_mirror_id: 0
+ lcme_flags: extension
+ lcme_extent.e_start: 1073741824
+ lcme_extent.e_end: EOF
+ lmm_stripe_count: 0
+ lmm_extension_size: 268435456
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: -1</screen>
+ <para><emphasis role="bold">Example 4: Spillover</emphasis></para>
+ <para>In the case where <literal>OST0</literal> is low on space and an IO
+ happens to a SEL component, a spillover happens: the full region of the
+ SEL component is added to the next component. In the example above,
+ the next layout modification will look like:</para>
+ <figure xml:id="managinglayout.fig.sel_spillover">
+ <title>Example: a spillover in a SEL file</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata scalefit="1" depth="2.25in" align="center"
+ fileref="figures/SEL_spillover.png" />
+ </imageobject>
+ <textobject>
+ <phrase>Example: a spillover in a SEL file</phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+ <note><para>Although the third component was originally [1G, 1G], it was
+ not yet instantiated. Instead of being extended backward, it is
+ moved backward to the start of the previous SEL component (192M) and
+ extended by its extension size (256M) from that position, so it becomes
+ <literal>[192M, 448M]</literal>.</para></note>
+ <screen># lfs getstripe /mnt/lustre/file
+/mnt/lustre/file
+ lcm_layout_gen: 7
+ lcm_mirror_count: 1
+ lcm_entry_count: 3
+ lcme_id: 1
+ lcme_mirror_id: 0
+ lcme_flags: init
+ lcme_extent.e_start: 0
+ lcme_extent.e_end: 201326592
+ lmm_stripe_count: 1
+ lmm_stripe_size: 1048576
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: 0
+ lmm_objects:
+ - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x5:0x0] }
+
+ lcme_id: 3
+ lcme_mirror_id: 0
+ lcme_flags: init
+ lcme_extent.e_start: 201326592
+ lcme_extent.e_end: 469762048
+ lmm_stripe_count: 1
+ lmm_stripe_size: 1048576
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: 1
+ lmm_objects:
+ - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x8:0x0] }
+
+ lcme_id: 4
+ lcme_mirror_id: 0
+ lcme_flags: extension
+ lcme_extent.e_start: 469762048
+ lcme_extent.e_end: EOF
+ lmm_stripe_count: 0
+ lmm_extension_size: 268435456
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: -1</screen>
+ <para><emphasis role="bold">Example 5: Repeating</emphasis></para>
+ <para>Suppose that, in the example above, <literal>OST0</literal> got
+ enough free space back but <literal>OST1</literal> is low on space.
+ The next write to the last SEL component leads to a new component
+ being allocated before the SEL component; it repeats the previous
+ component layout but is instantiated on free OSTs:</para>
+ <figure xml:id="managinglayout.fig.sel_repeat">
+ <title>Example: repeat a SEL component</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata scalefit="1" depth="2.25in" align="center"
+ fileref="figures/SEL_repeating.png" />
+ </imageobject>
+ <textobject>
+ <phrase>Example: repeat a SEL component
+ </phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+ <screen># lfs getstripe /mnt/lustre/file
+/mnt/lustre/file
+ lcm_layout_gen: 9
+ lcm_mirror_count: 1
+ lcm_entry_count: 4
+ lcme_id: 1
+ lcme_mirror_id: 0
+ lcme_flags: init
+ lcme_extent.e_start: 0
+ lcme_extent.e_end: 201326592
+ lmm_stripe_count: 1
+ lmm_stripe_size: 1048576
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: 0
+ lmm_objects:
+ - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x5:0x0] }
+
+ lcme_id: 3
+ lcme_mirror_id: 0
+ lcme_flags: init
+ lcme_extent.e_start: 201326592
+ lcme_extent.e_end: 469762048
+ lmm_stripe_count: 1
+ lmm_stripe_size: 1048576
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: 1
+ lmm_objects:
+ - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x8:0x0] }
+
+ lcme_id: 8
+ lcme_mirror_id: 0
+ lcme_flags: init
+ lcme_extent.e_start: 469762048
+ lcme_extent.e_end: 738197504
+ lmm_stripe_count: 1
+ lmm_stripe_size: 1048576
+ lmm_pattern: raid0
+ lmm_layout_gen: 65535
+ lmm_stripe_offset: 0
+ lmm_objects:
+ - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x6:0x0] }
+
+ lcme_id: 4
+ lcme_mirror_id: 0
+ lcme_flags: extension
+ lcme_extent.e_start: 738197504
+ lcme_extent.e_end: EOF
+ lmm_stripe_count: 0
+ lmm_extension_size: 268435456
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: -1</screen>
+ <para><emphasis role="bold">Example 6: Forced extension</emphasis></para>
+ <para>Suppose that, in the example above, both <literal>OST0</literal> and
+ <literal>OST1</literal> are low on space. The next write to the
+ last SEL component behaves as an extension, as there is no point in
+ repeating.</para>
+ <figure xml:id="managinglayout.fig.pfl_forced">
+ <title>Example: forced extension in a SEL file</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata scalefit="1" depth="2.25in" align="center"
+ fileref="figures/SEL_forced.png" />
+ </imageobject>
+ <textobject>
+ <phrase>Example: forced extension in a SEL file.
+ </phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+ <screen># lfs getstripe /mnt/lustre/file
+/mnt/lustre/file
+ lcm_layout_gen: 11
+ lcm_mirror_count: 1
+ lcm_entry_count: 4
+ lcme_id: 1
+ lcme_mirror_id: 0
+ lcme_flags: init
+ lcme_extent.e_start: 0
+ lcme_extent.e_end: 201326592
+ lmm_stripe_count: 1
+ lmm_stripe_size: 1048576
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: 0
+ lmm_objects:
+ - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x5:0x0] }
+
+ lcme_id: 3
+ lcme_mirror_id: 0
+ lcme_flags: init
+ lcme_extent.e_start: 201326592
+ lcme_extent.e_end: 469762048
+ lmm_stripe_count: 1
+ lmm_stripe_size: 1048576
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: 1
+ lmm_objects:
+ - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x8:0x0] }
+
+ lcme_id: 8
+ lcme_mirror_id: 0
+ lcme_flags: init
+ lcme_extent.e_start: 469762048
+ lcme_extent.e_end: 1006632960
+ lmm_stripe_count: 1
+ lmm_stripe_size: 1048576
+ lmm_pattern: raid0
+ lmm_layout_gen: 65535
+ lmm_stripe_offset: 0
+ lmm_objects:
+ - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x6:0x0] }
+
+ lcme_id: 4
+ lcme_mirror_id: 0
+ lcme_flags: extension
+ lcme_extent.e_start: 1006632960
+ lcme_extent.e_end: EOF
+ lmm_stripe_count: 0
+ lmm_extension_size: 268435456
+ lmm_pattern: raid0
+ lmm_layout_gen: 0
+ lmm_stripe_offset: -1</screen>
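+ <para>The extent boundaries printed in Examples 3 to 6 follow from simple
+ arithmetic on the two extension sizes given to
+ <literal>lfs setstripe</literal>. The following Python sketch only
+ recomputes those boundaries; the variable names are chosen for
+ illustration and do not correspond to Lustre fields.</para>

```python
MIB = 1024 * 1024

# Extension sizes from the command used above: -E 1G -z 64M -E -1 -z 256M
first_ext = 64 * MIB
second_ext = 256 * MIB

# Example 3: component 1 starts at 64M and is extended twice by 64M.
comp1_end = first_ext + 2 * first_ext

# Example 4 (spillover): component 3 starts where component 1 ends and is
# granted one extension of its own extension size.
comp3_end = comp1_end + second_ext

# Example 5 (repeating): component 8 repeats the layout after component 3
# and is granted one extension.
comp8_end = comp3_end + second_ext

# Example 6 (forced extension): component 8 grows by one more extension.
comp8_end_forced = comp8_end + second_ext

for name, value in [("comp1_end", comp1_end), ("comp3_end", comp3_end),
                    ("comp8_end", comp8_end),
                    ("comp8_end_forced", comp8_end_forced)]:
    print(f"{name}: {value} bytes ({value // MIB}M)")
```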
+ </section>
+ <section>
+ <title><literal>lfs find</literal></title>
+ <para><literal>lfs find</literal> commands can be used to search for
+ the files that match the given SEL component parameters. Here, only
+ those parameters new for the SEL files are shown.</para>
+ <screen>lfs find
+[[!] --extension-size|--ext-size|-z [+-]ext-size[KMG]
+[[!] --component-flags=extension]</screen>
+ <para>The <literal>-z</literal> option is added to specify the extension
+ size to search for. Files that have any component whose extension size
+ matches the given criteria are printed. As always, a “+” or “-” prefix
+ selects sizes greater than or less than the given value.
+ </para>
+ <para>A new <literal>extension</literal> component flag is added. Only
+ files which have at least one SEL component are printed.</para>
+ <note><para>The negative search for flags searches the files which
+ <emphasis role="strong">have</emphasis> a non-SEL component (not files
+ which <emphasis role="strong">do not have</emphasis> any SEL component).
+ </para></note>
+ <para><emphasis role="bold">Example</emphasis></para>
+ <screen># lfs setstripe --extension-size 64M -c 1 -E -1 /mnt/lustre/file
+
+# lfs find --comp-flags extension /mnt/lustre/*
+/mnt/lustre/file
+
+# lfs find ! --comp-flags extension /mnt/lustre/*
+/mnt/lustre/file
+
+# lfs find -z 64M /mnt/lustre/*
+/mnt/lustre/file
+
+# lfs find -z +64M /mnt/lustre/*
+
+# lfs find -z -64M /mnt/lustre/*
+
+# lfs find -z +63M /mnt/lustre/*
+/mnt/lustre/file
+
+# lfs find -z -65M /mnt/lustre/*
+/mnt/lustre/file
+
+# lfs find -z 65M /mnt/lustre/*
+
+# lfs find ! -z 64M /mnt/lustre/*
+
+# lfs find ! -z +64M /mnt/lustre/*
+/mnt/lustre/file
+
+# lfs find ! -z -64M /mnt/lustre/*
+/mnt/lustre/file
+
+# lfs find ! -z +63M /mnt/lustre/*
+
+# lfs find ! -z -65M /mnt/lustre/*
+
+# lfs find ! -z 65M /mnt/lustre/*
+/mnt/lustre/file</screen>
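+ <para>The matches shown above are consistent with a plain comparison of
+ the extension size against the given value. The following Python sketch
+ models that for the single-extension file in the example. It is a
+ simplified illustration, not the <literal>lfs find</literal>
+ implementation; it handles only the <literal>M</literal> suffix and the
+ function names are hypothetical.</para>

```python
MIB = 1024 * 1024


def size_matches(actual: int, spec: str) -> bool:
    """Return True if `actual` (bytes) matches a spec such as '64M', '+64M' or '-64M'."""
    sign = spec[0] if spec[0] in "+-" else ""
    value = int(spec.lstrip("+-").rstrip("M")) * MIB  # only the M suffix is handled
    if sign == "+":
        return actual > value   # strictly greater than
    if sign == "-":
        return actual < value   # strictly less than
    return actual == value


def file_matches(ext_sizes, spec: str, negate: bool = False) -> bool:
    """A file matches if any extension component matches; negate models '!'."""
    result = any(size_matches(size, spec) for size in ext_sizes)
    return not result if negate else result


# The file from the example has one extension component of 64M.
ext_sizes = [64 * MIB]
for spec in ("64M", "+64M", "-64M", "+63M", "-65M", "65M"):
    print(spec, file_matches(ext_sizes, spec), file_matches(ext_sizes, spec, negate=True))
```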
+ </section>
+ </section>
+
+ <section xml:id="foreign_layout" condition='l2D'>
+ <title>
+ <indexterm><primary>striping</primary><secondary>Foreign</secondary>
+ </indexterm>Foreign Layout</title>
+ <para>The Lustre Foreign Layout feature is an extension of both the
+ LOV and LMV formats which allows the creation of empty files and directories
+ with the necessary specifications to point to corresponding objects outside
+ the Lustre namespace.</para>
+ <para>The new LOV/LMV foreign internal format can be represented as:</para>
+ <figure xml:id="managinglayout.fig.foreign_format">
+ <title>LOV/LMV foreign format</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata scalefit="1" width="100%"
+ fileref="figures/Foreign_Format.png" />
+ </imageobject>
+ <textobject>
+ <phrase>LOV/LMV foreign format</phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+ <section>
+ <title><literal>lfs set[dir]stripe</literal></title>
+ <para>The <literal>lfs set[dir]stripe</literal> commands are used to
+ create files or directories with foreign layouts by calling the
+ corresponding API, which in turn invokes the appropriate ioctl().</para>
+ <section>
+ <title>Create a Foreign file/dir</title>
+ <para><emphasis role="bold">Command</emphasis></para>
+ <screen>lfs set[dir]stripe \
+--foreign[=&lt;foreign_type&gt;] --xattr|-x &lt;layout_string&gt; \
+[--flags &lt;hex_bitmask&gt;] [--mode &lt;mode_bits&gt;] \
+<replaceable>{file,dir}name</replaceable></screen>
+ <para>Both the <literal>--foreign</literal> and
+ <literal>--xattr|-x</literal> options are mandatory.
+ The <literal>&lt;foreign_type&gt;</literal> argument (default is "none",
+ meaning no special behavior) and the <literal>--flags</literal> and
+ <literal>--mode</literal> (default is 0666) options are optional.</para>
+ <para><emphasis role="bold">Example</emphasis></para>
+ <para>The following command creates a foreign file of "none" type and
+ with "foo@bar" LOV content and specific mode and flags:
+ <screen># lfs setstripe --foreign=none --flags=0xda08 --mode=0640 \
+--xattr=foo@bar /mnt/lustre/file</screen>
+ <figure xml:id="managinglayout.fig.foreign_createfile">
+ <title>Example: create a foreign file</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata scalefit="1" width="100%" align="center"
+ fileref="figures/Foreign_Createfile.png" />
+ </imageobject>
+ <textobject>
+ <phrase>Example: create a foreign file</phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+ </para>
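+ <para>The same options apply to directories. As a minimal sketch,
+ assuming a hypothetical path <literal>/mnt/lustre/dir</literal> that
+ does not yet exist, a foreign directory of "none" type with "foo@bar"
+ LMV content could be created by following the synopsis above:
+ <screen># lfs setdirstripe --foreign=none --xattr=foo@bar /mnt/lustre/dir</screen>
+ </para>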
+ </section>
+ </section>
+ <section>
+ <title><literal>lfs get[dir]stripe</literal></title>
+ <para>The <literal>lfs get[dir]stripe</literal> commands can be used to
+ retrieve foreign LOV/LMV information and content.</para>
+ <para><emphasis role="bold">Command</emphasis></para>
+ <screen>lfs get[dir]stripe [-v] <replaceable>filename</replaceable></screen>
+ <para><emphasis role="bold">List foreign layout information
+ </emphasis></para>
+ <para>Suppose we already have a foreign file
+ <literal>/mnt/lustre/file</literal>, created by the following command:</para>
+ <screen># lfs setstripe --foreign=none --flags=0xda08 --mode=0640 \
+--xattr=foo@bar /mnt/lustre/file</screen>
+ <para>The full foreign layout information can be listed using the
+ following command:</para>
+ <screen># lfs getstripe -v /mnt/lustre/file
+/mnt/lustre/file
+ lfm_magic: 0x0BD70BD0
+ lfm_length: 7
+ lfm_type: none
+ lfm_flags: 0x0000DA08
+ lfm_value: foo@bar
+ </screen>
+ <note><para>The <literal>lfm_length</literal> field value is the
+ number of characters in the variable-length
+ <literal>lfm_value</literal> field (7 for "foo@bar").</para></note>
+ </section>
+ <section>
+ <title><literal>lfs find</literal></title>
+ <para>The <literal>lfs find</literal> command can be used to search for
+ all foreign files and directories, or for those that match the given
+ selection parameters.</para>
+ <screen>lfs find
+[[!] --foreign[=&lt;foreign_type&gt;]]</screen>
+ <para>The <literal>--foreign[=&lt;foreign_type&gt;]</literal> option
+ selects files and/or directories that have a foreign layout, optionally
+ restricted to those of <literal>&lt;foreign_type&gt;</literal>.
+ Prefixing the option with <literal>!</literal> negates the match.</para>
+ <para><emphasis role="bold">Example</emphasis></para>
+ <screen># lfs setstripe --foreign=none --xattr=foo@bar /mnt/lustre/file
+# touch /mnt/lustre/file2
+
+# lfs find --foreign /mnt/lustre/*
+/mnt/lustre/file
+
+# lfs find ! --foreign /mnt/lustre/*
+/mnt/lustre/file2
+
+# lfs find --foreign=none /mnt/lustre/*
+/mnt/lustre/file</screen>
+ </section>
+ </section>
+
+ <section xml:id="file_striping.managing_free_space">
<title><indexterm>
<primary>space</primary>
<secondary>free space</secondary>
<literal>lctl set_param</literal> command, for example the next command reserve 1GB space
for all OSTs.
<screen>lctl set_param -P osp.*.reserved_mb_low=1024</screen></para>
- <para>This section describes how to check available free space on disks and how free space is
- allocated. It then describes how to set the threshold and weighting factors for the allocation
- algorithms.</para>
- <section xml:id="dbdoclet.checking_free_space">
+ <para>This section describes how to check available free space on disks
+ and how free space is allocated. It then describes how to set the
+ threshold and weighting factors for the allocation algorithms.</para>
+ <section xml:id="file_striping.checking_free_space">
<title>Checking File System Free Space</title>
- <para>Free space is an important consideration in assigning file stripes. The <literal>lfs
- df</literal> command can be used to show available disk space on the mounted Lustre file
- system and space consumption per OST. If multiple Lustre file systems are mounted, a path
- may be specified, but is not required. Options to the <literal>lfs df</literal> command are
- shown below.</para>
+ <para>Free space is an important consideration in assigning file stripes.
+ The <literal>lfs df</literal> command can be used to show available
+ disk space on the mounted Lustre file system and space consumption per
+ OST. If multiple Lustre file systems are mounted, a path may be
+ specified, but is not required. Options to the <literal>lfs df</literal>
+ command are shown below.</para>
<informaltable frame="all">
<tgroup cols="2">
<colspec colname="c1" colwidth="50*"/>
<tbody>
<row>
<entry>
- <para> <literal>-h</literal></para>
+ <para>
+ <literal>-h</literal>, <literal>--human-readable</literal>
+ </para>
+ </entry>
+ <entry>
+ <para> Displays sizes in human readable format (for example: 1K,
+ 234M, 5G) using base-2 (binary) values (i.e. 1G = 1024M).</para>
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <para>
+ <literal>-H</literal>, <literal>--si</literal>
+ </para>
</entry>
<entry>
- <para> Displays sizes in human readable format (for example: 1K, 234M, 5G).</para>
+ <para>Like <literal>-h</literal>, this displays counts in human
+ readable format, but using base-10 (decimal) values
+ (i.e. 1G = 1000M).</para>
</entry>
</row>
<row>
<para> Lists inodes instead of block usage.</para>
</entry>
</row>
+ <row>
+ <entry>
+ <para> <literal role="bold">-l, --lazy</literal></para>
+ </entry>
+ <entry>
+ <para>Do not attempt to contact any OST or MDT not currently
+ connected to the client. This avoids blocking the
+ <literal>lfs df</literal> output if a target is offline or
+ unreachable, and only returns the space on OSTs that can
+ currently be accessed.</para>
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <para> <literal role="bold">-p, --pool</literal></para>
+ </entry>
+ <entry>
+ <para>Limit the usage to report only OSTs that are in the
+ specified <replaceable>pool</replaceable>. If multiple
+ Lustre filesystems are mounted, list the OSTs in
+ <replaceable>pool</replaceable> for each filesystem, or
+ limit the display to only a pool for a specific filesystem
+ if <replaceable>fsname.pool</replaceable> is given.
+ Specifying both <replaceable>fsname</replaceable> and
+ <replaceable>pool</replaceable> is equivalent to providing
+ a specific mountpoint.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <para>
+ <literal>-v</literal>, <literal>--verbose</literal>
+ </para>
+ </entry>
+ <entry>
+ <para>Display verbose status of MDTs and OSTs. This may
+ include one or more optional flags at the end of each line.
+ </para>
+ </entry>
+ </row>
</tbody>
</tgroup>
</informaltable>
+ <para>
+ <literal>lfs df</literal> may also report additional target status
+ as the last column in the display, if there are issues with that target.
+ Target states include:
+ </para>
+ <itemizedlist>
+ <listitem><para>
+ <literal>D</literal>: OST/MDT is <literal>Degraded</literal>.
+ The target has a failed drive in the RAID device, or is
+ undergoing RAID reconstruction. This state is marked on
+ the server automatically for ZFS targets via
+ <literal>zed</literal>, or a (user-supplied) script that
+ monitors the target device and sets
+ "<literal>lctl set_param obdfilter.<replaceable>target</replaceable>.degraded=1</literal>"
+ on the OST. This target will be avoided for new
+ allocations, but will still be used to read existing files
+ located there, or for new allocations if there are not enough
+ non-degraded OSTs to make up a widely-striped file.
+ </para></listitem>
+ <listitem><para>
+ <literal>R</literal>: OST/MDT is <literal>Read-only</literal>.
+ The target filesystem is marked read-only due to filesystem
+ corruption detected by ldiskfs or ZFS. No modifications
+ are allowed on this OST, and it needs to be unmounted and
+ <literal>e2fsck</literal> or <literal>zpool scrub</literal>
+ run to repair the underlying filesystem.
+ </para></listitem>
+ <listitem><para>
+ <literal>N</literal>: OST/MDT is <literal>No-precreate</literal>.
+ The target is configured to deny object precreation, either by the
+ "<literal>lctl set_param obdfilter.<replaceable>target</replaceable>.no_precreate=1</literal>"
+ parameter or by the "<literal>-o no_precreate</literal>" mount option.
+ This may be done to add an OST to the filesystem without allowing
+ objects to be allocated on it yet, or for other reasons.
+ </para></listitem>
+ <listitem><para>
+ <literal>S</literal>: OST/MDT is out of <literal>Space</literal>.
+ The target filesystem has less than the minimum required
+ free space and will not be used for new object allocations
+ until it has more free space.
+ </para></listitem>
+ <listitem><para>
+ <literal>I</literal>: OST/MDT is out of <literal>Inodes</literal>.
+ The target filesystem has less than the minimum required
+ free inodes and will not be used for new object allocations
+ until it has more free inodes.
+ </para></listitem>
+ <listitem><para>
+ <literal>f</literal>: OST/MDT is on <literal>flash</literal>.
+ The target filesystem is using a flash (non-rotational)
+ storage device. This is normally detected from the
+ underlying Linux block device, but can be set manually
+ with "<literal>lctl set_param osd-*.*.nonrotational=1</literal>"
+ on the respective OSTs. This lower-case status is only
+ shown in conjunction with the <literal>-v</literal> option,
+ since it is not an error condition.
+ </para></listitem>
+ </itemizedlist>
<note>
- <para>The <literal>df -i</literal> and <literal>lfs df -i</literal> commands show the
- <emphasis role="italic">minimum</emphasis> number of inodes that can be created in the
- file system at the current time. If the total number of objects available across all of
- the OSTs is smaller than those available on the MDT(s), taking into account the default
- file striping, then <literal>df -i</literal> will also report a smaller number of inodes
- than could be created. Running <literal>lfs df -i</literal> will report the actual number
- of inodes that are free on each target.</para>
- <para>For ZFS file systems, the number of inodes that can be created is dynamic and depends
- on the free space in the file system. The Free and Total inode counts reported for a ZFS
- file system are only an estimate based on the current usage for each target. The Used
- inode count is the actual number of inodes used by the file system.</para>
+ <para>The <literal>df -i</literal> and <literal>lfs df -i</literal>
+ commands show the <emphasis role="italic">minimum</emphasis> number
+ of inodes that can be created in the file system at the current time.
+ If the total number of objects available across all of the OSTs is
+ smaller than those available on the MDT(s), taking into account the
+ default file striping, then <literal>df -i</literal> will also
+ report a smaller number of inodes than could be created. Running
+ <literal>lfs df -i</literal> will report the actual number of inodes
+ that are free on each target.
+ </para>
+ <para>For ZFS file systems, the number of inodes that can be created
+ is dynamic and depends on the free space in the file system. The
+ Free and Total inode counts reported for a ZFS file system are only
+ an estimate based on the current usage for each target. The Used
+ inode count is the actual number of inodes used by the file system.
+ </para>
</note>
<para><emphasis role="bold">Examples</emphasis></para>
- <screen>[client1] $ lfs df
-UUID 1K-blockS Used Available Use% Mounted on
-mds-lustre-0_UUID 9174328 1020024 8154304 11% /mnt/lustre[MDT:0]
-ost-lustre-0_UUID 94181368 56330708 37850660 59% /mnt/lustre[OST:0]
-ost-lustre-1_UUID 94181368 56385748 37795620 59% /mnt/lustre[OST:1]
-ost-lustre-2_UUID 94181368 54352012 39829356 57% /mnt/lustre[OST:2]
-filesystem summary: 282544104 167068468 39829356 57% /mnt/lustre
+ <screen>client$ lfs df
+UUID 1K-blocks Used Available Use% Mounted on
+testfs-MDT0000_UUID 9174328 1020024 8154304 11% /mnt/lustre[MDT:0]
+testfs-OST0000_UUID 94181368 56330708 37850660 59% /mnt/lustre[OST:0]
+testfs-OST0001_UUID 94181368 56385748 37795620 59% /mnt/lustre[OST:1]
+testfs-OST0002_UUID 94181368 54352012 39829356 57% /mnt/lustre[OST:2]
+filesystem summary: 282544104 167068468 115475636 59% /mnt/lustre
-[client1] $ lfs df -h
-UUID bytes Used Available Use% Mounted on
-mds-lustre-0_UUID 8.7G 996.1M 7.8G 11% /mnt/lustre[MDT:0]
-ost-lustre-0_UUID 89.8G 53.7G 36.1G 59% /mnt/lustre[OST:0]
-ost-lustre-1_UUID 89.8G 53.8G 36.0G 59% /mnt/lustre[OST:1]
-ost-lustre-2_UUID 89.8G 51.8G 38.0G 57% /mnt/lustre[OST:2]
-filesystem summary: 269.5G 159.3G 110.1G 59% /mnt/lustre
+client$ lfs df -hv
+UUID bytes Used Available Use% Mounted on
+testfs-MDT0000_UUID 8.7G 996.1M 7.8G 11% /mnt/lustre[MDT:0]
+testfs-OST0000_UUID 89.8G 53.7G 36.1G 59% /mnt/lustre[OST:0] f
+testfs-OST0001_UUID 89.8G 53.8G 36.0G 59% /mnt/lustre[OST:1] f
+testfs-OST0002_UUID 89.8G 51.8G 38.0G 57% /mnt/lustre[OST:2] f
+filesystem summary: 269.5G 159.3G 110.1G 59% /mnt/lustre
-[client1] $ lfs df -i
-UUID Inodes IUsed IFree IUse% Mounted on
-mds-lustre-0_UUID 2211572 41924 2169648 1% /mnt/lustre[MDT:0]
-ost-lustre-0_UUID 737280 12183 725097 1% /mnt/lustre[OST:0]
-ost-lustre-1_UUID 737280 12232 725048 1% /mnt/lustre[OST:1]
-ost-lustre-2_UUID 737280 12214 725066 1% /mnt/lustre[OST:2]
-filesystem summary: 2211572 41924 2169648 1% /mnt/lustre[OST:2]</screen>
+client$ lfs df -iH
+UUID Inodes IUsed IFree IUse% Mounted on
+testfs-MDT0000_UUID 2.21M 41.9k 2.17M 1% /mnt/lustre[MDT:0]
+testfs-OST0000_UUID 737.3k 12.1k 725.1k 1% /mnt/lustre[OST:0]
+testfs-OST0001_UUID 737.3k 12.2k 725.0k 1% /mnt/lustre[OST:1]
+testfs-OST0002_UUID 737.3k 12.2k 725.0k 1% /mnt/lustre[OST:2]
+filesystem summary: 2.21M 41.9k 2.17M 1% /mnt/lustre
+</screen>
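+ <para>The options can also be combined with a pool name or the
+ <literal>--lazy</literal> flag. As a sketch, assuming a file system
+ named <literal>testfs</literal> with a hypothetical OST pool named
+ <literal>flash</literal>, the first command below reports usage only
+ for the OSTs in that pool, and the second skips any targets not
+ currently connected to the client (output omitted):</para>
+ <screen>client$ lfs df -h --pool testfs.flash
+client$ lfs df -h --lazy /mnt/lustre</screen>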
</section>
<section remap="h3">
<title><indexterm>
necessarily chosen each time.</para>
</listitem>
</itemizedlist>
- <para>The allocation method is determined by the amount of free-space imbalance on the OSTs.
- When free space is relatively balanced across OSTs, the faster round-robin allocator is
- used, which maximizes network balancing. The weighted allocator is used when any two OSTs
- are out of balance by more than the specified threshold (17% by default). The threshold
- between the two allocation methods is defined in the file
- <literal>/proc/fs/<replaceable>fsname</replaceable>/lov/<replaceable>fsname</replaceable>-mdtlov/qos_threshold_rr</literal>. </para>
- <para>To set the <literal>qos_threshold_r</literal> to <literal>25</literal>, enter this
- command on the
- MGS:<screen>lctl set_param lov.<replaceable>fsname</replaceable>-mdtlov.qos_threshold_rr=25</screen></para>
+ <para>The allocation method is determined by the amount of free-space
+ imbalance on the OSTs. When free space is relatively balanced across
+ OSTs, the faster round-robin allocator is used, which maximizes network
+ balancing. The weighted allocator is used when any two OSTs are out of
+ balance by more than the specified threshold (17% by default). The
+ threshold between the two allocation methods is defined by the
+ <literal>qos_threshold_rr</literal> parameter. </para>
+ <para>To temporarily set the <literal>qos_threshold_rr</literal> to
+ <literal>25</literal>, enter the following on each MDS:
+ <screen>mds# lctl set_param lod.<replaceable>fsname</replaceable>*.qos_threshold_rr=25</screen></para>
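+ <para>Before changing the threshold, it can be useful to check the
+ value currently in effect. As a minimal sketch, assuming the parameter
+ is exposed under the same <literal>lod</literal> path used by the
+ <literal>set_param</literal> command above, it can be read on each MDS
+ with <literal>lctl get_param</literal>:
+ <screen>mds# lctl get_param lod.<replaceable>fsname</replaceable>*.qos_threshold_rr</screen></para>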
</section>
<section remap="h3">
<title><indexterm>
<primary>space</primary>
<secondary>location weighting</secondary>
</indexterm>Adjusting the Weighting Between Free Space and Location</title>
- <para>The weighting priority used by the weighted allocator is set in the file
- <literal>/proc/fs/<replaceable>fsname</replaceable>/lov/<replaceable>fsname</replaceable>-mdtlov/qos_prio_free</literal>.
- Increasing the value of <literal>qos_prio_free</literal> puts more weighting on the amount
- of free space available on each OST and less on how stripes are distributed across OSTs. The
- default value is <literal>91</literal> (percent). When the free space priority is set to
+ <para>The weighting priority used by the weighted allocator is set by
+ the <literal>qos_prio_free</literal> parameter.
+ Increasing the value of <literal>qos_prio_free</literal> puts more
+ weighting on the amount of free space available on each OST and less
+ on how stripes are distributed across OSTs. The default value is
+ <literal>91</literal> (percent). When the free space priority is set to
<literal>100</literal> (percent), weighting is based entirely on free space and location
is no longer used by the striping algorithm. </para>
- <para>To change the allocator weighting to <literal>100</literal>, enter this command on the
+ <para>To permanently change the allocator weighting to <literal>100</literal>, enter this command on the
MGS:</para>
- <screen>lctl conf_param <replaceable>fsname</replaceable>-MDT0000.lov.qos_prio_free=100</screen>
+ <screen>lctl conf_param <replaceable>fsname</replaceable>-MDT0000-*.lod.qos_prio_free=100</screen>
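+ <para>As a minimal sketch, assuming the parameter is exposed under the
+ <literal>lod</literal> device on the MDS, the value in effect can
+ then be checked with <literal>lctl get_param</literal>:</para>
+ <screen>mds# lctl get_param lod.<replaceable>fsname</replaceable>*.qos_prio_free</screen>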
<para> .</para>
<note>
<para>When <literal>qos_prio_free</literal> is set to <literal>100</literal>, a weighted
<literal>ea_inode</literal> feature on the MDT:
<screen>tune2fs -O ea_inode /dev/<replaceable>mdtdev</replaceable></screen>
</para>
+ <note condition='l2D'><para>Since Lustre 2.13 the
+ <literal>ea_inode</literal> feature is enabled by default on all newly
+ formatted ldiskfs MDT filesystems.</para></note>
<note><para>The maximum stripe count for a single file does not limit the
maximum number of OSTs that are in the filesystem as a whole, only the
maximum possible size and maximum aggregate bandwidth for the file.
</para></note>
</section>
</chapter>
+<!--
+ vim:expandtab:shiftwidth=2:tabstop=8:
+ -->