LUDOC-306 dne: add more description for DNE usage

author Andreas Dilger <andreas.dilger@intel.com>

Mon, 28 Mar 2016 21:35:16 +0000 (15:35 -0600)

committer Richard Henwood <richard.henwood@intel.com>

Wed, 30 Mar 2016 16:15:18 +0000 (16:15 +0000)
diff --git a/BackupAndRestore.xml b/BackupAndRestore.xml

index c8a955f..77f3fb1 100644 (file)
--- a/BackupAndRestore.xml
+++ b/BackupAndRestore.xml
@@ -390,8 +390,8 @@ Changelog records consumed: 42</screen>
      <title>
      <indexterm>
        <primary>backup</primary>
-      <secondary>MDS/OST device level</secondary>
-    </indexterm>Backing Up and Restoring an MDS or OST (Device Level)</title>
+      <secondary>MDT/OST device level</secondary>
+    </indexterm>Backing Up and Restoring an MDT or OST (Device Level)</title>
      <para>In some cases, it is useful to do a full device-level backup of an
      individual device (MDT or OST), before replacing hardware, performing
      maintenance, etc. Doing full device-level backups ensures that all of the
diff --git a/Glossary.xml b/Glossary.xml

index a7d104d..172f7d7 100644 (file)
--- a/Glossary.xml
+++ b/Glossary.xml
@@ -82,17 +82,17 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US">
      <glossentry xml:id="DNE" condition='l24'>
        <glossterm>Distributed namespace (DNE)</glossterm>
        <glossdef>
-               <para>A collection of metadata targets serving a single file
+        <para>A collection of metadata targets serving a single file
          system namespace. Prior to DNE, Lustre file systems were limited to a
          single metadata target for the entire name space. Without the ability 
          to distribute metadata load over multiple targets, Lustre file system
          performance is limited. Lustre was enhanced with DNE functionality in
          two development phases. After completing the first phase of development
          in Lustre software version 2.4, <emphasis>Remote Directories</emphasis>
-        allowed the metadata for sub-directories to be serviced by an
+        allows the metadata for sub-directories to be serviced by an
          independent MDT(s). After completing the second phase of development in
          Lustre software version 2.8, <emphasis>Striped Directories</emphasis>
-        allowed files in a single directory to be serviced by multiple MDTs.
+        allows files in a single directory to be serviced by multiple MDTs.
          </para>
        </glossdef>
      </glossentry>
@@ -666,7 +666,7 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US">
      <glossentry xml:id="remotedirectories" condition='l24'>
        <glossterm>Remote directory</glossterm>
        <glossdef>
-               <para>A remote directory describes a feature of
+        <para>A remote directory describes a feature of
          Lustre where metadata for files in a given directory may be
          stored on a different MDT than the metadata for the parent
          directory. Remote directories only became possible with the
@@ -746,7 +746,7 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US">
      <glossentry xml:id="stripeddirectory" condition='l28'>
        <glossterm>Striped Directory</glossterm>
        <glossdef>
-               <para>A striped directory is a feature of Lustre
+        <para>A striped directory is a feature of Lustre
          software where metadata for files in a given directory are
          distributed evenly over multiple MDTs. Striped directories
          are only available in Lustre software version 2.8 or later.
diff --git a/LustreMaintenance.xml b/LustreMaintenance.xml

index 841bd5f..fc46c29 100644 (file)
--- a/LustreMaintenance.xml
+++ b/LustreMaintenance.xml
@@ -268,48 +268,56 @@ Changing a Server NID</title>
           <para>where <replaceable>devicename</replaceable> is the Lustre target name, e.g.
              <literal>testfs-OST0013</literal></para>
          </listitem>
-       <listitem>
-         <para>If the MGS and MDS share a partition, stop the MGS:</para>
-         <screen>umount <replaceable>mount_point</replaceable></screen>
-       </listitem>
+        <listitem>
+          <para>If the MGS and MDS share a partition, stop the MGS:</para>
+          <screen>umount <replaceable>mount_point</replaceable></screen>
+        </listitem>
        </orderedlist>
-      <note><para>The <literal>replace_nids</literal> command also cleans all old, invalidated records out of the configuration log, while preserving all other current settings.</para></note> 
-      <note><para>The previous configuration log is backed up on the MGS disk with the suffix <literal>'.bak'</literal>.</para></note>
+      <note><para>The <literal>replace_nids</literal> command also cleans
+      all old, invalidated records out of the configuration log, while
+      preserving all other current settings.</para></note> 
+      <note><para>The previous configuration log is backed up on the MGS
+      disk with the suffix <literal>'.bak'</literal>.</para></note>
      </section>
      <section xml:id="dbdoclet.addingamdt" condition='l24'>
        <title><indexterm>
          <primary>maintenance</primary>
          <secondary>adding an MDT</secondary>
        </indexterm>Adding a New MDT to a Lustre File System</title>
-        <para>Additional MDTs can be added to serve one or more remote sub-directories within the
-      file system. It is possible to have multiple remote sub-directories reference the same MDT.
-      However, the root directory will always be located on MDT0. To add a new MDT into the file
-      system:</para>
+        <para>Additional MDTs can be added using the DNE feature to serve one
+        or more remote sub-directories within a filesystem, in order to
+        increase the total number of files that can be created in the
+        filesystem, to increase aggregate metadata performance, or to isolate
+        user or application workloads from other users of the filesystem. It
+        is possible to have multiple remote sub-directories reference the
+        same MDT.  However, the root directory will always be located on
+        MDT0. To add a new MDT into the file system:</para>
        <orderedlist>
          <listitem>
-                       <para>Discover the maximum MDT index. Each MDTs must have unique index.</para>
-               <screen>
+          <para>Discover the maximum MDT index. Each MDT must have a unique index.</para>
+<screen>
  client$ lctl dl | grep mdc
  36 UP mdc testfs-MDT0000-mdc-ffff88004edf3c00 4c8be054-144f-9359-b063-8477566eb84e 5
  37 UP mdc testfs-MDT0001-mdc-ffff88004edf3c00 4c8be054-144f-9359-b063-8477566eb84e 5
  38 UP mdc testfs-MDT0002-mdc-ffff88004edf3c00 4c8be054-144f-9359-b063-8477566eb84e 5
  39 UP mdc testfs-MDT0003-mdc-ffff88004edf3c00 4c8be054-144f-9359-b063-8477566eb84e 5
-               </screen>
+</screen>
          </listitem>
          <listitem>
-                       <para>Add the new block device as a new MDT at the next available index. In this example, the next available index is 4.</para>
-               <screen>
-mds# mkfs.lustre --reformat --fsname=<replaceable>filesystem_name</replaceable> --mdt --mgsnode=<replaceable>mgsnode</replaceable> --index 4 <replaceable>/dev/mdt4_device</replaceable>
-               </screen>
+          <para>Add the new block device as a new MDT at the next available
+          index. In this example, the next available index is 4.</para>
+<screen>
+mds# mkfs.lustre --reformat --fsname=<replaceable>testfs</replaceable> --mdt --mgsnode=<replaceable>mgsnode</replaceable> --index 4 <replaceable>/dev/mdt4_device</replaceable>
+</screen>
          </listitem>
          <listitem>
-                       <para>Mount the MDTs.</para>
-               <screen>
+          <para>Mount the MDTs.</para>
+<screen>
 mds# mount -t lustre <replaceable>/dev/mdt4_blockdevice</replaceable> /mnt/mdt4
-               </screen>
+</screen>
          </listitem>
        </orderedlist>
-       </section>
+    </section>
      <section xml:id="dbdoclet.50438199_22527">
        <title><indexterm><primary>maintenance</primary><secondary>adding a OST</secondary></indexterm>
  Adding a New OST to a Lustre File System</title>
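Review note: the "discover the maximum MDT index" step in the hunk above can be automated. The following is a minimal sketch, not part of the patch. It assumes the `lctl dl | grep mdc` output looks like the sample in the hunk and that the MDT index is encoded as four hexadecimal digits in the target name. The function name `next_mdt_index` is hypothetical.

```python
import re

# Sample `lctl dl | grep mdc` output, copied from the hunk above.
SAMPLE = """\
36 UP mdc testfs-MDT0000-mdc-ffff88004edf3c00 4c8be054-144f-9359-b063-8477566eb84e 5
37 UP mdc testfs-MDT0001-mdc-ffff88004edf3c00 4c8be054-144f-9359-b063-8477566eb84e 5
38 UP mdc testfs-MDT0002-mdc-ffff88004edf3c00 4c8be054-144f-9359-b063-8477566eb84e 5
39 UP mdc testfs-MDT0003-mdc-ffff88004edf3c00 4c8be054-144f-9359-b063-8477566eb84e 5
"""

# Assumption: the index is the four hex digits after "MDT" in the device name.
MDC_NAME = re.compile(r"-MDT([0-9a-fA-F]{4})-mdc-")


def next_mdt_index(lctl_dl_output):
    """Return one more than the highest MDT index found in the output."""
    indices = [int(m.group(1), 16) for m in MDC_NAME.finditer(lctl_dl_output)]
    if not indices:
        raise ValueError("no mdc devices found in lctl output")
    return max(indices) + 1


print(next_mdt_index(SAMPLE))  # prints 4, matching the --index 4 example
```

The result is the value that would be passed to `mkfs.lustre --index` in the next step of the procedure.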
diff --git a/LustreOperations.xml b/LustreOperations.xml

index aa46def..cbe0f63 100644 (file)
--- a/LustreOperations.xml
+++ b/LustreOperations.xml
@@ -376,9 +376,9 @@ client# lfs mkdir –i
  </screen>
      <para>This command will allocate the sub-directory
      <literal>remote_dir</literal> onto the MDT of index
-    <literal>mdtindex</literal>. For more information on adding additional MDTs
+    <literal>mdt_index</literal>. For more information on adding additional MDTs
      and 
-    <literal>mdtindex</literal> see
+    <literal>mdt_index</literal> see
      <xref linkend='dbdoclet.addmdtindex' />.</para>
      <warning>
        <para>An administrator can allocate remote sub-directories to separate
@@ -420,9 +420,11 @@ client# lfs mkdir –i
        <primary>striping</primary>
        <secondary>metadata</secondary>
      </indexterm>Creating a directory striped across multiple MDTs</title>
-       <para>Lustre 2.8 enables individual files in a given directory to
-    record their metadata on separate MDTs (a <emphasis>striped
-    directory</emphasis>). The result of this is that metadata requests for
+    <para>The Lustre 2.8 DNE feature enables individual files in a given
+    directory to store their metadata on separate MDTs (a <emphasis>striped
+    directory</emphasis>) once additional MDTs have been added to the
+    filesystem (see <xref linkend="dbdoclet.addingamdt"/>).
+    The result of this is that metadata requests for
      files in a striped directory are serviced by multiple MDTs and metadata
      service load is distributed over all the MDTs that service a given
      directory. By distributing metadata service load over multiple MDTs,
@@ -430,13 +432,16 @@ client# lfs mkdir –i
      performance. Prior to the development of this feature all files in a
      directory must record their metadata on a single MDT.</para>
      <para>This command to stripe a directory over
-       <replaceable>mdt_count</replaceable> MDTs is:
-       </para>
+    <replaceable>mdt_count</replaceable> MDTs is:
+    </para>
      <screen>
-client# lfs setdirstripe -c
+client# lfs mkdir -c
  <replaceable>mdt_count</replaceable>
  <replaceable>/mount_point/new_directory</replaceable>
  </screen>
+    <para>The striped directory feature is most useful for distributing
+    single large directories (50k entries or more) across multiple MDTs,
+    since it incurs more overhead than non-striped directories.</para>
    </section>
    <section xml:id="dbdoclet.50438194_88980">
      <title>
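Review note: the guidance added in this hunk (striping is most useful at 50k entries or more, and incurs extra overhead otherwise) can be expressed as a small decision helper. This is an illustrative sketch only. The helper name `mkdir_command`, the choice to stripe over all MDTs, and the example paths are assumptions. Only the `lfs mkdir -c` and `lfs mkdir -i` forms come from the patch.

```python
import shlex

# Rule of thumb from the added paragraph: stripe at roughly 50k entries or more.
STRIPE_THRESHOLD = 50_000


def mkdir_command(path, expected_entries, mdt_count, mdt_index=0):
    """Build an `lfs mkdir` command line for a new directory (hypothetical helper)."""
    if mdt_count > 1 and expected_entries >= STRIPE_THRESHOLD:
        # Large directory: stripe it over all MDTs (an assumed policy).
        args = ["lfs", "mkdir", "-c", str(mdt_count), path]
    else:
        # Small directory: place it on one MDT and avoid striping overhead.
        args = ["lfs", "mkdir", "-i", str(mdt_index), path]
    return " ".join(shlex.quote(a) for a in args)


print(mkdir_command("/mnt/testfs/results", 200_000, 4))
print(mkdir_command("/mnt/testfs/home", 1_000, 4, mdt_index=1))
```

The helper only builds the command string and does not run it, so the output can be checked before it is used on a client.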
diff --git a/LustreRecovery.xml b/LustreRecovery.xml

index 67a1832..f9c4210 100644 (file)
--- a/LustreRecovery.xml
+++ b/LustreRecovery.xml
@@ -112,17 +112,21 @@
          recovery will take as long as is needed for the single MDS to be restarted.</para>
        <para>When <xref linkend="imperativerecovery"/> is enabled, clients are notified of an MDS restart (either the backup or a restored primary). Clients always may detect an MDS failure either by timeouts of in-flight requests or idle-time ping messages. In either case the clients then connect to the new backup MDS and use the Metadata Replay protocol. Metadata Replay is responsible for ensuring that the backup MDS re-acquires state resulting from transactions whose effects were made visible to clients, but which were not committed to the disk.</para>
        <para>The reconnection to a new (or restarted) MDS is managed by the file system configuration loaded by the client when the file system is first mounted. If a failover MDS has been configured (using the <literal>--failnode=</literal> option to <literal>mkfs.lustre</literal> or <literal>tunefs.lustre</literal>), the client tries to reconnect to both the primary and backup MDS until one of them responds that the failed MDT is again available. At that point, the client begins recovery. For more information, see <xref linkend="metadatereplay"/>.</para>
-      <para>Transaction numbers are used to ensure that operations are replayed in the order they
-        were originally performed, so that they are guaranteed to succeed and present the same file
-        system state as before the failure. In addition, clients inform the new server of their
-        existing lock state (including locks that have not yet been granted). All metadata and lock
-        replay must complete before new, non-recovery operations are permitted. In addition, only
-        clients that were connected at the time of MDS failure are permitted to reconnect during the
-        recovery window, to avoid the introduction of state changes that might conflict with what is
-        being replayed by previously-connected clients.</para>
-               <para condition="l24">Lustre software release 2.4 introduces multiple metadata targets. If
-        multiple metadata targets are in use, active-active failover is possible. See <xref
-          linkend="dbdoclet.mdtactiveactive"/> for more information.</para>
+      <para>Transaction numbers are used to ensure that operations are
+      replayed in the order they were originally performed, so that they
+      are guaranteed to succeed and present the same file system state as
+      before the failure. In addition, clients inform the new server of their
+      existing lock state (including locks that have not yet been granted).
+      All metadata and lock replay must complete before new, non-recovery
+      operations are permitted. In addition, only clients that were connected
+      at the time of MDS failure are permitted to reconnect during the recovery
+      window, to avoid the introduction of state changes that might conflict
+      with what is being replayed by previously-connected clients.</para>
+      <para condition="l24">Lustre software release 2.4 introduces multiple
+      metadata targets. If multiple MDTs are in use, active-active failover
+      is possible (e.g. two MDS nodes, each actively serving one or more
+      different MDTs for the same filesystem). See
+      <xref linkend="dbdoclet.mdtactiveactive"/> for more information.</para>
      </section>
      <section remap="h3">
        <title><indexterm><primary>recovery</primary><secondary>OST failure</secondary></indexterm>OST Failure (Failover)</title>
diff --git a/ManagingFileSystemIO.xml b/ManagingFileSystemIO.xml

index 3f3142b..16e8bc8 100644 (file)
--- a/ManagingFileSystemIO.xml
+++ b/ManagingFileSystemIO.xml
@@ -33,31 +33,31 @@ xml:id="managingfilesystemio">
  client# lfs df -h
  UUID                       bytes           Used            Available       \
  Use%            Mounted on
-lustre-MDT0000_UUID        4.4G            214.5M          3.9G            \
-4%              /mnt/lustre[MDT:0]
-lustre-OST0000_UUID        2.0G            751.3M          1.1G            \
-37%             /mnt/lustre[OST:0]
-lustre-OST0001_UUID        2.0G            755.3M          1.1G            \
-37%             /mnt/lustre[OST:1]
-lustre-OST0002_UUID        2.0G            1.7G            155.1M          \
-86%             /mnt/lustre[OST:2] &lt;-
-lustre-OST0003_UUID        2.0G            751.3M          1.1G            \
-37%             /mnt/lustre[OST:3]
-lustre-OST0004_UUID        2.0G            747.3M          1.1G            \
-37%             /mnt/lustre[OST:4]
-lustre-OST0005_UUID        2.0G            743.3M          1.1G            \
-36%             /mnt/lustre[OST:5]
+testfs-MDT0000_UUID        4.4G            214.5M          3.9G            \
+4%              /mnt/testfs[MDT:0]
+testfs-OST0000_UUID        2.0G            751.3M          1.1G            \
+37%             /mnt/testfs[OST:0]
+testfs-OST0001_UUID        2.0G            755.3M          1.1G            \
+37%             /mnt/testfs[OST:1]
+testfs-OST0002_UUID        2.0G            1.7G            155.1M          \
+86%             /mnt/testfs[OST:2] ****
+testfs-OST0003_UUID        2.0G            751.3M          1.1G            \
+37%             /mnt/testfs[OST:3]
+testfs-OST0004_UUID        2.0G            747.3M          1.1G            \
+37%             /mnt/testfs[OST:4]
+testfs-OST0005_UUID        2.0G            743.3M          1.1G            \
+36%             /mnt/testfs[OST:5]
   
  filesystem summary:        11.8G           5.4G            5.8G            \
-45%             /mnt/lustre
+45%             /mnt/testfs
  </screen>
-      <para>In this case, OST:2 is almost full and when an attempt is made to
+      <para>In this case, OST0002 is almost full and when an attempt is made to
        write additional information to the file system (even with uniform
        striping over all the OSTs), the write command fails as follows:</para>
        <screen>
-client# lfs setstripe /mnt/lustre 4M 0 -1
-client# dd if=/dev/zero of=/mnt/lustre/test_3 bs=10M count=100
-dd: writing '/mnt/lustre/test_3': No space left on device
+client# lfs setstripe /mnt/testfs 4M 0 -1
+client# dd if=/dev/zero of=/mnt/testfs/test_3 bs=10M count=100
+dd: writing '/mnt/testfs/test_3': No space left on device
  98+0 records in
  97+0 records out
  1017192448 bytes (1.0 GB) copied, 23.2411 seconds, 43.8 MB/s
@@ -92,14 +92,14 @@ mds# lctl dl
  0 UP mgs MGS MGS 9 
  1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-81655dd1e813 5
  2 UP mdt MDS MDS_uuid 3
-3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
-4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 5
-5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
-6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
-7 UP osc lustre-OST0002-osc lustre-mdtlov_UUID 5
-8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
-9 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
-10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5
+3 UP lov testfs-mdtlov testfs-mdtlov_UUID 4
+4 UP mds testfs-MDT0000 testfs-MDT0000_UUID 5
+5 UP osc testfs-OST0000-osc testfs-mdtlov_UUID 5
+6 UP osc testfs-OST0001-osc testfs-mdtlov_UUID 5
+7 UP osc testfs-OST0002-osc testfs-mdtlov_UUID 5
+8 UP osc testfs-OST0003-osc testfs-mdtlov_UUID 5
+9 UP osc testfs-OST0004-osc testfs-mdtlov_UUID 5
+10 UP osc testfs-OST0005-osc testfs-mdtlov_UUID 5
  </screen>
          </listitem>
          <listitem>
@@ -117,14 +117,14 @@ mds# lctl dl
  0 UP mgs MGS MGS 9
  1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-81655dd1e813 5
  2 UP mdt MDS MDS_uuid 3
-3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
-4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 5
-5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
-6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
-7 IN osc lustre-OST0002-osc lustre-mdtlov_UUID 5
-8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
-9 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
-10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5
+3 UP lov testfs-mdtlov testfs-mdtlov_UUID 4
+4 UP mds testfs-MDT0000 testfs-MDT0000_UUID 5
+5 UP osc testfs-OST0000-osc testfs-mdtlov_UUID 5
+6 UP osc testfs-OST0001-osc testfs-mdtlov_UUID 5
+7 IN osc testfs-OST0002-osc testfs-mdtlov_UUID 5
+8 UP osc testfs-OST0003-osc testfs-mdtlov_UUID 5
+9 UP osc testfs-OST0004-osc testfs-mdtlov_UUID 5
+10 UP osc testfs-OST0005-osc testfs-mdtlov_UUID 5
  </screen>
          </listitem>
        </orderedlist>
@@ -148,12 +148,13 @@ mds# lctl dl
          <secondary>full OSTs</secondary>
        </indexterm>Migrating Data within a File System</title>
  
-         <para condition='l28'>Lustre software version 2.8 includes a
-      feature to migrate metadata between MDTs. This migration can only be
-      performed on whole directories. To migrate the contents of
-      <literal>/lustre/testremote</literal> from the current MDT to
-      MDT index 0, the sequence of commands is as follows:</para>
-      <screen>$ cd /lustre
+      <para condition='l28'>Lustre software version 2.8 includes a feature
+      to migrate metadata (directories and inodes therein) between MDTs.
+      This migration can only be performed on whole directories. For example,
+      to migrate the contents of the <literal>/testfs/testremote</literal>
+      directory from the MDT it currently resides on to MDT0000, the
+      sequence of commands is as follows:</para>
+      <screen>$ cd /testfs
  lfs getdirstripe -M ./testremote <lineannotation>which MDT is dir on?</lineannotation>
  1
  $ for i in $(seq 3); do touch ./testremote/${i}.txt; done <lineannotation>create test files</lineannotation>
@@ -169,16 +170,15 @@ $ for i in $(seq 3); do lfs getstripe -M ./testremote/${i}.txt; done <lineannota
  0
  0</screen>
        <para>For more information, see <literal>man lfs</literal></para>
-         <warning><para>Currently, only whole directories can be migrated
+      <warning><para>Currently, only whole directories can be migrated
        between MDTs. During migration each file receives a new identifier
-      (FID). As a consequence, the file receives a new inode number. File
-      system tools (for example, backup and archiving tools) may behave
-      incorrectly with files that are unchanged except for a new inode number.
+      (FID). As a consequence, the file receives a new inode number. Some
+      system tools (for example, backup and archiving tools) may consider
+      the migrated files to be new, even though the contents are unchanged.
        </para></warning>
-      <para>As stripes cannot be moved within the file system, data must be
-      migrated manually by copying and renaming the file, removing the original
-      file, and renaming the new file with the original file name. The simplest
-      way to do this is to use the
+      <para>If there is a need to migrate the file data from the current
+      OST(s) to new OSTs, the data must be migrated (copied) to the new
+      location.  The simplest way to do this is to use the
        <literal>lfs_migrate</literal> command (see
        <xref linkend="dbdoclet.50438206_42260" />). However, the steps for
        migrating a file by hand are also shown here for reference.</para>
@@ -186,25 +186,26 @@ $ for i in $(seq 3); do lfs getstripe -M ./testremote/${i}.txt; done <lineannota
          <listitem>
            <para>Identify the file(s) to be moved.</para>
            <para>In the example below, output from the
-          <literal>getstripe</literal> command indicates that the file
-          <literal>test_2</literal> is located entirely on OST2:</para>
+          <literal>lfs getstripe</literal> command shows that the
+          <literal>test_2</literal> file is located entirely on OST0002:</para>
            <screen>
-client# lfs getstripe /mnt/lustre/test_2
-/mnt/lustre/test_2
+client# lfs getstripe /mnt/testfs/test_2
+/mnt/testfs/test_2
  obdidx     objid   objid   group
       2      8     0x8       0
  </screen>
          </listitem>
          <listitem>
-          <para>To move single object(s), create a new copy and remove the
-          original. Enter:</para>
+          <para>To move the data, create a copy and remove the original:</para>
            <screen>
-client# cp -a /mnt/lustre/test_2 /mnt/lustre/test_2.tmp
-client# mv /mnt/lustre/test_2.tmp /mnt/lustre/test_2
+client# cp -a /mnt/testfs/test_2 /mnt/testfs/test_2.tmp
+client# mv /mnt/testfs/test_2.tmp /mnt/testfs/test_2
  </screen>
          </listitem>
          <listitem>
-          <para>To migrate large files from one or more OSTs, enter:</para>
+          <para>If the space usage of OSTs is severely imbalanced, it is
+          possible to find large files and migrate them from their current
+          location onto OSTs that have more space by running:</para>
            <screen>
  client# lfs find --ost 
  <replaceable>ost_name</replaceable> -size +1G | lfs_migrate -y
@@ -213,31 +214,31 @@ client# lfs find --ost
          <listitem>
            <para>Check the file system balance.</para>
            <para>The 
-          <literal>df</literal> output in the example below shows a more
+          <literal>lfs df</literal> output in the example below shows a more
            balanced system compared to the 
-          <literal>df</literal> output in the example in 
+          <literal>lfs df</literal> output in the example in 
            <xref linkend="dbdoclet.50438211_17536" />.</para>
            <screen>
  client# lfs df -h
  UUID                 bytes         Used            Available       Use%    \
          Mounted on
-lustre-MDT0000_UUID   4.4G         214.5M          3.9G            4%      \
-        /mnt/lustre[MDT:0]
-lustre-OST0000_UUID   2.0G         1.3G            598.1M          65%     \
-        /mnt/lustre[OST:0]
-lustre-OST0001_UUID   2.0G         1.3G            594.1M          65%     \
-        /mnt/lustre[OST:1]
-lustre-OST0002_UUID   2.0G         913.4M          1000.0M         45%     \
-        /mnt/lustre[OST:2]
-lustre-OST0003_UUID   2.0G         1.3G            602.1M          65%     \
-        /mnt/lustre[OST:3]
-lustre-OST0004_UUID   2.0G         1.3G            606.1M          64%     \
-        /mnt/lustre[OST:4]
-lustre-OST0005_UUID   2.0G         1.3G            610.1M          64%     \
-        /mnt/lustre[OST:5]
+testfs-MDT0000_UUID   4.4G         214.5M          3.9G            4%      \
+        /mnt/testfs[MDT:0]
+testfs-OST0000_UUID   2.0G         1.3G            598.1M          65%     \
+        /mnt/testfs[OST:0]
+testfs-OST0001_UUID   2.0G         1.3G            594.1M          65%     \
+        /mnt/testfs[OST:1]
+testfs-OST0002_UUID   2.0G         913.4M          1000.0M         45%     \
+        /mnt/testfs[OST:2]
+testfs-OST0003_UUID   2.0G         1.3G            602.1M          65%     \
+        /mnt/testfs[OST:3]
+testfs-OST0004_UUID   2.0G         1.3G            606.1M          64%     \
+        /mnt/testfs[OST:4]
+testfs-OST0005_UUID   2.0G         1.3G            610.1M          64%     \
+        /mnt/testfs[OST:5]
   
  filesystem summary:  11.8G 7.3G            3.9G    61%                     \
-/mnt/lustre
+/mnt/testfs
  </screen>
          </listitem>
        </orderedlist>
@@ -261,14 +262,14 @@ filesystem summary:  11.8G 7.3G            3.9G    61%                     \
    0 UP mgs MGS MGS 9
    1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-816dd1e813 5
    2 UP mdt MDS MDS_uuid 3
-  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
-  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 5
-  5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
-  6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
-  7 UP osc lustre-OST0002-osc lustre-mdtlov_UUID 5
-  8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
-  9 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
- 10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID
+  3 UP lov testfs-mdtlov testfs-mdtlov_UUID 4
+  4 UP mds testfs-MDT0000 testfs-MDT0000_UUID 5
+  5 UP osc testfs-OST0000-osc testfs-mdtlov_UUID 5
+  6 UP osc testfs-OST0001-osc testfs-mdtlov_UUID 5
+  7 UP osc testfs-OST0002-osc testfs-mdtlov_UUID 5
+  8 UP osc testfs-OST0003-osc testfs-mdtlov_UUID 5
+  9 UP osc testfs-OST0004-osc testfs-mdtlov_UUID 5
+ 10 UP osc testfs-OST0005-osc testfs-mdtlov_UUID
  </screen>
      </section>
    </section>
@@ -397,12 +398,12 @@ mgs# lctl pool_add
        <literal>_UUID</literal> are missing, they are automatically added.</para>
        <para>For example, to add even-numbered OSTs to 
        <literal>pool1</literal> on file system 
-      <literal>lustre</literal>, run a single command (
+      <literal>testfs</literal>, run a single command (
        <literal>pool_add</literal>) to add many OSTs to the pool at one
        time:</para>
        <para>
          <screen>
-lctl pool_add lustre.pool1 OST[0-10/2]
+lctl pool_add testfs.pool1 OST[0-10/2]
  </screen>
        </para>
        <note>
@@ -509,9 +510,9 @@ client# lfs setstripe [--size|-s stripe_size] [--offset|-o start_ost]
        <listitem>
          <para>Add a new OST by passing on the following commands, run:</para>
          <screen>
-oss# mkfs.lustre --fsname=spfs --mgsnode=mds16@tcp0 --ost --index=12 /dev/sda
-oss# mkdir -p /mnt/test/ost12
-oss# mount -t lustre /dev/sda /mnt/test/ost12
+oss# mkfs.lustre --fsname=testfs --mgsnode=mds16@tcp0 --ost --index=12 /dev/sda
+oss# mkdir -p /mnt/testfs/ost12
+oss# mount -t lustre /dev/sda /mnt/testfs/ost12
  </screen>
        </listitem>
        <listitem>
@@ -653,11 +654,11 @@ $ lctl set_param osc.*.checksum_type=
          checksum algorithm is now in use.</para>
          <screen>
  $ lctl get_param osc.*.checksum_type
-osc.lustre-OST0000-osc-ffff81012b2c48e0.checksum_type=crc32 [adler]
+osc.testfs-OST0000-osc-ffff81012b2c48e0.checksum_type=crc32 [adler]
  $ lctl set_param osc.*.checksum_type=crc32
-osc.lustre-OST0000-osc-ffff81012b2c48e0.checksum_type=crc32
+osc.testfs-OST0000-osc-ffff81012b2c48e0.checksum_type=crc32
  $ lctl get_param osc.*.checksum_type
-osc.lustre-OST0000-osc-ffff81012b2c48e0.checksum_type=[crc32] adler
+osc.testfs-OST0000-osc-ffff81012b2c48e0.checksum_type=[crc32] adler
  </screen>
        </section>
      </section>
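Review note: the "check the file system balance" step in this file relies on reading `lfs df` output by eye. A minimal sketch of doing the same check programmatically follows. It assumes the column layout shown in the examples above (UUID, bytes, Used, Available, Use%, mount point) and the manual's backslash line wrapping. The function names and the 80% threshold are hypothetical.

```python
import re

# Two OST rows in the wrapped layout used by the `lfs df -h` example above.
SAMPLE = r"""
testfs-OST0001_UUID        2.0G            755.3M          1.1G            \
37%             /mnt/testfs[OST:1]
testfs-OST0002_UUID        2.0G            1.7G            155.1M          \
86%             /mnt/testfs[OST:2]
"""


def ost_usage(lfs_df_text):
    """Map each OST target name to its Use% as an integer."""
    # Undo the backslash line wrapping so each target sits on one line.
    joined = re.sub(r"\\[ \t]*\n[ \t]*", " ", lfs_df_text)
    usage = {}
    for line in joined.splitlines():
        fields = line.split()
        # Assumed columns: UUID, bytes, Used, Available, Use%, mount point.
        if len(fields) >= 5 and "-OST" in fields[0] and fields[4].endswith("%"):
            name = fields[0]
            if name.endswith("_UUID"):
                name = name[: -len("_UUID")]
            usage[name] = int(fields[4][:-1])
    return usage


def nearly_full(usage, threshold=80):
    """Return OST names at or above the threshold, fullest first."""
    hits = [name for name, pct in usage.items() if pct >= threshold]
    return sorted(hits, key=lambda name: usage[name], reverse=True)


print(nearly_full(ost_usage(SAMPLE)))  # prints ['testfs-OST0002']
```

The names returned by `nearly_full` are candidates for the `lfs find --ost` and `lfs_migrate` step shown earlier in this file.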
diff --git a/SettingUpLustreSystem.xml b/SettingUpLustreSystem.xml

index 6053a21..8c5b26c 100644 (file)
--- a/SettingUpLustreSystem.xml
+++ b/SettingUpLustreSystem.xml
@@ -52,7 +52,7 @@
          <para>Running the MDS and a client on the same machine can cause recovery and deadlock issues and impact the performance of other Lustre clients.</para>
        </listitem>
      </itemizedlist>
-       </warning>
+    </warning>
      <para>Only servers running on 64-bit CPUs are tested and supported. 64-bit CPU clients are
        typically used for testing to match expected customer usage and avoid limitations due to the 4
        GB limit for RAM size, 1 GB low-memory limitation, and 16 TB file size limit of 32-bit CPUs.
@@ -86,14 +86,36 @@
        <para>For maximum performance, the MDT should be configured as RAID1 with an internal journal and two disks from different controllers.</para>
        <para>If you need a larger MDT, create multiple RAID1 devices from pairs of disks, and then make a RAID0 array of the RAID1 devices. This ensures maximum reliability because multiple disk failures only have a small chance of hitting both disks in the same RAID1 device.</para>
        <para>Doing the opposite (RAID1 of a pair of RAID0 devices) has a 50% chance that even two disk failures can cause the loss of the whole MDT device. The first failure disables an entire half of the mirror and the second failure has a 50% chance of disabling the remaining mirror.</para>
-      <para condition='l24'>If multiple MDTs are going to be present in the system, each MDT should be specified for the anticipated usage and load.</para>
-      <warning condition='l24'><para>MDT0 contains the root of the Lustre file system. If MDT0 is unavailable for any reason, the
-          file system cannot be used.</para></warning>
-      <note condition='l24'><para>Additional MDTs can be dedicated to sub-directories off the root file system provided by MDT0.
-          Subsequent directories may also be configured to have their own MDT. If an MDT serving a
-          subdirectory becomes unavailable this subdirectory and all directories beneath it will
-          also become unavailable. Configuring multiple levels of MDTs is an experimental feature
-          for the Lustre software release 2.4.</para></note>
+      <para condition='l24'>If multiple MDTs are going to be present in the
+      system, each MDT should be specified for the anticipated usage and load.
+      For details on how to add additional MDTs to the filesystem, see
+      <xref linkend="dbdoclet.addingamdt"/>.</para>
+      <warning condition='l24'><para>MDT0 contains the root of the Lustre file
+      system. If MDT0 is unavailable for any reason, the file system cannot be
+      used.</para></warning>
+      <note condition='l24'><para>Using the DNE feature it is possible to
+      dedicate additional MDTs to sub-directories off the file system root
+      directory stored on MDT0, or arbitrarily for lower-level subdirectories,
+      using the <literal>lfs mkdir -i <replaceable>mdt_index</replaceable></literal> command.
+      If an MDT serving a subdirectory becomes unavailable, any subdirectories
+      on that MDT and all directories beneath it will also become inaccessible.
+      Configuring multiple levels of MDTs is an experimental feature for the
+      2.4 release, and is fully functional in the 2.8 release.  This is
+      typically useful for top-level directories to assign different users
+      or projects to separate MDTs, or to distribute other large working sets
+      of files to multiple MDTs.</para></note>
+      <note condition='l28'><para>Starting in the 2.8 release it is possible
+      to spread a single large directory across multiple MDTs using the DNE
+      striped directory feature by specifying multiple stripes (or shards)
+      at creation time using the
+      <literal>lfs mkdir -c <replaceable>stripe_count</replaceable></literal>
+      command, where <replaceable>stripe_count</replaceable> is often the
+      number of MDTs in the filesystem.  Striped directories should typically
+      not be used for all directories in the filesystem, since this incurs
+      extra overhead compared to non-striped directories, but is useful for
+      larger directories (over 50k entries) where many output files are being
+      created at one time.
+      </para></note>
      </section>
      <section remap="h3">
        <title><indexterm><primary>setup</primary><secondary>OST</secondary></indexterm>OST Storage Hardware Considerations</title>
@@ -159,24 +181,52 @@
            <primary>space</primary>
            <secondary>determining MDT requirements</secondary>
          </indexterm> Determining MDT Space Requirements</title>
-      <para>When calculating the MDT size, the important factor to consider is the number of files
-        to be stored in the file system. This determines the number of inodes needed, which drives
-        the MDT sizing. To be on the safe side, plan for 2 KB per inode on the MDT, which is the
-        default value. Attached storage required for Lustre file system metadata is typically 1-2
-        percent of the file system capacity depending upon file size.</para>
-      <para>For example, if the average file size is 5 MB and you have 100 TB of usable OST space, then you can calculate the minimum number of inodes as follows:</para>
+      <para>When calculating the MDT size, the important factor to consider
+      is the number of files to be stored in the file system. This determines
+      the number of inodes needed, which drives the MDT sizing. To be on the
+      safe side, plan for 2 KB per ldiskfs inode on the MDT, which is the
+      default value. Attached storage required for Lustre file system metadata
+      is typically 1-2 percent of the file system capacity depending upon
+      file size.</para>
+      <note condition='l24'><para>Starting in release 2.4, using the DNE
+      remote directory feature it is possible to increase the metadata
+      capacity of a single filesystem by configuring additional MDTs into
+      the filesystem; see <xref linkend="dbdoclet.addingamdt"/>.  In order
+      to start creating new files and directories on the new MDT(s) they
+      need to be attached into the namespace at one or more subdirectories
+      using the <literal>lfs mkdir</literal> command.</para></note>
+      <para>For example, if the average file size is 5 MB and you have
+      100 TB of usable OST space, then you can calculate the minimum number
+      of inodes as follows:</para>
        <informalexample>
          <para>(100 TB * 1024 GB/TB * 1024 MB/GB) / 5 MB/inode = 20 million inodes</para>
        </informalexample>
-      <para>We recommend that you use at least twice the minimum number of inodes to allow for future expansion and allow for an average file size smaller than expected. Thus, the required space is:</para>
+      <para>It is recommended that the MDT have at least twice the minimum
+      number of inodes to allow for future expansion and allow for an average
+      file size smaller than expected. Thus, the required space is:</para>
        <informalexample>
-        <para>2 KB/inode * 40 million inodes = 80 GB</para>
+        <para>2 KB/inode x 20 million inodes x 2 = 80 GB</para>
        </informalexample>
-      <para>If the average file size is small, 4 KB for example, the Lustre file system is not very
-        efficient as the MDT uses as much space as the OSTs. However, this is not a common
-        configuration for a Lustre environment.</para>
+      <para>If the average file size is small, 4 KB for example, the Lustre
+      file system is not very efficient as the MDT will use as much space
+      for each file as the space used on the OST. However, this is not a
+      common configuration for a Lustre environment.</para>
        <note>
-        <para>If the MDT is too small, this can cause all the space on the OSTs to be unusable. Be sure to determine the appropriate size of the MDT needed to support the file system before formatting the file system. It is difficult to increase the number of inodes after the file system is formatted.</para>
+        <para>If the MDT is too small, this can cause the space on the OSTs
+        to be inaccessible since no new files can be created. Be sure to
+        determine the appropriate size of the MDT needed to support the file
+        system before formatting the file system. It is possible to increase the
+        number of inodes after the file system is formatted, depending on the
+        storage.  For ldiskfs MDT filesystems the <literal>resize2fs</literal>
+        tool can be used if the underlying block device is on an LVM logical
+        volume.  For ZFS, new (mirrored) VDEVs can be added to the MDT pool.
+        Inodes will be added approximately in proportion to space added.</para>
+      </note>
+      <note condition='l24'><para>It is also possible to increase the number
+        of inodes available, as well as increasing the aggregate metadata
+        performance, by adding additional MDTs using the DNE remote directory
+        feature available in Lustre release 2.4 and later, see
+        <xref linkend="dbdoclet.addingamdt"/>.</para>
        </note>
      </section>
      <section remap="h3">
@@ -523,11 +573,17 @@
                  <para> 10 million files (ldiskfs), 2^48 (ZFS)</para>
                </entry>
                <entry>
-                <para>The Lustre software uses the ldiskfs hashed directory code, which has a limit
-                  of about 10 million files depending on the length of the file name. The limit on
-                  subdirectories is the same as the limit on regular files.</para>
-                <para>Lustre file systems are tested with ten million files in a single
-                  directory.</para>
+                <para>The Lustre software uses the ldiskfs hashed directory
+                code, which has a limit of about 10 million files, depending
+                on the length of the file name. The limit on subdirectories
+                is the same as the limit on regular files.</para>
+                <note condition='l28'><para>Starting in the 2.8 release it is
+                possible to exceed this limit by striping a single directory
+                over multiple MDTs with the <literal>lfs mkdir -c</literal>
+                command, which increases the single directory limit by a
+                factor of the number of directory stripes used.</para></note>
+                <para>Lustre file systems are tested with ten million files
+                in a single directory.</para>
                </entry>
              </row>
              <row>
@@ -536,14 +592,24 @@
                </entry>
                <entry>
                  <para> 4 billion (ldiskfs), 256 trillion (ZFS)</para>
-                <para condition='l24'>4096 times the per-MDT limit</para>
-              </entry>
-              <entry>
-                <para>The ldiskfs file system imposes an upper limit of 4 billion inodes. By default, the MDS file system is formatted with 2KB of space per inode, meaning 1 billion inodes per file system of 2 TB.</para>
-                <para>This can be increased initially, at the time of MDS file system creation. For more information, see <xref linkend="settinguplustresystem"/>.</para>
-                               <para condition="l24">Each additional MDT can hold up to the above maximum number of additional files, depending
-                  on available space and the distribution directories and files in the file
-                  system.</para>
+                <para condition='l24'>up to 256 times the per-MDT limit</para>
+              </entry>
+              <entry>
+                <para>The ldiskfs filesystem imposes an upper limit of
+                4 billion inodes per filesystem. By default, the MDT
+                filesystem is formatted with one inode per 2KB of space,
+                meaning 512 million inodes per TB of MDT space. This can be
+                increased initially at the time of MDT filesystem creation.
+                For more information, see
+                <xref linkend="settinguplustresystem"/>.</para>
+                <para condition="l24">The ZFS filesystem
+                dynamically allocates inodes and does not have a fixed ratio
+                of inodes per unit of MDT space, but consumes approximately
+                4KB of space per inode, depending on the configuration.</para>
+                <para condition="l24">Each additional MDT can hold up to the
+                above maximum number of additional files, depending on
+                available space and the distribution of directories and files
+                in the filesystem.</para>
                </entry>
              </row>
              <row>
@@ -554,7 +620,8 @@
                  <para> 255 bytes (filename)</para>
                </entry>
                <entry>
-                <para>This limit is 255 bytes for a single filename, the same as the limit in the underlying file systems.</para>
+                <para>This limit is 255 bytes for a single filename, the
+                same as the limit in the underlying filesystems.</para>
                </entry>
              </row>
              <row>
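Reviewer note on the MDT sizing example in the SettingUpLustreSystem.xml hunk above: the arithmetic can be checked with a minimal Python sketch. It assumes binary units (1 TB = 1024 GB), the 2 KB-per-inode ldiskfs default, and the 2x headroom factor quoted in the text. The function and parameter names are illustrative only and are not part of any Lustre tool.

```python
# Sketch of the MDT sizing arithmetic described in the manual text.
# All names here are hypothetical helpers, not Lustre APIs.

KB = 1024
MB = 1024 * KB
GB = 1024 * MB
TB = 1024 * GB


def min_inodes(ost_capacity_bytes, avg_file_size_bytes):
    """Minimum inode count: one inode per file of the average size."""
    return ost_capacity_bytes // avg_file_size_bytes


def mdt_space_bytes(inodes, bytes_per_inode=2 * KB, headroom=2):
    """MDT space for the inode count, scaled by a growth multiplier."""
    return inodes * headroom * bytes_per_inode


# 100 TB of usable OST space with a 5 MB average file size.
inodes = min_inodes(100 * TB, 5 * MB)
space = mdt_space_bytes(inodes)

# Inode density implied by one inode per 2 KB of MDT space.
inodes_per_tb = TB // (2 * KB)

print(f"minimum inodes: {inodes:,}")      # minimum inodes: 20,971,520
print(f"MDT space: {space / GB:.0f} GB")  # MDT space: 80 GB
print(f"inodes per TB: {inodes_per_tb:,}")
```

With binary units the "20 million" and "512 million" figures in the text correspond to 20 x 2^20 and 512 x 2^20 inodes respectively, which is why the result comes out to exactly 80 GB.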
diff --git a/UnderstandingLustre.xml b/UnderstandingLustre.xml

index 5332990..a9d831f 100644 (file)
--- a/UnderstandingLustre.xml
+++ b/UnderstandingLustre.xml
@@ -133,7 +133,7 @@ xml:id="understandinglustre">
                  <para>
                    <emphasis>Aggregate:</emphasis>
                  </para>
-                <para>2.5 TB/sec I/O</para>
+                <para>10 TB/sec I/O</para>
                </entry>
                <entry>
                  <para>
@@ -187,7 +187,7 @@ xml:id="understandinglustre">
                  <para>
                    <emphasis>Single OSS:</emphasis>
                  </para>
-                <para>5 GB/sec</para>
+                <para>10 GB/sec</para>
                  <para>
                    <emphasis>Aggregate:</emphasis>
                  </para>
@@ -197,7 +197,7 @@ xml:id="understandinglustre">
                  <para>
                    <emphasis>Single OSS:</emphasis>
                  </para>
-                <para>2.0+ GB/sec</para>
+                <para>6.0+ GB/sec</para>
                  <para>
                    <emphasis>Aggregate:</emphasis>
                  </para>
@@ -489,7 +489,7 @@ xml:id="understandinglustre">
        <itemizedlist>
          <listitem>
            <para>
-          <emphasis role="bold">Metadata Server (MDS)</emphasis>- The MDS makes
+          <emphasis role="bold">Metadata Servers (MDS)</emphasis>- The MDS makes
            metadata stored in one or more MDTs available to Lustre clients. Each
            MDS manages the names and directories in the Lustre file system(s)
            and provides network request handling for one or more local
@@ -497,7 +497,7 @@ xml:id="understandinglustre">
          </listitem>
          <listitem>
            <para>
-          <emphasis role="bold">Metadata Target (MDT</emphasis>) - For Lustre
+          <emphasis role="bold">Metadata Targets (MDT)</emphasis> - For Lustre
            software release 2.3 and earlier, each file system has one MDT. The
            MDT stores metadata (such as filenames, directories, permissions and
            file layout) on storage attached to an MDS. Each file system has one
@@ -506,19 +506,14 @@ xml:id="understandinglustre">
            fails, a standby MDS can serve the MDT and make it available to
            clients. This is referred to as MDS failover.</para>
            <para condition="l24">Since Lustre software release 2.4, multiple
-          MDTs are supported. Each file system has at least one MDT. An MDT on
-          a shared storage target can be available via multiple MDSs, although
-          only one MDS can export the MDT to the clients at one time. Two MDS
-          machines share storage for two or more MDTs. After the failure of one
-          MDS, the remaining MDS begins serving the MDT(s) of the failed
-          MDS.</para>
-          <para condition="l28">Since Lustre software release 2.8,
-          multiple MDTs can be employed to share the inode records for files
-          contained in a single directory. A directory for which inode records
-          are distributed across multiple MDTs is known as a <emphasis>striped
-          directory</emphasis>. In the case of a Lustre filesystem the inode
-          records maybe also be referred to as the 'metadata' portion of the
-          file record.</para>
+          MDTs are supported in the Distributed Namespace Environment (DNE).
+          In addition to the primary MDT that holds the filesystem root, it
+          is possible to add additional MDS nodes, each with their own MDTs,
+          to hold sub-directory trees of the filesystem.</para>
+          <para condition="l28">Since Lustre software release 2.8, DNE also
+          allows the filesystem to distribute files of a single directory over
+          multiple MDTs. A directory which is distributed across multiple
+          MDTs is known as a <emphasis>striped directory</emphasis>.</para>
          </listitem>
          <listitem>
            <para>
@@ -556,6 +551,13 @@ xml:id="understandinglustre">
        Several clients can write to different parts of the same file
        simultaneously, while, at the same time, other clients can read from the
        file.</para>
+      <para>A logical metadata volume (LMV) aggregates the MDCs to provide
+      transparent access across all the MDTs, much as the LOV
+      does for file access.  This allows the client to see the directory tree
+      on multiple MDTs as a single coherent namespace, and striped directories
+      are merged on the clients to form a single visible directory to users
+      and applications.
+      </para>
        <para>
        <xref linkend="understandinglustre.tab.storagerequire" />provides the
        requirements for attached storage for each Lustre file system component
@@ -613,11 +615,11 @@ xml:id="understandinglustre">
                  </para>
                </entry>
                <entry>
-                <para>1-16 TB per OST, 1-8 OSTs per OSS</para>
+                <para>1-128 TB per OST, 1-8 OSTs per OSS</para>
                </entry>
                <entry>
                  <para>Good bus bandwidth. Recommended that storage be balanced
-                evenly across OSSs.</para>
+                evenly across OSSs and matched to network bandwidth.</para>
                </entry>
              </row>
              <row>
@@ -627,7 +629,7 @@ xml:id="understandinglustre">
                  </para>
                </entry>
                <entry>
-                <para>None</para>
+                <para>No local storage needed</para>
                </entry>
                <entry>
                  <para>Low latency, high bandwidth network.</para>
@@ -700,7 +702,7 @@ xml:id="understandinglustre">
      MDTs). This change enabled future support for multiple MDTs (introduced in
      Lustre software release 2.4) and ZFS (introduced in Lustre software release
      2.4).</para>
-    <para>Also introduced in release 2.0 is a feature call
+    <para>Also introduced in release 2.0 is an ldiskfs feature named
      <emphasis role="italic">FID-in-dirent</emphasis>(also known as
      <emphasis role="italic">dirdata</emphasis>) in which the FID is stored as
      part of the name of the file in the parent directory. This feature
@@ -708,13 +710,12 @@ xml:id="understandinglustre">
      <literal>ls</literal> command executions by reducing disk I/O. The
      FID-in-dirent is generated at the time the file is created.</para>
      <note>
-      <para>The FID-in-dirent feature is not compatible with the Lustre
-      software release 1.8 format. Therefore, when an upgrade from Lustre
-      software release 1.8 to a Lustre software release 2.x is performed, the
-      FID-in-dirent feature is not automatically enabled. For upgrades from
-      Lustre software release 1.8 to Lustre software releases 2.0 through 2.3,
-      FID-in-dirent can be enabled manually but only takes effect for new
-      files.</para>
+      <para>The FID-in-dirent feature is not backward compatible with the
+      release 1.8 ldiskfs disk format. Therefore, when an upgrade from
+      release 1.8 to release 2.x is performed, the FID-in-dirent feature is
+      not automatically enabled. For upgrades from release 1.8 to releases
+      2.0 through 2.3, FID-in-dirent can be enabled manually but only takes
+      effect for new files.</para>
        <para>For more information about upgrading from Lustre software release
        1.8 and enabling FID-in-dirent for existing files, see
        <xref xmlns:xlink="http://www.w3.org/1999/xlink"
@@ -739,8 +740,8 @@ xml:id="understandinglustre">
          if it is invalid or missing. The
          <emphasis role="italic">linkEA</emphasis>consists of the file name and
          parent FID. It is stored as an extended attribute in the file
-        itself. Thus, the linkEA can be used to reconstruct the full path name of
-        a file.</para>
+        itself. Thus, the linkEA can be used to reconstruct the full path name
+       of a file.</para>
        </listitem>
      </itemizedlist></para>
      <para>Information about where file data is located on the OST(s) is stored