X-Git-Url: https://git.whamcloud.com/?a=blobdiff_plain;f=UnderstandingLustre.xml;h=ffde1f780da0e3a3955aa76b1675f88ea81fdb03;hb=fcafa5bebef80213b4a1822edd83edc26894eccb;hp=dd95bc5789d3385a1e7ddcf9c7b05d4493f0fb19;hpb=09da8e9464945525cc66087da4446e4ba9958564;p=doc%2Fmanual.git diff --git a/UnderstandingLustre.xml b/UnderstandingLustre.xml index dd95bc5..ffde1f7 100644 --- a/UnderstandingLustre.xml +++ b/UnderstandingLustre.xml @@ -1,87 +1,107 @@ - - - Understanding Lustre Architecture - - This chapter describes the Lustre architecture and features of Lustre. It includes the - following sections: + + + Understanding Lustre + Architecture + This chapter describes the Lustre architecture and features of the + Lustre file system. It includes the following sections: - + - + - +
- <indexterm> - <primary>Lustre</primary> - </indexterm>What a Lustre File System Is (and What It Isn't) - The Lustre architecture is a storage architecture for clusters. The central component of - the Lustre architecture is the Lustre file system, which is supported on the Linux operating - system and provides a POSIX-compliant UNIX file system interface. - The Lustre storage architecture is used for many different kinds of clusters. It is best - known for powering many of the largest high-performance computing (HPC) clusters worldwide, - with tens of thousands of client systems, petabytes (PB) of storage and hundreds of gigabytes - per second (GB/sec) of I/O throughput. Many HPC sites use a Lustre file system as a site-wide - global file system, serving dozens of clusters. - The ability of a Lustre file system to scale capacity and performance for any need reduces - the need to deploy many separate file systems, such as one for each compute cluster. Storage - management is simplified by avoiding the need to copy data between compute clusters. In - addition to aggregating storage capacity of many servers, the I/O throughput is also - aggregated and scales with additional servers. Moreover, throughput and/or capacity can be - easily increased by adding servers dynamically. - While a Lustre file system can function in many work environments, it is not necessarily - the best choice for all applications. It is best suited for uses that exceed the capacity that - a single server can provide, though in some use cases, a Lustre file system can perform better - with a single server than other file systems due to its strong locking and data - coherency. + + <indexterm> + <primary>Lustre</primary> + </indexterm>What a Lustre File System Is (and What It Isn't) + The Lustre architecture is a storage architecture for clusters. The + central component of the Lustre architecture is the Lustre file system, + which is supported on the Linux operating system and provides a POSIX + *standard-compliant UNIX file system + interface. + The Lustre storage architecture is used for many different kinds of + clusters. It is best known for powering many of the largest + high-performance computing (HPC) clusters worldwide, with tens of thousands + of client systems, petabytes (PiB) of storage and hundreds of gigabytes per + second (GB/sec) of I/O throughput. Many HPC sites use a Lustre file system + as a site-wide global file system, serving dozens of clusters. + The ability of a Lustre file system to scale capacity and performance + for any need reduces the need to deploy many separate file systems, such as + one for each compute cluster. Storage management is simplified by avoiding + the need to copy data between compute clusters. In addition to aggregating + storage capacity of many servers, the I/O throughput is also aggregated and + scales with additional servers. Moreover, throughput and/or capacity can be + easily increased by adding servers dynamically. + While a Lustre file system can function in many work environments, it + is not necessarily the best choice for all applications. It is best suited + for uses that exceed the capacity that a single server can provide, though + in some use cases, a Lustre file system can perform better with a single + server than other file systems due to its strong locking and data + coherency. 
A Lustre file system is currently not particularly well suited for - "peer-to-peer" usage models where clients and servers are running on the same node, - each sharing a small amount of storage, due to the lack of Lustre-level data replication. In - such uses, if one client/server fails, then the data stored on that node will not be - accessible until the node is restarted. + "peer-to-peer" usage models where clients and servers are running on the + same node, each sharing a small amount of storage, due to the lack of data + replication at the Lustre software level. In such uses, if one + client/server fails, then the data stored on that node will not be + accessible until the node is restarted.
- <indexterm> - <primary>Lustre</primary> - <secondary>features</secondary> - </indexterm>Lustre Features - Lustre file systems run on a variety of vendor's kernels. For more details, see the - Lustre Support - Matrix on the Intel Lustre community wiki. - A Lustre installation can be scaled up or down with respect to the number of client - nodes, disk storage and bandwidth. Scalability and performance are dependent on available - disk and network bandwidth and the processing power of the servers in the system. A Lustre - file system can be deployed in a wide variety of configurations that can be scaled well - beyond the size and performance observed in production systems to date. - shows the practical range of scalability and - performance characteristics of a Lustre file system and some test results in production - systems. - - Lustre Scalability and Performance + + <indexterm> + <primary>Lustre</primary> + <secondary>features</secondary> + </indexterm>Lustre Features + Lustre file systems run on a variety of vendor's kernels. For more + details, see the Lustre Test Matrix + . + A Lustre installation can be scaled up or down with respect to the + number of client nodes, disk storage and bandwidth. Scalability and + performance are dependent on available disk and network bandwidth and the + processing power of the servers in the system. A Lustre file system can + be deployed in a wide variety of configurations that can be scaled well + beyond the size and performance observed in production systems to + date. + + shows some of the + scalability and performance characteristics of a Lustre file system. + For a full list of Lustre file and filesystem limits see + . +
+ Lustre File System Scalability and Performance - - - + + + - Feature + + Feature + - Current Practical Range + + Current Practical Range + - Tested in Production + + Known Production Usage + @@ -89,360 +109,498 @@ - Client Scalability + Client Scalability + - 100-100000 + 100-100000 - 50000+ clients, many in the 10000 to 20000 range + 50000+ clients, many in the 10000 to 20000 range - Client Performance + + Client Performance + - Single client: + Single client: + I/O 90% of network bandwidth - Aggregate: - 2.5 TB/sec I/O + + Aggregate: + + 10 TB/sec I/O - Single client: - 2 GB/sec I/O, 1000 metadata ops/sec - Aggregate: - 240 GB/sec I/O + Single client: + + 4.5 GB/sec I/O (FDR IB, OPA1), + 1000 metadata ops/sec + + Aggregate: + + 2.5 TB/sec I/O - OSS Scalability + OSS Scalability + - Single OSS: - 1-32 OSTs per OSS, - 128TB per OST + Single OSS: + + 1-32 OSTs per OSS - OSS count: - 500 OSSs, with up to 4000 OSTs + Single OST: + + 300M objects, 256TiB per OST (ldiskfs) + 500M objects, 256TiB per OST (ZFS) + + OSS count: + + 1000 OSSs, with up to 4000 OSTs - Single OSS: - 8 OSTs per OSS, - 16TB per OST + Single OSS: + + 32x 8TiB OSTs per OSS (ldiskfs), + 8x 32TiB OSTs per OSS (ldiskfs) + 1x 72TiB OST per OSS (ZFS) - OSS count: - 450 OSSs with 1000 4TB OSTs - 192 OSSs with 1344 8TB OSTs + OSS count: + + 450 OSSs with 1000 4TiB OSTs + 192 OSSs with 1344 8TiB OSTs + 768 OSSs with 768 72TiB OSTs - OSS Performance + OSS Performance + - Single OSS: - 5 GB/sec + Single OSS: + + 15 GB/sec - Aggregate: - 2.5 TB/sec + Aggregate: + + 10 TB/sec - Single OSS: - 2.0+ GB/sec + Single OSS: + + 10 GB/sec - Aggregate: - 240 GB/sec + Aggregate: + + 2.5 TB/sec - MDS Scalability + MDS Scalability + - Single MDS: - 4 billion files + Single MDS: + + 1-4 MDTs per MDS + + Single MDT: + + 4 billion files, 8TiB per MDT (ldiskfs) + 64 billion files, 64TiB per MDT (ZFS) - MDS count: - 1 primary + 1 backup - Since Lustre* Release 2.4: up to 4096 MDSs and up to 4096 - MDTs. + MDS count: + + 1 primary + 1 standby + 256 MDSs, with up to 256 MDTs - Single MDS: - 750 million files + Single MDS: + + 3 billion files - MDS count: - 1 primary + 1 backup + MDS count: + + 7 MDS with 7 2TiB MDTs in production + 256 MDS with 256 64GiB MDTs in testing - MDS Performance + MDS Performance + - 35000/s create operations, - 100000/s metadata stat operations + 50000/s create operations, + 200000/s metadata stat operations - 15000/s create operations, - 35000/s metadata stat operations + 15000/s create operations, + 50000/s metadata stat operations - File system Scalability + File system Scalability + - Single File: - 2.5 PB max file size + Single File: + + 32 PiB max file size (ldiskfs) + 2^63 bytes (ZFS) - Aggregate: - 512 PB space, 4 billion files + Aggregate: + + 512 PiB space, 1 trillion files - Single File: - multi-TB max file size + Single File: + + multi-TiB max file size - Aggregate: - 10 PB space, 750 million files + Aggregate: + + 55 PiB space, 8 billion files
- Other Lustre features are:
+ Other Lustre software features are:
- Performance-enhanced ext4 file system: The Lustre
- file system uses an improved version of the ext4 journaling file system to store data
- and metadata. This version, called ldiskfs, has been enhanced to improve performance and
- provide additional functionality needed by the Lustre file system.
+ Performance-enhanced ext4 file system: The Lustre file system uses
+ an improved version of the ext4 journaling file system to store data
+ and metadata. This version, called ldiskfs, has been enhanced to
+ improve performance and provide additional functionality needed by the
+ Lustre file system.
- POSIX* compliance: The full POSIX test suite passes
- in an identical manner to a local ext4 filesystem, with limited exceptions on Lustre
- clients. In a cluster, most operations are atomic so that clients never see stale data
- or metadata. The Lustre software supports mmap() file I/O.
+ With the Lustre software release 2.4 and later, it is also possible
+ to use ZFS as the backing filesystem for Lustre for the MDT, OST, and
+ MGS storage. This allows Lustre to leverage the scalability and data
+ integrity features of ZFS for individual storage targets.
- High-performance heterogeneous networking: The
- Lustre software supports a variety of high performance, low latency networks and permits
- Remote Direct Memory Access (RDMA) for Infiniband* (OFED) and other advanced networks
- for fast and efficient network transport. Multiple RDMA networks can be bridged using
- Lustre routing for maximum performance. The Lustre software also includes integrated
- network diagnostics.
+ POSIX standard compliance: The full POSIX test suite passes in an
+ identical manner to a local ext4 file system, with limited exceptions
+ on Lustre clients. In a cluster, most operations are atomic so that
+ clients never see stale data or metadata. The Lustre software supports
+ mmap() file I/O.
- High-availability: The Lustre file system supports
- active/active failover using shared storage partitions for OSS targets (OSTs). Lustre
- Release 2.3 and earlier releases offer active/passive failover using a shared storage
- partition for the MDS target (MDT).
- With Lustre Release 2.4 or later servers and clients it is possible
- to configure active/active failover of multiple MDTs. This allows application
- transparent recovery. The Lustre file system can work with a variety of high
- availability (HA) managers to allow automated failover and has no single point of
- failure (NSPF). Multiple mount protection (MMP) provides integrated protection from
- errors in highly-available systems that would otherwise cause file system
- corruption.
+ High-performance heterogeneous networking: The Lustre software
+ supports a variety of high performance, low latency networks and
+ permits Remote Direct Memory Access (RDMA) for InfiniBand* (utilizing
+ OpenFabrics Enterprise Distribution (OFED*)), Intel OmniPath®, and
+ other advanced networks for fast and efficient network transport.
+ Multiple RDMA networks can be bridged using Lustre routing for maximum
+ performance. The Lustre software also includes integrated network
+ diagnostics.
- Security: By default TCP connections are only
- allowed from privileged ports. UNIX group membership is verified on the MDS.
+ High-availability: The Lustre file system supports active/active
+ failover using shared storage partitions for OSS targets (OSTs).
Lustre software release 2.3 and + earlier releases offer active/passive failover using a shared storage + partition for the MDS target (MDT). The Lustre file system can work + with a variety of high availability (HA) managers to allow automated + failover and has no single point of failure (NSPF). This allows + application transparent recovery. Multiple mount protection (MMP) + provides integrated protection from errors in highly-available + systems that would otherwise cause file system corruption. - Access control list (ACL), extended attributes: the - Lustre security model follows that of a UNIX file system, enhanced with POSIX ACLs. - Noteworthy additional features include root squash. + With Lustre software release 2.4 or later + servers and clients it is possible to configure active/active + failover of multiple MDTs. This allows scaling the metadata + performance of Lustre filesystems with the addition of MDT storage + devices and MDS nodes. - Interoperability: The Lustre file system runs on a - variety of CPU architectures and mixed-endian clusters and is interoperable between - successive major Lustre software releases. + + Security:By default TCP connections + are only allowed from privileged ports. UNIX group membership is + verified on the MDS. - Object-based architecture: Clients are isolated - from the on-disk file structure enabling upgrading of the storage architecture without - affecting the client. + + Access control list (ACL), extended + attributes:the Lustre security model follows that of a + UNIX file system, enhanced with POSIX ACLs. Noteworthy additional + features include root squash. - Byte-granular file and fine-grained metadata - locking: Many clients can read and modify the same file or directory - concurrently. The Lustre distributed lock manager (LDLM) ensures that files are coherent - between all clients and servers in the file system. The MDT LDLM manages locks on inode - permissions and pathnames. Each OST has its own LDLM for locks on file stripes stored - thereon, which scales the locking performance as the file system grows. + + Interoperability:The Lustre file + system runs on a variety of CPU architectures and mixed-endian + clusters and is interoperable between successive major Lustre + software releases. - Quotas: User and group quotas are available for a - Lustre file system. + + Object-based architecture:Clients + are isolated from the on-disk file structure enabling upgrading of + the storage architecture without affecting the client. - Capacity growth: The size of a Lustre file system - and aggregate cluster bandwidth can be increased without interruption by adding a new - OSS with OSTs to the cluster. + + Byte-granular file and fine-grained metadata + locking:Many clients can read and modify the same file or + directory concurrently. The Lustre distributed lock manager (LDLM) + ensures that files are coherent between all clients and servers in + the file system. The MDT LDLM manages locks on inode permissions and + pathnames. Each OST has its own LDLM for locks on file stripes stored + thereon, which scales the locking performance as the file system + grows. - Controlled striping: The layout of files across - OSTs can be configured on a per file, per directory, or per file system basis. This - allows file I/O to be tuned to specific application requirements within a single file - system. The Lustre file system uses RAID-0 striping and balances space usage across - OSTs. + + Quotas:User and group quotas are + available for a Lustre file system. 
- Network data integrity protection: A checksum of - all data sent from the client to the OSS protects against corruption during data - transfer. + + Capacity growth:The size of a Lustre + file system and aggregate cluster bandwidth can be increased without + interruption by adding new OSTs and MDTs to the cluster. - MPI I/O: The Lustre architecture has a dedicated - MPI ADIO layer that optimizes parallel I/O to match the underlying file system - architecture. + + Controlled file layout:The layout of + files across OSTs can be configured on a per file, per directory, or + per file system basis. This allows file I/O to be tuned to specific + application requirements within a single file system. The Lustre file + system uses RAID-0 striping and balances space usage across + OSTs. - NFS and CIFS export: Lustre files can be re-exported using NFS (via Linux knfsd) or CIFS (via Samba) enabling them to be shared with non-Linux clients, such as Microsoft* Windows* and Apple* Mac OS X*. + + Network data integrity protection:A + checksum of all data sent from the client to the OSS protects against + corruption during data transfer. - Disaster recovery tool: The Lustre file system - provides a distributed file system check (lfsck) that can restore consistency between - storage components in case of a major file system error. A Lustre file system can - operate even in the presence of file system inconsistencies, so lfsck is not required - before returning the file system to production. + + MPI I/O:The Lustre architecture has + a dedicated MPI ADIO layer that optimizes parallel I/O to match the + underlying file system architecture. - Performance monitoring: The Lustre file system - offers a variety of mechanisms to examine performance and tuning. + + NFS and CIFS export:Lustre files can + be re-exported using NFS (via Linux knfsd or Ganesha) or CIFS (via + Samba), enabling them to be shared with non-Linux clients such as + Microsoft*Windows, + *Apple + *Mac OS X + *, and others. - Open source: The Lustre software is licensed under - the GPL 2.0 license for use with Linux. + + Disaster recovery tool:The Lustre + file system provides an online distributed file system check (LFSCK) + that can restore consistency between storage components in case of a + major file system error. A Lustre file system can operate even in the + presence of file system inconsistencies, and LFSCK can run while the + filesystem is in use, so LFSCK is not required to complete before + returning the file system to production. + + + + Performance monitoring:The Lustre + file system offers a variety of mechanisms to examine performance and + tuning. + + + + Open source:The Lustre software is + licensed under the GPL 2.0 license for use with the Linux operating + system.
- <indexterm> - <primary>Lustre</primary> - <secondary>components</secondary> - </indexterm>Lustre Components - An installation of the Lustre software includes a management server (MGS) and one or more - Lustre file systems interconnected with Lustre networking (LNET). - A basic configuration of Lustre components is shown in . -
- Lustre* components in a basic cluster + + <indexterm> + <primary>Lustre</primary> + <secondary>components</secondary> + </indexterm>Lustre Components + An installation of the Lustre software includes a management server + (MGS) and one or more Lustre file systems interconnected with Lustre + networking (LNet). + A basic configuration of Lustre file system components is shown in + . +
+ Lustre file system components in a basic cluster - + - Lustre* components in a basic cluster + Lustre file system components in a basic cluster
- <indexterm> - <primary>Lustre</primary> - <secondary>MGS</secondary> - </indexterm>Management Server (MGS) - The MGS stores configuration information for all the Lustre file systems in a cluster - and provides this information to other Lustre components. Each Lustre target contacts the - MGS to provide information, and Lustre clients contact the MGS to retrieve - information. - It is preferable that the MGS have its own storage space so that it can be managed - independently. However, the MGS can be co-located and share storage space with an MDS as - shown in . + + <indexterm> + <primary>Lustre</primary> + <secondary>MGS</secondary> + </indexterm>Management Server (MGS) + The MGS stores configuration information for all the Lustre file + systems in a cluster and provides this information to other Lustre + components. Each Lustre target contacts the MGS to provide information, + and Lustre clients contact the MGS to retrieve information. + It is preferable that the MGS have its own storage space so that it + can be managed independently. However, the MGS can be co-located and + share storage space with an MDS as shown in + .
Lustre File System Components - Each Lustre file system consists of the following components: + Each Lustre file system consists of the following + components: - Metadata Server (MDS) - The MDS makes metadata - stored in one or more MDTs available to Lustre clients. Each MDS manages the names and - directories in the Lustre file system(s) and provides network request handling for one - or more local MDTs. + + Metadata Servers (MDS)- The MDS makes + metadata stored in one or more MDTs available to Lustre clients. Each + MDS manages the names and directories in the Lustre file system(s) + and provides network request handling for one or more local + MDTs. - Metadata Target (MDT ) - For Lustre Release 2.3 and - earlier, each file system has one MDT. The MDT stores metadata (such as filenames, - directories, permissions and file layout) on storage attached to an MDS. Each file - system has one MDT. An MDT on a shared storage target can be available to multiple MDSs, - although only one can access it at a time. If an active MDS fails, a standby MDS can - serve the MDT and make it available to clients. This is referred to as MDS - failover. - Since Lustre Release 2.4, multiple MDTs are supported. Each file - system has at least one MDT. An MDT on a shared storage target can be available via - multiple MDSs, although only one MDS can export the MDT to the clients at one time. Two - MDS machines share storage for two or more MDTs. After the failure of one MDS, the - remaining MDS begins serving the MDT(s) of the failed MDS. + + Metadata Targets (MDT) - For Lustre + software release 2.3 and earlier, each file system has one MDT. The + MDT stores metadata (such as filenames, directories, permissions and + file layout) on storage attached to an MDS. Each file system has one + MDT. An MDT on a shared storage target can be available to multiple + MDSs, although only one can access it at a time. If an active MDS + fails, a standby MDS can serve the MDT and make it available to + clients. This is referred to as MDS failover. + Since Lustre software release 2.4, multiple + MDTs are supported in the Distributed Namespace Environment (DNE). + In addition to the primary MDT that holds the filesystem root, it + is possible to add additional MDS nodes, each with their own MDTs, + to hold sub-directory trees of the filesystem. + Since Lustre software release 2.8, DNE also + allows the filesystem to distribute files of a single directory over + multiple MDT nodes. A directory which is distributed across multiple + MDTs is known as a striped directory. - Object Storage Servers (OSS) : The OSS provides - file I/O service and network request handling for one or more local OSTs. Typically, an - OSS serves between two and eight OSTs, up to 16 TB each. A typical configuration is an - MDT on a dedicated node, two or more OSTs on each OSS node, and a client on each of a - large number of compute nodes. + + Object Storage Servers (OSS): The + OSS provides file I/O service and network request handling for one or + more local OSTs. Typically, an OSS serves between two and eight OSTs, + up to 16 TiB each. A typical configuration is an MDT on a dedicated + node, two or more OSTs on each OSS node, and a client on each of a + large number of compute nodes. - Object Storage Target (OST) : User file data is - stored in one or more objects, each object on a separate OST in a Lustre file system. - The number of objects per file is configurable by the user and can be tuned to optimize - performance for a given workload. 
+ + Object Storage Target (OST): User + file data is stored in one or more objects, each object on a separate + OST in a Lustre file system. The number of objects per file is + configurable by the user and can be tuned to optimize performance for + a given workload. - Lustre clients : Lustre clients are computational, - visualization or desktop nodes that are running Lustre client software, allowing them to - mount the Lustre file system. + + Lustre clients: Lustre clients are + computational, visualization or desktop nodes that are running Lustre + client software, allowing them to mount the Lustre file + system. - The Lustre client software provides an interface between the Linux virtual file system - and the Lustre servers. The client software includes a management client (MGC), a metadata - client (MDC), and multiple object storage clients (OSCs), one corresponding to each OST in - the file system. - A logical object volume (LOV) aggregates the OSCs to provide transparent access across - all the OSTs. Thus, a client with the Lustre file system mounted sees a single, coherent, - synchronized namespace. Several clients can write to different parts of the same file - simultaneously, while, at the same time, other clients can read from the file. - provides the requirements for - attached storage for each Lustre file system component and describes desirable - characteristics of the hardware used. - - <indexterm> - <primary>Lustre</primary> - <secondary>requirements</secondary> - </indexterm>Storage and hardware requirements for Lustre* components + The Lustre client software provides an interface between the Linux + virtual file system and the Lustre servers. The client software includes + a management client (MGC), a metadata client (MDC), and multiple object + storage clients (OSCs), one corresponding to each OST in the file + system. + A logical object volume (LOV) aggregates the OSCs to provide + transparent access across all the OSTs. Thus, a client with the Lustre + file system mounted sees a single, coherent, synchronized namespace. + Several clients can write to different parts of the same file + simultaneously, while, at the same time, other clients can read from the + file. + A logical metadata volume (LMV) aggregates the MDCs to provide + transparent access across all the MDTs in a similar manner as the LOV + does for file access. This allows the client to see the directory tree + on multiple MDTs as a single coherent namespace, and striped directories + are merged on the clients to form a single visible directory to users + and applications. + + + provides the + requirements for attached storage for each Lustre file system component + and describes desirable characteristics of the hardware used. +
+ + <indexterm> + <primary>Lustre</primary> + <secondary>requirements</secondary> + </indexterm>Storage and hardware requirements for Lustre file system + components - - - + + + - + + + - Required attached storage + + Required attached storage + - Desirable hardware characteristics + + Desirable hardware + characteristics + @@ -450,217 +608,307 @@ - MDSs + MDSs + - 1-2% of file system capacity + 1-2% of file system capacity - Adequate CPU power, plenty of memory, fast disk storage. + Adequate CPU power, plenty of memory, fast disk + storage. - OSSs + OSSs + - 1-16 TB per OST, 1-8 OSTs per OSS + 1-128 TiB per OST, 1-8 OSTs per OSS - Good bus bandwidth. Recommended that storage be balanced evenly across - OSSs. + Good bus bandwidth. Recommended that storage be balanced + evenly across OSSs and matched to network bandwidth. - Clients + Clients + - None + No local storage needed - Low latency, high bandwidth network. + Low latency, high bandwidth network.
- For additional hardware requirements and considerations, see . + For additional hardware requirements and considerations, see + .
- <indexterm> - <primary>Lustre</primary> - <secondary>LNET</secondary> - </indexterm>Lustre Networking (LNET) - Lustre Networking (LNET) is a custom networking API that provides the communication - infrastructure that handles metadata and file I/O data for the Lustre file system servers - and clients. For more information about LNET, see . + + <indexterm> + <primary>Lustre</primary> + <secondary>LNet</secondary> + </indexterm>Lustre Networking (LNet) + Lustre Networking (LNet) is a custom networking API that provides + the communication infrastructure that handles metadata and file I/O data + for the Lustre file system servers and clients. For more information + about LNet, see + .
- <indexterm> + <title> + <indexterm> + <primary>Lustre</primary> + <secondary>cluster</secondary> + </indexterm>Lustre Cluster + At scale, a Lustre file system cluster can include hundreds of OSSs + and thousands of clients (see + ). More than one + type of network can be used in a Lustre cluster. Shared storage between + OSSs enables failover capability. For more details about OSS failover, + see + . +
+ + <indexterm> <primary>Lustre</primary> - <secondary>cluster</secondary> - </indexterm>Lustre Cluster - At scale, the Lustre cluster can include hundreds of OSSs and thousands of clients (see - ). More than one type of network can - be used in a Lustre cluster. Shared storage between OSSs enables failover capability. For - more details about OSS failover, see . -
- <indexterm> - <primary>Lustre</primary> - <secondary>at scale</secondary> - </indexterm>Lustre* cluster at scale + at scale + Lustre cluster at scale - + - Lustre* clustre at scale + Lustre file system cluster at scale
- <indexterm> - <primary>Lustre</primary> - <secondary>storage</secondary> - </indexterm> - <indexterm> - <primary>Lustre</primary> - <secondary>I/O</secondary> - </indexterm> Lustre Storage and I/O - In a Lustre file system, a file stored on the MDT points to one or more objects associated - with a data file, as shown in . Each object - contains data and is stored on an OST. If the MDT file points to one object, all the file data - is stored in that object. If the file points to more than one object, the file data is - 'striped' across the objects (using RAID 0) and each object is stored on a different - OST. (For more information about how striping is implemented in a Lustre file system, see - ) - In , each filename points to an inode. The - inode contains all of the file attributes, such as owner, access permissions, Lustre striping - layout, access time, and access control. Multiple filenames may point to the same - inode. -
- MDT file points to objects on OSTs containing
- file data
+ <indexterm>
+   <primary>Lustre</primary>
+   <secondary>storage</secondary>
+ </indexterm>
+ <indexterm>
+   <primary>Lustre</primary>
+   <secondary>I/O</secondary>
+ </indexterm>Lustre File System Storage and I/O
+ In Lustre software release 2.0, Lustre file identifiers (FIDs) were
+ introduced to replace UNIX inode numbers for identifying files or objects.
+ A FID is a 128-bit identifier that contains a unique 64-bit sequence
+ number, a 32-bit object ID (OID), and a 32-bit version number. The sequence
+ number is unique across all Lustre targets in a file system (OSTs and
+ MDTs). This change enabled future support for multiple MDTs (introduced in
+ Lustre software release 2.4) and ZFS (introduced in Lustre software release
+ 2.4).
+ Also introduced in release 2.0 is an ldiskfs feature named
+ FID-in-dirent (also known as dirdata) in which the FID is stored as
+ part of the name of the file in the parent directory. This feature
+ significantly improves performance for ls command executions by reducing
+ disk I/O. The FID-in-dirent is generated at the time the file is created.
+ The FID-in-dirent feature is not backward compatible with the
+ release 1.8 ldiskfs disk format. Therefore, when an upgrade from
+ release 1.8 to release 2.x is performed, the FID-in-dirent feature is
+ not automatically enabled. For upgrades from release 1.8 to releases
+ 2.0 through 2.3, FID-in-dirent can be enabled manually but only takes
+ effect for new files.
+ For more information about upgrading from Lustre software release
+ 1.8 and enabling FID-in-dirent for existing files, see
+ Chapter 16 “Upgrading a Lustre File System”.
+ The LFSCK file system consistency checking tool
+ released with Lustre software release 2.4 provides functionality that
+ enables FID-in-dirent for existing files. It includes the following
+ functionality:
+ Generates IGIF mode FIDs for existing files from a release 1.8
+ file system.
+ Verifies the FID-in-dirent for each file and regenerates the
+ FID-in-dirent if it is invalid or missing.
+ Verifies the linkEA entry for each file and regenerates the linkEA
+ if it is invalid or missing. The linkEA consists of the file name and
+ parent FID. It is stored as an extended attribute in the file
+ itself. Thus, the linkEA can be used to reconstruct the full path name
+ of a file.
+ Information about where file data is located on the OST(s) is stored
+ as an extended attribute called layout EA in an MDT object identified by
+ the FID for the file (see
+ ). If the file is a regular file (not a
+ directory or symbolic link), the MDT object points to 1-to-N OST object(s)
+ on the OST(s) that contain the file data. If the MDT file points to one
+ object, all the file data is stored in that object. If the MDT file points
+ to more than one object, the file data is
+ striped across the objects using RAID 0,
+ and each object is stored on a different OST. (For more information about
+ how striping is implemented in a Lustre file system, see
+ .)
+ Layout EA on MDT pointing to file data on OSTs - + - MDT file points to objects on OSTs containing file data + Layout EA on MDT pointing to file data on OSTs
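To make the FID layout described above concrete, here is a minimal Python sketch that packs and unpacks the three fields of a FID (64-bit sequence number, 32-bit object ID, 32-bit version). It illustrates the bit layout only; the class and method names are hypothetical and are not taken from the Lustre sources.

from dataclasses import dataclass

@dataclass(frozen=True)
class Fid:
    # Hypothetical helper: models the 128-bit FID described in this section.
    seq: int  # 64-bit sequence number, unique across all targets (MDTs and OSTs)
    oid: int  # 32-bit object ID within that sequence
    ver: int  # 32-bit version number

    def pack(self) -> int:
        # Pack the three fields into a single 128-bit integer.
        return (self.seq << 64) | (self.oid << 32) | self.ver

    @staticmethod
    def unpack(value: int) -> "Fid":
        # Reverse of pack(): split a 128-bit integer back into its fields.
        return Fid(seq=value >> 64,
                   oid=(value >> 32) & 0xFFFFFFFF,
                   ver=value & 0xFFFFFFFF)

fid = Fid(seq=0x200000401, oid=0x1, ver=0x0)
assert Fid.unpack(fid.pack()) == fid
print(f"[{fid.seq:#x}:{fid.oid:#x}:{fid.ver:#x}]")   # [0x200000401:0x1:0x0]

The three-part hexadecimal form printed at the end mirrors the way FIDs are conventionally written.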
- When a client opens a file, the fileopen operation transfers the file - layout from the MDS to the client. The client then uses this information to perform I/O on the - file, directly interacting with the OSS nodes where the objects are stored. This process is - illustrated in . -
- File open and file I/O in Lustre* + When a client wants to read from or write to a file, it first fetches + the layout EA from the MDT object for the file. The client then uses this + information to perform I/O on the file, directly interacting with the OSS + nodes where the objects are stored. + + This process is illustrated in + + . +
+ Lustre client requesting file data - + - File open and file I/O in Lustre* + Lustre client requesting file data
- Each file on the MDT contains the layout of the associated data file, including the OST - number and object identifier. Clients request the file layout from the MDS and then perform - file I/O operations by communicating directly with the OSSs that manage that file data. - The available bandwidth of a Lustre file system is determined as follows: + The available bandwidth of a Lustre file system is determined as + follows: - The network bandwidth equals the aggregated bandwidth of the OSSs - to the targets. + The + network bandwidth equals the aggregated bandwidth + of the OSSs to the targets. - The disk bandwidth equals the sum of the disk bandwidths of the - storage targets (OSTs) up to the limit of the network bandwidth. + The + disk bandwidth equals the sum of the disk + bandwidths of the storage targets (OSTs) up to the limit of the network + bandwidth. - The aggregate bandwidth equals the minimum of the disk bandwidth - and the network bandwidth. + The + aggregate bandwidth equals the minimum of the disk + bandwidth and the network bandwidth. - The available file system space equals the sum of the available - space of all the OSTs. + The + available file system space equals the sum of the + available space of all the OSTs.
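As a worked example of the rules above, the following short Python sketch computes the aggregate bandwidth and available space for a small, entirely hypothetical configuration (the hardware numbers are invented for illustration only):

# Invented numbers for illustration: 4 OSSs, each with a 12.5 GB/s network
# link and 8 OSTs; every OST sustains 2 GB/s and holds 100 TiB.
num_oss = 4
oss_network_gbps = 12.5
osts_per_oss = 8
ost_disk_gbps = 2.0
ost_space_tib = 100

network_bandwidth = num_oss * oss_network_gbps                # 50 GB/s
disk_bandwidth = num_oss * osts_per_oss * ost_disk_gbps       # 64 GB/s
aggregate_bandwidth = min(network_bandwidth, disk_bandwidth)  # 50 GB/s
available_space = num_oss * osts_per_oss * ost_space_tib      # 3200 TiB

print(aggregate_bandwidth, "GB/s,", available_space, "TiB")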
- <indexterm> - <primary>Lustre</primary> - <secondary>striping</secondary> - </indexterm> - <indexterm> - <primary>striping</primary> - <secondary>overview</secondary> - </indexterm> Lustre File System and Striping - One of the main factors leading to the high performance of Lustre file systems is the - ability to stripe data across multiple OSTs in a round-robin fashion. Users can optionally - configure for each file the number of stripes, stripe size, and OSTs that are used. - Striping can be used to improve performance when the aggregate bandwidth to a single - file exceeds the bandwidth of a single OST. The ability to stripe is also useful when a - single OST does not have enough free space to hold an entire file. For more information - about benefits and drawbacks of file striping, see . - Striping allows segments or 'chunks' of data in a file to be stored on - different OSTs, as shown in . In the - Lustre file system, a RAID 0 pattern is used in which data is "striped" across a - certain number of objects. The number of objects in a single file is called the - stripe_count. - Each object contains a chunk of data from the file. When the chunk of data being written - to a particular object exceeds the stripe_size, the next chunk of data in - the file is stored on the next object. - Default values for stripe_count and stripe_size - are set for the file system. The default value for stripe_count is 1 - stripe for file and the default value for stripe_size is 1MB. The user - may change these values on a per directory or per file basis. For more details, see . - , the stripe_size - for File C is larger than the stripe_size for File A, allowing more data - to be stored in a single stripe for File C. The stripe_count for File A - is 3, resulting in data striped across three objects, while the - stripe_count for File B and File C is 1. - No space is reserved on the OST for unwritten data. File A in . -
- File striping on a Lustre* file - system + + Lustre + striping + + + striping + overview + Lustre File System and Striping + One of the main factors leading to the high performance of Lustre + file systems is the ability to stripe data across multiple OSTs in a + round-robin fashion. Users can optionally configure for each file the + number of stripes, stripe size, and OSTs that are used. + Striping can be used to improve performance when the aggregate + bandwidth to a single file exceeds the bandwidth of a single OST. The + ability to stripe is also useful when a single OST does not have enough + free space to hold an entire file. For more information about benefits + and drawbacks of file striping, see + . + Striping allows segments or 'chunks' of data in a file to be stored + on different OSTs, as shown in + . In the Lustre file + system, a RAID 0 pattern is used in which data is "striped" across a + certain number of objects. The number of objects in a single file is + called the + stripe_count. + Each object contains a chunk of data from the file. When the chunk + of data being written to a particular object exceeds the + stripe_size, the next chunk of data in the file is + stored on the next object. + Default values for + stripe_count and + stripe_size are set for the file system. The default + value for + stripe_count is 1 stripe for file and the default value + for + stripe_size is 1MB. The user may change these values on + a per directory or per file basis. For more details, see + . + + , the + stripe_size for File C is larger than the + stripe_size for File A, allowing more data to be stored + in a single stripe for File C. The + stripe_count for File A is 3, resulting in data striped + across three objects, while the + stripe_count for File B and File C is 1. + No space is reserved on the OST for unwritten data. File A in + . +
+ File striping on a + Lustre file system - + - File striping pattern across three OSTs for three different data files. The file - is sparse and missing chunk 6. + File striping pattern across three OSTs for three different + data files. The file is sparse and missing chunk 6.
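The RAID 0 mapping described in this section can be sketched in a few lines of Python. This is only an illustration of the striping arithmetic, not Lustre code; the function name and the assumption of a 1 MiB stripe_size (the stated default) are for the example only.

MiB = 1 << 20

def chunk_location(offset, stripe_count, stripe_size=MiB):
    # Hypothetical helper: RAID 0 mapping of a file offset to the object
    # (stripe) holding it and the offset inside that object.
    chunk_index = offset // stripe_size           # which chunk of the file
    object_index = chunk_index % stripe_count     # which object stores it
    stripe_round = chunk_index // stripe_count    # completed round-robin passes
    object_offset = stripe_round * stripe_size + offset % stripe_size
    return object_index, object_offset

# Six consecutive 1 MiB chunks of a file with stripe_count=3 (like File A):
for chunk in range(6):
    obj, off = chunk_location(chunk * MiB, stripe_count=3)
    print(f"chunk {chunk + 1} -> object {obj}, object offset {off // MiB} MiB")

For a stripe_count of 3, the chunks rotate across objects 0, 1 and 2 and then wrap back to object 0 at the next stripe_size offset within each object, which matches the round-robin pattern described for File A above.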
- The maximum file size is not limited by the size of a single target. In a Lustre file
- system, files can be striped across multiple objects (up to 2000), and each object can be
- up to 16 TB in size with ldiskfs. This leads to a maximum file size of 31.25 PB. (Note that
- a Lustre file system can support files up to 2^64 bytes depending on the backing storage
- used by OSTs.)
+ The maximum file size is not limited by the size of a single
+ target. In a Lustre file system, files can be striped across multiple
+ objects (up to 2000), and each object can be up to 16 TiB in size with
+ ldiskfs, or up to 256 PiB with ZFS. This leads to a maximum file size of
+ 31.25 PiB for ldiskfs or 8 EiB with ZFS. Note that a Lustre file system can
+ support files up to 2^63 bytes (8 EiB), limited only by the space available
+ on the OSTs.
- Versions of the Lustre software prior to Release 2.2 limited the maximum stripe count
- for a single file to 160 OSTs.
+ Versions of the Lustre software prior to Release 2.2 limited the
+ maximum stripe count for a single file to 160 OSTs.
- Although a single file can only be striped over 2000 objects, Lustre file systems can
- have thousands of OSTs. The I/O bandwidth to access a single file is the aggregated I/O
- bandwidth to the objects in a file, which can be as much as a bandwidth of up to 2000
- servers. On systems with more than 2000 OSTs, clients can do I/O using multiple files to
- utilize the full file system bandwidth.
- For more information about striping, see .
+ Although a single file can only be striped over 2000 objects,
+ Lustre file systems can have thousands of OSTs. The I/O bandwidth to
+ access a single file is the aggregated I/O bandwidth to the objects in a
+ file, which can be as much as a bandwidth of up to 2000 servers. On
+ systems with more than 2000 OSTs, clients can do I/O using multiple files
+ to utilize the full file system bandwidth.
+ For more information about striping, see
+ .
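The maximum-file-size figures quoted above follow from simple arithmetic on the per-file object limit and the per-object size limit; a quick Python check, assuming the 2000-object and 16 TiB ldiskfs limits stated in this section:

TiB = 1 << 40
PiB = 1 << 50
EiB = 1 << 60

max_objects_per_file = 2000        # maximum stripe count for one file
max_object_size = 16 * TiB         # per-object limit with ldiskfs

print(max_objects_per_file * max_object_size / PiB)  # 31.25 (PiB, ldiskfs)
print(2 ** 63 / EiB)                                  # 8.0 (EiB upper bound)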