LUDOC-321 style: ensure ID attributes are unique.

diff --git a/SettingUpLustreSystem.xml b/SettingUpLustreSystem.xml

index 72396c3..9e64b6c 100644
--- a/SettingUpLustreSystem.xml
+++ b/SettingUpLustreSystem.xml
@@ -1,7 +1,9 @@
  <?xml version='1.0' encoding='UTF-8'?>
  <chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="settinguplustresystem">
-  <title xml:id="settinguplustresystem.title">Setting Up a Lustre File System</title>
-  <para>This chapter describes hardware configuration requirements for a Lustre file system including:</para>
+  <title xml:id="settinguplustresystem.title">Determining Hardware Configuration Requirements and
+    Formatting Options</title>
+  <para>This chapter describes hardware configuration requirements for a Lustre file system
+    including:</para>
    <itemizedlist>
      <listitem>
        <para>
@@ -34,74 +36,164 @@
    <indexterm><primary>setup</primary><secondary>hardware</secondary></indexterm>        
    <indexterm><primary>design</primary><see>setup</see></indexterm>        
            Hardware Considerations</title>
-    <para>Lustre can work with any kind of block storage device such as single disks, software RAID, hardware RAID, or a logical volume manager. In contrast to some networked file systems, the block devices are only attached to the MDS and OSS nodes in Lustre and are not accessed by the clients directly.</para>
+    <para>A Lustre file system can utilize any kind of block storage device such as single disks,
+      software RAID, hardware RAID, or a logical volume manager. In contrast to some networked file
+      systems, the block devices are only attached to the MDS and OSS nodes in a Lustre file system
+      and are not accessed by the clients directly.</para>
      <para>Since the block devices are accessed by only one or two server nodes, a storage area network (SAN) that is accessible from all the servers is not required. Expensive switches are not needed because point-to-point connections between the servers and the storage arrays normally provide the simplest and best attachments. (If failover capability is desired, the storage must be attached to multiple servers.)</para>
      <para>For a production environment, it is preferable that the MGS have separate storage to allow future expansion to multiple file systems. However, it is possible to run the MDS and MGS on the same machine and have them share the same storage device.</para>
      <para>For best performance in a production environment, dedicated clients are required. For a non-production Lustre environment or for testing, a Lustre client and server can run on the same machine. However, dedicated clients are the only supported configuration.</para>
-    <para>Performance and other issues can occur when an MDS or OSS and a client are running on the same machine:</para>
+    <warning><para>Performance and recovery issues can occur if you put a client on an MDS or OSS:</para>
      <itemizedlist>
        <listitem>
-        <para>Running the MDS and a client on the same machine can cause recovery and deadlock issues and impact the performance of other Lustre clients.</para>
+        <para>Running the OSS and a client on the same machine can cause issues with low memory and memory pressure. If the client consumes all the memory and then tries to write data to the file system, the OSS will need to allocate pages to receive data from the client but will not be able to perform this operation due to low memory. This can cause the client to hang.</para>
        </listitem>
        <listitem>
-        <para>Running the OSS and a client on the same machine can cause issues with low memory and memory pressure. If the client consumes all the memory and then tries to write data to the file system, the OSS will need to allocate pages to receive data from the client but will not be able to perform this operation due to low memory. This can cause the client to hang.</para>
+        <para>Running the MDS and a client on the same machine can cause recovery and deadlock issues and impact the performance of other Lustre clients.</para>
        </listitem>
      </itemizedlist>
-    <para>Only servers running on 64-bit CPUs are tested and supported. 64-bit CPU clients are typically used for testing to match expected customer usage and avoid limitations due to the 4 GB limit for RAM size, 1 GB low-memory limitation, and 16 TB file size limit of 32-bit CPUs. Also, due to kernel API limitations, performing backups of Lustre 2.x. filesystems on 32-bit clients may cause backup tools to confuse files that have the same 32-bit inode number.</para>
-    <para>The storage attached to the servers typically uses RAID to provide fault tolerance and can optionally be organized with logical volume management (LVM). It is then formatted by Lustre as a file system. Lustre OSS and MDS servers read, write and modify data in the format imposed by the file system.</para>
-    <para>Lustre uses journaling file system technology on both the MDTs and OSTs. For a MDT, as much as a 20 percent performance gain can be obtained by placing the journal on a separate device.</para>
-    <para>The MDS can effectively utilize a lot of CPU cycles. A minimium of four processor cores are recommended. More are advisable for files systems with many clients.</para>
+       </warning>
+    <para>Only servers running on 64-bit CPUs are tested and supported. 64-bit CPU clients are
+      typically used for testing to match expected customer usage and to avoid the limitations of
+      32-bit CPUs: the 4 GB limit on RAM size, the 1 GB low-memory limit, and the 16 TB file size
+      limit. Also, due to kernel API limitations, performing backups of Lustre software release 2.x
+      file systems on 32-bit clients may cause backup tools to confuse files that have the same
+      32-bit inode number.</para>
+    <para>The storage attached to the servers typically uses RAID to provide fault tolerance and can
+      optionally be organized with logical volume management (LVM). This storage is then formatted
+      as a Lustre file system. Lustre OSS and MDS servers read, write, and modify data in the format
+      imposed by the file system.</para>
+    <para>The Lustre file system uses journaling file system technology on both the MDTs and OSTs.
+      For an MDT, as much as a 20 percent performance gain can be obtained by placing the journal on
+      a separate device.</para>
+    <para>The MDS can effectively utilize a lot of CPU cycles. A minimum of four processor cores is recommended. More are advisable for file systems with many clients.</para>
      <note>
        <para>Lustre clients running on architectures with different endianness are supported. One limitation is that the PAGE_SIZE kernel macro on the client must be as large as the PAGE_SIZE of the server. In particular, ia64 or PPC clients with large pages (up to 64kB pages) can run with x86 servers (4kB pages). If you are running x86 clients with ia64 or PPC servers, you must compile the ia64 kernel with a 4kB PAGE_SIZE (so the server page size is not larger than the client page size). </para>
      </note>
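The PAGE_SIZE rule in the note above reduces to a single comparison. The following is a minimal Python sketch; the function name <code>page_size_compatible</code> is hypothetical and not part of any Lustre tool, and <code>mmap.PAGESIZE</code> only reports the page size of the machine running the script.

```python
import mmap


def page_size_compatible(client_page_size: int, server_page_size: int) -> bool:
    """Return True if a client with this PAGE_SIZE may use a server with that PAGE_SIZE.

    Encodes the rule from the note: the client's PAGE_SIZE must be at least
    as large as the server's PAGE_SIZE.
    """
    return client_page_size >= server_page_size


# Page size of the local machine (commonly 4096 on x86).
print("local PAGE_SIZE:", mmap.PAGESIZE)
# ia64/PPC client with 64 kB pages and an x86 server with 4 kB pages: allowed.
print(page_size_compatible(65536, 4096))
# x86 client with 4 kB pages and a 64 kB-page server: not allowed.
print(page_size_compatible(4096, 65536))
```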
      <section remap="h3">
-        <title><indexterm><primary>setup</primary><secondary>MDT</secondary></indexterm>
-            MDT Storage Hardware Considerations</title>
-      <para>The data access pattern for MDS storage is a database-like access pattern with many seeks and read-and-writes of small amounts of data. High throughput to MDS storage is not important. Storage types that provide much lower seek times, such as high-RPM SAS or SSD drives can be used for the MDT.</para>
+        <title><indexterm>
+          <primary>setup</primary>
+          <secondary>MDT</secondary>
+        </indexterm> MGT and MDT Storage Hardware Considerations</title>
+      <para>MGT storage requirements are small (less than 100 MB even in the largest Lustre file
+        systems), and the data on an MGT is accessed only when a server or client mounts the file
+        system, so disk performance is not a consideration. However, this data is vital for file
+        system access, so the MGT should be reliable storage, preferably mirrored RAID1.</para>
+      <para>MDS storage is accessed in a database-like access pattern with many seeks and
+        read-and-writes of small amounts of data. High throughput to MDS storage is not important.
+        Storage types that provide much lower seek times, such as high-RPM SAS or SSD drives, can be
+        used for the MDT.</para>
        <para>For maximum performance, the MDT should be configured as RAID1 with an internal journal and two disks from different controllers.</para>
        <para>If you need a larger MDT, create multiple RAID1 devices from pairs of disks, and then make a RAID0 array of the RAID1 devices. This ensures maximum reliability because multiple disk failures only have a small chance of hitting both disks in the same RAID1 device.</para>
        <para>Doing the opposite (RAID1 of a pair of RAID0 devices) has a 50% chance that even two disk failures can cause the loss of the whole MDT device. The first failure disables an entire half of the mirror and the second failure has a 50% chance of disabling the remaining mirror.</para>
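To see where these odds come from, the following Python sketch enumerates every possible pair of failed disks for both layouts and counts the pairs that destroy the array. It assumes an idealized model (independent failures, no rebuild between the two failures) and a hypothetical disk numbering. With n mirror pairs, the loss fraction is exactly 1/(2n-1) for RAID0 over RAID1 and n/(2n-1) for RAID1 over RAID0; the latter approaches the 50% figure above as the disk count grows.

```python
from fractions import Fraction
from itertools import combinations


def two_disk_loss_fraction(n_pairs: int, layout: str) -> Fraction:
    """Fraction of all possible two-disk failures that lose data, for 2 * n_pairs disks.

    "raid10": RAID0 striped across RAID1 pairs (disks 2k and 2k+1 mirror each other).
    "raid01": RAID1 mirror of two RAID0 stripes (disks [0, n) and [n, 2n)).
    """
    if layout not in ("raid10", "raid01"):
        raise ValueError(f"unknown layout: {layout}")
    lost = total = 0
    for a, b in combinations(range(2 * n_pairs), 2):
        total += 1
        if layout == "raid10":
            # Data is lost only if both members of the same mirror pair fail.
            lost += a // 2 == b // 2
        else:
            # One failure disables a whole stripe; data is lost once both stripes have a failure.
            lost += (a < n_pairs) != (b < n_pairs)
    return Fraction(lost, total)


for layout in ("raid10", "raid01"):
    print(layout, two_disk_loss_fraction(4, layout))  # raid10 1/7, raid01 4/7
```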
+      <para condition='l24'>If multiple MDTs are going to be present in the system, the storage for each MDT should be specified according to its anticipated usage and load.</para>
+      <warning condition='l24'><para>MDT0 contains the root of the Lustre file system. If MDT0 is unavailable for any reason, the
+          file system cannot be used.</para></warning>
+      <note condition='l24'><para>Additional MDTs can be dedicated to subdirectories off the root file system provided by MDT0.
+          Subsequent directories may also be configured to have their own MDT. If an MDT serving a
+          subdirectory becomes unavailable, that subdirectory and all directories beneath it will
+          also become unavailable. Configuring multiple levels of MDTs is an experimental feature
+          in Lustre software release 2.4.</para></note>
      </section>
      <section remap="h3">
        <title><indexterm><primary>setup</primary><secondary>OST</secondary></indexterm>OST Storage Hardware Considerations</title>
-      <para>The data access pattern for the OSS storage is a streaming I/O pattern that is dependent on the access patterns of applications being used. Each OSS can manage multiple object storage targets (OSTs), one for each volume with I/O traffic load-balanced between servers and targets. An OSS should be configured to have a balance between the network bandwidth and the attached storage bandwidth to prevent bottlenecks in the I/O path. Depending on the server hardware, an OSS typically serves between 2 and 8 targets, with each target up to 16 terabytes (TBs) in size.</para>
-      <para>Lustre file system capacity is the sum of the capacities provided by the targets. For example, 64 OSSs, each with two 8 TB targets, provide a file system with a capacity of nearly 1 PB. If each OST uses ten 1 TB SATA disks (8 data disks plus 2 parity disks in a RAID 6 configuration), it may be possible to get 50 MB/sec from each drive, providing up to 400 MB/sec of disk bandwidth per OST. If this system is used as storage backend with a system network like InfiniBand that provides a similar bandwidth, then each OSS could provide 800 MB/sec of end-to-end I/O throughput. (Although the architectural constraints described here are simple, in practice it takes careful hardware selection, benchmarking and integration to obtain such results.)</para>
+      <para>The data access pattern for the OSS storage is a streaming I/O pattern that is dependent on the access patterns of applications being used. Each OSS can manage multiple object storage targets (OSTs), one for each volume with I/O traffic load-balanced between servers and targets. An OSS should be configured to have a balance between the network bandwidth and the attached storage bandwidth to prevent bottlenecks in the I/O path. Depending on the server hardware, an OSS typically serves between 2 and 8 targets, with each target up to 128 terabytes (TBs) in size.</para>
+      <para>Lustre file system capacity is the sum of the capacities provided by the targets. For
+        example, 64 OSSs, each with two 8 TB targets, provide a file system with a capacity of
+        nearly 1 PB. If each OST uses ten 1 TB SATA disks (8 data disks plus 2 parity disks in a
+        RAID 6 configuration), it may be possible to get 50 MB/sec from each drive, providing up to
+        400 MB/sec of disk bandwidth per OST. If this system is used as a storage backend with a
+        system network, such as an InfiniBand network, that provides a similar bandwidth, then each
+        OSS could provide 800 MB/sec of end-to-end I/O throughput. (Although the architectural
+        constraints described here are simple, in practice it takes careful hardware selection,
+        benchmarking, and integration to obtain such results.)</para>
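The capacity and throughput figures in this example follow from a few multiplications and one bottleneck check. The Python sketch below reproduces them; <code>network_mb_s</code> is an assumed input, since the text only says the network provides "a similar bandwidth", and all names are illustrative.

```python
from typing import NamedTuple


class OssEstimate(NamedTuple):
    capacity_tb: int    # file system capacity summed over all OSTs
    ost_disk_mb_s: int  # streaming disk bandwidth of one OST
    oss_mb_s: int       # end-to-end throughput of one OSS


def estimate(n_oss: int, osts_per_oss: int, ost_tb: int,
             data_disks_per_ost: int, disk_mb_s: int, network_mb_s: int) -> OssEstimate:
    capacity_tb = n_oss * osts_per_oss * ost_tb
    # As in the example, only the data disks (not RAID 6 parity disks) are counted.
    ost_disk_mb_s = data_disks_per_ost * disk_mb_s
    # An OSS cannot deliver more than either its disks or its network link can carry.
    oss_mb_s = min(osts_per_oss * ost_disk_mb_s, network_mb_s)
    return OssEstimate(capacity_tb, ost_disk_mb_s, oss_mb_s)


# 64 OSSs, two 8 TB OSTs each, 8 data disks per OST at 50 MB/sec, assumed 800 MB/sec network.
print(estimate(64, 2, 8, 8, 50, network_mb_s=800))
```

With these inputs the sketch yields 1024 TB of capacity, 400 MB/sec per OST, and 800 MB/sec per OSS; a slower network link would become the limit instead.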
      </section>
    </section>
    <section xml:id="dbdoclet.50438256_31079">
        <title><indexterm><primary>setup</primary><secondary>space</secondary></indexterm>
            <indexterm><primary>space</primary><secondary>determining requirements</secondary></indexterm>
            Determining Space Requirements</title>
-    <para>The desired performance characteristics of the backing file systems on the MDT and OSTs are independent of one another. The size of the MDT backing file system depends on the number of inodes needed in the total Lustre file system, while the aggregate OST space depends on the total amount of data stored on the file system.</para>
-    <para>Each time a file is created on a Lustre file system, it consumes one inode on the MDT and one inode for each OST object over which the file is striped. Normally, each file&apos;s stripe count is based on the system-wide default stripe count. However, this can be changed for individual files using the lfssetstripe option. For more details, see <xref linkend="managingstripingfreespace"/>.</para>
+    <para>The desired performance characteristics of the backing file systems on the MDT and OSTs
+      are independent of one another. The size of the MDT backing file system depends on the number
+      of inodes needed in the total Lustre file system, while the aggregate OST space depends on the
+      total amount of data stored on the file system. If MGS data is to be stored on the MDT device
+      (co-located MGT and MDT), add 100 MB to the required size estimate for the MDT.</para>
+    <para>Each time a file is created on a Lustre file system, it consumes one inode on the MDT and one inode for each OST object over which the file is striped. Normally, each file&apos;s stripe count is based on the system-wide default stripe count. However, this can be changed for individual files using the <literal>lfs setstripe</literal> command. For more details, see <xref linkend="managingstripingfreespace"/>.</para>
      <para>In a Lustre ldiskfs file system, all the inodes are allocated on the MDT and OSTs when the file system is first formatted. The total number of inodes on a formatted MDT or OST cannot be easily changed, although it is possible to add OSTs with additional space and corresponding inodes. Thus, the number of inodes created at format time should be generous enough to anticipate future expansion.</para>
      <para>When the file system is in use and a file is created, the metadata associated with that file is stored in one of the pre-allocated inodes and does not consume any of the free space used to store file data.</para>
      <note>
-      <para>By default, the ldiskfs file system used by Lustre servers to store user-data objects and system data reserves 5% of space that cannot be used by Lustre. Additionally, Lustre reserves up to 400 MB on each OST for journal use and a small amount of space outside the journal to store accounting data for Lustre. This reserved space is unusable for general storage. Thus, at least 400 MB of space is used on each OST before any file object data is saved.</para>
+      <para>By default, the ldiskfs file system used by Lustre servers to store user-data objects
+        and system data reserves 5% of space that cannot be used by the Lustre file system.
+        Additionally, a Lustre file system reserves up to 400 MB on each OST for journal use and a
+        small amount of space outside the journal to store accounting data. This reserved space is
+        unusable for general storage. Thus, at least 400 MB of space is used on each OST before any
+        file object data is saved.</para>
      </note>
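The reservations described in the note can be turned into a rough estimate of the space left for file data on a freshly formatted OST. This Python sketch is an approximation under stated assumptions: the 400 MB journal is treated as 400 MiB, and the small accounting area outside the journal is ignored.

```python
def usable_ost_mib(ost_mib: int, reserved_pct: int = 5, journal_mib: int = 400) -> int:
    """Approximate MiB available for file data on a new ldiskfs OST.

    Subtracts the default 5% ldiskfs reservation and a journal of up to 400 MiB.
    Accounting data stored outside the journal is not modeled.
    """
    return ost_mib * (100 - reserved_pct) // 100 - journal_mib


# An 8 TiB OST (8 * 1024 * 1024 MiB).
print(usable_ost_mib(8 * 1024 * 1024))
```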
+    <para condition="l24">With a ZFS backing file system for the MDT or OST,
+    the space allocation for inodes and file data is dynamic, and inodes are
+    allocated as needed. A minimum of 2 KB of usable space (before RAID) is
+    needed for each inode, exclusive of other overhead such as directories,
+    internal log files, extended attributes, and ACLs.
+    Since the size of extended attributes and ACLs is highly dependent on
+    kernel versions and site-specific policies, it is best to over-estimate
+    the amount of space needed for the desired number of inodes; any
+    excess space will be utilized to store more inodes.</para>
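The ZFS guidance above amounts to a per-inode minimum plus a safety margin. The Python sketch below assumes 2 KB means 2048 bytes; <code>margin_pct</code> is a hypothetical over-estimate chosen by the administrator to cover directories, logs, extended attributes, and ACLs, not a value defined by Lustre or ZFS.

```python
ZFS_MIN_BYTES_PER_INODE = 2 * 1024  # minimum per inode from the text, before RAID overhead


def zfs_min_inode_space_gib(num_inodes: int, margin_pct: int = 0) -> float:
    """Minimum usable space (GiB) needed to hold num_inodes on a ZFS MDT or OST.

    margin_pct is a hypothetical over-estimate for overhead that depends on
    kernel version and site policy.
    """
    base_bytes = num_inodes * ZFS_MIN_BYTES_PER_INODE
    return base_bytes * (100 + margin_pct) / 100 / 1024**3


# 100 million inodes with an illustrative 50% margin.
print(round(zfs_min_inode_space_gib(100_000_000, margin_pct=50), 1))
```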
+    <section>
+      <title><indexterm>
+          <primary>setup</primary>
+          <secondary>MGT</secondary>
+        </indexterm>
+        <indexterm>
+          <primary>space</primary>
+          <secondary>determining MGT requirements</secondary>
+        </indexterm> Determining MGT Space Requirements</title>
+      <para>Less than 100 MB of space is required for the MGT. The size is determined by the number
+        of servers in the Lustre file system cluster(s) that are managed by the MGS.</para>
+    </section>
      <section xml:id="dbdoclet.50438256_87676">
-        <title><indexterm><primary>setup</primary><secondary>MDS/MDT</secondary></indexterm>
-          <indexterm><primary>space</primary><secondary>determining MDS/MDT requirements</secondary></indexterm>
-      Determining MDS/MDT Space Requirements</title>
-      <para>When calculating the MDT size, the important factor to consider is the number of files to be stored in the file system. This determines the number of inodes needed, which drives the MDT sizing. To be on the safe side, plan for 4 KB per inode on the MDT, which is the default value. Attached storage required for Lustre metadata is typically 1-2 percent of the file system capacity depending upon file size.</para>
+        <title><indexterm>
+          <primary>setup</primary>
+          <secondary>MDT</secondary>
+        </indexterm>
+        <indexterm>
+          <primary>space</primary>
+          <secondary>determining MDT requirements</secondary>
+        </indexterm> Determining MDT Space Requirements</title>
+      <para>When calculating the MDT size, the important factor to consider is the number of files
+        to be stored in the file system. This determines the number of inodes needed, which drives
+        the MDT sizing. To be on the safe side, plan for 2 KB per inode on the MDT, which is the
+        default value. Attached storage required for Lustre file system metadata is typically 1-2
+        percent of the file system capacity depending upon file size.</para>
        <para>For example, if the average file size is 5 MB and you have 100 TB of usable OST space, then you can calculate the minimum number of inodes as follows:</para>
        <informalexample>
          <para>(100 TB * 1024 GB/TB * 1024 MB/GB) / 5 MB/inode = 20 million inodes</para>
        </informalexample>
        <para>We recommend that you use at least twice the minimum number of inodes to allow for future expansion and allow for an average file size smaller than expected. Thus, the required space is:</para>
        <informalexample>
-        <para>4 KB/inode * 40 million inodes = 160 GB</para>
+        <para>2 KB/inode * 40 million inodes = 80 GB</para>
        </informalexample>
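The two calculations above can be combined into one sizing helper. This is a minimal Python sketch that uses binary units throughout, which makes the example come out to exactly 80 GiB; the co-located MGT allowance treats the "100 MB" from the text as 100 MiB, and the function name is hypothetical.

```python
MIB_PER_TIB = 1024 * 1024
KIB_PER_MIB = 1024
MGT_ALLOWANCE_MIB = 100  # extra space for a co-located MGT, per the text


def mdt_size_mib(usable_ost_tib: int, avg_file_mib: int, kib_per_inode: int = 2,
                 headroom: int = 2, colocated_mgt: bool = False):
    """Return (minimum inode count, recommended MDT size in MiB)."""
    # Minimum inodes: one per average-sized file that fits in the OST space.
    min_inodes = usable_ost_tib * MIB_PER_TIB // avg_file_mib
    # Recommended: at least twice the minimum, for growth and smaller-than-expected files.
    inodes = headroom * min_inodes
    size_mib = inodes * kib_per_inode // KIB_PER_MIB
    if colocated_mgt:
        size_mib += MGT_ALLOWANCE_MIB
    return min_inodes, size_mib


min_inodes, size_mib = mdt_size_mib(100, 5)
print(min_inodes)        # 20971520, i.e. about 20 million
print(size_mib / 1024)   # 80.0 (GiB)
```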
-      <para>If the average file size is small, 4 KB for example, Lustre is not very efficient as the MDT uses as much space as the OSTs. However, this is not a common configuration for Lustre.</para>
+      <para>If the average file size is small, 4 KB for example, the Lustre file system is not very
+        efficient, as the MDT consumes space comparable to that used on the OSTs. However, this is
+        not a common configuration for a Lustre environment.</para>
        <note>
          <para>If the MDT is too small, this can cause all the space on the OSTs to be unusable. Be sure to determine the appropriate size of the MDT needed to support the file system before formatting the file system. It is difficult to increase the number of inodes after the file system is formatted.</para>
        </note>
      </section>
      <section remap="h3">
-        <title><indexterm><primary>setup</primary><secondary>OSS/OST</secondary></indexterm>
-          <indexterm><primary>space</primary><secondary>determining OSS/OST requirements</secondary></indexterm>
-      Determining OSS/OST Space Requirements</title>
-      <para>For the OST, the amount of space taken by each object depends on the usage pattern of the users/applications running on the system. Lustre defaults to a conservative estimate for the object size (16 KB per object). If you are confident that the average file size for your applications will be larger than this, you can specify a larger average file size (fewer total inodes) to reduce file system overhead and minimize file system check time. See <xref linkend="dbdoclet.50438256_53886"/> for more details.</para>
+        <title><indexterm>
+          <primary>setup</primary>
+          <secondary>OST</secondary>
+        </indexterm>
+        <indexterm>
+          <primary>space</primary>
+          <secondary>determining OST requirements</secondary>
+        </indexterm> Determining OST Space Requirements</title>
+      <para>For the OST, the amount of space taken by each object depends on the usage pattern of
+        the users/applications running on the system. The Lustre software defaults to a conservative
+        estimate for the object size (16 KB per object). If you are confident that the average file
+        size for your applications will be larger than this, you can specify a larger average file
+        size (fewer total inodes) to reduce file system overhead and minimize file system check
+        time. See <xref linkend="dbdoclet.50438256_53886"/> for more details.</para>
      </section>
    </section>
    <section xml:id="dbdoclet.50438256_84701">
@@ -109,45 +201,173 @@
            <indexterm><primary>file system</primary><secondary>formatting options</secondary></indexterm>
            <indexterm><primary>setup</primary><secondary>file system</secondary></indexterm>
            Setting File System Formatting Options</title>
-    <para>To override the default formatting options for any of the Lustre backing file systems, use this argument to <literal>mkfs.lustre</literal> to pass formatting options to the backing <literal>mkfs</literal>:</para>
+    <para>By default, the <literal>mkfs.lustre</literal> utility applies several options to the
+      Lustre backing file system used to store data and metadata in order to enhance Lustre file
+      system performance and scalability. These options include:</para>
+        <itemizedlist>
+            <listitem>
+              <para><literal>flex_bg</literal> - When the flag is set to enable this
+          flexible-block-groups feature, block and inode bitmaps for multiple groups are aggregated
+          to minimize seeking when bitmaps are read or written and to reduce read/modify/write
+          operations on typical RAID storage (with 1 MB RAID stripe widths). This flag is enabled on
+          both OST and MDT file systems. On MDT file systems the <literal>flex_bg</literal> factor
+          is left at the default value of 16. On OSTs, the <literal>flex_bg</literal> factor is set
+          to 256 to allow all of the block or inode bitmaps in a single <literal>flex_bg</literal>
+          to be read or written in a single I/O on typical RAID storage.</para>
+            </listitem>
+            <listitem>
+              <para><literal>huge_file</literal> - Setting this flag allows files on OSTs to be
+          larger than 2 TB in size.</para>
+            </listitem>
+            <listitem>
+              <para><literal>lazy_journal_init</literal> - This extended option is enabled to
+          prevent a full overwrite of the 400 MB journal that is allocated by default in a Lustre
+          file system, which reduces the file system format time.</para>
+            </listitem>
+        </itemizedlist>
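The sizes quoted in this list can be checked with simple arithmetic. This Python sketch assumes a 4 KiB block size and the ext4 on-disk inode format from which ldiskfs is derived: each block group has one block-bitmap block and one inode-bitmap block, and without <literal>huge_file</literal> an inode's block count is a 32-bit number of 512-byte sectors.

```python
BLOCK_SIZE = 4096  # assumed ldiskfs block size (4 KiB)
SECTOR = 512       # unit of the inode block count when huge_file is not set


def flex_bg_bitmap_bytes(flex_bg_factor: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes of block bitmaps (or, separately, inode bitmaps) in one flex group.

    flex_bg places the one-block bitmaps of flex_bg_factor groups next to each other.
    """
    return flex_bg_factor * block_size


def max_file_bytes_without_huge_file() -> int:
    # A 32-bit count of 512-byte sectors caps the file size.
    return 2**32 * SECTOR


print(flex_bg_bitmap_bytes(256) // 1024, "KiB per OST flex group")  # 1024 KiB, one 1 MiB I/O
print(flex_bg_bitmap_bytes(16) // 1024, "KiB per MDT flex group")   # 64 KiB
print(max_file_bytes_without_huge_file() // 2**40, "TiB")            # 2 TiB
```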
+    <para>To override the default formatting options, use arguments to
+        <literal>mkfs.lustre</literal> to pass formatting options to the backing file system:</para>
      <screen>--mkfsoptions=&apos;backing fs options&apos;</screen>
-    <para>For other options to format backing ldiskfs filesystems, see the Linux man page for <literal>mke2fs(8)</literal>.</para>
+    <para>For other options to format backing ldiskfs file systems, see the Linux man page for
+        <literal>mke2fs(8)</literal>.</para>
      <section xml:id="dbdoclet.50438256_pgfId-1293228">
-      <title><indexterm><primary>inodes</primary><secondary>MDS</secondary></indexterm><indexterm><primary>setup</primary><secondary>inodes</secondary></indexterm>Setting the Number of Inodes for the MDS</title>
-      <para>The number of inodes on the MDT is determined at format time based on the total size of the file system to be created. The default MDT inode ratio is one inode for every 4096 bytes of file system space. To override the inode ratio, use the following option:</para>
-      <screen>-i <emphasis>&lt;bytes per inode&gt;</emphasis></screen>
-      <para>For example, use the following option to create one inode per 2048 bytes of file system space.</para>
-      <screen>--mkfsoptions=&quot;-i 2048&quot; </screen>
-      <para>To avoid mke2fs creating an unusable file system, do not specify the -i option with an inode ratio below one inode per 1024 bytes. Instead, specify an absolute number of inodes, using this option:</para>
-      <screen>-N<emphasis> &lt;number of inodes&gt;</emphasis></screen>
-      <para>For example, by default, a 2 TB MDT will have 512M inodes. The largest currently-supported file system size is 16 TB, which would hold 4B inodes, the maximum possible number of inodes in a ldiskfs file system. With an MDS inode ratio of 1024 bytes per inode, a 2 TB MDT would hold 2B inodes, and a 4 TB MDT would hold 4B inodes.</para>
-    </section>
-    <section remap="h3">
-      <title><indexterm><primary>inodes</primary><secondary>MDT</secondary></indexterm>Setting the Inode Size for the MDT</title>
-      <para>Lustre uses &quot;large&quot; inodes on backing file systems to efficiently store Lustre metadata with each file. On the MDT, each inode is at least 512 bytes in size (by default), while on the OST each inode is 256 bytes in size.</para>
-      <para>The backing ldiskfs file system also needs sufficient space for other metadata like the journal (up to 400 MB), bitmaps and directories and a few files that Lustre uses to maintain cluster consistency.</para>
-      <para>To specify a larger inode size, use the <literal>-I &lt;inodesize&gt;</literal> option. We recommend you do NOT specify a smaller-than-default inode size, as this can lead to serious performance problems; and you cannot change this parameter after formatting the file system. The inode ratio must always be larger than the inode size.</para>
+      <title><indexterm>
+          <primary>inodes</primary>
+          <secondary>MDS</secondary>
+        </indexterm><indexterm>
+          <primary>setup</primary>
+          <secondary>inodes</secondary>
+        </indexterm>Setting Formatting Options for an MDT</title>
+      <para>The number of inodes on the MDT is determined at format time based on the total size of
+        the file system to be created. The default <emphasis role="italic"
+          >bytes-per-inode</emphasis> ratio ("inode ratio") for an MDT is optimized at one inode for
+        every 2048 bytes of file system space. It is recommended that this value not be changed for
+        MDTs.</para>
+      <para>This setting takes into account the space needed for additional metadata, such as the
+        journal (up to 400 MB), bitmaps and directories, and a few files that the Lustre file system
+        uses to maintain cluster consistency.</para>
      </section>
      <section xml:id="dbdoclet.50438256_53886">
-      <title><indexterm><primary>inodes</primary><secondary>OST</secondary></indexterm>Setting the Number of Inodes for an OST</title>
-      <para>When formatting OST file systems, it is normally advantageous to take local file system usage into account. Try to minimize the number of inodes on each OST, while keeping enough margin for potential variance in future usage. This helps reduce the format and file system check time, and makes more space available for data.</para>
-      <para>The current default is to create one inode per 16 KB of space in the OST file system, but in many environments, this is far too many inodes for the average file size. As a good rule of thumb, the OSTs should have at least:</para>
-      <para>num_ost_inodes = 4 * <emphasis>&lt;num_mds_inodes&gt;</emphasis> * <emphasis>&lt;default_stripe_count&gt;</emphasis> / <emphasis>&lt;number_osts&gt;</emphasis></para>
-      <para>You can specify the number of inodes on the OST file systems using the following option to the <literal>--mkfs</literal> option:</para>
-      <screen>-N <emphasis>&lt;num_inodes&gt;</emphasis></screen>
-      <para> Alternately, if you know the average file size, then you can specify the OST inode count for the OST file systems using:</para>
-      <screen>-i <emphasis>&lt;average_file_size</emphasis> / (<emphasis>number_of_stripes</emphasis> * 4)<emphasis>&gt;</emphasis></screen>
-      <para>For example, if the average file size is 16 MB and there are, by default 4 stripes per file, then <literal>--mkfsoptions=&apos;-i 1048576&apos;</literal> would be appropriate.</para>
+      <title><indexterm>
+          <primary>inodes</primary>
+          <secondary>OST</secondary>
+        </indexterm>Setting Formatting Options for an OST</title>
+      <para>When formatting OST file systems, it is normally advantageous to take local file system
+        usage into account. When doing so, try to minimize the number of inodes on each OST, while
+        keeping enough margin for potential variations in future usage. This helps reduce the format
+        and file system check time and makes more space available for data.</para>
+      <para>The table below shows the default <emphasis role="italic">bytes-per-inode</emphasis>
+        ratio ("inode ratio") used for OSTs of various sizes when they are formatted.</para>
+      <para>
+        <table frame="all">
+          <title xml:id="settinguplustresystem.tab1">Inode Ratios Used for Newly Formatted
+            OSTs</title>
+          <tgroup cols="3">
+            <colspec colname="c1" colwidth="3*"/>
+            <colspec colname="c2" colwidth="2*"/>
+            <colspec colname="c3" colwidth="4*"/>
+            <thead>
+              <row>
+                <entry>
+                  <para><emphasis role="bold">LUN/OST size</emphasis></para>
+                </entry>
+                <entry>
+                  <para><emphasis role="bold">Inode ratio</emphasis></para>
+                </entry>
+                <entry>
+                  <para><emphasis role="bold">Total inodes</emphasis></para>
+                </entry>
+              </row>
+            </thead>
+            <tbody>
+              <row>
+                <entry>
+                  <para> under 10GB </para>
+                </entry>
+                <entry>
+                  <para> 1 inode/16KB </para>
+                </entry>
+                <entry>
+                  <para> 640 - 655k </para>
+                </entry>
+              </row>
+              <row>
+                <entry>
+                  <para> 10GB - 1TB </para>
+                </entry>
+                <entry>
+                  <para> 1 inode/68KB </para>
+                </entry>
+                <entry>
+                  <para> 153k - 15.7M </para>
+                </entry>
+              </row>
+              <row>
+                <entry>
+                  <para> 1TB - 8TB </para>
+                </entry>
+                <entry>
+                  <para> 1 inode/256KB </para>
+                </entry>
+                <entry>
+                  <para> 4.2M - 33.6M </para>
+                </entry>
+              </row>
+              <row>
+                <entry>
+                  <para> over 8TB </para>
+                </entry>
+                <entry>
+                  <para> 1 inode/1MB </para>
+                </entry>
+                <entry>
+                  <para> 8.4M - 134M </para>
+                </entry>
+              </row>
+            </tbody>
+          </tgroup>
+        </table>
+      </para>
+      <para>In environments with few small files, the default inode ratio may result in far too many
+        inodes for the average file size. In this case, performance can be improved by increasing
+        the number of <emphasis role="italic">bytes-per-inode</emphasis>. To set the inode ratio,
+        pass the <literal>-i</literal> option to <literal>mkfs.lustre</literal> in the
+          <literal>--mkfsoptions</literal> argument to specify the <emphasis role="italic">bytes-per-inode</emphasis> value.</para>
        <note>
-        <para>In addition to the number of inodes, file system check time on OSTs is affected by a number of other variables: size of the file system, number of allocated blocks, distribution of allocated blocks on the disk, disk speed, CPU speed, and amount of RAM on the server. Reasonable file system check times (without serious file system problems), are expected to take five and thirty minutes per TB.</para>
+        <para>File system check time on OSTs is affected by a number of variables in addition to
+          the number of inodes, including the size of the file system, the number of allocated
+          blocks, the distribution of allocated blocks on the disk, disk speed, CPU speed, and the
+          amount of RAM on the server. Reasonable file system check times (without serious file
+          system problems) are 5-30 minutes per TB.</para>
        </note>
-      <para>For more details on formatting MDT and OST file systems, see <xref linkend="dbdoclet.50438208_51921"/>.</para>
+      <para>For more details about formatting MDT and OST file systems, see <xref
+          linkend="dbdoclet.50438208_51921"/>.</para>
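As a rough aid, the Python sketch below reproduces the default inode ratios from the table above and the rule of thumb from the earlier version of this section for choosing a custom ratio (average file size divided by four times the number of stripes). The helper names are hypothetical, the handling of the exact 10GB, 1TB and 8TB boundaries is an assumption, and the inode counts are approximate, since the formatting tool may round the real count.

```python
# Sketch only: mirrors the default inode-ratio table above and the rule of
# thumb bytes_per_inode = average_file_size / (number_of_stripes * 4).
# The function names are hypothetical helpers, not part of mkfs.lustre.

KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB
TiB = 1024 * GiB


def default_bytes_per_inode(ost_size: int) -> int:
    """Default inode ratio for an OST of ost_size bytes (boundaries assumed)."""
    if ost_size < 10 * GiB:
        return 16 * KiB
    if ost_size <= 1 * TiB:
        return 68 * KiB
    if ost_size <= 8 * TiB:
        return 256 * KiB
    return 1 * MiB


def inode_count(ost_size: int, bytes_per_inode: int) -> int:
    """Approximate number of inodes created for a given inode ratio."""
    return ost_size // bytes_per_inode


def suggested_bytes_per_inode(avg_file_size: int, stripes: int) -> int:
    """Rule of thumb: average file size / (number of stripes * 4)."""
    return avg_file_size // (stripes * 4)


if __name__ == "__main__":
    size = 4 * TiB
    # Default ratio for a 4 TB OST is 256 KB per inode.
    print(inode_count(size, default_bytes_per_inode(size)))  # 16777216
    # 16 MB average file size, 4 stripes per file.
    ratio = suggested_bytes_per_inode(16 * MiB, 4)
    print(ratio, inode_count(size, ratio))  # 1048576 4194304
```

For a 16 MB average file size with 4 stripes per file, the suggested value corresponds to --mkfsoptions='-i 1048576', which cuts the inode count of a 4 TB OST to a quarter of the default.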
      </section>
      <section remap="h3">
-      <title><indexterm><primary>setup</primary><secondary>limits</secondary></indexterm>File and File System Limits</title>
-      <para><xref linkend="settinguplustresystem.tab1"/> describes file and file system size limits. These limits are imposed by either the Lustre architecture or the Linux virtual file system (VFS) and virtual memory subsystems. In a few cases, a limit is defined within the code and can be changed by re-compiling Lustre (see <xref linkend="installinglustrefromsourcecode"/>). In these cases, the indicated limit was used for Lustre testing. </para>
+      <title><indexterm>
+          <primary>setup</primary>
+          <secondary>limits</secondary>
+        </indexterm><indexterm xmlns:xi="http://www.w3.org/2001/XInclude">
+          <primary>wide striping</primary>
+        </indexterm><indexterm xmlns:xi="http://www.w3.org/2001/XInclude">
+          <primary>xattr</primary>
+          <see>wide striping</see>
+        </indexterm><indexterm>
+          <primary>large_xattr</primary>
+          <secondary>ea_inode</secondary>
+        </indexterm><indexterm>
+          <primary>wide striping</primary>
+          <secondary>large_xattr</secondary>
+          <tertiary>ea_inode</tertiary>
+        </indexterm>File and File System Limits</title>
+      <para><xref linkend="settinguplustresystem.tab2"/> describes file and file system size limits.
+        These limits are imposed by either the Lustre architecture or the Linux virtual file system
+        (VFS) and virtual memory subsystems. In a few cases, a limit is defined within the code and
+        can be changed by re-compiling the Lustre software (see <xref
+          linkend="installinglustrefromsourcecode"/>). In these cases, the indicated limit was used
+        for testing of the Lustre software. </para>
        <table frame="all">
-        <title xml:id="settinguplustresystem.tab1">File and file system limits</title>
+        <title xml:id="settinguplustresystem.tab2">File and file system limits</title>
          <tgroup cols="3">
            <colspec colname="c1" colwidth="3*"/>
            <colspec colname="c2" colwidth="2*"/>
@@ -168,91 +388,111 @@
            <tbody>
              <row>
                <entry>
-                <para> Maximum stripe count</para>
+                <para> Maximum number of MDTs</para>
                </entry>
                <entry>
-                <para> 160</para>
+                <para> 1</para>
+                <para condition='l24'>4096</para>
                </entry>
                <entry>
-                <para>This limit is hard-coded, but is near the upper limit imposed by the underlying ldiskfs file system.</para>
+                <para>The Lustre software release 2.3 and earlier allows a maximum of 1 MDT per file
+                  system, but a single MDS can host multiple MDTs, each one for a separate file
+                  system.</para>
+                <para condition="l24">The Lustre software release 2.4 and later requires one MDT for
+                  the file system root. Up to 4095 additional MDTs can be added to the file system and attached
+                  into the namespace with remote directories.</para>
                </entry>
              </row>
              <row>
                <entry>
-                <para> Maximum stripe size</para>
+                <para> Maximum number of OSTs</para>
                </entry>
                <entry>
-                <para> &lt; 4 GB</para>
+                <para> 8150</para>
                </entry>
                <entry>
-                <para>The amount of data written to each object before moving on to next object.</para>
+                <para>The maximum number of OSTs is a constant that can be changed at compile time.
+                  Lustre file systems with up to 4000 OSTs have been tested.</para>
                </entry>
              </row>
              <row>
                <entry>
-                <para> Minimum stripe size</para>
+                <para> Maximum OST size</para>
                </entry>
                <entry>
-                <para> 64 KB</para>
+                <para> 128TB (ldiskfs), 256TB (ZFS)</para>
                </entry>
                <entry>
-                <para>Due to the 64 KB PAGE_SIZE on some 64-bit machines, the minimum stripe size is set to 64 KB.</para>
+                <para>This is not a <emphasis>hard</emphasis> limit. Larger OSTs are possible, but
+                  typical production systems today do not exceed the stated limit per OST.</para>
                </entry>
              </row>
              <row>
                <entry>
-                <para> Maximum object size</para>
+                <para> Maximum number of clients</para>
                </entry>
                <entry>
-                <para> 2 TB</para>
+                <para> 131072</para>
                </entry>
                <entry>
-                <para>The amount of data that can be stored in a single object. The ldiskfs limit of 2TB for a single file applies. Lustre allows 160 stripes of 2 TB each.</para>
+                <para>The maximum number of clients is a constant that can be changed at compile time. Up to 30000 clients have been used in production.</para>
                </entry>
              </row>
              <row>
                <entry>
-                <para> Maximum number of OSTs</para>
+                <para> Maximum size of a file system</para>
                </entry>
                <entry>
-                <para> 8150</para>
+                <para> 512 PB (ldiskfs), 1EB (ZFS)</para>
                </entry>
                <entry>
-                <para>The maximum number of OSTs is a constant that can be changed at compile time. Lustre has been tested with up to 4000 OSTs.</para>
+                <para>Each OST on a 64-bit kernel server can have a file system up to the maximum OST size listed above; the capacity of all OSTs combined makes up the file system, up to the limit in this row. On 32-bit systems, due to page cache limits, 16TB is the maximum block device size, which in turn applies to the size of an OST on 32-bit kernel servers.</para>
+                <para>You can have multiple OST file systems on a single OSS node.</para>
                </entry>
              </row>
              <row>
                <entry>
-                <para> Maximum number of MDTs</para>
+                <para> Maximum stripe count</para>
                </entry>
                <entry>
-                <para> 1</para>
+                <para> 2000</para>
                </entry>
                <entry>
-                <para>Maximum of 1 MDT per file system, but a single MDS can host multiple MDTs, each one for a separate file system.</para>
+                <para>This limit is imposed by the size of the layout that needs to be stored on disk and sent in RPC requests, but is not a hard limit of the protocol.</para>
                </entry>
              </row>
              <row>
                <entry>
-                <para> Maximum number of clients</para>
+                <para> Maximum stripe size</para>
                </entry>
                <entry>
-                <para> 131072</para>
+                <para> &lt; 4 GB</para>
                </entry>
                <entry>
-                <para>The number of clients is a constant that can be changed at compile time.</para>
+                <para>The amount of data written to each object before moving on to the next object.</para>
                </entry>
              </row>
              <row>
                <entry>
-                <para> Maximum size of a file system</para>
+                <para> Minimum stripe size</para>
                </entry>
                <entry>
-                <para> 64 PB</para>
+                <para> 64 KB</para>
                </entry>
                <entry>
-                <para>Each OST or MDT can have a file system up to 16 TB, regardless of whether 32-bit or 64-bit kernels are on the server.</para>
-                <para>You can have multiple OST file systems on a single OSS node.</para>
+                <para>Due to the 64 KB PAGE_SIZE on some 64-bit machines, the minimum stripe size is set to 64 KB.</para>
+              </entry>
+            </row>
+            <row>
+              <entry>
+                <para> Maximum object size</para>
+              </entry>
+              <entry>
+                <para> 16TB (ldiskfs), 256TB (ZFS)</para>
+              </entry>
+              <entry>
+                <para>The amount of data that can be stored in a single object. An object
+                  corresponds to a stripe. The ldiskfs limit of 16 TB for a single object applies.
+                  For ZFS, the limit is the size of the underlying OST. Files can consist of up to
+                  2000 stripes, and each stripe can hold up to the maximum object size.</para>
                </entry>
              </row>
              <row>
@@ -262,11 +502,13 @@
                <entry>
                  <para> 16 TB on 32-bit systems</para>
                  <para>&#160;</para>
-                <para>320 TB on 64-bit systems</para>
+                <para> 31.25 PB on 64-bit ldiskfs systems, 8EB on 64-bit ZFS systems</para>
                </entry>
                <entry>
-                <para>Individual files have a hard limit of nearly 16 TB on 32-bit systems imposed by the kernel memory subsystem. On 64-bit systems this limit does not exist. Hence, files can be 64-bits in size. Lustre imposes an additional size limit of up to the number of stripes, where each stripe is 2 TB.</para>
-                <para>A single file can have a maximum of 160 stripes, which gives an upper single file limit of 320 TB for 64-bit systems. The actual amount of data that can be stored in a file depends upon the amount of free space in each OST on which the file is striped.</para>
+                <para>Individual files have a hard limit of nearly 16 TB on 32-bit systems imposed
+                  by the kernel memory subsystem. On 64-bit systems this limit does not exist.
+                  Hence, files can be up to 2^63 bytes (8 EB) in size if the backing file system can support large enough objects.</para>
+                <para>A single file can have a maximum of 2000 stripes of up to 16 TB each, which gives an upper single file limit of 31.25 PB for 64-bit ldiskfs systems. The actual amount of data that can be stored in a file depends upon the amount of free space in each OST on which the file is striped.</para>
                </entry>
              </row>
              <row>
@@ -274,11 +516,14 @@
                  <para> Maximum number of files or subdirectories in a single directory</para>
                </entry>
                <entry>
-                <para> 10 million files</para>
+                <para> 10 million files (ldiskfs), 2^48 (ZFS)</para>
                </entry>
                <entry>
-                <para>Lustre uses the ldiskfs hashed directory code, which has a limit of about 10 million files depending on the length of the file name. The limit on subdirectories is the same as the limit on regular files.</para>
-                <para>Lustre is tested with ten million files in a single directory.</para>
+                <para>The Lustre software uses the ldiskfs hashed directory code, which has a limit
+                  of about 10 million files depending on the length of the file name. The limit on
+                  subdirectories is the same as the limit on regular files.</para>
+                <para>Lustre file systems are tested with ten million files in a single
+                  directory.</para>
                </entry>
              </row>
              <row>
@@ -286,11 +531,15 @@
                  <para> Maximum number of files in the file system</para>
                </entry>
                <entry>
-                <para> 4 billion</para>
+                <para> 4 billion (ldiskfs), 256 trillion (ZFS)</para>
+                <para condition='l24'>4096 times the per-MDT limit</para>
                </entry>
                <entry>
-                <para>The ldiskfs file system imposes an upper limit of 4 billion inodes. By default, the MDS file system is formatted with 4 KB of space per inode, meaning 512 million inodes per file system of 2 TB.</para>
+                <para>The ldiskfs file system imposes an upper limit of 4 billion inodes. By default, the MDS file system is formatted with 2KB of space per inode, meaning 1 billion inodes per file system of 2 TB.</para>
                  <para>This can be increased initially, at the time of MDS file system creation. For more information, see <xref linkend="settinguplustresystem"/>.</para>
+                <para condition="l24">Each additional MDT can hold up to the above maximum number of
+                  additional files, depending on available space and the distribution of directories
+                  and files in the file system.</para>
                </entry>
              </row>
              <row>
@@ -301,7 +550,7 @@
                  <para> 255 bytes (filename)</para>
                </entry>
                <entry>
-                <para>This limit is 255 bytes for a single filename, the same as in an ldiskfs file system.</para>
+                <para>This limit is 255 bytes for a single filename, the same as the limit in the underlying file systems.</para>
                </entry>
              </row>
              <row>
@@ -317,19 +566,33 @@
              </row>
              <row>
                <entry>
-                <para> Maximum number of open files for Lustre file systems</para>
+                <para> Maximum number of open files for a Lustre file system</para>
                </entry>
                <entry>
-                <para> None</para>
+                <para> No limit</para>
                </entry>
                <entry>
-                <para>Lustre does not impose a maximum for the number of open files, but the practical limit depends on the amount of RAM on the MDS. No &quot;tables&quot; for open files exist on the MDS, as they are only linked in a list to a given client&apos;s export. Each client process probably has a limit of several thousands of open files which depends on the ulimit.</para>
+                <para>The Lustre software does not impose a maximum for the number of open files,
+                  but the practical limit depends on the amount of RAM on the MDS. No
+                  &quot;tables&quot; for open files exist on the MDS, as they are only linked in a
+                  list to a given client&apos;s export. Each client process typically has a limit of
+                  several thousand open files, which depends on its ulimit.</para>
                </entry>
              </row>
            </tbody>
          </tgroup>
        </table>
        <para>&#160;</para>
+      <note>
+        <para condition="l22">In Lustre software releases prior to release 2.2, the maximum stripe
+          count for a single file was limited to 160 OSTs. In Lustre software release 2.2, the large
+            <literal>xattr</literal> feature ("wide striping") was added to support up to 2000 OSTs.
+          This feature is disabled by default at <literal>mkfs.lustre</literal> time. In order to
+          enable this feature, set the "<literal>-O large_xattr</literal>" or "<literal>-O ea_inode</literal>"
+          option on the MDT either by using <literal>--mkfsoptions</literal> at format time or by using
+            <literal>tune2fs</literal>. Using either "<literal>large_xattr</literal>" or "<literal>ea_inode</literal>"
+          results in "<literal>ea_inode</literal>" in the file system feature list.</para>
+      </note>
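Several limits in the table follow from simple products. The sketch below re-derives three of them in binary units: a 2000-stripe file of 16 TB ldiskfs objects, the 2^63-byte ceiling on 64-bit systems, and the default inode count of a 2 TB MDT formatted at 2KB per inode. It is illustrative arithmetic only, not code from the Lustre software.

```python
# Sketch: re-derive some limits from the table above, in binary units.
KiB, MiB, GiB = 2**10, 2**20, 2**30
TiB, PiB, EiB = 2**40, 2**50, 2**60

MAX_STRIPES = 2000               # maximum stripe count per file
LDISKFS_MAX_OBJECT = 16 * TiB    # ldiskfs maximum object (stripe) size

# Largest single file on 64-bit ldiskfs servers: stripe count * object size.
max_file_ldiskfs = MAX_STRIPES * LDISKFS_MAX_OBJECT
print(max_file_ldiskfs / PiB)    # 31.25

# 64-bit file size ceiling: 2^63 bytes.
max_file_64bit = 2**63
print(max_file_64bit / EiB)      # 8.0

# Default MDT inode count: 2 TB of space at 2 KB per inode.
mdt_inodes = (2 * TiB) // (2 * KiB)
print(mdt_inodes)                # 1073741824 (about 1 billion)
```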
      </section>
    </section>
    <section xml:id="dbdoclet.50438256_26456">
@@ -388,14 +651,20 @@
      </section>
      <section remap="h3">
        <title><indexterm><primary>setup</primary><secondary>memory</secondary><tertiary>OSS</tertiary></indexterm>OSS Memory Requirements</title>
-      <para>When planning the hardware for an OSS node, consider the memory usage of several components in the Lustre system (i.e., journal, service threads, file system metadata, etc.). Also, consider the effect of the OSS read cache feature, which consumes memory as it caches data on the OSS node.</para>
+      <para>When planning the hardware for an OSS node, consider the memory usage of several
+        components in the Lustre file system (for example, the journal, service threads, and file
+        system metadata). Also, consider the effect of the OSS read cache feature, which consumes
+        memory as it caches data on the OSS node.</para>
        <para>In addition to the MDS memory requirements mentioned in <xref linkend="dbdoclet.50438256_87676"/>, the OSS requirements include:</para>
        <itemizedlist>
          <listitem>
-          <para><emphasis role="bold">Service threads</emphasis> : The service threads on the OSS node pre-allocate a 1 MB I/O buffer for each ost_io service thread, so these buffers do not need to be allocated and freed for each I/O request.</para>
+          <para><emphasis role="bold">Service threads</emphasis> : The service threads on the OSS node pre-allocate a 4 MB I/O buffer for each ost_io service thread, so these buffers do not need to be allocated and freed for each I/O request.</para>
          </listitem>
          <listitem>
-          <para><emphasis role="bold">OSS read cache</emphasis> : OSS read cache provides read-only caching of data on an OSS, using the regular Linux page cache to store the data. Just like caching from a regular file system in Linux, OSS read cache uses as much physical memory as is available.</para>
+          <para><emphasis role="bold">OSS read cache</emphasis> : OSS read cache provides read-only
+            caching of data on an OSS, using the regular Linux page cache to store the data. Just
+            like caching from a regular file system in the Linux operating system, OSS read cache
+            uses as much physical memory as is available.</para>
          </listitem>
        </itemizedlist>
       <para>The same calculation applies to files accessed from the OSS as for the MDS, but the load is distributed over many more OSS nodes, so the amount of memory required for locks, inode cache, etc. listed under MDS is spread out over the OSS nodes.</para>
@@ -404,15 +673,15 @@
          <title><indexterm><primary>setup</primary><secondary>memory</secondary><tertiary>OSS</tertiary></indexterm>Calculating OSS Memory Requirements</title>
          <para>The minimum recommended RAM size for an OSS with two OSTs is computed below:</para>
          <informalexample>
-          <para>Ethernet/TCP send/receive buffers (1 MB * 512 threads) = 512 MB</para>
+          <para>Ethernet/TCP send/receive buffers (4 MB * 512 threads) = 2048 MB</para>
            <para>400 MB journal size * 2 OST devices = 800 MB</para>
            <para>1.5 MB read/write per OST IO thread * 512 threads = 768 MB</para>
            <para>600 MB file system read cache * 2 OSTs = 1200 MB</para>
            <para>1000 * 4-core clients * 100 files/core * 2kB = 800MB</para>
            <para>16 interactive clients * 10,000 files * 2kB = 320MB</para>
            <para>1,600,000 file extra working set * 1.5kB/file = 2400MB</para>
-          <para>     DLM locks + filesystem metadata TOTAL = 3520MB</para>
-          <para>Per OSS DLM locks + filesystem metadata = 3520MB/6 OSS = 600MB (approx.)</para>
+          <para> DLM locks + file system metadata TOTAL = 3520MB</para>
+          <para>Per OSS DLM locks + file system metadata = 3520MB/6 OSS = 600MB (approx.)</para>
            <para>Per OSS RAM minimum requirement = 4096MB (approx.)</para>
          </informalexample>
          <para>This consumes about 1,400 MB just for the pre-allocated buffers, and an additional 2 GB for minimal file system and kernel usage. Therefore, for a non-failover configuration, the minimum RAM would be 4 GB for an OSS node with two OSTs. Adding additional memory on the OSS will improve the performance of reading smaller, frequently-accessed files.</para>
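To make the arithmetic in the example above easy to check, the following sketch recomputes its line items. All values are copied from the example; as in the example, the client working-set terms treat 1 MB as 1000 kB, and the spread over six OSS nodes is the example's own assumption. The dictionary keys are illustrative names only.

```python
# Sketch: recompute the line items of the OSS memory example above (MB).

per_oss_items_mb = {
    "tcp_buffers": 4 * 512,   # Ethernet/TCP send/receive buffers, 4 MB * 512 threads
    "journals": 400 * 2,      # 400 MB journal * 2 OST devices
    "io_threads": 1.5 * 512,  # 1.5 MB read/write per OST IO thread * 512 threads
    "read_cache": 600 * 2,    # 600 MB file system read cache * 2 OSTs
}

# Cluster-wide DLM locks + file system metadata working set, in kB.
dlm_kb = (
    1000 * 4 * 100 * 2        # 1000 4-core clients * 100 files/core * 2 kB
    + 16 * 10_000 * 2         # 16 interactive clients * 10,000 files * 2 kB
    + 1_600_000 * 1.5         # 1,600,000-file extra working set * 1.5 kB/file
)
dlm_total_mb = dlm_kb / 1000          # 1 MB = 1000 kB, as in the example
dlm_per_oss_mb = dlm_total_mb / 6     # spread over 6 OSS nodes

for name, mb in per_oss_items_mb.items():
    print(f"{name}: {mb:g} MB")
print(f"DLM locks + file system metadata TOTAL: {dlm_total_mb:g} MB")  # 3520 MB
print(f"Per OSS share: {dlm_per_oss_mb:.0f} MB")                      # 587 MB
```

The per-OSS share of about 587 MB is what the example rounds to 600 MB.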
@@ -422,9 +691,15 @@
      </section>
    </section>
    <section xml:id="dbdoclet.50438256_78272">
-    <title><indexterm><primary>setup</primary><secondary>network</secondary></indexterm>Implementing Networks To Be Used by Lustre</title>
-    <para>As a high performance file system, Lustre places heavy loads on networks. Thus, a network interface in each Lustre server and client is commonly dedicated to Lustre traffic. This is often a dedicated TCP/IP subnet, although other network hardware can also be used.</para>
-    <para>A typical Lustre implementation may include the following:</para>
+    <title><indexterm>
+        <primary>setup</primary>
+        <secondary>network</secondary>
+      </indexterm>Implementing Networks To Be Used by the Lustre File System</title>
+    <para>As a high performance file system, the Lustre file system places heavy loads on networks.
+      Thus, a network interface in each Lustre server and client is commonly dedicated to Lustre
+      file system traffic. This is often a dedicated TCP/IP subnet, although other network hardware
+      can also be used.</para>
+    <para>A typical Lustre file system implementation may include the following:</para>
      <itemizedlist>
        <listitem>
          <para>A high-performance backend network for the Lustre servers, typically an InfiniBand (IB) network.</para>
@@ -436,16 +711,30 @@
          <para>Lustre routers to connect the two networks.</para>
        </listitem>
      </itemizedlist>
-    <para>Lustre networks and routing are configured and managed by specifying parameters to the Lustre Networking (lnet) module in <literal>/etc/modprobe.conf</literal> or <literal>/etc/modprobe.conf.local</literal> (depending on your Linux distribution).</para>
-    <para>To prepare to configure Lustre Networking, complete the following steps:</para>
+    <para>Lustre networks and routing are configured and managed by specifying parameters to the
+      Lustre networking (<literal>lnet</literal>) module in
+        <literal>/etc/modprobe.d/lustre.conf</literal>.</para>
+    <para>To prepare to configure Lustre networking, complete the following steps:</para>
      <orderedlist>
        <listitem>
-        <para><emphasis role="bold">Identify all machines that will be running Lustre and the network interfaces they will use to run Lustre traffic. These machines will form the Lustre network.</emphasis></para>
-        <para>A network is a group of nodes that communicate directly with one another. Lustre includes Lustre network drivers (LNDs) to support a variety of network types and hardware (see <xref linkend="understandinglustrenetworking"/> for a complete list). The standard rules for specifying networks applies to Lustre networks. For example, two TCP networks on two different subnets (<literal>tcp0</literal> and <literal>tcp1</literal>) are considered to be two different Lustre networks.</para>
+        <para><emphasis role="bold">Identify all machines that will be running Lustre software and
+            the network interfaces they will use to run Lustre file system traffic. These machines
+            will form the Lustre network.</emphasis></para>
+        <para>A network is a group of nodes that communicate directly with one another. The Lustre
+          software includes Lustre network drivers (LNDs) to support a variety of network types and
+          hardware (see <xref linkend="understandinglustrenetworking"/> for a complete list). The
+          standard rules for specifying networks apply to Lustre networks. For example, two TCP
+          networks on two different subnets (<literal>tcp0</literal> and <literal>tcp1</literal>)
+          are considered to be two different Lustre networks.</para>
        </listitem>
        <listitem>
          <para><emphasis role="bold">If routing is needed, identify the nodes to be used to route traffic between networks.</emphasis></para>
-        <para>If you are using multiple network types, then you&apos;ll need a router. Any node with appropriate interfaces can route Lustre Networking (LNET) traffic between different network hardware types or topologies --the node may be a server, a client, or a standalone router. LNET can route messages between different network types (such as TCP-to-InfiniBand) or across different topologies (such as bridging two InfiniBand or TCP/IP networks). Routing will be configured in <xref linkend="configuringlnet"/>.</para>
+        <para>If you are using multiple network types, then you will need a router. Any node with
+          appropriate interfaces can route Lustre networking (LNET) traffic between different
+          network hardware types or topologies; the node may be a server, a client, or a standalone
+          router. LNET can route messages between different network types (such as
+          TCP-to-InfiniBand) or across different topologies (such as bridging two InfiniBand or
+          TCP/IP networks). Routing will be configured in <xref linkend="configuringlnet"/>.</para>
        </listitem>
        <listitem>
          <para><emphasis role="bold">Identify the network interfaces to include in or exclude from LNET. </emphasis>
@@ -455,7 +744,7 @@
        </listitem>
        <listitem>
          <para><emphasis role="bold">To ease the setup of networks with complex network configurations, determine a cluster-wide module configuration.</emphasis></para>
-        <para>For large clusters, you can configure the networking setup for all nodes by using a single, unified set of parameters in the modprobe.conf file on each node. Cluster-wide configuration is described in <xref linkend="configuringlnet"/>.</para>
+        <para>For large clusters, you can configure the networking setup for all nodes by using a single, unified set of parameters in the <literal>lustre.conf</literal> file on each node. Cluster-wide configuration is described in <xref linkend="configuringlnet"/>.</para>
        </listitem>
      </orderedlist>
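As an illustration of what a unified entry in /etc/modprobe.d/lustre.conf might look like, the sketch below assembles an "options lnet networks=..." line from the networks and interfaces identified in the steps above. The helper function and the interface names (ib0, eth0, eth1) are hypothetical; the full syntax, including routes, is described in the LNET configuration chapter referenced above.

```python
# Hypothetical helper: build an "options lnet networks=..." line for
# /etc/modprobe.d/lustre.conf from a mapping of LNET network name to the
# interfaces that will carry Lustre file system traffic on that network.
from typing import Dict, List


def lnet_networks_line(networks: Dict[str, List[str]]) -> str:
    """Return a modprobe options line such as: options lnet networks="tcp0(eth0)"."""
    parts = []
    for net, interfaces in networks.items():
        if interfaces:
            parts.append(f"{net}({','.join(interfaces)})")
        else:
            parts.append(net)  # bare network name, no explicit interface list
    return f'options lnet networks="{",".join(parts)}"'


# Example: a server on an InfiniBand backend network and a TCP/IP network.
print(lnet_networks_line({"o2ib0": ["ib0"], "tcp0": ["eth0"]}))
# options lnet networks="o2ib0(ib0),tcp0(eth0)"
```

Generating the line from one mapping is one way to keep the same parameters on every node, as the cluster-wide configuration step suggests.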
      <note>