SettingUpLustreSystem.xml

   1 <?xml version='1.0' encoding='UTF-8'?>
   2 <chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="settinguplustresystem">
   3   <title xml:id="settinguplustresystem.title">Determining Hardware Configuration Requirements and
   4     Formatting Options</title>
   5   <para>This chapter describes hardware configuration requirements for a Lustre file system
   6     including:</para>
   7   <itemizedlist>
   8     <listitem>
   9       <para>
  10           <xref linkend="dbdoclet.50438256_49017"/>
  11       </para>
  12     </listitem>
  13     <listitem>
  14       <para>
  15           <xref linkend="dbdoclet.space_requirements"/>
  16       </para>
  17     </listitem>
  18     <listitem>
  19       <para>
  20           <xref linkend="dbdoclet.ldiskfs_mkfs_opts"/>
  21       </para>
  22     </listitem>
  23     <listitem>
  24       <para>
  25           <xref linkend="dbdoclet.50438256_26456"/>
  26       </para>
  27     </listitem>
  28     <listitem>
  29       <para>
  30           <xref linkend="dbdoclet.50438256_78272"/>
  31       </para>
  32     </listitem>
  33   </itemizedlist>
  34   <section xml:id="dbdoclet.50438256_49017">
  35       <title><indexterm><primary>setup</primary></indexterm>
  36   <indexterm><primary>setup</primary><secondary>hardware</secondary></indexterm>
  37   <indexterm><primary>design</primary><see>setup</see></indexterm>
  38           Hardware Considerations</title>
  39     <para>A Lustre file system can utilize any kind of block storage device such as single disks,
  40       software RAID, hardware RAID, or a logical volume manager. In contrast to some networked file
  41       systems, the block devices are only attached to the MDS and OSS nodes in a Lustre file system
  42       and are not accessed by the clients directly.</para>
  43     <para>Since the block devices are accessed by only one or two server nodes, a storage area network (SAN) that is accessible from all the servers is not required. Expensive switches are not needed because point-to-point connections between the servers and the storage arrays normally provide the simplest and best attachments. (If failover capability is desired, the storage must be attached to multiple servers.)</para>
  44     <para>For a production environment, it is preferable that the MGS have separate storage to allow future expansion to multiple file systems. However, it is possible to run the MDS and MGS on the same machine and have them share the same storage device.</para>
  45     <para>For best performance in a production environment, dedicated clients are required. For a non-production Lustre environment or for testing, a Lustre client and server can run on the same machine. However, dedicated clients are the only supported configuration.</para>
  46     <warning><para>Performance and recovery issues can occur if you put a client on an MDS or OSS:</para>
  47     <itemizedlist>
  48       <listitem>
  49         <para>Running the OSS and a client on the same machine can cause issues with low memory and memory pressure. If the client consumes all the memory and then tries to write data to the file system, the OSS will need to allocate pages to receive data from the client but will not be able to perform this operation due to low memory. This can cause the client to hang.</para>
  50       </listitem>
  51       <listitem>
  52         <para>Running the MDS and a client on the same machine can cause recovery and deadlock issues and impact the performance of other Lustre clients.</para>
  53       </listitem>
  54     </itemizedlist>
  55     </warning>
  56     <para>Only servers running on 64-bit CPUs are tested and supported. 64-bit CPU clients are
  57       typically used for testing to match expected customer usage and avoid limitations due to the 4
  58       GB limit for RAM size, 1 GB low-memory limitation, and 16 TB file size limit of 32-bit CPUs.
  59       Also, due to kernel API limitations, performing backups of Lustre software release 2.x file
  60       systems on 32-bit clients may cause backup tools to confuse files that have the same 32-bit
  61       inode number.</para>
  62     <para>The storage attached to the servers typically uses RAID to provide fault tolerance and can
  63       optionally be organized with logical volume management (LVM), which is then formatted as a
  64       Lustre file system. Lustre OSS and MDS servers read, write and modify data in the format
  65       imposed by the file system.</para>
  66     <para>The Lustre file system uses journaling file system technology on both the MDTs and OSTs.
  67       For an MDT, as much as a 20 percent performance gain can be obtained by placing the journal on
  68       a separate device.</para>
  69     <para>The MDS can effectively utilize a lot of CPU cycles. A minimum of four processor cores is recommended. More are advisable for file systems with many clients.</para>
  70     <note>
  71       <para>Lustre clients running on architectures with different endianness are supported. One limitation is that the PAGE_SIZE kernel macro on the client must be at least as large as the PAGE_SIZE of the server. In particular, ia64 or PPC clients with large pages (up to 64kB pages) can run with x86 servers (4kB pages). If you are running x86 clients with ia64 or PPC servers, you must compile the server kernel with a 4kB PAGE_SIZE (so the server page size is not larger than the client page size).</para>
  72     </note>
  73     <section remap="h3">
  74         <title><indexterm>
  75           <primary>setup</primary>
  76           <secondary>MDT</secondary>
  77         </indexterm> MGT and MDT Storage Hardware Considerations</title>
  78       <para>MGT storage requirements are small (less than 100 MB even in the largest Lustre file
  79         systems), and the data on an MGT is only accessed on a server/client mount, so disk
  80         performance is not a consideration.  However, this data is vital for file system access, so
  81         the MGT should be reliable storage, preferably mirrored RAID1.</para>
  82       <para>MDS storage is accessed in a database-like access pattern with many seeks and
  83         read-and-writes of small amounts of data. High throughput to MDS storage is not important.
  84         Storage types that provide much lower seek times, such as high-RPM SAS or SSD drives, can be
  85         used for the MDT.</para>
  86       <para>For maximum performance, the MDT should be configured as RAID1 with an internal journal and two disks from different controllers.</para>
  87       <para>If you need a larger MDT, create multiple RAID1 devices from pairs of disks, and then make a RAID0 array of the RAID1 devices. This ensures maximum reliability because multiple disk failures only have a small chance of hitting both disks in the same RAID1 device.</para>
  88       <para>Doing the opposite (RAID1 of a pair of RAID0 devices) has a 50% chance that even two disk failures can cause the loss of the whole MDT device. The first failure disables an entire half of the mirror and the second failure has a 50% chance of disabling the remaining mirror.</para>
  89       <para condition='l24'>If multiple MDTs are going to be present in the
  90       system, each MDT should be specified for the anticipated usage and load.
  91       For details on how to add additional MDTs to the filesystem, see
  92       <xref linkend="dbdoclet.addingamdt"/>.</para>
  93       <warning condition='l24'><para>MDT0 contains the root of the Lustre file
  94       system. If MDT0 is unavailable for any reason, the file system cannot be
  95       used.</para></warning>
  96       <note condition='l24'><para>Using the DNE feature, it is possible to
  97       dedicate additional MDTs to sub-directories off the file system root
  98       directory stored on MDT0, or arbitrarily for lower-level subdirectories,
  99       using the <literal>lfs mkdir -i <replaceable>mdt_index</replaceable></literal> command.
 100       If an MDT serving a subdirectory becomes unavailable, any subdirectories
 101       on that MDT and all directories beneath it will also become inaccessible.
 102       Configuring multiple levels of MDTs is an experimental feature for the
 103       2.4 release, and is fully functional in the 2.8 release.  This is
 104       typically useful for top-level directories to assign different users
 105       or projects to separate MDTs, or to distribute other large working sets
 106       of files to multiple MDTs.</para></note>
 107       <note condition='l28'><para>Starting in the 2.8 release it is possible
 108       to spread a single large directory across multiple MDTs using the DNE
 109       striped directory feature by specifying multiple stripes (or shards)
 110       at creation time using the
 111       <literal>lfs mkdir -c <replaceable>stripe_count</replaceable></literal>
 112       command, where <replaceable>stripe_count</replaceable> is often the
 113       number of MDTs in the filesystem.  Striped directories should typically
 114       not be used for all directories in the filesystem, since this incurs
 115       extra overhead compared to non-striped directories, but is useful for
 116       larger directories (over 50k entries) where many output files are being
 117       created at one time.
 118       </para></note>
 119     </section>
 120     <section remap="h3">
 121       <title><indexterm><primary>setup</primary><secondary>OST</secondary></indexterm>OST Storage Hardware Considerations</title>
 122       <para>The data access pattern for the OSS storage is a streaming I/O
 123       pattern that is dependent on the access patterns of applications being
 124       used. Each OSS can manage multiple object storage targets (OSTs), one
 125       for each volume, with I/O traffic load-balanced between servers and
 126       targets. An OSS should be configured to have a balance between the
 127       network bandwidth and the attached storage bandwidth to prevent
 128       bottlenecks in the I/O path. Depending on the server hardware, an OSS
 129       typically serves between 2 and 8 targets. Each target is usually
 130       24-48 TB in size, but may be as large as 256 TB.</para>
 131       <para>Lustre file system capacity is the sum of the capacities provided
 132       by the targets. For example, 64 OSSs, each with two 8 TB OSTs,
 133       provide a file system with a capacity of nearly 1 PB. If each OST uses
 134       ten 1 TB SATA disks (8 data disks plus 2 parity disks in a RAID-6
 135       configuration), it may be possible to get 50 MB/sec from each drive,
 136       providing up to 400 MB/sec of disk bandwidth per OST. If this system
 137       is used as a storage backend with a system network, such as
 138       InfiniBand, that provides similar bandwidth, then each OSS could provide
 139       800 MB/sec of end-to-end I/O throughput. (Although the architectural
 140       constraints described here are simple, in practice it takes careful
 141       hardware selection, benchmarking and integration to obtain such
 142       results.)</para>
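      <para>The capacity and bandwidth arithmetic in this example can be
      checked with a short script. The following Python sketch is illustrative
      only: the function name <literal>oss_estimate</literal> and its
      parameters are hypothetical, and the inputs are the figures quoted in
      the paragraph above rather than measured values.</para>

```python
# Minimal sketch of the sizing arithmetic from the example above.
# The function and parameter names are illustrative, not part of any Lustre tool.

def oss_estimate(num_oss, osts_per_oss, ost_size_tb,
                 data_disks_per_ost, mb_per_sec_per_disk):
    """Return (capacity in TB, MB/sec per OST, MB/sec per OSS)."""
    # File system capacity is the sum of the capacities of all targets.
    capacity_tb = num_oss * osts_per_oss * ost_size_tb
    # Only the data disks of a RAID-6 set contribute streaming bandwidth.
    ost_bandwidth = data_disks_per_ost * mb_per_sec_per_disk
    # Each OSS aggregates the bandwidth of the OSTs it serves.
    oss_bandwidth = osts_per_oss * ost_bandwidth
    return capacity_tb, ost_bandwidth, oss_bandwidth


capacity_tb, ost_bw, oss_bw = oss_estimate(64, 2, 8, 8, 50)
print(capacity_tb, ost_bw, oss_bw)  # 1024 400 800
```

      <para>The result is an upper bound on disk bandwidth. It assumes the
      network can carry the same load and ignores controller and file system
      overhead.</para>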
 143     </section>
 144   </section>
 145   <section xml:id="dbdoclet.space_requirements">
 146       <title><indexterm><primary>setup</primary><secondary>space</secondary></indexterm>
 147           <indexterm><primary>space</primary><secondary>determining requirements</secondary></indexterm>
 148           Determining Space Requirements</title>
 149     <para>The desired performance characteristics of the backing file systems
 150     on the MDT and OSTs are independent of one another. The size of the MDT
 151     backing file system depends on the number of inodes needed in the total
 152     Lustre file system, while the aggregate OST space depends on the total
 153     amount of data stored on the file system. If MGS data is to be stored
 154     on the MDT device (co-located MGT and MDT), add 100 MB to the required
 155     size estimate for the MDT.</para>
 156     <para>Each time a file is created on a Lustre file system, it consumes
 157     one inode on the MDT and one object on each OST over which the file is striped.
 158     Normally, each file&apos;s stripe count is based on the system-wide
 159     default stripe count.  However, this can be changed for individual files
 160     using the <literal>lfs setstripe</literal> command. For more details,
 161     see <xref linkend="managingstripingfreespace"/>.</para>
 162     <para>In a Lustre ldiskfs file system, all the MDT inodes and OST
 163     objects are allocated when the file system is first formatted.  When
 164     the file system is in use and a file is created, metadata associated
 165     with that file is stored in one of the pre-allocated inodes and does
 166     not consume any of the free space used to store file data.  The total
 167     number of inodes on a formatted ldiskfs MDT or OST cannot be easily
 168     changed. Thus, the number of inodes created at format time should be
 169     generous enough to anticipate near term expected usage, with some room
 170     for growth without the effort of additional storage.</para>
 171     <para>By default, the ldiskfs file system used by Lustre servers to store
 172     user-data objects and system data reserves 5% of space that cannot be used
 173     by the Lustre file system.  Additionally, an ldiskfs Lustre file system
 174     reserves up to 400 MB on each OST, and up to 4GB on each MDT for journal
 175     use and a small amount of space outside the journal to store accounting
 176     data. This reserved space is unusable for general storage. Thus, at least
 177     this much space will be used per OST before any file object data is saved.
 178     </para>
 179     <para condition="l24">With a ZFS backing filesystem for the MDT or OST,
 180     the space allocation for inodes and file data is dynamic, and inodes are
 181     allocated as needed.  A minimum of 4kB of usable space (before mirroring)
 182     is needed for each inode, exclusive of other overhead such as directories,
 183     internal log files, extended attributes, ACLs, etc.  ZFS also reserves
 184     approximately 3% of the total storage space for internal and redundant
 185     metadata, which is not usable by Lustre.
 186     Since the size of extended attributes and ACLs is highly dependent on
 187     kernel versions and site-specific policies, it is best to over-estimate
 188     the amount of space needed for the desired number of inodes, and any
 189     excess space will be utilized to store more inodes.
 190     </para>
 191     <section>
 192       <title><indexterm>
 193           <primary>setup</primary>
 194           <secondary>MGT</secondary>
 195         </indexterm>
 196         <indexterm>
 197           <primary>space</primary>
 198           <secondary>determining MGT requirements</secondary>
 199         </indexterm> Determining MGT Space Requirements</title>
 200       <para>Less than 100 MB of space is typically required for the MGT.
 201       The size is determined by the total number of servers in the Lustre
 202       file system cluster(s) that are managed by the MGS.</para>
 203     </section>
 204     <section xml:id="dbdoclet.50438256_87676">
 205         <title><indexterm>
 206           <primary>setup</primary>
 207           <secondary>MDT</secondary>
 208         </indexterm>
 209         <indexterm>
 210           <primary>space</primary>
 211           <secondary>determining MDT requirements</secondary>
 212         </indexterm> Determining MDT Space Requirements</title>
 213       <para>When calculating the MDT size, the important factor to consider
 214       is the number of files to be stored in the file system, each of which needs
 215       at least 2 KiB of usable space per inode on the MDT.  Since MDTs typically
 216       use RAID-1+0 mirroring, the total storage needed will be double this.
 217       </para>
 218       <para>Please note that the actual used space per MDT depends on the number
 219       of files per directory, the number of stripes per file, whether files
 220       have ACLs or user xattrs, and the number of hard links per file.  The
 221       storage required for Lustre file system metadata is typically 1-2
 222       percent of the total file system capacity depending upon file size.</para>
 223       <para>For example, if the average file size is 5 MB and you have
 224       500 TB of usable OST space, then you can calculate the minimum total
 225       number of inodes needed on the MDTs and on the OSTs as follows:</para>
 226       <informalexample>
 227         <para>(500 TB * 1000000 MB/TB) / 5 MB/inode = 100M inodes</para>
 228       </informalexample>
 229       <para>For details about formatting options for ldiskfs MDT and OST file
 230       systems, see <xref linkend="dbdoclet.ldiskfs_mdt_mkfs"/>.</para>
 231       <para>It is recommended that the MDT have at least twice the minimum
 232       number of inodes to allow for future expansion and allow for an average
 233       file size smaller than expected. Thus, the minimum space for an ldiskfs
 234       MDT should be approximately:
 235       </para>
 236       <informalexample>
 237         <para>2 KiB/inode x 100 million inodes x 2 = 400 GiB ldiskfs MDT</para>
 238       </informalexample>
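      <para>The two calculations above can be reproduced with a short script.
      This is a sketch under stated assumptions: the function names
      <literal>min_inodes</literal> and
      <literal>ldiskfs_mdt_size_bytes</literal> are hypothetical, decimal
      units are used as in the first formula, and the factor of two is the
      recommended headroom for growth.</para>

```python
# Sketch of the MDT sizing arithmetic shown above.
# Function names are illustrative; they do not correspond to any Lustre utility.

def min_inodes(ost_space_tb, avg_file_size_mb):
    """Minimum inode count: usable OST space divided by average file size."""
    # The formula above uses decimal units: 1 TB = 1,000,000 MB.
    return ost_space_tb * 1_000_000 // avg_file_size_mb


def ldiskfs_mdt_size_bytes(inodes, bytes_per_inode=2048, growth_factor=2):
    """MDT space for the given inode count, with headroom for growth."""
    return inodes * bytes_per_inode * growth_factor


inodes = min_inodes(500, 5)
size_bytes = ldiskfs_mdt_size_bytes(inodes)
print(inodes)               # 100000000
print(size_bytes / 10**9)   # 409.6
```

      <para>The result of roughly 410 GB (decimal) is in line with the
      approximate 400 GiB figure given above. Mirroring the MDT doubles the
      raw storage needed.</para>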
 239       <note>
 240         <para>If the average file size is very small, 4 KB for example, the
 241         MDT will use as much space for each file as the space used on the OST.
 242         However, this is an uncommon usage for a Lustre filesystem.</para>
 243       </note>
 244       <note>
 245         <para>If the MDT has too few inodes, this can cause the space on the
 246         OSTs to be inaccessible since no new files can be created. Be sure to
 247         determine the appropriate size of the MDT needed to support the file
 248         system before formatting the file system. It is possible to increase the
 249         number of inodes after the file system is formatted, depending on the
 250         storage.  For ldiskfs MDT filesystems the <literal>resize2fs</literal>
 251         tool can be used if the underlying block device is on a LVM logical
 252         volume and the underlying logical volume size can be increased.
 253         For ZFS, new (mirrored) VDEVs can be added to the MDT pool to increase
 254         the total space available for inode storage.
 255         Inodes will be added approximately in proportion to space added.
 256         </para>
 257       </note>
 258       <note condition='l24'>
 259         <para>Note that the number of total and free inodes reported by
 260         <literal>lfs df -i</literal> for ZFS MDTs and OSTs is estimated based
 261         on the current average space used per inode.  When a ZFS filesystem is
 262         first formatted, this free inode estimate will be very conservative
 263         (low) due to the high ratio of directories to regular files created for
 264         internal Lustre metadata storage, but this estimate will improve as
 265         more files are created by regular users and the average file size
 266         better reflects actual site usage.
 267         </para>
 268       </note>
 269       <note condition='l24'>
 270         <para>Starting in release 2.4, using the DNE remote directory feature
 271         it is possible to increase the total number of inodes of a Lustre
 272         filesystem, as well as increasing the aggregate metadata performance,
 273         by configuring additional MDTs into the filesystem. See
 274         <xref linkend="dbdoclet.addingamdt"/> for details.
 275         </para>
 276       </note>
 277     </section>
 278     <section remap="h3">
 279         <title><indexterm>
 280           <primary>setup</primary>
 281           <secondary>OST</secondary>
 282         </indexterm>
 283         <indexterm>
 284           <primary>space</primary>
 285           <secondary>determining OST requirements</secondary>
 286         </indexterm> Determining OST Space Requirements</title>
 287       <para>For the OST, the amount of space taken by each object depends on
 288       the usage pattern of the users/applications running on the system. The
 289       Lustre software defaults to a conservative estimate for the average
 290       object size (between 64KB per object for 10GB OSTs, and 1MB per object
 291       for 16TB and larger OSTs). If you are confident that the average file
 292       size for your applications will be larger than this, you can specify a
 293       larger average file size (fewer total inodes for a given OST size) to
 294       reduce file system overhead and minimize file system check time.
 295       See <xref linkend="dbdoclet.ldiskfs_ost_mkfs"/> for more details.</para>
 296     </section>
 297   </section>
 298   <section xml:id="dbdoclet.ldiskfs_mkfs_opts">
 299     <title>
 300       <indexterm>
 301         <primary>ldiskfs</primary>
 302         <secondary>formatting options</secondary>
 303       </indexterm>
 304       <indexterm>
 305         <primary>setup</primary>
 306         <secondary>ldiskfs</secondary>
 307       </indexterm>
 308       Setting ldiskfs File System Formatting Options
 309     </title>
 310     <para>By default, the <literal>mkfs.lustre</literal> utility applies these
 311     options to the Lustre backing file system used to store data and metadata
 312     in order to enhance Lustre file system performance and scalability. These
 313     options include:</para>
 314         <itemizedlist>
 315             <listitem>
 316               <para><literal>flex_bg</literal> - When the flag is set to enable this
 317           flexible-block-groups feature, block and inode bitmaps for multiple groups are aggregated
 318           to minimize seeking when bitmaps are read or written and to reduce read/modify/write
 319           operations on typical RAID storage (with 1 MB RAID stripe widths). This flag is enabled on
 320           both OST and MDT file systems. On MDT file systems the <literal>flex_bg</literal> factor
 321           is left at the default value of 16. On OSTs, the <literal>flex_bg</literal> factor is set
 322           to 256 to allow all of the block or inode bitmaps in a single <literal>flex_bg</literal>
 323           to be read or written in a single I/O on typical RAID storage.</para>
 324             </listitem>
 325             <listitem>
 326               <para><literal>huge_file</literal> - Setting this flag allows files on OSTs to be
 327           larger than 2 TB in size.</para>
 328             </listitem>
 329             <listitem>
 330               <para><literal>lazy_journal_init</literal> - This extended option is enabled to
 331           prevent a full overwrite of the 400 MB journal that is allocated by default in a Lustre
 332           file system, which reduces the file system format time.</para>
 333             </listitem>
 334         </itemizedlist>
 335     <para>To override the default formatting options, use arguments to
 336         <literal>mkfs.lustre</literal> to pass formatting options to the backing file system:</para>
 337     <screen>--mkfsoptions=&apos;backing fs options&apos;</screen>
 338     <para>For other <literal>mkfs.lustre</literal> options, see the Linux man page for
 339         <literal>mke2fs(8)</literal>.</para>
 340     <section xml:id="dbdoclet.ldiskfs_mdt_mkfs">
 341       <title><indexterm>
 342           <primary>inodes</primary>
 343           <secondary>MDS</secondary>
 344         </indexterm><indexterm>
 345           <primary>setup</primary>
 346           <secondary>inodes</secondary>
 347         </indexterm>Setting Formatting Options for an ldiskfs MDT</title>
 348       <para>The number of inodes on the MDT is determined at format time
 349       based on the total size of the file system to be created. The default
 350       <emphasis role="italic">bytes-per-inode</emphasis> ratio ("inode ratio")
 351       for an MDT is optimized at one inode for every 2048 bytes of file
 352       system space. It is recommended that this value not be changed for
 353       MDTs.</para>
 354       <para>This setting takes into account the space needed for additional
 355       ldiskfs filesystem-wide metadata, such as the journal (up to 4 GB),
 356       bitmaps, and directories, as well as files that Lustre uses internally
 357       to maintain cluster consistency.  There is additional per-file metadata
 358       such as file layout for files with a large number of stripes, Access
 359       Control Lists (ACLs), and user extended attributes.</para>
 360       <para> It is possible to reserve less than the recommended 2048 bytes
 361       per inode for an ldiskfs MDT when it is first formatted by adding the
 362       <literal>--mkfsoptions="-i bytes-per-inode"</literal> option to
 363       <literal>mkfs.lustre</literal>.  Decreasing the inode ratio tunable
 364       <literal>bytes-per-inode</literal> will create more inodes for a given
 365       MDT size, but will leave less space for extra per-file metadata.  The
 366       inode ratio must always be strictly larger than the MDT inode size,
 367       which is 512 bytes by default.  It is recommended to use an inode ratio
 368       at least 512 bytes larger than the inode size to ensure the MDT does
 369       not run out of space.</para>
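      <para>The two constraints on the inode ratio described above can be
      expressed as a simple check. The following Python sketch is
      illustrative only: the function name
      <literal>check_mdt_inode_ratio</literal> and its return values are
      hypothetical, and the defaults (a 512-byte inode and a 512-byte margin)
      are the ones stated in the preceding paragraph.</para>

```python
# Sketch of the inode ratio rules for an ldiskfs MDT described above.
# The function name and the "invalid"/"risky"/"ok" labels are illustrative.

def check_mdt_inode_ratio(bytes_per_inode, inode_size=512, margin=512):
    """Classify a proposed bytes-per-inode value for an ldiskfs MDT."""
    # The ratio must be strictly larger than the inode size.
    if not bytes_per_inode > inode_size:
        return "invalid"
    # A ratio at least `margin` bytes larger than the inode size is recommended.
    if bytes_per_inode - inode_size >= margin:
        return "ok"
    return "risky"


def mkfs_inode_ratio_option(bytes_per_inode):
    """Build the option string in the form shown in the text."""
    return '--mkfsoptions="-i {}"'.format(bytes_per_inode)


print(check_mdt_inode_ratio(2048))     # ok
print(check_mdt_inode_ratio(768))      # risky
print(mkfs_inode_ratio_option(1024))   # --mkfsoptions="-i 1024"
```

      <para>A value classified as risky is accepted by the rules above but
      leaves little room for per-file metadata stored outside the
      inode.</para>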
 370       <para>The size of the inode may be changed by adding the
 371       <literal>--stripe-count-hint=N</literal> option to have
 372       <literal>mkfs.lustre</literal> automatically calculate a reasonable
 373       inode size based on the default stripe count that will be used by the
 374       filesystem, or directly by specifying the
 375       <literal>--mkfsoptions="-I inode-size"</literal> option.  Increasing
 376       the inode size will provide more space in the inode for a larger Lustre
 377       file layout, ACLs, user and system extended attributes, SELinux and
 378       other security labels, and other internal metadata.  However, if these
 379       features or other in-inode xattrs are not needed, the larger inode size
 380       will hurt metadata performance as 2x, 4x, or 8x as much data would be
 381       read or written for each MDT inode access.
 382       </para>
 383     </section>
 384     <section xml:id="dbdoclet.ldiskfs_ost_mkfs">
 385       <title><indexterm>
 386           <primary>inodes</primary>
 387           <secondary>OST</secondary>
 388         </indexterm>Setting Formatting Options for an ldiskfs OST</title>
 389       <para>When formatting an OST file system, it can be beneficial
 390       to take local file system usage into account. When doing so, try to
 391       reduce the number of inodes on each OST, while keeping enough margin
 392       for potential variations in future usage. This helps reduce the format
 393       and file system check time and makes more space available for data.</para>
 394       <para>The table below shows the default
 395       <emphasis role="italic">bytes-per-inode</emphasis> ratio ("inode ratio")
 396       used for OSTs of various sizes when they are formatted.</para>
 397       <para>
 398         <table frame="all" xml:id="settinguplustresystem.tab1">
 399           <title>Default Inode Ratios Used for Newly Formatted OSTs</title>
 400           <tgroup cols="3">
 401             <colspec colname="c1" colwidth="3*"/>
 402             <colspec colname="c2" colwidth="2*"/>
 403             <colspec colname="c3" colwidth="4*"/>
 404             <thead>
 405               <row>
 406                 <entry>
 407                   <para><emphasis role="bold">LUN/OST size</emphasis></para>
 408                 </entry>
 409                 <entry>
 410                   <para><emphasis role="bold">Default Inode ratio</emphasis></para>
 411                 </entry>
 412                 <entry>
 413                   <para><emphasis role="bold">Total inodes</emphasis></para>
 414                 </entry>
 415               </row>
 416             </thead>
 417             <tbody>
 418               <row>
 419                 <entry>
 420                   <para> under 10GB </para>
 421                 </entry>
 422                 <entry>
 423                   <para> 1 inode/16KB </para>
 424                 </entry>
 425                 <entry>
 426                   <para> 640 - 655k </para>
 427                 </entry>
 428               </row>
 429               <row>
 430                 <entry>
 431                   <para> 10GB - 1TB </para>
 432                 </entry>
 433                 <entry>
 434                   <para> 1 inode/68KB </para>
 435                 </entry>
 436                 <entry>
 437                   <para> 153k - 15.7M </para>
 438                 </entry>
 439               </row>
 440               <row>
 441                 <entry>
 442                   <para> 1TB - 8TB </para>
 443                 </entry>
 444                 <entry>
 445                   <para> 1 inode/256KB </para>
 446                 </entry>
 447                 <entry>
 448                   <para> 4.2M - 33.6M </para>
 449                 </entry>
 450               </row>
 451               <row>
 452                 <entry>
 453                   <para> over 8TB </para>
 454                 </entry>
 455                 <entry>
 456                   <para> 1 inode/1MB </para>
 457                 </entry>
 458                 <entry>
 459                   <para> 8.4M - 134M </para>
 460                 </entry>
 461               </row>
 462             </tbody>
 463           </tgroup>
 464         </table>
 465       </para>
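      <para>The relationship between OST size, inode ratio, and total inode
      count in the table can be approximated with a short script. This is a
      sketch, not the logic used by <literal>mkfs.lustre</literal>: the
      function names are hypothetical, the first row is read as applying to
      OSTs smaller than 10 GB, sizes are treated as binary units, and the
      handling of sizes that fall exactly on a row boundary is an
      assumption.</para>

```python
# Approximation of the default OST inode ratios listed in the table above.
# Names and boundary handling are assumptions made for illustration.

KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4


def default_ost_inode_ratio(ost_bytes):
    """Return the table's bytes-per-inode value for an OST of this size."""
    if 10 * GIB > ost_bytes:
        return 16 * KIB
    if TIB > ost_bytes:
        return 68 * KIB
    if 8 * TIB >= ost_bytes:
        return 256 * KIB
    return 1024 * KIB


def approx_total_inodes(ost_bytes):
    """Estimate total inodes as OST size divided by the inode ratio."""
    return ost_bytes // default_ost_inode_ratio(ost_bytes)


print(approx_total_inodes(16 * TIB))  # 16777216
```

      <para>The estimate ignores reserved space, so the counts reported for a
      formatted OST may differ slightly from these values.</para>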
 466       <para>In environments with few small files, the default inode ratio
 467       may result in far too many inodes for the average file size. In this
 468       case, performance can be improved by increasing the number of
 469       <emphasis role="italic">bytes-per-inode</emphasis>.  To set the inode
 470       ratio, use the <literal>--mkfsoptions="-i <replaceable>bytes-per-inode</replaceable>"</literal>
 471       argument to <literal>mkfs.lustre</literal> to specify the expected
 472       average (mean) size of OST objects.  For example, to create an OST
 473       with an expected average object size of 8MB, run:
 474       <screen>[oss#] mkfs.lustre --ost --mkfsoptions=&quot;-i $((8192 * 1024))&quot; ...</screen>
 475       </para>
 476       <note>
 477         <para>OSTs formatted with ldiskfs are limited to a maximum of
 478         320 million to 1 billion objects.  Specifying a very small
 479         bytes-per-inode ratio for a large OST that causes this limit to be
 480         exceeded can cause either premature out-of-space errors and prevent
 481         the full OST space from being used, or will waste space and slow down
 482         e2fsck more than necessary.  The default inode ratios are chosen to
 483         ensure that the total number of inodes remains below this limit.
 484         </para>
 485       </note>
 486       <note>
 487         <para>File system check time on OSTs is affected by a number of
 488         variables in addition to the number of inodes, including the size of
 489         the file system, the number of allocated blocks, the distribution of
 490         allocated blocks on the disk, disk speed, CPU speed, and the amount
 491         of RAM on the server. Reasonable file system check times for valid
 492         filesystems are 5-30 minutes per TB, but may increase significantly
 493         if substantial errors are detected and need to be repaired.</para>
 494       </note>
 495       <para>For more details about formatting MDT and OST file systems,
 496       see <xref linkend="dbdoclet.ldiskfs_raid_opts"/>.</para>
 497     </section>
 498     <section remap="h3">
 499       <title><indexterm>
 500           <primary>setup</primary>
 501           <secondary>limits</secondary>
 502         </indexterm><indexterm xmlns:xi="http://www.w3.org/2001/XInclude">
 503           <primary>wide striping</primary>
 504         </indexterm><indexterm xmlns:xi="http://www.w3.org/2001/XInclude">
 505           <primary>xattr</primary>
 506           <secondary><emphasis role="italic">See</emphasis> wide striping</secondary>
 507         </indexterm><indexterm>
 508           <primary>large_xattr</primary>
 509           <secondary>ea_inode</secondary>
 510         </indexterm><indexterm>
 511           <primary>wide striping</primary>
 512           <secondary>large_xattr</secondary>
 513           <tertiary>ea_inode</tertiary>
 514         </indexterm>File and File System Limits</title>
 515
 516          <para><xref linkend="settinguplustresystem.tab2"/> describes
 517      current known limits of Lustre.  These limits are imposed by either
 518      the Lustre architecture or the Linux virtual file system (VFS) and
 519      virtual memory subsystems. In a few cases, a limit is defined within
 520      the code and can be changed by re-compiling the Lustre software.
 521      Instructions to install from source code are beyond the scope of this
 522      document, and can be found elsewhere online. In these cases, the
 523      indicated limit was used for testing of the Lustre software. </para>
 524
 525       <table frame="all" xml:id="settinguplustresystem.tab2">
 526         <title>File and file system limits</title>
 527         <tgroup cols="3">
 528           <colspec colname="c1" colwidth="3*"/>
 529           <colspec colname="c2" colwidth="2*"/>
 530           <colspec colname="c3" colwidth="4*"/>
 531           <thead>
 532             <row>
 533               <entry>
 534                 <para><emphasis role="bold">Limit</emphasis></para>
 535               </entry>
 536               <entry>
 537                 <para><emphasis role="bold">Value</emphasis></para>
 538               </entry>
 539               <entry>
 540                 <para><emphasis role="bold">Description</emphasis></para>
 541               </entry>
 542             </row>
 543           </thead>
 544           <tbody>
 545             <row>
 546               <entry>
 547                 <para> Maximum number of MDTs</para>
 548               </entry>
 549               <entry>
 550                 <para condition='l24'>256</para>
 551               </entry>
 552               <entry>
 553                 <para>The Lustre software release 2.3 and earlier allows a
 554                 maximum of 1 MDT per file system, but a single MDS can host
 555                 multiple MDTs, each one for a separate file system.</para>
 556                 <para condition="l24">The Lustre software release 2.4 and later
 557                 requires one MDT for the filesystem root. At least 255 more
 558                 MDTs can be added to the filesystem and attached into
 559                 the namespace with DNE remote or striped directories.</para>
 560               </entry>
 561             </row>
 562             <row>
 563               <entry>
 564                 <para> Maximum number of OSTs</para>
 565               </entry>
 566               <entry>
 567                 <para> 8150</para>
 568               </entry>
 569               <entry>
 570                 <para>The maximum number of OSTs is a constant that can be
 571                 changed at compile time.  Lustre file systems with up to
 572                 4000 OSTs have been tested.  Multiple OST file systems can
 573                 be configured on a single OSS node.</para>
 574               </entry>
 575             </row>
 576             <row>
 577               <entry>
 578                 <para> Maximum OST size</para>
 579               </entry>
 580               <entry>
 581                 <para> 128TB (ldiskfs), 256TB (ZFS)</para>
 582               </entry>
 583               <entry>
 584                 <para>This is not a <emphasis>hard</emphasis> limit. Larger
 585                 OSTs are possible, but production systems do not
 586                 typically go beyond the stated limit per OST because Lustre
 587                 can add capacity and performance with additional OSTs, and
 588                 having more OSTs improves aggregate I/O performance and
 589                 minimizes contention.
 590                 </para>
 591                 <para>
 592                 With 32-bit kernels, due to page cache limits, 16TB is the
 593                 maximum block device size, which in turn applies to the
 594                 size of OST.  It is strongly recommended to run Lustre
 595                 clients and servers with 64-bit kernels.</para>
 596               </entry>
 597             </row>
 598             <row>
 599               <entry>
 600                 <para> Maximum number of clients</para>
 601               </entry>
 602               <entry>
 603                 <para> 131072</para>
 604               </entry>
 605               <entry>
 606                 <para>The maximum number of clients is a constant that can
 607                 be changed at compile time. Up to 30000 clients have been
 608                 used in production.</para>
 609               </entry>
 610             </row>
 611             <row>
 612               <entry>
 613                 <para> Maximum size of a file system</para>
 614               </entry>
 615               <entry>
 616                 <para> 512 PB (ldiskfs), 1EB (ZFS)</para>
 617               </entry>
 618               <entry>
 619                 <para>Each OST can have a file system up to the
 620                 Maximum OST size limit, and the Maximum number of OSTs
 621                 can be combined into a single filesystem.
 622                 </para>
 623               </entry>
 624             </row>
 625             <row>
 626               <entry>
 627                 <para> Maximum stripe count</para>
 628               </entry>
 629               <entry>
 630                 <para> 2000</para>
 631               </entry>
 632               <entry>
 633                 <para>This limit is imposed by the size of the layout that
 634                 needs to be stored on disk and sent in RPC requests, but is
 635                 not a hard limit of the protocol. The number of OSTs in the
 636                 filesystem can exceed the stripe count, but this limits the
 637                 number of OSTs across which a single file can be striped.</para>
 638               </entry>
 639             </row>
 640             <row>
 641               <entry>
 642                 <para> Maximum stripe size</para>
 643               </entry>
 644               <entry>
 645                 <para> &lt; 4 GB</para>
 646               </entry>
 647               <entry>
 648                 <para>The amount of data written to each object before moving
 649                 on to the next object.</para>
 650               </entry>
 651             </row>
 652             <row>
 653               <entry>
 654                 <para> Minimum stripe size</para>
 655               </entry>
 656               <entry>
 657                 <para> 64 KB</para>
 658               </entry>
 659               <entry>
 660                 <para>Due to the 64 KB PAGE_SIZE on some 64-bit machines,
 661                 the minimum stripe size is set to 64 KB.</para>
 662               </entry>
 663             </row>
 664             <row>
 665               <entry>
 666                 <para> Maximum object size</para>
 667               </entry>
 668               <entry>
 669                 <para> 16TB (ldiskfs), 256TB (ZFS)</para>
 670               </entry>
 671               <entry>
 672                 <para>The amount of data that can be stored in a single object.
 673                 An object corresponds to a stripe. The ldiskfs limit of 16 TB
 674                 for a single object applies.  For ZFS the limit is the size of
 675                 the underlying OST.  Files can consist of up to 2000 stripes,
 676                 and each stripe can be up to the maximum object size. </para>
 677               </entry>
 678             </row>
 679             <row>
 680               <entry>
 681                 <para> Maximum <anchor xml:id="dbdoclet.50438256_marker-1290761" xreflabel=""/>file size</para>
 682               </entry>
 683               <entry>
 684                 <para> 16 TB on 32-bit systems</para>
 685                 <para>&#160;</para>
 686                 <para> 31.25 PB on 64-bit ldiskfs systems, 8EB on 64-bit ZFS systems</para>
 687               </entry>
 688               <entry>
 689                 <para>Individual files have a hard limit of nearly 16 TB on
 690                 32-bit systems imposed by the kernel memory subsystem. On
 691                 64-bit systems this limit does not exist.  Hence, files can
 692                 be 2^63 bytes (8EB) in size if the backing filesystem can
 693                 support large enough objects.</para>
 694                 <para>A single file can have a maximum of 2000 stripes, which
 695                 gives an upper single file limit of 31.25 PB for 64-bit
 696                 ldiskfs systems. The actual amount of data that can be stored
 697                 in a file depends upon the amount of free space in each OST
 698                 on which the file is striped.</para>
 699               </entry>
 700             </row>
 701             <row>
 702               <entry>
 703                 <para> Maximum number of files or subdirectories in a single directory</para>
 704               </entry>
 705               <entry>
 706                 <para> 10 million files (ldiskfs), 2^48 (ZFS)</para>
 707               </entry>
 708               <entry>
 709                 <para>The Lustre software uses the ldiskfs hashed directory
 710                 code, which has a limit of about 10 million files, depending
 711                 on the length of the file name. The limit on subdirectories
 712                 is the same as the limit on regular files.</para>
 713                 <note condition='l28'><para>Starting in the 2.8 release it is
 714                 possible to exceed this limit by striping a single directory
 715                 over multiple MDTs with the <literal>lfs mkdir -c</literal>
 716                 command, which increases the single directory limit by a
 717                 factor of the number of directory stripes used.</para></note>
 718                 <para>Lustre file systems are tested with ten million files
 719                 in a single directory.</para>
 720               </entry>
 721             </row>
 722             <row>
 723               <entry>
 724                 <para> Maximum number of files in the file system</para>
 725               </entry>
 726               <entry>
 727                 <para> 4 billion (ldiskfs), 256 trillion (ZFS)</para>
 728                 <para condition='l24'>up to 256 times the per-MDT limit</para>
 729               </entry>
 730               <entry>
 731                 <para>The ldiskfs filesystem imposes an upper limit of
 732                 4 billion inodes per filesystem. By default, the MDT
 733                 filesystem is formatted with one inode per 2KB of space,
 734                 meaning 512 million inodes per TB of MDT space. This can be
 735                 increased initially at the time of MDT filesystem creation.
 736                 For more information, see
 737                 <xref linkend="settinguplustresystem"/>.</para>
 738                 <para condition="l24">The ZFS filesystem
 739                 dynamically allocates inodes and does not have a fixed ratio
 740                 of inodes per unit of MDT space, but consumes approximately
 741                 4KB of space per inode, depending on the configuration.</para>
 742                 <para condition="l24">Each additional MDT can hold up to the
 743                 above maximum number of additional files, depending on
 744                 available space and the distribution of directories and files
 745                 in the filesystem.</para>
 746               </entry>
 747             </row>
 748             <row>
 749               <entry>
 750                 <para> Maximum length of a filename</para>
 751               </entry>
 752               <entry>
 753                 <para> 255 bytes (filename)</para>
 754               </entry>
 755               <entry>
 756                 <para>This limit is 255 bytes for a single filename, the
 757                 same as the limit in the underlying filesystems.</para>
 758               </entry>
 759             </row>
 760             <row>
 761               <entry>
 762                 <para> Maximum length of a pathname</para>
 763               </entry>
 764               <entry>
 765                 <para> 4096 bytes (pathname)</para>
 766               </entry>
 767               <entry>
 768                 <para>The Linux VFS imposes a full pathname length of 4096 bytes.</para>
 769               </entry>
 770             </row>
 771             <row>
 772               <entry>
 773                 <para> Maximum number of open files for a Lustre file system</para>
 774               </entry>
 775               <entry>
 776                 <para> No limit</para>
 777               </entry>
 778               <entry>
 779                 <para>The Lustre software does not impose a maximum for the number of open files,
 780                   but the practical limit depends on the amount of RAM on the MDS. No
 781                   &quot;tables&quot; for open files exist on the MDS, as they are only linked in a
 782                   list to a given client&apos;s export. Each client process is also limited to a
 783                   number of open files (typically several thousand) set by its ulimit.</para>
 784               </entry>
 785             </row>
 786           </tbody>
 787         </tgroup>
 788       </table>
 789       <para>&#160;</para>
 790       <note><para>By default for ldiskfs MDTs the maximum stripe count for a
 791       <emphasis>single file</emphasis> is limited to 160 OSTs.  In order to
 792       increase the maximum file stripe count, use
 793       <literal>--mkfsoptions="-O ea_inode"</literal> when formatting the MDT,
 794       or use <literal>tune2fs -O ea_inode</literal> to enable it after the
 795       MDT has been formatted.</para>
 796       </note>
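      <para>Two of the derived values in the table can be checked with simple
      arithmetic. The Python sketch below is only an illustration: it assumes
      binary units (1 TB = 2^40 bytes, 1 PB = 2^50 bytes), and the function
      names are hypothetical.</para>
      <programlisting language="python">
```python
# Reproduce two derived values from the limits table.
# Binary units are assumed: 1 TB = 2**40 bytes, 1 PB = 2**50 bytes.
KIB = 1024
TIB = 1024 ** 4
PIB = 1024 ** 5


def max_ldiskfs_file_pib(max_stripes=2000, max_object_tib=16):
    """Largest striped file: stripe count times the per-object limit."""
    return max_stripes * max_object_tib * TIB / PIB


def mdt_inodes_per_tib(bytes_per_inode=2 * KIB):
    """Inodes per TB of MDT space at the default one inode per 2 KB."""
    return TIB // bytes_per_inode


print(max_ldiskfs_file_pib())   # 31.25
print(mdt_inodes_per_tib())     # 536870912, about 512 million
```
      </programlisting>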
 797     </section>
 798   </section>
 799   <section xml:id="dbdoclet.50438256_26456">
 800     <title><indexterm><primary>setup</primary><secondary>memory</secondary></indexterm>Determining Memory Requirements</title>
 801     <para>This section describes the memory requirements for each Lustre file system component.</para>
 802     <section remap="h3">
 803         <title>
 804             <indexterm><primary>setup</primary><secondary>memory</secondary><tertiary>client</tertiary></indexterm>
 805             Client Memory Requirements</title>
 806       <para>A minimum of 2 GB RAM is recommended for clients.</para>
 807     </section>
 808     <section remap="h3">
 809         <title><indexterm><primary>setup</primary><secondary>memory</secondary><tertiary>MDS</tertiary></indexterm>MDS Memory Requirements</title>
 810       <para>MDS memory requirements are determined by the following factors:</para>
 811       <itemizedlist>
 812         <listitem>
 813           <para>Number of clients</para>
 814         </listitem>
 815         <listitem>
 816           <para>Size of the directories</para>
 817         </listitem>
 818         <listitem>
 819           <para>Load placed on server</para>
 820         </listitem>
 821       </itemizedlist>
 822       <para>The amount of memory used by the MDS is a function of how many clients are on the system, and how many files they are using in their working set. This is driven, primarily, by the number of locks a client can hold at one time. The number of locks held by clients varies by load and memory availability on the server. Interactive clients can hold in excess of 10,000 locks at times. On the MDS, memory usage is approximately 2 KB per file, including the Lustre distributed lock manager (DLM) lock and kernel data structures for the files currently in use. Having file data in cache can improve metadata performance by a factor of 10x or more compared to reading it from disk.</para>
 823       <para>MDS memory requirements include:</para>
 824       <itemizedlist>
 825         <listitem>
 826           <para><emphasis role="bold">File system metadata</emphasis> : A reasonable amount of RAM needs to be available for file system metadata. While no hard limit can be placed on the amount of file system metadata, the more RAM is available, the less often disk I/O is needed to retrieve the metadata.</para>
 827         </listitem>
 828         <listitem>
 829           <para><emphasis role="bold">Network transport</emphasis> : If you are using TCP or other network transport that uses system memory for send/receive buffers, this memory requirement must also be taken into consideration.</para>
 830         </listitem>
 831         <listitem>
 832           <para><emphasis role="bold">Journal size</emphasis> : By default, the journal size is 400 MB for each Lustre ldiskfs file system. This can pin up to an equal amount of RAM on the MDS node per file system.</para>
 833         </listitem>
 834         <listitem>
 835           <para><emphasis role="bold">Failover configuration</emphasis> : If the MDS node will be used for failover from another node, then the RAM for each journal should be doubled, so the backup server can handle the additional load if the primary server fails.</para>
 836         </listitem>
 837       </itemizedlist>
 838       <section remap="h4">
 839         <title><indexterm><primary>setup</primary><secondary>memory</secondary><tertiary>MDS</tertiary></indexterm>Calculating MDS Memory Requirements</title>
 840         <para>By default, 400 MB are used for the file system journal. Additional RAM is used for caching file data for the larger working set, which is not actively in use by clients but should be kept &quot;hot&quot; for improved access times. Approximately 1.5 KB per file is needed to keep a file in cache without a lock.</para>
 841         <para>For example, for a single MDT on an MDS with 1,000 clients, 16 interactive nodes, and a 2 million file working set (of which 400,000 files are cached on the clients):</para>
 842         <informalexample>
 843           <para>Operating system overhead = 512 MB</para>
 844           <para>File system journal = 400 MB</para>
 845           <para>1000 * 4-core clients * 100 files/core * 2kB = 800 MB</para>
 846           <para>16 interactive clients * 10,000 files * 2kB = 320 MB</para>
 847           <para>1,600,000 file extra working set * 1.5kB/file = 2400 MB</para>
 848         </informalexample>
 849         <para>The items above total 4432 MB, so the minimum requirement for a system with this configuration is slightly more than 4 GB of RAM. Additional memory may significantly improve performance.</para>
 850         <para>For directories containing 1 million or more files, more memory may provide a significant benefit. For example, in an environment where clients randomly access one of 10 million files, having extra memory for the cache significantly improves performance.</para>
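        <para>The worked example above can be expressed as a small Python
        function so that the inputs are easy to vary for a different
        cluster. This is a sketch rather than a sizing tool: the function
        and parameter names are hypothetical, the defaults are the figures
        from the example, and 1 MB is treated as 1000 kB to match the
        example's arithmetic.</para>
        <programlisting language="python">
```python
def mds_ram_mb(clients=1000, cores_per_client=4, files_per_core=100,
               interactive_clients=16, files_per_interactive=10000,
               extra_working_set_files=1600000,
               os_overhead_mb=512, journal_mb=400,
               kb_per_locked_file=2, kb_per_cached_file=1.5):
    """Sum the MDS memory items from the worked example, in MB."""
    # The example treats 1 MB as 1000 kB, so the same convention is used here.
    batch_mb = (clients * cores_per_client * files_per_core
                * kb_per_locked_file / 1000)
    interactive_mb = (interactive_clients * files_per_interactive
                      * kb_per_locked_file / 1000)
    cached_mb = extra_working_set_files * kb_per_cached_file / 1000
    return (os_overhead_mb + journal_mb + batch_mb
            + interactive_mb + cached_mb)


print(mds_ram_mb())   # 4432.0
```
        </programlisting>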
 851       </section>
 852     </section>
 853     <section remap="h3">
 854       <title><indexterm><primary>setup</primary><secondary>memory</secondary><tertiary>OSS</tertiary></indexterm>OSS Memory Requirements</title>
 855       <para>When planning the hardware for an OSS node, consider the memory usage of several
 856         components in the Lustre file system (i.e., journal, service threads, file system metadata,
 857         etc.). Also, consider the effect of the OSS read cache feature, which consumes memory as it
 858         caches data on the OSS node.</para>
 859       <para>In addition to the MDS memory requirements mentioned in <xref linkend="dbdoclet.50438256_87676"/>, the OSS requirements include:</para>
 860       <itemizedlist>
 861         <listitem>
 862           <para><emphasis role="bold">Service threads</emphasis> : The service threads on the OSS node pre-allocate a 4 MB I/O buffer for each ost_io service thread, so these buffers do not need to be allocated and freed for each I/O request.</para>
 863         </listitem>
 864         <listitem>
 865           <para><emphasis role="bold">OSS read cache</emphasis> : OSS read cache provides read-only
 866             caching of data on an OSS, using the regular Linux page cache to store the data. Just
 867             like caching from a regular file system in the Linux operating system, OSS read cache
 868             uses as much physical memory as is available.</para>
 869         </listitem>
 870       </itemizedlist>
 871       <para>The same calculation applies to files accessed from the OSS as for the MDS, but the load is distributed over many more OSS nodes, so the amount of memory required for locks, inode cache, etc. listed under MDS is spread out over the OSS nodes.</para>
 872       <para>Because of these memory requirements, the following calculations should be taken as determining the absolute minimum RAM required in an OSS node.</para>
 873       <section remap="h4">
 874         <title><indexterm><primary>setup</primary><secondary>memory</secondary><tertiary>OSS</tertiary></indexterm>Calculating OSS Memory Requirements</title>
 875         <para>The minimum recommended RAM size for an OSS with two OSTs is computed below:</para>
 876         <informalexample>
 877           <para>Ethernet/TCP send/receive buffers (4 MB * 512 threads) = 2048 MB</para>
 878           <para>400 MB journal size * 2 OST devices = 800 MB</para>
 879           <para>1.5 MB read/write per OST IO thread * 512 threads = 768 MB</para>
 880           <para>600 MB file system read cache * 2 OSTs = 1200 MB</para>
 881           <para>1000 * 4-core clients * 100 files/core * 2kB = 800MB</para>
 882           <para>16 interactive clients * 10,000 files * 2kB = 320MB</para>
 883           <para>1,600,000 file extra working set * 1.5kB/file = 2400MB</para>
 884           <para> DLM locks + file system metadata TOTAL = 3520MB</para>
 885           <para>Per OSS DLM locks + file system metadata = 3520MB/6 OSS = 600MB (approx.)</para>
 886           <para>Per OSS RAM minimum requirement = 4096MB (approx.)</para>
 887         </informalexample>
 888         <para>This consumes about 1,400 MB just for the pre-allocated buffers, and an additional 2 GB for minimal file system and kernel usage. Therefore, for a non-failover configuration, the minimum RAM would be 4 GB for an OSS node with two OSTs. Adding memory on the OSS will improve the performance of reading smaller, frequently-accessed files.</para>
 889         <para>For a failover configuration, the minimum RAM would be at least 6 GB. For 4 OSTs on each OSS in a failover configuration, 10 GB of RAM is reasonable. When the OSS is not handling any failed-over OSTs, the extra RAM will be used as a read cache.</para>
 890         <para>As a reasonable rule of thumb, about 2 GB of base memory plus 1 GB per OST can be used. In failover configurations, about 2 GB per OST is needed.</para>
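        <para>The rule of thumb can be written as a one-line calculation.
        The Python sketch below uses a hypothetical function name and
        assumes that the 2 GB of base memory also applies in failover
        configurations, which is consistent with the 6 GB and 10 GB figures
        given above.</para>
        <programlisting language="python">
```python
def oss_ram_gb(num_osts, failover=False):
    """Rule-of-thumb OSS RAM in GB: 2 GB base plus 1 GB per OST,
    or 2 GB per OST when the node is part of a failover pair."""
    base_gb = 2
    per_ost_gb = 2 if failover else 1
    return base_gb + per_ost_gb * num_osts


print(oss_ram_gb(2))                  # 4
print(oss_ram_gb(2, failover=True))   # 6
print(oss_ram_gb(4, failover=True))   # 10
```
        </programlisting>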
 891       </section>
 892     </section>
 893   </section>
 894   <section xml:id="dbdoclet.50438256_78272">
 895     <title><indexterm>
 896         <primary>setup</primary>
 897         <secondary>network</secondary>
 898       </indexterm>Implementing Networks To Be Used by the Lustre File System</title>
 899     <para>As a high performance file system, the Lustre file system places heavy loads on networks.
 900       Thus, a network interface in each Lustre server and client is commonly dedicated to Lustre
 901       file system traffic. This is often a dedicated TCP/IP subnet, although other network hardware
 902       can also be used.</para>
 903     <para>A typical Lustre file system implementation may include the following:</para>
 904     <itemizedlist>
 905       <listitem>
 906         <para>A high-performance backend network for the Lustre servers, typically an InfiniBand (IB) network.</para>
 907       </listitem>
 908       <listitem>
 909         <para>A larger client network.</para>
 910       </listitem>
 911       <listitem>
 912         <para>Lustre routers to connect the two networks.</para>
 913       </listitem>
 914     </itemizedlist>
 915     <para>Lustre networks and routing are configured and managed by specifying parameters to the
 916       Lustre Networking (<literal>lnet</literal>) module in
 917         <literal>/etc/modprobe.d/lustre.conf</literal>.</para>
 918     <para>To prepare to configure Lustre networking, complete the following steps:</para>
 919     <orderedlist>
 920       <listitem>
 921         <para><emphasis role="bold">Identify all machines that will be running Lustre software and
 922             the network interfaces they will use to run Lustre file system traffic. These machines
 923             will form the Lustre network.</emphasis></para>
 924         <para>A network is a group of nodes that communicate directly with one another. The Lustre
 925           software includes Lustre network drivers (LNDs) to support a variety of network types and
 926           hardware (see <xref linkend="understandinglustrenetworking"/> for a complete list). The
 927           standard rules for specifying networks apply to Lustre networks. For example, two TCP
 928           networks on two different subnets (<literal>tcp0</literal> and <literal>tcp1</literal>)
 929           are considered to be two different Lustre networks.</para>
 930       </listitem>
 931       <listitem>
 932         <para><emphasis role="bold">If routing is needed, identify the nodes to be used to route traffic between networks.</emphasis></para>
 933         <para>If you are using multiple network types, then you will need a router. Any node with
 934           appropriate interfaces can route Lustre networking (LNet) traffic between different
 935           network hardware types or topologies; the node may be a server, a client, or a standalone
 936           router. LNet can route messages between different network types (such as
 937           TCP-to-InfiniBand) or across different topologies (such as bridging two InfiniBand or
 938           TCP/IP networks). Routing will be configured in <xref linkend="configuringlnet"/>.</para>
 939       </listitem>
 940       <listitem>
 941         <para><emphasis role="bold">Identify the network interfaces to include
 942         in or exclude from LNet.</emphasis></para>
 943         <para>If not explicitly specified, LNet uses either the first available
 944         interface or a pre-defined default for a given network type. Interfaces
 945         that LNet should not use (such as an administrative network or
 946         IP-over-IB) can be excluded.</para>
 947         <para>Network interfaces to be used or excluded will be specified using
 948         the lnet kernel module parameters <literal>networks</literal> and
 949         <literal>ip2nets</literal> as described in
 950         <xref linkend="configuringlnet"/>.</para>
 951       </listitem>
 952       <listitem>
 953         <para><emphasis role="bold">To ease the setup of networks with complex
 954         network configurations, determine a cluster-wide module configuration.
 955         </emphasis></para>
 956         <para>For large clusters, you can configure the networking setup for
 957         all nodes by using a single, unified set of parameters in the
 958         <literal>lustre.conf</literal> file on each node. Cluster-wide
 959         configuration is described in <xref linkend="configuringlnet"/>.</para>
 960       </listitem>
 961     </orderedlist>
 962     <note>
 963       <para>We recommend that you use &apos;dotted-quad&apos; notation for IP addresses rather than host names to make it easier to read debug logs and debug configurations with multiple interfaces.</para>
 964     </note>
 965   </section>
 966 </chapter>