From d2805ca938b3fbf8180cb6a9a0936f31d343190d Mon Sep 17 00:00:00 2001 From: Andreas Dilger Date: Sun, 28 Jun 2020 16:25:46 -0600 Subject: [PATCH] LUDOC-11 setup: update limits for new releases Update the filesystem limits based on recent changes. - directory size limits for ldiskfs increase due to large_dir - maximum OST size limits increase due to recent fixes - reference ARM for large pages instead of ancient IA64 Add/improve related cross-reference label names, which improves the visible URL names in the HTML version of the manual. Signed-off-by: Andreas Dilger Change-Id: I793b1a8a657bc3470f17c983a98404dcaf3ebbe5 Reviewed-on: https://review.whamcloud.com/39208 Reviewed-by: Andreas Dilger Tested-by: jenkins --- LustreTroubleshooting.xml | 3 +- SettingUpLustreSystem.xml | 131 +++++++++++++++++++++++++--------------------- 2 files changed, 72 insertions(+), 62 deletions(-) diff --git a/LustreTroubleshooting.xml b/LustreTroubleshooting.xml index cd2f885..178e97f 100644 --- a/LustreTroubleshooting.xml +++ b/LustreTroubleshooting.xml @@ -748,7 +748,8 @@ server now claims 791)! Lustre or kernel stack traces showing processes stuck in "try_to_free_pages" - For information on determining the MDS memory and OSS memory requirements, see . + For information on determining the MDS memory and OSS memory + requirements, see .
Setting SCSI I/O Sizes diff --git a/SettingUpLustreSystem.xml b/SettingUpLustreSystem.xml index 658ce2b..b06dc4b 100644 --- a/SettingUpLustreSystem.xml +++ b/SettingUpLustreSystem.xml @@ -9,7 +9,7 @@ - + @@ -24,16 +24,16 @@ - + - + -
+
<indexterm><primary>setup</primary></indexterm> <indexterm><primary>setup</primary><secondary>hardware</secondary></indexterm> <indexterm><primary>design</primary><see>setup</see></indexterm> @@ -72,7 +72,11 @@ a separate device.</para> <para>The MDS can effectively utilize a lot of CPU cycles. A minimum of four processor cores are recommended. More are advisable for files systems with many clients.</para> <note> - <para>Lustre clients running on architectures with different endianness are supported. One limitation is that the PAGE_SIZE kernel macro on the client must be as large as the PAGE_SIZE of the server. In particular, ia64 or PPC clients with large pages (up to 64kB pages) can run with x86 servers (4kB pages). If you are running x86 clients with ia64 or PPC servers, you must compile the ia64 kernel with a 4kB PAGE_SIZE (so the server page size is not larger than the client page size). </para> + <para>Lustre clients running on different CPU architectures are supported. + One limitation is that the PAGE_SIZE kernel macro on the client must be + at least as large as the PAGE_SIZE of the server. In particular, ARM or PPC + clients with large pages (up to 64kB pages) can run with x86 servers + (4kB pages).</para> </note> <section remap="h3"> <title><indexterm> @@ -523,13 +527,13 @@ <screen>[oss#] mkfs.lustre --ost --mkfsoptions="-i $((8192 * 1024))" ...</screen> </para> <note> - <para>OSTs formatted with ldiskfs can use a maximum of approximately - 320 million objects per MDT, up to a maximum of 4 billion inodes. + <para>OSTs formatted with ldiskfs should preferably have fewer than + 320 million objects per MDT, and can have at most 4 billion inodes. Specifying a very small bytes-per-inode ratio for a large OST that - exceeds this limit can cause either premature out-of-space errors and prevent - the full OST space from being used, or will waste space and slow down - e2fsck more than necessary. 
The default inode ratios are chosen to - ensure that the total number of inodes remain below this limit. + exceeds this limit can either cause premature out-of-space errors that + prevent the full OST space from being used, or waste space and + slow down e2fsck more than necessary. The default inode ratios are + chosen to ensure the total number of inodes remains below this limit. </para> </note> <note> @@ -564,13 +568,12 @@ </indexterm>File and File System Limits describes - current known limits of Lustre. These limits are imposed by either - the Lustre architecture or the Linux virtual file system (VFS) and - virtual memory subsystems. In a few cases, a limit is defined within - the code and can be changed by re-compiling the Lustre software. - Instructions to install from source code are beyond the scope of this - document, and can be found elsewhere online. In these cases, the - indicated limit was used for testing of the Lustre software. + current known limits of Lustre. These limits may be imposed by either + the Lustre architecture or the Linux virtual file system (VFS) and + virtual memory subsystems. In a few cases, a limit is defined within + the Lustre code based on tested values and could be changed by editing + and re-compiling the Lustre software. In these cases, the indicated + limit was used for testing of the Lustre software. File and file system limits @@ -594,7 +597,7 @@ - Maximum number of MDTs + Maximum number of MDTs 256 @@ -611,28 +614,28 @@ - Maximum number of OSTs + Maximum number of OSTs 8150 The maximum number of OSTs is a constant that can be - changed at compile time. Lustre file systems with up to - 4000 OSTs have been tested. Multiple OST file systems can - be configured on a single OSS node. + changed at compile time. Lustre file systems with up to 4000 + OSTs have been configured in the past. Multiple OST targets + can be configured on a single OSS node.
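The bytes-per-inode note in the hunk above is plain division: an ldiskfs OST formatted with `-i <ratio>` gets roughly `size / ratio` inodes, and that count has to stay under the ldiskfs inode ceiling. The sketch below is an illustration only, assuming that the "4 billion inodes" limit can be approximated as 2**32 (the range of a 32-bit inode number) and ignoring mke2fs's per-block-group rounding. The helper names `ost_inode_count` and `smallest_safe_ratio` are hypothetical.

```python
# Sketch of the ldiskfs OST inode arithmetic; helper names are hypothetical.
MIB = 1024 ** 2
TIB = 1024 ** 4

# Assumption: the "4 billion inodes" ldiskfs limit is approximated as 2**32,
# the range of a 32-bit inode number.
LDISKFS_MAX_INODES = 2 ** 32


def ost_inode_count(ost_bytes: int, bytes_per_inode: int) -> int:
    """Approximate inode count for an OST formatted with mke2fs -i <ratio>.

    Real mke2fs rounds per block group, so this is only an estimate.
    """
    return ost_bytes // bytes_per_inode


def smallest_safe_ratio(ost_bytes: int) -> int:
    """Smallest bytes-per-inode ratio that keeps the estimate within the limit."""
    return -(-ost_bytes // LDISKFS_MAX_INODES)  # ceiling division


if __name__ == "__main__":
    ost = 1024 * TIB        # the new "Maximum OST size" for ldiskfs
    ratio = 8 * MIB         # same value as -i $((8192 * 1024))
    print(ost_inode_count(ost, ratio))   # 134217728 (2**27)
    print(smallest_safe_ratio(ost))      # 262144 bytes (256 KiB)
```

With the 8 MiB ratio from the `mkfs.lustre` example, a 1024TiB OST ends up with about 134 million inodes, well below both the 4 billion ceiling and the 320 million objects-per-MDT guidance.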
- Maximum OST size + Maximum OST size - 512TiB (ldiskfs), 512TiB (ZFS) + 1024TiB (ldiskfs), 1024TiB (ZFS) This is not a hard limit. Larger - OSTs are possible but most production systems do not + OSTs are possible, but most production systems do not typically go beyond the stated limit per OST because Lustre can add capacity and performance with additional OSTs, and having more OSTs improves aggregate I/O performance, @@ -642,13 +645,13 @@ With 32-bit kernels, due to page cache limits, 16TB is the maximum block device size, which in turn applies to the - size of OST. It is strongly recommended to run Lustre - clients and servers with 64-bit kernels. + size of OST. It is strongly recommended + to run Lustre clients and servers with 64-bit kernels. - Maximum number of clients + Maximum number of clients 131072 @@ -661,21 +664,21 @@ - Maximum size of a single file system + Maximum size of a single file system - at least 1EiB + 2EiB or larger - Each OST can have a file system up to the - Maximum OST size limit, and the Maximum number of OSTs - can be combined into a single filesystem. + Each OST can have a file system up to the "Maximum OST + size" limit, and up to the "Maximum number of OSTs" can be + combined into a single filesystem. - Maximum stripe count + Maximum stripe count 2000 @@ -684,13 +687,22 @@ This limit is imposed by the size of the layout that needs to be stored on disk and sent in RPC requests, but is not a hard limit of the protocol. The number of OSTs in the - filesystem can exceed the stripe count, but this limits the - number of OSTs across which a single file can be striped. + filesystem can exceed the stripe count, but this is the maximum + number of OSTs on which a single file + can be striped. + Before 2.13, ldiskfs MDTs limited the + maximum stripe count for a single file to + 160 OSTs by default.
In order to + increase the maximum file stripe count, use + --mkfsoptions="-O ea_inode" when formatting the MDT, + or use tune2fs -O ea_inode to enable it after the + MDT has been formatted. + - Maximum stripe size + Maximum stripe size < 4 GiB @@ -702,7 +714,7 @@ - Minimum stripe size + Minimum stripe size 64 KiB @@ -711,12 +723,13 @@ Due to the use of 64 KiB PAGE_SIZE on some CPU architectures such as ARM and POWER, the minimum stripe size is 64 KiB so that a single page is not split over - multiple servers. + multiple servers. This is also the minimum Data-on-MDT + component size that can be specified. - Maximum single object size + Maximum single object size 16TiB (ldiskfs), 256TiB (ZFS) @@ -731,7 +744,7 @@ - Maximum file size + Maximum file size 16 TiB on 32-bit systems @@ -754,14 +767,14 @@ - Maximum number of files or subdirectories in a single directory + Maximum number of files or subdirectories in a single directory - 10 million files (ldiskfs), 2^48 (ZFS) + 600M-3.8B files (ldiskfs), 16T (ZFS) The Lustre software uses the ldiskfs hashed directory - code, which has a limit of about 10 million files, depending + code, which has a limit of at least 600 million files, depending on the length of the file name. The limit on subdirectories is the same as the limit on regular files. Starting in the 2.8 release it is @@ -769,16 +782,19 @@ over multiple MDTs with the lfs mkdir -c command, which increases the single directory limit by a factor of the number of directory stripes used. - Lustre file systems are tested with ten million files - in a single directory. + Starting in the 2.14 release, the + large_dir feature of ldiskfs is enabled by + default to allow directories with more than 10M entries. In + the 2.12 release, the large_dir feature was + present but not enabled by default. 
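Several of the updated table rows follow from simple multiplication. The sketch below is back-of-envelope arithmetic under stated assumptions, not a statement of how Lustre computes its limits. It assumes that the aggregate file system size is the OST count times the per-OST size, that the 32-bit kernel limit comes from a 32-bit page index times a 4 KiB page, and that a striped directory's capacity scales linearly with the `lfs mkdir -c` stripe count, as the directory row describes. All function names are hypothetical.

```python
# Back-of-envelope checks of limits from the table; names are hypothetical.
KIB = 1024
TIB = 1024 ** 4
PIB = 1024 ** 5
EIB = 1024 ** 6

MAX_OSTS = 8150                     # compile-time constant from the table
MAX_OST_BYTES = 1024 * TIB          # stated (not hard) per-OST limit
LDISKFS_DIR_ENTRIES = 600_000_000   # low end of "600M-3.8B files"


def fs_capacity(ost_count: int, ost_bytes: int) -> int:
    """Aggregate size if every OST is formatted at ost_bytes."""
    return ost_count * ost_bytes


def page_cache_limit(page_bytes: int, index_bits: int = 32) -> int:
    """Largest device addressable with an index_bits-wide page index."""
    return (2 ** index_bits) * page_bytes


def striped_dir_limit(per_stripe_entries: int, dir_stripes: int) -> int:
    """Directory limit scaled by the number of directory stripes."""
    return per_stripe_entries * dir_stripes


if __name__ == "__main__":
    print(fs_capacity(MAX_OSTS, MAX_OST_BYTES) / EIB)  # 7.958984375
    print(page_cache_limit(4 * KIB) // TIB)            # 16
    print(striped_dir_limit(LDISKFS_DIR_ENTRIES, 4))   # 2400000000
```

The theoretical product of the two OST limits comes to about 8EiB, comfortably above the table's "2EiB or larger" entry. The 32-bit case reproduces the 16TB (16TiB) block device limit mentioned in the "Maximum OST size" row.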
- Maximum number of files in the file system + Maximum number of files in the file system - 4 billion (ldiskfs), 256 trillion (ZFS) per MDT + 4 billion (ldiskfs), 256 trillion (ZFS) per MDT The ldiskfs filesystem imposes an upper limit of @@ -800,7 +816,7 @@ - Maximum length of a filename + Maximum length of a filename 255 bytes (filename) @@ -812,7 +828,7 @@ - Maximum length of a pathname + Maximum length of a pathname 4096 bytes (pathname) @@ -823,7 +839,7 @@ - Maximum number of open files for a Lustre file system + Maximum number of open files for a Lustre file system No limit @@ -841,15 +857,8 @@
  - By default for ldiskfs MDTs the maximum stripe count for a - single file is limited to 160 OSTs. In order to - increase the maximum file stripe count, use - --mkfsoptions="-O ea_inode" when formatting the MDT, - or use tune2fs -O ea_inode to enable it after the - MDT has been formatted. -
-
+
<indexterm><primary>setup</primary><secondary>memory</secondary></indexterm>Determining Memory Requirements This section describes the memory requirements for each Lustre file system component.
@@ -991,7 +1000,7 @@
-
+
<indexterm> <primary>setup</primary> <secondary>network</secondary> -- 1.8.3.1