From 22f8de480091e0eae66d31d72ac49975c6412031 Mon Sep 17 00:00:00 2001
From: Andreas Dilger
Date: Fri, 2 Nov 2018 16:43:11 -0600
Subject: [PATCH] LUDOC-399 dom: improve description of inode limits

Fix one typo in the MDT inode size description (current default is
2KB/inode instead of old 4KB/inode) value.

Clarify the description of DoM usage and limits.

Signed-off-by: Andreas Dilger
Change-Id: I42259e45fdbfb2e8541745ce4f794c2b123ebbe5
Reviewed-on: https://review.whamcloud.com/33562
Tested-by: Jenkins
Reviewed-by: Stephan Thiell
Reviewed-by: Mike Pershin
Reviewed-by: Joseph Gmitter
---
 DataOnMDT.xml             | 48 +++++++++++++++++++++------------------
 SettingUpLustreSystem.xml | 62 ++++++++++++++++++++++++++++-----------------------
 2 files changed, 66 insertions(+), 44 deletions(-)

diff --git a/DataOnMDT.xml b/DataOnMDT.xml
index 1ab9a1c..2c6b985 100644
--- a/DataOnMDT.xml
+++ b/DataOnMDT.xml
@@ -55,8 +55,9 @@
 The lfs setstripe command is used to create DoM files.
 Command
- lfs setstripe --component-end|-E end1 --layout|-L \
-mdt [--component-end|-E end2 [STRIPE_OPTIONS] ...] <filename>
+
+lfs setstripe --component-end|-E end1 --layout|-L mdt \
+ [--component-end|-E end2 [STRIPE_OPTIONS] ...] <filename>
 The command above creates a file with the special composite layout,
 which defines the first component as an MDT component. The
@@ -65,8 +66,9 @@ mdt [--component-end|-E end2 [STRIPE_OPTIONS] ...] <filename>
 end1 is also the stripe size of this component, and is limited by
 the lod.*.dom_stripesize of the MDT the file is
- created on. No other options are required. The rest of the
- components use the normal syntax for composite files creation.
+ created on. No other options are required for this component.
+ The rest of the components use the normal syntax for composite
+ files creation.
 If the next component doesn't specify striping, such as:
@@ -256,18 +258,22 @@ client$ lfs getstripe /mnt/lustre/domdir/domfile
 LFS limits for DoM component size
 lfs setstripe allows for setting the
- component size for MDT layouts up to 1GB, however, the size must
+ component size for MDT layouts up to 1GB (this is a compile-time
+ limit to avoid improper configuration), however, the size must
 also be aligned by 64KB due to the minimum stripe size in Lustre
- (see ). This value
- represents the maximum possible size of the component on the MDT.
- Meanwhile, there is another limit which is checked by
- lfs setstripe and is provided by the MDT
- server itself.
+ (see
+ Minimum stripe size). There is also a limit
+ imposed on each file by lfs setstripe -E end
+ that may be smaller than the MDT-imposed limit if this is better
+ for a particular usage.
 MDT Server Limits
- The LOD parameter dom_stripesize is used
- to control the per-server maximum size for a DoM component. It is
- 1MB by default and can be changed with the
+ The lod.$fsname-MDTxxxx.dom_stripesize
+ is used to control the per-MDT maximum size for a DoM component.
+ Larger DoM components specified by the user will be truncated to
+ the MDT-specified limit, and as such may be different on each
+ MDT to balance DoM space usage on each MDT separately, if needed.
+ It is 1MB by default and can be changed with the
 lctl tool. For more information on setting
 dom_stripesize please see .
@@ -380,7 +386,7 @@ client$ lfs find -L mdt -S +200K -type f /mnt/lustre
 the parameter dom_stripesize in the LOD device. The
 dom_stripesize can be set differently for each MDT, if necessary.
 The default value of the parameter is 1MB and can
- be changed with lclt tool.
+ be changed with lctl tool.
 Get Command
 lctl get_param lod.*MDT<index>*.dom_stripesize
@@ -445,12 +451,16 @@ mds# lctl get_param -n lod.*MDT0000*.dom_stripesize
 dom disabledom
 Disable DoM
- When lclt set_param or
- lctl conf_param sets
- dom_stripesize to 0, DoM
- file creation will be prohibited on the selected server.
+ When lctl set_param or
+ lctl conf_param sets
+ dom_stripesize to 0, DoM
+ component creation will be disabled on the selected server, and
+ any new layouts with a specified DoM component
+ will have that component removed from the file layout. Existing
+ files and layouts with DoM components on that MDT are not changed.
+
 DoM files can still be created in existing directories
- with the default DoM layout
+ with a default DoM layout.
diff --git a/SettingUpLustreSystem.xml b/SettingUpLustreSystem.xml
index 5ca5eb7..f7ca264 100644
--- a/SettingUpLustreSystem.xml
+++ b/SettingUpLustreSystem.xml
@@ -221,7 +221,7 @@
 Determining MDT Space Requirements
 When calculating the MDT size, the important factor to consider
 is the number of files to be stored in the file system, which depends on
- at least 4 KiB per inode of usable space on the MDT. Since MDTs typically
+ at least 2 KiB per inode of usable space on the MDT. Since MDTs typically
 use RAID-1+0 mirroring, the total storage needed will be double this.
 Please note that the actual used space per MDT depends on the number
@@ -230,41 +230,50 @@
 storage required for Lustre file system metadata is typically 1-2
 percent of the total file system capacity depending upon file size.
 If the feature is in use for Lustre
- 2.11 or later, MDT space should typically be 5 percent of the total space,
- depending on the distribution of small files within the filesystem.
+ 2.11 or later, MDT space should typically be 5 percent or more of the
+ total space, depending on the distribution of small files within the
+ filesystem and the lod.*.dom_stripesize limit on
+ the MDT and file layout used.
 For ZFS-based MDT filesystems, the number of inodes created on
 the MDT and OST is dynamic, so there is less need to determine the
 number of inodes in advance, though there still needs to be some thought
 given to the total MDT space compared to the total filesystem size.
 For example, if the average file size is 5 MiB and you have
- 100 TiB of usable OST space, then you can calculate the minimum total
- number of inodes each for MDTs and OSTs as follows:
+ 100 TiB of usable OST space, then you can calculate the
+ minimum total number of inodes for MDTs and OSTs
+ as follows:
 (500 TB * 1000000 MB/TB) / 5 MB/inode = 100M inodes
- For details about formatting options for ldiskfs MDT and OST file
- systems, see .
- It is recommended that the MDT have at least twice the minimum
+ It is recommended that the MDT(s) have at least twice the minimum
 number of inodes to allow for future expansion and allow for an average
- file size smaller than expected. Thus, the minimum space for an ldiskfs
- MDT should be approximately:
+ file size smaller than expected. Thus, the minimum space for ldiskfs
+ MDT(s) should be approximately:
 2 KiB/inode x 100 million inodes x 2 = 400 GiB ldiskfs MDT
+ For details about formatting options for ldiskfs MDT and OST file
+ systems, see .
- If the average file size is very small, 4 KB for example, the
- MDT will use as much space for each file as the space used on the OST,
- so the use of Data-on-MDT is strongly recommended.
+ If the median file size is very small, 4 KB for example, the
+ MDT would use as much space for each file as the space used on the OST,
+ so the use of Data-on-MDT is strongly recommended in that case.
+ The MDT space per inode should be increased correspondingly to
+ account for the extra data space usage for each inode:
+
+ 6 KiB/inode x 100 million inodes x 2 = 1200 GiB ldiskfs MDT
+
+
 If the MDT has too few inodes, this can cause the space on the
 OSTs to be inaccessible since no new files can be created.
 In this
- case, the lfs df -i and df -i
- commands will limit the number of available inodes reported for the
- filesystem to match the total number of available objects on the OSTs.
- Be sure to determine the appropriate MDT size needed to support the
- filesystem before formatting. It is possible to increase the
+ case, the lfs df -i and df -i
+ commands will limit the number of available inodes reported for the
+ filesystem to match the total number of available objects on the OSTs.
+ Be sure to determine the appropriate MDT size needed to support the
+ filesystem before formatting. It is possible to increase the
 number of inodes after the file system is formatted, depending on the
 storage. For ldiskfs MDT filesystems the resize2fs tool can be used
 if the underlying block device is on a LVM logical
@@ -287,9 +296,9 @@
 Starting in release 2.4, using the DNE remote directory feature
- it is possible to increase the total number of inodes of a Lustre
- filesystem, as well as increasing the aggregate metadata performance,
- by configuring additional MDTs into the filesystem, see
+ it is possible to increase the total number of inodes of a Lustre
+ filesystem, as well as increasing the aggregate metadata performance,
+ by configuring additional MDTs into the filesystem, see
 for details.
@@ -619,7 +628,8 @@
 typically go beyond the stated limit per OST because Lustre can
 add capacity and performance with additional OSTs, and having
 more OSTs improves aggregate I/O performance,
- minimizes contention, and allows parallel recovery (e2fsck).
+ minimizes contention, and allows parallel recovery (e2fsck
+ for ldiskfs OSTs, scrub for ZFS OSTs).
 With 32-bit kernels, due to page cache limits, 16TB is the
@@ -638,7 +648,7 @@
 The maximum number of clients is a constant that can be
 changed at compile time. Up to 30000 clients have been
- used in production.
+ used in production accessing a single filesystem.
@@ -690,8 +700,10 @@
 64 KiB
- Due to the 64 KiB PAGE_SIZE on some 64-bit machines,
- the minimum stripe size is set to 64 KiB.
+ Due to the use of 64 KiB PAGE_SIZE on some CPU
+ architectures such as ARM and POWER, the minimum stripe
+ size is 64 KiB so that a single page is not split over
+ multiple servers.
--
1.8.3.1