From f83fdd4062a22d711ec018af71f5d50b892a6609 Mon Sep 17 00:00:00 2001 From: Richard Henwood Date: Tue, 17 May 2011 18:16:25 -0500 Subject: [PATCH] FIX: xrefs --- ManagingStripingFreeSpace.xml | 126 +++++++++--------------------------------- 1 file changed, 26 insertions(+), 100 deletions(-) diff --git a/ManagingStripingFreeSpace.xml b/ManagingStripingFreeSpace.xml index dcc26a0..b234a84 100644 --- a/ManagingStripingFreeSpace.xml +++ b/ManagingStripingFreeSpace.xml @@ -1,89 +1,57 @@ - + - Managing File Striping and Free Space + Managing File Striping and Free Space + This chapter describes file striping and I/O options, and includes the following sections: + - How Lustre Striping Works + - + - Lustre File Striping Considerations + - + - Setting the File Layout/Striping Configuration (lfs setstripe) - - - - - - Retrieving File Layout/Striping Information (getstripe) - - - - - - Managing Free Space - - - + -
- <anchor xml:id="dbdoclet.50438209_pgfId-1291833" xreflabel=""/> -
- 18.1 <anchor xml:id="dbdoclet.50438209_79324" xreflabel=""/>How Lustre Striping Works + +
+ 18.1 How Lustre Striping Works Lustre uses a round-robin algorithm for selecting the next OST to which a stripe is to be written. Normally the usage of OSTs is well balanced. However, if users create a small number of exceptionally large files or incorrectly specify striping parameters, imbalanced OST usage may result. The MDS allocates objects on sequential OSTs. Periodically, it will adjust the striping layout to eliminate some degenerate cases in which applications that create very regular file layouts (striping patterns) would preferentially use a particular OST in the sequence. Stripes are written to sequential OSTs until free space across the OSTs differs by more than 20%. The MDS will then use weighted random allocations with a preference for allocating objects on OSTs with more free space. This can reduce I/O performance until space usage is rebalanced to within 20% again. - For a more detailed description of stripe assignments, see Managing Free Space. + For a more detailed description of stripe assignments, see .
-
- 18.2 Lustre <anchor xml:id="dbdoclet.50438209_48033" xreflabel=""/>File <anchor xml:id="dbdoclet.50438209_marker-1291832" xreflabel=""/>Striping Considerations +
+ 18.2 Lustre File <anchor xml:id="dbdoclet.50438209_marker-1291832" xreflabel=""/>Striping Considerations Whether you should set up file striping and what parameter values you select depends on your need. A good rule of thumb is to stripe over as few objects as will meet those needs and no more. Some reasons for using striping include: Providing high-bandwidth access - Many applications require high-bandwidth access to a single file - more bandwidth than can be provided by a single OSS. For example, scientific applications that write to a single file from hundreds of nodes, or a binary executable that is loaded by many nodes when an application starts. - - - - - + In cases like these, a file can be striped over as many OSSs as it takes to achieve the required peak aggregate bandwidth for that file. Striping across a larger number of OSSs should only be used when the file size is very large and/or is accessed by many nodes at a time. Currently, Lustre files can be striped across up to 160 OSSs, the maximum stripe count for an ldiskfs file system. - + Improving performance when OSS bandwidth is exceeded - Striping across many OSSs can improve performance if the aggregate client bandwidth exceeds the server bandwidth and the application reads and writes data fast enough to take advantage of the additional OSS bandwidth. The largest useful stripe count is bounded by the I/O rate of the clients/jobs divided by the performance per OSS. - - - Providing space for very large files. Striping is also useful when a single OST does not have enough free space to hold the entire file. - - - Some reasons to minimize or avoid striping: Increased overhead - Striping results in more locks and extra network operations during common operations such as stat and unlink. Even when these operations are performed in parallel, one network operation takes less time than 100 operations. - - - - - Increased overhead also results from server contention. 
Consider a cluster with 100 clients and 100 OSSs, each with one OST. If each file has exactly one object and the load is distributed evenly, there is no contention and the disks on each server can manage sequential I/O. If each file has 100 objects, then the clients all compete with one another for the attention of the servers, and the disks on each node seek in 100 different directions. In this case, there is needless contention. - + Increased risk - When a file is striped across all servers and one of the servers breaks down, a small part of each striped file is lost. By comparison, if each file has exactly one stripe, you lose fewer files, but you lose them in their entirety. Many users would prefer to lose some of their files entirely than all of their files partially. - - -
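The bound on the useful stripe count described above (client I/O rate divided by per-OSS performance) can be sketched as a small calculation. This is an illustrative sketch only: the function name, the parameter names, and the choice to round down are assumptions, not part of Lustre. The cap of 160 is the maximum stripe count stated in this section.

```python
def max_useful_stripe_count(client_io_rate, per_oss_rate, max_stripes=160):
    """Estimate the largest stripe count worth using for one file.

    client_io_rate: aggregate I/O rate of the clients/jobs (e.g. MB/s).
    per_oss_rate:   sustained performance of a single OSS (same units).
    max_stripes:    stripe count limit quoted in this chapter (160).

    Hypothetical helper: rounding down is an assumption made so that the
    result never exceeds the ratio described in the text.
    """
    if per_oss_rate <= 0:
        raise ValueError("per_oss_rate must be positive")
    if client_io_rate < 0:
        raise ValueError("client_io_rate must not be negative")
    bound = int(client_io_rate // per_oss_rate)
    # Every file has at least one stripe, and never more than the limit.
    return max(1, min(bound, max_stripes))


if __name__ == "__main__":
    # Clients that can drive 2500 MB/s against OSSs that sustain 1000 MB/s.
    print(max_useful_stripe_count(2500, 1000))
```

A result of 1 suggests that a single OSS can already absorb the load, so striping would add overhead without adding bandwidth.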
<anchor xml:id="dbdoclet.50438209_pgfId-1291860" xreflabel=""/>18.2.1 Choosing a Stripe <anchor xml:id="dbdoclet.50438209_marker-1291859" xreflabel=""/>Size @@ -92,43 +60,25 @@ The stripe size must be a multiple of the page size. Lustre’s tools enforce a multiple of 64 KB (the maximum page size on ia64 and PPC64 nodes) so that users on platforms with smaller pages do not accidentally create files that might cause problems for ia64 clients. - - - The smallest recommended stripe size is 512 KB. Although you can create files with a stripe size of 64 KB, the smallest practical stripe size is 512 KB because Lustre sends 1 MB chunks over the network. Choosing a smaller stripe size may result in inefficient I/O to the disks and reduced performance. - - - A good stripe size for sequential I/O using high-speed networks is between 1 MB and 4 MB. In most situations, stripe sizes larger than 4 MB may result in longer lock hold times and contention on shared file access. - - - The maximum stripe size is 4 GB. Using a large stripe size can improve performance when accessing very large files. It allows each client to have exclusive access to its own part of a file. However, it can be counterproductive in some cases if it does not match your I/O pattern. - - - Choose a stripe pattern that takes into account your application’s write patterns. Writes that cross an object boundary are slightly less efficient than writes that go entirely to one server. If the file is written in a very consistent and aligned way, make the stripe size a multiple of the write() size. - - - The choice of stripe size has no effect on a single-stripe file. - - -
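The stripe size rules listed above can be collected into a short check. The following is a minimal sketch, not a Lustre tool: the function and constant names are hypothetical, and treating the 4 GB maximum as inclusive is an assumption made for illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

ALIGNMENT = 64 * KIB          # tools enforce a multiple of 64 KB
RECOMMENDED_MIN = 512 * KIB   # smallest recommended stripe size
SEQUENTIAL_MAX = 4 * MIB      # upper end of the range suggested for sequential I/O
MAX_STRIPE_SIZE = 4 * GIB     # maximum from the text (assumed inclusive here)


def check_stripe_size(size_bytes):
    """Return (ok, messages) for a proposed stripe size in bytes.

    ok is False when the size breaks a hard rule from this section;
    messages lists the violated rule or any advisory warnings.
    """
    if size_bytes <= 0 or size_bytes % ALIGNMENT != 0:
        return False, ["stripe size must be a positive multiple of 64 KB"]
    if size_bytes > MAX_STRIPE_SIZE:
        return False, ["stripe size exceeds the 4 GB maximum"]

    messages = []
    if size_bytes < RECOMMENDED_MIN:
        messages.append("below the recommended minimum of 512 KB")
    elif size_bytes > SEQUENTIAL_MAX:
        messages.append("larger than 4 MB; shared-file lock contention may increase")
    return True, messages


if __name__ == "__main__":
    for size in (64 * KIB, 1 * MIB, 100 * KIB):
        print(size, check_stripe_size(size))
```

The check separates hard limits from advice because a 64 KB stripe is valid but slower in practice, while a size that is not a multiple of 64 KB is rejected outright.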
-
- 18.3 <anchor xml:id="dbdoclet.50438209_78664" xreflabel=""/>Setting the File Layout/Striping Configuration (lfs setstripe) +
+ 18.3 Setting the File Layout/Striping Configuration (lfs setstripe) Use the lfs setstripe command to create new files with a specific file layout (stripe pattern) configuration. lfs setstripe [--size|-s stripe_size] [--count|-c stripe_count] [--index|-i start_ost] [--pool|-p pool_name] <filename|dirname> @@ -139,16 +89,7 @@ The stripe count indicates how many OSTs to use. The default stripe_count value is 1. Setting stripe_count to 0 causes the default stripe count to be used. Setting stripe_count to -1 means stripe over all available OSTs (full OSTs are skipped). start_ost The start OST is the first OST to which files are written. The default value for start_ost is -1, which allows the MDS to choose the starting index. This setting is strongly recommended, as it allows space and load balancing to be done by the MDS as needed. Otherwise, the file starts on the specified OST index. The numbering of the OSTs starts at 0. - - - - - - Note -If you pass a start_ost value of 0 and a stripe_count value of 1, all files are written to OST 0, until space is exhausted. This is probably not what you meant to do. If you only want to adjust the stripe count and keep the other parameters at their default settings, do not specify any of the other parameters: lfs setstripe -c <stripe_count> <file> - - - - + If you pass a start_ost value of 0 and a stripe_count value of 1, all files are written to OST 0, until space is exhausted. This is probably not what you meant to do. If you only want to adjust the stripe count and keep the other parameters at their default settings, do not specify any of the other parameters: lfs setstripe -c <stripe_count> <file> pool_name Specify the OST pool on which the file will be written. This allows limiting the OSTs used to a subset of all OSTs in the file system. For more details about using OST pools, see Creating and Managing OST Pools.
@@ -228,8 +169,8 @@
-
- 18.4 <anchor xml:id="dbdoclet.50438209_44776" xreflabel=""/>Retrieving File Layout/Striping Information (getstripe) +
+ 18.4 Retrieving File Layout/Striping Information (getstripe) The lfs getstripe command is used to display information that shows over which OSTs a file is distributed. For each OST, the index and UUID are displayed, along with the OST index and object ID for each stripe in the file. For directories, the default settings for files created in that directory are printed.
<anchor xml:id="dbdoclet.50438209_pgfId-1297837" xreflabel=""/>18.4.1 Displaying the Current Stripe Size @@ -264,8 +205,8 @@ group
-
- 18.5 <anchor xml:id="dbdoclet.50438209_10424" xreflabel=""/>Managing Free <anchor xml:id="dbdoclet.50438209_marker-1295520" xreflabel=""/>Space +
+ 18.5 Managing Free <anchor xml:id="dbdoclet.50438209_marker-1295520" xreflabel=""/>Space The MDT assigns file stripes to OSTs based on location (which OSS) and size considerations (free space) to optimize file system performance. Emptier OSTs are preferentially selected for stripes, and stripes are preferentially spread out between OSSs to increase network bandwidth utilization. The weighting factor between these two optimizations can be adjusted by the user.
<anchor xml:id="dbdoclet.50438209_pgfId-1293929" xreflabel=""/>18.5.1 <anchor xml:id="dbdoclet.50438209_35838" xreflabel=""/>Checking File System Free Space @@ -351,8 +292,6 @@ IFree IUse% Mounted on Two stripe allocation methods are provided: round-robin and weighted. By default, the allocation method is determined by the amount of free-space imbalance on the OSTs. The weighted allocator is used when any two OSTs are imbalanced by more than 20%. Otherwise, the faster round-robin allocator is used. (The round-robin order maximizes network balancing.) Round-robin allocator - When OSTs have approximately the same amount of free space (within 20%), an efficient round-robin allocator is used. The round-robin allocator alternates stripes between OSTs on different OSSs, so the OST used for stripe 0 of each file is evenly distributed among OSTs, regardless of the stripe count. Here are several sample round-robin stripe orders (each letter represents a different OST on a single OSS): - - @@ -385,9 +324,6 @@ IFree IUse% Mounted on Weighted allocator - When the free space difference between the OSTs is significant (by default, 20% of the free space), then a weighting algorithm is used to influence OST ordering based on size and location. Note that these are weightings for a random algorithm, so the OST with the most free space is not necessarily chosen each time. On average, the weighted allocator fills the emptier OSTs faster. - - -
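The choice between the two allocators can be sketched as a threshold test on free space. This is an illustration under stated assumptions, not the MDS implementation: the function name is hypothetical, and measuring imbalance as the gap between the emptiest and fullest OST relative to the larger free-space value is one possible reading of "imbalanced by more than 20%".

```python
def choose_allocator(free_space, threshold=0.20):
    """Pick an allocator name from per-OST free space values.

    free_space: list of free-space amounts, one per OST (any unit).
    threshold:  imbalance limit; 0.20 mirrors the 20% default in the text.

    Assumption: imbalance = (max_free - min_free) / max_free.
    """
    if not free_space:
        raise ValueError("at least one OST is required")
    largest = max(free_space)
    smallest = min(free_space)
    if largest == 0:
        # No free space anywhere: nothing to weight, so fall back.
        return "round-robin"
    imbalance = (largest - smallest) / largest
    return "weighted" if imbalance > threshold else "round-robin"


if __name__ == "__main__":
    print(choose_allocator([100, 95, 90]))  # nearly balanced OSTs
    print(choose_allocator([100, 70]))      # one OST much fuller than the other
```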
@@ -396,17 +332,7 @@ IFree IUse% Mounted on lctl conf_param <fsname>-MDT0000.lov.qos_prio_free=90 Increasing this value puts more weighting on free space. When the free space priority is set to 100%, then location is no longer used in stripe-ordering calculations and weighting is based entirely on free space. - - - - - - Note -Setting the priority to 100% means that OSS distribution does not count in the weighting, but the stripe assignment is still done via a weighting. For example, if OST2 has twice as much free space as OST1, then OST2 is twice as likely to be used, but it is not guaranteed to be used. - - - - + Setting the priority to 100% means that OSS distribution does not count in the weighting, but the stripe assignment is still done via a weighting. For example, if OST2 has twice as much free space as OST1, then OST2 is twice as likely to be used, but it is not guaranteed to be used.
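The OST1/OST2 example above can be reproduced with a weighted random choice. This is a minimal sketch of the case where the free space priority is 100%, so location plays no part. The function names and the free-space figures are hypothetical, and the real allocator also spreads the stripes of one file across OSTs, which this sketch ignores.

```python
import random


def selection_probabilities(free_space):
    """Map each OST name to its chance of being picked.

    free_space: dict of OST name -> free space (any unit).
    With free space as the only weight, the probability is simply
    each OST's share of the total free space.
    """
    total = sum(free_space.values())
    if total <= 0:
        raise ValueError("total free space must be positive")
    return {ost: free / total for ost, free in free_space.items()}


def pick_ost(free_space, rng=random):
    """Draw one OST at random, weighted by free space."""
    osts = list(free_space)
    weights = [free_space[ost] for ost in osts]
    return rng.choices(osts, weights=weights, k=1)[0]


if __name__ == "__main__":
    # OST2 has twice as much free space as OST1 (illustrative numbers).
    space = {"OST1": 100, "OST2": 200}
    print(selection_probabilities(space))
    print(pick_ost(space))
```

Because the draw is random, OST1 is still chosen about one time in three, which matches the note that the emptier OST is more likely but not guaranteed to be used.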
-
-- 1.8.3.1