From: Andreas Dilger Date: Sat, 25 Apr 2020 08:04:39 +0000 (-0600) Subject: LUDOC-11 misc: remove pre-2.5 conditional text X-Git-Url: https://git.whamcloud.com/?a=commitdiff_plain;h=25106a04c49c4acef6322b68ce22217fac9e523c;p=doc%2Fmanual.git LUDOC-11 misc: remove pre-2.5 conditional text Remove conditional text in the manual for Lustre versions earlier than 2.5. It is unlikely that many of the changes in the newer manual are relevant to such old releases, and they would be better off using the old manual, if needed. Signed-off-by: Andreas Dilger Change-Id: I82972511a5a53cae0170d37ec422e0ef2fa7dfce Reviewed-on: https://review.whamcloud.com/38364 Tested-by: jenkins Reviewed-by: James Nunez --- diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..037a184 --- /dev/null +++ b/.gitignore @@ -0,0 +1,7 @@ +.DS_Store +*.orig +*.rej +lustre_manual.fo +lustre_manual.html +lustre_manual.pdf +lustre_manual.xhtml diff --git a/BackupAndRestore.xml b/BackupAndRestore.xml index 614e8a9..1af7314 100644 --- a/BackupAndRestore.xml +++ b/BackupAndRestore.xml @@ -426,18 +426,6 @@ Changelog records consumed: 42 expensive storage than the actual MDT device(s) since it only needs to have good streaming read/write speed instead of high random IOPS. - - In Lustre software release 2.0 through 2.2, the only successful - way to backup and restore an MDT is to do a device-level backup as is - described in this section. File-level restore of an MDT is not possible - before Lustre software release 2.3, as the Object Index (OI) file cannot - be rebuilt after restore without the OI Scrub functionality. - Since Lustre 2.3, Object Index files are automatically rebuilt at first - mount after a restore is detected (see - LU-957), - and file-level backup is supported (see - ). - If hardware replacement is the reason for the backup or if a spare storage device is available, it is possible to do a raw copy of the MDT or OST from one block device to the other, as long as the new device is at @@ -451,11 +439,8 @@ Changelog records consumed: 42 Even in the face of hardware errors, the ldiskfs file system is very robust and it may be possible to recover the file system data after running - e2fsck -fy /dev/{newdev} on the new device, along with - ll_recover_lost_found_objs for OST devices. - With Lustre software version 2.6 and later, there is - no longer a need to run - ll_recover_lost_found_objs on the OSTs, since the + e2fsck -fy /dev/{newdev} on the new device. + With Lustre software version 2.6 and later, the LFSCK scanning will automatically move objects from lost+found back into its correct location on the OST after directory corruption. @@ -493,19 +478,17 @@ Changelog records consumed: 42 it is the preferred method for migration of OST devices, especially when it is desirable to reformat the underlying file system with different configuration options or to reduce fragmentation. - - Prior to Lustre software release 2.3, the - only successful way to perform an MDT backup and restore was to do a - device-level backup as described in - . The ability to do MDT - file-level backups is not available for Lustre software release 2.0 - through 2.2, because restoration of the Object Index (OI) file does not - return the MDT to a functioning state. - Since Lustre software release 2.3, - Object Index files are automatically rebuilt at first mount after a - restore is detected (see - LU-957), - so file-level MDT restore is supported. 
+ + Since Lustre stores internal metadata that maps FIDs to local + inode numbers in the Object Index (OI) files, these files need to be + rebuilt after a file-level restore so that file-level MDT backup + and restore is supported. The OI Scrub rebuilds them automatically + at first mount after a restore is detected, which may affect MDT + performance until the rebuild is completed. Progress can + be monitored via lctl get_param osd-*.*.oi_scrub + on the MDS or OSS node where the target filesystem was restored. + +
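For illustration, assuming a file system named testfs whose MDT0000 was just restored from a file-level backup, the rebuild progress could be checked with a command along these lines:
mds# lctl get_param osd-*.testfs-MDT0000.oi_scrub    # example file system and target names
The output reports the scrub status; once it shows the scrub as completed, the OI files have been fully rebuilt.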
<indexterm> @@ -520,7 +503,7 @@ Changelog records consumed: 42</screen> before the unmount of the target (MDT or OST) in order to be able to restore the file system successfully. To enable index backup on the target, execute the following command on the target server:</para> - <screen># lctl set_param osd-zfs.${fsname}-${target}.index_backup=1</screen> + <screen># lctl set_param osd-*.${fsname}-${target}.index_backup=1</screen> <para><replaceable>${target}</replaceable> is composed of the target type (MDT or OST) plus the target index, such as <literal>MDT0000</literal>, <literal>OST0001</literal>, and so on.</para> @@ -535,9 +518,8 @@ Changelog records consumed: 42</screen> <primary>backup</primary> <secondary>OST and MDT</secondary> </indexterm>Backing Up an OST or MDT - For Lustre software release 2.3 and newer with MDT file-level backup - support, substitute mdt for ost - in the instructions below. + If backing up an MDT, substitute mdt for + ost in the instructions below. Umount the target diff --git a/BenchmarkingTests.xml b/BenchmarkingTests.xml index 0c11e87..e1712de 100644 --- a/BenchmarkingTests.xml +++ b/BenchmarkingTests.xml @@ -822,11 +822,10 @@ Ost# Read(MB/s) Write(MB/s) Read-time Write-time <indexterm><primary>benchmarking</primary> <secondary>MDS performance</secondary></indexterm> Testing MDS Performance (<literal>mds-survey</literal>) - mds-survey is available in Lustre software release - 2.2 and beyond. The mds-survey script tests the local - metadata performance using the echo_client to drive different layers of the - MDS stack: mdd, mdt, osd (the Lustre software only supports mdd stack). It - can be used with the following classes of operations: + The mds-survey script tests the local + metadata performance using the echo_client to drive different layers of + the MDS stack: mdd, mdt, osd (the Lustre software only supports mdd + stack). It can be used with the following classes of operations: Open-create/mkdir/create @@ -1183,4 +1182,4 @@ Ost# Read(MB/s) Write(MB/s) Read-time Write-time
- \ No newline at end of file + diff --git a/ConfiguringLustre.xml b/ConfiguringLustre.xml index 4836276..a2700aa 100644 --- a/ConfiguringLustre.xml +++ b/ConfiguringLustre.xml @@ -127,8 +127,7 @@ mkfs.lustre --fsname= - Optional for Lustre software release 2.4 and later. - Add in additional MDTs. + Optionally add in additional MDTs. mkfs.lustre --fsname= fsname --mgsnode= diff --git a/ConfiguringQuotas.xml b/ConfiguringQuotas.xml index bc5d215..a4a1b7c 100644 --- a/ConfiguringQuotas.xml +++ b/ConfiguringQuotas.xml @@ -27,10 +27,10 @@ xml:id="configuringquotas"> lctl commands (post-mount). - The quota feature in Lustre software is distributed + The quota feature in Lustre software is distributed throughout the system (as the Lustre file system is a distributed file system). Because of this, quota setup and behavior on Lustre is - different from local disk quotas in the following ways: + somewhat different from local disk quotas in the following ways: No single point of administration: some commands must be @@ -45,7 +45,8 @@ xml:id="configuringquotas"> Accuracy: quota information is distributed throughout the file system and can only be accurately calculated with a quiescent file - system. + system in order to minimize performance overhead during normal use. + @@ -55,13 +56,9 @@ xml:id="configuringquotas"> Client does not set the usrquota or - grpquota options to mount. As of Lustre software - release 2.4, space accounting is always enabled by default and quota - enforcement can be enabled/disabled on a per-file system basis with - lctl conf_param. It is worth noting that both - lfs quotaon and - quota_type are deprecated as of Lustre software - release 2.4.0. + grpquota options to mount. Space accounting is + enabled by default and quota enforcement can be enabled/disabled on + a per-filesystem basis with lctl conf_param. @@ -86,14 +83,8 @@ xml:id="configuringquotas"> responsible for management and enforcement. The back-end file system is responsible for resource usage and accounting. Because of this, it is necessary to begin enabling quotas by enabling quotas on the - back-end disk system. Because quota setup is dependent on the Lustre - software version in use, you may first need to run - lctl get_param version to identify - you are currently using. + back-end disk system. -
- Enabling Disk Quotas (Lustre Software Release 2.4 and - later) Quota setup is orchestrated by the MGS and all setup commands in this section must be run directly on the MGS. @@ -163,7 +154,7 @@ xml:id="configuringquotas"> enables the QUOTA feature flag in the superblock which turns quota accounting on at mount time automatically. e2fsck was also modified to fix the quota files when the QUOTA feature flag is present. The - project quota feature is disabled by default, and + project quota feature is disabled by default, and tune2fs needs to be run to enable every target manually. @@ -195,13 +186,6 @@ xml:id="configuringquotas"> - Lustre file systems formatted with a Lustre release prior to 2.4.0 - can be still safely upgraded to release 2.4.0, but will not have - functional space usage report until - tunefs.lustre --quota is run against all targets. This - command sets the QUOTA feature flag in the superblock and runs e2fsck (as - a result, the target must be offline) to build the per-UID/GID disk usage - database. Lustre filesystems formatted with a Lustre release prior to 2.10 can be still safely upgraded to release 2.10, but will not have project quota usage reporting functional until @@ -212,13 +196,11 @@ xml:id="configuringquotas"> considerations. - Lustre software release 2.4 and later requires a version of - e2fsprogs that supports quota (i.e. newer or equal to 1.42.13.wc5, - 1.42.13.wc6 or newer is needed for project quota support) to be - installed on the server nodes using ldiskfs backend (e2fsprogs is not - needed with ZFS backend). In general, we recommend to use the latest - e2fsprogs version available on - + Lustre requires a version of e2fsprogs that supports quota + to be installed on the server nodes when using the ldiskfs backend + (e2fsprogs is not needed with ZFS backend). In general, we recommend + to use the latest e2fsprogs version available on + http://downloads.whamcloud.com/public/e2fsprogs/. The ldiskfs OSD relies on the standard Linux quota to maintain accounting information on disk. As a consequence, the Linux kernel @@ -227,16 +209,11 @@ xml:id="configuringquotas"> CONFIG_QUOTACTL and CONFIG_QFMT_V2 enabled. - As of Lustre software release 2.4.0, quota enforcement is thus - turned on/off independently of space accounting which is always enabled. - lfs quota - on|off as well as the per-target - quota_type parameter are deprecated in favor of a - single per-file system quota parameter controlling inode/block quota - enforcement. Like all permanent parameters, this quota parameter can be - set via - lctl conf_param on the MGS via the following - syntax: + Quota enforcement is turned on/off independently of space + accounting which is always enabled. There is a single per-file + system quota parameter controlling inode/block quota enforcement. 
+ Like all permanent parameters, this quota parameter can be set with
+ lctl conf_param on the MGS using the command:
lctl conf_param fsname.quota.ost|mdt=u|g|p|ugp|none

@@ -281,26 +258,22 @@ lctl conf_param fsname.quota.ost|mdt
To turn on user, group, and project quotas for block only on file
system
testfs1, on the MGS run:
- $ lctl conf_param testfs1.quota.ost=ugp
-
+ mgs# lctl conf_param testfs1.quota.ost=ugp
To turn on group quotas for inodes on file system
testfs2, on the MGS run:
- $ lctl conf_param testfs2.quota.mdt=g
-
+ mgs# lctl conf_param testfs2.quota.mdt=g
To turn off user, group, and project quotas for both inode and block
on file system
testfs3, on the MGS run:
- $ lctl conf_param testfs3.quota.ost=none
-
- $ lctl conf_param testfs3.quota.mdt=none
-
+ mgs# lctl conf_param testfs3.quota.ost=none
+ mgs# lctl conf_param testfs3.quota.mdt=none
- - <indexterm> - <primary>Quotas</primary> - <secondary>verifying</secondary> - </indexterm>Quota Verification - Once the quota parameters have been configured, all targets + + <indexterm> + <primary>Quotas</primary> + <secondary>verifying</secondary> + </indexterm>Quota Verification + Once the quota parameters have been configured, all targets which are part of the file system will be automatically notified of the new quota settings and enable/disable quota enforcement as needed. The per-target enforcement status can still be verified by running the @@ -317,7 +290,6 @@ user uptodate: glb[1],slv[1],reint[0] group uptodate: glb[1],slv[1],reint[0]
-
@@ -325,7 +297,7 @@ group uptodate: glb[1],slv[1],reint[0] <primary>Quotas</primary> <secondary>creating</secondary> </indexterm>Quota Administration - Once the file system is up and running, quota limits on blocks + Once the file system is up and running, quota limits on blocks and inodes can be set for user, group, and project. This is controlled entirely from a client via three quota parameters: @@ -484,13 +456,6 @@ uses an IAM files while the ZFS OSD creates dedicated ZAPs. lctl get_param osd-*.*.quota_slave.limit* - - Prior to 2.4, global quota limits used to be stored in - administrative quota files using the on-disk format of the linux quota - file. When upgrading MDT0000 to 2.4, those administrative quota files are - converted into IAM indexes automatically, conserving existing quota - limits previously set by the administrator. -
@@ -635,37 +600,6 @@ $ cp: writing `/mnt/testfs/foo`: Disk quota exceeded. <primary>Quotas</primary> <secondary>Interoperability</secondary> </indexterm>Quotas and Version Interoperability - The new quota protocol introduced in Lustre software release 2.4.0 - is not compatible with previous - versions. As a consequence, - all Lustre servers must be upgraded to release 2.4.0 - for quota to be functional. Quota limits set on the Lustre file - system prior to the upgrade will be automatically migrated to the new quota - index format. As for accounting information with ldiskfs backend, they will - be regenerated by running - tunefs.lustre --quota against all targets. It is worth - noting that running - tunefs.lustre --quota is - mandatory for all targets formatted with a - Lustre software release older than release 2.4.0, otherwise quota - enforcement as well as accounting won't be functional. - Besides, the quota protocol in release 2.4 takes for granted that the - Lustre client supports the - OBD_CONNECT_EINPROGRESS connect flag. Clients supporting - this flag will retry indefinitely when the server returns - EINPROGRESS in a reply. Here is the list of Lustre client - version which are compatible with release 2.4: - - - Release 2.3-based clients and later - - - Release 1.8 clients newer or equal to release 1.8.9-wc1 - - - Release 2.1 clients newer or equal to release 2.1.4 - - To use the project quota functionality introduced in Lustre 2.10, all Lustre servers and clients must be upgraded to Lustre release 2.10 or later for project quota to work @@ -673,7 +607,7 @@ $ cp: writing `/mnt/testfs/foo`: Disk quota exceeded. clients and not be accounted for on OSTs. Furthermore, the servers may be required to use a patched kernel, for more information see - . + .
diff --git a/ConfiguringStorage.xml b/ConfiguringStorage.xml index 9c2921e..516a5a3 100644 --- a/ConfiguringStorage.xml +++ b/ConfiguringStorage.xml @@ -137,11 +137,17 @@ </section> <section remap="h3"> <title><indexterm><primary>storage</primary><secondary>configuring</secondary><tertiary>external journal</tertiary></indexterm>Choosing Parameters for an External Journal - If you have configured a RAID array and use it directly as an OST, it contains both data and metadata. For better performance, we recommend putting the OST journal on a separate device, by creating a small RAID 1 array and using it as an external journal for the OST. - In a Lustre file system, the default journal size is 400 MB. A journal size of up to 1 - GB has shown increased performance but diminishing returns are seen for larger journals. - Additionally, a copy of the journal is kept in RAM. Therefore, make sure you have enough - memory available to hold copies of all the journals. + If you have configured a RAID array and use it directly as an OST, + it contains both data and metadata. For better performance, we + recommend putting the OST journal on a separate device, by creating a + small RAID 1 array and using it as an external journal for the OST. + + In a typical Lustre file system, the default OST journal size is + up to 1GB, and the default MDT journal size is up to 4GB, in order to + handle a high transaction rate without blocking on journal flushes. + Additionally, a copy of the journal is kept in RAM. Therefore, make + sure you have enough RAM on the servers to hold copies of all journals. + The file system journal options are specified to mkfs.lustre using the --mkfsoptions parameter. For example: --mkfsoptions "other_options -j -J device=/dev/mdJ" diff --git a/Glossary.xml b/Glossary.xml index f0ddd65..b37f23e 100644 --- a/Glossary.xml +++ b/Glossary.xml @@ -53,7 +53,7 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"> D - + Default stripe pattern Information in the LOV descriptor that describes the default @@ -79,20 +79,21 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"> new subdirectories at the time they are created. - + Distributed namespace (DNE) A collection of metadata targets serving a single file - system namespace. Prior to DNE, Lustre file systems were limited to a - single metadata target for the entire name space. Without the ability - to distribute metadata load over multiple targets, Lustre file system - performance is limited. Lustre was enhanced with DNE functionality in - two development phases. After completing the first phase of development - in Lustre software version 2.4, Remote Directories - allows the metadata for sub-directories to be serviced by an - independent MDT(s). After completing the second phase of development in - Lustre software version 2.8, Striped Directories - allows files in a single directory to be serviced by multiple MDTs. + system namespace. Without DNE, Lustre file systems are limited to a + single metadata target for the entire name space. Without the ability + to distribute metadata load over multiple targets, Lustre file system + performance may be limited. The DNE functionality has two types of + scalability. Remote Directories (DNE1) allows + sub-directories to be serviced by an independent MDT(s), increasing + aggregate metadata capacity and performance for independent sub-trees + of the filesystem. 
This also allows performance isolation of workloads + running in a specific sub-directory on one MDT from workloads on other + MDTs. In Lustre 2.8 and later Striped Directories + (DNE2) allows a single directory to be serviced by multiple MDTs. @@ -437,13 +438,13 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"> the file data. - - MDT0 + + MDT0000 - The metadata target for the file system root. Since Lustre - software release 2.4, multiple metadata targets are possible in the - same file system. MDT0 is the root of the file system, which must be - available for the file system to be accessible. + The metadata target storing the file system root directory, as + well as some core services such as quota tables. Multiple metadata + targets are possible in the same file system. MDT0000 must be + available for the file system to be accessible. @@ -523,15 +524,6 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"> Examples include MDC, OSC, LOV, MDT, and OST. - - OBD API - - The programming interface for configuring OBD devices. This was - formerly also the API for accessing object IO and attribute methods on - both the client and server, but has been replaced by the OSD API in - most parts of the code. - - OBD type @@ -539,14 +531,6 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"> Examples of OBD types include the LOV, OSC and OSD. - - Obdfilter - - An older name for the OBD API data object operation device driver - that sits between the OST and the OSD. In Lustre software release 2.4 - this device has been renamed OFD." - - Object storage @@ -663,17 +647,13 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"> server restarts. - + Remote directory - A remote directory describes a feature of - Lustre where metadata for files in a given directory may be - stored on a different MDT than the metadata for the parent - directory. Remote directories only became possible with the - advent of DNE Phase 1, which arrived in Lustre version - 2.4. Remote directories are available to system - administrators who wish to provide individual metadata - targets for individual workloads. + A remote directory describes a feature of Lustre where metadata + for files in a given directory may be stored on a different MDT than + the metadata for the parent directory. This is sometimes referred + to as DNE1. @@ -746,13 +726,12 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"> Striped Directory - A striped directory is a feature of Lustre - software where metadata for files in a given directory are - distributed evenly over multiple MDTs. Striped directories + A striped directory is when metadata for files in a given + directory are distributed evenly over multiple MDTs. Striped directories are only available in Lustre software version 2.8 or later. - An administrator can use a striped directory to increase - metadata performance by distributing the metadata requests - in a single directory over two or more MDTs. + A user can create a striped directory to increase metadata + performance of large directories by distributing the metadata + requests in a single directory over two or more MDTs. 
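As a brief illustration of these two directory types (file system and path names are hypothetical), both are created from a client with lfs mkdir:
client# lfs mkdir -i 1 /mnt/testfs/remote_dir     # remote directory served by MDT0001
client# lfs mkdir -c 2 /mnt/testfs/striped_dir    # directory striped across two MDTs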
diff --git a/InstallingLustre.xml b/InstallingLustre.xml index 7a7a63b..6852bda 100644 --- a/InstallingLustre.xml +++ b/InstallingLustre.xml @@ -129,9 +129,10 @@ xml:id="installinglustre"> lustre-tests-ver_lustre.arch - Lustre I/O Kit benchmarking tools - (Included in Lustre software as of - release 2.2) + Scripts and programs used for running regression + tests for Lustre, but likely only of interest to + Lustre developers or testers. + diff --git a/LustreMaintenance.xml b/LustreMaintenance.xml index 271d1b3..c95f7f2 100644 --- a/LustreMaintenance.xml +++ b/LustreMaintenance.xml @@ -230,15 +230,18 @@ mgs# lctl --device MGS llog_print fsname-OST0000
<indexterm><primary>maintenance</primary><secondary>changing a NID</secondary></indexterm>
Changing a Server NID
- In Lustre software release 2.3 or earlier, the tunefs.lustre
- --writeconf command is used to rewrite all of the configuration files.
- If you need to change the NID on the MDT or OST, a new
- replace_nids command was added in Lustre software release 2.4 to simplify
- this process. The replace_nids command differs from tunefs.lustre
- --writeconf in that it does not erase the entire configuration log, precluding the
- need the need to execute the writeconf command on all servers and
- re-specify all permanent parameter settings. However, the writeconf command
- can still be used if desired.
+ To completely rewrite the Lustre configuration, the
+ tunefs.lustre --writeconf command is used to
+ rewrite all of the configuration files.
+ If you need to change only the NID of the MDT or OST, the
+ replace_nids command can simplify this process.
+ The replace_nids command differs from
+ tunefs.lustre --writeconf in that it does not
+ erase the entire configuration log, precluding the need to
+ execute the writeconf command on all servers and
+ re-specify all permanent parameter settings. However, the
+ writeconf command can still be used if desired.
+
Change a server NID in these situations:
@@ -344,7 +347,7 @@ Changing a Server NID
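As a purely illustrative sketch (the target name and NID are hypothetical, and the file system must be stopped with only the MGS running when the command is issued):
mgs# lctl replace_nids testfs-OST0001 192.168.0.12@tcp   # hypothetical target and new NID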
-
+
<indexterm> <primary>maintenance</primary> <secondary>adding an MDT</secondary> @@ -356,7 +359,7 @@ Changing a Server NID user or application workloads from other users of the filesystem. It is possible to have multiple remote sub-directories reference the same MDT. However, the root directory will always be located on - MDT0. To add a new MDT into the file system: + MDT0000. To add a new MDT into the file system: Discover the maximum MDT index. Each MDT must have unique index. @@ -483,7 +486,7 @@ Removing and Restoring MDTs and OSTs desire to continue using the filesystem before it is repaired. -
+
<indexterm><primary>maintenance</primary><secondary>removing an MDT</secondary></indexterm>Removing an MDT from the File System If the MDT is permanently inaccessible, lfs rm_entry {directory} can be used to delete the @@ -505,7 +508,7 @@ client$ lfs getstripe --mdt-index /mnt/lustre/local_dir0 The lfs getstripe --mdt-index command returns the index of the MDT that is serving the given directory.
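For example, if the hypothetical directory /mnt/lustre/remote_dir1 was served by the permanently lost MDT, its now-dangling entry could be removed with:
client# lfs rm_entry /mnt/lustre/remote_dir1     # hypothetical path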
-
+
<indexterm><primary>maintenance</primary></indexterm>
<indexterm><primary>maintenance</primary><secondary>inactive MDTs</secondary></indexterm>Working with Inactive MDTs
diff --git a/LustreMonitoring.xml b/LustreMonitoring.xml
index 610c9b0..67613ff 100644
--- a/LustreMonitoring.xml
+++ b/LustreMonitoring.xml
@@ -594,11 +594,10 @@ mdd.seb-MDT0000.changelog_deniednext=120
monitoringjobstats
Lustre Jobstats
- The Lustre jobstats feature is available starting in Lustre software
- release 2.3. It collects file system operation statistics for user processes
- running on Lustre clients, and exposes them via procfs on the server using
- the unique Job Identifier (JobID) provided by the job scheduler for each
- job. Job schedulers known to be able to work with jobstats include:
+ The Lustre jobstats feature collects file system operation statistics
+ for user processes running on Lustre clients, and exposes them on the server
+ using the unique Job Identifier (JobID) provided by the job scheduler for
+ each job. Job schedulers known to be able to work with jobstats include:
SLURM, SGE, LSF, Loadleveler, PBS and Maui/MOAB. Since jobstats is
implemented in a scheduler-agnostic manner, it is likely that it will be
able to work with other schedulers also, and also
diff --git a/LustreOperations.xml b/LustreOperations.xml
index 6f96ebf..7637ba1 100644
--- a/LustreOperations.xml
+++ b/LustreOperations.xml
@@ -70,8 +70,7 @@ client# mount -t lustre mds0@tcp0:/short
Mount the MDT.
- Mount all MDTs if multiple MDTs are
- present.
+ Mount all MDTs if multiple MDTs are present.
@@ -427,15 +426,16 @@ client# mount -t lustre mgsnode@tcp0:/foo /mnt/foo
client# mount -t lustre mgsnode@tcp0:/bar /mnt/bar
-
+
<indexterm> <primary>operations</primary> <secondary>remote directory</secondary> - </indexterm>Creating a sub-directory on a given MDT - Lustre 2.4 enables individual sub-directories to be serviced by - unique MDTs. An administrator can allocate a sub-directory to a given MDT - using the command: + Creating a sub-directory on a specific MDT + It is possible to create individual directories, along with its + files and sub-directories, to be stored on specific MDTs. To create + a sub-directory on a given MDT use the command: + client# lfs mkdir –i mdt_index @@ -450,11 +450,11 @@ client# lfs mkdir –i An administrator can allocate remote sub-directories to separate MDTs. Creating remote sub-directories in parent directories not hosted on - MDT0 is not recommended. This is because the failure of the parent MDT + MDT0000 is not recommended. This is because the failure of the parent MDT will leave the namespace below it inaccessible. For this reason, by - default it is only possible to create remote sub-directories off MDT0. To - relax this restriction and enable remote sub-directories off any MDT, an - administrator must issue the following command on the MGS: + default it is only possible to create remote sub-directories off MDT0000. + To relax this restriction and enable remote sub-directories off any MDT, + an administrator must issue the following command on the MGS: mgs# lctl conf_param fsname.mdt.enable_remote_dir=1 For Lustre filesystem 'scratch', the command executed is: mgs# lctl conf_param scratch.mdt.enable_remote_dir=1 @@ -468,7 +468,7 @@ client# lfs mkdir –i enable_remote_dir_gid. For example, setting this parameter to the 'wheel' or 'admin' group ID allows users with that GID to create and delete remote and striped directories. Setting this - parameter to -1 on MDT0 to permanently allow any + parameter to -1 on MDT0000 to permanently allow any non-root users create and delete remote and striped directories. On the MGS execute the following command: mgs# lctl conf_param fsname.mdt.enable_remote_dir_gid=-1 @@ -840,16 +840,14 @@ mds1# lctl get_param mdt.testfs-MDT0000.recovery_status software tries the first one, and if that fails, it tries the second one.) Two options to - mkfs.lustre can be used to specify failover nodes. - Introduced in Lustre software release 2.0, the + mkfs.lustre can be used to specify failover nodes. The --servicenode option is used to specify all service NIDs, including those for primary nodes and failover nodes. When the --servicenode option is used, the first service node to load the target device becomes the primary service node, while nodes corresponding to the other specified NIDs become failover locations for the - target device. An older option, - --failnode, specifies just the NIDS of failover nodes. - For more information about the + target device. An older option, --failnode, specifies + just the NIDs of failover nodes. For more information about the --servicenode and --failnode options, see llite.fsname_instance.max_cache_mb - - Maximum amount of inactive data cached by the client. The + - Maximum amount of read+write data cached by the client. The default value is 3/4 of the client RAM. @@ -2608,7 +2608,7 @@ debug=neterror warning error emerg console
Interpreting OST Statistics - See also (llobdstat) and + See also (collectl). OST stats files can be used to provide statistics showing activity @@ -2868,7 +2868,7 @@ ost_write 21 2 59 [bytes] 7648424 15019 332725.08 910694 180397.87
Interpreting MDT Statistics - See also (llobdstat) and + See also (collectl). MDT stats files can be used to track MDT diff --git a/LustreRecovery.xml b/LustreRecovery.xml index f9c4210..5b4b390 100644 --- a/LustreRecovery.xml +++ b/LustreRecovery.xml @@ -50,17 +50,18 @@ Transient network partition - For Lustre software release 2.1.x and all earlier releases, all Lustre file system failure - and recovery operations are based on the concept of connection failure; all imports or exports - associated with a given connection are considered to fail if any of them fail. Lustre software - release 2.2.x adds the feature which enables the MGS to - actively inform clients when a target restarts after a failure, failover or other - interruption. - For information on Lustre file system recovery, see . For - information on recovering from a corrupt file system, see . For - information on resolving orphaned objects, a common issue after recovery, see . For information on imperative recovery see + For Lustre, all Lustre file system failure and recovery operations + are based on the concept of connection failure; all imports or exports + associated with a given connection are considered to fail if any of + them fail. The feature allows + the MGS to actively inform clients when a target restarts after a + failure, failover, or other interruption to speed up recovery. + For information on Lustre file system recovery, see + . For information on recovering from a + corrupt file system, see . For + information on resolving orphaned objects, a common issue after recovery, + see . For information on + imperative recovery see
<indexterm><primary>recovery</primary><secondary>client failure</secondary></indexterm>Client Failure @@ -122,8 +123,7 @@ at the time of MDS failure are permitted to reconnect during the recovery window, to avoid the introduction of state changes that might conflict with what is being replayed by previously-connected clients. - Lustre software release 2.4 introduces multiple - metadata targets. If multiple MDTs are in use, active-active failover + If multiple MDTs are in use, active-active failover is possible (e.g. two MDS nodes, each actively serving one or more different MDTs for the same filesystem). See for more information. @@ -403,7 +403,6 @@
<indexterm><primary>imperative recovery</primary></indexterm>Imperative Recovery - Imperative Recovery (IR) was first introduced in Lustre software release 2.2.0. Large-scale Lustre file system implementations have historically experienced problems recovering in a timely manner after a server failure. This is due to the way that clients detect the server failure and how the servers perform their recovery. Many of the processes @@ -647,34 +646,45 @@ obdfilter.testfs-OST0001.instance=5 $ lctl get_param osc.testfs-OST0001-osc-*.import |grep instance instance: 5 -
-
-
- <indexterm><primary>imperative recovery</primary><secondary>Configuration Suggestions</secondary></indexterm>Configuration Suggestions for Imperative Recovery -We used to build the MGS and MDT0 on the same target to save a server node. However, to make - IR work efficiently, we strongly recommend running the MGS node on a separate node for any - significant Lustre file system installation. There are three main advantages of doing this: - -Be able to notify clients if the MDT0 is dead -Load balance. The load on the MDS may be very high which may make the MGS unable to notify the clients in time -Safety. The MGS code is simpler and much smaller compared to the code of MDT. This means the chance of MGS down time due to a software bug is very low. - -
+
+
+
+ <indexterm><primary>imperative recovery</primary><secondary>Configuration Suggestions</secondary></indexterm>Configuration Suggestions for Imperative Recovery
+ The MGS and MDT0000 have traditionally been placed on the same target to save
+ a server node. However, to make IR work efficiently, we strongly
+ recommend running the MGS on a separate node for any
+ significant Lustre file system installation. There are three main
+ advantages of doing this:
+
+ The MGS is able to notify clients when MDT0000 has recovered.
+
+ Improved load balancing. The load on the MDS may be
+ very high, which may leave a co-located MGS unable to notify the clients in
+ time.
+ Robustness. The MGS code is simpler and much smaller
+ compared to the MDS code. This means the chance of an MGS downtime
+ due to a software bug is very low.
+
+
<indexterm><primary>suppressing pings</primary></indexterm>Suppressing Pings - On clusters with large numbers of clients and OSTs, OBD_PING messages may impose - significant performance overheads. As an intermediate solution before a more self-contained - one is built, Lustre software release 2.4 introduces an option to suppress pings, allowing - ping overheads to be considerably reduced. Before turning on this option, administrators - should consider the following requirements and understand the trade-offs involved: + On clusters with large numbers of clients and OSTs, + OBD_PING messages may impose significant performance + overheads. There is an option to suppress pings, allowing ping overheads + to be considerably reduced. Before turning on this option, administrators + should consider the following requirements and understand the trade-offs + involved: - When suppressing pings, a target can not detect client deaths, since clients do not - send pings that are only to keep their connections alive. Therefore, a mechanism external - to the Lustre file system shall be set up to notify Lustre targets of client deaths in a - timely manner, so that stale connections do not exist for too long and lock callbacks to + When suppressing pings, a server cannot detect client deaths, + since clients do not send pings that are only to keep their + connections alive. Therefore, a mechanism external to the Lustre + file system shall be set up to notify Lustre targets of client + deaths in a timely manner, so that stale connections do not exist + for too long and lock callbacks to dead clients do not always have to wait for timeouts. diff --git a/LustreTuning.xml b/LustreTuning.xml index 276990a..98df019 100644 --- a/LustreTuning.xml +++ b/LustreTuning.xml @@ -615,14 +615,6 @@ cpu_partition_table= ksocklnd using the nscheds parameter. This adjusts the number of threads for each partition, not the overall number of threads on the LND. - - Lustre software release 2.3 has greatly decreased the default - number of threads for - ko2iblnd and - ksocklnd on high-core count machines. The current - default values are automatically set and are chosen to work well across a - number of typical scenarios. -
ko2iblnd Tuning The following table outlines the ko2iblnd module parameters to be used @@ -1047,7 +1039,7 @@ cpu_partition_table=
-
+
<indexterm> <primary>tuning</primary> diff --git a/ManagingFileSystemIO.xml b/ManagingFileSystemIO.xml index c80d3dd..f396d43 100644 --- a/ManagingFileSystemIO.xml +++ b/ManagingFileSystemIO.xml @@ -592,15 +592,9 @@ osc.testfs-OST0000-osc-ffff81012b2c48e0.checksum_type=[crc32] adler </section> <section remap="h3"> <title>Ptlrpc Thread Pool - Releases prior to Lustre software release 2.2 used two portal RPC - daemons for each client/server pair. One daemon handled all synchronous - IO requests, and the second daemon handled all asynchronous (non-IO) - RPCs. The increasing use of large SMP nodes for Lustre servers exposed - some scaling issues. The lack of threads for large SMP nodes resulted in - cases where a single CPU would be 100% utilized and other CPUs would be - relativity idle. This is especially noticeable when a single client - traverses a large directory. - Lustre software release 2.2.x implements a ptlrpc thread pool, so + The increasing use of large SMP nodes for Lustre servers requires + good scaling of many application threads generating large amounts of IO. + Lustre implements a ptlrpc thread pool, so that multiple threads can be created to serve asynchronous RPC requests. The number of threads spawned is controlled at module load time using module options. By default one thread is spawned per CPU, with a minimum diff --git a/ManagingLNet.xml b/ManagingLNet.xml index cb581da..f3af20a 100644 --- a/ManagingLNet.xml +++ b/ManagingLNet.xml @@ -244,7 +244,7 @@ ents" "even" servers with o2ib2 on rail1.
-
+
<indexterm><primary>LNet</primary></indexterm>Dynamically Configuring LNet Routes
Two scripts are provided:
diff --git a/ManagingStripingFreeSpace.xml b/ManagingStripingFreeSpace.xml
index 9fd099e..137303c 100644
--- a/ManagingStripingFreeSpace.xml
+++ b/ManagingStripingFreeSpace.xml
@@ -372,12 +372,12 @@ osc.lustre-OST0002-osc.ost_conn_uuid=192.168.20.1@tcp
striping
remote directories
Locating the MDT for a remote directory
- Lustre software release 2.4 can be configured with
- multiple MDTs in the same file system. Each sub-directory can have a
- different MDT. To identify on which MDT a given subdirectory is
- located, pass the getstripe [--mdt-index|-M]
- parameters to lfs. An example of this command is
- provided in the section .
+ Lustre can be configured with multiple MDTs in the same file
+ system. Each directory and file can be located on a different MDT.
+ To identify on which MDT a given subdirectory is located, pass the
+ getstripe [--mdt-index|-M] parameter to
+ lfs. An example of this command is provided in
+ the section .
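On releases with DNE support, lfs getdirstripe reports the MDT layout of a directory in more detail; a minimal sketch (the path is hypothetical):
client# lfs getdirstripe /mnt/testfs/remote_dir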
diff --git a/Revision.xml b/Revision.xml index 7c1edf3..3bc8fa6 100644 --- a/Revision.xml +++ b/Revision.xml @@ -10,33 +10,27 @@ Instructions for providing a patch to the existing manual are available at: http://wiki.lustre.org/Lustre_Manual_Changes. - This manual currently covers all the 2.x Lustre software releases. -Features that are specific to individual releases are identified within the -table of contents using a short hand notation (i.e. 'L24' is a Lustre software -release 2.4 specific feature), and within the text using a distinct box. For -example: - - Lustre software release version 2.4 includes support -for multiple metadata servers. + This manual covers a range of Lustre 2.x software + releases. Features that are specific to individual releases are + identified within the table of contents using a short hand notation + (i.e. this paragraph is tagged for a Lustre 2.8 specific feature), + and within the text using a distinct box. For example: which version? version which version of Lustre am I running? - The current version of Lustre - that is in use on the client can be found using the command - lctl get_param version, for example: + The current version of Lustre that is in use on the node can be found + using the command lctl get_param version, for example: $ lctl get_param version -version= -lustre: 2.7.59 -kernel: patchless_client -build: v2_7_59_0-g703195a-CHANGED-3.10.0.lustreopa - +version=2.10.5 + + - Only the latest revision of this document is made readily available -because changes are continually arriving. The current and latest revision of -this manual is available from links maintained at: http://lustre.opensfs.org/documentation/. + Only the latest revision of this document is made readily available + because changes are continually arriving. The current and latest revision + of this manual is available from links maintained at: + http://lustre.opensfs.org/documentation/. diff --git a/SettingLustreProperties.xml b/SettingLustreProperties.xml index 4c2628f..8d2f714 100644 --- a/SettingLustreProperties.xml +++ b/SettingLustreProperties.xml @@ -647,29 +647,6 @@ struct obd_uuid { - LUSTRE_Q_QUOTAON - - - Turns on quotas for a Lustre file system. Deprecated as of 2.4.0. - qc_type is USRQUOTA, - GRPQUOTA or UGQUOTA (both user and group - quota). The quota files must exist. They are normally created with the - llapi_quotacheck call. This call is restricted to the super - user privilege. As of 2.4.0, quota is now enabled on a per file system basis via - lctl conf_param (see ) - on the MGS node and quotacheck isn't needed any more. - - - - - LUSTRE_Q_QUOTAOFF - - - Turns off quotas for a Lustre file system. Deprecated as of 2.4.0. qc_type is USRQUOTA, GRPQUOTA or UGQUOTA (both user and group quota). This call is restricted to the super user privilege. As of 2.4.0, quota is disabled via lctl conf_param (see ). - - - - LUSTRE_Q_GETQUOTA diff --git a/SettingUpLustreSystem.xml b/SettingUpLustreSystem.xml index 4585367..aa25c3c 100644 --- a/SettingUpLustreSystem.xml +++ b/SettingUpLustreSystem.xml @@ -95,24 +95,22 @@ chance that even two disk failures can cause the loss of the whole MDT device. The first failure disables an entire half of the mirror and the second failure has a 50% chance of disabling the remaining mirror. - If multiple MDTs are going to be present in the + If multiple MDTs are going to be present in the system, each MDT should be specified for the anticipated usage and load. For details on how to add additional MDTs to the filesystem, see . 
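As an illustration of adding a second metadata target (host, device, and file system names are hypothetical), the new MDT is formatted with the next unused index and mounted on its MDS:
mds2# mkfs.lustre --fsname=testfs --mgsnode=mgs@tcp0 --mdt --index=1 /dev/sdb   # hypothetical names
mds2# mount -t lustre /dev/sdb /mnt/testfs-mdt1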
- MDT0 contains the root of the Lustre file - system. If MDT0 is unavailable for any reason, the file system cannot be - used. - Using the DNE feature it is possible to - dedicate additional MDTs to sub-directories off the file system root - directory stored on MDT0, or arbitrarily for lower-level subdirectories. - using the lfs mkdir -i mdt_index command. - If an MDT serving a subdirectory becomes unavailable, any subdirectories - on that MDT and all directories beneath it will also become inaccessible. - Configuring multiple levels of MDTs is an experimental feature for the - 2.4 release, and is fully functional in the 2.8 release. This is - typically useful for top-level directories to assign different users - or projects to separate MDTs, or to distribute other large working sets - of files to multiple MDTs. + MDT0000 contains the root of the Lustre file system. If + MDT0000 is unavailable for any reason, the file system cannot be used. + + Using the DNE feature it is possible to dedicate additional + MDTs to sub-directories off the file system root directory stored on + MDT0000, or arbitrarily for lower-level subdirectories, using the + lfs mkdir -i mdt_index + command. If an MDT serving a subdirectory becomes unavailable, any + subdirectories on that MDT and all directories beneath it will also + become inaccessible. This is typically useful for top-level directories + to assign different users or projects to separate MDTs, or to distribute + other large working sets of files to multiple MDTs. Starting in the 2.8 release it is possible to spread a single large directory across multiple MDTs using the DNE striped directory feature by specifying multiple stripes (or shards) @@ -185,7 +183,7 @@ data. This reserved space is unusable for general storage. Thus, at least this much space will be used per OST before any file object data is saved. - With a ZFS backing filesystem for the MDT or OST, + With a ZFS backing filesystem for the MDT or OST, the space allocation for inodes and file data is dynamic, and inodes are allocated as needed. A minimum of 4kB of usable space (before mirroring) is needed for each inode, exclusive of other overhead such as directories, @@ -283,7 +281,7 @@ Inodes will be added approximately in proportion to space added. - + Note that the number of total and free inodes reported by lfs df -i for ZFS MDTs and OSTs is estimated based on the current average space used per inode. When a ZFS filesystem is @@ -294,8 +292,8 @@ better reflect actual site usage. - - Starting in release 2.4, using the DNE remote directory feature + + Using the DNE remote directory feature it is possible to increase the total number of inodes of a Lustre filesystem, as well as increasing the aggregate metadata performance, by configuring additional MDTs into the filesystem, see @@ -595,15 +593,12 @@ Maximum number of MDTs - 256 + 256 - The Lustre software release 2.3 and earlier allows a - maximum of 1 MDT per file system, but a single MDS can host - multiple MDTs, each one for a separate file system. - The Lustre software release 2.4 and later - requires one MDT for the filesystem root. Up to 255 more - MDTs can be added to the filesystem and attached into + A single MDS can host + multiple MDTs, either for separate file systems, or up to 255 + additional MDTs can be added to the filesystem and attached into the namespace with DNE remote or striped directories. 
@@ -776,8 +771,7 @@ Maximum number of files in the file system - 4 billion (ldiskfs), 256 trillion (ZFS) - this is a per-MDT limit + 4 billion (ldiskfs), 256 trillion (ZFS) per MDT The ldiskfs filesystem imposes an upper limit of @@ -787,11 +781,11 @@ increased initially at the time of MDT filesystem creation. For more information, see . - The ZFS filesystem dynamically allocates + The ZFS filesystem dynamically allocates inodes and does not have a fixed ratio of inodes per unit of MDT space, but consumes approximately 4KiB of mirrored space per inode, depending on the configuration. - Each additional MDT can hold up to the + Each additional MDT can hold up to the above maximum number of additional files, depending on available space and the distribution directories and files in the filesystem. diff --git a/SystemConfigurationUtilities.xml b/SystemConfigurationUtilities.xml index a50f286..5f21e5a 100644 --- a/SystemConfigurationUtilities.xml +++ b/SystemConfigurationUtilities.xml @@ -19,9 +19,6 @@ - - - @@ -793,19 +790,31 @@ lctl > quit
<indexterm><primary>ll_decode_filter_fid</primary></indexterm> ll_decode_filter_fid - The ll_decode_filter_fid utility displays the Lustre object ID and MDT parent FID. + The ll_decode_filter_fid utility displays the + Lustre object ID and MDT parent FID.
Synopsis ll_decode_filter_fid object_file [object_file ...]
Description - The ll_decode_filter_fid utility decodes and prints the Lustre OST object ID, MDT FID, - stripe index for the specified OST object(s), which is stored in the "trusted.fid" - attribute on each OST object. This is accessible to ll_decode_filter_fid - when the OST file system is mounted locally as type ldiskfs for maintenance. - The "trusted.fid" extended attribute is stored on each OST object when it is first modified (data written or attributes set), and is not accessed or modified by Lustre after that time. - The OST object ID (objid) is useful in case of OST directory corruption, though normally the ll_recover_lost_found_objs(8) utility is able to reconstruct the entire OST object directory hierarchy. The MDS FID can be useful to determine which MDS inode an OST object is (or was) used by. The stripe index can be used in conjunction with other OST objects to reconstruct the layout of a file even if the MDT inode was lost. + The ll_decode_filter_fid utility decodes and prints the Lustre OST + object ID, MDT FID, stripe index for the specified OST object(s), which + is stored in the "trusted.fid" attribute on each OST object. + This is accessible to ll_decode_filter_fid when the + OST file system is mounted locally as type ldiskfs for maintenance. + + The "trusted.fid" extended attribute is stored on each + OST object when it is first modified (data written or attributes set), + and is not accessed or modified by Lustre after that time. + The OST object ID (objid) may be useful in case of OST directory + corruption, though LFSCK can normally reconstruct the entire OST object + directory tree, see for details. + The MDS FID can be useful to determine which MDS inode an OST object + is (or was) used by. The stripe index can be used in conjunction with + other OST objects to reconstruct the layout of a file even if the MDT + inode was lost. +
Examples @@ -837,75 +846,6 @@ ll_recover_lost_found_objs online scanning will automatically move objects from lost+found to the proper place in the OST. - - The ll_recover_lost_found_objs tool is not - strictly necessary to bring an OST back online, it just avoids losing - access to objects that were moved to the lost+found directory due to - directory corruption on the OST. - -
- Synopsis - $ ll_recover_lost_found_objs [-hv] -d directory -
-
- Description - The first time Lustre modifies an object, it saves the MDS inode number and the objid as an extended attribute on the object, so in case of directory corruption of the OST, it is possible to recover the objects. Running e2fsck fixes the corrupted OST directory, but it puts all of the objects into a lost and found directory, where they are inaccessible to Lustre. Use the ll_recover_lost_found_objs utility to recover all (or at least most) objects from a lost and found directory and return them to the O/0/d* directories. - To use ll_recover_lost_found_objs, mount the file system locally (using the -t ldiskfs, or -t zfs command), run the utility and then unmount it again. The OST must not be mounted by Lustre when ll_recover_lost_found_objs is run. -
-
- Options - - - - - - - - Option - - - Description - - - - - - - -h - - - Prints a help message - - - - - -v - - - Increases verbosity - - - - - -d directory - - - Sets the lost and found directory path - - - - - -
-
- Example - ll_recover_lost_found_objs -d /mnt/ost/lost+found -
-
-
- <indexterm><primary>llodbstat</primary></indexterm> -llobdstat - The llobdstat utility displays OST statistics.
Synopsis llobdstat ost_name [interval] diff --git a/TroubleShootingRecovery.xml b/TroubleShootingRecovery.xml index 9aa6a88..85b9ceb 100644 --- a/TroubleShootingRecovery.xml +++ b/TroubleShootingRecovery.xml @@ -207,10 +207,8 @@ root# e2fsck -fp /dev/sda # fix errors with prudent answers (usually restoring from a file-level MDT backup ( ), or in case the OI Table is otherwise corrupted. Later phases of LFSCK will add further checks to the - Lustre distributed file system state. - In Lustre software release 2.4, LFSCK namespace - scanning can verify and repair the directory FID-in-dirent and LinkEA - consistency. + Lustre distributed file system state. LFSCK namespace scanning can verify + and repair the directory FID-in-dirent and LinkEA consistency. In Lustre software release 2.6, LFSCK layout scanning can verify and repair MDT-OST file layout inconsistencies. File layout inconsistencies between MDT-objects and OST-objects that are checked and @@ -436,8 +434,7 @@ root# e2fsck -fp /dev/sda # fix errors with prudent answers (usually started. Anytime the LFSCK is triggered, the OI scrub will run automatically, so there is no need to specify OI_scrub in that case. - - namespace: check and repair + namespace: check and repair FID-in-dirent and LinkEA consistency. Lustre-2.7 enhances namespace consistency verification under DNE mode. @@ -818,7 +815,7 @@ root# e2fsck -fp /dev/sda # fix errors with prudent answers (usually
-
+
LFSCK status of namespace via <literal>procfs</literal>
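For example (the file system name is hypothetical), a namespace scan can be started on the MDS and its status read back through procfs:
mds# lctl lfsck_start -M testfs-MDT0000 -t namespace
mds# lctl get_param -n mdd.testfs-MDT0000.lfsck_namespace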
diff --git a/UnderstandingFailover.xml b/UnderstandingFailover.xml index 09d57dc..a54cbd0 100644 --- a/UnderstandingFailover.xml +++ b/UnderstandingFailover.xml @@ -118,15 +118,13 @@ xml:id="understandingfailover"> from the failed node. - In Lustre software releases previous to Lustre software release - 2.4, MDSs can be configured as an active/passive pair, while OSSs can be - deployed in an active/active configuration that provides redundancy + If there is a single MDT in a filesystem, two MDSes can be + configured as an active/passive pair, while pairs of OSSes can be + deployed in an active/active configuration that improves OST availability without extra overhead. Often the standby MDS is the active MDS for another Lustre file system or the MGS, so no nodes are idle in the - cluster. - Lustre software release 2.4 introduces metadata - targets for individual sub-directories. Active-active failover - configurations are available for MDSs that serve MDTs on shared + cluster. If there are multiple MDTs in a filesystem, active-active + failover configurations are available for MDSs that serve MDTs on shared storage.
@@ -149,11 +147,10 @@ xml:id="understandingfailover"> For MDT failover, two MDSs can be configured to serve the same - MDT. Only one MDS node can serve an MDT at a time. - Lustre software release 2.4 allows multiple MDTs. - By placing two or more MDT partitions on storage shared by two MDSs, - one MDS can fail and the remaining MDS can begin serving the unserved - MDT. This is described as an active/active failover pair. + MDT. Only one MDS node can serve any MDT at one time. + By placing two or more MDT devices on storage shared by two MDSs, + one MDS can fail and the remaining MDS can begin serving the unserved + MDT. This is described as an active/active failover pair. For OST failover, multiple OSS nodes can be configured to be able @@ -222,14 +219,13 @@ xml:id="understandingfailover">
-
+
<indexterm> <primary>failover</primary> <secondary>MDT</secondary> </indexterm>MDT Failover Configuration (Active/Active) - Multiple MDTs became available with the advent of Lustre software - release 2.4. MDTs can be setup as an active/active failover + MDTs can be configured as an active/active failover configuration. A failover cluster is built from two MDSs as shown in .
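A minimal sketch of formatting an MDT for such a failover pair (NIDs, device, and names are hypothetical), so that either MDS can mount and serve it:
mds1# mkfs.lustre --fsname=testfs --mgsnode=mgs@tcp0 --mdt --index=0 \
      --servicenode=mds1@tcp0 --servicenode=mds2@tcp0 /dev/mdt0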
diff --git a/UnderstandingLustre.xml b/UnderstandingLustre.xml index ffde1f7..9a1630e 100644 --- a/UnderstandingLustre.xml +++ b/UnderstandingLustre.xml @@ -229,8 +229,7 @@ xml:id="understandinglustre"> MDS count: - 1 primary + 1 standby - 256 MDSs, with up to 256 MDTs + 256 MDSs, with up to 256 MDTs @@ -304,11 +303,10 @@ xml:id="understandinglustre"> additional functionality needed by the Lustre file system. - With the Lustre software release 2.4 and later, - it is also possible to use ZFS as the backing filesystem for Lustre - for the MDT, OST, and MGS storage. This allows Lustre to leverage the - scalability and data integrity features of ZFS for individual storage - targets. + It is also possible to use ZFS as the backing filesystem for + Lustre for the MDT, OST, and MGS storage. This allows Lustre to + leverage the scalability and data integrity features of ZFS for + individual storage targets. @@ -335,9 +333,8 @@ xml:id="understandinglustre"> High-availability:The Lustre file system supports active/active failover using shared storage - partitions for OSS targets (OSTs). Lustre software release 2.3 and - earlier releases offer active/passive failover using a shared storage - partition for the MDS target (MDT). The Lustre file system can work + partitions for OSS targets (OSTs), and for MDS targets (MDTs). + The Lustre file system can work with a variety of high availability (HA) managers to allow automated failover and has no single point of failure (NSPF). This allows application transparent recovery. Multiple mount protection (MMP) @@ -345,11 +342,10 @@ xml:id="understandinglustre"> systems that would otherwise cause file system corruption. - With Lustre software release 2.4 or later - servers and clients it is possible to configure active/active - failover of multiple MDTs. This allows scaling the metadata - performance of Lustre filesystems with the addition of MDT storage - devices and MDS nodes. + It is possible to configure active/active failover of multiple + MDTs. This allows scaling the metadata performance of Lustre + filesystems with the addition of MDT storage devices and MDS nodes. + @@ -508,16 +504,16 @@ xml:id="understandinglustre"> - Metadata Targets (MDT) - For Lustre - software release 2.3 and earlier, each file system has one MDT. The + Metadata Targets (MDT) - Each + filesystem has at least one MDT. The MDT stores metadata (such as filenames, directories, permissions and file layout) on storage attached to an MDS. Each file system has one MDT. An MDT on a shared storage target can be available to multiple MDSs, although only one can access it at a time. If an active MDS fails, a standby MDS can serve the MDT and make it available to clients. This is referred to as MDS failover. - Since Lustre software release 2.4, multiple - MDTs are supported in the Distributed Namespace Environment (DNE). + Multiple MDTs are supported in the Distributed Namespace + Environment (DNE). In addition to the primary MDT that holds the filesystem root, it is possible to add additional MDS nodes, each with their own MDTs, to hold sub-directory trees of the filesystem. @@ -705,44 +701,18 @@ xml:id="understandinglustre"> Lustre I/O Lustre File System Storage and I/O - In Lustre software release 2.0, Lustre file identifiers (FIDs) were - introduced to replace UNIX inode numbers for identifying files or objects. - A FID is a 128-bit identifier that contains a unique 64-bit sequence - number, a 32-bit object ID (OID), and a 32-bit version number. 
The sequence - number is unique across all Lustre targets in a file system (OSTs and - MDTs). This change enabled future support for multiple MDTs (introduced in - Lustre software release 2.4) and ZFS (introduced in Lustre software release - 2.4). - Also introduced in release 2.0 is an ldiskfs feature named - FID-in-dirent(also known as - dirdata) in which the FID is stored as - part of the name of the file in the parent directory. This feature - significantly improves performance for - ls command executions by reducing disk I/O. The - FID-in-dirent is generated at the time the file is created. - - The FID-in-dirent feature is not backward compatible with the - release 1.8 ldiskfs disk format. Therefore, when an upgrade from - release 1.8 to release 2.x is performed, the FID-in-dirent feature is - not automatically enabled. For upgrades from release 1.8 to releases - 2.0 through 2.3, FID-in-dirent can be enabled manually but only takes - effect for new files. - For more information about upgrading from Lustre software release - 1.8 and enabling FID-in-dirent for existing files, see - Chapter 16 “Upgrading a Lustre File - System”. - - The LFSCK file system consistency checking tool - released with Lustre software release 2.4 provides functionality that - enables FID-in-dirent for existing files. It includes the following - functionality: + Lustre uses file identifiers (FIDs) to replace UNIX inode numbers + for identifying files or objects. A FID is a 128-bit identifier that + contains a unique 64-bit sequence number, a 32-bit object ID (OID), and + a 32-bit version number. The sequence number is unique across all Lustre + targets in a file system (OSTs and MDTs). This allows Lustre to identify + files on multiple MDTs independent of the underlying filesystem type. + + The LFSCK file system consistency checking tool verifies the + consistency of file objects between MDTs and OSTs. It includes the + following functionality: - Generates IGIF mode FIDs for existing files from a 1.8 version - file system files. - - Verifies the FID-in-dirent for each file and regenerates the FID-in-dirent if it is invalid or missing. @@ -897,10 +867,6 @@ xml:id="understandinglustre"> 31.25 PiB for ldiskfs or 8EiB with ZFS. Note that a Lustre file system can support files up to 2^63 bytes (8EiB), limited only by the space available on the OSTs. - - Versions of the Lustre software prior to Release 2.2 limited the - maximum stripe count for a single file to 160 OSTs. - Although a single file can only be striped over 2000 objects, Lustre file systems can have thousands of OSTs. The I/O bandwidth to access a single file is the aggregated I/O bandwidth to the objects in a diff --git a/UpgradingLustre.xml b/UpgradingLustre.xml index 78629f8..3b2c9f2 100644 --- a/UpgradingLustre.xml +++ b/UpgradingLustre.xml @@ -92,78 +92,19 @@ xml:id="upgradinglustre"> multiple MDSs - large_xattr - ea_inode + ea_inode + large_xattr wide striping - large_xattr - ea_inode + ea_inode + large_xattr Upgrading to Lustre Software Release 2.x (Major Release) The procedure for upgrading from a Lustre software release 2.x to a - more recent 2.x release of the Lustre software is described in this - section. - - This procedure can also be used to upgrade Lustre software release - 1.8.6-wc1 or later to any Lustre software release 2.x. To upgrade other - versions of Lustre software release 1.8.x, contact your support - provider. 
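As a brief aside on the FID-based identification described in the UnderstandingLustre.xml hunk above, the mapping between a pathname and its FID can be inspected from a client with the lfs utility. This is a minimal sketch assuming a hypothetical mount point /mnt/testfs; the FID value shown is illustrative only:

  # Print the FID ([sequence:OID:version]) of a file.
  client# lfs path2fid /mnt/testfs/dir/file
  [0x200000401:0x1:0x0]

  # Resolve a FID back to the pathname(s) that reference it.
  client# lfs fid2path /mnt/testfs 0x200000401:0x1:0x0
  /mnt/testfs/dir/file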
- - - In Lustre software release 2.2, a feature has been added for - ldiskfs-based MDTs that allows striping a single file across up to 2000 - OSTs. By default, this "wide striping" feature is disabled. It is - activated by setting the ea_inode option on the MDT - using either mkfs.lustre or tune2fs. - For example after upgrading an existing file system to Lustre software - release 2.2 or later, wide striping can be enabled by running the - following command on the MDT device before mounting it: - tune2fs -O large_xattr - Once the wide striping feature is enabled and in use on the MDT, it is - not possible to directly downgrade the MDT file system to an earlier - version of the Lustre software that does not support wide striping. To - disable wide striping: - - - Delete all wide-striped files, OR - use lfs_migrate -c 160 (or fewer stripes) - to migrate the files to use fewer OSTs. This does not affect the - total number of OSTs that the whole filesystem can access. - - - Unmount the MDT. - - - Run the following command to turn off the - large_xattr option: - tune2fs -O ^large_xattr - - Using either - mkfs.lustre or - tune2fs with - large_xattr or - ea_inode option reseults in - ea_inode in the file system feature list. - - - To generate a list of all files with more than 160 stripes use - lfs find with the - --stripe-count option: - lfs find ${mountpoint} --stripe-count=+160 - - - In Lustre software release 2.4, a new feature allows using multiple - MDTs, which can each serve one or more remote sub-directories in the file - system. The - root directory is always located on MDT0. - Note that clients running a release prior to the Lustre software - release 2.4 can only see the namespace hosted by MDT0 and will return an - IO error if an attempt is made to access a directory on another - MDT. - - To upgrade a Lustre software release 2.x to a more recent major - release, complete these steps: + more recent 2.y major release of the Lustre software is described in this + section. To upgrade an existing 2.x installation to a more recent major + release, complete the following steps: Create a complete, restorable file system backup. @@ -265,17 +206,19 @@ xml:id="upgradinglustre"> - (Optional) For upgrades to Lustre software release 2.2 or higher, - to enable wide striping on an existing MDT, run the following command - on the MDT: - tune2fs -O ea_inode /dev/mdtdev + Lustre allows striping a single file across up to + 2000 OSTs. Before Lustre 2.13, the "wide striping" feature that + allowed creating files with more than 160 stripes was not enabled by + default. From the 2.13 release onward, the ea_inode + feature is enabled for newly-formatted MDTs. The feature can also + be enabled by the tune2fs command on existing MDTs: + mds# tune2fs -O ea_inode /dev/mdtdev For more information about wide striping, see . 
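To complement the tune2fs command shown in the wide striping step above, the following sketch shows how the feature and a widely striped file might be verified after the upgrade; the device, mount point, and file names are hypothetical:

  # Confirm that ea_inode/large_xattr appears in the MDT feature list.
  mds# dumpe2fs -h /dev/mdtdev | grep -i features

  # Create a file striped across many OSTs (-c -1 stripes over all OSTs,
  # up to the 2000-object limit), then verify its stripe count.
  client# lfs setstripe -c -1 /mnt/testfs/widefile
  client# lfs getstripe -c /mnt/testfs/widefile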
- (Optional) For upgrades to Lustre software release 2.4 or higher, - to format an additional MDT, complete these steps: + (Optional) To format an additional MDT, complete these steps: Determine the index used for the first MDT (each MDT must @@ -295,16 +238,6 @@ xml:id="upgradinglustre"> - (Optional) If you are upgrading to Lustre software release 2.3 or - higher from Lustre software release 2.2 or earlier and want to enable - the quota feature, complete these steps: - - - Before setting up the file system, enter on both the MDS and - OSTs: - tunefs.lustre --quota - - (Optional) If you are upgrading before Lustre software release 2.10, to enable the project quota feature enter the following on every ldiskfs backend target: @@ -315,32 +248,9 @@ xml:id="upgradinglustre"> should only be enabled if the project quota feature is required and/or after it is known that the upgraded release does not need to be downgraded. - - - When setting up the file system, enter: - conf_param $FSNAME.quota.mdt=$QUOTA_TYPE + When setting up the file system, enter: + conf_param $FSNAME.quota.mdt=$QUOTA_TYPE conf_param $FSNAME.quota.ost=$QUOTA_TYPE - - - - - (Optional) If you are upgrading from Lustre software release 1.8, - you must manually enable the FID-in-dirent feature. On the MDS, enter: - tune2fs –O dirdata /dev/mdtdev - - This step is not reversible. Do not complete this step until - you are sure you will not be downgrading the Lustre software. - - This step only enables FID-in-dirent for newly - created files. If you are upgrading to Lustre software release 2.4, - you can use namespace LFSCK to enable FID-in-dirent for the existing - files. For the case of upgrading from Lustre software release 1.8, it is - important to note that if you do NOT enable dirdata via - the tune2fs command above, the namespace LFSCK will NOT - generate FID-in-dirent for the existing files. For more information about - FID-in-dirent and related functionalities in LFSCK, see - . Start the Lustre file system by starting the components in the diff --git a/UserUtilities.xml b/UserUtilities.xml index 9c19dfa..5c7684b 100644 --- a/UserUtilities.xml +++ b/UserUtilities.xml @@ -484,9 +484,8 @@ lfs help default values for unspecified fields. If the striping EA is not set, 0, 0, and -1 will be printed for the stripe count, size, and offset respectively. - The - --mdt-index prints the index of the MDT for a given - directory. See + The --mdt-index prints the index of + the MDT for a given directory. See . diff --git a/style/customstyle_common.xsl b/style/customstyle_common.xsl index e26cd07..624f1ee 100644 --- a/style/customstyle_common.xsl +++ b/style/customstyle_common.xsl @@ -32,18 +32,6 @@ - - - - - - - - - - - - @@ -167,12 +155,6 @@ - - L 2.3 - - - L 2.4 - L 2.5 diff --git a/style/customstyle_fo.xsl b/style/customstyle_fo.xsl index abee7a8..52e546e 100644 --- a/style/customstyle_fo.xsl +++ b/style/customstyle_fo.xsl @@ -146,8 +146,6 @@ - Introduced in Lustre 2.3 - Introduced in Lustre 2.4 Introduced in Lustre 2.5 Introduced in Lustre 2.6 Introduced in Lustre 2.7 @@ -157,6 +155,9 @@ Introduced in Lustre 2.11 Introduced in Lustre 2.12 Introduced in Lustre 2.13 + Introduced in Lustre 2.14 + Introduced in Lustre 2.15 + Introduced in Lustre 2.16 Documentation Error: unrecognised condition attribute
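The UserUtilities.xml hunk above documents the --mdt-index option, and the upgrade step earlier describes formatting additional MDTs for the Distributed Namespace Environment. The following sketch shows how a sub-directory might be placed on an additional MDT and its location verified; the mount point and directory name are hypothetical:

  # Create a new directory on MDT index 1 (requires an MDT with that index).
  client# lfs mkdir -i 1 /mnt/testfs/project2

  # Report which MDT holds the directory (prints the MDT index, e.g. 1).
  client# lfs getstripe --mdt-index /mnt/testfs/project2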