From 300f536bc61bb03d31b4f7b13daac12840db7c42 Mon Sep 17 00:00:00 2001 From: Richard Henwood Date: Thu, 9 May 2013 13:27:25 -0500 Subject: [PATCH] LUDOC-118 trademarks: trademark policy must be applied. A trademark compliance review was completed on the opening content and Section 1: Lustre Intro and a few additional minor edits were made. Signed-off-by: Richard Henwood Signed-off-by: Linda Bebernes Change-Id: Iaedab4e3d1b4f53e2d0d6c35171c4b270d37b909 Reviewed-on: http://review.whamcloud.com/6303 Tested-by: Hudson --- I_LustreIntro.xml | 9 +- Preface.xml | 28 ++- Revision.xml | 45 ++--- UnderstandingLustre.xml | 485 ++++++++++++++++++++++++++++++++++++------------ index.xml | 2 +- 5 files changed, 409 insertions(+), 160 deletions(-) diff --git a/I_LustreIntro.xml b/I_LustreIntro.xml index 167c9f2..3b6972e 100644 --- a/I_LustreIntro.xml +++ b/I_LustreIntro.xml @@ -1,9 +1,10 @@ - Introducing Lustre + Introducing the Lustre* File System - Part I provides background information to help you understand the Lustre architecture and how the major components fit together. You will find information in this section about: - + Part I provides background information to help you understand the Lustre* file system + architecture and how the major components fit together. You will find information in + this section about: @@ -25,7 +26,7 @@ - + diff --git a/Preface.xml b/Preface.xml index 87b2a3f..95741f8 100644 --- a/Preface.xml +++ b/Preface.xml @@ -1,24 +1,30 @@ Preface - This operations manual provides detailed information and procedures to install, configure and tune the Lustre file system. The manual covers topics such as failover, quotas, striping and bonding. The Lustre manual also contains troubleshooting information and tips to improve Lustre operation and performance. + The Lustre* Operations Manual provides detailed + information and procedures to install, configure and tune a Lustre file system. The manual + covers topics such as failover, quotas, striping, and bonding. This manual also contains + troubleshooting information and tips to improve the operation and performance of a Lustre file + system.
                About this Document
                This document is maintained by Intel in DocBook format. The canonical version is available at http://wiki.whamcloud.com/display/PUB/Documentation.
               
UNIX Commands - This document might not contain information about basic UNIX commands and procedures such as shutting down the system, booting the system, and configuring devices. Refer to the following for this information: + This document may not contain information about basic UNIX* operating system commands + and procedures such as shutting down the system, booting the system, and configuring + devices. Refer to the following for this information: Software documentation that you received with your system - Red Hat Enterprise Linux documentation, which is at: - http://docs.redhat.com/docs/en-US/index.html - - The Lustre client module is available for many different Linux versions and distributions. Red Hat Enterprise is the best supported and tested platform for Lustre Servers. - + Red Hat* Enterprise Linux* documentation, which is at: http://docs.redhat.com/docs/en-US/index.html + The Lustre client module is available for many different Linux* versions and distributions. + Red Hat Enterprise is the best supported and tested platform for Lustre servers.
@@ -107,7 +113,8 @@ Latest information - Lustre 2.0 Release Notes + + Lustre* 2.0 Release Notes PDF @@ -121,7 +128,8 @@ Service - Lustre 2.0 Operations Manual + + Lustre* 2.0 Operations Manual PDF @@ -151,5 +159,5 @@
- +
diff --git a/Revision.xml b/Revision.xml index 9067f79..780fe9b 100644 --- a/Revision.xml +++ b/Revision.xml @@ -1,27 +1,28 @@
Revisions - - Note that the Lustre 2.x manual is intended to be relevant for all 2.x - releases of Lustre. Most of the manual content is relevant to all of - the releases. However, any features and content that are added for new - versions of Lustre should be clearly marked with the version in which - this functionality is added. Similarly, features that are no longer - available in Lustre 2.x should be marked with the version in which they - are deprecated or removed. - - - 2.3 - November 2012 - Intel Corporation - Release of Lustre 2.3 manual - - - 2.1 - June 2011 - Whamcloud, Inc - First release of Lustre 2.1 manual - - + Note that the Lustre* Operations Manual 2.x is + intended to be relevant for all 2.x releases of the Lustre software. Most of the manual + content is relevant to all of the releases. However, any features and content that are added + for new versions of the Lustre software are clearly marked with the version in which this + functionality is added. Similarly, features that are no longer available are marked with + the version in which they were deprecated or removed. + + 2.3 + November 2012 + + Intel Corporation + + Release of Lustre 2.3 manual + + + 2.1 + June 2011 + + Whamcloud, Inc + + First release of Lustre 2.1 manual + +
diff --git a/UnderstandingLustre.xml b/UnderstandingLustre.xml index 4f2ef12..f1a482e 100644 --- a/UnderstandingLustre.xml +++ b/UnderstandingLustre.xml @@ -1,37 +1,71 @@ - - Understanding Lustre + + Understanding Lustre Architecture - This chapter describes the Lustre architecture and features of Lustre. It includes the following sections: + This chapter describes the Lustre architecture and features of Lustre. It includes the + following sections: - - + + - - + + - - + +
- <indexterm><primary>Lustre</primary></indexterm>What Lustre Is (and What It Isn't) - Lustre is a storage architecture for clusters. The central component of the Lustre architecture is the Lustre file system, which is supported on the Linux operating system and provides a POSIX-compliant UNIX file system interface. - The Lustre storage architecture is used for many different kinds of clusters. It is best known for powering seven of the ten largest high-performance computing (HPC) clusters worldwide, with tens of thousands of client systems, petabytes (PB) of storage and hundreds of gigabytes per second (GB/sec) of I/O throughput. Many HPC sites use Lustre as a site-wide global file system, serving dozens of clusters. - The ability of a Lustre file system to scale capacity and performance for any need reduces the need to deploy many separate file systems, such as one for each compute cluster. Storage management is simplified by avoiding the need to copy data between compute clusters. In addition to aggregating storage capacity of many servers, the I/O throughput is also aggregated and scales with additional servers. Moreover, throughput and/or capacity can be easily increased by adding servers dynamically. - While Lustre can function in many work environments, it is not necessarily the best choice for all applications. It is best suited for uses that exceed the capacity that a single server can provide, though in some use cases Lustre can perform better with a single server than other filesystems due to its strong locking and data coherency. - Lustre is currently not particularly well suited for "peer-to-peer" usage models where there are clients and servers running on the same node, each sharing a small amount of storage, due to the lack of Lustre-level data replication. In such uses, if one client/server fails, then the data stored on that node will not be accessible until the node is restarted. + <indexterm> + <primary>Lustre</primary> + </indexterm>What a Lustre File System Is (and What It Isn't) + The Lustre architecture is a storage architecture for clusters. The central component of + the Lustre architecture is the Lustre file system, which is supported on the Linux operating + system and provides a POSIX-compliant UNIX file system interface. + The Lustre storage architecture is used for many different kinds of clusters. It is best + known for powering many of the largest high-performance computing (HPC) clusters worldwide, + with tens of thousands of client systems, petabytes (PB) of storage and hundreds of gigabytes + per second (GB/sec) of I/O throughput. Many HPC sites use a Lustre file system as a site-wide + global file system, serving dozens of clusters. + The ability of a Lustre file system to scale capacity and performance for any need reduces + the need to deploy many separate file systems, such as one for each compute cluster. Storage + management is simplified by avoiding the need to copy data between compute clusters. In + addition to aggregating storage capacity of many servers, the I/O throughput is also + aggregated and scales with additional servers. Moreover, throughput and/or capacity can be + easily increased by adding servers dynamically. + While a Lustre file system can function in many work environments, it is not necessarily + the best choice for all applications. 
It is best suited for uses that exceed the capacity that + a single server can provide, though in some use cases, a Lustre file system can perform better + with a single server than other file systems due to its strong locking and data + coherency. + A Lustre file system is currently not particularly well suited for + "peer-to-peer" usage models where clients and servers are running on the same node, + each sharing a small amount of storage, due to the lack of Lustre-level data replication. In + such uses, if one client/server fails, then the data stored on that node will not be + accessible until the node is restarted.
- <indexterm><primary>Lustre</primary><secondary>features</secondary></indexterm>Lustre Features - Lustre runs on a variety of vendor's kernels. For more details, see Lustre Support Matrix on the Intel Lustre wiki. - A Lustre installation can be scaled up or down with respect to the number of client nodes, disk storage and bandwidth. Scalability and performance are dependent on available disk and network bandwidth and the processing power of the servers in the system. Lustre can be deployed in a wide variety of configurations that can be scaled well beyond the size and performance observed in production systems to date. - shows the practical range of scalability and performance characteristics of the Lustre file system and some test results in production systems. + <indexterm> + <primary>Lustre</primary> + <secondary>features</secondary> + </indexterm>Lustre Features + Lustre file systems run on a variety of vendor's kernels. For more details, see the + Lustre Support + Matrix on the Intel Lustre community wiki. + A Lustre installation can be scaled up or down with respect to the number of client + nodes, disk storage and bandwidth. Scalability and performance are dependent on available + disk and network bandwidth and the processing power of the servers in the system. A Lustre + file system can be deployed in a wide variety of configurations that can be scaled well + beyond the size and performance observed in production systems to date. + shows the practical range of scalability and + performance characteristics of a Lustre file system and some test results in production + systems. Lustre Scalability and Performance @@ -54,7 +88,8 @@ - Client Scalability + + Client Scalability 100-100000 @@ -68,13 +103,15 @@ Client Performance - Single client: + + Single client: I/O 90% of network bandwidth Aggregate: 2.5 TB/sec I/O - Single client: + + Single client: 2 GB/sec I/O, 1000 metadata ops/sec Aggregate: 240 GB/sec I/O @@ -82,63 +119,79 @@ - OSS Scalability + + OSS Scalability - Single OSS: + + Single OSS: 1-32 OSTs per OSS, 128TB per OST - OSS count: + + OSS count: 500 OSSs, with up to 4000 OSTs - - Single OSS: + + Single OSS: 8 OSTs per OSS, 16TB per OST - OSS count: + + OSS count: 450 OSSs with 1000 4TB OSTs 192 OSSs with 1344 8TB OSTs - OSS Performance + + OSS Performance - Single OSS: + + Single OSS: 5 GB/sec - Aggregate: + + Aggregate: 2.5 TB/sec - Single OSS: + + Single OSS: 2.0+ GB/sec - Aggregate: + + Aggregate: 240 GB/sec - MDS Scalability + + MDS Scalability - Single MDS: + + Single MDS: 4 billion files - MDS count: + + MDS count: 1 primary + 1 backup - Since Lustre 2.4: up to 4096 MDSs and up to 4096 MDTs. + Since Lustre* Release 2.4: up to 4096 MDSs and up to 4096 + MDTs. - Single MDS: + + Single MDS: 750 million files - MDS count: + + MDS count: 1 primary + 1 backup - MDS Performance + + MDS Performance 35000/s create operations, @@ -151,18 +204,23 @@ - Filesystem Scalability + + File system Scalability - Single File: + + Single File: 2.5 PB max file size - Aggregate: + + Aggregate: 512 PB space, 4 billion files - Single File: + + Single File: multi-TB max file size - Aggregate: + + Aggregate: 10 PB space, 750 million files @@ -172,108 +230,207 @@ Other Lustre features are: - Performance-enhanced ext4 file system: Lustre uses an improved version of the ext4 journaling file system to store data and metadata. This version, called ldiskfs, has been enhanced to improve performance and provide additional functionality needed by Lustre. 
+ Performance-enhanced ext4 file system: The Lustre + file system uses an improved version of the ext4 journaling file system to store data + and metadata. This version, called ldiskfs, has been enhanced to improve performance and + provide additional functionality needed by the Lustre file system. - POSIX compliance: The full POSIX test suite passes in an identical manner to a local ext4 filesystem, with limited exceptions on Lustre clients. In a cluster, most operations are atomic so that clients never see stale data or metadata. Lustre supports mmap() file I/O. + POSIX* compliance: The full POSIX test suite passes + in an identical manner to a local ext4 filesystem, with limited exceptions on Lustre + clients. In a cluster, most operations are atomic so that clients never see stale data + or metadata. The Lustre software supports mmap() file I/O. - High-performance heterogeneous networking: Lustre supports a variety of high performance, low latency networks and permits Remote Direct Memory Access (RDMA) for Infiniband (OFED) and other advanced networks for fast and efficient network transport. Multiple RDMA networks can be bridged using Lustre routing for maximum performance. Lustre also provides integrated network diagnostics. + High-performance heterogeneous networking: The + Lustre software supports a variety of high performance, low latency networks and permits + Remote Direct Memory Access (RDMA) for Infiniband* (OFED) and other advanced networks + for fast and efficient network transport. Multiple RDMA networks can be bridged using + Lustre routing for maximum performance. The Lustre software also includes integrated + network diagnostics. - High-availability: Lustre offers active/active failover using shared storage partitions for OSS targets (OSTs). Lustre 2.3 and earlier offers active/passive failover using a shared storage partition for the MDS target (MDT).With Lustre 2.4 or later servers and clients it is possible to configure active/active failover of multiple MDTsThis allows application transparent recovery. Lustre can work with a variety of high availability (HA) managers to allow automated failover and has no single point of failure (NSPF). Multiple mount protection (MMP) provides integrated protection from errors in highly-available systems that would otherwise cause file system corruption. + High-availability: The Lustre file system supports + active/active failover using shared storage partitions for OSS targets (OSTs). Lustre + Release 2.3 and earlier releases offer active/passive failover using a shared storage + partition for the MDS target (MDT). + With Lustre Release 2.4 or later servers and clients it is possible + to configure active/active failover of multiple MDTs. This allows application + transparent recovery. The Lustre file system can work with a variety of high + availability (HA) managers to allow automated failover and has no single point of + failure (NSPF). Multiple mount protection (MMP) provides integrated protection from + errors in highly-available systems that would otherwise cause file system + corruption. - Security: By default TCP connections are only allowed from privileged ports. Unix group membership is verified on the MDS. + Security: By default TCP connections are only + allowed from privileged ports. UNIX group membership is verified on the MDS. - Access control list (ACL), extended attributes: the Lustre security model follows that of a UNIX file system, enhanced with POSIX ACLs. Noteworthy additional features include root squash. 
+ Access control list (ACL), extended attributes: the + Lustre security model follows that of a UNIX file system, enhanced with POSIX ACLs. + Noteworthy additional features include root squash. - Interoperability: Lustre runs on a variety of CPU architectures and mixed-endian clusters and is interoperable between successive major Lustre software releases. + Interoperability: The Lustre file system runs on a + variety of CPU architectures and mixed-endian clusters and is interoperable between + successive major Lustre software releases. - Object-based architecture: Clients are isolated from the on-disk file structure enabling upgrading of the storage architecture without affecting the client. + Object-based architecture: Clients are isolated + from the on-disk file structure enabling upgrading of the storage architecture without + affecting the client. - Byte-granular file and fine-grained metadata locking: Many clients can read and modify the same file or directory concurrently. The Lustre distributed lock manager (LDLM) ensures that files are coherent between all clients and servers in the filesystem. The MDT LDLM manages locks on inode permissions and pathnames. Each OST has its own LDLM for locks on file stripes stored thereon, which scales the locking performance as the filesystem grows. + Byte-granular file and fine-grained metadata + locking: Many clients can read and modify the same file or directory + concurrently. The Lustre distributed lock manager (LDLM) ensures that files are coherent + between all clients and servers in the file system. The MDT LDLM manages locks on inode + permissions and pathnames. Each OST has its own LDLM for locks on file stripes stored + thereon, which scales the locking performance as the file system grows. - Quotas: User and group quotas are available for Lustre. + Quotas: User and group quotas are available for a + Lustre file system. - Capacity growth: The size of a Lustre file system and aggregate cluster bandwidth can be increased without interruption by adding a new OSS with OSTs to the cluster. + Capacity growth: The size of a Lustre file system + and aggregate cluster bandwidth can be increased without interruption by adding a new + OSS with OSTs to the cluster. - Controlled striping: The layout of files across OSTs can be configured on a per file, per directory, or per file system basis. This allows file I/O to be tuned to specific application requirements within a single filesystem. Lustre uses RAID-0 striping and balances space usage across OSTs. + Controlled striping: The layout of files across + OSTs can be configured on a per file, per directory, or per file system basis. This + allows file I/O to be tuned to specific application requirements within a single file + system. The Lustre file system uses RAID-0 striping and balances space usage across + OSTs. - Network data integrity protection: A checksum of all data sent from the client to the OSS protects against corruption during data transfer. + Network data integrity protection: A checksum of + all data sent from the client to the OSS protects against corruption during data + transfer. - MPI I/O: Lustre has a dedicated MPI ADIO layer that optimizes parallel I/O to match the underlying file system architecture. + MPI I/O: The Lustre architecture has a dedicated + MPI ADIO layer that optimizes parallel I/O to match the underlying file system + architecture. 
- NFS and CIFS export: Lustre files can be re-exported using NFS or CIFS (via Samba) enabling them to be shared with non-Linux clients. + NFS and CIFS export: Lustre files can be + re-exported using NFS or CIFS (via Samba) enabling them to be shared with non-Linux + clients. - Disaster recovery tool: Lustre provides a distributed file system check (lfsck) that can restore consistency between storage components in case of a major file system error. Lustre can operate even in the presence of file system inconsistencies, so lfsck is not required before returning the file system to production. + Disaster recovery tool: The Lustre file system + provides a distributed file system check (lfsck) that can restore consistency between + storage components in case of a major file system error. A Lustre file system can + operate even in the presence of file system inconsistencies, so lfsck is not required + before returning the file system to production. - Performance monitoring: Lustre offers a variety of mechanisms to examine performance and tuning. + Performance monitoring: The Lustre file system + offers a variety of mechanisms to examine performance and tuning. - Open source: Lustre is licensed under the GPL 2.0 license for use with Linux. + Open source: The Lustre software is licensed under + the GPL 2.0 license for use with Linux.
- <indexterm><primary>Lustre</primary><secondary>components</secondary></indexterm>Lustre Components - An installation of the Lustre software includes a management server (MGS) and one or more Lustre file systems interconnected with Lustre networking (LNET). - A basic configuration of Lustre components is shown in . + <indexterm> + <primary>Lustre</primary> + <secondary>components</secondary> + </indexterm>Lustre Components + An installation of the Lustre software includes a management server (MGS) and one or more + Lustre file systems interconnected with Lustre networking (LNET). + A basic configuration of Lustre components is shown in .
- Lustre components in a basic cluster + Lustre* components in a basic cluster - Lustre components in a basic cluster + Lustre* components in a basic cluster
- <indexterm><primary>Lustre</primary><secondary>MGS</secondary></indexterm>Management Server (MGS) - The MGS stores configuration information for all the Lustre file systems in a cluster and provides this information to other Lustre components. Each Lustre target contacts the MGS to provide information, and Lustre clients contact the MGS to retrieve information. - It is preferable that the MGS have its own storage space so that it can be managed independently. However, the MGS can be co-located and share storage space with an MDS as shown in . + <indexterm> + <primary>Lustre</primary> + <secondary>MGS</secondary> + </indexterm>Management Server (MGS) + The MGS stores configuration information for all the Lustre file systems in a cluster + and provides this information to other Lustre components. Each Lustre target contacts the + MGS to provide information, and Lustre clients contact the MGS to retrieve + information. + It is preferable that the MGS have its own storage space so that it can be managed + independently. However, the MGS can be co-located and share storage space with an MDS as + shown in .
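      In practice, the MGS acts as a small configuration registry: each target registers its
      configuration with the MGS, and clients query the MGS to learn how to reach the file
      system's targets. The short Python sketch below is a conceptual illustration only; the
      names (ManagementServer, register_target, get_config) and the example records are
      invented for this example and do not correspond to the actual Lustre configuration
      protocol.
      <programlisting>
# Conceptual model of the MGS role described above: targets register
# configuration records, clients retrieve them.  Not a real Lustre interface.

class ManagementServer:
    def __init__(self):
        self._config = {}                    # file system name -> list of target records

    def register_target(self, fsname, target):
        self._config.setdefault(fsname, []).append(target)

    def get_config(self, fsname):
        return list(self._config.get(fsname, []))

mgs = ManagementServer()
mgs.register_target("lustre1", {"type": "MDT", "index": 0, "nid": "10.0.0.1@tcp"})
mgs.register_target("lustre1", {"type": "OST", "index": 0, "nid": "10.0.0.2@tcp"})
print(mgs.get_config("lustre1"))             # what a client would look up at mount time
      </programlisting>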
Lustre File System Components Each Lustre file system consists of the following components: - Metadata Server (MDS) - The MDS makes metadata stored in one or more MDTs available to Lustre clients. Each MDS manages the names and directories in the Lustre file system(s) and provides network request handling for one or more local MDTs. + Metadata Server (MDS) - The MDS makes metadata + stored in one or more MDTs available to Lustre clients. Each MDS manages the names and + directories in the Lustre file system(s) and provides network request handling for one + or more local MDTs. - Metadata Target (MDT ) - For Lustre 2.3 and earlier, each filesystem has one MDT. The MDT stores metadata (such as filenames, directories, permissions and file layout) on storage attached to an MDS. Each file system has one MDT. An MDT on a shared storage target can be available to multiple MDSs, although only one can access it at a time. If an active MDS fails, a standby MDS can serve the MDT and make it available to clients. This is referred to as MDS failover. - Since Lustre 2.4, multiple MDTs are supported. Each filesystem has at least one MDT. An MDT on shared storage target can be available via multiple MDSs, although only one MDS can export the MDT to the clients at one time. Two MDS machines share storage for two or more MDTs. After the failure of one MDS, the remaining MDS begins serving the MDT(s) of the failed MDS. + Metadata Target (MDT ) - For Lustre Release 2.3 and + earlier, each file system has one MDT. The MDT stores metadata (such as filenames, + directories, permissions and file layout) on storage attached to an MDS. Each file + system has one MDT. An MDT on a shared storage target can be available to multiple MDSs, + although only one can access it at a time. If an active MDS fails, a standby MDS can + serve the MDT and make it available to clients. This is referred to as MDS + failover. + Since Lustre Release 2.4, multiple MDTs are supported. Each file + system has at least one MDT. An MDT on a shared storage target can be available via + multiple MDSs, although only one MDS can export the MDT to the clients at one time. Two + MDS machines share storage for two or more MDTs. After the failure of one MDS, the + remaining MDS begins serving the MDT(s) of the failed MDS. - Object Storage Servers (OSS) : The OSS provides file I/O service, and network request handling for one or more local OSTs. Typically, an OSS serves between 2 and 8 OSTs, up to 16 TB each. A typical configuration is an MDT on a dedicated node, two or more OSTs on each OSS node, and a client on each of a large number of compute nodes. + Object Storage Servers (OSS) : The OSS provides + file I/O service and network request handling for one or more local OSTs. Typically, an + OSS serves between two and eight OSTs, up to 16 TB each. A typical configuration is an + MDT on a dedicated node, two or more OSTs on each OSS node, and a client on each of a + large number of compute nodes. - Object Storage Target (OST) : User file data is stored in one or more objects, each object on a separate OST in a Lustre file system. The number of objects per file is configurable by the user and can be tuned to optimize performance for a given workload. + Object Storage Target (OST) : User file data is + stored in one or more objects, each object on a separate OST in a Lustre file system. + The number of objects per file is configurable by the user and can be tuned to optimize + performance for a given workload. 
- Lustre clients : Lustre clients are computational, visualization or desktop nodes that are running Lustre client software, allowing them to mount the Lustre file system. + Lustre clients : Lustre clients are computational, + visualization or desktop nodes that are running Lustre client software, allowing them to + mount the Lustre file system. - The Lustre client software provides an interface between the Linux virtual file system and the Lustre servers. The client software includes a Management Client (MGC), a Metadata Client (MDC), and multiple Object Storage Clients (OSCs), one corresponding to each OST in the file system. - A logical object volume (LOV) aggregates the OSCs to provide transparent access across all the OSTs. Thus, a client with the Lustre file system mounted sees a single, coherent, synchronized namespace. Several clients can write to different parts of the same file simultaneously, while, at the same time, other clients can read from the file. - provides the requirements for attached storage for each Lustre file system component and describes desirable characteristics of the hardware used. + The Lustre client software provides an interface between the Linux virtual file system + and the Lustre servers. The client software includes a management client (MGC), a metadata + client (MDC), and multiple object storage clients (OSCs), one corresponding to each OST in + the file system. + A logical object volume (LOV) aggregates the OSCs to provide transparent access across + all the OSTs. Thus, a client with the Lustre file system mounted sees a single, coherent, + synchronized namespace. Several clients can write to different parts of the same file + simultaneously, while, at the same time, other clients can read from the file. + provides the requirements for + attached storage for each Lustre file system component and describes desirable + characteristics of the hardware used.
- <indexterm><primary>Lustre</primary><secondary>requirements</secondary></indexterm>Storage and hardware requirements for Lustre components + <indexterm> + <primary>Lustre</primary> + <secondary>requirements</secondary> + </indexterm>Storage and hardware requirements for Lustre* components @@ -294,7 +451,8 @@ - MDSs + + MDSs 1-2% of file system capacity @@ -305,18 +463,21 @@ - OSSs + + OSSs 1-16 TB per OST, 1-8 OSTs per OSS - Good bus bandwidth. Recommended that storage be balanced evenly across OSSs. + Good bus bandwidth. Recommended that storage be balanced evenly across + OSSs. - Clients + + Clients None @@ -328,36 +489,67 @@
- For additional hardware requirements and considerations, see . + For additional hardware requirements and considerations, see .
- <indexterm><primary>Lustre</primary><secondary>LNET</secondary></indexterm>Lustre Networking (LNET) - Lustre Networking (LNET) is a custom networking API that provides the communication infrastructure that handles metadata and file I/O data for the Lustre file system servers and clients. For more information about LNET, see . + <indexterm> + <primary>Lustre</primary> + <secondary>LNET</secondary> + </indexterm>Lustre Networking (LNET) + Lustre Networking (LNET) is a custom networking API that provides the communication + infrastructure that handles metadata and file I/O data for the Lustre file system servers + and clients. For more information about LNET, see .
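      To make the relationships among the components described in this section concrete, the
      following Python sketch models the metadata path and the data path: the MDT holds the
      file layout, the MDS serves it to a client on open, and the client then reads the data
      objects from the OSTs (normally reached through their OSSs). This is a minimal
      illustrative model only; every class and function name (ObjectStorageTarget,
      MetadataServer, open_file, and so on) is invented for this example and is not part of
      any Lustre API.
      <programlisting>
# Minimal illustrative model of the metadata path (MDS/MDT) and data path
# (OSS/OST) described above.  Not a real Lustre interface.
from dataclasses import dataclass, field

@dataclass
class ObjectStorageTarget:                      # OST: holds data objects
    objects: dict = field(default_factory=dict) # object_id -> bytes

@dataclass
class MetadataTarget:                           # MDT: holds names, attributes, layouts
    layouts: dict = field(default_factory=dict) # filename -> [(ost_index, object_id), ...]

class MetadataServer:                           # MDS: serves metadata from its MDT(s)
    def __init__(self, mdt):
        self.mdt = mdt

    def open_file(self, name):
        return self.mdt.layouts[name]           # on open, the client receives the layout

class Client:
    def __init__(self, mds, osts):
        self.mds = mds                          # metadata path
        self.osts = osts                        # data path (reached through the OSSs)

    def read_file(self, name):
        layout = self.mds.open_file(name)       # 1. ask the MDS for the file layout
        return b"".join(self.osts[ost].objects[obj]   # 2. read objects directly
                        for ost, obj in layout)

# One file striped across two OSTs:
ost0, ost1 = ObjectStorageTarget(), ObjectStorageTarget()
ost0.objects["obj-1"] = b"hello "
ost1.objects["obj-2"] = b"world"
mdt = MetadataTarget(layouts={"/lustre/file": [(0, "obj-1"), (1, "obj-2")]})
client = Client(MetadataServer(mdt), [ost0, ost1])
assert client.read_file("/lustre/file") == b"hello world"
      </programlisting>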
- <indexterm><primary>Lustre</primary><secondary>cluster</secondary></indexterm>Lustre Cluster - At scale, the Lustre cluster can include hundreds of OSSs and thousands of clients (see ). More than one type of network can be used in a Lustre cluster. Shared storage between OSSs enables failover capability. For more details about OSS failover, see . + <indexterm> + <primary>Lustre</primary> + <secondary>cluster</secondary> + </indexterm>Lustre Cluster + At scale, the Lustre cluster can include hundreds of OSSs and thousands of clients (see + ). More than one type of network can + be used in a Lustre cluster. Shared storage between OSSs enables failover capability. For + more details about OSS failover, see .
                <indexterm>
          <primary>Lustre</primary>
          <secondary>at scale</secondary>
        </indexterm>Lustre* cluster at scale
        
          
            
              -               Lustre clustre at scale
+               Lustre* cluster at scale
            
          
        
      
- <indexterm><primary>Lustre</primary><secondary>storage</secondary></indexterm> - <indexterm><primary>Lustre</primary><secondary>I/O</secondary></indexterm> - Lustre Storage and I/O - In a Lustre file system, a file stored on the MDT points to one or more objects associated with a data file, as shown in . Each object contains data and is stored on an OST. If the MDT file points to one object, all the file data is stored in that object. If the file points to more than one object, the file data is 'striped' across the objects (using RAID 0) and each object is stored on a different OST. (For more information about how striping is implemented in Lustre, see ) - In , each filename points to an inode. The inode contains all of the file attributes, such as owner, access permissions, Lustre striping layout, access time, and access control. Multiple filenames may point to the same inode. + <indexterm> + <primary>Lustre</primary> + <secondary>storage</secondary> + </indexterm> + <indexterm> + <primary>Lustre</primary> + <secondary>I/O</secondary> + </indexterm> Lustre Storage and I/O + In a Lustre file system, a file stored on the MDT points to one or more objects associated + with a data file, as shown in . Each object + contains data and is stored on an OST. If the MDT file points to one object, all the file data + is stored in that object. If the file points to more than one object, the file data is + 'striped' across the objects (using RAID 0) and each object is stored on a different + OST. (For more information about how striping is implemented in a Lustre file system, see + ) + In , each filename points to an inode. The + inode contains all of the file attributes, such as owner, access permissions, Lustre striping + layout, access time, and access control. Multiple filenames may point to the same + inode.
- MDT file points to objects on OSTs containing file data + MDT file points to objects on OSTs containing + file data @@ -367,63 +559,110 @@
- When a client opens a file, the fileopen operation transfers the file layout from the MDS to the client. The client then uses this information to perform I/O on the file, directly interacting with the OSS nodes where the objects are stored. This process is illustrated in . + When a client opens a file, the fileopen operation transfers the file + layout from the MDS to the client. The client then uses this information to perform I/O on the + file, directly interacting with the OSS nodes where the objects are stored. This process is + illustrated in .
- File open and file I/O in Lustre + File open and file I/O in Lustre* - File open and file I/O in Lustre + File open and file I/O in Lustre*
- Each file on the MDT contains the layout of the associated data file, including the OST number and object identifier. Clients request the file layout from the MDS and then perform file I/O operations by communicating directly with the OSSs that manage that file data. + Each file on the MDT contains the layout of the associated data file, including the OST + number and object identifier. Clients request the file layout from the MDS and then perform + file I/O operations by communicating directly with the OSSs that manage that file data. The available bandwidth of a Lustre file system is determined as follows: - The network bandwidth equals the aggregated bandwidth of the OSSs to the targets. + The network bandwidth equals the aggregated bandwidth of the OSSs + to the targets. - The disk bandwidth equals the sum of the disk bandwidths of the storage targets (OSTs) up to the limit of the network bandwidth. + The disk bandwidth equals the sum of the disk bandwidths of the + storage targets (OSTs) up to the limit of the network bandwidth. - The aggregate bandwidth equals the minimum of the disk bandwidth and the network bandwidth. + The aggregate bandwidth equals the minimum of the disk bandwidth + and the network bandwidth. - The available file system space equals the sum of the available space of all the OSTs. + The available file system space equals the sum of the available + space of all the OSTs.
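      The bandwidth and capacity rules listed above reduce to simple arithmetic. The short
      Python sketch below applies them to a hypothetical configuration; the variable names
      and all numbers are invented for illustration.
      <programlisting>
# Apply the bandwidth/capacity rules above to a made-up configuration.
# Bandwidth in GB/sec, capacity in TB; all values are illustrative.

oss_network_bw = [6.0, 6.0, 6.0, 6.0]   # per-OSS bandwidth to its targets
ost_disk_bw    = [1.0] * 16             # per-OST disk bandwidth (4 OSTs per OSS here)
ost_free_space = [16.0] * 16            # per-OST available space

network_bandwidth   = sum(oss_network_bw)                       # 24 GB/sec
disk_bandwidth      = min(sum(ost_disk_bw), network_bandwidth)  # 16 GB/sec: sum of OSTs, capped
aggregate_bandwidth = min(disk_bandwidth, network_bandwidth)    # 16 GB/sec -- disk limited
available_space     = sum(ost_free_space)                       # 256 TB

print(aggregate_bandwidth, available_space)
      </programlisting>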
- - <indexterm><primary>Lustre</primary><secondary>striping</secondary></indexterm> - <indexterm><primary>striping</primary><secondary>overview</secondary></indexterm> - Lustre File System and Striping - One of the main factors leading to the high performance of Lustre file systems is the ability to stripe data across multiple OSTs in a round-robin fashion. Users can optionally configure for each file the number of stripes, stripe size, and OSTs that are used. - Striping can be used to improve performance when the aggregate bandwidth to a single file exceeds the bandwidth of a single OST. The ability to stripe is also useful when a single OST does not have enough free space to hold an entire file. For more information about benefits and drawbacks of file striping, see . - Striping allows segments or 'chunks' of data in a file to be stored on different OSTs, as shown in . In the Lustre file system, a RAID 0 pattern is used in which data is "striped" across a certain number of objects. The number of objects in a single file is called the stripe_count. - Each object contains a chunk of data from the file. When the chunk of data being written to a particular object exceeds the stripe_size, the next chunk of data in the file is stored on the next object. - Default values for stripe_count and stripe_size are set for the file system. The default value for stripe_count is 1 stripe for file and the default value for stripe_size is 1MB. The user may change these values on a per directory or per file basis. For more details, see . - , the stripe_size for File C is larger than the stripe_size for File A, allowing more data to be stored in a single stripe for File C. The stripe_count for File A is 3, resulting in data striped across three objects, while the stripe_count for File B and File C is 1. - No space is reserved on the OST for unwritten data. File A in . + + <indexterm> + <primary>Lustre</primary> + <secondary>striping</secondary> + </indexterm> + <indexterm> + <primary>striping</primary> + <secondary>overview</secondary> + </indexterm> Lustre File System and Striping + One of the main factors leading to the high performance of Lustre file systems is the + ability to stripe data across multiple OSTs in a round-robin fashion. Users can optionally + configure for each file the number of stripes, stripe size, and OSTs that are used. + Striping can be used to improve performance when the aggregate bandwidth to a single + file exceeds the bandwidth of a single OST. The ability to stripe is also useful when a + single OST does not have enough free space to hold an entire file. For more information + about benefits and drawbacks of file striping, see . + Striping allows segments or 'chunks' of data in a file to be stored on + different OSTs, as shown in . In the + Lustre file system, a RAID 0 pattern is used in which data is "striped" across a + certain number of objects. The number of objects in a single file is called the + stripe_count. + Each object contains a chunk of data from the file. When the chunk of data being written + to a particular object exceeds the stripe_size, the next chunk of data in + the file is stored on the next object. + Default values for stripe_count and stripe_size + are set for the file system. The default value for stripe_count is 1 + stripe for file and the default value for stripe_size is 1MB. The user + may change these values on a per directory or per file basis. For more details, see . 
+ , the stripe_size + for File C is larger than the stripe_size for File A, allowing more data + to be stored in a single stripe for File C. The stripe_count for File A + is 3, resulting in data striped across three objects, while the + stripe_count for File B and File C is 1. + No space is reserved on the OST for unwritten data. File A in .
- File striping on Lustre + File striping on a Lustre* file + system - File striping pattern across three OSTs for three different data files. The file is sparse and missing chunk 6. + File striping pattern across three OSTs for three different data files. The file + is sparse and missing chunk 6.
- The maximum file size is not limited by the size of a single target. Lustre can stripe files across multiple objects (up to 2000), and each object can be up to 16 TB in size with ldiskfs. This leads to a maximum file size of 31.25 PB. (Note that Lustre itself can support files up to 2^64 bytes depending on the backing storage used by OSTs.) + The maximum file size is not limited by the size of a single target. In a Lustre file + system, files can be striped across multiple objects (up to 2000), and each object can be + up to 16 TB in size with ldiskfs. This leads to a maximum file size of 31.25 PB. (Note that + a Lustre file system can support files up to 2^64 bytes depending on the backing storage + used by OSTs.) - Versions of Lustre prior to 2.2 had a maximum stripe count for a single file was limited to 160 OSTs. + Versions of the Lustre software prior to Release 2.2 limited the maximum stripe count + for a single file to 160 OSTs. - Although a single file can only be striped over 2000 objects, Lustre file systems can have thousands of OSTs. The I/O bandwidth to access a single file is the aggregated I/O bandwidth to the objects in a file, which can be as much as a bandwidth of up to 2000 servers. On systems with more than 2000 OSTs, clients can do I/O using multiple files to utilize the full file system bandwidth. - For more information about striping, see . + Although a single file can only be striped over 2000 objects, Lustre file systems can + have thousands of OSTs. The I/O bandwidth to access a single file is the aggregated I/O + bandwidth to the objects in a file, which can be as much as a bandwidth of up to 2000 + servers. On systems with more than 2000 OSTs, clients can do I/O using multiple files to + utilize the full file system bandwidth. + For more information about striping, see .
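      The RAID 0 pattern described above places stripe_size chunks on the
      file's objects in round-robin order, and the 31.25 PB figure is simply 2000 objects
      multiplied by the 16 TB ldiskfs object limit. The following Python sketch illustrates
      both calculations; the helper name and example values are invented for this
      example.
      <programlisting>
# Round-robin (RAID 0) mapping of a file offset to an object, as described
# above.  Illustrative only.

def chunk_location(offset, stripe_size, stripe_count):
    chunk_index   = offset // stripe_size
    object_index  = chunk_index % stripe_count   # which object (and hence OST) holds the chunk
    object_offset = (chunk_index // stripe_count) * stripe_size + offset % stripe_size
    return object_index, object_offset

MB = 2**20
# File A from the figure has a stripe_count of 3; assume the default 1 MB stripe_size.
print(chunk_location(5 * MB + 1234, stripe_size=MB, stripe_count=3))   # chunk 5 -> object 2

# Maximum ldiskfs-backed file size quoted above:
TB, PB = 2**40, 2**50
print(2000 * 16 * TB / PB)   # 31.25 (PB): 2000 objects x 16 TB each, binary prefixes
      </programlisting>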
diff --git a/index.xml b/index.xml index 66a1818..73e4b41 100644 --- a/index.xml +++ b/index.xml @@ -1,7 +1,7 @@ - Lustre 2.x Filesystem + Lustre File System 2.x Operations Manual 2010 -- 1.8.3.1