X-Git-Url: https://git.whamcloud.com/?a=blobdiff_plain;f=file_system_operations.txt;h=f4a62a9c41f0d5671daaa117d1fa61557915fb16;hb=a35c4d03a85eaab5762e11b54ed982bb9f74e5cb;hp=d70f6a5869a7f5c985b4b5596f80bca96e5cad4e;hpb=f7539c5d000a6cce00dd20959e071892011561dc;p=doc%2Fprotocol.git

diff --git a/file_system_operations.txt b/file_system_operations.txt
index d70f6a5..f4a62a9 100644
--- a/file_system_operations.txt
+++ b/file_system_operations.txt
@@ -4,281 +4,26 @@ File System Operations
 [[file-system-operations]]
 
 Lustre is a POSIX compliant file system that provides namespace and
-data storage services to clients. It implements all the usual file
-system functionality including creating, writing, reading, and
+data storage services to clients. It implements the normal file system
+functionality including creating, writing, reading, and
 removing files and directories. These file system operations are
-implemented via <>, which carry
+implemented via <>, which carry
 out communication and coordination with the servers. In this section
 we present the sequence of Lustre Operations, along with their
 effects, of a variety of file system operations.
 
-Mount
-~~~~~
+include::mount.txt[]
 
-Before any other interaction can take place between a client and a
-Lustre file system the client must 'mount' the file system, and Lustre
-services must already be in place (on the servers). A file system
-mount may be initiated at the Linux shell command line, which in turn
-invokes the 'mount()' system call. Kernel modules for Lustre
-exchange a series of messages with the servers, beginning with
-messages that retrieve details about the file system from the
-management server (MGS). This provides the client with the identities
-of all the metadata servers (MDSs) and targets (MDTs) as well as all
-the object storage servers (OSSs) and targets (OSTs). The client then
-sequences through each of the targets exchanging additional messages
-to initiate the connections with them. The following sections present
-the details of the Lustre operations that accomplish the file system
-mount.
+include::umount.txt[]
 
-Messages Between the Client and the MGS
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+include::create.txt[]
 
-In order to be able to mount the Lustre file system the client needs
-to know the identities of the various servers and targets so that it
-can initiate connections to them. The following sequence of operations
-accomplishes this.
+include::getattr.txt[]
 
-----
-MGS_CONNECT
-LDLM_ENQUEUE (concurrent read)
-LLOG_ORIGIN_HANDLE_CREATE (filename: lfs-sptlrpc)
-LDLM_ENQUEUE (concurrent read)
-LLOG_ORIGIN_HANDLE_CREATE (filename: lfs-client)
-LLOG_ORIGIN_HANDLE_READ_HEADER
-LLOG_ORIGIN_HANDLE_NEXT_BLOCK
-LDLM_ENQUEUE (concurrent read)
-MGS_CONFIG_READ (name: lfs-cliir)
-LDLM_ENQUEUE (concurrent read)
-LLOG_ORIGIN_HANDLE_CREATE (filename: params)
-LLOG_ORIGIN_HANDLE_READ_HEADER
-----
-
-Prior to any other interaction between a client and a Lustre server
-(or between two servers) the client must establish a 'connection'. The
-connection establishes shared state between the two hosts. On the
-client this connection state information is called an 'import', and
-there is an import on the client for each target it connects to. On
-the server this connection state is referred to as an 'export', and
-again the server has an export for each client that has connected to
-it. There a separate export for each client for each target.
-
-The client begins by carrying out the MGS_CONNECT Lustre operation,
-which establishes the connection (creates the import and the export)
-between the client and the MGS. The connect message from the client
-includes a 'handle' to uniquely identify itself (subsequent messages
-to the LDLM will refer to that client-handle). The connection data
-from the client also proposes the set of <> appropriate to connecting to an MGS.
-
-.Flags for the client connection to an MGS
-[options="header"]
-|====
-| obd_connect_data->ocd_connect_flags
-| OBD_CONNECT_VERSION
-| OBD_CONNECT_AT
-| OBD_CONNECT_FULL20
-| OBD_CONNECT_IMP_RECOV
-| OBD_CONNECT_MNE_SWAB
-| OBD_CONNECT_PINGLESS
-|====
-
-The MGS's reply to the connection request will include the handle that
-the server and client will both use to identify this connection in
-subsequent messages. This is the 'connection-handle' (as opposed to
-the client-handle mentioned a moment ago). The MGS also replies with
-the same set of connection flags.
-
-Once the connection is established the client gets configuration
-information for the file system from the MGS in four stages. First,
-the two exchange messages establishing the file system wide security
-policy that will be followed in all subsequent communications. Second,
-the client gets a bitmap instructing it as to which among the
-configuration records on the MGS it needs. Third, reading those
-records from the MGS gives the client the list of all the servers and
-targets it will need to communicate with. Fourth, the client reads
-cluster wide configuration data (the sort that might be set at the
-client command line with a 'lctl conf_param' command). The following
-paragraphs go into these four stages in more detail.
-
-Each time the client is going to read information from server storage
-it needs to first acquire the appropriate lock. Since the client is
-only reading data, the locks will be 'concurrent read' locks. The
-LDLM_ENQUEUE command communicates this lock request to the MGS
-target. The request identifies the target via the connection-handle
-from the connection reply, and identifies the client (itself) with the
-client-handle from its original connection request. The MGS's reply
-grants that lock, if appropriate. If other clients were making some
-sort of modification to the MGS data then the lock exchange might
-result in a delay while the client waits. More details about the
-behavior of the <> are in that
-section. For now, let's assume the locks are granted for each of these
-four operations. The first LLOG_ORIGIN_HANDLE_CREATE operation (the
-client is creating its own local handle not the target's file) asks
-for the security configuration file ("lfs-sptlrpc"). <>
-discusses security, and for now let's assume there is nothing to be
-done for security. That is, subsequent messages will all use an "empty
-security flavor" and no encryption will take place. In this case the
-MGS's reply ('pb_status' == -2, ENOENT) indicated that there was no
-such file, so nothing actually gets read.
-
-Another LDLM_ENQUEUE and LLOG_ORIGIN_HANDLE_CREATE pair of operations
-identifies the configuration client data ("lfs-client") file, and in
-this case there is data to read. The LLOG_ORIGIN_HANDLE_CREATE reply
-identifies the actual object of interest on the MGS via the
-'llog_logid' field in the 'struct llogd_body'. The MGS stores
-configuration data in log records. A header at the beginning of
-"lfs-client" uses a bitmap to identify the log records that are
-actually needed. The header includes both which records to retrieve
-and how large those records are. The LLOG_ORIGIN_HANDLE_READ_HEADER
-request uses the 'llog_logid' to identify desired log file, and the
-reply provides the bitmap and size information identifying the
-records that are actually needed. The
-LLOG_ORIGIN_HANDLE_NEXT_BLOCK operations retrieves the data thus
-identified.
-
-Knowing the specific configuration records it wants, the client then
-proceeds to retrieve them. This requires another LDLM_ENQUEUE
-operation, followed this time by the MGS_CONFIG_READ operation, which
-get the UUIDs for the servers and targets from the configuration log
-("lfs-cliir").
-
-A final LDLM_ENQUEUE, LLOG_ORIGIN_HANDLE_CREATE, and
-LLOG_ORIGIN_HANDLE_READ_HEADER then retrieve the cluster wide
-configuration data ("params").
-
-Messages Between the Client and the MDSs
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-After the foregoing interaction with the MGS the client has a list of
-the MDSs and MDTs in the file system. Next, the client invokes four
-Lustre operations with each MDT on the list.
-
-----
-MDS_CONNECT
-MDS_STATFS
-MDS_GETSTATUS
-MDS_GETATTR
-----
-
-The MDS_CONNECT operation establishes a connection between the client
-and a specific target (MDT) on an MDS. Thus, if an MDS has multiple
-targets, there is a separate MDS_CONNECT operation for each. This
-creates an import for the target on the client and an export for the
-client and target on the MDS. As with the connect operation for the
-MGS, the connect message from the client includes a UUID to uniquely
-identify this connection, and subsequent messages to the lock manager
-on the server will refer to that UUID. The connection data from the
-client also proposes the set of <>
-appropriate to connecting to an MDS. The following are the flags
-always included.
-
-.Always included flags for the client connection to an MDS
-[options="header"]
-|====
-| obd_connect_data->ocd_connect_flags
-| OBD_CONNECT_RDONLY
-| OBD_CONNECT_VERSION
-| OBD_CONNECT_ACL
-| OBD_CONNECT_XATTR
-| OBD_CONNECT_IBITS
-| OBD_CONNECT_NODEVOH
-| OBD_CONNECT_ATTRFID
-| OBD_CONNECT_CANCELSET
-| OBD_CONNECT_AT
-| OBD_CONNECT_RMT_CLIENT
-| OBD_CONNECT_RMT_CLIENT_FORCE
-| OBD_CONNECT_BRW_SIZE
-| OBD_CONNECT_MDS_CAPA
-| OBD_CONNECT_OSS_CAPA
-| OBD_CONNECT_MDS_MDS
-| OBD_CONNECT_FID
-| LRU_RESIZE_CONNECT_FLAG
-| OBD_CONNECT_VBR
-| OBD_CONNECT_LOV_V3
-| OBD_CONNECT_SOM
-| OBD_CONNECT_FULL20
-| OBD_CONNECT_64BITHASH
-| OBD_CONNECT_JOBSTATS
-| OBD_CONNECT_EINPROGRESS
-| OBD_CONNECT_LIGHTWEIGHT
-| OBD_CONNECT_UMASK
-| OBD_CONNECT_LVB_TYPE
-| OBD_CONNECT_LAYOUTLOCK
-| OBD_CONNECT_PINGLESS
-| OBD_CONNECT_MAX_EASIZE
-| OBD_CONNECT_FLOCK_DEAD
-| OBD_CONNECT_DISP_STRIPE
-| OBD_CONNECT_LFSCK
-| OBD_CONNECT_OPEN_BY_FID
-| OBD_CONNECT_DIR_STRIPE
-|====
-
-.Optional flags for the client connection to an MDS
-[options="header"]
-|====
-| obd_connect_data->ocd_connect_flags
-| OBD_CONNECT_SOM
-| OBD_CONNECT_LRU_RESIZE
-| OBD_CONNECT_ACL
-| OBD_CONNECT_UMASK
-| OBD_CONNECT_RDONLY
-| OBD_CONNECT_XATTR
-| OBD_CONNECT_XATTR
-| OBD_CONNECT_RMT_CLIENT_FORCE
-|====
-
-The MDS replies to the connect message with a subset of the flags
-proposed by the client, and the client notes those values in its
-import. The MDS's reply to the connection request will include a UUID
-that the server and client will both use to identify this connection
-in subsequent messages.
-
-The client next uses an MDS_STATFS operation to request 'statfs'
-information from the target, and that data is returned in the reply
-message. The actual fields closely resemble the results of a 'statfs'
-system call. See the 'obd_statfs' structure in the <>.
-
-The client uses the MDS_GETSTATUS operation to request information
-about the mount point of the file system. fixme: Does MDS_GETSTATUS
-only ask about the root (so it would seem)? The server reply contains
-the 'fid' of the root directory of the file system being mounted. If
-there is a security policy the capabilities of that security policy
-are included in the reply.
-
-The client then uses the MDS_GETATTR operation to get get further
-information about the root directory of the file system. The request
-message includes the above fid. It will also include the security
-capability (if appropriate). The reply also holds the same fid, and in
-this case the 'mdt_body' has several additional fields filled
-in. These include the mtime, atime, ctime, mode, uid, and gid. It also
-includes the size of the extended attributes and the size of the ACL
-information. The reply message also includes the extended attributes
-and the ACL. From the extended attributes the client can find out
-about striping information for the root, if any.
-
-Messages Between the Client and the OSSs
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Additional CONNECT messages flow between the client and each OST
-enumerated by the MGS.
-
-----
-OST_CONNECT
-----
-
-Unmount
-~~~~~~~
-
-----
-OST_DISCONNECT
-MDS_DISCONNECT
-MGS_DISCONNECT
-----
+include::setattr.txt[]
 
-Create
-~~~~~~
+include::statfs.txt[]
 
-Further discussion of the 'creat()' system call.
+include::getxattr.txt[]
 
-include::setattr.txt[]
+include::setxattr.txt[]
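
The removed MDS text in this diff says the MDS "replies to the connect message with a subset of the flags proposed by the client, and the client notes those values in its import." A minimal Python sketch of that negotiation follows, assuming the reply can be modeled as the bitwise AND of the client's proposed mask and the server's supported mask. The bit values given to the OBD_CONNECT_* names are hypothetical placeholders (the real values are defined in the Lustre sources), and `Import`, `server_reply_flags()`, `connect()`, and the target name `MDT0000` are illustrative names only, not Lustre interfaces.

```python
from enum import IntFlag


class ConnectFlag(IntFlag):
    """A few OBD_CONNECT_* names with HYPOTHETICAL bit values.

    The real values are defined in the Lustre sources; these are
    placeholders chosen only for this illustration.
    """
    RDONLY = 1 << 0
    VERSION = 1 << 1
    ACL = 1 << 2
    XATTR = 1 << 3
    AT = 1 << 4
    FULL20 = 1 << 5
    PINGLESS = 1 << 6


class Import:
    """Hypothetical model of the client-side state kept for one target."""

    def __init__(self, target: str) -> None:
        self.target = target
        self.connect_flags = ConnectFlag(0)  # nothing negotiated yet


def server_reply_flags(proposed: ConnectFlag, supported: ConnectFlag) -> ConnectFlag:
    """Model the server answering with the proposed flags it also supports."""
    return proposed & supported


def connect(imp: Import, proposed: ConnectFlag, supported: ConnectFlag) -> ConnectFlag:
    """Propose flags, take the server's subset, and record it in the import."""
    granted = server_reply_flags(proposed, supported)
    imp.connect_flags = granted
    return granted


if __name__ == "__main__":
    mdt = Import("MDT0000")  # hypothetical target name
    granted = connect(
        mdt,
        proposed=ConnectFlag.VERSION | ConnectFlag.ACL | ConnectFlag.XATTR,
        supported=ConnectFlag.VERSION | ConnectFlag.XATTR | ConnectFlag.AT,
    )
    # ACL was proposed but not supported, so it is dropped from the reply.
    print(granted == ConnectFlag.VERSION | ConnectFlag.XATTR)
```

Because the reply is modeled as an intersection, a server that supports every proposed flag returns the set unchanged, which is consistent with the removed MGS text saying the MGS "replies with the same set of connection flags."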