Data Structures and Defines
~~~~~~~~~~~~~~~~~~~~~~~~~~~
[[data-structs]]

The following data types are used in the Lustre protocol description.

.Basic Data Types
[options="header"]
|=====
| data type | description
| __u8      | an 8-bit unsigned integer
| __u16     | a 16-bit unsigned integer
| __u32     | a 32-bit unsigned integer
| __u64     | a 64-bit unsigned integer
| __s64     | a 64-bit signed integer
| obd_time  | an __s64
|=====

The following topics introduce the various kinds of data that are
represented and manipulated in Lustre messages and representations of
the shared state on clients and servers.

Grant
^^^^^
[[grant]]

A grant value is part of a client's state for a given target. It
provides an upper bound on the amount of dirty cache data, destined
for the target, that the client will allow. The value is established
by agreement between the server and the client and represents a
guarantee by the server that the target storage has enough free space
for at least the amount of granted dirty data. The client can ask for
additional grant with each write RPC, which the server may provide
depending on how much available (ungranted and unallocated) space the
target has.

LOV Index
^^^^^^^^^
[[lov-index]]

Each target is assigned an LOV index (by the 'mkfs.lustre' command
line) as the target is added to the file system. This value is stored
by the target locally as well as on the MGS in order to serve as a
unique identifier in the file system.

Transaction Number
^^^^^^^^^^^^^^^^^^
[[transno]]

For each target there is a sequence of values (a strictly increasing
series of numbers) where each operation that can modify the file
system is assigned the next number in the series. This is the
transaction number, and it imposes a strict serial ordering on all
operations that modify the file system. For requests that modify the
file system, the server assigns the next value in the sequence and
informs the client of the value in the 'pb_transno' field of the
'ptlrpc_body' of its reply to the client's request.
For replies to requests that do not modify the file system the
'pb_transno' field in the 'ptlrpc_body' is just set to 0.

Extended Attributes
^^^^^^^^^^^^^^^^^^^

I have not yet figured out how so-called 'eadata' buffers are
handled. I am told that this is not just for extended attributes, but
is a generic structure. Also, see <>.

MGS Configuration Data
^^^^^^^^^^^^^^^^^^^^^^

----
#define MTI_NAME_MAXLEN 64
struct mgs_config_body {
        char     mcb_name[MTI_NAME_MAXLEN]; /* logname */
        __u64    mcb_offset;    /* next index of config log to request */
        __u16    mcb_type;      /* type of log: CONFIG_T_[CONFIG|RECOVER] */
        __u8     mcb_reserved;
        __u8     mcb_bits;      /* bits unit size of config log */
        __u32    mcb_units;     /* # of units for bulk transfer */
};
----

The 'mgs_config_body' structure tells the MGS which Lustre file system
the client is requesting configuration information for. 'mcb_name'
contains the filesystem name (fsname). 'mcb_offset' contains the next
record number in the configuration llog to process (see <> for
details), not the byte offset or bulk transfer units. 'mcb_bits' is
the log2 of the minimum bulk transfer unit size, which is typically
4096 or 8192 bytes, while 'mcb_units' is the maximum number of
2^mcb_bits sized units that can be transferred in a single request.

----
struct mgs_config_res {
        __u64    mcr_offset;    /* index of last config log */
        __u64    mcr_size;      /* size of the log */
};
----

The 'mgs_config_res' structure describes the configuration llog data
returned in reply to the request made with 'mgs_config_body'.
'mcr_offset' contains the last configuration record number returned by
this reply. 'mcr_size' contains the maximum record index in the entire
configuration llog. When 'mcr_offset' equals 'mcr_size' there are no
more records to process in the log.

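The arithmetic implied by 'mcb_bits', 'mcb_units', 'mcr_offset' and
'mcr_size' can be sketched in C as follows. This is only an
illustration under stated assumptions: the fixed-width types from
'stdint.h' stand in for the '__u8' to '__u64' types, and the helpers
'mcb_max_bulk_bytes' and 'mcr_log_done' are hypothetical names that do
not come from the Lustre sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MTI_NAME_MAXLEN 64

/* Same layout as above; uintN_t stands in for __uN (assumption). */
struct mgs_config_body {
        char     mcb_name[MTI_NAME_MAXLEN];
        uint64_t mcb_offset;
        uint16_t mcb_type;
        uint8_t  mcb_reserved;
        uint8_t  mcb_bits;
        uint32_t mcb_units;
};

struct mgs_config_res {
        uint64_t mcr_offset;
        uint64_t mcr_size;
};

/*
 * Hypothetical helper: the largest bulk transfer, in bytes, that a
 * single request permits, i.e. mcb_units units of 2^mcb_bits bytes.
 */
static uint64_t mcb_max_bulk_bytes(const struct mgs_config_body *body)
{
        /* A 32-bit count shifted by fewer than 32 bits fits in 64 bits. */
        assert(body->mcb_bits < 32);
        return (uint64_t)body->mcb_units << body->mcb_bits;
}

/*
 * Hypothetical helper: true when the reply reports that the last
 * returned record is also the last record in the configuration llog.
 */
static bool mcr_log_done(const struct mgs_config_res *res)
{
        return res->mcr_offset == res->mcr_size;
}
```

For instance, with 'mcb_bits' set to 12 (4096-byte units) and
'mcb_units' set to 4, 'mcb_max_bulk_bytes' yields 16384 bytes. A
client following this sketch would keep issuing requests, advancing
'mcb_offset', until 'mcr_log_done' returns true.
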
include::lustre_handle.txt[]

Lustre Message Header
^^^^^^^^^^^^^^^^^^^^^
[[struct-lustre-msg]]

Every message has an initial header that informs the receiver about
the number of buffers and their size for the rest of the message to
follow, along with other important information about the request or
reply message.

----
#define LUSTRE_MSG_MAGIC_V2 0x0BD00BD3
#define MSGHDR_AT_SUPPORT 0x1
struct lustre_msg_v2 {
        __u32 lm_bufcount;
        __u32 lm_secflvr;
        __u32 lm_magic;
        __u32 lm_repsize;
        __u32 lm_cksum;
        __u32 lm_flags;
        __u32 lm_padding_2;
        __u32 lm_padding_3;
        __u32 lm_buflens[0];
};
#define lustre_msg lustre_msg_v2
----

The 'lm_bufcount' field holds the number of buffers that will follow
the header. The header and sequence of buffers constitute one
message. Each of the buffers is a sequence of bytes whose contents
correspond to one of the structures described in this section. Each
message will always have at least one buffer, and no message can have
more than thirty-one buffers.

The 'lm_secflvr' field gives an indication of whether any sort of
cryptographic encoding of the subsequent buffers will be in force. The
value is zero if there is no "crypto" and gives a code identifying the
"flavor" of crypto if it is employed. Further, if crypto is employed
there will only be one buffer following (i.e. 'lm_bufcount' = 1), and
that buffer holds an encoding of what would otherwise have been the
sequence of buffers normally following the header. Cryptography will
be discussed in a separate chapter.

The 'lm_magic' field is a "magic" value (LUSTRE_MSG_MAGIC_V2 =
0x0BD00BD3, 'OBD' for 'object based device') that is checked in order
to positively identify that the message is intended for the use to
which it is being put. That is, we are indeed dealing with a Lustre
message, and not, for example, corrupted memory or a bad pointer.

The 'lm_repsize' field in a request indicates the maximum available
space that has been reserved for any reply to the request.
A reply that attempts to use more than the reserved space will be
discarded.

The 'lm_cksum' field contains a checksum of the 'ptlrpc_body' buffer
to allow the receiver to verify that the message is intact. This is
used to verify that an 'early reply' has not been overwritten by the
actual reply message. The 'MSGHDR_CKSUM_INCOMPAT18' flag may be set in
requests since Lustre 1.8; if the server understands the flag it will
send early reply messages with the appropriate 'lm_cksum'. The flag is
mandatory in Lustre 2.8 and later.

The 'lm_flags' field contains flags that affect the low-level RPC
protocol. The 'MSGHDR_AT_SUPPORT' (0x1) bit indicates that the sender
understands adaptive timeouts and can receive 'early reply' messages
to extend its waiting period rather than timing out. This flag was
introduced in Lustre 1.6. The 'MSGHDR_CKSUM_INCOMPAT18' (0x2) bit
indicates that 'lm_cksum' is computed on the full 'ptlrpc_body'
message buffer rather than on the original 'ptlrpc_body_v2' structure
size (88 bytes). It was introduced in Lustre 1.8 and is mandatory for
all requests in Lustre 2.8 and later.

The 'lm_padding*' fields are reserved for future use.

The array of 'lm_buflens' values has 'lm_bufcount' entries. Each
entry corresponds to, and gives the length in bytes of, one of the
buffers that will follow. The entire header, and each of the buffers,
is required to be a multiple of eight bytes long to ensure the buffers
are properly aligned to hold __u64 values. Thus there may be an extra
four bytes of padding after the 'lm_buflens' array if that array has
an odd number of entries.

include::ptlrpc_body.txt[]

Object Based Disk UUID
^^^^^^^^^^^^^^^^^^^^^^
[[struct-obd-uuid]]

----
#define UUID_MAX 40
struct obd_uuid {
        char uuid[UUID_MAX];
};
----

The 'obd_uuid' structure contains an ASCII-formatted string that
identifies the entity uniquely within the filesystem. Clients use a
randomly generated RFC-4122 hexadecimal UUID of the form
''de305d54-75b4-431b-adb2-eb6b9e546014''.
Servers may use a string-based identifier of the form
''fsname-TGTindx_UUID''.

File IDentifier (FID)
^^^^^^^^^^^^^^^^^^^^^

See <>.

OST ID
^^^^^^
[[struct-ost-id]]

The 'ost_id' identifies a single object on a particular OST.

----
struct ost_id {
        union {
                struct ostid {
                        __u64 oi_id;
                        __u64 oi_seq;
                } oi;
                struct lu_fid oi_fid;
        };
};
----

The 'ost_id' structure contains an identifier for a single OST
object. The 'oi' structure holds the OST object identifier as used
with Lustre 1.8 and earlier, where the 'oi_seq' field is typically
zero, and the 'oi_id' field is an integer identifying an object on a
particular OST (which is identified separately). Since Lustre 2.5 it
is possible for OST objects to also be identified with a unique FID
that identifies both the OST on which it resides as well as the object
identifier itself.
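
The two views held by the 'ost_id' union can be illustrated with a
short C sketch. It rests on assumptions that are not stated in this
section: the 'lu_fid' layout ('f_seq', 'f_oid', 'f_ver') is taken from
the Lustre headers rather than from the text above, the 'stdint.h'
types stand in for '__u32' and '__u64', and 'ostid_make_legacy' is a
hypothetical helper name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of struct lu_fid, as found in the Lustre headers. */
struct lu_fid {
        uint64_t f_seq;
        uint32_t f_oid;
        uint32_t f_ver;
};

struct ost_id {
        union {
                struct ostid {
                        uint64_t oi_id;
                        uint64_t oi_seq;
                } oi;
                struct lu_fid oi_fid;
        };
};

/* Both views of the union occupy the same 16 bytes. */
static_assert(sizeof(struct ost_id) == 16, "ost_id should be 16 bytes");

/*
 * Hypothetical helper: build an identifier in the Lustre 1.8 style,
 * with the object number in oi_id and oi_seq left at zero.
 */
static struct ost_id ostid_make_legacy(uint64_t objid)
{
        struct ost_id id;

        memset(&id, 0, sizeof(id));
        id.oi.oi_id = objid;
        id.oi.oi_seq = 0;
        return id;
}
```

Because the two members share storage, the bytes written through
'oi.oi_id' are the same bytes read back through 'oi_fid.f_seq'. The
structure itself does not record which view is in use, so in this
sketch the caller has to know from context whether a value is an
'ostid' pair or a FID.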