From 083ac0ef937906fdf8b908b9f3d2e5cddc1919b3 Mon Sep 17 00:00:00 2001 From: "Andrew C. Uselton" Date: Fri, 27 Feb 2015 17:12:31 -0800 Subject: [PATCH] LUDOC-270 protocol: Add a 'basement' dir with support files The basement has been added as a place to put content that may be interesting to the community, but is nevertheless not directly a part of the documentation for this project. It has, for instance, a catalog of structures and other definitions that relate to how Lustre messages are formed. Those files (see the README) make a direct reference to the Lustre sources, and were the raw material for the content that will be added shortly. Signed-off-by: Andrew C. Uselton Change-Id: I3f264a0f7cd0a5c50511b29af92ee9103b348d80 Reviewed-on: http://review.whamcloud.com/13919 Tested-by: Jenkins Reviewed-by: Andreas Dilger --- basement/README | 31 ++ basement/message_formats.txt | 567 +++++++++++++++++++ basement/old_outline.txt | 199 +++++++ basement/ptlrpc.txt | 167 ++++++ basement/request_message_pairs.txt | 511 ++++++++++++++++++ basement/struct_defs.txt | 797 +++++++++++++++++++++++++++ basement/structures.txt | 104 ++++ basement/structures_list.txt | 1049 ++++++++++++++++++++++++++++++++++++ 8 files changed, 3425 insertions(+) create mode 100644 basement/README create mode 100644 basement/message_formats.txt create mode 100644 basement/old_outline.txt create mode 100644 basement/ptlrpc.txt create mode 100644 basement/request_message_pairs.txt create mode 100644 basement/struct_defs.txt create mode 100644 basement/structures.txt create mode 100644 basement/structures_list.txt diff --git a/basement/README b/basement/README new file mode 100644 index 0000000..3a7f29d --- /dev/null +++ b/basement/README @@ -0,0 +1,31 @@ +ptlrpc.txt is a beginning effort to document the precise structure of +PtlRPC messages including all fields and their meanings. For now it +has some general discussion and details about the message header and +the ptlrpc_body_v3 structures. 
It will eventually include all the +structures. + +request_message_pairs.txt lists the 94 named message pairs and the +names of the message format for each of the two messages in the +pair. The first in the pair is a request to be sent in a request-reply +communication model. The second is the reply appropriate to the +request. + +message_formats.txt lists the 95 named message formats along with the +list of structures that constitute each. There is also an initial +discussion of what the message formats are and how they relate to +PtlRPC request message pairs. + +structures.txt lists the 69 symbols (RMF_*) used to denote the +sequence of structures in the definition of the message formats. The +file just associates the symbolic name used in the message format +definition with the actual 'struct ' that defines its sequence of +fields. In some cases there is no struct associated with the RMF_ +symbol. In those cases the associated bytes, if any, will have to have +their semantics explained directly. + +struct_defs.txt details the fields in each of the structures named in +the structures.txt list along with the semantics for each field. The +special structure for the message header is included here. + + + diff --git a/basement/message_formats.txt b/basement/message_formats.txt new file mode 100644 index 0000000..9413b01 --- /dev/null +++ b/basement/message_formats.txt @@ -0,0 +1,567 @@ +Each message format appears in one or more of the request message +pairs (cf. request_message_pairs.txt). The symbol for the format +connects the format to its use in the request message pairs and is not +actually used in any kind of output or reports. That is, it is a +symbol in the source and not a string anywhere. + +For each format there is a sequence of symbols starting with "RMF_", +and each of those symbols, in turn, refers to a structure of the type +"struct req_msg_field". Do not be misled by the word "field" in the +name. 
It is a "struct" with a collection of fields, in C source code +terms. It is misleadingly called a "field" simply because it is a +subsection within the definition of the message format. + +Every format begins with the structure "RMF_PTLRPC_BODY". That +structure gives additional details that will assist the receiver with +decoding the PtlRPC message. This includes, especially, the pb_opc +field for the op code corresponding to the operation being +requested. Thus the message format specifies a sequence of structures +whose fields together define a sequence of bytes. Together with a +header message (and some optional padding), this sequence of bytes +constitutes one PtlRPC message. + +There are 95 message formats, and between them they employ a total of +66 different structures. See structures.txt for an alphabetic list +that associates each message format symbol (RMF_*) with the structure +definition (if any) that it corresponds to. Note that there is an +imperfect mapping from the symbol used for a structure in a message +format and the actual structure definition in the source code. A +series of definitions in layout.c makes the connection. The details of +all the structures along with the format for the message header are +presented in struct_defs.txt. + +Note especially that when a named message pair calls for a message +format that is "empty", that does not mean that no request is sent or +no reply expected. The "empty" format consists of an RMF_PTLRPC_BODY +(together with the header) and nothing else. 
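As a rough illustration of the description above (not part of the patch itself), a message format can be modeled as an ordered list of RMF_* symbols whose first entry is always RMF_PTLRPC_BODY. The sketch below is a minimal one under stated assumptions: the three formats are copied from the list in this file, while the dictionary layout and the helper name `check_format` are hypothetical and do not come from the Lustre sources.

```python
# Illustrative sketch only. The three entries are copied from the list in
# message_formats.txt; the dict layout and helper name are assumptions.
MESSAGE_FORMATS = {
    "empty": ["RMF_PTLRPC_BODY"],
    "ldlm_enqueue_client": ["RMF_PTLRPC_BODY", "RMF_DLM_REQ"],
    "fld_query_client": ["RMF_PTLRPC_BODY", "RMF_FLD_OPC", "RMF_FLD_MDFLD"],
}


def check_format(name):
    """Return the field list for a format after checking its first entry."""
    fields = MESSAGE_FORMATS[name]
    # Every format, including "empty", begins with RMF_PTLRPC_BODY.
    if not fields or fields[0] != "RMF_PTLRPC_BODY":
        raise ValueError(name + " does not begin with RMF_PTLRPC_BODY")
    return fields


if __name__ == "__main__":
    for format_name in MESSAGE_FORMATS:
        print(format_name, check_format(format_name))
```

Note that the "empty" format still carries one structure, which is why a pair that names "empty" still results in a message being sent.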
+ +empty + RMF_PTLRPC_BODY + +fld_query_client + RMF_PTLRPC_BODY + RMF_FLD_OPC + RMF_FLD_MDFLD + +fld_query_server + RMF_PTLRPC_BODY + RMF_FLD_MDFLD + +fld_read_client + RMF_PTLRPC_BODY + RMF_FLD_MDFLD + +fld_read_server + RMF_PTLRPC_BODY + RMF_GENERIC_DATA + +ldlm_cp_callback_client + RMF_PTLRPC_BODY + RMF_DLM_REQ + RMF_DLM_LVB + +ldlm_enqueue_client + RMF_PTLRPC_BODY + RMF_DLM_REQ + +ldlm_enqueue_lvb_server + RMF_PTLRPC_BODY + RMF_DLM_REP + RMF_DLM_LVB + +ldlm_enqueue_server + RMF_PTLRPC_BODY + RMF_DLM_REP + +ldlm_gl_callback_desc_client + RMF_PTLRPC_BODY + RMF_DLM_REQ + RMF_DLM_GL_DESC + +ldlm_gl_callback_server + RMF_PTLRPC_BODY + RMF_DLM_LVB + +ldlm_intent_basic_client + RMF_PTLRPC_BODY + RMF_DLM_REQ + RMF_LDLM_INTENT + +ldlm_intent_client + RMF_PTLRPC_BODY + RMF_DLM_REQ + RMF_LDLM_INTENT + RMF_REC_REINT + +ldlm_intent_create_client + RMF_PTLRPC_BODY + RMF_DLM_REQ + RMF_LDLM_INTENT + RMF_REC_REINT + RMF_CAPA1 + RMF_NAME + RMF_EADATA + +ldlm_intent_getattr_client + RMF_PTLRPC_BODY + RMF_DLM_REQ + RMF_LDLM_INTENT + RMF_MDT_BODY + RMF_CAPA1 + RMF_NAME + +ldlm_intent_getattr_server + RMF_PTLRPC_BODY + RMF_DLM_REP + RMF_MDT_BODY + RMF_MDT_MD + RMF_ACL + RMF_CAPA1 + +ldlm_intent_getxattr_client + RMF_PTLRPC_BODY + RMF_DLM_REQ + RMF_LDLM_INTENT + RMF_MDT_BODY + RMF_CAPA1 + +ldlm_intent_getxattr_server + RMF_PTLRPC_BODY + RMF_DLM_REP + RMF_MDT_BODY + RMF_MDT_MD + RMF_ACL + RMF_EADATA + RMF_EAVALS + RMF_EAVALS_LENS + +ldlm_intent_layout_client + RMF_PTLRPC_BODY + RMF_DLM_REQ + RMF_LDLM_INTENT + RMF_LAYOUT_INTENT + RMF_EADATA + +ldlm_intent_open_client + RMF_PTLRPC_BODY + RMF_DLM_REQ + RMF_LDLM_INTENT + RMF_REC_REINT + RMF_CAPA1 + RMF_CAPA2 + RMF_NAME + RMF_EADATA + +ldlm_intent_open_server + RMF_PTLRPC_BODY + RMF_DLM_REP + RMF_MDT_BODY + RMF_MDT_MD + RMF_ACL + RMF_CAPA1 + RMF_CAPA2 + +ldlm_intent_quota_client + RMF_PTLRPC_BODY + RMF_DLM_REQ + RMF_LDLM_INTENT + RMF_QUOTA_BODY + +ldlm_intent_quota_server + RMF_PTLRPC_BODY + RMF_DLM_REP + RMF_DLM_LVB + RMF_QUOTA_BODY + 
+ldlm_intent_server + RMF_PTLRPC_BODY + RMF_DLM_REP + RMF_MDT_BODY + RMF_MDT_MD + RMF_ACL + +ldlm_intent_unlink_client + RMF_PTLRPC_BODY + RMF_DLM_REQ + RMF_LDLM_INTENT + RMF_REC_REINT + RMF_CAPA1 + RMF_NAME + +llog_log_hdr_only + RMF_PTLRPC_BODY + RMF_LLOG_LOG_HDR + +llog_origin_handle_create_client + RMF_PTLRPC_BODY + RMF_LLOGD_BODY + RMF_NAME + +llog_origin_handle_next_block_server + RMF_PTLRPC_BODY + RMF_LLOGD_BODY + RMF_EADATA + +llogd_body_only + RMF_PTLRPC_BODY + RMF_LLOGD_BODY + +llogd_conn_body_only + RMF_PTLRPC_BODY + RMF_LLOGD_CONN_BODY + +log_cancel_client + RMF_PTLRPC_BODY + RMF_LOGCOOKIES + +mds_getattr_name_client + RMF_PTLRPC_BODY + RMF_MDT_BODY + RMF_CAPA1 + RMF_NAME + +mds_getattr_server + RMF_PTLRPC_BODY + RMF_MDT_BODY + RMF_MDT_MD + RMF_ACL + RMF_CAPA1 + RMF_CAPA2 + +mds_getinfo_client + RMF_PTLRPC_BODY + RMF_GETINFO_KEY + RMF_GETINFO_VALLEN + +mds_getinfo_server + RMF_PTLRPC_BODY + RMF_GETINFO_VAL + +mds_getxattr_client + RMF_PTLRPC_BODY + RMF_MDT_BODY + RMF_CAPA1 + RMF_NAME + RMF_EADATA + +mds_getxattr_server + RMF_PTLRPC_BODY + RMF_MDT_BODY + RMF_EADATA + +mds_last_unlink_server + RMF_PTLRPC_BODY + RMF_MDT_BODY + RMF_MDT_MD + RMF_LOGCOOKIES + RMF_CAPA1 + RMF_CAPA2 + +mds_reint_client + RMF_PTLRPC_BODY + RMF_REC_REINT + +mds_reint_create_client + RMF_PTLRPC_BODY + RMF_REC_REINT + RMF_CAPA1 + RMF_NAME + +mds_reint_create_rmt_acl_client + RMF_PTLRPC_BODY + RMF_REC_REINT + RMF_CAPA1 + RMF_NAME + RMF_EADATA + RMF_DLM_REQ + +mds_reint_create_slave_client + RMF_PTLRPC_BODY + RMF_REC_REINT + RMF_CAPA1 + RMF_NAME + RMF_EADATA + RMF_DLM_REQ + +mds_reint_create_sym_client + RMF_PTLRPC_BODY + RMF_REC_REINT + RMF_CAPA1 + RMF_NAME + RMF_SYMTGT + RMF_DLM_REQ + +mds_reint_link_client + RMF_PTLRPC_BODY + RMF_REC_REINT + RMF_CAPA1 + RMF_CAPA2 + RMF_NAME + RMF_DLM_REQ + +mds_reint_open_client + RMF_PTLRPC_BODY + RMF_REC_REINT + RMF_CAPA1 + RMF_CAPA2 + RMF_NAME + RMF_EADATA + +mds_reint_open_server + RMF_PTLRPC_BODY + RMF_MDT_BODY + RMF_MDT_MD + RMF_ACL + 
RMF_CAPA1 + RMF_CAPA2 + +mds_reint_rename_client + RMF_PTLRPC_BODY + RMF_REC_REINT + RMF_CAPA1 + RMF_CAPA2 + RMF_NAME + RMF_SYMTGT + RMF_DLM_REQ + +mds_reint_setattr_client + RMF_PTLRPC_BODY + RMF_REC_REINT + RMF_CAPA1 + RMF_MDT_EPOCH + RMF_EADATA + RMF_LOGCOOKIES + RMF_DLM_REQ + +mds_reint_setxattr_client + RMF_PTLRPC_BODY + RMF_REC_REINT + RMF_CAPA1 + RMF_NAME + RMF_EADATA + RMF_DLM_REQ + +mds_reint_unlink_client + RMF_PTLRPC_BODY + RMF_REC_REINT + RMF_CAPA1 + RMF_NAME + RMF_DLM_REQ + +mds_setattr_server + RMF_PTLRPC_BODY + RMF_MDT_BODY + RMF_MDT_MD + RMF_ACL + RMF_CAPA1 + RMF_CAPA2 + +mds_update_client + RMF_PTLRPC_BODY + RMF_OUT_UPDATE + +mds_update_server + RMF_PTLRPC_BODY + RMF_OUT_UPDATE_REPLY + +mdt_body_capa + RMF_PTLRPC_BODY + RMF_MDT_BODY + RMF_CAPA1 + +mdt_body_only + RMF_PTLRPC_BODY + RMF_MDT_BODY + +mdt_close_client + RMF_PTLRPC_BODY + RMF_MDT_EPOCH + RMF_REC_REINT + RMF_CAPA1 + +mdt_hsm_action_server + RMF_PTLRPC_BODY + RMF_MDT_BODY + RMF_MDS_HSM_CURRENT_ACTION + +mdt_hsm_ct_register + RMF_PTLRPC_BODY + RMF_MDT_BODY + RMF_MDS_HSM_ARCHIVE + +mdt_hsm_ct_unregister + RMF_PTLRPC_BODY + RMF_MDT_BODY + +mdt_hsm_progress + RMF_PTLRPC_BODY + RMF_MDT_BODY + RMF_MDS_HSM_PROGRESS + +mdt_hsm_request + RMF_PTLRPC_BODY + RMF_MDT_BODY + RMF_MDS_HSM_REQUEST + RMF_MDS_HSM_USER_ITEM + RMF_GENERIC_DATA + +mdt_hsm_state_get_server + RMF_PTLRPC_BODY + RMF_MDT_BODY + RMF_HSM_USER_STATE + +mdt_hsm_state_set + RMF_PTLRPC_BODY + RMF_MDT_BODY + RMF_CAPA1 + RMF_HSM_STATE_SET + +mdt_release_close_client + RMF_PTLRPC_BODY + RMF_MDT_EPOCH + RMF_REC_REINT + RMF_CAPA1 + RMF_CLOSE_DATA + +mdt_swap_layouts + RMF_PTLRPC_BODY + RMF_MDT_BODY + RMF_SWAP_LAYOUTS + RMF_CAPA1 + RMF_CAPA2 + RMF_DLM_REQ + +mgs_config_read_client + RMF_PTLRPC_BODY + RMF_MGS_CONFIG_BODY + +mgs_config_read_server + RMF_PTLRPC_BODY + RMF_MGS_CONFIG_RES + +mgs_set_info + RMF_PTLRPC_BODY + RMF_MGS_SEND_PARAM + +mgs_target_info_only + RMF_PTLRPC_BODY + RMF_MGS_TARGET_INFO + +obd_connect_client + RMF_PTLRPC_BODY + 
RMF_TGTUUID + RMF_CLUUID + RMF_CONN + RMF_CONNECT_DATA + +obd_connect_server + RMF_PTLRPC_BODY + RMF_CONNECT_DATA + +obd_idx_read_client + RMF_PTLRPC_BODY + RMF_IDX_INFO + +obd_idx_read_server + RMF_PTLRPC_BODY + RMF_IDX_INFO + +obd_lfsck_reply + RMF_PTLRPC_BODY + RMF_LFSCK_REPLY + +obd_lfsck_request + RMF_PTLRPC_BODY + RMF_LFSCK_REQUEST + +obd_set_info_client + RMF_PTLRPC_BODY + RMF_SETINFO_KEY + RMF_SETINFO_VAL + +obd_statfs_server + RMF_PTLRPC_BODY + RMF_OBD_STATFS + +ost_body_capa + RMF_PTLRPC_BODY + RMF_OST_BODY + RMF_CAPA1 + +ost_body_only + RMF_PTLRPC_BODY + RMF_OST_BODY + +ost_brw_client + RMF_PTLRPC_BODY + RMF_OST_BODY + RMF_OBD_IOOBJ + RMF_NIOBUF_REMOTE + RMF_CAPA1 + +ost_brw_read_server + RMF_PTLRPC_BODY + RMF_OST_BODY + +ost_brw_write_server + RMF_PTLRPC_BODY + RMF_OST_BODY + RMF_RCS + +ost_destroy_client + RMF_PTLRPC_BODY + RMF_OST_BODY + RMF_DLM_REQ + RMF_CAPA1 + +ost_get_fiemap_client + RMF_PTLRPC_BODY + RMF_FIEMAP_KEY + RMF_FIEMAP_VAL + +ost_get_fiemap_server + RMF_PTLRPC_BODY + RMF_FIEMAP_VAL + +ost_get_info_generic_client + RMF_PTLRPC_BODY + RMF_GETINFO_KEY + +ost_get_info_generic_server + RMF_PTLRPC_BODY + RMF_GENERIC_DATA + +ost_get_last_fid_client + RMF_PTLRPC_BODY + RMF_GETINFO_KEY + RMF_FID + +ost_get_last_fid_server + RMF_PTLRPC_BODY + RMF_FID + +ost_get_last_id_server + RMF_PTLRPC_BODY + RMF_OBD_ID + +ost_grant_shrink_client + RMF_PTLRPC_BODY + RMF_SETINFO_KEY + RMF_OST_BODY + +quota_body_only + RMF_PTLRPC_BODY + RMF_QUOTA_BODY + +quotactl_only + RMF_PTLRPC_BODY + RMF_OBD_QUOTACTL + +seq_query_client + RMF_PTLRPC_BODY + RMF_SEQ_OPC + RMF_SEQ_RANGE + +seq_query_server + RMF_PTLRPC_BODY + RMF_SEQ_RANGE diff --git a/basement/old_outline.txt b/basement/old_outline.txt new file mode 100644 index 0000000..7a754b3 --- /dev/null +++ b/basement/old_outline.txt @@ -0,0 +1,199 @@ +Old Content +----------- + +[NOTE] +This initial list combines some actual message names or types with the +POSIX semantic operations they are being used to implement, as 
well as +a few other underlying mechanisms (cf. "grant"). A subsequent +refinement will separate the various items and relate them to one +another. + +Client-MDS RPCs for POSIX namespace operations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +'Content to be provided' + +=== mount === + +'Content to be provided' + +=== unmount === + +'Content to be provided' + +=== create === + +'Content to be provided' + +=== open === + +'Content to be provided' + +=== close === + +'Content to be provided' + +=== unlink === + +'Content to be provided' + +=== mkdir === + +image:mkdir1.png[mkdir] + +=== rmdir === + +'Content to be provided' + +=== rename === + +'Content to be provided' + +=== link === + +'Content to be provided' + +=== symlink === + +'Content to be provided' + +=== getattr === + +'Content to be provided' + +=== setattr === + +'Content to be provided' + +=== statfs === + +'Content to be provided' + +=== ... === + +'Content to be provided' + + +Client-MDS RPCs for internal state management +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +'Content to be provided' + +=== connect === + +'Content to be provided' + +=== disconnect === + +'Content to be provided' + +=== FLD === + +'Content to be provided' + +=== SEQ === + +'Content to be provided' + +=== PING === + +'Content to be provided' + +=== LDLM === + +'Content to be provided' + +=== ... === + +'Content to be provided' + +Client-OSS RPCs for IO Operations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +'Content to be provided' + +=== read === + +'Content to be provided' + +=== write === + +'Content to be provided' + +=== truncate === + +'Content to be provided' + +=== setattr === + +'Content to be provided' + +=== grant === + +'Content to be provided' + +=== ... 
=== + +'Content to be provided' + +MDS-OSS RPCs for internal state management +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +'Content to be provided' + +=== object precreation === + +'Content to be provided' + +=== orphan recovery === + +'Content to be provided' + +=== UID/GID change === + +'Content to be provided' + +=== unlink === + +'Content to be provided' + +=== ... === + +'Content to be provided' + +MDS-OSS RPCs for quota management +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +'Content to be provided' + + +MDS-OSS OUT RPCs for distributed updates +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +'Content to be provided' + +=== DNE1 remote directories === + +'Content to be provided' + +=== DNE2 striped directories === + +'Content to be provided' + +=== LFSCK2/3 verification and repair === + +'Content to be provided' + +Message Flows +------------- + + Each file operation (in Lustre) generates a set of messages in a + particular sequence. There is one sequence for any particular + concrete operation, but under varying circumstances the same file + operation may generate a different sequence. + +State Machines +-------------- + + For each File operation, the collection of possible sequences of + messages is governed by a state machine. diff --git a/basement/ptlrpc.txt b/basement/ptlrpc.txt new file mode 100644 index 0000000..0e0e3bf --- /dev/null +++ b/basement/ptlrpc.txt @@ -0,0 +1,167 @@ +Lustre runs across multiple hosts, coordinating the activities among +those hosts via the exchange of messages over a network. On each host, +Lustre is implemented via a collection of threads. This discussion +will abstract some of the thread-level details in order to describe +the activities on each host as a collection of processes. Each process +may be thought of as a state machine, or automaton, following a fixed +set of rules for how it consumes messages, changes state, and produces +other messages; that is, its behavior. 
Processes communicate with each +other on a host via shared memory and with processes on other hosts +via messages. The Lustre protocol is the collection of messages the +processes exchange along with the rules governing the behavior of +those processes. + +In order to understand the Lustre protocol it is helpful to begin with +a description of messages being exchanged. Lustre uses a particular +format for its messages called PtlRPC. A PtlRPC message is a sequence +of bytes in a particular order and with specific meaning associated +with bytes in the message. The message (sequence of bytes) is +delivered to a lower level communication mechanism called LNet in +order to be transported from one host to another. This document will +not discuss LNet beyond identifying it as a transport layer that +abstracts any underlying details of the actual networking hardware. + +The following discussion is intended to be self-contained, in that +additional external documents are not necessary in order for one to +understand (and indeed implement) the behaviors and messages +described. Nevertheless, for the interested there will be occasional +references directly into the Lustre code-base where one may see the +protocol as it is realized in one particular implementation, that +being Lustre-2.6.92-0 as pulled from the git repository for Lustre on +January 26th, 2015. The sole exception to the rule that this document +is self-contained is that the discussion will not be burdened by the +actual numerical values for hard-coded implementation details like +"magic" value numbers or flags and their fields. References to the +source code will be provided as needed for a prospective (otherwise) +black-box implementer to build a compatible implementation. This +document will confine itself to the symbolic values. + +The structure of a PtlRPC message +================================= + +A PtlRPC message is a sequence of bytes. 
It can vary in length and has +additional structure, but its simplest expression is just a byte +array. The bytes of a message can be divided into an initial "header" +and one or more "buffers" that follow the header. The header at the +beginning of a message can be further divided into a sequence of +(cf. lustre/include/lustre/lustre_idl.h: "struct lustre_msg_v2") eight +4-byte "fields" (32-bit unsigned integers) followed by a variable +length sequence of additional 4-byte entries organized as an +array. The fields, in order and using names abstracted from the +sources, are: + +header +------ +1) buffcount - The number of buffers that will follow the header. The + form and content of these buffers is discussed below. +2) secflvr - An indication of whether any sort of cryptographic + encoding of the subsequent buffers will be in force. The value is + zero if there is no "crypto" and gives a code identifying the + "flavor" of crypto if it is employed. Further, if crypto is + employed there will only be one buffer following (i.e. buffcount = + 1), and that buffer is an encoding of what would otherwise have + been the sequence of buffers normally following the header. This + document will defer all discussion of cryptography. An addendum is + planned that will address it separately. +3) magic - PtlRPC messages include a "magic" value + (ibid. "LUSTRE_MSG_MAGIC_V2") that is checked in order to + positively identify that the message is intended for the use to + which it is being put. That is, we are indeed dealing with a PtlRPC + message, and not, for example, corrupted memory or a bad pointer. +4) repsize - An indication from the sender of an action request of the + maximum available space that has been set aside for any reply to + the request. A reply that attempts to use more than that much + space will be discarded. Question: How does the receiver know, at + the time of receipt, what the repsize value was from the request + the reply is in reply to? 
+5) cksum - The checksum (CRC-32-bit) of the header, including any + padding (see below) but not the additional buffers. +6) flags - One of two values (ibid. "LUSTRE_MSG_MAGIC_V1" and + "LUSTRE_MSG_MAGIC_V2") indicating ===What?== I forget. +7) padding - This field and the next are two 4-byte fields used to + assure that the following array is aligned on a 16-byte boundary. +8) padding - The second 4-byte padding field. +9) bufflens[] - An array of 4-byte unsigned integers with 'buffcount' + entries. Each entry corresponds to, and gives the length of, one + of the buffers that will follow and that constitute the remainder + of the message. +10) padding - The first of the buffers following the header must be + aligned on a 16-byte boundary. Since the length of the 'bufflens' + array is in increments of four bytes we may need up to twelve + additional bytes of padding before the first buffer. + +The 'buffcount' field gives the number of buffers that follow. The +length of the i^{th} buffer is given by the field 'bufflens[i]', and +the buffers themselves follow immediately and in order. As mentioned +above, the 'secflvr' field will be zero unless some sort of +cryptographic encoding is employed, and the interpretation of +encrypted PtlRPC messages is left to another document. + +Each buffer has additional structure imposed on it, and the first +buffer always has the following format (ibid. "struct ptlrpc_body_v3") +with fields: +1) handle - A 64-bit value to uniquely determine shared state between + a sender and a receiver. When a communication is initiated, as in a + "connect" message (from a client to a server), the value will be + 0. A reply (from the server back to the client) to this message + will contain a value (a "cookie") to identify the shared + state information (the "export") for the client that is maintained + on the server. The client will then associate this cookie with the + shared state information (the "import") that it maintains about + the server. 
Subsequent messages between this client and this server + will refer to the same shared state by using this cookie as the + handle in this field. +2) type - One of the three message types (ibid.) + "PTL_RPC_MSG_REQUEST", "PTL_RPC_MSG_ERR", or + "PTL_RPC_MSG_REPLY". As one might expect, "request" and "reply" are + the two usual message types, one for initiating an exchange and + the other for completing it. The "err" message type is only for + responding to a PtlRPC message that failed to be interpreted as an + actual message. That is, "err" does not reflect any kind of an + error in processing a PtlRPC once it has been decoded into its + constituent components, but only if and when that decoding fails. +3) version - This field encodes (ibid.) the "PTLRPC_MSG_VERSION" value + in combination ('or'ed) with one of the Lustre version symbols: + LUSTRE_OBD_VERSION + LUSTRE_MDS_VERSION + LUSTRE_OST_VERSION + LUSTRE_DLM_VERSION + LUSTRE_LOG_VERSION + LUSTRE_MGS_VERSION + What exactly is the significance of these? +4) opc - Gives the actual operation that is the subject of this + PtlRPC. There is a long list of such "op codes". Documenting the + semantics of each of them is one of the core purposes of this + document. For reference (ibid.) they are detailed elsewhere. + + If you look at all the instances in the source code defined in + *_cmd_t enumerations you get the above list of 73 items. If you look + in the req_formats struct in layout.c you will see a list of 94 + items. They have 44 items in common. Let's figure out the + connection between the two, if any. + + There are 95 distinct patterns of PtlRPC structures (grep for + "static const struct req_msg_field *" in + lustre/ptlrpc/layout.c). There are 94 named dialogs where each + dialog consists of two of the foregoing PtlRPC structure + patterns. The pair of patterns is in the form of a call and + response pair, though there is also the option for having no + response or even for having neither a call nor a response. 
In those + cases the special PtlRPC structure pattern is referred to as + "empty". + +5) status - +6) last_xid - +7) last_seen - +8) last_committed - +9) transno - +10) flags - +11) op_flags - +12) conn_cnt - +13) timeout - +14) service_time - +15) limit - +16) slv - +17) pre_versions[PTLRPC_NUM_VERSIONS] - +18) padding[4] - +19) jobid[LUSTRE_JOBID_SIZE] - diff --git a/basement/request_message_pairs.txt b/basement/request_message_pairs.txt new file mode 100644 index 0000000..1c4bc9a --- /dev/null +++ b/basement/request_message_pairs.txt @@ -0,0 +1,511 @@ +Named Request Message Pairs +=========================== + +Each request message pair is identified in the source code by a symbol +that starts with "RQF_". Each has a name and two message formats. The +request message pair's name is a string. The two message formats in a +message pair are the symbols given for a request message, on the one +hand, and a reply message, on the other. + +Names +===== + +The name appears as a string in the first position (argument) of a +"struct req_format" declaration. The symbol for the message pair is +usually the same as the name string but with "RQF_" in front of +it. Thus the message pair named "MDS_CONNECT" appears in the code as +the symbol "RQF_MDS_CONNECT". The name itself is only used for +reporting purposes, for example in the traces decoded and reported +through the Lustre wireshark extension. There are a few occasions +(likely bugs) where the name and the symbol do not actually match in +the way described above. Those instances are: + +- Both RQF_LDLM_GL_DESC_CALLBACK and RQF_LDLM_GL_CALLBACK use the name + LDLM_GL_CALLBACK +- RQF_LOG_CANCEL is named OBD_LOG_CANCEL +- RQF_MDS_REINT_CREATE_SLAVE is named MDS_REINT_CREATE_EA +- Both RQF_MDS_CLOSE and RQF_MDS_RELEASE_CLOSE are named MDS_CLOSE +- Both RQF_OUT_UPDATE and RQF_OUT_UPDATE_OBJ are named OUT_UPDATE + +The RQF_* symbols are gathered in the: + + static struct req_format *req_formats[] = + +array in layout.c. 
Their declarations, with the pair's name and the +names of the two actual message formats, follow soon after in the same +file. + +Messages +======== + +The pair of symbols in a request message pair identify the format for +each corresponding PtlRPC message, with the first of the pair being a +request for action of some sort and the second being the reply +appropriate to the request. See message_formats.txt for details about +the structure of each message format. In many cases the host making +the request will be a Lustre client, but that is not universally the +case. For example, a lock callback might be initiated by the MDS and +the request sent from the MDS to a Lustre client. In other cases the +message request is between two Lustre servers. The message pairs +always list the (symbol for the) request first and then the +reply. Many of those pairs employ the words "client" and "server" in +the symbol's name. This is unfortunate. Such "client" messages are not +necessarily sent from Lustre clients and such "server" messages are +not necessarily replies sent from Lustre servers. A symbol with the +word "client" should always be thought of simply as the format of a +message initiating a request. Likewise, "server" symbols simply mean +the format for a message in reply to a request, whatever the actual +host that sends the reply. So the message pair should be understood +as a message initiating a request and the message that is anticipated +in reply to that request. + +Op Codes +======== + +The name in the RQF_BLA definition will sometimes, but not always, +correspond to the source code symbolic representation (in +lustre/include/lustre/lustre_idl.h) of an "op code" as encoded in the +"__u32 pb_opc" field of the "struct ptlrpc_body_v3". The op code +symbols are defined in a collection of *_cmd_t enumerations in +lustre_idl.h. + +Historically, PtlRPC message construction was somewhat ad hoc, and +carried out at the point where the message was needed. 
A newer and +more systematic construction mechanism resulted in the list of "struct +req_format RQF_*" message pair declarations. There are still some 21 or +so op codes defined in lustre_idl.h that have no corresponding "struct +req_format RQF_BLA" in layout.c. One example is the "MGS_CONNECT" op +code. A code audit will be necessary to see if any of those are +obsolete (and/or unused) op codes or if they should be updated to the +new form. There are also about 50 "struct req_format RQF_BLA" message +pairs defined in layout.c for which there is no corresponding "BLA" op +code. A code audit will be needed to determine if any or all of them +should be introduced to lustre_idl.h as new op codes or if they (some +of them) are obsolete or unused. In some cases (e.g. RQF_MDS_REINT_*) +one op code (e.g. MDS_OPEN) is overloaded for all the message pairs of +that type. + +The following are the 94 named message pairs from layout.c. Each one +defines a pair of message formats, which are identified by name. The +message formats are reused and there are 95 such message formats for +the 94 named message pairs. The message formats themselves are +detailed in message_formats.txt. As mentioned above, the first of the +pair should be thought of as the actual "request message" being made, +and the second of the pair is the "reply message". 
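As a rough illustration of the pairing described above (not part of the patch itself), each named pair can be modeled as a (request format, reply format) tuple, with formats shared across pairs. This is a minimal sketch under stated assumptions: the four entries are copied from the table in this file, while the dictionary layout and the helper name `formats_used` are hypothetical and do not come from layout.c.

```python
# Illustrative sketch only. The four entries are copied from the table in
# request_message_pairs.txt; the dict layout and helper name are assumptions.
REQUEST_MESSAGE_PAIRS = {
    # symbol: (request format, reply format)
    "RQF_MDS_DISCONNECT": ("empty", "empty"),
    "RQF_MDS_STATFS": ("empty", "obd_statfs_server"),
    "RQF_MDS_GETATTR": ("mdt_body_capa", "mds_getattr_server"),
    "RQF_MDS_GETATTR_NAME": ("mds_getattr_name_client", "mds_getattr_server"),
}


def formats_used(pairs):
    """Return the sorted set of message formats referenced by the pairs."""
    used = set()
    for request_format, reply_format in pairs.values():
        used.add(request_format)
        used.add(reply_format)
    return sorted(used)


if __name__ == "__main__":
    # Four pairs reference only five distinct formats because of reuse.
    print(formats_used(REQUEST_MESSAGE_PAIRS))
```

In this small sample "empty" and "mds_getattr_server" each appear in more than one pair, which is the kind of reuse the text describes.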
+ +struct req_format name op code +----------------- ----------------- ----------------- +RQF_CONNECT CONNECT + obd_connect_client + obd_connect_server + +RQF_FLD_QUERY FLD_QUERY + fld_query_client + fld_query_server + +RQF_FLD_READ FLD_READ + fld_read_client + fld_read_server + +RQF_LDLM_BL_CALLBACK LDLM_BL_CALLBACK LDLM_BL_CALLBACK + ldlm_enqueue_client + empty + +RQF_LDLM_CALLBACK LDLM_CALLBACK + ldlm_enqueue_client + empty + +RQF_LDLM_CANCEL LDLM_CANCEL LDLM_CANCEL + ldlm_enqueue_client + empty + +RQF_LDLM_CONVERT LDLM_CONVERT LDLM_CONVERT + ldlm_enqueue_client + ldlm_enqueue_server + +RQF_LDLM_CP_CALLBACK LDLM_CP_CALLBACK LDLM_CP_CALLBACK + ldlm_cp_callback_client + empty + +RQF_LDLM_ENQUEUE LDLM_ENQUEUE LDLM_ENQUEUE + ldlm_enqueue_client + ldlm_enqueue_lvb_server + +RQF_LDLM_ENQUEUE_LVB LDLM_ENQUEUE_LVB + ldlm_enqueue_client + ldlm_enqueue_lvb_server + +RQF_LDLM_GL_CALLBACK LDLM_GL_CALLBACK LDLM_GL_CALLBACK + ldlm_enqueue_client + ldlm_gl_callback_server + +RQF_LDLM_GL_DESC_CALLBACK LDLM_GL_CALLBACK + ldlm_gl_callback_desc_client + ldlm_gl_callback_server + +RQF_LDLM_INTENT LDLM_INTENT + ldlm_intent_client + ldlm_intent_server + +RQF_LDLM_INTENT_BASIC LDLM_INTENT_BASIC + ldlm_intent_basic_client + ldlm_enqueue_lvb_server + +RQF_LDLM_INTENT_CREATE LDLM_INTENT_CREATE + ldlm_intent_create_client + ldlm_intent_getattr_server + +RQF_LDLM_INTENT_GETATTR LDLM_INTENT_GETATTR + ldlm_intent_getattr_client + ldlm_intent_getattr_server + +RQF_LDLM_INTENT_GETXATTR LDLM_INTENT_GETXATTR + ldlm_intent_getxattr_client + ldlm_intent_getxattr_server + +RQF_LDLM_INTENT_LAYOUT LDLM_INTENT_LAYOUT + ldlm_intent_layout_client + ldlm_enqueue_lvb_server + +RQF_LDLM_INTENT_OPEN LDLM_INTENT_OPEN + ldlm_intent_open_client + ldlm_intent_open_server + +RQF_LDLM_INTENT_QUOTA LDLM_INTENT_QUOTA + ldlm_intent_quota_client + ldlm_intent_quota_server + +RQF_LDLM_INTENT_UNLINK LDLM_INTENT_UNLINK + ldlm_intent_unlink_client + ldlm_intent_server + +RQF_LFSCK_NOTIFY LFSCK_NOTIFY LFSCK_NOTIFY + 
obd_lfsck_request + empty + +RQF_LFSCK_QUERY LFSCK_QUERY LFSCK_QUERY + obd_lfsck_request + obd_lfsck_reply + +RQF_LLOG_ORIGIN_CONNECT LLOG_ORIGIN_CONNECT + llogd_conn_body_only + empty + +RQF_LLOG_ORIGIN_HANDLE_CREATE LLOG_ORIGIN_HANDLE_CREATE + llog_origin_handle_create_client + llogd_body_only + +RQF_LLOG_ORIGIN_HANDLE_DESTROY LLOG_ORIGIN_HANDLE_DESTROY + llogd_body_only + llogd_body_only + +RQF_LLOG_ORIGIN_HANDLE_NEXT_BLOCK LLOG_ORIGIN_HANDLE_NEXT_BLOCK + llogd_body_only + llog_origin_handle_next_block_server + +RQF_LLOG_ORIGIN_HANDLE_PREV_BLOCK LLOG_ORIGIN_HANDLE_PREV_BLOCK + llogd_body_only + llog_origin_handle_next_block_server + +RQF_LLOG_ORIGIN_HANDLE_READ_HEADER LLOG_ORIGIN_HANDLE_READ_HEADER + llogd_body_only + llog_log_hdr_only + +RQF_LOG_CANCEL OBD_LOG_CANCEL OBD_LOG_CANCEL + log_cancel_client + empty + +RQF_MDS_CLOSE MDS_CLOSE MDS_CLOSE + mdt_close_client + mds_last_unlink_server + +RQF_MDS_CONNECT MDS_CONNECT MDS_CONNECT + obd_connect_client + obd_connect_server + +RQF_MDS_DISCONNECT MDS_DISCONNECT MDS_DISCONNECT + empty + empty + +RQF_MDS_DONE_WRITING MDS_DONE_WRITING MDS_DONE_WRITING + mdt_close_client + mdt_body_only + +RQF_MDS_GETATTR MDS_GETATTR MDS_GETATTR + mdt_body_capa + mds_getattr_server + +RQF_MDS_GETATTR_NAME MDS_GETATTR_NAME MDS_GETATTR_NAME + mds_getattr_name_client + mds_getattr_server + +RQF_MDS_GETSTATUS MDS_GETSTATUS MDS_GETSTATUS + mdt_body_only + mdt_body_capa + +RQF_MDS_GETXATTR MDS_GETXATTR MDS_GETXATTR + mds_getxattr_client + mds_getxattr_server + +RQF_MDS_GET_INFO MDS_GET_INFO MDS_GET_INFO + mds_getinfo_client + mds_getinfo_server + +RQF_MDS_HSM_ACTION MDS_HSM_ACTION MDS_HSM_ACTION + mdt_body_capa + mdt_hsm_action_server + +RQF_MDS_HSM_CT_REGISTER MDS_HSM_CT_REGISTER MDS_HSM_CT_REGISTER + mdt_hsm_ct_register + empty + +RQF_MDS_HSM_CT_UNREGISTER MDS_HSM_CT_UNREGISTER MDS_HSM_CT_UNREGISTER + mdt_hsm_ct_unregister + empty + +RQF_MDS_HSM_PROGRESS MDS_HSM_PROGRESS MDS_HSM_PROGRESS + mdt_hsm_progress + empty + +RQF_MDS_HSM_REQUEST 
MDS_HSM_REQUEST MDS_HSM_REQUEST + mdt_hsm_request + empty + +RQF_MDS_HSM_STATE_GET MDS_HSM_STATE_GET MDS_HSM_STATE_GET + mdt_body_capa + mdt_hsm_state_get_server + +RQF_MDS_HSM_STATE_SET MDS_HSM_STATE_SET MDS_HSM_STATE_SET + mdt_hsm_state_set + empty + +RQF_MDS_QUOTACHECK MDS_QUOTACHECK MDS_QUOTACHECK + quotactl_only + empty + +RQF_MDS_QUOTACTL MDS_QUOTACTL MDS_QUOTACTL + quotactl_only + quotactl_only + +RQF_MDS_READPAGE MDS_READPAGE MDS_READPAGE + mdt_body_capa + mdt_body_only + +RQF_MDS_REINT MDS_REINT MDS_REINT + mds_reint_client + mdt_body_only + +RQF_MDS_REINT_CREATE MDS_REINT_CREATE MDS_OPEN + mds_reint_create_client + mdt_body_capa + +RQF_MDS_REINT_CREATE_RMT_ACL MDS_REINT_CREATE_RMT_ACL MDS_OPEN + mds_reint_create_rmt_acl_client + mdt_body_capa + +RQF_MDS_REINT_CREATE_SLAVE MDS_REINT_CREATE_EA MDS_OPEN + mds_reint_create_slave_client + mdt_body_capa + +RQF_MDS_REINT_CREATE_SYM MDS_REINT_CREATE_SYM MDS_OPEN + mds_reint_create_sym_client + mdt_body_capa + +RQF_MDS_REINT_LINK MDS_REINT_LINK MDS_OPEN + mds_reint_link_client + mdt_body_only + +RQF_MDS_REINT_OPEN MDS_REINT_OPEN MDS_OPEN + mds_reint_open_client + mds_reint_open_server + +RQF_MDS_REINT_RENAME MDS_REINT_RENAME MDS_OPEN + mds_reint_rename_client + mds_last_unlink_server + +RQF_MDS_REINT_SETATTR MDS_REINT_SETATTR MDS_OPEN + mds_reint_setattr_client + mds_setattr_server + +RQF_MDS_REINT_SETXATTR MDS_REINT_SETXATTR MDS_OPEN + mds_reint_setxattr_client + mdt_body_only + +RQF_MDS_REINT_UNLINK MDS_REINT_UNLINK MDS_OPEN + mds_reint_unlink_client + mds_last_unlink_server + +RQF_MDS_RELEASE_CLOSE MDS_CLOSE + mdt_release_close_client + mds_last_unlink_server + +RQF_MDS_STATFS MDS_STATFS MDS_STATFS + empty + obd_statfs_server + +RQF_MDS_SWAP_LAYOUTS MDS_SWAP_LAYOUTS MDS_SWAP_LAYOUTS + mdt_swap_layouts + empty + +RQF_MDS_SYNC MDS_SYNC MDS_SYNC + mdt_body_capa + mdt_body_only + +RQF_MGS_CONFIG_READ MGS_CONFIG_READ MGS_CONFIG_READ + mgs_config_read_client + mgs_config_read_server + +RQF_MGS_SET_INFO MGS_SET_INFO 
MGS_SET_INFO + mgs_set_info + mgs_set_info + +RQF_MGS_TARGET_REG MGS_TARGET_REG MGS_TARGET_REG + mgs_target_info_only + mgs_target_info_only + +RQF_OBD_IDX_READ OBD_IDX_READ OBD_IDX_READ + obd_idx_read_client + obd_idx_read_server + +RQF_OBD_PING OBD_PING OBD_PING + empty + empty + +RQF_OBD_SET_INFO OBD_SET_INFO + obd_set_info_client + empty + +RQF_OST_BRW_READ OST_BRW_READ + ost_brw_client + ost_brw_read_server + +RQF_OST_BRW_WRITE OST_BRW_WRITE + ost_brw_client + ost_brw_write_server + +RQF_OST_CONNECT OST_CONNECT OST_CONNECT + obd_connect_client + obd_connect_server + +RQF_OST_CREATE OST_CREATE OST_CREATE + ost_body_only + ost_body_only + +RQF_OST_DESTROY OST_DESTROY OST_DESTROY + ost_destroy_client + ost_body_only + +RQF_OST_DISCONNECT OST_DISCONNECT OST_DISCONNECT + empty + empty + +RQF_OST_GETATTR OST_GETATTR OST_GETATTR + ost_body_capa + ost_body_only + +RQF_OST_GET_INFO OST_GET_INFO OST_GET_INFO + ost_get_info_generic_client + ost_get_info_generic_server + +RQF_OST_GET_INFO_FIEMAP OST_GET_INFO_FIEMAP + ost_get_fiemap_client + ost_get_fiemap_server + +RQF_OST_GET_INFO_LAST_FID OST_GET_INFO_LAST_FID + ost_get_last_fid_client + ost_get_last_fid_server + +RQF_OST_GET_INFO_LAST_ID OST_GET_INFO_LAST_ID + ost_get_info_generic_client + ost_get_last_id_server + +RQF_OST_PUNCH OST_PUNCH OST_PUNCH + ost_body_capa + ost_body_only + +RQF_OST_QUOTACHECK OST_QUOTACHECK OST_QUOTACHECK + quotactl_only + empty + +RQF_OST_QUOTACTL OST_QUOTACTL OST_QUOTACTL + quotactl_only + quotactl_only + +RQF_OST_SETATTR OST_SETATTR OST_SETATTR + ost_body_capa + ost_body_only + +RQF_OST_SET_GRANT_INFO OST_SET_GRANT_INFO + ost_grant_shrink_client + ost_body_only + +RQF_OST_SET_INFO_LAST_FID OST_SET_INFO_LAST_FID + obd_set_info_client + empty + +RQF_OST_STATFS OST_STATFS OST_STATFS + empty + obd_statfs_server + +RQF_OST_SYNC OST_SYNC OST_SYNC + ost_body_capa + ost_body_only + +RQF_OUT_UPDATE OUT_UPDATE_OBJ OUT_UPDATE + mds_update_client + mds_update_server + +RQF_QC_CALLBACK QC_CALLBACK + 
quotactl_only + empty + +RQF_QUOTA_DQACQ QUOTA_DQACQ + quota_body_only + quota_body_only + +RQF_SEC_CTX SEC_CTX + empty + empty + +RQF_SEQ_QUERY SEQ_QUERY + seq_query_client + seq_query_server + +Extra Op Codes +============== + +These op codes appear in some *_cmd_t enumeration in lustre_idl.h, but +do not correspond to any RQF_* request format declaration in layout.c: + +LDLM_SET_INFO +MDS_IS_SUBDIR +MDS_PIN +MDS_SETXATTR +MDS_SET_INFO +MDS_UNPIN +MDS_WRITEPAGE +MGS_CONNECT +MGS_DISCONNECT +MGS_EXCEPTION +MGS_TARGET_DEL +OBD_LOG_CANCEL +OST_CLOSE +OST_OPEN +OST_QUOTA_ADJUST_QUNIT +OST_READ +OST_REPLY +OST_SET_INFO +OST_WRITE +QUOTA_DQREL +SEC_CTX_FINI +SEC_CTX_INIT +SEC_CTX_INIT_CONT + +A code audit will be needed to determine if they are unused or, like +MGS_CONNECT, are introduced in an ad hoc fashion at the point the +message is needed. If possible, the op codes still in use should be +folded into the new "struct req_format" style of declaration, and +their creation and management made uniform with the rest. In cases +where this is impossible, the reason and the handling should both be +explained clearly. + diff --git a/basement/struct_defs.txt b/basement/struct_defs.txt new file mode 100644 index 0000000..ece35eb --- /dev/null +++ b/basement/struct_defs.txt @@ -0,0 +1,797 @@ +These are all the structs defined in lustre_idl.h and lustre_user.h, in +the lustre/include/lustre/ directory, in support of the message format +symbols. If the right column is empty, that means there is no +corresponding "RMF_BLA" declaration; the struct is either unused or used in a +way I haven't quite sorted out. If there is an X in the right column +then the RMF_BLA declaration's name is identical to the 'struct' +declaration, but if the "name" is different it is noted in +parentheses. If there is a name in the right column then the space +allocated for the RMF_BLA declaration is the size for this struct, but +the name in the RMF_BLA declaration doesn't exactly match.
The +ptlrpc_body_v2 and ptlrpc_body_v3 structs do not appear in any +declarations, but there is a #define for ptlrpc_body that defines it +as ptlrpc_body_v3 so I treat it as matching. A #if construct will +substitute obd_connect_data_v1 for obd_connect_data in code from +Lustre-2.0 and before. + +struct definition RMF_* name +----------------- ----------- +close_data X (data_version) +getinfo_fid2path +hsm_current_action X +hsm_progress_kernel hsm_progress +hsm_request X +hsm_state_set X +hsm_user_item X +idx_info X +layout_intent X +ldlm_intent X +ldlm_reply dlm_rep +ldlm_request dlm_req +lfsck_reply X +lfsck_request X +ll_fiemap_info_key fiemap +llog_cookie llog_cookies +llog_log_hdr X +llogd_body X +llogd_conn_body X +lu_fid fid +lu_seq_range fld_query_mdfld +lustre_capa capa +lustre_capa_key +lustre_msg_v2 [this is the header, not a message format, per se] +mdc_swap_layouts swap_layouts +mdt_body X +mdt_ioepoch X +mdt_rec_reint rec_reint +mdt_rec_rename +mgs_config_res X (mgs_config_read reply) +mgs_send_param X +mgs_target_info X +niobuf_remote X +obd_connect_data cdata +obd_connect_data_v1 +obd_ioobj X +obd_quotactl X +obd_statfs X +obd_uuid tgtuuid +object_update X +object_update_param +object_update_reply X +object_update_request +object_update_result +ost_body X +ptlrpc_body_v2 +ptlrpc_body_v3 X (as ptlrpc_body) +quota_body X + +---------------------------------------------------------------------- +These are definitions from the header files that appear in +structures.txt but are not (or do not correspond to) struct +definitions: + +lustre_acl.h +# define LUSTRE_POSIX_ACL_MAX_SIZE \ + (sizeof(posix_acl_xattr_header) + \ + LUSTRE_POSIX_ACL_MAX_ENTRIES * sizeof(posix_acl_xattr_entry)) + +lustre_idl.h +#define MIN_MD_SIZE (sizeof(struct lov_mds_md) + 1 * sizeof(struct lov_ost_data)) + +---------------------------------------------------------------------- + +The following are the actual struct definitions for the two header +files.
The header file is listed first, then the struct definition. + +lustre_idl.h +struct close_data { + struct lustre_handle cd_handle; + struct lu_fid cd_fid; + __u64 cd_data_version; + __u64 cd_reserved[8]; +}; + +lustre_user.h +struct hsm_current_action { + /** The current undergoing action, if there is one */ + /* state is one of hsm_progress_states */ + __u32 hca_state; + /* action is one of hsm_user_action */ + __u32 hca_action; + struct hsm_extent hca_location; +}; + +lustre_user.h +struct hsm_extent { + __u64 offset; + __u64 length; +} __attribute__((packed)); + +lustre_idl.h +struct hsm_progress_kernel { + /* Field taken from struct hsm_progress */ + lustre_fid hpk_fid; + __u64 hpk_cookie; + struct hsm_extent hpk_extent; + __u16 hpk_flags; + __u16 hpk_errval; /* positive val */ + __u32 hpk_padding1; + /* Additional fields */ + __u64 hpk_data_version; + __u64 hpk_padding2; +} __attribute__((packed)); + +lustre_user.h +struct hsm_request { + __u32 hr_action; /* enum hsm_user_action */ + __u32 hr_archive_id; /* archive id, used only with HUA_ARCHIVE */ + __u64 hr_flags; /* request flags */ + __u32 hr_itemcount; /* item count in hur_user_item vector */ + __u32 hr_data_len; +}; + +lustre_idl.h +struct hsm_state_set { + __u32 hss_valid; + __u32 hss_archive_id; + __u64 hss_setmask; + __u64 hss_clearmask; +}; + +lustre_user.h +struct hsm_user_item { + lustre_fid hui_fid; + struct hsm_extent hui_extent; +} __attribute__((packed)); + +lustre_user.h +struct hsm_user_state { + /** Current HSM states, from enum hsm_states.
*/ + __u32 hus_states; + __u32 hus_archive_id; + /** The current undergoing action, if there is one */ + __u32 hus_in_progress_state; + __u32 hus_in_progress_action; + struct hsm_extent hus_in_progress_location; + char hus_extended_info[]; +}; + +lustre_idl.h +struct idx_info { + __u32 ii_magic; + + /* reply: see idx_info_flags below */ + __u32 ii_flags; + + /* request & reply: number of lu_idxpage (to be) transferred */ + __u16 ii_count; + __u16 ii_pad0; + + /* request: requested attributes passed down to the iterator API */ + __u32 ii_attrs; + + /* request & reply: index file identifier (FID) */ + struct lu_fid ii_fid; + + /* reply: version of the index file before starting to walk the index. + * Please note that the version can be modified at any time during the + * transfer */ + __u64 ii_version; + + /* request: hash to start with: + * reply: hash of the first entry of the first lu_idxpage and hash + * of the entry to read next if any */ + __u64 ii_hash_start; + __u64 ii_hash_end; + + /* reply: size of keys in lu_idxpages, minimal one if II_FL_VARKEY is + * set */ + __u16 ii_keysize; + + /* reply: size of records in lu_idxpages, minimal one if II_FL_VARREC + * is set */ + __u16 ii_recsize; + + __u32 ii_pad1; + __u64 ii_pad2; + __u64 ii_pad3; +}; + +lustre_idl.h +struct layout_intent { + __u32 li_opc; /* intent operation for enqueue, read, write etc */ + __u32 li_flags; + __u64 li_start; + __u64 li_end; +}; + +lustre_idl.h +union ldlm_gl_desc { + struct ldlm_gl_lquota_desc lquota_desc; +}; + +lustre_idl.h +struct ldlm_gl_lquota_desc { + union lquota_id gl_id; /* quota ID subject to the glimpse */ + __u64 gl_flags; /* see LQUOTA_FL* below */ + __u64 gl_ver; /* new index version */ + __u64 gl_hardlimit; /* new hardlimit or qunit value */ + __u64 gl_softlimit; /* new softlimit */ + __u64 gl_time; + __u64 gl_pad2; +}; + +lustre_idl.h +struct ldlm_intent { + __u64 opc; +}; + +lustre_idl.h +struct ldlm_lock_desc { + struct ldlm_resource_desc l_resource; + ldlm_mode_t 
l_req_mode; + ldlm_mode_t l_granted_mode; + ldlm_wire_policy_data_t l_policy_data; +}; + +lustre_idl.h +struct ldlm_reply { + __u32 lock_flags; + __u32 lock_padding; /* also fix lustre_swab_ldlm_reply */ + struct ldlm_lock_desc lock_desc; + struct lustre_handle lock_handle; + __u64 lock_policy_res1; + __u64 lock_policy_res2; +}; + +lustre_idl.h +struct ldlm_request { + __u32 lock_flags; + __u32 lock_count; + struct ldlm_lock_desc lock_desc; + struct lustre_handle lock_handle[LDLM_LOCKREQ_HANDLES]; +}; + +lustre_idl.h +#define RES_NAME_SIZE 4 +struct ldlm_res_id { + __u64 name[RES_NAME_SIZE]; +}; + +lustre_idl.h +struct ldlm_resource_desc { + ldlm_type_t lr_type; + __u32 lr_padding; /* also fix lustre_swab_ldlm_resource_desc */ + struct ldlm_res_id lr_name; +}; + +lustre_idl.h +struct lfsck_reply { + __u32 lr_status; + __u32 lr_padding_1; + __u64 lr_padding_2; +}; + +lustre_idl.h +struct lfsck_request { + __u32 lr_event; + __u32 lr_index; + __u32 lr_flags; + __u32 lr_valid; + union { + __u32 lr_speed; + __u32 lr_status; + __u32 lr_type; + }; + __u16 lr_version; + __u16 lr_active; + __u16 lr_param; + __u16 lr_async_windows; + __u32 lr_flags2; + struct lu_fid lr_fid; + struct lu_fid lr_fid2; + struct lu_fid lr_fid3; + __u64 lr_padding_1; + __u64 lr_padding_2; +}; + +lustre_idl.h +struct ll_fiemap_info_key { + char name[8]; + struct obdo oa; + struct ll_user_fiemap fiemap; +}; + +ll_fiemap.h +struct ll_user_fiemap { + __u64 fm_start; /* logical offset (inclusive) at + * which to start mapping (in) */ + __u64 fm_length; /* logical length of mapping which + * userspace wants (in) */ + __u32 fm_flags; /* FIEMAP_FLAG_* flags for request (in/out) */ + __u32 fm_mapped_extents;/* number of extents that were mapped (out) */ + __u32 fm_extent_count; /* size of fm_extents array (in) */ + __u32 fm_reserved; + struct ll_fiemap_extent fm_extents[0]; /* array of mapped extents (out) */ +}; + +ll_fiemap.h +struct ll_fiemap_extent { + __u64 fe_logical; /* logical offset in bytes for 
the start of + * the extent from the beginning of the file */ + __u64 fe_physical; /* physical offset in bytes for the start + * of the extent from the beginning of the disk */ + __u64 fe_length; /* length in bytes for this extent */ + __u64 fe_reserved64[2]; + __u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */ + __u32 fe_device; /* device number for this extent */ + __u32 fe_reserved[2]; +}; + +lustre_idl.h +struct llog_cookie { + struct llog_logid lgc_lgl; + __u32 lgc_subsys; + __u32 lgc_index; + __u32 lgc_padding; +} __attribute__((packed)); + +lustre_idl.h +struct llog_gen { + __u64 mnt_cnt; + __u64 conn_cnt; +} __attribute__((packed)); + +lustre_idl.h +struct llog_log_hdr { + struct llog_rec_hdr llh_hdr; + __s64 llh_timestamp; + __u32 llh_count; + __u32 llh_bitmap_offset; + __u32 llh_size; + __u32 llh_flags; + __u32 llh_cat_idx; + /* for a catalog the first plain slot is next to it */ + struct obd_uuid llh_tgtuuid; + __u32 llh_reserved[LLOG_HEADER_SIZE/sizeof(__u32) - 23]; + __u32 llh_bitmap[LLOG_BITMAP_BYTES/sizeof(__u32)]; + struct llog_rec_tail llh_tail; +} __attribute__((packed)); + +lustre_idl.h +struct llog_logid { + struct ost_id lgl_oi; + __u32 lgl_ogen; +} __attribute__((packed)); + +lustre_idl.h +struct llog_rec_hdr { + __u32 lrh_len; + __u32 lrh_index; + __u32 lrh_type; + __u32 lrh_id; +}; + +lustre_idl.h +struct llog_rec_tail { + __u32 lrt_len; + __u32 lrt_index; +}; + +lustre_idl.h +struct llogd_body { + struct llog_logid lgd_logid; + __u32 lgd_ctxt_idx; + __u32 lgd_llh_flags; + __u32 lgd_index; + __u32 lgd_saved_index; + __u32 lgd_len; + __u64 lgd_cur_offset; +} __attribute__((packed)); + +lustre_idl.h +struct llogd_conn_body { + struct llog_gen lgdc_gen; + struct llog_logid lgdc_logid; + __u32 lgdc_ctxt_idx; +} __attribute__((packed)); + +#define lov_mds_md lov_mds_md_v1 +struct lov_mds_md_v1 { /* LOV EA mds/wire data (little-endian) */ + __u32 lmm_magic; /* magic number = LOV_MAGIC_V1 */ + __u32 lmm_pattern; /* LOV_PATTERN_RAID0, 
LOV_PATTERN_RAID1 */ + struct ost_id lmm_oi; /* LOV object ID */ + __u32 lmm_stripe_size; /* size of stripe in bytes */ + /* lmm_stripe_count used to be __u32 */ + __u16 lmm_stripe_count; /* num stripes in use for this object */ + __u16 lmm_layout_gen; /* layout generation number */ + struct lov_ost_data_v1 lmm_objects[0]; /* per-stripe data */ +}; + +lustre_user.h +#define lov_ost_data lov_ost_data_v1 +struct lov_ost_data_v1 { /* per-stripe data structure (little-endian)*/ + struct ost_id l_ost_oi; /* OST object ID */ + __u32 l_ost_gen; /* generation of this l_ost_idx */ + __u32 l_ost_idx; /* OST index in LOV (lov_tgt_desc->tgts) */ +}; + +lustre_user.h +struct lu_fid { + /** + * FID sequence. Sequence is a unit of migration: all files (objects) + * with FIDs from a given sequence are stored on the same server. + * Lustre should support 2^64 objects, so even if each sequence + * has only a single object we can still enumerate 2^64 objects. + **/ + __u64 f_seq; + /* FID number within sequence. */ + __u32 f_oid; + /** + * FID version, used to distinguish different versions (in the sense + * of snapshots, etc.) of the same file system object. Not currently + * used. 
+ **/ + __u32 f_ver; +}; + +lustre_idl.h +struct lu_seq_range { + __u64 lsr_start; + __u64 lsr_end; + __u32 lsr_index; + __u32 lsr_flags; +}; + +lustre_idl.h +struct lustre_capa { + struct lu_fid lc_fid; /** fid */ + __u64 lc_opc; /** operations allowed */ + __u64 lc_uid; /** file owner */ + __u64 lc_gid; /** file group */ + __u32 lc_flags; /** HMAC algorithm & flags */ + __u32 lc_keyid; /** key# used for the capability */ + __u32 lc_timeout; /** capa timeout value (sec) */ + __u32 lc_expiry; /** expiry time (sec) */ + __u8 lc_hmac[CAPA_HMAC_MAX_LEN]; /** HMAC */ +} __attribute__((packed)); + +lustre_idl.h +struct lustre_handle { + __u64 cookie; +}; + +lustre_idl.h +struct lustre_msg_v2 { + __u32 lm_bufcount; + __u32 lm_secflvr; + __u32 lm_magic; + __u32 lm_repsize; + __u32 lm_cksum; + __u32 lm_flags; + __u32 lm_padding_2; + __u32 lm_padding_3; + __u32 lm_buflens[0]; +}; + +lustre_idl.h +struct mdc_swap_layouts { + __u64 msl_flags; +} __packed; + +lustre_idl.h +struct mdt_body { + struct lu_fid mbo_fid1; + struct lu_fid mbo_fid2; + struct lustre_handle mbo_handle; + __u64 mbo_valid; + __u64 mbo_size; /* Offset, in the case of MDS_READPAGE */ + __s64 mbo_mtime; + __s64 mbo_atime; + __s64 mbo_ctime; + __u64 mbo_blocks; /* XID, in the case of MDS_READPAGE */ + __u64 mbo_ioepoch; + __u64 mbo_t_state; /* transient file state defined in + * enum md_transient_state + * was "ino" until 2.4.0 */ + __u32 mbo_fsuid; + __u32 mbo_fsgid; + __u32 mbo_capability; + __u32 mbo_mode; + __u32 mbo_uid; + __u32 mbo_gid; + __u32 mbo_flags; + __u32 mbo_rdev; + __u32 mbo_nlink; /* #bytes to read in the case of MDS_READPAGE */ + __u32 mbo_unused2; /* was "generation" until 2.4.0 */ + __u32 mbo_suppgid; + __u32 mbo_eadatasize; + __u32 mbo_aclsize; + __u32 mbo_max_mdsize; + __u32 mbo_max_cookiesize; + __u32 mbo_uid_h; /* high 32-bits of uid, for FUID */ + __u32 mbo_gid_h; /* high 32-bits of gid, for FUID */ + __u32 mbo_padding_5; /* also fix lustre_swab_mdt_body */ + __u64 mbo_padding_6; + 
__u64 mbo_padding_7; + __u64 mbo_padding_8; + __u64 mbo_padding_9; + __u64 mbo_padding_10; +}; /* 216 */ + +lustre_idl.h +struct mdt_ioepoch { + struct lustre_handle handle; + __u64 ioepoch; + __u32 flags; + __u32 padding; +}; + +lustre_idl.h +struct mdt_rec_reint { + __u32 rr_opcode; + __u32 rr_cap; + __u32 rr_fsuid; + __u32 rr_fsuid_h; + __u32 rr_fsgid; + __u32 rr_fsgid_h; + __u32 rr_suppgid1; + __u32 rr_suppgid1_h; + __u32 rr_suppgid2; + __u32 rr_suppgid2_h; + struct lu_fid rr_fid1; + struct lu_fid rr_fid2; + __s64 rr_mtime; + __s64 rr_atime; + __s64 rr_ctime; + __u64 rr_size; + __u64 rr_blocks; + __u32 rr_bias; + __u32 rr_mode; + __u32 rr_flags; + __u32 rr_flags_h; + __u32 rr_umask; + __u32 rr_padding_4; /* also fix lustre_swab_mdt_rec_reint */ +}; + +lustre_idl.h +struct mgs_config_res { + __u64 mcr_offset; /* index of last config log */ + __u64 mcr_size; /* size of the log */ +}; + +lustre_idl.h +struct mgs_send_param { + char mgs_param[MGS_PARAM_MAXLEN]; +}; + +lustre_idl.h +struct mgs_target_info { + __u32 mti_lustre_ver; + __u32 mti_stripe_index; + __u32 mti_config_ver; + __u32 mti_flags; + __u32 mti_nid_count; + __u32 mti_instance; /* Running instance of target */ + char mti_fsname[MTI_NAME_MAXLEN]; + char mti_svname[MTI_NAME_MAXLEN]; + char mti_uuid[sizeof(struct obd_uuid)]; + __u64 mti_nids[MTI_NIDS_MAX]; /* host nids (lnet_nid_t)*/ + char mti_params[MTI_PARAM_MAXLEN]; +}; + +lustre_idl.h +struct niobuf_remote { + __u64 rnb_offset; + __u32 rnb_len; + __u32 rnb_flags; +}; + +lustre_idl.h +struct obd_connect_data { + __u64 ocd_connect_flags; /* OBD_CONNECT_* per above */ + __u32 ocd_version; /* lustre release version number */ + __u32 ocd_grant; /* initial cache grant amount (bytes) */ + __u32 ocd_index; /* LOV index to connect to */ + __u32 ocd_brw_size; /* Maximum BRW size in bytes */ + __u64 ocd_ibits_known; /* inode bits this client understands */ + __u8 ocd_blocksize; /* log2 of the backend filesystem blocksize */ + __u8 ocd_inodespace; /* log2 of 
the per-inode space consumption */ + __u16 ocd_grant_extent; /* per-extent grant overhead, in 1K blocks */ + __u32 ocd_unused; /* also fix lustre_swab_connect */ + __u64 ocd_transno; /* first transno from client to be replayed */ + __u32 ocd_group; /* MDS group on OST */ + __u32 ocd_cksum_types; /* supported checksum algorithms */ + __u32 ocd_max_easize; /* How big LOV EA can be on MDS */ + __u32 ocd_instance; /* instance # of this target */ + __u64 ocd_maxbytes; /* Maximum stripe size in bytes */ + /* Fields after ocd_maxbytes are only accessible by the receiver + * if the corresponding flag in ocd_connect_flags is set. Accessing + * any field after ocd_maxbytes on the receiver without a valid flag + * may result in out-of-bound memory access and kernel oops. */ + __u64 padding1; /* added 2.1.0. also fix lustre_swab_connect */ + __u64 padding2; /* added 2.1.0. also fix lustre_swab_connect */ + __u64 padding3; /* added 2.1.0. also fix lustre_swab_connect */ + __u64 padding4; /* added 2.1.0. also fix lustre_swab_connect */ + __u64 padding5; /* added 2.1.0. also fix lustre_swab_connect */ + __u64 padding6; /* added 2.1.0. also fix lustre_swab_connect */ + __u64 padding7; /* added 2.1.0. also fix lustre_swab_connect */ + __u64 padding8; /* added 2.1.0. also fix lustre_swab_connect */ + __u64 padding9; /* added 2.1.0. also fix lustre_swab_connect */ + __u64 paddingA; /* added 2.1.0. also fix lustre_swab_connect */ + __u64 paddingB; /* added 2.1.0. also fix lustre_swab_connect */ + __u64 paddingC; /* added 2.1.0. also fix lustre_swab_connect */ + __u64 paddingD; /* added 2.1.0. also fix lustre_swab_connect */ + __u64 paddingE; /* added 2.1.0. also fix lustre_swab_connect */ + __u64 paddingF; /* added 2.1.0. 
also fix lustre_swab_connect */ +}; + +lustre_user.h +struct obd_dqblk { + __u64 dqb_bhardlimit; + __u64 dqb_bsoftlimit; + __u64 dqb_curspace; + __u64 dqb_ihardlimit; + __u64 dqb_isoftlimit; + __u64 dqb_curinodes; + __u64 dqb_btime; + __u64 dqb_itime; + __u32 dqb_valid; + __u32 dqb_padding; +}; + +lustre_user.h +struct obd_dqinfo { + __u64 dqi_bgrace; + __u64 dqi_igrace; + __u32 dqi_flags; + __u32 dqi_valid; +}; + +lustre_idl.h +struct obd_ioobj { + struct ost_id ioo_oid; /* object ID, if multi-obj BRW */ + __u32 ioo_max_brw; /* low 16 bits were o_mode before 2.4, + * now (PTLRPC_BULK_OPS_COUNT - 1) in + * high 16 bits in 2.4 and later */ + __u32 ioo_bufcnt; /* number of niobufs for this object */ +}; + +lustre_idl.h +struct obd_quotactl { + __u32 qc_cmd; + __u32 qc_type; /* see Q_* flag below */ + __u32 qc_id; + __u32 qc_stat; + struct obd_dqinfo qc_dqinfo; + struct obd_dqblk qc_dqblk; +}; + +lustre_user.h +struct obd_statfs { + __u64 os_type; + __u64 os_blocks; + __u64 os_bfree; + __u64 os_bavail; + __u64 os_files; + __u64 os_ffree; + __u8 os_fsid[40]; + __u32 os_bsize; + __u32 os_namelen; + __u64 os_maxbytes; + __u32 os_state; /**< obd_statfs_state OS_STATE_* flag */ + __u32 os_fprecreated; /* objs available now to the caller */ + /* used in QoS code to find preferred + * OSTs */ + __u32 os_spare2; + __u32 os_spare3; + __u32 os_spare4; + __u32 os_spare5; + __u32 os_spare6; + __u32 os_spare7; + __u32 os_spare8; + __u32 os_spare9; +}; + +lustre_user.h +#define UUID_MAX 40 +struct obd_uuid { + char uuid[UUID_MAX]; +}; + +struct obdo { + __u64 o_valid; /* hot fields in this obdo */ + struct ost_id o_oi; + __u64 o_parent_seq; + __u64 o_size; /* o_size-o_blocks == ost_lvb */ + __s64 o_mtime; + __s64 o_atime; + __s64 o_ctime; + __u64 o_blocks; /* brw: cli sent cached bytes */ + __u64 o_grant; + + /* 32-bit fields start here: keep an even number of them via padding */ + __u32 o_blksize; /* optimal IO blocksize */ + __u32 o_mode; /* brw: cli sent cache remain */ + __u32 
o_uid; + __u32 o_gid; + __u32 o_flags; + __u32 o_nlink; /* brw: checksum */ + __u32 o_parent_oid; + __u32 o_misc; /* brw: o_dropped */ + + __u64 o_ioepoch; /* epoch in ost writes */ + __u32 o_stripe_idx; /* holds stripe idx */ + __u32 o_parent_ver; + struct lustre_handle o_handle; /* brw: lock handle to prolong + * locks */ + struct llog_cookie o_lcookie; /* destroy: unlink cookie from + * MDS */ + __u32 o_uid_h; + __u32 o_gid_h; + + __u64 o_data_version; /* getattr: sum of iversion for + * each stripe. + * brw: grant space consumed on + * the client for the write */ + __u64 o_padding_4; + __u64 o_padding_5; + __u64 o_padding_6; +}; + +lustre_idl.h +struct ost_body { + struct obdo oa; +}; + +lustre_user.h +struct ost_id { + union { + struct { + __u64 oi_id; + __u64 oi_seq; + } oi; + struct lu_fid oi_fid; + }; +}; + +lustre_idl.h +struct ptlrpc_body_v3 { + struct lustre_handle pb_handle; + __u32 pb_type; + __u32 pb_version; + __u32 pb_opc; + __u32 pb_status; + __u64 pb_last_xid; + __u64 pb_last_seen; + __u64 pb_last_committed; + __u64 pb_transno; + __u32 pb_flags; + __u32 pb_op_flags; + __u32 pb_conn_cnt; + __u32 pb_timeout; /* for req, the deadline, for rep, the service est */ + __u32 pb_service_time; /* for rep, actual service time */ + __u32 pb_limit; + __u64 pb_slv; + /* VBR: pre-versions */ + __u64 pb_pre_versions[PTLRPC_NUM_VERSIONS]; + /* padding for future needs */ + __u64 pb_padding[4]; + char pb_jobid[LUSTRE_JOBID_SIZE]; +}; + +lustre_idl.h +struct quota_body { + struct lu_fid qb_fid; /* FID of global index packing the pool ID + * and type (data or metadata) as well as + * the quota type (user or group). 
*/ + union lquota_id qb_id; /* uid or gid or directory FID */ + __u32 qb_flags; /* see below */ + __u32 qb_padding; + __u64 qb_count; /* acquire/release count (kbytes/inodes) */ + __u64 qb_usage; /* current slave usage (kbytes/inodes) */ + __u64 qb_slv_ver; /* slave index file version */ + struct lustre_handle qb_lockh; /* per-ID lock handle */ + struct lustre_handle qb_glb_lockh; /* global lock handle */ + __u64 qb_padding1[4]; +}; diff --git a/basement/structures.txt b/basement/structures.txt new file mode 100644 index 0000000..81f8136 --- /dev/null +++ b/basement/structures.txt @@ -0,0 +1,104 @@ +lustre/ptlrpc/layout.c establishes a mapping between the symbols +(starting with RMF_) for message format structures and the actual +structure definitions they represent. There are three entries in this +mapping that do not appear in any message format. They are: + RMF_OST_ID => ost_id + RMF_STRING => string + RMF_U32 => u32 +There are also three duplicates: + RMF_CAPA1 => capa and + RMF_CAPA2 => capa + RMF_FIEMAP_KEY => fiemap and + RMF_FIEMAP_VAL => fiemap + RMF_NIOBUF_REMOTE => niobuf_remote and + RMF_RCS => niobuf_remote + +The message format symbols RMF_MGS_CONFIG_BODY and RMF_MGS_CONFIG_RES +have the names, respectively, "mgs_config_read request" and +"mgs_config_read reply" with two words (or a missing +underscore). Similarly, RMF_U32 has the name "generic u32". + +Many of the message format symbols have a name that is the same as the +symbol but in lower case and without the leading "RMF_". On the other +hand many of the symbols do not follow this pattern. Each declaration +also has a field for the size to be allocated for the structure, and +most use the "sizeof(struct the_struct)" construction to identify the +size needed. That construct then gives a notion of what actual +structure definition is going to give meaning to the bytes in that +structure. The sizeof column presents that 'struct' name or a hint +about the size of the message.
A value of "-1" means "the size is not +going to be well defined in advance." Thus additional hints will need +to be included in the rest of the message. + +message format symbol name sizeof +--------------------- ---------------- ------------------ +RMF_ACL acl LUSTRE_POSIX_ACL_MAX_SIZE +RMF_CAPA1 capa lustre_capa +RMF_CAPA2 capa lustre_capa +RMF_CLOSE_DATA data_version close_data +RMF_CLUUID cluuid obd_uuid +RMF_CONN conn lustre_handle +RMF_CONNECT_DATA cdata obd_connect_data +RMF_DLM_GL_DESC dlm_gl_desc ldlm_gl_desc +RMF_DLM_LVB dlm_lvb -1 +RMF_DLM_REP dlm_rep ldlm_reply +RMF_DLM_REQ dlm_req ldlm_request +RMF_EADATA eadata -1 +RMF_EAVALS eavals -1 +RMF_EAVALS_LENS eavals_lens __u32 +RMF_FID fid lu_fid +RMF_FIEMAP_KEY fiemap ll_fiemap_info_key +RMF_FIEMAP_VAL fiemap -1 +RMF_FLD_MDFLD fld_query_mdfld lu_seq_range +RMF_FLD_OPC fld_query_opc __u32 +RMF_GENERIC_DATA generic_data -1 +RMF_GETINFO_KEY getinfo_key -1 +RMF_GETINFO_VAL getinfo_val -1 +RMF_GETINFO_VALLEN getinfo_vallen __u32 +RMF_HSM_STATE_SET hsm_state_set hsm_state_set +RMF_HSM_USER_STATE hsm_user_state hsm_user_state +RMF_IDX_INFO idx_info idx_info +RMF_LAYOUT_INTENT layout_intent layout_intent +RMF_LDLM_INTENT ldlm_intent ldlm_intent +RMF_LFSCK_REPLY lfsck_reply lfsck_reply +RMF_LFSCK_REQUEST lfsck_request lfsck_request +RMF_LLOGD_BODY llogd_body llogd_body +RMF_LLOGD_CONN_BODY llogd_conn_body llogd_conn_body +RMF_LLOG_LOG_HDR llog_log_hdr llog_log_hdr +RMF_LOGCOOKIES logcookies llog_cookie +RMF_MDS_HSM_ARCHIVE hsm_archive __u32 +RMF_MDS_HSM_CURRENT_ACTION hsm_current_action hsm_current_action +RMF_MDS_HSM_PROGRESS hsm_progress hsm_progress_kernel +RMF_MDS_HSM_REQUEST hsm_request hsm_request +RMF_MDS_HSM_USER_ITEM hsm_user_item hsm_user_item +RMF_MDT_BODY mdt_body mdt_body +RMF_MDT_EPOCH mdt_ioepoch mdt_ioepoch +RMF_MDT_MD mdt_md MIN_MD_SIZE +RMF_MGS_CONFIG_BODY mgs_config_read request mgs_config_body +RMF_MGS_CONFIG_RES mgs_config_read reply mgs_config_res +RMF_MGS_SEND_PARAM mgs_send_param
mgs_send_param +RMF_MGS_TARGET_INFO mgs_target_info mgs_target_info +RMF_NAME name -1 +RMF_NIOBUF_REMOTE niobuf_remote niobuf_remote +RMF_OBD_ID obd_id __u64 +RMF_OBD_IOOBJ obd_ioobj obd_ioobj +RMF_OBD_QUOTACTL obd_quotactl obd_quotactl +RMF_OBD_STATFS obd_statfs obd_statfs +RMF_OST_BODY ost_body ost_body +RMF_OST_ID ost_id ost_id +RMF_OUT_UPDATE object_update -1 +RMF_OUT_UPDATE_REPLY object_update_reply -1 +RMF_PTLRPC_BODY ptlrpc_body (aka ptlrpc_body_v3) ptlrpc_body +RMF_QUOTA_BODY quota_body quota_body +RMF_RCS niobuf_remote __u32 +RMF_REC_REINT rec_reint mdt_rec_reint +RMF_SEQ_OPC seq_query_opc __u32 +RMF_SEQ_RANGE seq_query_range lu_seq_range +RMF_SETINFO_KEY setinfo_key -1 +RMF_SETINFO_VAL setinfo_val -1 +RMF_STRING string -1 +RMF_SWAP_LAYOUTS swap_layouts mdc_swap_layouts +RMF_SYMTGT symtgt -1 +RMF_TGTUUID tgtuuid obd_uuid +RMF_U32 generic u32 u32 __u32 + diff --git a/basement/structures_list.txt b/basement/structures_list.txt new file mode 100644 index 0000000..6a73009 --- /dev/null +++ b/basement/structures_list.txt @@ -0,0 +1,1049 @@ +This set of structure specifications includes all those directly +referenced in the message formats and all those subsidiary structures +mentioned in them.
+
+acl
+^^^
+
+define LUSTRE_POSIX_ACL_MAX_SIZE
+(sizeof(posix_acl_xattr_header) +
+LUSTRE_POSIX_ACL_MAX_ENTRIES * sizeof(posix_acl_xattr_entry))
+
+mdt_md
+^^^^^^
+
+MIN_MD_SIZE (sizeof(struct lov_mds_md) + 1 * sizeof(struct lov_ost_data))
+
+
+close_data
+^^^^^^^^^^
+
+.close_data
+[options="header"]
+|=====
+| type | field
+| struct lustre_handle | cd_handle
+| struct lu_fid | cd_fid
+| __u64 | cd_data_version
+| __u64 | cd_reserved[8]
+|=====
+
+hsm_current_action
+^^^^^^^^^^^^^^^^^^
+
+.hsm_current_action
+[options="header"]
+|=====
+| type | field
+| __u32 | hca_state
+| __u32 | hca_action
+| struct hsm_extent | hca_location
+|=====
+
+hsm_extent
+^^^^^^^^^^
+
+.hsm_extent
+[options="header"]
+|=====
+| type | field
+| __u64 | offset
+| __u64 | length
+|=====
+
+hsm_progress_kernel
+^^^^^^^^^^^^^^^^^^^
+
+.hsm_progress_kernel
+[options="header"]
+|=====
+| type | field
+| lustre_fid | hpk_fid
+| __u64 | hpk_cookie
+| struct hsm_extent | hpk_extent
+| __u16 | hpk_flags
+| __u16 | hpk_errval
+| __u32 | hpk_padding1
+| __u64 | hpk_data_version
+| __u64 | hpk_padding2
+
+|=====
+
+hsm_request
+^^^^^^^^^^^
+
+.hsm_request
+[options="header"]
+|=====
+| type | field
+| __u32 | hr_action
+| __u32 | hr_archive_id
+| __u64 | hr_flags
+| __u32 | hr_itemcount
+| __u32 | hr_data_len
+|=====
+
+hsm_state_set
+^^^^^^^^^^^^^
+
+.hsm_state_set
+[options="header"]
+|=====
+| type | field
+| __u32 | hss_valid
+| __u32 | hss_archive_id
+| __u64 | hss_setmask
+| __u64 | hss_clearmask
+|=====
+
+hsm_user_item
+^^^^^^^^^^^^^
+
+.hsm_user_item
+[options="header"]
+|=====
+| type | field
+| lustre_fid | hui_fid
+| struct hsm_extent | hui_extent
+|=====
+
+hsm_user_state
+^^^^^^^^^^^^^^
+
+.hsm_user_state
+[options="header"]
+|=====
+| type | field
+| __u32 | hus_states
+| __u32 | hus_archive_id
+| __u32 | hus_in_progress_state
+| __u32 | hus_in_progress_action
+| struct hsm_extent | hus_in_progress_location
+| char | hus_extended_info[]
+|=====
+
+idx_info
+^^^^^^^^
+
+.idx_info
+[options="header"]
+|=====
+| type | field
+| __u32 | ii_magic
+| __u32 | ii_flags
+| __u16 | ii_count
+| __u16 | ii_pad0
+| __u32 | ii_attrs
+| struct lu_fid | ii_fid
+| __u64 | ii_version
+| __u64 | ii_hash_start
+| __u64 | ii_hash_end
+| __u16 | ii_keysize
+| __u16 | ii_recsize
+| __u32 | ii_pad1
+| __u64 | ii_pad2
+| __u64 | ii_pad3
+|=====
+
+layout_intent
+^^^^^^^^^^^^^
+
+.layout_intent
+[options="header"]
+|=====
+| type | field
+| __u32 | li_opc
+| __u32 | li_flags
+| __u64 | li_start
+| __u64 | li_end
+|=====
+
+ldlm_gl_lquota_desc
+^^^^^^^^^^^^^^^^^^^
+
+.ldlm_gl_lquota_desc
+[options="header"]
+|=====
+| type | field
+| union lquota_id | gl_id
+| __u64 | gl_flags
+| __u64 | gl_ver
+| __u64 | gl_hardlimit
+| __u64 | gl_softlimit
+| __u64 | gl_time
+| __u64 | gl_pad2
+|=====
+
+ldlm_intent
+^^^^^^^^^^^
+
+.ldlm_intent
+[options="header"]
+|=====
+| type | field
+| __u64 | opc
+|=====
+
+ldlm_lock_desc
+^^^^^^^^^^^^^^
+
+.ldlm_lock_desc
+[options="header"]
+|=====
+| type | field
+| struct ldlm_resource_desc | l_resource
+| ldlm_mode_t | l_req_mode
+| ldlm_mode_t | l_granted_mode
+| ldlm_wire_policy_data_t | l_policy_data
+|=====
+
+ldlm_reply
+^^^^^^^^^^
+
+.ldlm_reply
+[options="header"]
+|=====
+| type | field
+| __u32 | lock_flags
+| __u32 | lock_padding
+| struct ldlm_lock_desc | lock_desc
+| struct lustre_handle | lock_handle
+| __u64 | lock_policy_res1
+| __u64 | lock_policy_res2
+|=====
+
+ldlm_request
+^^^^^^^^^^^^
+
+.ldlm_request
+[options="header"]
+|=====
+| type | field
+| __u32 | lock_flags
+| __u32 | lock_count
+| struct ldlm_lock_desc | lock_desc
+| struct lustre_handle | lock_handle[LDLM_LOCKREQ_HANDLES]
+|=====
+
+ldlm_res_id
+^^^^^^^^^^^
+
+.ldlm_res_id
+[options="header"]
+|=====
+| type | field
+| __u64 | name[RES_NAME_SIZE]
+|=====
+
+ldlm_resource_desc
+^^^^^^^^^^^^^^^^^^
+
+.ldlm_resource_desc
+[options="header"]
+|=====
+| type | field
+| ldlm_type_t | lr_type
+| __u32 | lr_padding
+| struct ldlm_res_id | lr_name
+|=====
+
+lfsck_reply
+^^^^^^^^^^^
+
+.lfsck_reply
+[options="header"]
+|=====
+| type | field
+| __u32 | lr_status
+| __u32 | lr_padding_1
+| __u64 | lr_padding_2
+|=====
+
+lfsck_request
+^^^^^^^^^^^^^
+
+.lfsck_request
+[options="header"]
+|=====
+| type | field
+| __u32 | lr_event
+| __u32 | lr_index
+| __u32 | lr_flags
+| __u32 | lr_valid
+| union __u32 | lr_speed, lr_status, lr_type
+| __u16 | lr_version
+| __u16 | lr_active
+| __u16 | lr_param
+| __u16 | lr_async_windows
+| __u32 | lr_flags2
+| struct lu_fid | lr_fid
+| struct lu_fid | lr_fid2
+| struct lu_fid | lr_fid3
+| __u64 | lr_padding_1
+| __u64 | lr_padding_2
+|=====
+
+ll_fiemap_info_key
+^^^^^^^^^^^^^^^^^^
+
+.ll_fiemap_info_key
+[options="header"]
+|=====
+| type | field
+| char | name[8]
+| struct obdo | oa
+| struct ll_user_fiemap | fiemap
+|=====
+
+ll_user_fiemap
+^^^^^^^^^^^^^^
+
+.ll_user_fiemap
+[options="header"]
+|=====
+| type | field
+| __u64 | fm_start
+| __u64 | fm_length
+| __u32 | fm_flags
+| __u32 | fm_mapped_extents
+| __u32 | fm_extent_count
+| __u32 | fm_reserved
+| struct ll_fiemap_extent | fm_extents[0]
+|=====
+
+ll_fiemap_extent
+^^^^^^^^^^^^^^^^
+
+.ll_fiemap_extent
+[options="header"]
+|=====
+| type | field
+| __u64 | fe_logical
+| __u64 | fe_physical
+| __u64 | fe_length
+| __u64 | fe_reserved64[2]
+| __u32 | fe_flags
+| __u32 | fe_device
+| __u32 | fe_reserved[2]
+|=====
+
+llog_cookie
+^^^^^^^^^^^
+
+.llog_cookie
+[options="header"]
+|=====
+| type | field
+| struct llog_logid | lgc_lgl
+| __u32 | lgc_subsys
+| __u32 | lgc_index
+| __u32 | lgc_padding
+
+|=====
+
+llog_gen
+^^^^^^^^
+
+.llog_gen
+[options="header"]
+|=====
+| type | field
+| __u64 | mnt_cnt
+| __u64 | conn_cnt
+|=====
+
+llog_log_hdr
+^^^^^^^^^^^^
+
+.llog_log_hdr
+[options="header"]
+|=====
+| type | field
+| struct llog_rec_hdr | llh_hdr
+| __s64 | llh_timestamp
+| __u32 | llh_count
+| __u32 | llh_bitmap_offset
+| __u32 | llh_size
+| __u32 | llh_flags
+| __u32 | llh_cat_idx
+| struct obd_uuid | llh_tgtuuid
+| __u32 | llh_reserved[LLOG_HEADER_SIZE/sizeof(__u32) - 23]
+| __u32 | llh_bitmap[LLOG_BITMAP_BYTES/sizeof(__u32)]
+| struct llog_rec_tail | llh_tail
+|=====
+
+llog_rec_hdr
+^^^^^^^^^^^^
+
+.llog_rec_hdr
+[options="header"]
+|=====
+| type | field
+| __u32 | lrh_len
+| __u32 | lrh_index
+| __u32 | lrh_type
+| __u32 | lrh_id
+|=====
+
+llog_rec_tail
+^^^^^^^^^^^^^
+
+.llog_rec_tail
+[options="header"]
+|=====
+| type | field
+| __u32 | lrt_len
+| __u32 | lrt_index
+|=====
+
+llog_logid
+^^^^^^^^^^
+
+.llog_logid
+[options="header"]
+|=====
+| type | field
+| struct ost_id | lgl_oi
+| __u32 | lgl_ogen
+|=====
+
+llogd_body
+^^^^^^^^^^
+
+.llogd_body
+[options="header"]
+|=====
+| type | field
+| struct llog_logid | lgd_logid
+| __u32 | lgd_ctxt_idx
+| __u32 | lgd_llh_flags
+| __u32 | lgd_index
+| __u32 | lgd_saved_index
+| __u32 | lgd_len
+| __u64 | lgd_cur_offset
+|=====
+
+llogd_conn_body
+^^^^^^^^^^^^^^^
+
+.llogd_conn_body
+[options="header"]
+|=====
+| type | field
+| struct llog_gen | lgdc_gen
+| struct llog_logid | lgdc_logid
+| __u32 | lgdc_ctxt_idx
+|=====
+
+lov_mds_md_v1
+^^^^^^^^^^^^^
+
+.lov_mds_md_v1
+[options="header"]
+|=====
+| type | field
+| __u32 | lmm_magic
+| __u32 | lmm_pattern
+| struct ost_id | lmm_oi
+| __u32 | lmm_stripe_size
+| __u16 | lmm_stripe_count
+| __u16 | lmm_layout_gen
+| struct lov_ost_data | lmm_objects[0]
+|=====
+
+lov_ost_data
+^^^^^^^^^^^^
+
+.lov_ost_data
+[options="header"]
+|=====
+| type | field
+| struct ost_id | l_ost_oi
+| __u32 | l_ost_gen
+| __u32 | l_ost_idx
+|=====
+
+lu_fid
+^^^^^^
+
+.lu_fid
+[options="header"]
+|=====
+| type | field
+| __u64 | f_seq
+| __u32 | f_oid
+| __u32 | f_ver
+|=====
+
+lu_seq_range
+^^^^^^^^^^^^
+
+.lu_seq_range
+[options="header"]
+|=====
+| type | field
+| __u64 | lsr_start
+| __u64 | lsr_end
+| __u32 | lsr_index
+| __u32 | lsr_flags
+|=====
+
+lustre_capa
+^^^^^^^^^^^
+
+.lustre_capa
+[options="header"]
+|=====
+| type | field
+| struct lu_fid | lc_fid
+| __u64 | lc_opc
+| __u64 | lc_uid
+| __u64 | lc_gid
+| __u32 | lc_flags
+| __u32 | lc_keyid
+| __u32 | lc_timeout
+| __u32 | lc_expiry
+| __u8 | lc_hmac[CAPA_HMAC_MAX_LEN]
+|=====
+
+lustre_handle
+^^^^^^^^^^^^^
+
+.lustre_handle
+[options="header"]
+|=====
+| type | field
+| __u64 | cookie
+|=====
+
+mdc_swap_layouts
+^^^^^^^^^^^^^^^^
+
+.mdc_swap_layouts
+[options="header"]
+|=====
+| type | field
+| __u64 | msl_flags
+|=====
+
+mdt_body
+^^^^^^^^
+
+.mdt_body
+[options="header"]
+|=====
+| type | field
+| struct lu_fid | mbo_fid1
+| struct lu_fid | mbo_fid2
+| struct lustre_handle | mbo_handle
+| __u64 | mbo_valid
+| __u64 | mbo_size
+| __s64 | mbo_mtime
+| __s64 | mbo_atime
+| __s64 | mbo_ctime
+| __u64 | mbo_blocks
+| __u64 | mbo_ioepoch
+| __u64 | mbo_t_state
+| __u32 | mbo_fsuid
+| __u32 | mbo_fsgid
+| __u32 | mbo_capability
+| __u32 | mbo_mode
+| __u32 | mbo_uid
+| __u32 | mbo_gid
+| __u32 | mbo_flags
+| __u32 | mbo_rdev
+| __u32 | mbo_nlink
+| __u32 | mbo_unused2
+| __u32 | mbo_suppgid
+| __u32 | mbo_eadatasize
+| __u32 | mbo_aclsize
+| __u32 | mbo_max_mdsize
+| __u32 | mbo_max_cookiesize
+| __u32 | mbo_uid_h
+| __u32 | mbo_gid_h
+| __u32 | mbo_padding_5
+| __u64 | mbo_padding_6
+| __u64 | mbo_padding_7
+| __u64 | mbo_padding_8
+| __u64 | mbo_padding_9
+| __u64 | mbo_padding_10
+|=====
+
+mdt_ioepoch
+^^^^^^^^^^^
+
+.mdt_ioepoch
+[options="header"]
+|=====
+| type | field
+| struct lustre_handle | handle
+| __u64 | ioepoch
+| __u32 | flags
+| __u32 | padding
+|=====
+
+mdt_rec_reint
+^^^^^^^^^^^^^
+
+.mdt_rec_reint
+[options="header"]
+|=====
+| type | field
+| __u32 | rr_opcode
+| __u32 | rr_cap
+| __u32 | rr_fsuid
+| __u32 | rr_fsuid_h
+| __u32 | rr_fsgid
+| __u32 | rr_fsgid_h
+| __u32 | rr_suppgid1
+| __u32 | rr_suppgid1_h
+| __u32 | rr_suppgid2
+| __u32 | rr_suppgid2_h
+| struct lu_fid | rr_fid1
+| struct lu_fid | rr_fid2
+| __s64 | rr_mtime
+| __s64 | rr_atime
+| __s64 | rr_ctime
+| __u64 | rr_size
+| __u64 | rr_blocks
+| __u32 | rr_bias
+| __u32 | rr_mode
+| __u32 | rr_flags
+| __u32 | rr_flags_h
+| __u32 | rr_umask
+| __u32 | rr_padding_4
+|=====
+
+mgs_config_res
+^^^^^^^^^^^^^^
+
+.mgs_config_res
+[options="header"]
+|=====
+| type | field
+| __u64 | mcr_offset
+| __u64 | mcr_size
+|=====
+
+mgs_send_param
+^^^^^^^^^^^^^^
+
+.mgs_send_param
+[options="header"]
+|=====
+| type | field
+| char | mgs_param[MGS_PARAM_MAXLEN]
+|=====
+
+mgs_target_info
+^^^^^^^^^^^^^^^
+
+.mgs_target_info
+[options="header"]
+|=====
+| type | field
+| __u32 | mti_lustre_ver
+| __u32 | mti_stripe_index
+| __u32 | mti_config_ver
+| __u32 | mti_flags
+| __u32 | mti_nid_count
+| __u32 | mti_instance
+| char | mti_fsname[MTI_NAME_MAXLEN]
+| char | mti_svname[MTI_NAME_MAXLEN]
+| char | mti_uuid[sizeof(struct obd_uuid)]
+| __u64 | mti_nids[MTI_NIDS_MAX]
+| char | mti_params[MTI_PARAM_MAXLEN]
+|=====
+
+niobuf_remote
+^^^^^^^^^^^^^
+
+.niobuf_remote
+[options="header"]
+|=====
+| type | field
+| __u64 | rnb_offset
+| __u32 | rnb_len
+| __u32 | rnb_flags
+|=====
+
+obd_connect_data
+^^^^^^^^^^^^^^^^
+
+.obd_connect_data
+[options="header"]
+|=====
+| type | field
+| __u64 | ocd_connect_flags
+| __u32 | ocd_version
+| __u32 | ocd_grant
+| __u32 | ocd_index
+| __u32 | ocd_brw_size
+| __u64 | ocd_ibits_known
+| __u8 | ocd_blocksize
+| __u8 | ocd_inodespace
+| __u16 | ocd_grant_extent
+| __u32 | ocd_unused
+| __u64 | ocd_transno
+| __u32 | ocd_group
+| __u32 | ocd_cksum_types
+| __u32 | ocd_max_easize
+| __u32 | ocd_instance
+| __u64 | ocd_maxbytes
+| __u64 | padding1
+| __u64 | padding2
+| __u64 | padding3
+| __u64 | padding4
+| __u64 | padding5
+| __u64 | padding6
+| __u64 | padding7
+| __u64 | padding8
+| __u64 | padding9
+| __u64 | paddingA
+| __u64 | paddingB
+| __u64 | paddingC
+| __u64 | paddingD
+| __u64 | paddingE
+| __u64 | paddingF
+|=====
+
+obd_dqblk
+^^^^^^^^^
+
+.obd_dqblk
+[options="header"]
+|=====
+| type | field
+| __u64 | dqb_bhardlimit
+| __u64 | dqb_bsoftlimit
+| __u64 | dqb_curspace
+| __u64 | dqb_ihardlimit
+| __u64 | dqb_isoftlimit
+| __u64 | dqb_curinodes
+| __u64 | dqb_btime
+| __u64 | dqb_itime
+| __u32 | dqb_valid
+| __u32 | dqb_padding
+|=====
+
+obd_dqinfo
+^^^^^^^^^^
+
+.obd_dqinfo
+[options="header"]
+|=====
+| type | field
+| __u64 | dqi_bgrace
+| __u64 | dqi_igrace
+| __u32 | dqi_flags
+| __u32 | dqi_valid
+|=====
+
+obd_ioobj
+^^^^^^^^^
+
+.obd_ioobj
+[options="header"]
+|=====
+| type | field
+| struct ost_id | ioo_oid
+| __u32 | ioo_max_brw
+| __u32 | ioo_bufcnt
+|=====
+
+obd_quotactl
+^^^^^^^^^^^^
+
+.obd_quotactl
+[options="header"]
+|=====
+| type | field
+| __u32 | qc_cmd
+| __u32 | qc_type
+| __u32 | qc_id
+| __u32 | qc_stat
+| struct obd_dqinfo | qc_dqinfo
+| struct obd_dqblk | qc_dqblk
+|=====
+
+obd_statfs
+^^^^^^^^^^
+
+.obd_statfs
+[options="header"]
+|=====
+| type | field
+| __u64 | os_type
+| __u64 | os_blocks
+| __u64 | os_bfree
+| __u64 | os_bavail
+| __u64 | os_files
+| __u64 | os_ffree
+| __u8 | os_fsid[40]
+| __u32 | os_bsize
+| __u32 | os_namelen
+| __u64 | os_maxbytes
+| __u32 | os_state
+| __u32 | os_fprecreated
+
+| __u32 | os_spare2
+| __u32 | os_spare3
+| __u32 | os_spare4
+| __u32 | os_spare5
+| __u32 | os_spare6
+| __u32 | os_spare7
+| __u32 | os_spare8
+| __u32 | os_spare9
+|=====
+
+obd_uuid
+^^^^^^^^
+
+.obd_uuid
+[options="header"]
+|=====
+| type | field
+| char | uuid[UUID_MAX]
+|=====
+
+obdo
+^^^^
+
+.obdo
+[options="header"]
+|=====
+| type | field
+| __u64 | o_valid
+| struct ost_id | o_oi
+| __u64 | o_parent_seq
+| __u64 | o_size
+| __s64 | o_mtime
+| __s64 | o_atime
+| __s64 | o_ctime
+| __u64 | o_blocks
+| __u64 | o_grant
+| __u32 | o_blksize
+| __u32 | o_mode
+| __u32 | o_uid
+| __u32 | o_gid
+| __u32 | o_flags
+| __u32 | o_nlink
+| __u32 | o_parent_oid
+| __u32 | o_misc
+| __u64 | o_ioepoch
+| __u32 | o_stripe_idx
+| __u32 | o_parent_ver
+| struct lustre_handle | o_handle
+| struct llog_cookie | o_lcookie
+| __u32 | o_uid_h
+| __u32 | o_gid_h
+| __u64 | o_data_version
+| __u64 | o_padding_4
+| __u64 | o_padding_5
+| __u64 | o_padding_6
+|=====
+
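The type/field tables in this file map directly onto fixed-size byte
layouts. As a minimal sketch (not taken from the Lustre sources), the
obd_dqinfo and obd_dqblk tables can be written as Python 'struct'
format strings. The little-endian byte order and the helper names
pack_dqinfo and unpack_dqinfo are assumptions made for illustration
only.

```python
import struct

# Field order copied from the obd_dqinfo table: two __u64, then two __u32.
# "<" (little-endian, no implicit padding) is an assumption for illustration.
OBD_DQINFO = struct.Struct("<QQII")

# obd_dqblk: eight __u64 fields followed by two __u32 fields.
OBD_DQBLK = struct.Struct("<8Q2I")


def pack_dqinfo(bgrace, igrace, flags, valid):
    """Hypothetical helper: serialize one obd_dqinfo record."""
    return OBD_DQINFO.pack(bgrace, igrace, flags, valid)


def unpack_dqinfo(buf):
    """Hypothetical helper: return the four fields keyed by field name."""
    names = ("dqi_bgrace", "dqi_igrace", "dqi_flags", "dqi_valid")
    return dict(zip(names, OBD_DQINFO.unpack(buf)))
```

Under these assumptions both layouts come out as a multiple of 8 bytes
(24 and 72), which fits the 8 byte alignment mentioned for the
ptlrpc_body padding.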
+ost_body
+^^^^^^^^
+
+.ost_body
+[options="header"]
+|=====
+| type | field
+| struct obdo | oa
+|=====
+
+ost_id
+^^^^^^
+
+.ost_id
+[options="header"]
+|=====
+| type | field
+| union struct __u64 | oi_id, oi_seq
+| struct lu_fid | oi_fid
+|=====
+
+ptlrpc_body
+^^^^^^^^^^^
+
+Each buffer has additional structure imposed on it, and the first
+buffer always has the format given by a 'ptlrpc_body' structure.
+
+.ptlrpc_body
+[options="header"]
+|=====
+| type | field
+| struct lustre_handle | pb_handle
+| __u32 | pb_type
+| __u32 | pb_version
+| __u32 | pb_opc
+| __u32 | pb_status
+| __u64 | pb_last_xid
+| __u64 | pb_last_seen
+| __u64 | pb_last_committed
+| __u64 | pb_transno
+| __u32 | pb_flags
+| __u32 | pb_op_flags
+| __u32 | pb_conn_cnt
+| __u32 | pb_timeout
+| __u32 | pb_service_time
+| __u32 | pb_limit
+| __u64 | pb_slv
+| __u64 | pb_pre_versions[PTLRPC_NUM_VERSIONS]
+| __u64 | pb_padding[4]
+| char | pb_jobid[LUSTRE_JOBID_SIZE]
+|=====
+
+A 'struct lustre_handle' contains a single 64-bit field called 'cookie'
+that ...
+
+The semantics of each field may differ between request messages
+and replies.
+
+'pb_handle' is a 64-bit value that uniquely identifies the state
+shared between a sender and a receiver. When communication is
+initiated, as in a "connect" message (e.g. MDS_CONNECT, from a client
+to a server), the value will be 0. A reply (from the server back to
+the client) to this message will contain a value (a "cookie") to
+identify the shared state information (the "export") for the client
+that is maintained on the server. The client will then associate this
+cookie with the shared state information (the "import") that it
+maintains about the server. Subsequent messages between this client
+and this server will refer to the same shared state by using this
+cookie as the handle in this field.
+
+'pb_type' is one of the three message types PTL_RPC_MSG_REQUEST,
+PTL_RPC_MSG_ERR, or PTL_RPC_MSG_REPLY. As one might expect, "request"
+and "reply" are the two usual message types, one for initiating an
+exchange and the other for completing it. The "err" message type is
+only for responding to a PtlRPC message that failed to be interpreted
+as an actual message. Note that other errors, such as those that
+emerge from processing the actual message content, do not use the
+PTL_RPC_MSG_ERR symbol.
+
+'pb_version' is a field that encodes the Lustre protocol version
+in combination ('or'-ed) with one of the service type versions. One of:
+ LUSTRE_OBD_VERSION
+ LUSTRE_MDS_VERSION
+ LUSTRE_OST_VERSION
+ LUSTRE_DLM_VERSION
+ LUSTRE_LOG_VERSION
+ LUSTRE_MGS_VERSION
+What exactly is the significance of these?
+
+'pb_opc' gives the actual operation that is the subject of this
+PtlRPC. There is a long list of such "op codes". List a few.
+
+'pb_status' returns a status code or error code (e.g. "permission
+denied") when the processing of the particular pb_opc failed. This is
+one of two ways to report an error. The other applies when an RPC
+could not even be interpreted, which results in
+pb_type=PTL_RPC_MSG_ERR. A value of zero signifies that the request
+was successfully executed. Note that for operations that modify the
+file system this indicates the operation has been initiated, not
+necessarily completed (cf. pb_last_committed). The actual status values
+are consistent with standard Linux kernel (POSIX) error codes
+(e.g. ENOENT). This field is always zero in requests.
+
+'pb_last_xid' is not used.
+
+'pb_last_seen' is not used.
+
+'pb_last_committed' is the highest transaction number that has been
+committed to storage. Transaction numbers are maintained on a
+per-target basis, and each such sequence is monotonically
+increasing. This field is only set in reply messages and can accompany
+any kind of message, including pings and non-modifying transactions.
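
The interplay of 'pb_type', 'pb_status', 'pb_transno' and
'pb_last_committed' described above can be sketched as follows. This
is an illustration only, not code from Lustre: the numeric message
type values and the use of the magnitude of pb_status as an errno are
assumptions to be checked against lustre_idl.h, and describe_reply
and is_committed are hypothetical helper names.

```python
import errno

# Assumed numeric values; verify against lustre_idl.h before relying on them.
PTL_RPC_MSG_REQUEST = 4711
PTL_RPC_MSG_ERR = 4712
PTL_RPC_MSG_REPLY = 4713


def describe_reply(pb_type, pb_status):
    """Hypothetical helper: summarize the outcome carried by a reply."""
    if pb_type == PTL_RPC_MSG_ERR:
        # The request itself could not be interpreted as a message.
        return "request could not be interpreted"
    if pb_type != PTL_RPC_MSG_REPLY:
        return "not a reply"
    if pb_status == 0:
        return "ok"
    # Assumption: the magnitude of pb_status is a POSIX errno value.
    code = abs(pb_status)
    return "failed: " + errno.errorcode.get(code, "errno %d" % code)


def is_committed(pb_transno, pb_last_committed):
    """True once a modifying operation is known to be on storage.

    Both numbers must come from the same target, since transaction
    numbers are maintained per target.
    """
    return pb_transno != 0 and pb_transno <= pb_last_committed
```

A reply with pb_status of zero and a pb_transno larger than
pb_last_committed therefore reports an operation that has been
initiated but is not yet known to be committed.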
+
+'pb_transno' is the server-assigned 64-bit number given to any
+operation that modifies the file system. It is unique for all time
+for each target on that server. It is zero for any message that
+does not modify the file system.
+
+'pb_flags' is one among:
+MSG_LAST_REPLAY
+MSG_RESENT
+MSG_REPLAY
+MSG_DELAY_REPLAY
+MSG_VERSION_REPLAY
+MSG_REQ_REPLAY_DONE
+MSG_LOCK_REPLAY_DONE
+
+
+'pb_op_flags' is one among:
+
+MSG_CONNECT_RECOVERING  0x00000001
+MSG_CONNECT_RECONNECT   0x00000002
+MSG_CONNECT_REPLAYABLE  0x00000004
+MSG_CONNECT_LIBCLIENT   0x00000010
+MSG_CONNECT_INITIAL     0x00000020
+MSG_CONNECT_ASYNC       0x00000040
+MSG_CONNECT_NEXT_VER    0x00000080 /* use next version of lustre_msg */
+MSG_CONNECT_TRANSNO     0x00000100 /* report transno */
+
+
+'pb_conn_cnt' is a monotonically increasing number that identifies to
+the server the connection era for the client that was current when the
+message was constructed. The era for a client is the portion of the
+shared state that reflects its connection count. This count is
+initialized to one at the first connection, and subsequent eviction and
+reconnect events increment it. This enables the server to
+discard requests from clients whose era has expired.
+
+'pb_timeout' tells how long the client is willing to wait for its
+specific reply message. In the reply, it signifies how long the
+service is estimated to take for this type of request (op
+code). There are multiple request queues, called "portals". The
+server may send an "early reply" for the express purpose of extending
+the client's timeout. Such an "early reply" will still be followed by
+the actual reply.
+
+'pb_service_time' is how long this particular operation actually took
+from the time it first arrived in the request queue to the time the
+server replied. Note that the client can use this value and the local
+elapsed time to calculate network latency.
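
A few of the fields above lend themselves to a short worked example.
The sketch below is illustrative rather than Lustre code. The bit
values are copied from the pb_op_flags list. Because those values are
distinct bits, the sketch assumes they may be 'or'-ed together. It
also assumes that pb_service_time and the client's elapsed time are
expressed in the same unit. The helper names are hypothetical.

```python
# Bit values copied from the pb_op_flags list.
MSG_CONNECT_FLAGS = {
    0x00000001: "MSG_CONNECT_RECOVERING",
    0x00000002: "MSG_CONNECT_RECONNECT",
    0x00000004: "MSG_CONNECT_REPLAYABLE",
    0x00000010: "MSG_CONNECT_LIBCLIENT",
    0x00000020: "MSG_CONNECT_INITIAL",
    0x00000040: "MSG_CONNECT_ASYNC",
    0x00000080: "MSG_CONNECT_NEXT_VER",
    0x00000100: "MSG_CONNECT_TRANSNO",
}


def decode_op_flags(pb_op_flags):
    """Return the names of all bits set in pb_op_flags, lowest bit first."""
    return [name for bit, name in sorted(MSG_CONNECT_FLAGS.items())
            if pb_op_flags & bit]


def is_stale(msg_conn_cnt, current_conn_cnt):
    """Server-side check: was the message built in an expired era?"""
    return msg_conn_cnt < current_conn_cnt


def network_latency(elapsed, pb_service_time):
    """Client-side estimate: round-trip time not spent in the service.

    Clamped at zero in case the two clocks disagree slightly.
    """
    return max(0, elapsed - pb_service_time)
```

For instance, a client that waited 7 time units for a reply carrying
a pb_service_time of 5 would attribute the remaining 2 units to the
network.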
+
+'pb_limit' is a value, in a reply message, sent from a lock server to
+a client to set the maximum number of locks available to the
+client. When dynamic lock LRUs are enabled this allows for managing
+their sizes.
+
+'pb_slv' is the "server lock volume" which is the product of the
+number of locks and their age. It is used to estimate the lock traffic
+load. In the reply it is this client's share of the total lock
+load on the server. It is prescriptive.
+
+'pb_pre_versions[PTLRPC_NUM_VERSIONS]' has up to four entries
+(PTLRPC_NUM_VERSIONS = 4). The values are sent in reply messages. Each
+entry returns the previous version of an object modified by this
+operation. The version being communicated is the transaction number
+(pb_transno) of the request that last modified that object.
+
+'pb_padding[4]' is reserved for future use and must also respect the
+8 byte alignment requirement.
+
+'pb_jobid[LUSTRE_JOBID_SIZE]' gives a unique identifier associated with
+the process on behalf of which this message was generated. The
+identifier is assigned to the user process by a job scheduler, if any.
+
+
+
+quota_body
+^^^^^^^^^^
+
+.quota_body
+[options="header"]
+|=====
+| type | field
+| struct lu_fid | qb_fid
+| union lquota_id | qb_id
+| __u32 | qb_flags
+| __u32 | qb_padding
+| __u64 | qb_count
+| __u64 | qb_usage
+| __u64 | qb_slv_ver
+| struct lustre_handle | qb_lockh
+| struct lustre_handle | qb_glb_lockh
+| __u64 | qb_padding1[4]
+|=====
--
1.8.3.1