Setattr
~~~~~~~

The 'setattr' VFS method is used to modify the attributes associated with a resource (it is an inode operation). The attributes are the same ones returned by a 'stat' operation: mode, uid, gid, size, atime, ctime, and mtime.

Changing the File Mode Attribute
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If only the file 'mode' is being modified (a 'chmod' command, for instance) then the interaction is relatively simple as shown in <>.

.Setattr RPCs for Changing the Resource's Mode
[[chmod-rpcs]]
image::chmod_rpcs.png["setattr RPCs for changing mode",height=100]

//////////////////////////////////////////////////////////////////////
The chmod_rpcs.png diagram resembles this text art:

        Time
        Step Client            MDT             OST
             -------         -------         -------
        1    MDS_REINT------->
        2           <-------MDS_REINT
//////////////////////////////////////////////////////////////////////

*1 - The client issues an MDS_REINT with the REINT_SETATTR sub-operation.*

In addition to the 'ptlrpc_body' (Lustre RPC descriptor), the MDS_REINT request RPC from the client has the REINT structure 'mdt_rec_setattr' and a lock request 'ldlm_request'. For a detailed discussion of all the fields in the 'mdt_rec_setattr' and 'ldlm_request' refer to <> and <>.

.MDS_REINT:REINT_SETATTR Request Packet Structure
image::mds-reint-setattr-request.png["MDS_REINT:REINT_SETATTR Request Packet Structure",height=50]

//////////////////////////////////////////////////////////////////////
The mds-reint-setattr-request.png diagram resembles this text art:

        MDS_REINT:
        --REINT_SETATTR-request-------------------------
        | ptlrpc_body | mdt_rec_setattr | ldlm_request |
        ------------------------------------------------
//////////////////////////////////////////////////////////////////////

See <>.

In this case the 'setattr' wants to set the mode attribute on the resource.
The 'mdt_rec_setattr' identifies the resource with the 'sa_fid' field, and the 'sa_valid' field is set to 0x2041:

.Flags for 'sa_valid' field of 'struct mdt_rec_setattr'
[options="header"]
|====
| Flag               | Meaning
| MDS_ATTR_MODE      | mode attribute
| MDS_ATTR_CTIME     | ctime attribute
| MDS_ATTR_CTIME_SET | ctime is being set
|====

So the 'ctime' is also updated on the MDT. The mode and time values are put in the corresponding fields of the 'mdt_rec_setattr', and the other attribute fields will be ignored.

The 'ldlm_request' structure carries an early lock cancellation (see <>) on the lock that the client had previously acquired for the target resource. The lock handle identifies this lock. Only 'lock_count' and 'lock_handle' are used, and the rest of the 'ldlm_request' is cleared, i.e. all fields set to zero.

*2 - The MDS_REINT reply acknowledges the updated attributes.*

In addition to the 'ptlrpc_body' (Lustre RPC descriptor), the MDS_REINT reply RPC to the client has the 'mdt_body' structure. For a detailed discussion of the fields in the 'mdt_body' refer to <>.
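
Returning to the 'sa_valid' mask in the request of step 1: because it is a bitmask, a value such as 0x2041 can be checked by decoding it flag by flag. The Python sketch below does that. The numeric bit values are an assumption (recalled from Lustre's 'lustre_idl.h', and to be verified against the source tree in use), and the helper name 'decode_sa_valid' is purely illustrative.

```python
# Assumed MDS_ATTR_* bit values; verify against lustre_idl.h in your tree.
MDS_ATTR_FLAGS = {
    "MDS_ATTR_MODE": 0x1,
    "MDS_ATTR_UID": 0x2,
    "MDS_ATTR_GID": 0x4,
    "MDS_ATTR_SIZE": 0x8,
    "MDS_ATTR_ATIME": 0x10,
    "MDS_ATTR_MTIME": 0x20,
    "MDS_ATTR_CTIME": 0x40,
    "MDS_ATTR_ATIME_SET": 0x80,
    "MDS_ATTR_MTIME_SET": 0x100,
    "MDS_ATTR_CTIME_SET": 0x2000,
}


def decode_sa_valid(sa_valid):
    """Return (names of known flags that are set, bits not in the table)."""
    names = [name for name, bit in MDS_ATTR_FLAGS.items() if sa_valid & bit]
    known = 0
    for bit in MDS_ATTR_FLAGS.values():
        known |= bit
    return names, sa_valid & ~known


# Decode the value sent for a chmod.
names, unknown = decode_sa_valid(0x2041)
print(names, hex(unknown))
```

The second return value collects any bits that the assumed table does not name, which makes it easy to spot a flag that is set on the wire but missing from a documentation table. The same helper can be applied to the other 'sa_valid' values quoted later in this section.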

.MDS_REINT:REINT_SETATTR Reply Packet Structure
image::mds-reint-setattr-reply.png["MDS_REINT:REINT_SETATTR Reply Packet Structure",height=50]

//////////////////////////////////////////////////////////////////////
The mds-reint-setattr-reply.png diagram resembles this text art:

        --REINT_SETATTR-reply-----
        | ptlrpc_body | mdt_body |
        --------------------------
//////////////////////////////////////////////////////////////////////

The reply from the MDT after the setattr operation has these valid 'mdt_body' fields:

.Flags for 'mbo_valid' field of 'struct mdt_body'
[options="header"]
|====
| Flag            | Meaning
| OBD_MD_FLID     | FID
| OBD_MD_FLMTIME  | mtime attribute
| OBD_MD_FLSIZE   | size attribute
| OBD_MD_FLBLOCKS | blocks attribute
| OBD_MD_FLBLKSZ  | block size attribute
| OBD_MD_FLTYPE   | type attribute
| OBD_MD_FLNLINK  | number of links attribute
| OBD_MD_FLRDEV   | device attribute
|====

The reply thus brings the client up to date with whatever else the MDT knows about the resource after the attributes were set at the client's request.

Changing the File Time Attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The RPCs sent for a 'setattr' depend on which values are being set. If the time values are being set (as in a "touch" command) then the MDS_REINT with the REINT_SETATTR sub-operation, which updates the time values on the MDT, is followed by additional RPCs. An OST_SETATTR sets the time values on the OST (or OSTs if there are several), but in order to know which OSTs to contact the client must first get the layout of the resource. Only then can it send the OST_SETATTR RPC to the appropriate OSTs and update the time attributes.
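
The ordering just described can be written down as a small data model. The Python sketch below is illustrative only: the function name 'touch_rpc_sequence', the OST names, and the tuple layout are hypothetical. It lists the request/reply pairs in the order used in this section (MDS_REINT, then an LDLM_ENQUEUE for the layout, then OST_SETATTR), with one OST_SETATTR exchange per OST. Serializing the per-OST exchanges is a simplification made for the sketch, not something the text asserts.

```python
def touch_rpc_sequence(ost_names):
    """List (sender, receiver, rpc) messages for a time-attribute setattr.

    Hypothetical model: update the MDT, fetch the layout under a layout
    lock, then update every OST named in ost_names.
    """
    steps = [
        ("client", "MDT", "MDS_REINT request (REINT_SETATTR)"),
        ("MDT", "client", "MDS_REINT reply"),
        ("client", "MDT", "LDLM_ENQUEUE request (layout intent)"),
        ("MDT", "client", "LDLM_ENQUEUE reply (layout)"),
    ]
    for ost in ost_names:
        steps.append(("client", ost, "OST_SETATTR request"))
        steps.append((ost, "client", "OST_SETATTR reply"))
    return steps


# Print the numbered steps for a file striped over a single OST.
for number, (sender, receiver, rpc) in enumerate(touch_rpc_sequence(["OST0"]), 1):
    print(number, sender, "->", receiver, rpc)
```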

.Setattr RPCs for Changing the Resource's Time Attributes
[[touch-rpcs]]
image::touch_rpcs.png["setattr RPCs for the time attributes",height=200]

//////////////////////////////////////////////////////////////////////
The touch_rpcs.png diagram resembles this text art:

        Time
        Step Client            MDT             OST
             -------         -------         -------
        1    MDS_REINT------->
        2           <-------MDS_REINT
        3    LDLM_ENQUEUE---->
        4           <-------LDLM_ENQUEUE
        5    OST_SETATTR------------------>
        6           <--------------------OST_SETATTR
//////////////////////////////////////////////////////////////////////

*1 - The client issues an MDS_REINT with the REINT_SETATTR sub-operation.*

See <>.

The MDS_REINT request RPC closely resembles the one described above, but in this case the 'setattr' wants to set the time attributes on the resource. The 'mdt_rec_setattr' again identifies the resource with the 'sa_fid' field, and the 'sa_valid' field is set to 0x21f0:

.Flags for 'sa_valid' field of 'struct mdt_rec_setattr'
[options="header"]
|====
| Flag               | Meaning
| MDS_ATTR_ATIME     | atime attribute
| MDS_ATTR_MTIME     | mtime attribute
| MDS_ATTR_CTIME     | ctime attribute
| MDS_ATTR_ATIME_SET | atime is being set
| MDS_ATTR_MTIME_SET | mtime is being set
| MDS_ATTR_CTIME_SET | ctime is being set
|====

The time values are put in the corresponding fields of the 'mdt_rec_setattr', and the other attribute fields will be ignored. There is again an early lock cancellation, since the client knows it no longer needs to hold a lock on the MDT resource attributes.

*2 - The MDS_REINT reply acknowledges the updated times.*

The MDS_REINT reply is identical to the previous case in every way, including which valid attributes it echoes back.

*3 - The client asks for an intent lock on the layout data for the resource.*

Before communicating with the OSTs the client needs to know which ones are involved with this resource, and before it can ask for that 'layout' information it must acquire a 'layout lock'.
The LDLM_ENQUEUE RPC in this case has (in addition to the 'ptlrpc_body' structure) an 'ldlm_request', an 'ldlm_intent', and a 'layout_intent'.

.LDLM_ENQUEUE Intent:Layout Request Packet Structure
image::ldlm-enqueue-intent-layout-request.png["LDLM_ENQUEUE Intent:Layout request Packet Structure",height=50]

//////////////////////////////////////////////////////////////////////
The ldlm-enqueue-intent-layout-request.png diagram resembles this text art:

        LDLM_ENQUEUE:
        --intent:layout request-------------------------------------
        | ptlrpc_body | ldlm_request | ldlm_intent | layout_intent |
        ------------------------------------------------------------
//////////////////////////////////////////////////////////////////////

See <>.

The 'ldlm_request' asks for a read lock on the resource and has its intent flag set. The 'ldlm_intent' has the intent opcode 0x800 (IT_LAYOUT). The 'layout_intent' has the 'li_opc' value 0 (LAYOUT_INTENT_ACCESS).

*4 - The MDS replies with a read lock on the layout.*

The LDLM_ENQUEUE reply that the MDS sends back grants the read lock on the layout and provides a Lock Value Block (LVB) describing the layout of the resource. That layout is from the extended attribute 'trusted.lov' and has the structure 'lov_mds_md'.

.LDLM_ENQUEUE Intent:Layout Reply Packet Structure
image::ldlm-enqueue-intent-layout-reply.png["LDLM_ENQUEUE Intent:Layout reply Packet Structure",height=50]

//////////////////////////////////////////////////////////////////////
The ldlm-enqueue-intent-layout-reply.png diagram resembles this text art:

        LDLM_ENQUEUE:
        --intent:layout reply--------------------
        | ptlrpc_body | ldlm_reply | lov_mds_md |
        -----------------------------------------
//////////////////////////////////////////////////////////////////////

*5 - The client issues an OST_SETATTR with the updated times, which are maintained on the OST.*

At last the client can send an update to the OST. The OST_SETATTR RPC has an 'ost_body' structure.

.OST_SETATTR Request Packet Structure
image::ost-setattr-request.png["OST_SETATTR Request Packet Structure",height=50]

//////////////////////////////////////////////////////////////////////
The ost-setattr-request.png diagram resembles this text art:

        OST_SETATTR:
        --request-----------------
        | ptlrpc_body | ost_body |
        --------------------------
//////////////////////////////////////////////////////////////////////

See <>. The 'ost_body' structure is documented in the <> section. In this case the 'o_valid' field is 0x300400f, so the valid fields are given by:

.Flags for 'o_valid' field of 'struct ost_body'
[options="header"]
|====
| Flag           | Meaning
| OBD_MD_FLID    | OST ID
| OBD_MD_FLATIME | atime attribute
| OBD_MD_FLMTIME | mtime attribute
| OBD_MD_FLCTIME | ctime attribute
| OBD_MD_FLGENER | generation
| OBD_MD_FLGROUP | group
| OBD_MD_FLFID   |
|====

*6 - The OST acknowledges the update.*

The reply RPC for the OST_SETATTR operation has the same form as the request.

.OST_SETATTR Reply Packet Structure
image::ost-setattr-reply.png["OST_SETATTR Reply Packet Structure",height=50]

//////////////////////////////////////////////////////////////////////
The ost-setattr-reply.png diagram resembles this text art:

        OST_SETATTR:
        --reply-------------------
        | ptlrpc_body | ost_body |
        --------------------------
//////////////////////////////////////////////////////////////////////

The OST_SETATTR reply acknowledges the update and sends back an 'o_valid' of 0x10007bf, which indicates the fields:

.Flags for 'o_valid' field of 'struct ost_body'
[options="header"]
|====
| Flag            | Meaning
| OBD_MD_FLID     | OST ID
| OBD_MD_FLATIME  | atime attribute
| OBD_MD_FLMTIME  | mtime attribute
| OBD_MD_FLCTIME  | ctime attribute
| OBD_MD_FLSIZE   | size attribute
| OBD_MD_FLBLOCKS | blocks attribute
| OBD_MD_FLMODE   | mode attribute
| OBD_MD_FLTYPE   | type attribute
| OBD_MD_FLUID    | UID attribute
| OBD_MD_FLGID    | GID attribute
| OBD_MD_FLGROUP  | group
|====

Changing the Size Attribute
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the size is being set (as in a "truncate" command) then the client (Client1) will issue an LDLM_ENQUEUE to the OST for a write lock on the extent attributes of the resource. If another client (Client2) holds a lock on the resource, then before the OST can grant the lock to Client1 it has to interact with Client2. The OST sends an LDLM_BL_CALLBACK request to Client2 asking Client2 to finish up with the lock it has. Client2 replies with a simple acknowledgment. When Client2 is no longer using the lock it sends an LDLM_CANCEL RPC to the OST. At that point the OST grants the original request, sending an LDLM_CP_CALLBACK request to Client1 to notify it. With that taken care of, Client1 is finally able to issue the OST_PUNCH request that actually modifies the size attribute of the affected resources. Meanwhile, the OST also replies to Client2, acknowledging its LDLM_CANCEL.

.Setattr RPCs for Changing the Resource's Size Attribute
[[truncate-rpcs]]
image::truncate_rpcs.png["setattr RPCs for the size attribute",height=250]

//////////////////////////////////////////////////////////////////////
The truncate_rpcs.png diagram resembles this text art:

        Time
        Step Client1           MDT             OST           Client2
             -------         -------         -------         -------
        1    MDS_REINT------->
        2           <-------MDS_REINT
        3    LDLM_ENQUEUE----------------->
        4                                    LDLM_BL_CALLBACK---->
        5                                       <----LDLM_BL_CALLBACK
        6           <-----------------LDLM_ENQUEUE
        7                                       <--------LDLM_CANCEL
        8           <-----------LDLM_CP_CALLBACK
        9    LDLM_CP_CALLBACK----------->
        10   OST_PUNCH-------------------->
        11                                   LDLM_CANCEL------>
        12          <--------------------OST_PUNCH
//////////////////////////////////////////////////////////////////////

*1 - The client issues an MDS_REINT with the REINT_SETATTR sub-operation.*

See <>.

The MDS_REINT request RPC closely resembles the one described above, but in this case the 'setattr' wants to modify the size attribute on the resource.
The 'mdt_rec_setattr' again identifies the resource with the 'sa_fid' field, and the 'sa_valid' field is set to 0x2002168:

.Flags for 'sa_valid' field of 'struct mdt_rec_setattr'
[options="header"]
|====
| Flag               | Meaning
| MDS_ATTR_SIZE      | size attribute
| MDS_ATTR_MTIME     | mtime attribute
| MDS_ATTR_CTIME     | ctime attribute
| MDS_ATTR_MTIME_SET | mtime is being set
| MDS_ATTR_CTIME_SET | ctime is being set
|====

The size and time values are put in the corresponding fields of the 'mdt_rec_setattr', and the other attribute fields will be ignored. There is again an 'ldlm_request' structure in the RPC, but in this case it is empty (all fields set to zero), so there is no early lock cancellation.

*2 - The MDS_REINT reply acknowledges the updated attributes.*

The MDS_REINT reply is identical to the previous cases in every way, including which valid attributes it echoes back.

*3 - The client asks the OST for a write lock of type LDLM_EXTENT.*

See <>.

The 'ldlm_request' asks for a write lock with the lock descriptor resource's type set to LDLM_EXTENT, the policy data covering the whole file, and the lock handle set to identify this request. The rest of the lock request is blank (zeroes). The RPC resembles the simplest request form in <>.

*4 - The OST contacts Client2 to ask for the return of the lock.*

The LDLM_BL_CALLBACK is initiated by the OST and sent to Client2, identifying the resource in question. The content of the 'ldlm_request' is otherwise identical to the one sent from Client1 to the OST ('l_req_mode' == LCK_PW, 'l_granted_mode' == LCK_MINMODE).

.LDLM_BL_CALLBACK Request Packet Structure
image::ldlm-bl-callback-request.png["LDLM_BL_CALLBACK Request Packet Structure",height=50]

//////////////////////////////////////////////////////////////////////
The ldlm-bl-callback-request.png diagram resembles this text art:

        LDLM_BL_CALLBACK:
        --request---------------------
        | ptlrpc_body | ldlm_request |
        ------------------------------
//////////////////////////////////////////////////////////////////////

See <>.

*5 - Client2 acknowledges the request.*

The LDLM_BL_CALLBACK reply is an "empty" RPC in that it only has the LDLM_BL_CALLBACK opcode and no other content beyond the 'ptlrpc_body'.

.LDLM_BL_CALLBACK Reply Packet Structure
image::ldlm-bl-callback-reply.png["LDLM_BL_CALLBACK Reply Packet Structure",height=50]

//////////////////////////////////////////////////////////////////////
The ldlm-bl-callback-reply.png diagram resembles this text art:

        LDLM_BL_CALLBACK:
        --reply--------
        | ptlrpc_body |
        ---------------
//////////////////////////////////////////////////////////////////////

Its effect is to notify the OST that Client2 has received the callback. The lock itself is given up later, when Client2 sends the LDLM_CANCEL in step 7.

*6 - The OST replies acknowledging the lock request.*

The 'ldlm_reply' lock descriptor acknowledges the request for an extent write lock without granting it ('l_req_mode' == LCK_PW, 'l_granted_mode' == LCK_MINMODE, 'lock_flags' == 0x2 == LDLM_FL_BLOCK_GRANTED). The lock is not granted because it is blocked. Additional attribute data accompanies the LDLM_ENQUEUE reply to tell the client about the resource attributes on the OST.

.LDLM_ENQUEUE Extent LVB Reply Packet Structure
image::ldlm-enqueue-extent-lvb-reply.png["LDLM_ENQUEUE Extent LVB reply Packet Structure",height=50]

//////////////////////////////////////////////////////////////////////
The ldlm-enqueue-extent-lvb-reply.png diagram resembles this text art:

        LDLM_ENQUEUE:
        --extent lvb reply--------------------
        | ptlrpc_body | ldlm_reply | ost_lvb |
        --------------------------------------
//////////////////////////////////////////////////////////////////////

*7 - Client2 cancels its lock.*

Having received an LDLM_BL_CALLBACK, Client2 must finish up with its lock. Once it does, it sends an LDLM_CANCEL request to the OST to signal that it is done.

.LDLM_CANCEL Request Packet Structure
image::ldlm-cancel-request.png["LDLM_CANCEL Request Packet Structure",height=50]

//////////////////////////////////////////////////////////////////////
The ldlm-cancel-request.png diagram resembles this text art:

        LDLM_CANCEL:
        --request---------------------
        | ptlrpc_body | ldlm_request |
        ------------------------------
//////////////////////////////////////////////////////////////////////

See <>.

The 'ldlm_request' indicates which lock is being canceled in its (first) 'lock_handle' field. The OST then looks for anyone else waiting on that lock, which it finds is Client1. It waits to reply to Client2 with an LDLM_CANCEL reply until after it has notified Client1.

*8 - The OST notifies Client1 that it now has the lock.*

The 'ldlm_request' structure now has the granted mode set to protected write. The OST also sends along any updated attributes, as would be needed, for example, if Client2 had flushed its dirty write cache.

.LDLM_CP_CALLBACK Request Packet Structure
image::ldlm-cp-callback-request.png["LDLM_CP_CALLBACK Request Packet Structure",height=50]

//////////////////////////////////////////////////////////////////////
The ldlm-cp-callback-request.png diagram resembles this text art:

        LDLM_CP_CALLBACK:
        --request-------------------------------
        | ptlrpc_body | ldlm_request | ost_lvb |
        ----------------------------------------
//////////////////////////////////////////////////////////////////////

See <>.

*9 - Client1 acknowledges the lock update.*

The reply is "empty" in this case as well. The opcode in the 'ptlrpc_body' is sufficient to inform the OST that Client1 got its lock update.

.LDLM_CP_CALLBACK Reply Packet Structure
image::ldlm-cp-callback-reply.png["LDLM_CP_CALLBACK Reply Packet Structure",height=50]

//////////////////////////////////////////////////////////////////////
The ldlm-cp-callback-reply.png diagram resembles this text art:

        LDLM_CP_CALLBACK:
        --reply--------
        | ptlrpc_body |
        ---------------
//////////////////////////////////////////////////////////////////////

*10 - Client1 issues an OST_PUNCH request.*

As with the OST_SETATTR RPC there is an 'ost_body' structure.

.OST_PUNCH Request Packet Structure
image::ost-punch-request.png["OST_PUNCH Request Packet Structure",height=50]

//////////////////////////////////////////////////////////////////////
The ost-punch-request.png diagram resembles this text art:

        OST_PUNCH:
        --request-----------------
        | ptlrpc_body | ost_body |
        --------------------------
//////////////////////////////////////////////////////////////////////

See <>.

In this case the 'o_valid' field is 0x30403d:

.Flags for 'o_valid' field of 'struct ost_body'
[options="header"]
|====
| Flag            | Meaning
| OBD_MD_FLID     | OST ID
| OBD_MD_FLMTIME  | mtime attribute
| OBD_MD_FLCTIME  | ctime attribute
| OBD_MD_FLSIZE   | size attribute
| OBD_MD_FLBLOCKS | blocks attribute
| OBD_MD_FLGENER  | generation
| OBD_MD_FLCKSUM  | checksum
| OBD_MD_FLQOS    | quality of service
|====

*11 - The OST acknowledges the LDLM_CANCEL (step 7) from Client2.*

The OST finishes up with the lock cancel (after having notified Client1) by replying to Client2. This happens asynchronously with respect to the arrival of the OST_PUNCH request. In <> it is shown occurring after the OST_PUNCH request, but that ordering is not required.

.LDLM_CANCEL Reply Packet Structure
image::ldlm-cancel-reply.png["LDLM_CANCEL Reply Packet Structure",height=50]

//////////////////////////////////////////////////////////////////////
The ldlm-cancel-reply.png diagram resembles this text art:

        LDLM_CANCEL:
        --reply--------
        | ptlrpc_body |
        ---------------
//////////////////////////////////////////////////////////////////////

The LDLM_CANCEL reply is a so-called "empty" RPC. Its only purpose is to acknowledge receipt of the LDLM_CANCEL request.

*12 - The OST sends an OST_PUNCH reply.*

The OST_PUNCH reply also resembles the OST_SETATTR reply:

.OST_PUNCH Reply Packet Structure
image::ost-punch-reply.png["OST_PUNCH Reply Packet Structure",height=50]

//////////////////////////////////////////////////////////////////////
The ost-punch-reply.png diagram resembles this text art:

        OST_PUNCH:
        --reply-------------------
        | ptlrpc_body | ost_body |
        --------------------------
//////////////////////////////////////////////////////////////////////

The 'o_valid' field is 0x1, so only the 'o_id' field is interpreted. The reply just acknowledges that the requested change has been made.
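
The 'o_valid' values quoted in this section for 'ost_body' (0x300400f, 0x10007bf, 0x30403d and 0x1) can be cross-checked against their flag tables by decoding them. The Python sketch below does so. As before, the bit values are an assumption (recalled from Lustre's 'lustre_idl.h' and to be verified against the tree in use), and 'decode_o_valid' is a hypothetical helper name.

```python
# Assumed OBD_MD_* bit values; verify against lustre_idl.h in your tree.
OBD_MD_FLAGS = {
    "OBD_MD_FLID": 0x1,
    "OBD_MD_FLATIME": 0x2,
    "OBD_MD_FLMTIME": 0x4,
    "OBD_MD_FLCTIME": 0x8,
    "OBD_MD_FLSIZE": 0x10,
    "OBD_MD_FLBLOCKS": 0x20,
    "OBD_MD_FLMODE": 0x80,
    "OBD_MD_FLTYPE": 0x100,
    "OBD_MD_FLUID": 0x200,
    "OBD_MD_FLGID": 0x400,
    "OBD_MD_FLGENER": 0x4000,
    "OBD_MD_FLCKSUM": 0x100000,
    "OBD_MD_FLQOS": 0x200000,
    "OBD_MD_FLGROUP": 0x1000000,
    "OBD_MD_FLFID": 0x2000000,
}


def decode_o_valid(o_valid):
    """Return the names of the known OBD_MD_* flags set in o_valid."""
    return [name for name, bit in OBD_MD_FLAGS.items() if o_valid & bit]


# Decode each o_valid value quoted in this section.
for label, value in [
    ("OST_SETATTR request", 0x300400F),
    ("OST_SETATTR reply", 0x10007BF),
    ("OST_PUNCH request", 0x30403D),
    ("OST_PUNCH reply", 0x1),
]:
    print(label, hex(value), decode_o_valid(value))
```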