1 #LyX 1.3 created this file. For more info see http://www.lyx.org/
15 \use_numerical_citations 0
16 \paperorientation portrait
19 \paragraph_separation skip
21 \quotes_language english
25 \paperpagestyle default
29 High Level Design of Remote UID/GID Handling
38 From the ERS (Engineering Requirements Spec, formerly Architecture)
Perform uid/gid translation between remote clients and the local user database.
Handle client programs calling the setuid/setgid/setgroups syscalls to get
Handle supplementary group membership.
Various security policies in situations with/without strong authentication
Remote clients may have a different user database from that of the MDS.
The remote ACL issues are addressed by a separate module.
Most of the content of this document has already been described in the Lustre Book.
67 The architecture prescribes a translation mechanism at the MDS: the MDS
68 will translate a locally found uid/gid, which is obtained through the kerberos
72 Functional Specification
75 Determine local/remote clients
79 \begin_inset Quotes eld
83 \begin_inset Quotes erd
86 client is the client node which is supposed to share the same user database
92 \begin_inset Quotes eld
96 \begin_inset Quotes erd
99 client is the client node which is supposed to have different user database
The MDS will be able to determine whether a client node is local or remote
when the client first connects to the MDS, and will send its decision back
to the client.
Later, both the MDS and the client will make different operational decisions according
This remote flag is per client, not per user.
Once the MDS has made the decision, it remains unchanged until the client
leaves the cluster (for example, by unmounting).
The MDS will do many conversions (mostly uid/gid mapping) for users on remote
clients because of the user database mismatch.
Due to the nature of this mismatch, we have to impose some limitations on
users of remote clients compared to users of local clients.
The following sections describe the details.
121 Mapping uid/gid from clients
For a local client, obviously we don't need to do any uid/gid mapping.
For remote clients, we need to translate the uid/gid in each request into one
that exists in the local user database and, vice versa, translate the uid/gid
in each reply into the one in the remote user database.
This translation affects the uids/gids found in the inode as owner/group,
the security context which describes under what uid the MDS is executing,
and in some cases (chown is a good example) the arguments of calls.
Each MDS must have access to a uid-mapping database, which prescribes
which principal from which nid/netid is mapped to which local uid.
The mapping database must be identical on every MDS to get consistent results.
At runtime, when a remote user authenticates with the MDS, the corresponding
mapping entry is read from the on-disk database and cached in the
kernel via an upcall.
Note that the same principal from different clients might be mapped to
different local users, according to the mapping database.
So on each MDS there is a per-client structure which maintains the uid
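The lookup rule described above (which principal from which nid maps to which local uid) can be sketched as follows. This is only an illustration: the entry layout, the names uid_map_entry and uid_map_lookup, and the use of a flat array are assumptions, not the actual on-disk or in-kernel format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping entry: (principal, nid) -> local uid. */
struct uid_map_entry {
    const char *principal;   /* e.g. a Kerberos principal name */
    unsigned long nid;       /* network id of the client */
    unsigned int local_uid;  /* uid in the MDS's user database */
};

/*
 * Find the local uid for a principal coming from a given nid.
 * Returns 0 and stores the uid in *uid_out on success, or -1 when no
 * entry matches (the caller would then deny the user).
 */
static int uid_map_lookup(const struct uid_map_entry *table, size_t n,
                          const char *principal, unsigned long nid,
                          unsigned int *uid_out)
{
    assert(principal != NULL && uid_out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (table[i].nid == nid &&
            strcmp(table[i].principal, principal) == 0) {
            *uid_out = table[i].local_uid;
            return 0;
        }
    }
    return -1;
}
```

Because the key includes the nid, the same principal can resolve to different local uids depending on which client it connects from.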
145 Each remote client must have nllu/nllg installed.
147 \begin_inset Quotes eld
150 Non Local Lustre User
151 \begin_inset Quotes erd
155 \begin_inset Quotes eld
158 Non Local Lustre Group
159 \begin_inset Quotes erd
When a client first mounts a Lustre fileset, it should notify the MDS which
local uid/gid acts as nllu/nllg.
The MDS will translate unrecognized uids/gids to these before sending the reply
Thus, from the client's point of view, files which belong to unauthorized
users are shown as belonging to nllu/nllg.
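A minimal sketch of the reverse translation with the nllu/nllg fallback, assuming the per-client cache is a simple array of id pairs. The names id_pair and reverse_map_id are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cached mapping between a remote id and a local id. */
struct id_pair {
    unsigned int remote_id;  /* uid or gid as seen on the remote client */
    unsigned int local_id;   /* uid or gid in the MDS's user database */
};

/*
 * Translate a local id found in a reply back into the client's id space.
 * Ids without a cached mapping are replaced by the fallback, which would
 * be the client's nllu (for uids) or nllg (for gids).
 */
static unsigned int reverse_map_id(const struct id_pair *cache, size_t n,
                                   unsigned int local_id,
                                   unsigned int fallback)
{
    assert(cache != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cache[i].local_id == local_id)
            return cache[i].remote_id;
    }
    return fallback;
}
```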
171 Lustre security description (LSD)
There is a security configuration database on each MDS, which describes who (uid)
from where (nid/netid) has permission to call setuid/setgid/setgroups.
Later we might add more to it.
The database must be identical on every MDS to get consistent results.
LSD refers to the in-kernel data structure which describes a user's security
It is roughly defined as:
185 struct lustre_sec_desc {
200 /* more security tags added here */
In the future we will add more special security tags to it.
Each LSD entry corresponds to a user in the local user database.
The 'setxid_desc' must be able to describe setuid/setgid/setgroups
permissions for different clients separately.
The LSD cache is populated via an upcall at runtime.
The user-level helper is fed a uid as a parameter, finds this uid's
principal gid and supplementary groups in the local user database,
and finds the setxid permission bits and other security tags in the on-disk security
Each LSD entry has a limited lifetime, and will be flushed out when
The next request from this user will cause the LSD to be populated again,
with up-to-date security settings if they have changed.
The system administrator can also choose to forcibly flush a particular user's LSD.
Every filesystem access request from a client needs to go through checking of
This checking is uid based: for requests coming from a remote client,
the uid is first mapped as described above and then looked up in the LSD.
232 The MDS security context
All kernel-level service threads on the MDS run as root, waiting for
requests from other nodes and providing services.
But for requests that access the filesystem on behalf of a certain user,
those threads must act as that user, running with its identity.
Thus, when such a request comes in, we first collect the identity information
for this user as described above, including uid, gid, etc., then switch the
identity in the process context before actually executing the filesystem operation;
we also need to switch the root directory of the process to the root of the MDS's
After it has finished, we switch back to the original context and prepare for the
Some requests for special services, such as llog handling or special interaction
between MDSs, do not represent any particular user and require keeping
In those situations we do not need to do such a context switch, also user identity
255 Remote client cache flushing
A remote client should realize that the owner information of locally cached
files (e.g. owner, group) has been translated by the server side, and that
some mappings might become stale as time goes on.
For example: a user has newly authenticated, while some cached file which should
be owned by him still shows its owner as
264 \begin_inset Quotes eld
268 \begin_inset Quotes erd
The client must choose the proper time to flush this stale owner information,
to give the user a consistent view.
274 All attribute locks held by clients must be given a revocation callback
275 when a new user connects.
281 Connect rpc from local realm (case 1)
287 Alice sends the first ptlrpc request (MDS_CONNECT) without GSS security
mds_handle() will initialize the per-client structure, clear the remote flag
After the connection succeeds, the MDS sends the remote flag back to the client
for future use on the client side.
299 Connect rpc from local realm (case 2)
Alice, from an MDS local realm, sends the first ptlrpc request (MDS_CONNECT)
with GSS security to the MDS;
The MDS svcgssd determines that it is from a local realm client;
mds_handle() will initialize the per-client structure, clear the remote flag
After the connection succeeds, the MDS sends the remote flag back to the
client for future use on the client side.
320 Connect rpc from remote realm
Alice, from an MDS remote realm, sends the first ptlrpc request (MDS_CONNECT)
with GSS security to the MDS, along with its nllu/nllg id numbers;
The MDS svcgssd determines that it is from a remote realm client;
The mds_handle() logic initializes the per-client structure:
Set the remote flag in it;
Fill in the nllu/nllg ids obtained from the client rpc request;
After the connection succeeds, the MDS sends the remote flag back
to the client for future use on the client side.
345 Filesystem access request
Alice (from a local or remote client) tries to access a file in Lustre.
If Alice is on a remote client, the MDS does uid/gid mapping; otherwise it does nothing.
The MDS obtains the LSD item for Alice.
The MDS performs a permission check based on LSD policies.
The MDS service process switches to this user's context.
The MDS completes the file operation on behalf of Alice.
The MDS service process switches back to the original context.
If Alice is on a remote client, the MDS does uid/gid reverse mapping if needed.
375 Rpc after setuid/setgid/setgroups from local clients
378 Alice calls setuid/setgid/setgroups to change her identity to Bob in local
382 Bob (Alice in fact) tries to access a lustre file which belongs to Bob;
The MDS verifies Bob's permission against the locally cached LSD configuration;
The MDS rejects or accepts the file access request;
391 Rpc after setuid/setgid/setgroups from remote clients
394 Alice calls setuid/setgid/setgroups to change her identity to Bob in remote
398 Bob (Alice in fact) tries to access a lustre file which belongs to Bob;
The MDS finds that Bob is from the remote realm and is in fact not the real Bob;
The MDS rejects the file access request;
407 Update LSD configuration in MDS
The Lustre system administrator wants to update the current LSD options;
The sysadmin uses the lsd update utility, which updates the on-disk security
database and notifies the MDS of the changes to the LSD configuration;
The MDS refreshes the cached LSD info through an upcall.
Bob is able to access the Lustre filesystem.
The sysadmin removes Bob from the MDS's local user database, and flushes the in-kernel
Bob is immediately unable to access the MDS.
Alice on a remote client is mapped to the MDS local user Bob.
Alice is able to access the Lustre filesystem.
The sysadmin removes the mapping
443 \begin_inset Quotes eld
447 \begin_inset Quotes erd
from the mapping database, and flushes the in-kernel mapping entry.
Alice is immediately unable to access the MDS.
457 \begin_inset Quotes eld
461 \begin_inset Quotes erd
464 exist in the mapping database, Alice could reconnect to MDS and then will
468 Revoke a remote user (2)
Alice on a remote client is mapped to the MDS local user Bob.
Alice is able to access the Lustre filesystem.
The sysadmin removes Bob from the MDS's local user database, and flushes the in-kernel
Alice is immediately unable to access the MDS.
485 \begin_inset Quotes eld
489 \begin_inset Quotes erd
492 exist in the mapping database, Alice could reconnect to MDS and then will
496 'ls -l' on remote client
Suppose on a remote client, Alice's principal group is AliceGrp; Bob's principal
There are several files on Lustre: file_1 belongs to Alice:AliceGrp; file_2
belongs to Alice:BobGrp; file_3 belongs to Bob:AliceGrp; file_4 belongs
to Bob:BobGrp; file_5 belongs to Bob:nllg;
Alice runs 'ls -l'; the output is: file_1 belongs to Alice:AliceGrp; file_2
belongs to Alice:nllg; file_3 belongs to nllu:AliceGrp; file_4 belongs
to nllu:nllg; file_5 belongs to nllu:nllg;
Bob has just logged in to the client system and also runs 'ls -l'; the output is: file_1
belongs to Alice:AliceGrp; file_2 belongs to Alice:BobGrp; file_3 belongs
to Bob:AliceGrp; file_4 belongs to Bob:BobGrp; file_5 belongs to Bob:nllg;
Alice runs 'ls -l' again; the output is the same as Bob's list.
Alice logs out, then Bob runs 'ls -l' again; the output is: file_1 belongs
to nllu:nllg; file_2 belongs to nllu:BobGrp; file_3 belongs to Bob:nllg;
file_4 belongs to Bob:BobGrp; file_5 belongs to Bob:nllg;
526 Chown on remote client
The root user on a remote client wants to change the owner of a file to Bob,
while Bob has not yet logged in (authenticated with Lustre).
The MDS cannot find the mapping for the destination uid, so it returns an error.
Bob logs in at that point.
Root runs the same chown again.
The MDS grants the request, no matter what the original owner of this file
546 Chgrp on remote client
Traditional chgrp on a remote client is not allowed, since there is no clear
group id mapping between the local and remote databases,
so the group id on the remote client is not meaningful on the MDS.
When a client mounts, in addition to other parameters, the user needs to supply
the IDs of nllu/nllg on this client, which will be sent to the MDS at connectin
If no nllu/nllg is explicitly supplied, default values will be used.
566 Determine local or remote client
569 Under GSS protection, user could explicitly supply the remote flag during
The MDS makes the decision in the following order:
All permitted connections without GSS security are from local realm clients.
For connections with GSS security, if the user supplied the remote flag during mount,
the MDS grants the flag as requested.
All connections with GSS/local_realm_kerberos are from local realm clients.
All connections with GSS/remote_realm_kerberos are from remote realm clients.
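The decision order above can be sketched as a small function. This is a sketch under assumptions: the names client_is_remote and krb_realm are hypothetical, and the "remote flag supplied at mount" is modelled as a request to be treated as remote.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of the GSS/Kerberos realm check. */
enum krb_realm { REALM_LOCAL, REALM_REMOTE };

/*
 * Decide whether a connecting client is remote, following the order
 * given in the text:
 *   1. no GSS security        -> local
 *   2. GSS + remote flag set  -> remote (flag granted as requested)
 *   3. GSS + local realm      -> local
 *   4. GSS + remote realm     -> remote
 */
static bool client_is_remote(bool has_gss, bool remote_flag_requested,
                             enum krb_realm realm)
{
    assert(realm == REALM_LOCAL || realm == REALM_REMOTE);
    if (!has_gss)
        return false;
    if (remote_flag_requested)
        return true;
    return realm == REALM_REMOTE;
}
```

Note that in this sketch the mount-time flag is ignored for connections without GSS, which matches the first rule taking precedence.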
587 Here we made the assumption that: kerberos's local/remote realm == lustre's
Later we might bring more factors into this decision making.
The GSS/Kerberos module is responsible for reporting whether the initial
connect request has strong security, and whether it comes from a remote Kerberos
On the MDS, the per-client export structure has a flag to indicate local/remote
Accordingly, each client has a similar flag, which is sent back by the MDS
after the initial connection.
603 Handle local rpc request
For each filesystem access request from a client, we get the LSD for this
We then look it up in the cache; if it is not found or already invalid, we issue an upcall
If we finally fail to get the LSD (timeout or error), we simply deny this
After obtaining the LSD, we also check whether the client intends to do setuid/setgid/
If so, we check the permission bits in the LSD; if it is not allowed, we also deny this
The intention of setuid/setgid can be detected by comparing the uid, gid,
fsuid and fsgid sent by the client with the locally authorized uid/gid.
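One possible reading of this comparison is sketched below. The struct req_ids and the two helper names are hypothetical; the sketch assumes that any difference between the ids sent by the client and the authorized ids counts as a setuid or setgid intention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical set of identities carried in a client request. */
struct req_ids {
    unsigned int uid, gid, fsuid, fsgid;
};

/* True if the request uses a uid other than the authorized one. */
static bool wants_setuid(const struct req_ids *r, unsigned int auth_uid)
{
    assert(r != NULL);
    return r->uid != auth_uid || r->fsuid != auth_uid;
}

/* True if the request uses a gid other than the authorized one. */
static bool wants_setgid(const struct req_ids *r, unsigned int auth_gid)
{
    assert(r != NULL);
    return r->gid != auth_gid || r->fsgid != auth_gid;
}
```

When either helper returns true, the corresponding permission bit in the LSD would have to be checked before the request is served.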
If setgroups is permitted: for root we directly use the supplementary
groups array sent by the client; for a normal user we compare the groups sent
by the client with those in the LSD, guaranteeing that the client can only
reduce the array (it cannot add new ids which are not part of the group array in the LSD).
If setgroups is not permitted, we simply use the supplementary group array
After the whole security context has been prepared as above, we switch it into
the process context and perform the actual filesystem operation.
After it has finished, we switch back to the original context
and send the reply to the client.
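The supplementary-group rule described above can be sketched as follows. The function name effective_groups is hypothetical, and the sketch assumes that ids a normal user tries to add are silently dropped rather than causing the request to be denied; the text does not say which of the two is intended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if gid is present in set[0..n). */
static bool gid_in(const unsigned int *set, size_t n, unsigned int gid)
{
    for (size_t i = 0; i < n; i++) {
        if (set[i] == gid)
            return true;
    }
    return false;
}

/*
 * Compute the supplementary groups to run the request with.
 * - setgroups not permitted: use the LSD groups.
 * - permitted, root: use the client's groups as sent.
 * - permitted, normal user: keep only client groups that are also in
 *   the LSD, so the array can be reduced but never extended.
 * Returns the number of gids written to out (at most out_cap).
 */
static size_t effective_groups(bool setgroups_allowed, bool is_root,
                               const unsigned int *client, size_t nclient,
                               const unsigned int *lsd, size_t nlsd,
                               unsigned int *out, size_t out_cap)
{
    size_t k = 0;

    assert(out != NULL);
    if (!setgroups_allowed) {
        for (size_t i = 0; i < nlsd && k < out_cap; i++)
            out[k++] = lsd[i];
        return k;
    }
    for (size_t i = 0; i < nclient && k < out_cap; i++) {
        if (is_root || gid_in(lsd, nlsd, client[i]))
            out[k++] = client[i];
    }
    return k;
}
```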
Later a special security policy is needed to allow RAW access by FID without
640 This is used for analyzing audit logs, finding pathnames from fids (for
644 Remote user mapping database
647 There will be a user mapping configuration file on MDS, already defined
649 \begin_inset Quotes eld
652 functional specification
653 \begin_inset Quotes erd
The MDS kernel will also maintain a cache of this mapping information.
It is populated by an upcall to the server-side gss daemon, along with the gss
credential information.
The on-disk mapping database only describes how a user (principal) is mapped
to a local uid, and does not need to specify the gid mapping.
Both the on-disk mapping database and the kernel mapping cache should allow
mapping all other remote users to a certain local user.
On the MDS, the per-client structure maintains this mapping cache.
When a user from a remote client is authenticated, we check the on-disk
If no mapping item for this user is found, we deny this user;
otherwise we record the target uid.
When a filesystem access request comes from a remote client, it contains the user's
uid and gid on the remote client.
Here we can establish the mapping between the uid and the target uid.
With the target uid we can find the target gid from the local user database (from
the LSD), and thus we can also establish the mapping between the gid and the target gid.
With the mappings established above, we now do the mapping: replace the uid/gid
in the rpc request with the target uid/gid.
If the request is a chown, we also check and map the new owner id.
When the reply is populated and about to be sent back, we again check the mapping
cache, and do the reverse mapping in the cases which return file attributes
For ids with no matching item, map them to the nllu/nllg of this remote
697 Handle remote rpc request
The overall process of handling a remote rpc request is the same as for a local
user, except for the following:
For an incoming request, first do the uid/gid mapping for the requestor,
and do the reverse mapping for the reply, as described above.
No setuid/setgid/setgroups intention is permitted, unless we explicitly
allow setuid-root in the setxid database.
We therefore ignore the supplementary groups sent by the client (if any), and simply
use those provided by the LSD.
For a chown request, we also translate the new owner id (as already
described above) according to the in-kernel mapping cache.
This means the root user on a remote client cannot change the owner of a file to
a user who has not logged in yet.
Deny all chgrp requests, since a group on the remote client has no clear mapping
in the MDS's local user database (we could also choose to allow this when the
new group id shows up in the in-kernel mapping cache, but it seems doesn't
724 So we probably need a special tool like
725 \begin_inset Quotes eld
729 \begin_inset Quotes erd
to perform chgrp on a remote client, which will send out the text name instead
of translating it to an id locally.
736 Remote client cache flushing
At any time there might be cached inodes whose owner is nllu/nllg.
If a new user Alice is authenticated and she happens to be the owner of
those inodes, we need to refresh them even if their cache status
is correct; otherwise Alice will find that her files belong to others.
Since we do not know whether an inode with nllu/nllg belongs to Alice or
not, we must flush all of them.
On the MDS, a callback or similar event notification mechanism should be hooked
When a user authenticates for the first time, we should iterate through
all the granted locks corresponding to this client, and revoke them selectively.
Strictly speaking, we only want to revoke those inodebits locks whose resource
(inode) has an owner/group that does not show up in the in-kernel mapping database,
but here we just flush all the inodebits locks; the cache is quickly re-populated:
there are at most 20-100 cached locks on clients at the moment.
When Alice logs out of the client system, we do something similar:
iterate through all the granted locks corresponding to this client, and
revoke them selectively.
Here we want to revoke those inodebits locks whose resource (inode)
has Alice as owner/group.
We could also choose to flush all of them, as in the case above.
There is general upcall-cache code which does upcalls into user space,
caches the data passed down in the kernel, and also implements timeout invalidation.
The kernel LSD can simply be implemented as an instance of it,
so it will be quite simple.
A user-space tool should provide the following functionality:
Accept a uid as a parameter.
Obtain the gid and the array of supplementary group ids which the uid belongs to;
on failure, just return an error.
Obtain the setxid permission bits for this user on this NID from the database.
If not found, a default bitset is applied: (1) for a local client: setuid/setgid
is off, setgroups for root is off, setgroups for normal users is on;
(2) for a remote client: all of setuid/setgid/setgroups are off.
Pass all the collected information back to the kernel via /proc.
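The default bitset stated above can be written down as a small helper. The bit names and values (PERM_SETUID, PERM_SETGID, PERM_SETGRP) and the function name are assumptions made for this sketch; only the on/off defaults come from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission bits; the actual encoding is not specified. */
#define PERM_SETUID 0x1u
#define PERM_SETGID 0x2u
#define PERM_SETGRP 0x4u

/*
 * Default setxid permissions when the database has no entry:
 *   local client : setuid/setgid off, setgroups off for root,
 *                  setgroups on for normal users
 *   remote client: everything off
 */
static unsigned int default_setxid_bits(bool remote_client, unsigned int uid)
{
    unsigned int bits = 0;

    if (!remote_client && uid != 0)
        bits |= PERM_SETGRP;

    /* Only the three known bits may ever be set. */
    assert((bits & ~(PERM_SETUID | PERM_SETGID | PERM_SETGRP)) == 0);
    return bits;
}
```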
Since upcalls can happen concurrently, and the admin can modify it
at any time, some kind of read-write locking needs to be done on the database
798 Recovery consideration
All the code here should have minimal effect on recovery.
After an MDS crash, the security context will be established at connection
time during recovery; and the uid-mapping cache and LSD actually are
804 \begin_inset Quotes eld
808 \begin_inset Quotes erd
, so they will also be re-populated when handling the related users' replay requests
during/after recovery.
821 Client has a remote flag at mount time.
Remote clients must have nllu:nllg installed;
it could simply be nobody:nobody.
The MDS can have a remote-user mapping database which records which principal
at which client should be mapped to which local user.
Without the database no remote client is allowed to connect.
The MDS can have a security database which contains setxid permissions along
with other security settings for each affected user.
If there is no such database, a default setting is applied.
838 LSD entry states transition
NEW: generated and submitted to the upcall
READY: ready to serve
INVALID: expired or error
The requestor initiates a NEW LSD entry; after the upcall successfully fills
in the data it changes to READY; if a timeout or some error happens (e.g.
not found in the user database) during the upcall it changes to INVALID; a READY
LSD changes to INVALID when it expires, is forcibly flushed by the sysadmin,
or the MDS shuts down; an INVALID LSD will soon be destroyed.
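The transitions listed above form a small state machine, sketched here. The enum and function names are hypothetical, and the sketch assumes that an event not mentioned in the text for a given state leaves the state unchanged.

```c
#include <assert.h>

enum lsd_state { LSD_NEW, LSD_READY, LSD_INVALID };

/* Hypothetical events that drive an LSD entry. */
enum lsd_event {
    LSD_EV_UPCALL_OK,    /* upcall filled in the data */
    LSD_EV_UPCALL_FAIL,  /* timeout or error during the upcall */
    LSD_EV_EXPIRE,       /* entry lifetime ran out */
    LSD_EV_FLUSH,        /* forcibly flushed by the sysadmin */
    LSD_EV_SHUTDOWN      /* MDS is shutting down */
};

/* Return the state an LSD entry moves to when an event arrives. */
static enum lsd_state lsd_next_state(enum lsd_state s, enum lsd_event ev)
{
    switch (s) {
    case LSD_NEW:
        if (ev == LSD_EV_UPCALL_OK)
            return LSD_READY;
        if (ev == LSD_EV_UPCALL_FAIL)
            return LSD_INVALID;
        return LSD_NEW;
    case LSD_READY:
        if (ev == LSD_EV_EXPIRE || ev == LSD_EV_FLUSH ||
            ev == LSD_EV_SHUTDOWN)
            return LSD_INVALID;
        return LSD_READY;
    case LSD_INVALID:
        /* Terminal: the entry is only waiting to be destroyed. */
        return LSD_INVALID;
    }
    assert(!"unknown LSD state");
    return LSD_INVALID;
}
```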
No disk format change.
When a large number of users access Lustre from all kinds of local/remote
clients at the same time, the MDS will have more CPU and memory overhead, especiall
861 No special recovery consideration.
NFSv4 sends users and groups by name.
Could this pass the HP acceptance test?
Is anything unreasonable? Are there any security holes?
Is everything recoverable after an MDS/client crash?