From 0a724b38c95cbb5b57c481dec1ce8f0525f4eb22 Mon Sep 17 00:00:00 2001 From: adilger Date: Mon, 4 Jul 2005 07:47:03 +0000 Subject: [PATCH] Branch b1_4 Remove quota-HLD.lyx from b1_4 to avoid version skew from HEAD. --- lustre/doc/quota_hld.lyx | 1231 ---------------------------------------------- 1 file changed, 1231 deletions(-) delete mode 100644 lustre/doc/quota_hld.lyx diff --git a/lustre/doc/quota_hld.lyx b/lustre/doc/quota_hld.lyx deleted file mode 100644 index 86d1096..0000000 --- a/lustre/doc/quota_hld.lyx +++ /dev/null @@ -1,1231 +0,0 @@ -#LyX 1.3 created this file. For more info see http://www.lyx.org/ -\lyxformat 221 -\textclass article -\language english -\inputencoding auto -\fontscheme times -\graphics default -\paperfontsize default -\spacing single -\papersize Default -\paperpackage a4 -\use_geometry 0 -\use_amsmath 0 -\use_natbib 0 -\use_numerical_citations 0 -\paperorientation portrait -\secnumdepth 3 -\tocdepth 3 -\paragraph_separation skip -\defskip medskip -\quotes_language english -\quotes_times 2 -\papercolumns 1 -\papersides 1 -\paperpagestyle default - -\layout Title - -Quota For Lustre -\layout Section - -From Engineering Requirements Specification -\layout Enumerate - -Lustre can operate and enforce disk block quota and file quota. -\layout Enumerate - -Hard and soft quota are supported -\layout Enumerate - -Central management tools enable setting limits for users and initializing - quota check operations -\layout Enumerate - -Quota are only needed for Linux 2.6 -\layout Section - -Specification of subsystems -\layout Description - -Definition: An -\emph on -operational quota file -\emph default - is a quota database containing limits for some uid's and gid's which is - being used to enforce quota. - An -\emph on -administrative quota file -\emph default - is a similar database, but it is used for recovery and soft quota or administra -tive purposes. 
-\layout Subsection - -Master & slaves -\layout Standard - -A node is a master -\series bold -for a uid or gid -\series default -if the node holds the cluster wide limits (hard, soft, files, blocks & gracetime -s) for that uid or gid in an administrative quota file. - The administrative quota file is similar to a normal ext3 quota file. - The data structures and code for an administrative quota file API will - be copied from the Linux VFS to ldiskfs and amended. - Slave nodes (all other servers) only consider hard quota and only have - operational quota files. -\layout Standard - -Note that a node may be a master for some uid's, gid's and a slave for others. - Masters also have an operational quota file for enforcing hard quota. - Masters -\series bold -observe soft limits in the administrative file, based on grace times -\series default -. -\layout Subsection - -Acquire / release protocol -\layout Standard - -The master administrative quota file has two kinds of limits: total limits - and limits acquired by all servers (administrative usage). - Total limits are set by the user; administrative usage is initialized to zero - and is amended when master/slaves acquire or release quota. -\layout Standard - -Quota slaves can acquire from the master and release to the master qunits - of disk space (>100MB typically, see ERS). - Slaves do this to increase / lower the hard limits in their operational file. - Upon acquiring quota from a master the master's administrative usage is - increased. - The master can acquire/release qunits, just like slaves, except that it is - done locally. -\layout Standard - -On the master only, soft limits are enforced in obd layer based on the administr -ative quota file. - Once administrative usage >= administrative soft limit, the timer is activated. -\layout Subsection - -Chown Operations -\layout Standard - -All objects associated with a file will have their owners set to that of - the MDS inode. 
- These chown operations occur in connection with file creation and chowning - on the MDS and are asynchronous. - There will also be enough space in the records to set an EA on the objects - indicating the originating MDS, fileset and storage id of the inode. - The arguments will contain the following - but the final format of the - packet sent is subject to approval by management (it may be larger): -\layout LyX-Code - -struct object_setattr_args { -\layout LyX-Code - - __u64 osa_mds_id; /* to identify MDS */ -\layout LyX-Code - - __u64 osa_fileset_id; /* part of the fid, tbd */ -\layout LyX-Code - - __u64 osa_ino; /* inode number on mds */ -\layout LyX-Code - - __u64 osa_gen; /* inode generation on mds */ -\layout LyX-Code - - __u32 osa_uid; /* owner of the file */ -\layout LyX-Code - - __u32 osa_gid; /* group of the file */ -\layout LyX-Code - - __u64 osa_mds_transno;/* for recovery of mds rollback */ -\layout LyX-Code - - __u64 osa_mds_last_committed; -\layout LyX-Code - - __u32 osa_mds_prev_uid; /* to undo things that didn't complete on - the MDS */ -\layout LyX-Code - - __u32 osa_mds_prev_gid; -\layout LyX-Code - -} -\layout Subsection - -Recovery -\layout Standard - -A recovery protocol for limits involves -\layout Description - -Master\SpecialChar ~ -recovery re-writing the operational limits on the master node, based - on the cluster-wide limits as found in the administrative quota file -\layout Description - -Slave\SpecialChar ~ -recovery completing aborted release operations on slaves. - -\layout Standard - -Chown operations for objects will use llog recovery on the MDS (as it is - used for unlinks). - -\layout Standard - -MDS chown operations that are lost are not recovered at this point - but - arguments to do so in the future are passed as above. - The recovery from this is fairly simple: the OST writes log operations - for each chown operation containing the MDS transaction number and undo - information. 
- The MDS reports last committed transactions to the OST. - During normal use these lead to cancellations of records leading up to - that transaction. - During recovery, all llog records following the record containing the transacti -on number will be used to undo the OST chown/chgrp operations. -\layout Standard - -For new files, removal of objects already takes place. -\layout Subsection - -Configuration -\layout Standard - -A configuration protocol will initiate quota check operations, turn quota - on, and set limits. - All commands will be issued through lfs. -\layout Subsection - -Disk fs handling -\layout Standard - -Disk file systems track quota usage. - An interface between OSS and MDS and disk file systems will enable a check - and adjustment of disk file system quota limits before operations proceed. - Every node will try to acquire quota before proceeding. - Every node will release quota after finishing. - Acquire and release calls are tuned to anticipate use. - Disk fs quota check handling will be possible on busy file systems. -\layout Section - -Use cases -\layout Standard - -Each use case is an interaction between a -\begin_inset Quotes eld -\end_inset - -user -\begin_inset Quotes erd -\end_inset - - and -\begin_inset Quotes eld -\end_inset - -system -\begin_inset Quotes erd -\end_inset - -. - For each use case we describe what subsystem forms the -\begin_inset Quotes eld -\end_inset - -user -\begin_inset Quotes erd -\end_inset - - and the -\begin_inset Quotes eld -\end_inset - -system -\begin_inset Quotes erd -\end_inset - -. - Use the logical components indicated in sections 3.1-3.4 below to describe - the use cases. - The purpose is to check that each of the use cases at a high level appears - to execute successfully by using the components listed under 3.1-3.4. - In some of the scenarios (e.g. - 3.2) multiple use scenarios should be described, e.g. - how is the slave-master protocol involved and how is the client - oss protocol - involved. 
- -\layout Subsection - -Initialization operation -\layout Subsubsection - -Changing owners -\layout Standard - -The following operations are done on a client: -\layout List -\labelwidthstring 00.00.0000 - -Administrator gets root privileges on the file system -\layout List -\labelwidthstring 00.00.0000 - -Administrator runs `find -type f | xargs lchog` -\begin_deeper -\layout Enumerate - - is mount point -\layout Enumerate - - -\emph on -lchog -\emph default - is a small utility to do chown/chgrp, its usage: -\begin_deeper -\layout Standard - - -\emph on -lchog [-i] FILE... - -\emph default - -\layout Description - - -\emph on --i -\emph default - ignore ENOENT error -\end_deeper -\end_deeper -\layout List -\labelwidthstring 00.00.0000 - -System -\emph on -lchog -\emph default - will abort if the change fails, and then report the error, indicating what was - searched etc. - Generally the user cannot ignore the error, and should fix it and redo the - above before the next operation, except that the user can set the -\emph on --i -\emph default - option for -\emph on -lchog -\emph default - to ignore ENOENT errors. - -\layout Subsubsection - -Mounting existing file systems with quota support -\layout List -\labelwidthstring 00.00.0000 - -Administrator file systems on all server nodes should be mounted with quota - support. This can be done by running -\emph on - lconf -\emph default - on all nodes: -\emph on -lconf --mountfsoptions quota ..., -\emph default - if the file system has already been mounted, it should be unmounted first. -\layout List -\labelwidthstring 00.00.0000 - -System all needed modules are loaded, and file systems are mounted with - quota support. -\layout List -\labelwidthstring 00.00.0000 - -Administrator runs `lfs quotacheck`, it will initiate quota check on all - MDS' and OSTs one by one. - -\layout List -\labelwidthstring 00.00.0000 - -System on each node ``quotacheck'' will walk through the diskfs. 
- When the check finishes, it will report the check status to the initiator. - If it failed, the error is listed. - -\layout List -\labelwidthstring 00.00.0000 - -Administrator user should fix the errors and recheck the specified nodes - before proceeding to the next step. - -\layout List -\labelwidthstring 00.00.0000 - -Administrator runs `lfs quotaon`, it will initiate quotaon on all MDS' and - OSTs one by one. -\layout List -\labelwidthstring 00.00.0000 - -System each node will start to check/handle quota. - The status will be reported back to the initiator. -\layout List -\labelwidthstring 00.00.0000 - -Administrator user should fix the errors if there are any. -\layout List -\labelwidthstring 00.00.0000 - -Administrator runs `lfs setquota`, it will set limits on the corresponding - MDS master for the specified uid/gid. -\layout List -\labelwidthstring 00.00.0000 - -System if it's the first time limits are set, the master will initialize quota - on all slaves, otherwise it only modifies its own quota. - Moreover, the limit info is saved in the recovery quota file on the master. - -\series bold - -\series default -The status will be reported to the initiator. -\layout List -\labelwidthstring 00.00.0000 - -Administrator if some nodes failed, generally user should not ignore the - errors. -\layout Subsubsection - -a new file system to a state where it is using quota -\layout Standard - -As above, but only three steps are needed: `lfs quotacheck`, `lfs quotaon` and - `lfs setquota`. -\layout Subsection - -Normal use block quota -\layout Standard - -Demonstrate how quota are acquired and released during normal use through - sequences of the API's and network calls defined in this document. 
-\layout Standard - - -\series bold -DESCRIBE CASES WHERE -\layout Enumerate - -A USER DOES THIS OR THAT: WHAT are the system responses -\layout Enumerate - -The client does this or that: what are the OSS & MDS responses -\layout Enumerate - -The OST does this or that, what are the obdfilter / diskfs responses -\layout Subsubsection - -Acquire quota -\layout List -\labelwidthstring 00.00.0000 - -User issues file write operation. -\layout List -\labelwidthstring 00.00.0000 - -System performs write successfully and returns the written bytes. -\newline - -\layout List -\labelwidthstring 00.00.0000 - -Client makes IO requests to OSS. - -\layout List -\labelwidthstring 00.00.0000 - -OSS acquires qunit if needed. -\layout List -\labelwidthstring 00.00.0000 - -Master increases usage in the administrative file, then replies to OSS with the granted - qunit. -\layout List -\labelwidthstring 00.00.0000 - -OSS updates local operational quota file, performs write operation and replies - to the client with the ~noquota flag. -\newline - -\layout List -\labelwidthstring 00.00.0000 - -OST calls obd_commitrw to commit write. -\layout List -\labelwidthstring 00.00.0000 - -Obdfilter if there is not enough qunit, acquires qunit by dqacq rpc from master, updates - local operational quota file after dqacq reply, then performs normal direct - write. - -\layout Subsubsection - - -\begin_inset LatexCommand \label{release-quota} - -\end_inset - -Release quota -\layout List -\labelwidthstring 00.00.0000 - -User issues truncate or unlink operation. -\layout List -\labelwidthstring 00.00.0000 - -System performs the truncate/unlink operation and returns error code. -\newline - -\layout List -\labelwidthstring 00.00.0000 - -Client makes OST_PUNCH or OST_DESTROY requests to OSS. -\layout List -\labelwidthstring 00.00.0000 - -OSS performs truncate/unlink on objects. - Releases qunit to Master if needed. -\layout List -\labelwidthstring 00.00.0000 - -Master decreases usage in administrative file and replies to OSS. 
-\layout List -\labelwidthstring 00.00.0000 - -OSS updates local operational quota file. -\newline - -\layout List -\labelwidthstring 00.00.0000 - -OST calls obd_destroy/obd_punch. -\layout List -\labelwidthstring 00.00.0000 - -Obdfilter performs unlink/truncate on objects, if there is qunit to be released, - releases qunit by dqrel rpc to master then updates local operational quota - file. - -\layout Subsection - -Running out of block quota -\layout List -\labelwidthstring 00.00.0000 - -User issues file write operation. -\layout List -\labelwidthstring 00.00.0000 - -System write fails and returns EDQUOT. - (but the pages in cache will be written successfully) -\newline - -\layout List -\labelwidthstring 00.00.0000 - -Client makes IO requests to OSS. -\layout List -\labelwidthstring 00.00.0000 - -OSS acquires qunit from master. -\layout List -\labelwidthstring 00.00.0000 - -Master replies noquota to OSS. -\layout List -\labelwidthstring 00.00.0000 - -OSS fs write fails, rewrites pages from client cache forcibly, replies to the client - with the noquota flag and error code. -\newline - -\layout List -\labelwidthstring 00.00.0000 - -OST calls obd_commitrw to commit write. -\layout List -\labelwidthstring 00.00.0000 - -Obdfilter acquiring qunit fails, then performs normal direct write and fails, - and then rewrites the pages from client cache, returns error code and noquota - flag to OST. - -\layout Subsection - -Freeing space to get under quota -\layout Standard - -The release steps are the same as those in -\begin_inset LatexCommand \ref{release-quota} - -\end_inset - -3.2.2. -\layout List -\labelwidthstring 00.00.0000 - -User issues file write operation. -\layout List -\labelwidthstring 00.00.0000 - -Client makes synchronous write rpc to OSS if there is noquota flag. -\layout List -\labelwidthstring 00.00.0000 - -OSS performs fs write successfully, returns the ~noquota flag to the client. -\layout List -\labelwidthstring 00.00.0000 - -Client clears noquota flag for this uid/gid. 
- -\layout Subsection - -Enforcing soft quota -\layout Subsubsection - -Start soft quota timer -\layout List -\labelwidthstring 00.00.0000 - -User issues file write/create operations. - -\layout List -\labelwidthstring 00.00.0000 - -System returns successfully. -\newline - -\layout List -\labelwidthstring 00.00.0000 - -Client makes file write/create requests to OSS/MDS. -\layout List -\labelwidthstring 00.00.0000 - -OSS/MDS sends dqacq rpcs to get more quota from master. -\layout List -\labelwidthstring 00.00.0000 - -Master starts the timer once administrative usage >= administrative soft - limit and grants qunit to OSS/MDS. -\layout List -\labelwidthstring 00.00.0000 - -OSS/MDS write/create succeeds. -\layout Subsubsection - -Soft quota timer goes off -\layout List -\labelwidthstring 00.00.0000 - -User issues file write/create operations. - -\layout List -\labelwidthstring 00.00.0000 - -System returns EDQUOT. - -\newline - -\layout List -\labelwidthstring 00.00.0000 - -Client makes file write/create requests to OSS/MDS. -\layout List -\labelwidthstring 00.00.0000 - -OSS/MDS sends dqacq rpcs to get more quota from master. -\layout List -\labelwidthstring 00.00.0000 - -Master returns noquota to OSS/MDS. -\layout List -\labelwidthstring 00.00.0000 - -OSS/MDS write/create fails and returns error code to Client. -\layout Subsubsection - -Stop soft quota timer -\layout Standard - -The release steps are the same as those in -\begin_inset LatexCommand \ref{release-quota} - -\end_inset - -3.2.2. - -\layout List -\labelwidthstring 00.00.0000 - -Slave calls dqrel rpc to release extra quota. -\layout List -\labelwidthstring 00.00.0000 - -Master stops the timer once administrative usage < administrative soft limit. - -\layout Subsection - -File quota on the MDS -\layout Standard - -For CMD, it is similar to block quota described above. - For b1_4, it is completely managed by MDS locally. 
- -\layout Subsection - -Listing quota -\layout List -\labelwidthstring 00.00.0000 - -User runs 'lfs quota', it will make an rpc to the corresponding MDS master - for the specified uid/gid. -\layout List -\labelwidthstring 00.00.0000 - -System displays usage & limits related to quota for the uid/gid on all nodes - in the cluster. - If some nodes failed, it reports the error to the user. - -\layout List -\labelwidthstring 00.00.0000 - -User generally can ignore the errors. - -\layout Subsection - -Recovery of quota -\layout Standard - - -\series bold -just describe interaction initiator - response, no internals -\layout Subsubsection - -Slave recovery -\layout List -\labelwidthstring 00.00.0000 - -Slave releases unreasonably high limits to master. -\layout List -\labelwidthstring 00.00.0000 - -Master updates administrative quota file and replies to slave. -\layout List -\labelwidthstring 00.00.0000 - -Slave updates local operational quota file. - -\layout Subsubsection - -Master recovery -\layout List -\labelwidthstring 00.00.0000 - -Master enquires all slaves' operational limits by issuing a new RPC. - -\series bold - -\layout List -\labelwidthstring 00.00.0000 - -Slave replies with limit. -\layout List -\labelwidthstring 00.00.0000 - -Master updates administrative quota file. - -\layout Section - -State considerations -\layout Subsection - -Node state -\layout Subsection - -Context state -\layout Section - -Logic specification -\layout Standard - -The quota implementation falls into a few, almost separate, components. 
-\layout Standard - - -\series bold -ORDER OF IMPLEMENTATION -\layout Enumerate - -Administrative utilities, with sufficient flexibility to create unit test - cases -\layout Enumerate - -Administrative quota file implementation -\layout Enumerate - -OSS enforcement of quota (can be tested separately) -\layout Enumerate - -client - OSS protocol -\layout Enumerate - -quota context -\layout Enumerate - -quota acquire release protocol -\layout Enumerate - -MDS-OST setattr calls -\layout Enumerate - -comprehensive testing of use cases -\layout Enumerate - -recovery protocol -\layout Enumerate - -soft limit -\layout Subsection - -Administrative utilities -\layout Standard - -For all of the following commands it is probably useful to define a single - data structure that has enough fields to hold all the data that needs to - be transferred. -\layout Description - -Top\SpecialChar ~ -priority -\layout Enumerate - -All utilities are either: -\begin_deeper -\layout Enumerate - -file system ioctls - where non-standard Lustre specific info is needed (e.g. - listing) -\layout Enumerate - -standard quotactl interfaces -\end_deeper -\layout Enumerate - -A lustre obd_iocontrol will allow an MDS to initiate quota check or quotaon - operations on all OST's. - It should be possible to issue this ioctl as a file system ioctl on a client, - or by giving an MDS device on an MDS. - -\series bold -NOTE: -\series default -This rpc can be the same as the master to slave recovery enquiry rpc defined - below. -\layout Enumerate - -an obd_iocontrol and special lfs is needed to display usage & limits related - to quota for a uid/gid on all nodes in the cluster. - This needs to be added to lfs and needs to be a command that can be issued - from a file system client. - -\layout Enumerate - -a command is needed to set the limits for a uid/gid, perhaps based on a - template. - The limits need to be set on the master and in the limit database. 
- All slaves need to be notified that quota tracking for the uid/gid is now - in effect (perhaps by increasing quota limits on the node to a non-zero - value). - Similarly it should be possible to disable quota for a uid / gid. -\layout Enumerate - -Documentation for all of these will be implemented as manual page extensions - and as part of the Lustre Users Guide. -\layout Enumerate - -A chown/chgrp utility. - Build a small C utility that stats a file and then issues the chown/chgrp - system call to change the owner/group on the file. - This is issued from a client, in conjunction with running a find command - to initialize ownership. - This can only be run after the MDS has been changed to incorporate part - 3.3. -\layout Subsection - -Administrative quota file & disk file system quota -\layout Enumerate - -The administrative quota file will be a quota file similar to ext3 based - quota files with the usual VFS determined tree format. - -\layout Enumerate - -The VFS quota api will be adapted to enable the administrative commands - to create quota files by name and operate on them without sb (super block) - or dquot quota context arguments as required. -\layout Enumerate - - -\series bold -(Design this, but implementation is second priority) -\series default -Quota check will be adapted to handle checking on a live file system, as - follows: -\begin_deeper -\layout Enumerate - -if inodes are not checked in sequence order (1,2,3, etc) the following is - probably not possible. -\layout Enumerate - -block all operations on an inode while it is being -\begin_inset Quotes eld -\end_inset - -checked -\begin_inset Quotes erd -\end_inset - -. 
- -\layout Enumerate - -account for quota on inodes that are already checked -\layout Enumerate - -do not account on inodes that are not yet checked -\end_deeper -\layout Subsection - -OSS enforcement -\layout Enumerate - -The direct I/O and truncate calls on the OSS will enforce quota -\layout Subsection - -Client OST/MDT protocol -\layout Standard - -The following component can initially be implemented based on quota status - codes returned by the disk file system. - In due course the status of quota will be determined by the acquire calls - made in the OST or obdfilter. -\layout Enumerate - -All write functions executed on OST's track quota for newly allocated space. -\layout Enumerate - -If a client flushes a page cache to an OST the data will be written (even - if quota are exceeded). - The mount flags allowing root to squash quota should be used for this. -\layout Enumerate - -If a client exceeds quota, a return code will indicate that - further writes for files owned by that uid/gid must now be done synchronously. -\layout Enumerate - -If quota limits on the OSS are sufficient again, through removal of files - or enlarging limits, the flag must be cleared. -\layout Enumerate - -For MDC file quota are currently handled synchronously on the server. - -\layout Subsection - -Quota context and server quota enforcement -\layout Enumerate - -The MDS will automatically track block quota associated with directories. - It is important that the llog files are owned by root users and not subject - to quota. -\layout Enumerate - -For root-owned files, Lustre quota should not be enabled (there are - too many administratively controlled root-owned files right now). -\layout Enumerate - -There will be an active -\series bold -quota context -\series default - for a uid or gid for which quota operations are in progress. 
- Processes acquiring quota will find the context for that user or group - and wait on the context intelligently and not all fire RPC's to the master. - The context should also intelligently handle recovery operations running - concurrently with normal quota use. -\layout Subsection - -Slave to Master acquire / release protocol -\layout Enumerate - -Tunables -\begin_deeper -\layout Enumerate - -All servers will have tunables for qunits and early acquisition of more - qunits. - -\layout Enumerate - -The tunables can be set to configurable values through lconf, one set of - values for slave behavior, one for master behavior each separated for OSS - nodes, one for MDS nodes, as part of the configuration zeroconfig llog. - -\layout Enumerate - -The tunables can also be adjusted dynamically in /proc. - -\layout Enumerate - -Adjusting through /proc only is not acceptable. -\end_deeper -\layout Enumerate - -There will be a function that determines the master node for a given uid - or gid. - For the 1.4 branch this function always returns the MDS, but it will - be designed to make it easy to adapt to clustered metadata. -\layout Enumerate - -There will be dqacq and dqrel rpc's initiated by slave nodes. - The code will be organized so that it can be run on slave OSS and slave - MDS nodes without modification. - These functions will increase / decrease the local limits and administrative - usage on master. -\layout Enumerate - -A unit test program will run a collection of not less than 3 slaves and - a master through a sequence of interesting acquisitions and releases. -\layout Subsection - -Full integration and system testing -\layout Enumerate - -Full unit tests for all components. -\layout Enumerate - -Demonstrate successful handling of recovery from exceeding soft and hard - limits. 
-\layout Subsection - -MDS - OST setattr calls -\layout Enumerate - -When the MDS creates or chowns a file it will queue an asynchronous obd_setattr - rpc to the RPC that: -\begin_deeper -\layout Enumerate - -changes the owner/group of the objects for the file. -\layout Enumerate - -transfers the storage id (ask Yury for data type) to the OSS (this is in - the create case only). - It writes the storage id in an EA. -\end_deeper -\layout Enumerate - -The obd_setattr calls will be journaled almost exactly like mds_unlink calls - in an llog (except that for unlink presently the client unlinks the objects) - and records will be canceled when the setattr commands commit to disk on - the OST. - -\layout Enumerate - -The obd_setattr rpc's will be queued on an RPC set for asynchronous completion, - i.e. - the MDS will reply to the client without waiting for the result. - The simple strategy ( -\begin_inset Quotes eld -\end_inset - -chown, even if user goes over quota -\begin_inset Quotes erd -\end_inset - -, see ERS) will be followed. -\layout Enumerate - -For this part not more than 4 (four) lines of code may be added to mds_open. - Adding 0 lines to this function (the longest in Lustre) would be better. -\layout Enumerate - -Demonstrate handling recovery of 300,000 orphaned chown operations while - the cluster is in use already. -\layout Subsection - -Server Node Recovery -\layout Standard - -Note: in CMD nodes will be slaves for some uids and masters for others. - The algorithm outlined here handles the general case. -\layout Enumerate - -Nodes will recover quota asynchronously, i.e. - they will start normal operations, without waiting for quota recovery to - complete. 
-\layout Enumerate - - -\series bold -Slave recovery initiation: -\begin_deeper -\layout Enumerate - -Slave recovery is initiated on a per-connection basis -\begin_deeper -\layout Enumerate - -Upon obtaining a new connection to a server node that can be a master during - normal operations -\layout Enumerate - -Upon entering normal operations while connections are present -\end_deeper -\layout Enumerate - -The recovery is aborted if a connection fails. -\layout Enumerate - -A collection of threads is needed to handle this recovery -\layout Enumerate - -The quota file handling should be sufficiently concurrent that multiple - connections can recover in parallel -\end_deeper -\layout Enumerate - - -\series bold -Slave recovery: -\series default - -\begin_deeper -\layout Enumerate - -During normal use the node will iterate through all the users and groups - in the operational quota file. - -\layout Enumerate - -If the connection is not one to the master for this uid/gid go to the next - uid/gid. -\layout Enumerate - -If such a uid/gid is also found in the node's administrative quota file, - this node is the master for that id and this id will be skipped, else continue -\series bold -. -\layout Enumerate - -Release unreasonably high limits for this uid/gid. -\layout Enumerate - -The contexts used for updating quota from the filter should be designed so - that these releases can be made concurrent with normal use. 
-\end_deeper -\layout Enumerate - - -\series bold -Master recovery initiation -\begin_deeper -\layout Enumerate - -Master recovery requires connections to all other servers, it is initiated: -\begin_deeper -\layout Enumerate - -If upon entering normal operations all connections are present -\layout Enumerate - -If during normal operation all connections reach a usable state -\end_deeper -\layout Enumerate - -It is aborted if any connection fails during master recovery -\end_deeper -\layout Enumerate - - -\series bold -Master recovery: -\begin_deeper -\layout Enumerate - -During normal use the master will iterate through the administrative quota - file. -\layout Enumerate - -It will lock quota operations on the master for that uid. -\layout Enumerate - -For each uid/gid found it will make -\series bold -a new quota related master to slave -\series default -RPC to all other servers and ask for the current limit (and usage). -\layout Enumerate - -If a response is obtained from all nodes, the operational limit on the master - node is updated so that the sum of all operational limits is the clusterwide - administrative limit. - -\layout Enumerate - -If a response is not obtained from all servers, abort. -\end_deeper -\layout Subsection - -Soft Limits -\layout Standard - -Soft quota is not enforced in the fs layer on master or slave. - It's only enforced in the obd layer on the Master: -\layout Enumerate - -The grace time and soft start time will be kept in the administrative file. -\layout Enumerate - -Master monitors the administrative usage on each qunit acquire/release handling: - log the soft start time once the administrative usage >= administrative - soft limit, clear the soft start time once the administrative usage < administr -ative soft limit. -\layout Enumerate - -Master will reject any qunit acquire request if soft start time + grace - time < current time. -\layout Standard - -Make sure we have unit tests and integration and system tests that verify - this comprehensively. 
-\layout Section - -Changelog -\layout Description - -2005/01/29 First draft. - Based on review of Zhaohong's writings and ERS. -\layout Description - -2005/02/06 Second draft, much more detail to aid the team -\the_end -- 1.8.3.1