+ <para>Examples:</para>
+ <para>To turn on user, group, and project quotas for block only on
+ file system
+ <literal>testfs1</literal>, <emphasis>on the MGS</emphasis> run:</para>
+ <screen>$ lctl conf_param testfs1.quota.ost=ugp
+</screen>
+ <para>To turn on group quotas for inodes on file system
+ <literal>testfs2</literal>, on the MGS run:</para>
+ <screen>$ lctl conf_param testfs2.quota.mdt=g
+</screen>
+ <para>To turn off user, group, and project quotas for both inode and block
+ on file system
+ <literal>testfs3</literal>, on the MGS run:</para>
+ <screen>$ lctl conf_param testfs3.quota.ost=none
+</screen>
+ <screen>$ lctl conf_param testfs3.quota.mdt=none
+</screen>
+ <section>
+ <title>
+ <indexterm>
+ <primary>Quotas</primary>
+ <secondary>verifying</secondary>
+ </indexterm>Quota Verification</title>
+ <para>Once the quota parameters have been configured, all targets
+ that are part of the file system will be automatically notified of the
+ new quota settings and will enable/disable quota enforcement as needed. The
+ per-target enforcement status can still be verified by running the
+ following <emphasis>command on the MDS(s)</emphasis>:</para>
+ <screen>
+$ lctl get_param osd-*.*.quota_slave.info
+osd-zfs.testfs-MDT0000.quota_slave.info=
+target name: testfs-MDT0000
+pool ID: 0
+type: md
+quota enabled: ug
+conn to master: setup
+user uptodate: glb[1],slv[1],reint[0]
+group uptodate: glb[1],slv[1],reint[0]
+</screen>
+ </section>
+ </section>
+ </section>
+ <section xml:id="quota_administration">
+ <title>
+ <indexterm>
+ <primary>Quotas</primary>
+ <secondary>creating</secondary>
+ </indexterm>Quota Administration</title>
+ <para>Once the file system is up and running, quota limits on blocks
+ and inodes can be set for user, group, and project. This is <emphasis>
+ controlled entirely from a client</emphasis> via three quota
+ parameters:</para>
+ <para>
+ <emphasis role="bold">Grace period</emphasis> -- The period of time (in
+ seconds) within which users are allowed to exceed their soft limit. There
+ are six types of grace periods:</para>
+ <itemizedlist>
+ <listitem>
+ <para>user block soft limit</para>
+ </listitem>
+ <listitem>
+ <para>user inode soft limit</para>
+ </listitem>
+ <listitem>
+ <para>group block soft limit</para>
+ </listitem>
+ <listitem>
+ <para>group inode soft limit</para>
+ </listitem>
+ <listitem>
+ <para>project block soft limit</para>
+ </listitem>
+ <listitem>
+ <para>project inode soft limit</para>
+ </listitem>
+ </itemizedlist>
+ <para>The grace period applies to all users. The user block soft limit is
+ for all users who are using a block quota.</para>
+ <para>
+ <emphasis role="bold">Soft limit</emphasis> -- The grace timer is started
+ once the soft limit is exceeded. At this point, the user/group/project
+ can still allocate blocks/inodes. When the grace time expires, if the
+ user is still above the soft limit, the soft limit becomes a hard limit
+ and the user/group/project can no longer allocate new blocks/inodes.
+ The user/group/project should then delete files to get back under the
+ soft limit. The soft limit MUST be smaller than the hard limit. If the
+ soft limit is not needed, it should be set to zero (0).</para>
+ <para>
+ <emphasis role="bold">Hard limit</emphasis> -- Block or inode allocation
+ will fail with
+ <literal>EDQUOT</literal> (i.e., quota exceeded) when the hard limit is
+ reached. The hard limit is the absolute limit. When a grace period is set,
+ one can exceed the soft limit within the grace period if under the hard
+ limit.</para>
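+ <para>For example, using the
+ <literal>lfs setquota</literal> command described below, user quotas with
+ hard limits only (soft limits disabled by setting them to zero) could be
+ set as follows:</para>
+ <screen>
+$ lfs setquota -u bob -b 0 -B 309200 -i 0 -I 11000 /mnt/testfs
+</screen>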
+ <para>Due to the distributed nature of a Lustre file system and the need to
+ maintain performance under load, those quota parameters may not be 100%
+ accurate. The quota settings can be manipulated via the
+ <literal>lfs</literal> command, executed on a client, which includes
+ several options to work with quotas:</para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ <varname>quota</varname> -- displays general quota information (disk
+ usage and limits)</para>
+ </listitem>
+ <listitem>
+ <para>
+ <varname>setquota</varname> -- specifies quota limits and tunes the
+ grace period. By default, the grace period is one week.</para>
+ </listitem>
+ </itemizedlist>
+ <para>Usage:</para>
+ <screen>
+lfs quota [-q] [-v] [-h] [-o obd_uuid] [-u|-g|-p <replaceable>uname|uid|gname|gid|projid</replaceable>] <replaceable>/mount_point</replaceable>
+lfs quota -t {-u|-g|-p} <replaceable>/mount_point</replaceable>
+lfs setquota {-u|--user|-g|--group|-p|--project} <replaceable>username|groupname|projid</replaceable> [-b <replaceable>block_softlimit</replaceable>] \
+             [-B <replaceable>block_hardlimit</replaceable>] [-i <replaceable>inode_softlimit</replaceable>] \
+             [-I <replaceable>inode_hardlimit</replaceable>] <replaceable>/mount_point</replaceable>
+lfs setquota -t {-u|-g|-p} [-b <replaceable>block_grace</replaceable>] [-i <replaceable>inode_grace</replaceable>] <replaceable>/mount_point</replaceable>
+</screen>
+ <para>To display general quota information (disk usage and limits) for the
+ user running the command and their primary group, run:</para>
+ <screen>
+$ lfs quota /mnt/testfs
+</screen>
+ <para>To display general quota information for a specific user ("
+ <literal>bob</literal>" in this example), run:</para>
+ <screen>
+$ lfs quota -u bob /mnt/testfs
+</screen>
+ <para>To display general quota information for a specific user ("
+ <literal>bob</literal>" in this example) and detailed quota statistics for
+ each MDT and OST, run:</para>
+ <screen>
+$ lfs quota -u bob -v /mnt/testfs
+</screen>
+ <para>To display general quota information for a specific project ("
+ <literal>1</literal>" in this example), run:</para>
+ <screen>
+$ lfs quota -p 1 /mnt/testfs
+</screen>
+ <para>To display general quota information for a specific group ("
+ <literal>eng</literal>" in this example), run:</para>
+ <screen>
+$ lfs quota -g eng /mnt/testfs
+</screen>
+ <para>To limit quota usage for a specific project ID on a specific
+ directory ("<literal>/mnt/testfs/dir</literal>" in this example), run:</para>
+ <screen>
+$ chattr +P /mnt/testfs/dir
+$ chattr -p 1 /mnt/testfs/dir
+$ lfs setquota -p 1 -b 307200 -B 309200 -i 10000 -I 11000 /mnt/testfs
+</screen>
+ <para>Note that for
+ <literal>lfs quota -p</literal> to report the space/inode usage under a
+ directory correctly (much faster than <literal>du</literal>), different
+ project IDs must be used for different directories.
+ </para>
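+ <para>For example, to track usage separately under two directories, one
+ could assign each its own project ID (the directory names here are
+ illustrative):</para>
+ <screen>
+$ chattr +P /mnt/testfs/dir1
+$ chattr -p 1 /mnt/testfs/dir1
+$ chattr +P /mnt/testfs/dir2
+$ chattr -p 2 /mnt/testfs/dir2
+</screen>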
+ <para>To display block and inode grace times for user quotas, run:</para>
+ <screen>
+$ lfs quota -t -u /mnt/testfs
+</screen>
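+ <para>To set the block and inode grace times for user quotas to one week
+ (604800 seconds), for example, run:</para>
+ <screen>
+$ lfs setquota -t -u -b 604800 -i 604800 /mnt/testfs
+</screen>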
+ <para>To set user or group quotas for a specific ID ("bob" in this
+ example), run:</para>
+ <screen>
+$ lfs setquota -u bob -b 307200 -B 309200 -i 10000 -I 11000 /mnt/testfs
+</screen>
+ <para>In this example, the block soft limit for user "bob" is set to
+ 300 MB (307200 KB), the block hard limit to 309200 KB, the inode soft
+ limit to 10,000 files, and the inode hard limit to 11,000 files.</para>
+ <para>The quota command displays the quota allocated and consumed by each
+ Lustre target. Using the previous
+ <literal>setquota</literal> example, running this
+ <literal>lfs quota</literal> command:</para>
+ <screen>
+$ lfs quota -u bob -v /mnt/testfs
+</screen>
+ <para>displays this command output:</para>
+ <screen>
+Disk quotas for user bob (uid 6000):
+Filesystem kbytes quota limit grace files quota limit grace
+/mnt/testfs 0 30720 30920 - 0 10000 11000 -
+testfs-MDT0000_UUID 0 - 8192 - 0 - 2560 -
+testfs-OST0000_UUID 0 - 8192 - 0 - 0 -
+testfs-OST0001_UUID 0 - 8192 - 0 - 0 -
+Total allocated inode limit: 2560, total allocated block limit: 24576
+</screen>
+ <para>Global quota limits are stored in dedicated index files (there is one
+ such index per quota type) on the quota master target (aka QMT). The QMT
+ runs on MDT0000 and exports the global indexes via /proc. The global
+ indexes can thus be dumped via the following command:
+ <screen>
+# lctl get_param qmt.testfs-QMT0000.*.glb-*
+</screen>The format of global indexes depends on the OSD type. The ldiskfs OSD
+uses IAM files while the ZFS OSD creates dedicated ZAPs.</para>
+ <para>Each slave also stores a copy of this global index locally. When the
+ global index is modified on the master, a glimpse callback is issued on the
+ global quota lock to notify all slaves that the global index has been
+ modified. This glimpse callback includes information about the identifier
+ subject to the change. If the global index on the QMT is modified while a
+ slave is disconnected, the index version is used to determine whether the
+ slave copy of the global index is out of date. If so, the slave
+ fetches the whole index again and updates the local copy. The slave copy of
+ the global index is also exported via /proc and can be accessed via the
+ following command:
+ <screen>
+lctl get_param osd-*.*.quota_slave.limit*
+</screen></para>
+ <note>
+ <para>Prior to 2.4, global quota limits used to be stored in
+ administrative quota files using the on-disk format of the Linux quota
+ file. When upgrading MDT0000 to 2.4, those administrative quota files are
+ converted into IAM indexes automatically, preserving existing quota
+ limits previously set by the administrator.</para>
+ </note>
+ </section>
+ <section xml:id="quota_allocation">
+ <title>
+ <indexterm>
+ <primary>Quotas</primary>
+ <secondary>allocating</secondary>
+ </indexterm>Quota Allocation</title>
+ <para>In a Lustre file system, quota must be properly allocated or users
+ may experience unnecessary failures. The file system block quota is divided
+ up among the OSTs within the file system. Each OST requests an allocation
+ which is increased up to the quota limit. The quota allocation is then
+ <emphasis role="italic">quantized</emphasis> to reduce the amount of
+ quota-related request traffic.</para>
+ <para>The Lustre quota system distributes quotas from the Quota Master
+ Target (aka QMT). Only one QMT instance is supported for now, and it runs
+ on the same node as MDT0000. All OSTs and MDTs set up a Quota Slave Device
+ (aka QSD) which connects to the QMT to allocate/release quota space. The
+ QSD is set up directly from the OSD layer.</para>
+ <para>To reduce quota requests, quota space is initially allocated to QSDs
+ in very large chunks. How much unused quota space can be held by a target
+ is controlled by the qunit size. When quota space for a given ID is close
+ to exhaustion on the QMT, the qunit size is reduced and QSDs are notified
+ of the new qunit size value via a glimpse callback. Slaves are then
+ responsible for releasing quota space above the new qunit value. The qunit
+ size isn't shrunk indefinitely and there is a minimum value of 1MB for
+ blocks and 1,024 for inodes. This means that the quota space rebalancing
+ process will stop when this minimum value is reached. As a result, a quota
+ exceeded error can be returned while many slaves still have 1MB or 1,024
+ inodes of spare quota space.</para>
+ <para>If we look at the
+ <literal>setquota</literal> example again, running this
+ <literal>lfs quota</literal> command:</para>
+ <screen>
+# lfs quota -u bob -v /mnt/testfs
+</screen>
+ <para>displays this command output:</para>
+ <screen>
+Disk quotas for user bob (uid 500):
+Filesystem kbytes quota limit grace files quota limit grace
+/mnt/testfs 30720* 30720 30920 6d23h56m44s 10101* 10000 11000 6d23h59m50s
+testfs-MDT0000_UUID 0 - 0 - 10101 - 10240
+testfs-OST0000_UUID 0 - 1024 - - - -
+testfs-OST0001_UUID 30720* - 29896 - - - -
+Total allocated inode limit: 10240, total allocated block limit: 30920
+</screen>
+ <para>The total quota limit of 30,920 KB allocated to user bob is
+ distributed across the two OSTs.</para>
+ <para>Values appended with '
+ <literal>*</literal>' show that the quota limit has been exceeded, causing
+ the following error when trying to write or create a file:</para>
+ <para>
+ <screen>
+cp: writing `/mnt/testfs/foo': Disk quota exceeded.
+</screen>
+ </para>
+ <note>
+ <para>It is very important to note that the block quota is consumed per
+ OST and the inode quota per MDS. Therefore, when the quota is consumed on
+ one OST (resp. MDT), the client may not be able to create files
+ regardless of the quota available on other OSTs (resp. MDTs).</para>
+ <para>Setting the quota limit below the minimum qunit size may prevent
+ the user/group from creating any files. It is thus recommended to set
+ soft/hard limits that are a multiple of the number of OSTs times the
+ minimum qunit size, as in the example following this note.</para>
+ </note>
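+ <para>For example, on a hypothetical file system with 8 OSTs and the
+ minimum block qunit size of 1MB, block hard limits would be set in
+ multiples of 8MB (8192 KB):</para>
+ <screen>
+$ lfs setquota -u bob -b 0 -B 8192 /mnt/testfs
+</screen>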
+ <para>To determine the total number of inodes, use
+ <literal>lfs df -i</literal> (and also
+ <literal>lctl get_param *.*.filestotal</literal>). For more information on
+ using the
+ <literal>lfs df -i</literal> command and the command output, see
+ <xref linkend="dbdoclet.50438209_35838" />.</para>
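+ <para>For example, run on a client:</para>
+ <screen>
+$ lfs df -i /mnt/testfs
+</screen>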
+ <para>Unfortunately, the
+ <literal>statfs</literal> interface does not report the free inode count
+ directly, but instead reports the total inode and used inode counts. The
+ free inode count is calculated for
+ <literal>df</literal> from (total inodes - used inodes). It is not critical
+ to know the total inode count for a file system. Instead, you should know
+ (accurately) the free inode count and the used inode count for a file
+ system. The Lustre software manipulates the total inode count in order to
+ accurately report the other two values.</para>
+ </section>
+ <section xml:id="quota_interoperability">
+ <title>
+ <indexterm>
+ <primary>Quotas</primary>
+ <secondary>Interoperability</secondary>
+ </indexterm>Quotas and Version Interoperability</title>
+ <para>The new quota protocol introduced in Lustre software release 2.4.0
+ <emphasis role="bold">is not compatible</emphasis> with previous
+ versions. As a consequence,
+ <emphasis role="bold">all Lustre servers must be upgraded to release 2.4.0
+ for quota to be functional</emphasis>. Quota limits set on the Lustre file
+ system prior to the upgrade will be automatically migrated to the new quota
+ index format. Accounting information with the ldiskfs backend will be
+ regenerated by running
+ <literal>tunefs.lustre --quota</literal> against all targets. It is worth
+ noting that running
+ <literal>tunefs.lustre --quota</literal> is
+ <emphasis role="bold">mandatory</emphasis> for all targets formatted with a
+ Lustre software release older than release 2.4.0, otherwise quota
+ enforcement as well as accounting won't be functional.</para>
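+ <para>A sketch of running the conversion on a server, against an unmounted
+ target (the device path is illustrative and site-specific):</para>
+ <screen>
+tunefs.lustre --quota /dev/sda1
+</screen>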
+ <para>In addition, the quota protocol in release 2.4 assumes that the
+ Lustre client supports the
+ <literal>OBD_CONNECT_EINPROGRESS</literal> connect flag. Clients supporting
+ this flag will retry indefinitely when the server returns
+ <literal>EINPROGRESS</literal> in a reply. The following Lustre client
+ versions are compatible with release 2.4:</para>
+ <itemizedlist>
+ <listitem>
+ <para>Release 2.3-based clients and later</para>
+ </listitem>
+ <listitem>
+ <para>Release 1.8 clients newer or equal to release 1.8.9-wc1</para>
+ </listitem>
+ <listitem>
+ <para>Release 2.1 clients newer or equal to release 2.1.4</para>
+ </listitem>
+ </itemizedlist>
+ <para condition="l2A">To use the project quota functionality introduced in
+ Lustre 2.10, <emphasis role="bold">all Lustre servers and clients must be
+ upgraded to Lustre release 2.10 or later for project quota to work
+ correctly</emphasis>. Otherwise, project quota will be inaccessible on
+ clients and not be accounted for on OSTs.</para>
+ </section>
+ <section xml:id="granted_cache_and_quota_limits">
+ <title>
+ <indexterm>
+ <primary>Quotas</primary>
+ <secondary>known issues</secondary>
+ </indexterm>Granted Cache and Quota Limits</title>
+ <para>In a Lustre file system, granted cache does not respect quota limits.
+ In this situation, OSTs grant cache to a Lustre client to accelerate I/O.
+ Granting cache causes writes to be successful in OSTs, even if they exceed
+ the quota limits.</para>
+ <para>The sequence is:</para>
+ <orderedlist>
+ <listitem>
+ <para>A user writes files to the Lustre file system.</para>
+ </listitem>
+ <listitem>
+ <para>If the Lustre client has enough granted cache, then it returns
+ 'success' to users and arranges the writes to the OSTs.</para>
+ </listitem>
+ <listitem>
+ <para>Because Lustre clients have delivered success to users, the OSTs
+ cannot fail these writes.</para>
+ </listitem>
+ </orderedlist>
+ <para>Because of granted cache, writes can exceed quota limits. For
+ example, if you set a 400 GB quota on user A and use IOR to write for
+ user A from a set of clients, you may write much more data than 400 GB,
+ and cause an out-of-quota error (
+ <literal>EDQUOT</literal>).</para>
+ <note>
+ <para>The effect of granted cache on quota limits can be mitigated, but
+ not eradicated, by reducing the maximum amount of dirty data on the
+ clients (the minimum value is 1MB).</para>
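+ <para>For example, using the standard client-side
+ <literal>max_dirty_mb</literal> tunable, run on each client:</para>
+ <screen>
+lctl set_param osc.*.max_dirty_mb=1
+</screen>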