ConfiguringQuotas.xml

   1 <?xml version='1.0' encoding='utf-8'?>
   2 <chapter xmlns="http://docbook.org/ns/docbook"
   3  xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
   4  xml:id="configuringquotas">
   5   <title xml:id="configuringquotas.title">Configuring and Managing
   6   Quotas</title>
   7   <section xml:id="quota_configuring">
   8     <title>
   9     <indexterm>
  10       <primary>Quotas</primary>
  11       <secondary>configuring</secondary>
  12     </indexterm>Working with Quotas</title>
  13     <para>Quotas allow a system administrator to limit the amount of disk
  14     space a user, group, or project can use. Quotas are set by root, and can
  15     be specified for individual users, groups, and/or projects. Before a file
  16     is written to a partition where quotas are set, the quota of the creator's
  17     group is checked. If a quota exists, then the file size counts towards
  18     the group's quota. If no quota exists, then the owner's user quota is
  19     checked before the file is written. Similarly, inode usage for specific
  20     functions can be controlled if a user over-uses the allocated space.</para>
  21     <para>Lustre quota enforcement differs from standard Linux quota
  22     enforcement in several ways:</para>
  23     <itemizedlist>
  24       <listitem>
  25         <para>Quotas are administered via the
  26         <literal>lfs</literal> and
  27         <literal>lctl</literal> commands (post-mount).</para>
  28       </listitem>
  29       <listitem>
  30         <para>The quota feature in Lustre software is distributed
  31         throughout the system (as the Lustre file system is a distributed file
  32         system). Because of this, quota setup and behavior on Lustre is
  33         somewhat different from local disk quotas in the following ways:</para>
  34         <itemizedlist>
  35         <listitem>
  36           <para>No single point of administration: some commands must be
  37           executed on the MGS, other commands on the MDSs and OSSs, and still
  38           other commands on the client.</para>
  39           </listitem>
  40           <listitem>
  41           <para>Granularity: a local quota is typically specified for
  42           kilobyte resolution, Lustre uses one megabyte as the smallest quota
  43           resolution.</para>
  44           </listitem>
  45           <listitem>
  46           <para>Accuracy: quota information is distributed throughout the file
  47           system and can only be accurately calculated with a quiescent file
  48           system in order to minimize performance overhead during normal use.
  49           </para>
  50         </listitem>
  51         </itemizedlist>
  52       </listitem>
  53       <listitem>
  54         <para>Quotas are allocated and consumed in a quantized fashion.</para>
  55       </listitem>
  56       <listitem>
  57         <para>Client does not set the
  58         <literal>usrquota</literal> or
  59         <literal>grpquota</literal> options to mount. Space accounting is
  60         enabled by default and quota enforcement can be enabled/disabled on
  61         a per-filesystem basis with <literal>lctl conf_param</literal>.</para>
  62         <para condition="l28">It is worth noting that the
  63         <literal>lfs quotaon</literal>, <literal>lfs quotaoff</literal>,
  64         <literal>lfs quotacheck</literal> and <literal>quota_type</literal>
  65         sub-commands are deprecated as of Lustre 2.4.0, and removed completely
  66         in Lustre 2.8.0.</para>
  67       </listitem>
  68     </itemizedlist>
  69     <caution>
  70       <para>Although a quota feature is available in the Lustre software, root
  71       quotas are NOT enforced.</para>
  72       <para>
  73       <literal>lfs setquota -u root</literal> (limits are not enforced)</para>
  74       <para>
  75       <literal>lfs quota -u root</literal> (usage includes internal Lustre data
  76       that is dynamic in size and does not accurately reflect mount point
  77       visible block and inode usage).</para>
  78     </caution>
  79   </section>
  80   <section xml:id="enabling_disk_quotas">
  81     <title>
  82     <indexterm>
  83       <primary>Quotas</primary>
  84       <secondary>enabling disk</secondary>
  85     </indexterm>Enabling Disk Quotas</title>
  86     <para>The design of quotas on Lustre has management and enforcement
  87     separated from resource usage and accounting. Lustre software is
  88     responsible for management and enforcement. The back-end file
  89     system is responsible for resource usage and accounting. Because of
  90     this, it is necessary to begin enabling quotas by enabling quotas on the
  91     back-end disk system.
  92     </para>
  93       <caution>
  94         <para>Quota setup is orchestrated by the MGS and <emphasis>all setup
  95         commands in this section must be run directly on the MGS</emphasis>.
  96         Support for project quotas specifically requires Lustre Release 2.10 or
  97         later.  A <emphasis>patched server</emphasis> may be required, depending
  98         on the kernel version and backend filesystem type:</para>
  99         <informaltable frame="all">
 100           <tgroup cols="2">
 101           <colspec colname="c1" colwidth="50*" />
 102           <colspec colname="c2" colwidth="50*" align="center" />
 103           <thead>
 104             <row>
 105               <entry>
 106                 <para>
 107                   <emphasis role="bold">Configuration</emphasis>
 108                 </para>
 109               </entry>
 110               <entry>
 111                 <para>
 112                   <emphasis role="bold">Patched Server Required?</emphasis>
 113                 </para>
 114               </entry>
 115             </row>
 116           </thead>
 117           <tbody>
 118             <row>
 119               <entry><para>
 120                 <emphasis>ldiskfs with kernel version &lt; 4.5</emphasis>
 121               </para></entry>
 122               <entry><para>Yes</para></entry>
 123             </row>
 124             <row>
 125               <entry><para>
 126                 <emphasis>ldiskfs with kernel version &gt;= 4.5</emphasis>
 127               </para></entry>
 128               <entry><para>No</para></entry>
 129             </row>
 130             <row>
 131               <entry><para>
 132                 <emphasis>zfs version &gt;=0.8 with kernel
 133                 version &lt; 4.5</emphasis>
 134               </para></entry>
 135               <entry><para>Yes</para></entry>
 136             </row>
 137             <row>
 138               <entry><para>
 139                 <emphasis>zfs version &gt;=0.8 with kernel
 140                 version &gt; 4.5</emphasis>
 141               </para></entry>
 142               <entry><para>No</para></entry>
 143             </row>
 144           </tbody>
 145           </tgroup>
 146         </informaltable>
 147         <para>*Note:  Project quotas are not supported on zfs versions earlier
 148         than 0.8.</para>
 149       </caution>
 150       <para>Once setup, verification of the quota state must be performed on the
 151       MDT. Although quota enforcement is managed by the Lustre software, each
 152       OSD implementation relies on the back-end file system to maintain
 153       per-user/group/project block and inode usage. Hence, differences exist
 154       when setting up quotas with ldiskfs or ZFS back-ends:</para>
 155       <itemizedlist>
 156         <listitem>
 157           <para>For ldiskfs backends,
 158           <literal>mkfs.lustre</literal> now creates empty quota files and
 159           enables the QUOTA feature flag in the superblock which turns quota
 160           accounting on at mount time automatically. e2fsck was also modified
 161           to fix the quota files when the QUOTA feature flag is present. The
 162           project quota feature is disabled by default, and
 163           <literal>tune2fs</literal> needs to be run to enable every target
 164           manually.  If user, group, and project quota usage is inconsistent,
 165           run <literal>e2fsck -f</literal> on all unmounted MDTs and OSTs.
 166           </para>
 167         </listitem>
 168         <listitem>
 169           <para>For ZFS backend, <emphasis>the project quota feature is not
 170           supported on zfs versions less than 0.8.0.</emphasis> Accounting ZAPs
 171           are created and maintained by the ZFS file system itself. While ZFS
 172           tracks per-user and group block usage, it does not handle inode
 173           accounting for ZFS versions prior to zfs-0.7.0. The ZFS OSD previously
 174           implemented its own support for inode tracking. Two options are
 175           available:</para>
 176           <orderedlist>
 177             <listitem>
 178               <para>The ZFS OSD can estimate the number of inodes in-use based
 179               on the number of blocks used by a given user or group. This mode
 180               can be enabled by running the following command on the server
 181               running the target:
 182               <literal>lctl set_param
 183               osd-zfs.${FSNAME}-${TARGETNAME}.quota_iused_estimate=1</literal>.
 184               </para>
 185             </listitem>
 186             <listitem>
 187               <para>Similarly to block accounting, dedicated ZAPs are also
 188               created the ZFS OSD to maintain per-user and group inode usage.
 189               This is the default mode which corresponds to
 190               <literal>quota_iused_estimate</literal> set to 0.</para>
 191             </listitem>
 192           </orderedlist>
 193         </listitem>
 194       </itemizedlist>
 195       <note>
 196       <para>To (re-)enable space usage quota on ldiskfs filesystems, run
 197       <literal>tune2fs -O quota</literal> against all targets. This command
 198       sets the QUOTA feature flag in the superblock and runs e2fsck internally.
 199       As a result, the target must be offline to build the per-UID/GID disk
 200       usage database.</para>
 201       <para condition="l2A">Lustre filesystems formatted with a Lustre release
 202       prior to 2.10 can be still safely upgraded to release 2.10, but will not
 203       have project quota usage reporting functional until
 204       <literal>tune2fs -O project</literal> is run against all ldiskfs backend
 205       targets. This command sets the PROJECT feature flag in the superblock and
 206       runs e2fsck (as a result, the target must be offline). See
 207       <xref linkend="quota_interoperability"/> for further important
 208       considerations.</para>
 209       </note>
 210       <caution>
 211         <para>Lustre requires a version of e2fsprogs that supports quota
 212         to be installed on the server nodes when using the ldiskfs backend
 213         (e2fsprogs is not needed with ZFS backend). In general, we recommend
 214         to use the latest e2fsprogs version available on
 215         <link xl:href="https://downloads.whamcloud.com/public/e2fsprogs/">
 216         http://downloads.whamcloud.com/public/e2fsprogs/</link>.</para>
 217         <para>The ldiskfs OSD relies on the standard Linux quota to maintain
 218         accounting information on disk. As a consequence, the Linux kernel
 219         running on the Lustre servers using ldiskfs backend must have
 220         <literal>CONFIG_QUOTA</literal>,
 221         <literal>CONFIG_QUOTACTL</literal> and
 222         <literal>CONFIG_QFMT_V2</literal> enabled.</para>
 223       </caution>
 224       <para>Quota enforcement is turned on/off independently of space
 225         accounting which is always enabled.  There is a single per-file
 226         system quota parameter controlling inode/block quota enforcement.
 227         Like all permanent parameters, this quota parameter can be set via
 228         <literal>lctl conf_param</literal> on the MGS via the command:</para>
 229       <screen>
 230 lctl conf_param <replaceable>fsname</replaceable>.quota.<replaceable>ost|mdt</replaceable>=<replaceable>u|g|p|ugp|none</replaceable>
 231 </screen>
 232       <itemizedlist>
 233         <listitem>
 234           <para>
 235           <literal>ost</literal> -- to configure block quota managed by
 236           OSTs</para>
 237         </listitem>
 238         <listitem>
 239           <para>
 240           <literal>mdt</literal> -- to configure inode quota managed by
 241           MDTs</para>
 242         </listitem>
 243         <listitem>
 244           <para>
 245           <literal>u</literal> -- to enable quota enforcement for users
 246           only</para>
 247         </listitem>
 248         <listitem>
 249           <para>
 250           <literal>g</literal> -- to enable quota enforcement for groups
 251           only</para>
 252         </listitem>
 253         <listitem>
 254           <para>
 255           <literal>p</literal> -- to enable quota enforcement for projects
 256           only</para>
 257         </listitem>
 258         <listitem>
 259           <para>
 260           <literal>ugp</literal> -- to enable quota enforcement for all users,
 261           groups and projects</para>
 262         </listitem>
 263         <listitem>
 264           <para>
 265           <literal>none</literal> -- to disable quota enforcement for all users,
 266           groups and projects</para>
 267         </listitem>
 268       </itemizedlist>
 269       <para>Examples:</para>
 270       <para>To turn on user, group, and project quotas for block only on
 271       file system
 272       <literal>testfs1</literal>, <emphasis>on the MGS</emphasis> run:</para>
 273       <screen>mgs# lctl conf_param testfs1.quota.ost=ugp </screen>
 274       <para>To turn on group quotas for inodes on file system
 275       <literal>testfs2</literal>, on the MGS run:</para>
 276       <screen>mgs# lctl conf_param testfs2.quota.mdt=g </screen>
 277       <para>To turn off user, group, and project quotas for both inode and block
 278       on file system
 279       <literal>testfs3</literal>, on the MGS run:</para>
 280       <screen>mgs# lctl conf_param testfs3.quota.ost=none</screen>
 281       <screen>mgs# lctl conf_param testfs3.quota.mdt=none</screen>
 282       <section xml:id="quota_verification">
 283         <title>
 284         <indexterm>
 285           <primary>Quotas</primary>
 286           <secondary>verifying</secondary>
 287         </indexterm>Quota Verification</title>
 288       <para>Once the quota parameters have been configured, all targets
 289       which are part of the file system will be automatically notified of the
 290       new quota settings and enable/disable quota enforcement as needed. The
 291       per-target enforcement status can still be verified by running the
 292       following <emphasis>command on the MDS(s)</emphasis>:</para>
 293       <screen>
 294 $ lctl get_param osd-*.*.quota_slave.info
 295 osd-zfs.testfs-MDT0000.quota_slave.info=
 296 target name:    testfs-MDT0000
 297 pool ID:        0
 298 type:           md
 299 quota enabled:  ug
 300 conn to master: setup
 301 user uptodate:  glb[1],slv[1],reint[0]
 302 group uptodate: glb[1],slv[1],reint[0]
 303 </screen>
 304       </section>
 305   </section>
 306   <section xml:id="quota_administration">
 307     <title>
 308     <indexterm>
 309       <primary>Quotas</primary>
 310       <secondary>creating</secondary>
 311     </indexterm>Quota Administration</title>
 312     <para>Once the file system is up and running, quota limits on blocks
 313     and inodes can be set for user, group, and project. This is <emphasis>
 314     controlled entirely from a client</emphasis> via three quota
 315     parameters:</para>
 316     <para>
 317     <emphasis role="bold">Grace period</emphasis>-- The period of time (in
 318     seconds) within which users are allowed to exceed their soft limit. There
 319     are six types of grace periods:</para>
 320     <itemizedlist>
 321       <listitem>
 322         <para>user block soft limit</para>
 323       </listitem>
 324       <listitem>
 325         <para>user inode soft limit</para>
 326       </listitem>
 327       <listitem>
 328         <para>group block soft limit</para>
 329       </listitem>
 330       <listitem>
 331         <para>group inode soft limit</para>
 332       </listitem>
 333       <listitem>
 334         <para>project block soft limit</para>
 335       </listitem>
 336       <listitem>
 337         <para>project inode soft limit</para>
 338       </listitem>
 339     </itemizedlist>
 340     <para>The grace period applies to all users. The user block soft limit is
 341     for all users who are using a blocks quota.</para>
 342     <para>
 343     <emphasis role="bold">Soft limit</emphasis> -- The grace timer is started
 344     once the soft limit is exceeded. At this point, the user/group/project
 345     can still allocate block/inode. When the grace time expires and if the
 346     user is still above the soft limit, the soft limit becomes a hard limit
 347     and the user/group/project can't allocate any new block/inode any more.
 348     The user/group/project should then delete files to be under the soft limit.
 349     The soft limit MUST be smaller than the hard limit. If the soft limit is
 350     not needed, it should be set to zero (0).</para>
 351     <para>
 352     <emphasis role="bold">Hard limit</emphasis> -- Block or inode allocation
 353     will fail with
 354     <literal>EDQUOT</literal>(i.e. quota exceeded) when the hard limit is
 355     reached. The hard limit is the absolute limit. When a grace period is set,
 356     one can exceed the soft limit within the grace period if under the hard
 357     limit.</para>
 358     <para>Due to the distributed nature of a Lustre file system and the need to
 359     maintain performance under load, those quota parameters may not be 100%
 360     accurate. The quota settings can be manipulated via the
 361     <literal>lfs</literal> command, executed on a client, and includes several
 362     options to work with quotas:</para>
 363     <itemizedlist>
 364       <listitem>
 365         <para>
 366         <varname>quota</varname> -- displays general quota information (disk
 367         usage and limits)</para>
 368       </listitem>
 369       <listitem>
 370         <para>
 371         <varname>setquota</varname> -- specifies quota limits and tunes the
 372         grace period. By default, the grace period is one week.</para>
 373       </listitem>
 374     </itemizedlist>
 375     <para>Usage:</para>
 376     <screen>
 377 lfs quota [-q] [-v] [-h] [-o obd_uuid] [-u|-g|-p <replaceable>uname|uid|gname|gid|projid</replaceable>] <replaceable>/mount_point</replaceable>
 378 lfs quota -t {-u|-g|-p} <replaceable>/mount_point</replaceable>
 379 lfs setquota {-u|--user|-g|--group|-p|--project} <replaceable>username|groupname</replaceable> [-b <replaceable>block-softlimit</replaceable>] \
 380              [-B <replaceable>block_hardlimit</replaceable>] [-i <replaceable>inode_softlimit</replaceable>] \
 381              [-I <replaceable>inode_hardlimit</replaceable>] <replaceable>/mount_point</replaceable>
 382 </screen>
 383     <para>To display general quota information (disk usage and limits) for the
 384     user running the command and his primary group, run:</para>
 385     <screen>
 386 $ lfs quota /mnt/testfs
 387 </screen>
 388     <para>To display general quota information for a specific user ("
 389     <literal>bob</literal>" in this example), run:</para>
 390     <screen>
 391 $ lfs quota -u bob /mnt/testfs
 392 </screen>
 393     <para>To display general quota information for a specific user ("
 394     <literal>bob</literal>" in this example) and detailed quota statistics for
 395     each MDT and OST, run:</para>
 396     <screen>
 397 $ lfs quota -u bob -v /mnt/testfs
 398 </screen>
 399     <para>To display general quota information for a specific project ("
 400     <literal>1</literal>" in this example), run:</para>
 401     <screen>
 402 $ lfs quota -p 1 /mnt/testfs
 403 </screen>
 404     <para>To display general quota information for a specific group ("
 405     <literal>eng</literal>" in this example), run:</para>
 406     <screen>
 407 $ lfs quota -g eng /mnt/testfs
 408 </screen>
 409     <para>To limit quota usage for a specific project ID on a specific
 410     directory ("<literal>/mnt/testfs/dir</literal>" in this example), run:</para>
 411     <screen>
 412 $ chattr +P /mnt/testfs/dir
 413 $ chattr -p 1 /mnt/testfs/dir
 414 $ lfs setquota -p 1 -b 307200 -B 309200 -i 10000 -I 11000 /mnt/testfs
 415 </screen>
 416     <para>Please note that if it is desired to have
 417     <literal>lfs quota -p</literal> show the space/inode usage under the
 418     directory properly (much faster than <literal>du</literal>), then the
 419     user/admin needs to use different project IDs for different directories.
 420     </para>
 421     <para>To display block and inode grace times for user quotas, run:</para>
 422     <screen>
 423 $ lfs quota -t -u /mnt/testfs
 424 </screen>
 425     <para>To set user or group quotas for a specific ID ("bob" in this
 426     example), run:</para>
 427     <screen>
 428 $ lfs setquota -u bob -b 307200 -B 309200 -i 10000 -I 11000 /mnt/testfs
 429 </screen>
 430     <para>In this example, the quota for user "bob" is set to 300 MB
 431     (309200*1024) and the hard limit is 11,000 files. Therefore, the inode hard
 432     limit should be 11000.</para>
 433     <para>The quota command displays the quota allocated and consumed by each
 434     Lustre target. Using the previous
 435     <literal>setquota</literal> example, running this
 436     <literal>lfs</literal> quota command:</para>
 437     <screen>
 438 $ lfs quota -u bob -v /mnt/testfs
 439 </screen>
 440     <para>displays this command output:</para>
 441     <screen>
 442 Disk quotas for user bob (uid 6000):
 443 Filesystem          kbytes quota limit grace files quota limit grace
 444 /mnt/testfs         0      30720 30920 -     0     10000 11000 -
 445 testfs-MDT0000_UUID 0      -      8192 -     0     -     2560  -
 446 testfs-OST0000_UUID 0      -      8192 -     0     -     0     -
 447 testfs-OST0001_UUID 0      -      8192 -     0     -     0     -
 448 Total allocated inode limit: 2560, total allocated block limit: 24576
 449 </screen>
 450     <para>Global quota limits are stored in dedicated index files (there is one
 451     such index per quota type) on the quota master target (aka QMT). The QMT
 452     runs on MDT0000 and exports the global indices via <replaceable>lctl
 453     get_param</replaceable>. The global indices can thus be dumped via the
 454     following command:
 455     <screen>
 456 # lctl get_param qmt.testfs-QMT0000.*.glb-*
 457 </screen>The format of global indexes depends on the OSD type. The ldiskfs OSD
 458 uses an IAM files while the ZFS OSD creates dedicated ZAPs.</para>
 459     <para>Each slave also stores a copy of this global index locally. When the
 460     global index is modified on the master, a glimpse callback is issued on the
 461     global quota lock to notify all slaves that the global index has been
 462     modified. This glimpse callback includes information about the identifier
 463     subject to the change. If the global index on the QMT is modified while a
 464     slave is disconnected, the index version is used to determine whether the
 465     slave copy of the global index isn't up to date any more. If so, the slave
 466     fetches the whole index again and updates the local copy. The slave copy of
 467     the global index can also be accessed via the following command:
 468     <screen>
 469 lctl get_param osd-*.*.quota_slave.limit*
 470 </screen></para>
 471   </section>
 472   <section condition='l2C' xml:id="default_quota">
 473     <title>
 474     <indexterm>
 475       <primary>Quotas</primary>
 476       <secondary>default</secondary>
 477     </indexterm>Default Quota</title>
 478     <para>The default quota is used to enforce the quota limits for any user,
 479     group, or project that do not have quotas set by administrator.</para>
 480     <para>The default quota can be disabled by setting limits to
 481     <literal>0</literal>.</para>
 482       <section xml:id="defalut_quota_usage">
 483       <title>
 484       <indexterm>
 485         <primary>Quotas</primary>
 486         <secondary>usage</secondary>
 487       </indexterm>Usage</title>
 488       <screen>
 489 lfs quota [-U|--default-usr|-G|--default-grp|-P|--default-prj] <replaceable>/mount_point</replaceable>
 490 lfs setquota {-U|--default-usr|-G|--default-grp|-P|--default-prj} [-b <replaceable>block-softlimit</replaceable>] \
 491              [-B <replaceable>block_hardlimit</replaceable>] [-i <replaceable>inode_softlimit</replaceable>] [-I <replaceable>inode_hardlimit</replaceable>] <replaceable>/mount_point</replaceable>
 492 lfs setquota {-u|-g|-p} <replaceable>username|groupname</replaceable> -d <replaceable>/mount_point</replaceable>
 493       </screen>
 494       <para>To set the default user quota:</para>
 495       <screen>
 496 # lfs setquota -U -b 10G -B 11G -i 100K -I 105K /mnt/testfs
 497       </screen>
 498       <para>To set the default group quota:</para>
 499       <screen>
 500 # lfs setquota -G -b 10G -B 11G -i 100K -I 105K /mnt/testfs
 501       </screen>
 502       <para>To set the default project quota:</para>
 503       <screen>
 504 # lfs setquota -P -b 10G -B 11G -i 100K -I 105K /mnt/testfs
 505       </screen>
 506       <para>To disable the default user quota:</para>
 507       <screen>
 508 # lfs setquota -U -b 0 -B 0 -i 0 -I 0 /mnt/testfs
 509       </screen>
 510       <para>To disable the default group quota:</para>
 511       <screen>
 512 # lfs setquota -G -b 0 -B 0 -i 0 -I 0 /mnt/testfs
 513       </screen>
 514       <para>To disable the default project quota:</para>
 515       <screen>
 516 # lfs setquota -P -b 0 -B 0 -i 0 -I 0 /mnt/testfs
 517       </screen>
 518       <note>
 519       <para>
 520       If quota limits are set for some user, group or project, it will use
 521       those specific quota limits instead of the default quota. Quota limits for
 522       any user, group or project will use the default quota by setting its quota
 523       limits to <literal>0</literal>.
 524       </para>
 525       </note>
 526     </section>
 527   </section>
 528   <section xml:id="quota_allocation">
 529     <title>
 530     <indexterm>
 531       <primary>Quotas</primary>
 532       <secondary>allocating</secondary>
 533     </indexterm>Quota Allocation</title>
 534     <para>In a Lustre file system, quota must be properly allocated or users
 535     may experience unnecessary failures. The file system block quota is divided
 536     up among the OSTs within the file system. Each OST requests an allocation
 537     which is increased up to the quota limit. The quota allocation is then
 538     <emphasis role="italic">quantized</emphasis> to reduce the number of
 539     quota-related request traffic.</para>
 540     <para>The Lustre quota system distributes quotas from the Quota Master
 541     Target (aka QMT). Only one QMT instance is supported for now and only runs
 542     on the same node as MDT0000. All OSTs and MDTs set up a Quota Slave Device
 543     (aka QSD) which connects to the QMT to allocate/release quota space. The
 544     QSD is setup directly from the OSD layer.</para>
 545     <para>To reduce quota requests, quota space is initially allocated to QSDs
 546     in very large chunks. How much unused quota space can be held by a target
 547     is controlled by the qunit size. When quota space for a given ID is close
 548     to exhaustion on the QMT, the qunit size is reduced and QSDs are notified
 549     of the new qunit size value via a glimpse callback. Slaves are then
 550     responsible for releasing quota space above the new qunit value. The qunit
 551     size isn't shrunk indefinitely and there is a minimal value of 1MB for
 552     blocks and 1,024 for inodes. This means that the quota space rebalancing
 553     process will stop when this minimum value is reached. As a result, quota
 554     exceeded can be returned while many slaves still have 1MB or 1,024 inodes
 555     of spare quota space.</para>
 556     <para>If we look at the
 557     <literal>setquota</literal> example again, running this
 558     <literal>lfs quota</literal> command:</para>
 559     <screen>
 560 # lfs quota -u bob -v /mnt/testfs
 561 </screen>
 562     <para>displays this command output:</para>
 563     <screen>
 564 Disk quotas for user bob (uid 500):
 565 Filesystem          kbytes quota limit grace       files  quota limit grace
 566 /mnt/testfs         30720* 30720 30920 6d23h56m44s 10101* 10000 11000
 567 6d23h59m50s
 568 testfs-MDT0000_UUID 0      -     0     -           10101  -     10240
 569 testfs-OST0000_UUID 0      -     1024  -           -      -     -
 570 testfs-OST0001_UUID 30720* -     29896 -           -      -     -
 571 Total allocated inode limit: 10240, total allocated block limit: 30920
 572 </screen>
 573     <para>The total quota limit of 30,920 is allocated to user bob, which is
 574     further distributed to two OSTs.</para>
 575     <para>Values appended with '
 576     <literal>*</literal>' show that the quota limit has been exceeded, causing
 577     the following error when trying to write or create a file:</para>
 578     <para>
 579       <screen>
 580 $ cp: writing `/mnt/testfs/foo`: Disk quota exceeded.
 581 </screen>
 582     </para>
 583     <note>
 584       <para>It is very important to note that the block quota is consumed per
 585       OST and the inode quota per MDS. Therefore, when the quota is consumed on
 586       one OST (resp. MDT), the client may not be able to create files
 587       regardless of the quota available on other OSTs (resp. MDTs).</para>
 588       <para>Setting the quota limit below the minimal qunit size may prevent
 589       the user/group from all file creation. It is thus recommended to use
 590       soft/hard limits which are a multiple of the number of OSTs * the minimal
 591       qunit size.</para>
 592     </note>
 593     <para>To determine the total number of inodes, use
 594     <literal>lfs df -i</literal>(and also
 595     <literal>lctl get_param *.*.filestotal</literal>). For more information on
 596     using the
 597     <literal>lfs df -i</literal> command and the command output, see
 598     <xref linkend="dbdoclet.checking_free_space" />.</para>
 599     <para>Unfortunately, the
 600     <literal>statfs</literal> interface does not report the free inode count
 601     directly, but instead reports the total inode and used inode counts. The
 602     free inode count is calculated for
 603     <literal>df</literal> from (total inodes - used inodes). It is not critical
 604     to know the total inode count for a file system. Instead, you should know
 605     (accurately), the free inode count and the used inode count for a file
 606     system. The Lustre software manipulates the total inode count in order to
 607     accurately report the other two values.</para>
 608   </section>
 609   <section xml:id="quota_interoperability">
 610     <title>
 611     <indexterm>
 612       <primary>Quotas</primary>
 613       <secondary>Interoperability</secondary>
 614     </indexterm>Quotas and Version Interoperability</title>
 615     <para condition="l2A">To use the project quota functionality introduced in
 616     Lustre 2.10, <emphasis role="bold">all Lustre servers and clients must be
 617     upgraded to Lustre release 2.10 or later for project quota to work
 618     correctly</emphasis>.  Otherwise, project quota will be inaccessible on
 619     clients and not be accounted for on OSTs.  Furthermore, the
 620     <emphasis role="bold">servers may be required to use a patched kernel,
 621     </emphasis> for more information see
 622     <xref linkend="enabling_disk_quotas"/>.</para>
 623   </section>
 624   <section xml:id="granted_cache_and_quota_limits">
 625     <title>
 626     <indexterm>
 627       <primary>Quotas</primary>
 628       <secondary>known issues</secondary>
 629     </indexterm>Granted Cache and Quota Limits</title>
 630     <para>In a Lustre file system, granted cache does not respect quota limits.
 631     In this situation, OSTs grant cache to a Lustre client to accelerate I/O.
 632     Granting cache causes writes to be successful in OSTs, even if they exceed
 633     the quota limits, and will overwrite them.</para>
 634     <para>The sequence is:</para>
 635     <orderedlist>
 636       <listitem>
 637         <para>A user writes files to the Lustre file system.</para>
 638       </listitem>
 639       <listitem>
 640         <para>If the Lustre client has enough granted cache, then it returns
 641         'success' to users and arranges the writes to the OSTs.</para>
 642       </listitem>
 643       <listitem>
 644         <para>Because Lustre clients have delivered success to users, the OSTs
 645         cannot fail these writes.</para>
 646       </listitem>
 647     </orderedlist>
 648     <para>Because of granted cache, writes always overwrite quota limitations.
 649     For example, if you set a 400 GB quota on user A and use IOR to write for
 650     user A from a bundle of clients, you will write much more data than 400 GB,
 651     and cause an out-of-quota error (
 652     <literal>EDQUOT</literal>).</para>
 653     <note>
 654       <para>The effect of granted cache on quota limits can be mitigated, but
 655       not eradicated. Reduce the maximum amount of dirty data on the clients
 656       (minimal value is 1MB):</para>
 657       <itemizedlist>
 658         <listitem>
 659           <para>
 660             <literal>lctl set_param osc.*.max_dirty_mb=8</literal>
 661           </para>
 662         </listitem>
 663       </itemizedlist>
 664     </note>
 665   </section>
 666   <section xml:id="lustre_quota_statistics">
 667     <title>
 668     <indexterm>
 669       <primary>Quotas</primary>
 670       <secondary>statistics</secondary>
 671     </indexterm>Lustre Quota Statistics</title>
 672     <para>The Lustre software includes statistics that monitor quota activity,
 673     such as the kinds of quota RPCs sent during a specific period, the average
 674     time to complete the RPCs, etc. These statistics are useful to measure
 675     performance of a Lustre file system.</para>
 676     <para>Each quota statistic consists of a quota event and
 677     <literal>min_time</literal>,
 678     <literal>max_time</literal> and
 679     <literal>sum_time</literal> values for the event.</para>
 680     <informaltable frame="all">
 681       <tgroup cols="2">
 682         <colspec colname="c1" colwidth="50*" />
 683         <colspec colname="c2" colwidth="50*" />
 684         <thead>
 685           <row>
 686             <entry>
 687               <para>
 688                 <emphasis role="bold">Quota Event</emphasis>
 689               </para>
 690             </entry>
 691             <entry>
 692               <para>
 693                 <emphasis role="bold">Description</emphasis>
 694               </para>
 695             </entry>
 696           </row>
 697         </thead>
 698         <tbody>
 699           <row>
 700             <entry>
 701               <para>
 702                 <emphasis role="bold">sync_acq_req</emphasis>
 703               </para>
 704             </entry>
 705             <entry>
 706               <para>Quota slaves send a acquiring_quota request and wait for
 707               its return.</para>
 708             </entry>
 709           </row>
 710           <row>
 711             <entry>
 712               <para>
 713                 <emphasis role="bold">sync_rel_req</emphasis>
 714               </para>
 715             </entry>
 716             <entry>
 717               <para>Quota slaves send a releasing_quota request and wait for
 718               its return.</para>
 719             </entry>
 720           </row>
 721           <row>
 722             <entry>
 723               <para>
 724                 <emphasis role="bold">async_acq_req</emphasis>
 725               </para>
 726             </entry>
 727             <entry>
 728               <para>Quota slaves send an acquiring_quota request and do not
 729               wait for its return.</para>
 730             </entry>
 731           </row>
 732           <row>
 733             <entry>
 734               <para>
 735                 <emphasis role="bold">async_rel_req</emphasis>
 736               </para>
 737             </entry>
 738             <entry>
 739               <para>Quota slaves send a releasing_quota request and do not wait
 740               for its return.</para>
 741             </entry>
 742           </row>
 743           <row>
 744             <entry>
 745               <para>
 746                 <emphasis role="bold">wait_for_blk_quota
 747                 (lquota_chkquota)</emphasis>
 748               </para>
 749             </entry>
 750             <entry>
 751               <para>Before data is written to OSTs, the OSTs check if the
 752               remaining block quota is sufficient. This is done in the
 753               lquota_chkquota function.</para>
 754             </entry>
 755           </row>
 756           <row>
 757             <entry>
 758               <para>
 759                 <emphasis role="bold">wait_for_ino_quota
 760                 (lquota_chkquota)</emphasis>
 761               </para>
 762             </entry>
 763             <entry>
 764               <para>Before files are created on the MDS, the MDS checks if the
 765               remaining inode quota is sufficient. This is done in the
 766               lquota_chkquota function.</para>
 767             </entry>
 768           </row>
 769           <row>
 770             <entry>
 771               <para>
 772                 <emphasis role="bold">wait_for_blk_quota
 773                 (lquota_pending_commit)</emphasis>
 774               </para>
 775             </entry>
 776             <entry>
 777               <para>After blocks are written to OSTs, relative quota
 778               information is updated. This is done in the lquota_pending_commit
 779               function.</para>
 780             </entry>
 781           </row>
 782           <row>
 783             <entry>
 784               <para>
 785                 <emphasis role="bold">wait_for_ino_quota
 786                 (lquota_pending_commit)</emphasis>
 787               </para>
 788             </entry>
 789             <entry>
 790               <para>After files are created, relative quota information is
 791               updated. This is done in the lquota_pending_commit
 792               function.</para>
 793             </entry>
 794           </row>
 795           <row>
 796             <entry>
 797               <para>
 798                 <emphasis role="bold">wait_for_pending_blk_quota_req
 799                 (qctxt_wait_pending_dqacq)</emphasis>
 800               </para>
 801             </entry>
 802             <entry>
 803               <para>On the MDS or OSTs, there is one thread sending a quota
 804               request for a specific UID/GID for block quota at any time. At
 805               that time, if other threads need to do this too, they should
 806               wait. This is done in the qctxt_wait_pending_dqacq
 807               function.</para>
 808             </entry>
 809           </row>
 810           <row>
 811             <entry>
 812               <para>
 813                 <emphasis role="bold">wait_for_pending_ino_quota_req
 814                 (qctxt_wait_pending_dqacq)</emphasis>
 815               </para>
 816             </entry>
 817             <entry>
 818               <para>On the MDS, there is one thread sending a quota request for
 819               a specific UID/GID for inode quota at any time. If other threads
 820               need to do this too, they should wait. This is done in the
 821               qctxt_wait_pending_dqacq function.</para>
 822             </entry>
 823           </row>
 824           <row>
 825             <entry>
 826               <para>
 827                 <emphasis role="bold">nowait_for_pending_blk_quota_req
 828                 (qctxt_wait_pending_dqacq)</emphasis>
 829               </para>
 830             </entry>
 831             <entry>
 832               <para>On the MDS or OSTs, there is one thread sending a quota
 833               request for a specific UID/GID for block quota at any time. When
 834               threads enter qctxt_wait_pending_dqacq, they do not need to wait.
 835               This is done in the qctxt_wait_pending_dqacq function.</para>
 836             </entry>
 837           </row>
 838           <row>
 839             <entry>
 840               <para>
 841                 <emphasis role="bold">nowait_for_pending_ino_quota_req
 842                 (qctxt_wait_pending_dqacq)</emphasis>
 843               </para>
 844             </entry>
 845             <entry>
 846               <para>On the MDS, there is one thread sending a quota request for
 847               a specific UID/GID for inode quota at any time. When threads
 848               enter qctxt_wait_pending_dqacq, they do not need to wait. This is
 849               done in the qctxt_wait_pending_dqacq function.</para>
 850             </entry>
 851           </row>
 852           <row>
 853             <entry>
 854               <para>
 855                 <emphasis role="bold">quota_ctl</emphasis>
 856               </para>
 857             </entry>
 858             <entry>
 859               <para>The quota_ctl statistic is generated when lfs
 860               <literal>setquota</literal>,
 861               <literal>lfs quota</literal> and so on, are issued.</para>
 862             </entry>
 863           </row>
 864           <row>
 865             <entry>
 866               <para>
 867                 <emphasis role="bold">adjust_qunit</emphasis>
 868               </para>
 869             </entry>
 870             <entry>
 871               <para>Each time qunit is adjusted, it is counted.</para>
 872             </entry>
 873           </row>
 874         </tbody>
 875       </tgroup>
 876     </informaltable>
 877     <section remap="h3">
 878       <title>Interpreting Quota Statistics</title>
 879       <para>Quota statistics are an important measure of the performance of a
 880       Lustre file system. Interpreting these statistics correctly can help you
 881       diagnose problems with quotas, and may indicate adjustments to improve
 882       system performance.</para>
 883       <para>For example, if you run this command on the OSTs:</para>
 884       <screen>
 885 lctl get_param lquota.testfs-OST0000.stats
 886 </screen>
 887       <para>You will get a result similar to this:</para>
 888       <screen>
 889 snapshot_time                                1219908615.506895 secs.usecs
 890 async_acq_req                              1 samples [us]  32 32 32
 891 async_rel_req                              1 samples [us]  5 5 5
 892 nowait_for_pending_blk_quota_req(qctxt_wait_pending_dqacq) 1 samples [us] 2\
 893  2 2
 894 quota_ctl                          4 samples [us]  80 3470 4293
 895 adjust_qunit                               1 samples [us]  70 70 70
 896 ....
 897 </screen>
 898       <para>In the first line,
 899       <literal>snapshot_time</literal> indicates when the statistics were taken.
 900       The remaining lines list the quota events and their associated
 901       data.</para>
 902       <para>In the second line, the
 903       <literal>async_acq_req</literal> event occurs one time. The
 904       <literal>min_time</literal>,
 905       <literal>max_time</literal> and
 906       <literal>sum_time</literal> statistics for this event are 32, 32 and 32,
 907       respectively. The unit is microseconds (μs).</para>
 908       <para>In the fifth line, the quota_ctl event occurs four times. The
 909       <literal>min_time</literal>,
 910       <literal>max_time</literal> and
 911       <literal>sum_time</literal> statistics for this event are 80, 3470 and
 912       4293, respectively. The unit is microseconds (μs).</para>
 913     </section>
 914   </section>
 915   <section xml:id="quota_pools" condition='l2E'>
 916     <title>
 917     <indexterm>
 918       <primary>Quotas</primary>
 919       <secondary>pools</secondary>
 920     </indexterm>Pool Quotas</title>
 921     <para>
 922     OST Pool Quotas feature gives an ability to limit user's (group's/project's)
 923     disk usage at OST pool level. Each OST Pool Quota (PQ) maps directly to the
 924     OST pool of the same name. Thus PQ could be tuned with standard <literal>
 925     lctl pool_new/add/remove/erase</literal> commands. All PQ are subset of a
 926     global pool that includes all OSTs and MDTs (DOM case).
 927     It may be initially confusing to be prevented from using "all of" one quota
 928     due to a different quota setting. In Lustre, a quota is a limit, not a right
 929     to use an amount. You don't always get to use your quota - an OST may be out
 930     of space, or some other quota is limiting. For example, if there is an inode
 931     quota and a space quota, and you hit your inode limit while you still have
 932     plenty of space, you can't use the space. For another example, quotas may
 933     easily be over-allocated: everyone gets 10PB of quota, in a 15PB system.
 934     That does not give them the right to use 10PB, it means they cannot use more
 935     than 10PB. They may very well get ENOSPC long before that - but they will not
 936     get EDQUOT. This behavior already exists in Lustre today, but pool quotas
 937     increase the number of limits in play: user, group or project global space quota
 938     and now all of those limits can also be defined for each pool. In all cases,
 939     the net effect is that the actual amount of space you can use is limited to the
 940     smallest (min) quota out of everything that is applicable.
 941     See more details in
 942     <link xl:href="http://wiki.lustre.org/OST_Pool_Quotas_HLD">
 943     OST Pool Quotas HLD</link>
 944     </para>
 945     <section remap="h3">
 946       <title>DOM and MDT pools</title>
 947       <para>
 948       From Quota Master point of view, "data" MDTs are regular members together
 949       with OSTs. However Pool Quotas support only OSTs as there is currently
 950       no mechanism to group MDTs in pools.
 951       </para>
 952     </section>
 953     <section remap="h3">
 954       <title>Lfs quota/setquota options to setup quota pools</title>
 955       <para>
 956       The same long option <literal>--pool</literal> is used to setup and report
 957       Pool Quotas with <literal>lfs setquota</literal> and <literal>lfs setquota</literal>.
 958       </para>
 959       <para>
 960       <literal>lfs setquota --pool <replaceable>pool_name</replaceable></literal>
 961       is used to set the block and soft usage limit for the user, group, or
 962       project for the specified pool name.
 963       </para>
 964       <para>
 965       <literal>lfs quota --pool <replaceable>pool_name</replaceable></literal>
 966       shows the user, group, or project usage for the specified pool name.
 967       </para>
 968     </section>
 969     <section remap="h3">
 970       <title>Quota pools interoperability</title>
 971       <para>
 972       Both client and server should have at least Lustre 2.14 to support Pool Quotas.
 973       </para>
 974       <note>
 975        <para>Pool Quotas may be able to work with older clients if server
 976        supports Pool Quotas. Pool quotas cannot be viewed or modified by
 977        older clients. Since the quota enforcement is done on the servers, only
 978        a single client is needed to configure the quotas. This could be done by
 979        mounting a client directly on the MDS if needed.
 980        </para>
 981       </note>
 982     </section>
 983     <section remap="h3">
 984       <title>Pool Quotas Hard Limit setup example</title>
 985       <para>
 986       Let's imagine you need to setup quota usage for already existed OST pool
 987       <literal>flash_pool</literal>:
 988       </para>
 989       <screen>
 990 # it is a limit for global pool. PQ don't work properly without that
 991 lfs setquota -u <replaceable>ivan</replaceable> -B<replaceable>100T /mnt/testfs</replaceable>
 992 # set 1TiB block hard limit for ivan in a flash_pool
 993 lfs setquota -u <replaceable>ivan</replaceable> --pool <replaceable>flash_pool</replaceable> -B<replaceable>1T /mnt/testfs</replaceable>
 994       </screen>
 995       <para>
 996       <note>
 997        <para>System-side hard limit is required before setting Quota Pool limit.
 998        If you do not need to limit user at all OSTs and MDTs at system,
 999        only per pool, it is recommended to set some unrealistic big hard limit.
1000        Without a global limit in place the Quota Pool limit will not be enforced.
1001        No matter hard or soft global limit - at least one of them should be set.
1002        </para>
1003       </note>
1004       </para>
1005     </section>
1006     <section remap="h3">
1007       <title>Pool Quotas Soft Limit setup example</title>
1008       <screen>
1009 # notify OSTs to enforce quota for ivan
1010 lfs setquota -u <replaceable>ivan</replaceable> -B<replaceable>10T /mnt/testfs</replaceable>
1011 # soft limit 10MiB for ivan in a pool flash_pool
1012 lfs setquota -u <replaceable>ivan</replaceable> --pool <replaceable>flash_pool</replaceable> -b<replaceable>1T /mnt/testfs</replaceable>
1013 # set block grace 600 s for all users at flash_pool
1014 lfs setquota -t -u --block-grace <replaceable>600</replaceable> --pool <replaceable>flash_pool /mnt/testfs</replaceable>
1015       </screen>
1016     </section>
1017   </section>
1018 </chapter>
1019 <!--
1020   vim:expandtab:shiftwidth=2:tabstop=8:
1021   -->