LustreOperations.xml

   1 <?xml version='1.0' encoding='utf-8'?>
   2 <chapter xmlns="http://docbook.org/ns/docbook"
   3 xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
   4 xml:id="lustreoperations">
   5   <title xml:id="lustreoperations.title">Lustre Operations</title>
   6   <para>Once you have the Lustre file system up and running, you can use the
   7   procedures in this chapter to perform the following basic Lustre
   8   administration tasks.</para>
   9   <section xml:id="dbdoclet.50438194_42877">
  10     <title>
  11     <indexterm>
  12       <primary>operations</primary>
  13     </indexterm>
  14     <indexterm>
  15       <primary>operations</primary>
  16       <secondary>mounting by label</secondary>
  17     </indexterm>Mounting by Label</title>
  18     <para>The file system name is limited to 8 characters. The file system
  19     name and target information are encoded in the disk label, so you can
  20     mount by label. This allows system administrators to move disks around
  21     without worrying about issues such as SCSI disk reordering or getting the
  22     <literal>/dev/device</literal> wrong for a shared target. Soon, file system
  23     naming will be made as fail-safe as possible. Currently, Linux disk labels
  24     are limited to 16 characters. Of these, 8 characters are reserved to
  25     identify the target within the file system (a dash, the target type, and
  26     a 4-digit hexadecimal index), leaving 8 characters for the file system name:</para>
  27     <screen>
  28 <replaceable>fsname</replaceable>-MDT0000 or
  29 <replaceable>fsname</replaceable>-OST0a19
  30 </screen>
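    <para>The label layout above can be reproduced with a small shell helper.
    This is only a sketch: the function name
    <literal>lustre_label</literal> is hypothetical, and it assumes the target
    index is supplied as a decimal number and printed as four lowercase
    hexadecimal digits, matching the labels shown above.</para>

```shell
# Compose a target label: fsname, a dash, the target type, and a
# 4-digit hexadecimal index (for example testfs-OST0a19).
# lustre_label is a hypothetical helper, not a Lustre utility.
lustre_label() {
    fsname=$1   # file system name, at most 8 characters
    type=$2     # target type: MDT or OST
    index=$3    # target index as a decimal number
    if [ "${#fsname}" -gt 8 ]; then
        echo "fsname longer than 8 characters: $fsname"
        return 1
    fi
    printf '%s-%s%04x\n' "$fsname" "$type" "$index"
}

lustre_label testfs MDT 0      # prints testfs-MDT0000
lustre_label testfs OST 2585   # prints testfs-OST0a19
```

    <para>A non-zero return status signals a file system name that would not
    fit in the 16-character label.</para>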
  31     <para>To mount by label, use this command:</para>
  32     <screen>
  33 mount -t lustre -L
  34 <replaceable>file_system_label</replaceable>
  35 <replaceable>/mount_point</replaceable>
  36 </screen>
  37     <para>This is an example of mount-by-label:</para>
  38     <screen>
  39 mds# mount -t lustre -L testfs-MDT0000 /mnt/mdt
  40 </screen>
  41     <caution>
  42       <para>Mount-by-label should NOT be used in a multi-path environment or
  43       when snapshots are being created of the device, since multiple block
  44       devices will have the same label.</para>
  45     </caution>
  46     <para>Although the file system name is internally limited to 8 characters,
  47     you can mount the clients at any mount point, so file system users are not
  48     subjected to short names. Here is an example:</para>
  49     <screen>
  50 client# mount -t lustre mds0@tcp0:/short
  51 <replaceable>/dev/long_mountpoint_name</replaceable>
  52 </screen>
  53   </section>
  54   <section xml:id="dbdoclet.50438194_24122">
  55     <title>
  56     <indexterm>
  57       <primary>operations</primary>
  58       <secondary>starting</secondary>
  59     </indexterm>Starting Lustre</title>
  60     <para>On the first start of a Lustre file system, the components must be
  61     started in the following order:</para>
  62     <orderedlist>
  63       <listitem>
  64         <para>Mount the MGT.</para>
  65         <note>
  66           <para>If a combined MGT/MDT is present, Lustre will correctly mount
  67           the MGT and MDT automatically.</para>
  68         </note>
  69       </listitem>
  70       <listitem>
  71         <para>Mount the MDT.</para>
  72         <note>
  73           <para condition='l24'>Mount all MDTs if multiple MDTs are
  74           present.</para>
  75         </note>
  76       </listitem>
  77       <listitem>
  78         <para>Mount the OST(s).</para>
  79       </listitem>
  80       <listitem>
  81         <para>Mount the client(s).</para>
  82       </listitem>
  83     </orderedlist>
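    <para>The start order above can be written down as a dry-run helper that
    only prints the commands, one per node role. This is a sketch under stated
    assumptions: the function name, the device paths, the mount points under
    <literal>/mnt</literal> and the NID <literal>mgsnode@tcp0</literal> are
    all hypothetical placeholders.</para>

```shell
# Print (do not run) the mount commands for a first start, in order:
# MGT, then MDT, then OST, then client.
# print_start_order and all mount points are hypothetical.
print_start_order() {
    mgt_dev=$1   # block device of the MGT
    mdt_dev=$2   # block device of the MDT
    ost_dev=$3   # block device of one OST
    mgs_nid=$4   # NID of the MGS, as used by clients
    fsname=$5    # file system name
    echo "mgs#    mount -t lustre $mgt_dev /mnt/mgt"
    echo "mds#    mount -t lustre $mdt_dev /mnt/mdt"
    echo "oss#    mount -t lustre $ost_dev /mnt/ost0"
    echo "client# mount -t lustre $mgs_nid:/$fsname /mnt/$fsname"
}

print_start_order /dev/sda /dev/sdb /dev/sdc mgsnode@tcp0 testfs
```

    <para>Printing rather than executing keeps the sketch safe to try; each
    line is meant to be reviewed and then run on the node named in its
    prompt.</para>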
  84   </section>
  85   <section xml:id="dbdoclet.50438194_84876">
  86     <title>
  87     <indexterm>
  88       <primary>operations</primary>
  89       <secondary>mounting</secondary>
  90     </indexterm>Mounting a Server</title>
  91     <para>Starting a Lustre server is straightforward and only involves the
  92     mount command. Lustre servers can also be added to
  93     <literal>/etc/fstab</literal>. To list the mounted Lustre targets, run:</para>
  94     <screen>
  95 mount -t lustre
  96 </screen>
  97     <para>The mount command generates output similar to this:</para>
  98     <screen>
  99 /dev/sda1 on /mnt/test/mdt type lustre (rw)
 100 /dev/sda2 on /mnt/test/ost0 type lustre (rw)
 101 192.168.0.21@tcp:/testfs on /mnt/testfs type lustre (rw)
 102 </screen>
 103     <para>In this example, the MDT, an OST (ost0) and file system (testfs) are
 104     mounted. The corresponding server entries in <literal>/etc/fstab</literal> are:</para>
 105     <screen>
 106 LABEL=testfs-MDT0000 /mnt/test/mdt lustre defaults,_netdev,noauto 0 0
 107 LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0
 108 </screen>
 109     <para>In general, it is wise to specify <literal>noauto</literal> and let your
 110     high-availability (HA) package manage when to mount the device. If you are
 111     not using failover, make sure that networking has been started before
 112     mounting a Lustre server. On Red Hat Enterprise Linux, SUSE Linux
 113     Enterprise Server, Debian (and perhaps other distributions), use
 114     the
 115     <literal>_netdev</literal> flag to ensure that these disks are mounted after
 116     the network is up.</para>
 117     <para>We are mounting by disk label here. The label of a device can be read
 118     with
 119     <literal>e2label</literal>. The label of a newly-formatted Lustre server
 120     may end in
 121     <literal>FFFF</literal> if the
 122     <literal>--index</literal> option is not specified to
 123     <literal>mkfs.lustre</literal>, meaning that it has yet to be assigned. The
 124     assignment takes place when the server is first started, and the disk label
 125     is updated. It is recommended that the
 126     <literal>--index</literal> option always be used, which will also ensure
 127     that the label is set at format time.</para>
 128     <caution>
 129       <para>Do not do this when the client and OSS are on the same node, as
 130       memory pressure between the client and OSS can lead to deadlocks.</para>
 131     </caution>
 132     <caution>
 133       <para>Mount-by-label should NOT be used in a multi-path
 134       environment.</para>
 135     </caution>
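    <para>A provisioning script can check whether a label still carries the
    unassigned index described above before relying on mount-by-label. The
    helper below is a sketch: its name is hypothetical, it only inspects a
    label string that the caller has already read, and it assumes that a
    trailing <literal>FFFF</literal> (in either case) always means the index
    is not yet assigned.</para>

```shell
# Return success if the label ends in FFFF (either case), which the text
# above describes as an index that has not been assigned yet.
# label_index_unassigned is a hypothetical helper name.
label_index_unassigned() {
    case $1 in
        *[Ff][Ff][Ff][Ff]) return 0 ;;
        *) return 1 ;;
    esac
}

for label in testfs-OSTffff testfs-OST0000; do
    if label_index_unassigned "$label"; then
        echo "$label: index not yet assigned"
    else
        echo "$label: index assigned"
    fi
done
```

    <para>On a real server the label string could come from
    <literal>e2label</literal>, as mentioned above.</para>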
 136   </section>
 137   <section xml:id="dbdoclet.shutdownLustre">
 138       <title>
 139           <indexterm>
 140               <primary>operations</primary>
 141               <secondary>shutdownLustre</secondary>
 142           </indexterm>Stopping the Filesystem</title>
 143       <para>A complete Lustre filesystem shutdown occurs by unmounting all
 144       clients and servers in the order shown below. Unmounting a block device
 145       causes the Lustre software to be shut down on that node.
 146       </para>
 147       <note><para>The <literal>-a -t lustre</literal> in the commands below
 148           is not the name of a filesystem; it tells <literal>umount</literal>
 149           to unmount all entries in <literal>/etc/mtab</literal> that are of type
 150           <literal>lustre</literal>.</para></note>
 151       <orderedlist>
 152           <listitem><para>Unmount the clients</para>
 153               <para>On each client node, unmount the filesystem on that client
 154               using the <literal>umount</literal> command:</para>
 155               <para><literal>umount -a -t lustre</literal></para>
 156               <para>The example below shows the unmount of the
 157               <literal>testfs</literal> filesystem on a client node:</para>
 158               <para><screen>[root@client1 ~]# mount |grep testfs
 159 XXX.XXX.0.11@tcp:/testfs on /mnt/testfs type lustre (rw,lazystatfs)
 160
 161 [root@client1 ~]# umount -a -t lustre
 162 [154523.177714] Lustre: Unmounted testfs-client</screen></para>
 163           </listitem>
 164           <listitem><para>Unmount the MDT and MGT</para>
 165               <para>On the MGS and MDS node(s), use the <literal>umount</literal>
 166               command:</para>
 167               <para><literal>umount -a -t lustre</literal></para>
 168               <para>The example below shows the unmount of the MDT and MGT for
 169               the <literal>testfs</literal> filesystem on a combined MGS/MDS:
 170               </para>
 171               <para><screen>[root@mds1 ~]# mount |grep lustre
 172 /dev/sda on /mnt/mgt type lustre (ro)
 173 /dev/sdb on /mnt/mdt type lustre (ro)
 174
 175 [root@mds1 ~]# umount -a -t lustre
 176 [155263.566230] Lustre: Failing over testfs-MDT0000
 177 [155263.775355] Lustre: server umount testfs-MDT0000 complete
 178 [155269.843862] Lustre: server umount MGS complete</screen></para>
 179           <para>For a separate MGS and MDS, the same command is used, first on
 180           the MDS and then on the MGS.</para>
 181           </listitem>
 182           <listitem><para>Unmount all the OSTs</para>
 183               <para>On each OSS node, use the <literal>umount</literal> command:
 184               </para>
 185               <para><literal>umount -a -t lustre</literal></para>
 186               <para>The example below shows the unmount of all OSTs for the
 187               <literal>testfs</literal> filesystem on server
 188               <literal>OSS1</literal>:
 189               </para>
 190               <para><screen>[root@oss1 ~]# mount |grep lustre
 191 /dev/sda on /mnt/ost0 type lustre (ro)
 192 /dev/sdb on /mnt/ost1 type lustre (ro)
 193 /dev/sdc on /mnt/ost2 type lustre (ro)
 194
 195 [root@oss1 ~]# umount -a -t lustre
 196 [155336.491445] Lustre: Failing over testfs-OST0002
 197 [155336.556752] Lustre: server umount testfs-OST0002 complete</screen></para>
 198           </listitem>
 199       </orderedlist>
 200       <para>For the unmount command syntax for a single OST, MDT, or MGT target,
 201       refer to <xref linkend="dbdoclet.umountTarget"/>.</para>
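      <para>The shutdown order can be captured in a dry-run helper that prints
      one <literal>umount</literal> line per node. This is a sketch only: the
      function name and the node names are hypothetical, and nothing is
      executed.</para>

```shell
# Print (do not run) the unmount command for every node, in shutdown
# order: clients, then MDS nodes, then MGS nodes, then OSS nodes.
# print_shutdown_plan and the node names are hypothetical.
print_shutdown_plan() {
    clients=$1     # space-separated client node names
    mds_nodes=$2   # space-separated MDS node names
    mgs_nodes=$3   # space-separated MGS node names
    oss_nodes=$4   # space-separated OSS node names
    # The lists are left unquoted on purpose so they split into words.
    for node in $clients $mds_nodes $mgs_nodes $oss_nodes; do
        echo "$node# umount -a -t lustre"
    done
}

print_shutdown_plan "client1 client2" "mds1" "mgs1" "oss1 oss2"
```

      <para>For a combined MGS/MDS, pass the node once as the MDS and leave
      the MGS list empty.</para>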
 202   </section>
 203   <section xml:id="dbdoclet.umountTarget">
 204     <title>
 205     <indexterm>
 206       <primary>operations</primary>
 207       <secondary>unmounting</secondary>
 208     </indexterm>Unmounting a Specific Target on a Server</title>
 209     <para>To stop a Lustre OST, MDT, or MGT, use the
 210     <literal>umount
 211     <replaceable>/mount_point</replaceable></literal> command.</para>
 212     <para>The example below stops an OST, <literal>ost0</literal>, on mount
 213     point <literal>/mnt/ost0</literal> for the <literal>testfs</literal>
 214     filesystem:</para>
 215     <screen>[root@oss1 ~]# umount /mnt/ost0
 216 [  385.142264] Lustre: Failing over testfs-OST0000
 217 [  385.210810] Lustre: server umount testfs-OST0000 complete</screen>
 218     <para>Gracefully stopping a server with the
 219     <literal>umount</literal> command preserves the state of the connected
 220     clients. The next time the server is started, it waits for clients to
 221     reconnect, and then goes through the recovery procedure.</para>
 222     <para>If the force (
 223     <literal>-f</literal>) flag is used, then the server evicts all clients and
 224     stops WITHOUT recovery. Upon restart, the server does not wait for
 225     recovery. Any currently connected clients receive I/O errors until they
 226     reconnect.</para>
 227     <note>
 228       <para>If you are using loopback devices, use the
 229       <literal>-d</literal> flag. This flag cleans up loop devices and can
 230       always be safely specified.</para>
 231     </note>
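    <para>The difference between a graceful and a forced stop comes down to
    the flags passed to <literal>umount</literal>. The helper below is a
    sketch that only prints the command; its name and the mode keywords
    <literal>graceful</literal> and <literal>force</literal> are hypothetical.
    It always adds <literal>-d</literal>, which the note above describes as
    safe to specify.</para>

```shell
# Print the umount command for a single target.
# graceful: preserves client state, so recovery runs on the next start.
# force:    evicts all clients and stops without recovery (-f).
# stop_target_cmd and the mode names are hypothetical.
stop_target_cmd() {
    mode=$1
    mount_point=$2
    case $mode in
        graceful) echo "umount -d $mount_point" ;;
        force)    echo "umount -f -d $mount_point" ;;
        *)        echo "unknown mode: $mode"; return 1 ;;
    esac
}

stop_target_cmd graceful /mnt/ost0   # prints umount -d /mnt/ost0
stop_target_cmd force /mnt/ost0      # prints umount -f -d /mnt/ost0
```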
 232   </section>
 233   <section xml:id="dbdoclet.50438194_57420">
 234     <title>
 235     <indexterm>
 236       <primary>operations</primary>
 237       <secondary>failover</secondary>
 238     </indexterm>Specifying Failout/Failover Mode for OSTs</title>
 239     <para>In a Lustre file system, an OST that has become unreachable because
 240     it fails, is taken off the network, or is unmounted can be handled in one
 241     of two ways:</para>
 242     <itemizedlist>
 243       <listitem>
 244         <para>In
 245         <literal>failout</literal> mode, Lustre clients receive errors (EIOs)
 246         as soon as a timeout expires, instead of waiting for the OST to
 247         recover.</para>
 248       </listitem>
 249       <listitem>
 250         <para>In
 251         <literal>failover</literal> mode, Lustre clients wait for the OST to
 252         recover.</para>
 253       </listitem>
 254     </itemizedlist>
 255     <para>By default, the Lustre file system uses
 256     <literal>failover</literal> mode for OSTs. To specify
 257     <literal>failout</literal> mode instead, use the
 258     <literal>--param="failover.mode=failout"</literal> option as shown below
 259     (entered on one line):</para>
 260     <screen>
 261 oss# mkfs.lustre --fsname=
 262 <replaceable>fsname</replaceable> --mgsnode=
 263 <replaceable>mgs_NID</replaceable> --param=failover.mode=failout
 264       --ost --index=
 265 <replaceable>ost_index</replaceable>
 266 <replaceable>/dev/ost_block_device</replaceable>
 267 </screen>
 268     <para>In the example below,
 269     <literal>failout</literal> mode is specified for the OSTs on the MGS
 270     <literal>mds0</literal> in the file system
 271     <literal>testfs</literal> (entered on one line).</para>
 272     <screen>
 273 oss# mkfs.lustre --fsname=testfs --mgsnode=mds0 --param=failover.mode=failout
 274       --ost --index=3 /dev/sdb
 275 </screen>
 276     <caution>
 277       <para>Before running this command, unmount all OSTs that will be affected
 278       by a change in
 279       <literal>failover</literal>/
 280       <literal>failout</literal> mode.</para>
 281     </caution>
 282     <note>
 283       <para>After initial file system configuration, use the
 284       <literal>tunefs.lustre</literal> utility to change the mode. For example,
 285       to set the
 286       <literal>failout</literal> mode, run:</para>
 287       <para>
 288         <screen>
 289 $ tunefs.lustre --param failover.mode=failout
 290 <replaceable>/dev/ost_device</replaceable>
 291 </screen>
 292       </para>
 293     </note>
 294   </section>
 295   <section xml:id="dbdoclet.50438194_54138">
 296     <title>
 297     <indexterm>
 298       <primary>operations</primary>
 299       <secondary>degraded OST RAID</secondary>
 300     </indexterm>Handling Degraded OST RAID Arrays</title>
 301     <para>Lustre includes functionality that notifies Lustre if an external
 302     RAID array has degraded performance (resulting in reduced overall file
 303     system performance), either because a disk has failed and not been
 304     replaced, or because a disk was replaced and is undergoing a rebuild. To
 305     avoid a global performance slowdown due to a degraded OST, the MDS can
 306     avoid the OST for new object allocation if it is notified of the degraded
 307     state.</para>
 308     <para>A parameter for each OST, called
 309     <literal>degraded</literal>, specifies whether the OST is running in
 310     degraded mode or not.</para>
 311     <para>To mark the OST as degraded, use:</para>
 312     <screen>
 313 lctl set_param obdfilter.{OST_name}.degraded=1
 314 </screen>
 315     <para>To mark that the OST is back in normal operation, use:</para>
 316     <screen>
 317 lctl set_param obdfilter.{OST_name}.degraded=0
 318 </screen>
 319     <para>To determine if OSTs are currently in degraded mode, use:</para>
 320     <screen>
 321 lctl get_param obdfilter.*.degraded
 322 </screen>
 323     <para>If the OST is remounted due to a reboot or other condition, the flag
 324     resets to
 325     <literal>0</literal>.</para>
 326     <para>It is recommended that this be implemented by an automated script
 327     that monitors the status of individual RAID devices.</para>
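    <para>As one possible shape for such a monitoring script, the sketch below
    assumes, purely for illustration, that the OST sits on a Linux software
    RAID (md) device whose state can be read in the
    <literal>/proc/mdstat</literal> format. An external hardware array would
    need its vendor's status tool instead. The function names
    <literal>md_degraded_flag</literal> and <literal>degraded_cmd</literal>
    are hypothetical, and the script prints the <literal>lctl</literal>
    command rather than running it.</para>

```shell
# Read mdstat-formatted text on stdin and print 1 if the named md device
# shows a missing member ("_" in its [UU_] status field), otherwise 0.
# Assumption: the status field is on a line following the device line.
md_degraded_flag() {
    awk -v dev="$1" '
        $1 == dev { in_dev = 1; next }
        in_dev == 1 {
            if (match($0, /\[[U_]+\]/)) {
                status = substr($0, RSTART, RLENGTH)
                if (index(status, "_")) print 1; else print 0
                found = 1
                exit
            }
            # A new unindented line means the next device has started.
            if ($0 ~ /^[^ \t]/) exit
        }
        END { if (found != 1) print 0 }
    '
}

# Print the lctl command that marks the OST as degraded (1) or normal (0).
degraded_cmd() {
    printf 'lctl set_param obdfilter.%s.degraded=%s\n' "$1" "$2"
}

# Sample input standing in for the contents of /proc/mdstat.
sample='md0 : active raid5 sdc1[2] sdb1[1] sda1[0]
      1953260544 blocks level 5, 512k chunk, algorithm 2 [3/2] [UU_]'
flag=$(printf '%s\n' "$sample" | md_degraded_flag md0)
degraded_cmd testfs-OST0000 "$flag"
```

    <para>On a live OSS the input could come from
    <literal>cat /proc/mdstat | md_degraded_flag md0</literal>. Because the
    flag resets to <literal>0</literal> on remount, a monitor of this kind
    would need to run again after every restart of the OST.</para>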
 328   </section>
 329   <section xml:id="dbdoclet.50438194_88063">
 330     <title>
 331     <indexterm>
 332       <primary>operations</primary>
 333       <secondary>multiple file systems</secondary>
 334     </indexterm>Running Multiple Lustre File Systems</title>
 335     <para>Lustre supports multiple file systems provided the combination of
 336     <literal>NID:fsname</literal> is unique. Each file system must be allocated
 337     a unique name during creation with the
 338     <literal>--fsname</literal> parameter. Unique names for file systems are
 339     enforced if a single MGS is present. If multiple MGSs are present (for
 340     example if you have an MGS on every MDS) the administrator is responsible
 341     for ensuring file system names are unique. A single MGS and unique file
 342     system names provide a single point of administration and allow commands
 343     to be issued against the file system even if it is not mounted.</para>
 344     <para>Lustre supports multiple file systems on a single MGS. With a single
 345     MGS, fsnames are guaranteed to be unique. Lustre also allows multiple MGSs
 346     to co-exist. For example, multiple MGSs will be necessary if multiple file
 347     systems on different Lustre software versions are to be concurrently
 348     available. With multiple MGSs, additional care must be taken to ensure file
 349     system names are unique. Each file system should have a unique fsname among
 350     all systems that may interoperate in the future.</para>
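    <para>When several MGSs are in use, the uniqueness check falls to the
    administrator. A minimal sketch of such a check, assuming the file system
    names have already been collected into a list by hand (the function name
    <literal>duplicate_fsnames</literal> is hypothetical):</para>

```shell
# Print every file system name that appears more than once in the
# argument list; print nothing if all names are unique.
# duplicate_fsnames is a hypothetical helper name.
duplicate_fsnames() {
    printf '%s\n' "$@" | sort | uniq -d
}

duplicate_fsnames foo bar scratch foo   # prints foo
```

    <para>An empty result means the listed names do not collide.</para>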
 351     <para>By default, the
 352     <literal>mkfs.lustre</literal> command creates a file system named
 353     <literal>lustre</literal>. To specify a different file system name (limited
 354     to 8 characters) at format time, use the
 355     <literal>--fsname</literal> option:</para>
 356     <para>
 357       <screen>
 358 mkfs.lustre --fsname=
 359 <replaceable>file_system_name</replaceable>
 360 </screen>
 361     </para>
 362     <note>
 363       <para>The MDT, OSTs and clients in the new file system must use the same
 364       file system name (prepended to the device name). For example, for a new
 365       file system named
 366       <literal>foo</literal>, the MDT and two OSTs would be named
 367       <literal>foo-MDT0000</literal>,
 368       <literal>foo-OST0000</literal>, and
 369       <literal>foo-OST0001</literal>.</para>
 370     </note>
 371     <para>To mount a client on the file system, run:</para>
 372     <screen>
 373 client# mount -t lustre
 374 <replaceable>mgsnode</replaceable>:
 375 <replaceable>/new_fsname</replaceable>
 376 <replaceable>/mount_point</replaceable>
 377 </screen>
 378     <para>For example, to mount a client on file system foo at mount point
 379     /mnt/foo, run:</para>
 380     <screen>
 381 client# mount -t lustre mgsnode:/foo /mnt/foo
 382 </screen>
 383     <note>
 384       <para>If a client will mount several file systems, add the
 385       following line to the
 386       <literal>/etc/xattr.conf</literal> file to avoid problems when files are
 387       moved between the file systems:
 388       <literal>lustre.* skip</literal></para>
 389     </note>
 390     <note>
 391       <para>To ensure that a new MDT is added to an existing MGS, create the MDT
 392       by specifying:
 393       <literal>--mdt --mgsnode=
 394       <replaceable>mgs_NID</replaceable></literal>.</para>
 395     </note>
 396     <para>A Lustre installation with two file systems (
 397     <literal>foo</literal> and
 398     <literal>bar</literal>) could look like this, where the MGS node is
 399     <literal>mgsnode@tcp0</literal> and the mount points are
 400     <literal>/mnt/foo</literal> and
 401     <literal>/mnt/bar</literal>.</para>
 402     <screen>
 403 mgsnode# mkfs.lustre --mgs /dev/sda
 404 mdtfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --mdt --index=0
 405 /dev/sdb
 406 ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=0
 407 /dev/sda
 408 ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=1
 409 /dev/sdb
 410 mdtbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --mdt --index=0
 411 /dev/sda
 412 ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=0
 413 /dev/sdc
 414 ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=1
 415 /dev/sdd
 416 </screen>
 417     <para>To mount a client on file system foo at mount point
 418     <literal>/mnt/foo</literal>, run:</para>
 419     <screen>
 420 client# mount -t lustre mgsnode@tcp0:/foo /mnt/foo
 421 </screen>
 422     <para>To mount a client on file system bar at mount point
 423     <literal>/mnt/bar</literal>, run:</para>
 424     <screen>
 425 client# mount -t lustre mgsnode@tcp0:/bar /mnt/bar
 426 </screen>
 427   </section>
 428   <section xml:id="dbdoclet.lfsmkdir" condition='l24'>
 429     <title>
 430     <indexterm>
 431       <primary>operations</primary>
 432       <secondary>remote directory</secondary>
 433     </indexterm>Creating a sub-directory on a given MDT</title>
 434     <para>Lustre 2.4 enables individual sub-directories to be serviced by
 435     unique MDTs. An administrator can allocate a sub-directory to a given MDT
 436     using the command:</para>
 437     <screen>
 438 client# lfs mkdir -i
 439 <replaceable>mdt_index</replaceable>
 440 <replaceable>/mount_point/remote_dir</replaceable>
 441 </screen>
 442     <para>This command will allocate the sub-directory
 443     <literal>remote_dir</literal> onto the MDT of index
 444     <literal>mdt_index</literal>. For more information on adding additional MDTs
 445     and
 446     <literal>mdt_index</literal> see
 447     <xref linkend='dbdoclet.addmdtindex' />.</para>
 448     <warning>
 449       <para>An administrator can allocate remote sub-directories to separate
 450       MDTs. Creating remote sub-directories in parent directories not hosted on
 451       MDT0 is not recommended. This is because the failure of the parent MDT
 452       will leave the namespace below it inaccessible. For this reason, by
 453       default it is only possible to create remote sub-directories off MDT0. To
 454       relax this restriction and enable remote sub-directories off any MDT, an
 455       administrator must issue the following command on the MGS:
 456       <screen>mgs# lctl conf_param <replaceable>fsname</replaceable>.mdt.enable_remote_dir=1</screen>
 457       For Lustre filesystem 'scratch', the command executed is:
 458       <screen>mgs# lctl conf_param scratch.mdt.enable_remote_dir=1</screen>
 459       To verify the configuration setting execute the following command on any
 460       MDS:
 461           <screen>mds# lctl get_param mdt.*.enable_remote_dir</screen></para>
 462     </warning>
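    <para>To spread top-level directories over several MDTs, the
    <literal>lfs mkdir -i</literal> command can be generated once per MDT
    index. The sketch below only prints the commands. It assumes the parent
    directory is hosted on MDT0, in line with the warning above, and the
    function name and the <literal>mdtN</literal> directory names are
    hypothetical.</para>

```shell
# Print one "lfs mkdir -i" command per MDT index, from 0 to count-1.
# remote_dir_cmds and the mdtN directory names are hypothetical.
remote_dir_cmds() {
    parent=$1   # parent directory inside the mounted file system
    count=$2    # number of MDTs in the file system
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "lfs mkdir -i $i $parent/mdt$i"
        i=$((i + 1))
    done
}

remote_dir_cmds /mnt/testfs 3
```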
 463     <para condition='l28'>With Lustre software version 2.8, a new
 464     tunable is available to allow users with a specific group ID to create
 465     and delete remote and striped directories. This tunable is
 466     <literal>enable_remote_dir_gid</literal>. For example, setting this
 467     parameter to the 'wheel' or 'admin' group ID allows users with that GID
 468     to create and delete remote and striped directories. Setting this
 469     parameter to <literal>-1</literal> on MDT0 permanently allows any
 470     non-root user to create and delete remote and striped directories.
 471     On the MGS execute the following command:
 472     <screen>mgs# lctl conf_param <replaceable>fsname</replaceable>.mdt.enable_remote_dir_gid=-1</screen>
 473     For the Lustre filesystem 'scratch', the command expands to:
 474     <screen>mgs# lctl conf_param scratch.mdt.enable_remote_dir_gid=-1</screen>
 475     The change can be verified by executing the following command on every MDS:
 476     <screen>mds# lctl get_param mdt.<replaceable>*</replaceable>.enable_remote_dir_gid</screen>
 477     </para>
 478   </section>
 479   <section xml:id="dbdoclet.lfsmkdirdne2" condition='l28'>
 480     <title>
 481     <indexterm>
 482       <primary>operations</primary>
 483       <secondary>striped directory</secondary>
 484     </indexterm>
 485     <indexterm>
 486       <primary>operations</primary>
 487       <secondary>mkdir</secondary>
 488     </indexterm>
 489     <indexterm>
 490       <primary>operations</primary>
 491       <secondary>setdirstripe</secondary>
 492     </indexterm>
 493     <indexterm>
 494       <primary>striping</primary>
 495       <secondary>metadata</secondary>
 496     </indexterm>Creating a directory striped across multiple MDTs</title>
 497     <para>The Lustre 2.8 DNE feature enables individual files in a given
 498     directory to store their metadata on separate MDTs (a <emphasis>striped
 499     directory</emphasis>) once additional MDTs have been added to the
 500     filesystem, see <xref linkend="dbdoclet.addingamdt"/>.
 501     The result of this is that metadata requests for
 502     files in a striped directory are serviced by multiple MDTs and metadata
 503     service load is distributed over all the MDTs that service a given
 504     directory. By distributing metadata service load over multiple MDTs,
 505     performance can be improved beyond the limit of single MDT
 506     performance. Prior to the development of this feature, all files in a
 507     directory had to record their metadata on a single MDT.</para>
 508     <para>The command to stripe a directory over
 509     <replaceable>mdt_count</replaceable> MDTs is:
 510     </para>
 511     <screen>
 512 client# lfs mkdir -c
 513 <replaceable>mdt_count</replaceable>
 514 <replaceable>/mount_point/new_directory</replaceable>
 515 </screen>
 516     <para>The striped directory feature is most useful for distributing
 517     single large directories (50k entries or more) across multiple MDTs,
 518     since it incurs more overhead than non-striped directories.</para>
 519   </section>
 520   <section xml:id="dbdoclet.50438194_88980">
 521     <title>
 522     <indexterm>
 523       <primary>operations</primary>
 524       <secondary>parameters</secondary>
 525     </indexterm>Setting and Retrieving Lustre Parameters</title>
 526     <para>Several options are available for setting parameters in
 527     Lustre:</para>
 528     <itemizedlist>
 529       <listitem>
 530         <para>When creating a file system, use <literal>mkfs.lustre</literal>. See
 531         <xref linkend="dbdoclet.50438194_17237" /> below.</para>
 532       </listitem>
 533       <listitem>
 534         <para>When a server is stopped, use <literal>tunefs.lustre</literal>. See
 535         <xref linkend="dbdoclet.50438194_55253" /> below.</para>
 536       </listitem>
 537       <listitem>
 538         <para>When the file system is running, use <literal>lctl</literal> to set or retrieve
 539         Lustre parameters. See
 540         <xref linkend="dbdoclet.50438194_51490" /> and
 541         <xref linkend="dbdoclet.50438194_63247" /> below.</para>
 542       </listitem>
 543     </itemizedlist>
 544     <section xml:id="dbdoclet.50438194_17237">
 545       <title>Setting Tunable Parameters with
 546       <literal>mkfs.lustre</literal></title>
 547       <para>When the file system is first formatted, parameters can simply be
 548       added as a
 549       <literal>--param</literal> option to the
 550       <literal>mkfs.lustre</literal> command. For example:</para>
 551       <screen>
 552 mds# mkfs.lustre --mdt --param="sys.timeout=50" /dev/sda
 553 </screen>
 554       <para>For more details about creating a file system, see
 555       <xref linkend="configuringlustre" />. For more details about
 556       <literal>mkfs.lustre</literal>, see
 557       <xref linkend="systemconfigurationutilities" />.</para>
 558     </section>
 559     <section xml:id="dbdoclet.50438194_55253">
 560       <title>Setting Parameters with
 561       <literal>tunefs.lustre</literal></title>
 562       <para>If a server (OSS or MDS) is stopped, parameters can be added to an
 563       existing file system using the
 564       <literal>--param</literal> option to the
 565       <literal>tunefs.lustre</literal> command. For example:</para>
 566       <screen>
 567 oss# tunefs.lustre --param=failover.node=192.168.0.13@tcp0 /dev/sda
 568 </screen>
 569       <para>With
 570       <literal>tunefs.lustre</literal>, parameters are
 571       <emphasis>additive</emphasis>: new parameters are specified in addition
 572       to old parameters and do not replace them. To erase all old
 573       <literal>tunefs.lustre</literal> parameters and just use newly-specified
 574       parameters, run:</para>
 575       <screen>
 576 mds# tunefs.lustre --erase-params --param=
 577 <replaceable>new_parameters</replaceable>
 578 </screen>
 579       <para>The <literal>tunefs.lustre</literal> command can be used to set any
 580       parameter that is settable in a <literal>/proc/fs/lustre</literal> file and
 581       that has its own OBD device, so it can be specified as
 582       <literal>
 583       <replaceable>obdname|fsname</replaceable>.
 584       <replaceable>obdtype</replaceable>.
 585       <replaceable>proc_file_name</replaceable>=
 586       <replaceable>value</replaceable></literal>. For example:</para>
 587       <screen>
 588 mds# tunefs.lustre --param mdt.identity_upcall=NONE /dev/sda1
 589 </screen>
 590       <para>For more details about
 591       <literal>tunefs.lustre</literal>, see
 592       <xref linkend="systemconfigurationutilities" />.</para>
 593     </section>
 594     <section xml:id="dbdoclet.50438194_51490">
 595       <title>Setting Parameters with
 596       <literal>lctl</literal></title>
 597       <para>When the file system is running, the
 598       <literal>lctl</literal> command can be used to set parameters (temporary
 599       or permanent) and report current parameter values. Temporary parameters
 600       are active as long as the server or client is not shut down. Permanent
 601       parameters live through server and client reboots.</para>
 602       <note>
 603         <para>The <literal>lctl list_param</literal> command enables users to list all parameters
 604         that can be set. See
 605         <xref linkend="dbdoclet.50438194_88217" />.</para>
 606       </note>
 607       <para>For more details about the
 608       <literal>lctl</literal> command, see the examples in the sections below
 609       and
 610       <xref linkend="systemconfigurationutilities" />.</para>
 611       <section remap="h4">
 612         <title>Setting Temporary Parameters</title>
 613         <para>Use
 614         <literal>lctl set_param</literal> to set temporary parameters on the
 615         node where it is run. These parameters map to items in
 616         <literal>/proc/{fs,sys}/{lnet,lustre}</literal>. The
 617         <literal>lctl set_param</literal> command uses this syntax:</para>
 618         <screen>
 619 lctl set_param [-n]
 620 <replaceable>obdtype</replaceable>.
 621 <replaceable>obdname</replaceable>.
 622 <replaceable>proc_file_name</replaceable>=
 623 <replaceable>value</replaceable>
 624 </screen>
 625         <para>For example:</para>
 626         <screen>
 627 # lctl set_param osc.*.max_dirty_mb=1024
 628 osc.myth-OST0000-osc.max_dirty_mb=32
 629 osc.myth-OST0001-osc.max_dirty_mb=32
 630 osc.myth-OST0002-osc.max_dirty_mb=32
 631 osc.myth-OST0003-osc.max_dirty_mb=32
 632 osc.myth-OST0004-osc.max_dirty_mb=32
 633 </screen>
 634       </section>
 635       <section xml:id="dbdoclet.50438194_64195">
 636         <title>Setting Permanent Parameters</title>
 637         <para>Use the
 638         <literal>lctl conf_param</literal> command to set permanent parameters.
 639         In general, the
 640         <literal>lctl conf_param</literal> command can be used to specify any
 641         parameter settable in a
 642         <literal>/proc/fs/lustre</literal> file, with its own OBD device. The
 643         <literal>lctl conf_param</literal> command uses this syntax (same as the
 644
 645         <literal>mkfs.lustre</literal> and
 646         <literal>tunefs.lustre</literal> commands):</para>
 647         <screen>
 648 <replaceable>obdname|fsname</replaceable>.
 649 <replaceable>obdtype</replaceable>.
 650 <replaceable>proc_file_name</replaceable>=
 651 <replaceable>value</replaceable>
 652 </screen>
 653         <para>Here are a few examples of
 654         <literal>lctl conf_param</literal> commands:</para>
 655         <screen>
mgs# lctl conf_param testfs-MDT0000.sys.timeout=40
mgs# lctl conf_param testfs-MDT0000.mdt.identity_upcall=NONE
mgs# lctl conf_param testfs.llite.max_read_ahead_mb=16
mgs# lctl conf_param testfs-MDT0000.lov.stripesize=2M
mgs# lctl conf_param testfs-OST0000.osc.max_dirty_mb=29.15
mgs# lctl conf_param testfs-OST0000.ost.client_cache_seconds=15
mgs# lctl conf_param testfs.sys.timeout=40
 663 </screen>
 664         <caution>
 665           <para>Parameters specified with the
 666           <literal>lctl conf_param</literal> command are set permanently in the
 667           file system's configuration file on the MGS.</para>
 668         </caution>
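        <para>As an illustration of how the
        <literal>conf_param</literal> argument syntax above breaks down, the
        following minimal shell sketch splits one argument into its parts. The
        function name
        <literal>split_conf_param</literal> is hypothetical and is not part of
        the Lustre tools; the sketch only manipulates strings and does not call
        <literal>lctl</literal>.</para>
        <screen>
#!/bin/sh
# Hypothetical helper (not part of Lustre): split a conf_param argument of
# the form "obdname|fsname.obdtype.proc_file_name=value" into its fields.
split_conf_param() {
    value=${1#*=}          # everything after the first "="
    name=${1%%=*}          # everything before the first "="
    target=${name%%.*}     # obdname or fsname (text before the first ".")
    rest=${name#*.}        # obdtype.proc_file_name
    obdtype=${rest%%.*}
    param=${rest#*.}
    printf 'target=%s type=%s param=%s value=%s\n' \
        "$target" "$obdtype" "$param" "$value"
}

# Argument copied from one of the examples above.
split_conf_param 'testfs-MDT0000.mdt.identity_upcall=NONE'
</screen>
        <para>A target such as
        <literal>testfs-MDT0000</literal> addresses a single device, while a
        bare file system name such as
        <literal>testfs</literal> applies the setting file system wide.</para>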
 669       </section>
 670       <section xml:id="dbdoclet.setparamp" condition='l25'>
 671         <title>Setting Permanent Parameters with lctl set_param -P</title>
        <para>Use the
        <literal>lctl set_param -P</literal> command to set parameters
        permanently. This command must be issued on the MGS. The given
        parameter is set on every host using an
        <literal>lctl</literal> upcall. Parameters map to items in
        <literal>/proc/{fs,sys}/{lnet,lustre}</literal>. The
        <literal>lctl set_param -P</literal> command uses this syntax:</para>
 679         <screen>
 680 lctl set_param -P
 681 <replaceable>obdtype</replaceable>.
 682 <replaceable>obdname</replaceable>.
 683 <replaceable>proc_file_name</replaceable>=
 684 <replaceable>value</replaceable>
 685 </screen>
 686         <para>For example:</para>
 687         <screen>
# lctl set_param -P osc.*.max_dirty_mb=1024
osc.myth-OST0000-osc.max_dirty_mb=1024
osc.myth-OST0001-osc.max_dirty_mb=1024
osc.myth-OST0002-osc.max_dirty_mb=1024
osc.myth-OST0003-osc.max_dirty_mb=1024
osc.myth-OST0004-osc.max_dirty_mb=1024
 694 </screen>
        <para>Use the
        <literal>-d</literal> option (valid only with
        <literal>-P</literal>) to delete a permanent parameter. The syntax
        is:</para>
 698         <screen>
 699 lctl set_param -P -d
 700 <replaceable>obdtype</replaceable>.
 701 <replaceable>obdname</replaceable>.
 702 <replaceable>proc_file_name</replaceable>
 703 </screen>
 704         <para>For example:</para>
 705         <screen>
 706 # lctl set_param -P -d osc.*.max_dirty_mb
 707 </screen>
 708       </section>
 709       <section xml:id="dbdoclet.50438194_88217">
 710         <title>Listing Parameters</title>
        <para>To list Lustre or LNet parameters that are available to set, use
        the
        <literal>lctl list_param</literal> command with this syntax:</para>
 714         <screen>
 715 lctl list_param [-FR]
 716 <replaceable>obdtype</replaceable>.
 717 <replaceable>obdname</replaceable>
 718 </screen>
 719         <para>The following arguments are available for the
 720         <literal>lctl list_param</literal> command.</para>
        <para>
        <literal>-F</literal> Adds '
        <literal>/</literal>', '
        <literal>@</literal>' or '
        <literal>=</literal>' for directories, symlinks and writable files,
        respectively.</para>
        <para>
        <literal>-R</literal> Recursively lists all parameters under the
        specified path.</para>
 730         <para>For example:</para>
 731         <screen>
 732 oss# lctl list_param obdfilter.lustre-OST0000
 733 </screen>
 734       </section>
 735       <section xml:id="dbdoclet.50438194_63247">
 736         <title>Reporting Current Parameter Values</title>
 737         <para>To report current Lustre parameter values, use the
 738         <literal>lctl get_param</literal> command with this syntax:</para>
 739         <screen>
 740 lctl get_param [-n]
 741 <replaceable>obdtype</replaceable>.
 742 <replaceable>obdname</replaceable>.
 743 <replaceable>proc_file_name</replaceable>
 744 </screen>
 745         <para>This example reports data on RPC service times.</para>
 746         <screen>
 747 oss# lctl get_param -n ost.*.ost_io.timeouts
 748 service : cur 1 worst 30 (at 1257150393, 85d23h58m54s ago) 1 1 1 1
 749 </screen>
 750         <para>This example reports the amount of space this client has reserved
 751         for writeback cache with each OST:</para>
 752         <screen>
 753 client# lctl get_param osc.*.cur_grant_bytes
 754 osc.myth-OST0000-osc-ffff8800376bdc00.cur_grant_bytes=2097152
 755 osc.myth-OST0001-osc-ffff8800376bdc00.cur_grant_bytes=33890304
 756 osc.myth-OST0002-osc-ffff8800376bdc00.cur_grant_bytes=35418112
 757 osc.myth-OST0003-osc-ffff8800376bdc00.cur_grant_bytes=2097152
 758 osc.myth-OST0004-osc-ffff8800376bdc00.cur_grant_bytes=33808384
 759 </screen>
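        <para>Output in this
        <literal>name=value</literal> form is easy to post-process. As a
        minimal sketch, the following shell function totals the values from
        such output; the function name
        <literal>sum_param_values</literal> is hypothetical, and the sample
        lines are copied from the example above so that the sketch runs
        without a Lustre client.</para>
        <screen>
#!/bin/sh
# Hypothetical helper (not part of Lustre): add up the values of
# "name=value" lines, as printed by "lctl get_param", read from stdin.
sum_param_values() {
    awk -F= 'NF == 2 { total += $2 } END { printf "%d\n", total }'
}

# Sample lines from the example above. On a live client, pipe the output
# of "lctl get_param osc.*.cur_grant_bytes" into the function instead.
printf '%s\n' \
    'osc.myth-OST0000-osc-ffff8800376bdc00.cur_grant_bytes=2097152' \
    'osc.myth-OST0001-osc-ffff8800376bdc00.cur_grant_bytes=33890304' \
    'osc.myth-OST0002-osc-ffff8800376bdc00.cur_grant_bytes=35418112' \
    'osc.myth-OST0003-osc-ffff8800376bdc00.cur_grant_bytes=2097152' \
    'osc.myth-OST0004-osc-ffff8800376bdc00.cur_grant_bytes=33808384' |
    sum_param_values
</screen>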
 760       </section>
 761     </section>
 762   </section>
 763   <section xml:id="dbdoclet.50438194_41817">
 764     <title>
 765     <indexterm>
 766       <primary>operations</primary>
 767       <secondary>failover</secondary>
 768     </indexterm>Specifying NIDs and Failover</title>
 769     <para>If a node has multiple network interfaces, it may have multiple NIDs,
 770     which must all be identified so other nodes can choose the NID that is
 771     appropriate for their network interfaces. Typically, NIDs are specified in
 772     a list delimited by commas (
    <literal>,</literal>). However, when failover nodes are specified, the NIDs
    are delimited by a colon (
    <literal>:</literal>) or given by repeating a keyword such as
    <literal>--mgsnode=</literal> or
    <literal>--servicenode=</literal>.</para>
    <para>To display the NIDs configured on the local node for use with the
    Lustre file system, run the following command (while LNet is
    running):</para>
 780     <screen>
 781 lctl list_nids
 782 </screen>
 783     <para>In the example below,
 784     <literal>mds0</literal> and
 785     <literal>mds1</literal> are configured as a combined MGS/MDT failover pair
 786     and
 787     <literal>oss0</literal> and
 788     <literal>oss1</literal> are configured as an OST failover pair. The Ethernet
 789     address for
 790     <literal>mds0</literal> is 192.168.10.1, and for
 791     <literal>mds1</literal> is 192.168.10.2. The Ethernet addresses for
 792     <literal>oss0</literal> and
 793     <literal>oss1</literal> are 192.168.10.20 and 192.168.10.21
 794     respectively.</para>
 795     <screen>
mds0# mkfs.lustre --fsname=testfs --mdt --mgs \
        --servicenode=192.168.10.2@tcp0 \
        --servicenode=192.168.10.1@tcp0 /dev/sda1
mds0# mount -t lustre /dev/sda1 /mnt/test/mdt
oss0# mkfs.lustre --fsname=testfs --servicenode=192.168.10.20@tcp0 \
        --servicenode=192.168.10.21@tcp0 --ost --index=0 \
        --mgsnode=192.168.10.1@tcp0 --mgsnode=192.168.10.2@tcp0 \
        /dev/sdb
oss0# mount -t lustre /dev/sdb /mnt/test/ost0
client# mount -t lustre 192.168.10.1@tcp0:192.168.10.2@tcp0:/testfs \
        /mnt/testfs
mds0# umount /mnt/test/mdt
mds1# mount -t lustre /dev/sda1 /mnt/test/mdt
mds1# lctl get_param mdt.testfs-MDT0000.recovery_status
 810 </screen>
 811     <para>Where multiple NIDs are specified separated by commas (for example,
 812     <literal>10.67.73.200@tcp,192.168.10.1@tcp</literal>), the two NIDs refer
 813     to the same host, and the Lustre software chooses the
 814     <emphasis>best</emphasis> one for communication. When a pair of NIDs is
 815     separated by a colon (for example,
 816     <literal>10.67.73.200@tcp:10.67.73.201@tcp</literal>), the two NIDs refer
    to two different hosts and are treated as a failover pair (the Lustre
    software tries the first one and, if that fails, tries the second
    one).</para>
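    <para>The comma and colon rules can be sketched in a few lines of shell.
    The function name
    <literal>describe_nids</literal> is hypothetical and not part of the
    Lustre tools, and the sketch assumes IPv4-style NIDs that contain no
    colons of their own. It prints one line per failover host, listing the
    NIDs that belong to that host.</para>
    <screen>
#!/bin/sh
# Hypothetical helper (not part of Lustre): colons separate failover hosts,
# commas separate the NIDs that belong to a single host.
describe_nids() {
    printf '%s\n' "$1" | tr ':' '\n' | awk '{
        gsub(/,/, " ")                    # list one host NIDs side by side
        printf "host %d: %s\n", NR, $0
    }'
}

# The first host has two NIDs; the second host is its failover partner.
describe_nids '10.67.73.200@tcp,192.168.10.1@tcp:10.67.73.201@tcp'
</screen>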
 820     <para>Two options to
 821     <literal>mkfs.lustre</literal> can be used to specify failover nodes.
 822     Introduced in Lustre software release 2.0, the
 823     <literal>--servicenode</literal> option is used to specify all service NIDs,
 824     including those for primary nodes and failover nodes. When the
 825     <literal>--servicenode</literal> option is used, the first service node to
 826     load the target device becomes the primary service node, while nodes
 827     corresponding to the other specified NIDs become failover locations for the
    target device. An older option,
    <literal>--failnode</literal>, specifies just the NIDs of failover nodes.
 830     For more information about the
 831     <literal>--servicenode</literal> and
 832     <literal>--failnode</literal> options, see
 833     <xref xmlns:xlink="http://www.w3.org/1999/xlink"
 834     linkend="configuringfailover" />.</para>
 835   </section>
 836   <section xml:id="dbdoclet.50438194_70905">
 837     <title>
 838     <indexterm>
 839       <primary>operations</primary>
 840       <secondary>erasing a file system</secondary>
 841     </indexterm>Erasing a File System</title>
 842     <para>If you want to erase a file system and permanently delete all the
 843     data in the file system, run this command on your targets:</para>
    <screen>
$ mkfs.lustre --reformat
</screen>
 847     <para>If you are using a separate MGS and want to keep other file systems
 848     defined on that MGS, then set the
 849     <literal>writeconf</literal> flag on the MDT for that file system. The
 850     <literal>writeconf</literal> flag causes the configuration logs to be
 851     erased; they are regenerated the next time the servers start.</para>
 852     <para>To set the
 853     <literal>writeconf</literal> flag on the MDT:</para>
 854     <orderedlist>
 855       <listitem>
        <para>Unmount all clients and servers using this file system:</para>
 857         <screen>
 858 $ umount /mnt/lustre
 859 </screen>
 860       </listitem>
 861       <listitem>
        <para>Permanently erase the file system and, presumably, replace it
        with another file system:</para>
 864         <screen>
 865 $ mkfs.lustre --reformat --fsname spfs --mgs --mdt --index=0 /dev/
 866 <emphasis>{mdsdev}</emphasis>
 867 </screen>
 868       </listitem>
 869       <listitem>
        <para>If you have a separate MGS (that you do not want to reformat),
        add the
        <literal>--writeconf</literal> flag to
        <literal>mkfs.lustre</literal> on the MDT:</para>
 874         <screen>
 875 $ mkfs.lustre --reformat --writeconf --fsname spfs --mgsnode=
 876 <replaceable>mgs_nid</replaceable> --mdt --index=0
 877 <replaceable>/dev/mds_device</replaceable>
 878 </screen>
 879       </listitem>
 880     </orderedlist>
 881     <note>
      <para>If you have a combined MGS/MDT, reformatting the MDT reformats the
      MGS as well, causing all configuration information to be lost; you can
      then start building your new file system. Nothing needs to be done with
      old disks that will not be part of the new file system; just do not
      mount them.</para>
 887     </note>
 888   </section>
 889   <section xml:id="dbdoclet.50438194_16954">
 890     <title>
 891     <indexterm>
 892       <primary>operations</primary>
 893       <secondary>reclaiming space</secondary>
 894     </indexterm>Reclaiming Reserved Disk Space</title>
 895     <para>All current Lustre installations run the ldiskfs file system
 896     internally on service nodes. By default, ldiskfs reserves 5% of the disk
 897     space to avoid file system fragmentation. In order to reclaim this space,
 898     run the following command on your OSS for each OST in the file
 899     system:</para>
 900     <screen>
 901 tune2fs [-m reserved_blocks_percent] /dev/
 902 <emphasis>{ostdev}</emphasis>
 903 </screen>
 904     <para>You do not need to shut down Lustre before running this command or
 905     restart it afterwards.</para>
 906     <warning>
 907       <para>Reducing the space reservation can cause severe performance
 908       degradation as the OST file system becomes more than 95% full, due to
 909       difficulty in locating large areas of contiguous free space. This
 910       performance degradation may persist even if the space usage drops below
 911       95% again. It is recommended NOT to reduce the reserved disk space below
 912       5%.</para>
 913     </warning>
 914   </section>
 915   <section xml:id="dbdoclet.50438194_69998">
 916     <title>
 917     <indexterm>
 918       <primary>operations</primary>
 919       <secondary>replacing an OST or MDS</secondary>
 920     </indexterm>Replacing an Existing OST or MDT</title>
 921     <para>To copy the contents of an existing OST to a new OST (or an old MDT
 922     to a new MDT), follow the process for either OST/MDT backups in
    <xref linkend='dbdoclet.backup_device' /> or
    <xref linkend='dbdoclet.backup_target_filesystem' />.
    For more information on removing an MDT, see
 926     <xref linkend='dbdoclet.rmremotedir' />.</para>
 927   </section>
 928   <section xml:id="dbdoclet.50438194_30872">
 929     <title>
 930     <indexterm>
 931       <primary>operations</primary>
 932       <secondary>identifying OSTs</secondary>
 933     </indexterm>Identifying To Which Lustre File an OST Object Belongs</title>
 934     <para>Use this procedure to identify the file containing a given object on
 935     a given OST.</para>
 936     <orderedlist>
 937       <listitem>
 938         <para>On the OST (as root), run
 939         <literal>debugfs</literal> to display the file identifier (
 940         <literal>FID</literal>) of the file associated with the object.</para>
 941         <para>For example, if the object is
 942         <literal>34976</literal> on
 943         <literal>/dev/lustre/ost_test2</literal>, the debug command is:
 944         <screen>
 945 # debugfs -c -R "stat /O/0/d$((34976 % 32))/34976" /dev/lustre/ost_test2
 946 </screen></para>
 947         <para>The command output is:
 948         <screen>
 949 debugfs 1.42.3.wc3 (15-Aug-2012)
 950 /dev/lustre/ost_test2: catastrophic mode - not reading inode or group bitmaps
 951 Inode: 352365   Type: regular    Mode:  0666   Flags: 0x80000
 952 Generation: 2393149953    Version: 0x0000002a:00005f81
 953 User:  1000   Group:  1000   Size: 260096
 954 File ACL: 0    Directory ACL: 0
 955 Links: 1   Blockcount: 512
 956 Fragment:  Address: 0    Number: 0    Size: 0
 957 ctime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
 958 atime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
 959 mtime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
 960 crtime: 0x4a216b3c:975870dc -- Sat May 30 13:22:04 2009
 961 Size of extra inode fields: 24
 962 Extended attributes stored in inode body:
  fid = "b9 da 24 00 00 00 00 00 6a fa 0d 3f 01 00 00 00 eb 5b 0b 00 00 00 00 00 00 00 00 00 00 00 00 00 " (32)
 965   fid: objid=34976 seq=0 parent=[0x24dab9:0x3f0dfa6a:0x0] stripe=1
 966 EXTENTS:
 967 (0-64):4620544-4620607
 968 </screen></para>
 969       </listitem>
 970       <listitem>
        <para>For Lustre software release 2.x file systems, the parent FID will
        be of the form [0x200000400:0x122:0x0] and can be resolved directly
        using the
        <literal>lfs fid2path /mnt/lustre
        [0x200000400:0x122:0x0]</literal> command on any Lustre client, and
        the process is complete.</para>
 977       </listitem>
 978       <listitem>
        <para>In this example, the parent inode FID is an upgraded 1.x inode
        (the first part of the FID is below 0x200000400). The MDT
        inode number is
        <literal>0x24dab9</literal> and the generation is
        <literal>0x3f0dfa6a</literal>, so the pathname needs to be resolved
        using
        <literal>debugfs</literal>.</para>
 986       </listitem>
 987       <listitem>
 988         <para>On the MDS (as root), use
 989         <literal>debugfs</literal> to find the file associated with the
 990         inode:</para>
 991         <screen>
 992 # debugfs -c -R "ncheck 0x24dab9" /dev/lustre/mdt_test
 993 </screen>
 994         <para>Here is the command output:</para>
 995         <screen>
 996 debugfs 1.42.3.wc2 (15-Aug-2012)
/dev/lustre/mdt_test: catastrophic mode - not reading inode or group bitmaps
 999 Inode      Pathname
1000 2415289    /ROOT/brian-laptop-guest/clients/client11/~dmtmp/PWRPNT/ZD16.BMP
1001 </screen>
1002       </listitem>
1003     </orderedlist>
1004     <para>The command lists the inode and pathname associated with the
1005     object.</para>
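    <para>The two small calculations in this procedure can be sketched in
    shell: the object path passed to
    <literal>debugfs</literal> in the first step, and the check of the first
    part of the parent FID against 0x200000400. The function names
    <literal>ost_object_path</literal> and
    <literal>fid_resolver</literal> are hypothetical and not part of the
    Lustre tools; the sketch only does arithmetic and does not touch any
    device.</para>
    <screen>
#!/bin/bash
# Hypothetical helper: build the debugfs path of an OST object using the
# same "object ID modulo 32" directory rule as the command in step 1.
ost_object_path() {
    printf '/O/0/d%d/%d\n' "$(($1 % 32))" "$1"
}

# Hypothetical helper: given the first part of the parent FID, report which
# tool resolves the pathname. Values below 0x200000400 indicate an upgraded
# 1.x inode, which needs debugfs; otherwise lfs fid2path can be used.
fid_resolver() {
    if [ "$(($1))" -lt "$((0x200000400))" ]; then
        echo debugfs
    else
        echo fid2path
    fi
}

ost_object_path 34976        # object ID from the example above
fid_resolver 0x24dab9        # first part of the parent FID in the example
fid_resolver 0x200000400     # first part of a release 2.x FID
</screen>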
1006     <note>
      <para>The
      <literal>debugfs</literal>
      <literal>ncheck</literal> command is a brute-force search that may
      take a long time to complete.</para>
1010     </note>
1011     <note>
1012       <para>To find the Lustre file from a disk LBA, follow the steps listed in
1013       the document at this URL:
1014       <link xl:href="http://smartmontools.sourceforge.net/badblockhowto.html">
1015       http://smartmontools.sourceforge.net/badblockhowto.html</link>. Then,
1016       follow the steps above to resolve the Lustre filename.</para>
1017     </note>
1018   </section>
1019 </chapter>