LustreOperations.xml

   1 <?xml version='1.0' encoding='utf-8'?>
   2 <chapter xmlns="http://docbook.org/ns/docbook"
   3 xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
   4 xml:id="lustreoperations">
   5   <title xml:id="lustreoperations.title">Lustre Operations</title>
   6   <para>Once you have the Lustre file system up and running, you can use the
   7   procedures in this section to perform these basic Lustre administration
   8   tasks:</para>
   9   <itemizedlist>
  10     <listitem>
  11       <para>
  12         <xref linkend="dbdoclet.50438194_42877" />
  13       </para>
  14     </listitem>
  15     <listitem>
  16       <para>
  17         <xref linkend="dbdoclet.50438194_24122" />
  18       </para>
  19     </listitem>
  20     <listitem>
  21       <para>
  22         <xref linkend="dbdoclet.50438194_84876" />
  23       </para>
  24     </listitem>
  25     <listitem>
  26       <para>
  27         <xref linkend="dbdoclet.50438194_69255" />
  28       </para>
  29     </listitem>
  30     <listitem>
  31       <para>
  32         <xref linkend="dbdoclet.50438194_57420" />
  33       </para>
  34     </listitem>
  35     <listitem>
  36       <para>
  37         <xref linkend="dbdoclet.50438194_54138" />
  38       </para>
  39     </listitem>
  40     <listitem>
  41       <para>
  42         <xref linkend="dbdoclet.50438194_88063" />
  43       </para>
  44     </listitem>
  45     <listitem>
  46       <para>
  47         <xref linkend="dbdoclet.lfsmkdir" />
  48       </para>
  49     </listitem>
  50     <listitem>
  51       <para>
  52         <xref linkend="dbdoclet.50438194_88980" />
  53       </para>
  54     </listitem>
  55     <listitem>
  56       <para>
  57         <xref linkend="dbdoclet.50438194_41817" />
  58       </para>
  59     </listitem>
  60     <listitem>
  61       <para>
  62         <xref linkend="dbdoclet.50438194_70905" />
  63       </para>
  64     </listitem>
  65     <listitem>
  66       <para>
  67         <xref linkend="dbdoclet.50438194_16954" />
  68       </para>
  69     </listitem>
  70     <listitem>
  71       <para>
  72         <xref linkend="dbdoclet.50438194_69998" />
  73       </para>
  74     </listitem>
  75     <listitem>
  76       <para>
  77         <xref linkend="dbdoclet.50438194_30872" />
  78       </para>
  79     </listitem>
  80   </itemizedlist>
  81   <section xml:id="dbdoclet.50438194_42877">
  82     <title>
  83     <indexterm>
  84       <primary>operations</primary>
  85     </indexterm>
  86     <indexterm>
  87       <primary>operations</primary>
  88       <secondary>mounting by label</secondary>
  89     </indexterm>Mounting by Label</title>
  90     <para>The file system name is limited to 8 characters. The file system
  91     and target information is encoded in the disk label, so you can mount by
  92     label. This allows system administrators to move disks around without
  93     worrying about issues such as SCSI disk reordering or getting the
  94     <literal>/dev/device</literal> wrong for a shared target. Soon, file system
  95     naming will be made as fail-safe as possible. Currently, Linux disk labels
  96     are limited to 16 characters. To identify the target within the file
  97     system, 8 characters are reserved, leaving 8 characters for the file system
  98     name:</para>
  99     <screen>
 100 <replaceable>fsname</replaceable>-MDT0000 or
 101 <replaceable>fsname</replaceable>-OST0a19
 102 </screen>
 103     <para>To mount by label, use this command:</para>
 104     <screen>
 105 mount -t lustre -L
 106 <replaceable>file_system_label</replaceable>
 107 <replaceable>/mount_point</replaceable>
 108 </screen>
 109     <para>This is an example of mount-by-label:</para>
 110     <screen>
 111 mds# mount -t lustre -L testfs-MDT0000 /mnt/mdt
 112 </screen>
 113     <caution>
 114       <para>Mount-by-label should NOT be used in a multi-path environment or
 115       when snapshots are being created of the device, since multiple block
 116       devices will have the same label.</para>
 117     </caution>
 118     <para>Although the file system name is internally limited to 8 characters,
 119     you can mount the clients at any mount point, so file system users are not
 120     subjected to short names. Here is an example:</para>
 121     <screen>
 122 client# mount -t lustre mds0@tcp0:/short
 123 <replaceable>/mnt/long_mountpoint_name</replaceable>
 124 </screen>
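    <para>The label format described above can be checked before a target is
    formatted. The following is a minimal sketch, not part of the Lustre
    tools: the helper name <literal>make_label</literal> is hypothetical, and
    it assumes the target index is written as four hexadecimal digits, as in
    the <literal>OST0a19</literal> example.</para>
    <screen>
```shell
# Hypothetical helper (not a Lustre command): build a target label of the
# form fsname-MDTxxxx or fsname-OSTxxxx, where xxxx is the index in hex.
make_label() {
    fsname=$1   # file system name, at most 8 characters
    kind=$2     # MDT or OST
    index=$3    # target index as a decimal number
    # Reject names that would not fit in the 16-character label.
    if [ "${#fsname}" -gt 8 ]; then
        return 1
    fi
    printf '%s-%s%04x\n' "$fsname" "$kind" "$index"
}

make_label testfs MDT 0
make_label testfs OST 2585
```
</screen>
    <para>Under these assumptions, the two calls print
    <literal>testfs-MDT0000</literal> and
    <literal>testfs-OST0a19</literal>, and a file system name longer than 8
    characters makes the helper return a non-zero status.</para>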
 125   </section>
 126   <section xml:id="dbdoclet.50438194_24122">
 127     <title>
 128     <indexterm>
 129       <primary>operations</primary>
 130       <secondary>starting</secondary>
 131     </indexterm>Starting Lustre</title>
 132     <para>On the first start of a Lustre file system, the components must be
 133     started in the following order:</para>
 134     <orderedlist>
 135       <listitem>
 136         <para>Mount the MGT.</para>
 137         <note>
 138           <para>If a combined MGT/MDT is present, Lustre will correctly mount
 139           the MGT and MDT automatically.</para>
 140         </note>
 141       </listitem>
 142       <listitem>
 143         <para>Mount the MDT.</para>
 144         <note>
 145           <para condition='l24'>Mount all MDTs if multiple MDTs are
 146           present.</para>
 147         </note>
 148       </listitem>
 149       <listitem>
 150         <para>Mount the OST(s).</para>
 151       </listitem>
 152       <listitem>
 153         <para>Mount the client(s).</para>
 154       </listitem>
 155     </orderedlist>
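    <para>The start order above can be sketched as a small planning step. This
    is an illustration only: the function name
    <literal>plan_mounts</literal>, the input format (one line per target
    giving type, source and mount point) and all device paths are assumptions.
    The function prints the mount commands in the required order rather than
    running them.</para>
    <screen>
```shell
# Hypothetical helper: read "type source mount_point" lines on standard
# input and print mount commands ordered MGT, MDT, OST, client.
plan_mounts() {
    awk '
        $1 == "mgt"    { rank = 1 }
        $1 == "mdt"    { rank = 2 }
        $1 == "ost"    { rank = 3 }
        $1 == "client" { rank = 4 }
        { print rank, "mount -t lustre", $2, $3 }
    ' | sort -s -n -k1,1 | cut -d" " -f2-
}

# Example input with made-up devices, deliberately out of order.
printf '%s\n' \
    "client mgsnode@tcp0:/testfs /mnt/testfs" \
    "ost /dev/sdc /mnt/test/ost0" \
    "mdt /dev/sdb /mnt/test/mdt" \
    "mgt /dev/sda /mnt/test/mgt" | plan_mounts
```
</screen>
    <para>The stable numeric sort keeps targets of the same type in the order
    they were listed, so multiple MDTs or OSTs are mounted in the sequence the
    administrator wrote them.</para>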
 156   </section>
 157   <section xml:id="dbdoclet.50438194_84876">
 158     <title>
 159     <indexterm>
 160       <primary>operations</primary>
 161       <secondary>mounting</secondary>
 162     </indexterm>Mounting a Server</title>
 163     <para>Starting a Lustre server is straightforward and only involves the
 164     mount command. To list the Lustre targets and clients that are currently
 165     mounted on a node, run:</para>
 166     <screen>
 167 mount -t lustre
 168 </screen>
 169     <para>The mount command generates output similar to this:</para>
 170     <screen>
 171 /dev/sda1 on /mnt/test/mdt type lustre (rw)
 172 /dev/sda2 on /mnt/test/ost0 type lustre (rw)
 173 192.168.0.21@tcp:/testfs on /mnt/testfs type lustre (rw)
 174 </screen>
 175     <para>In this example, the MDT, an OST (ost0) and file system (testfs) are
 176     mounted. Lustre servers can also be added to <literal>/etc/fstab</literal>:</para>
 177     <screen>
 178 LABEL=testfs-MDT0000 /mnt/test/mdt lustre defaults,_netdev,noauto 0 0
 179 LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0
 180 </screen>
 181     <para>In general, it is wise to specify <literal>noauto</literal> and let your
 182     high-availability (HA) package manage when to mount the device. If you are
 183     not using failover, make sure that networking has been started before
 184     mounting a Lustre server. On Red Hat Enterprise Linux, SUSE Linux
 185     Enterprise Server, Debian (and perhaps other distributions), use
 186     the
 187     <literal>_netdev</literal> flag to ensure that these disks are mounted after
 188     the network is up.</para>
 189     <para>We are mounting by disk label here. The label of a device can be read
 190     with
 191     <literal>e2label</literal>. The label of a newly-formatted Lustre server
 192     may end in
 193     <literal>FFFF</literal> if the
 194     <literal>--index</literal> option is not specified to
 195     <literal>mkfs.lustre</literal>, meaning that it has yet to be assigned. The
 196     assignment takes place when the server is first started, and the disk label
 197     is updated. It is recommended that the
 198     <literal>--index</literal> option always be used, which will also ensure
 199     that the label is set at format time.</para>
 200     <caution>
 201       <para>Do not do this when the client and OSS are on the same node, as
 202       memory pressure between the client and OSS can lead to deadlocks.</para>
 203     </caution>
 204     <caution>
 205       <para>Mount-by-label should NOT be used in a multi-path
 206       environment.</para>
 207     </caution>
 208   </section>
 209   <section xml:id="dbdoclet.50438194_69255">
 210     <title>
 211     <indexterm>
 212       <primary>operations</primary>
 213       <secondary>unmounting</secondary>
 214     </indexterm>Unmounting a Server</title>
 215     <para>To stop a Lustre server, use the
 216     <literal>umount
 217     <replaceable>/mount_point</replaceable></literal>
 218     command.</para>
 219     <para>For example, to stop
 220     <literal>ost0</literal> on mount point
 221     <literal>/mnt/test</literal>, run:</para>
 222     <screen>
 223 $ umount /mnt/test
 224 </screen>
 225     <para>Gracefully stopping a server with the
 226     <literal>umount</literal> command preserves the state of the connected
 227     clients. The next time the server is started, it waits for clients to
 228     reconnect, and then goes through the recovery procedure.</para>
 229     <para>If the force (
 230     <literal>-f</literal>) flag is used, then the server evicts all clients and
 231     stops WITHOUT recovery. Upon restart, the server does not wait for
 232     recovery. Any currently connected clients receive I/O errors until they
 233     reconnect.</para>
 234     <note>
 235       <para>If you are using loopback devices, use the
 236       <literal>-d</literal> flag. This flag cleans up loop devices and can
 237       always be safely specified.</para>
 238     </note>
 239   </section>
 240   <section xml:id="dbdoclet.50438194_57420">
 241     <title>
 242     <indexterm>
 243       <primary>operations</primary>
 244       <secondary>failover</secondary>
 245     </indexterm>Specifying Failout/Failover Mode for OSTs</title>
 246     <para>In a Lustre file system, an OST that has become unreachable because
 247     it fails, is taken off the network, or is unmounted can be handled in one
 248     of two ways:</para>
 249     <itemizedlist>
 250       <listitem>
 251         <para>In
 252         <literal>failout</literal> mode, Lustre clients immediately receive
 253         errors (EIOs) after a timeout, instead of waiting for the OST to
 254         recover.</para>
 255       </listitem>
 256       <listitem>
 257         <para>In
 258         <literal>failover</literal> mode, Lustre clients wait for the OST to
 259         recover.</para>
 260       </listitem>
 261     </itemizedlist>
 262     <para>By default, the Lustre file system uses
 263     <literal>failover</literal> mode for OSTs. To specify
 264     <literal>failout</literal> mode instead, use the
 265     <literal>--param="failover.mode=failout"</literal> option as shown below
 266     (entered on one line):</para>
 267     <screen>
 268 oss# mkfs.lustre --fsname=
 269 <replaceable>fsname</replaceable> --mgsnode=
 270 <replaceable>mgs_NID</replaceable> --param=failover.mode=failout
 271       --ost --index=
 272 <replaceable>ost_index</replaceable>
 273 <replaceable>/dev/ost_block_device</replaceable>
 274 </screen>
 275     <para>In the example below,
 276     <literal>failout</literal> mode is specified for the OSTs on the MGS
 277     <literal>mds0</literal> in the file system
 278     <literal>testfs</literal> (entered on one line).</para>
 279     <screen>
 280 oss# mkfs.lustre --fsname=testfs --mgsnode=mds0 --param=failover.mode=failout
 281       --ost --index=3 /dev/sdb
 282 </screen>
 283     <caution>
 284       <para>Before running this command, unmount all OSTs that will be affected
 285       by a change in
 286       <literal>failover</literal>/
 287       <literal>failout</literal> mode.</para>
 288     </caution>
 289     <note>
 290       <para>After initial file system configuration, use the
 291       <literal>tunefs.lustre</literal> utility to change the mode. For example,
 292       to set the
 293       <literal>failout</literal> mode, run:</para>
 294       <para>
 295         <screen>
 296 $ tunefs.lustre --param failover.mode=failout
 297 <replaceable>/dev/ost_device</replaceable>
 298 </screen>
 299       </para>
 300     </note>
 301   </section>
 302   <section xml:id="dbdoclet.50438194_54138">
 303     <title>
 304     <indexterm>
 305       <primary>operations</primary>
 306       <secondary>degraded OST RAID</secondary>
 307     </indexterm>Handling Degraded OST RAID Arrays</title>
 308     <para>Lustre includes functionality that notifies Lustre if an external
 309     RAID array has degraded performance (resulting in reduced overall file
 310     system performance), either because a disk has failed and not been
 311     replaced, or because a disk was replaced and is undergoing a rebuild. To
 312     avoid a global performance slowdown due to a degraded OST, the MDS can
 313     avoid the OST for new object allocation if it is notified of the degraded
 314     state.</para>
 315     <para>A parameter for each OST, called
 316     <literal>degraded</literal>, specifies whether the OST is running in
 317     degraded mode or not.</para>
 318     <para>To mark the OST as degraded, use:</para>
 319     <screen>
 320 lctl set_param obdfilter.{OST_name}.degraded=1
 321 </screen>
 322     <para>To mark that the OST is back in normal operation, use:</para>
 323     <screen>
 324 lctl set_param obdfilter.{OST_name}.degraded=0
 325 </screen>
 326     <para>To determine if OSTs are currently in degraded mode, use:</para>
 327     <screen>
 328 lctl get_param obdfilter.*.degraded
 329 </screen>
 330     <para>If the OST is remounted due to a reboot or other condition, the flag
 331     resets to
 332     <literal>0</literal>.</para>
 333     <para>It is recommended that this be implemented by an automated script
 334     that monitors the status of individual RAID devices.</para>
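    <para>A monitoring script of the kind recommended above could be
    structured as follows. This is only a sketch under stated assumptions: it
    uses Linux software RAID status in <literal>/proc/mdstat</literal> format
    as a stand-in for whatever interface the external RAID array provides, the
    function names are hypothetical, and the mapping from an array to an OST
    name is site-specific and supplied by the caller. The
    <literal>lctl</literal> command is printed rather than executed.</para>
    <screen>
```shell
# Print the names of md arrays whose member status (for example [UU_])
# shows a missing disk. Reads /proc/mdstat-style text on standard input.
degraded_md_arrays() {
    awk '
        /^md[0-9]+ :/     { name = $1 }
        /\[[U_]*_[U_]*\]/ { print name }
    '
}

# Print, but do not run, the lctl command that sets the degraded flag.
# Remove "echo" to apply it on the OSS. Arguments: OST name, then 0 or 1.
mark_ost() {
    echo lctl set_param "obdfilter.$1.degraded=$2"
}

# Sample input in /proc/mdstat format; md0 has lost one of three members.
sample='md0 : active raid5 sdb1[0] sdc1[1]
      1953260544 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
md1 : active raid1 sdd1[0] sde1[1]
      976630336 blocks [2/2] [UU]'

printf '%s\n' "$sample" | degraded_md_arrays
mark_ost testfs-OST0000 1
```
</screen>
    <para>Because the flag resets to <literal>0</literal> when the OST is
    remounted, a script like this would need to run periodically and re-apply
    the flag for as long as the array remains degraded.</para>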
 335   </section>
 336   <section xml:id="dbdoclet.50438194_88063">
 337     <title>
 338     <indexterm>
 339       <primary>operations</primary>
 340       <secondary>multiple file systems</secondary>
 341     </indexterm>Running Multiple Lustre File Systems</title>
 342     <para>Lustre supports multiple file systems provided the combination of
 343     <literal>NID:fsname</literal> is unique. Each file system must be allocated
 344     a unique name during creation with the
 345     <literal>--fsname</literal> parameter. Unique names for file systems are
 346     enforced if a single MGS is present. If multiple MGSs are present (for
 347     example if you have an MGS on every MDS) the administrator is responsible
 348     for ensuring file system names are unique. A single MGS and unique file
 349     system names provide a single point of administration and allow commands
 350     to be issued against the file system even if it is not mounted.</para>
 351     <para>Lustre supports multiple file systems on a single MGS. With a single
 352     MGS, fsnames are guaranteed to be unique. Lustre also allows multiple MGSs
 353     to co-exist. For example, multiple MGSs will be necessary if multiple file
 354     systems on different Lustre software versions are to be concurrently
 355     available. With multiple MGSs, additional care must be taken to ensure file
 356     system names are unique. Each file system should have a unique fsname among
 357     all systems that may interoperate in the future.</para>
 358     <para>By default, the
 359     <literal>mkfs.lustre</literal> command creates a file system named
 360     <literal>lustre</literal>. To specify a different file system name (limited
 361     to 8 characters) at format time, use the
 362     <literal>--fsname</literal> option:</para>
 363     <para>
 364       <screen>
 365 mkfs.lustre --fsname=
 366 <replaceable>file_system_name</replaceable>
 367 </screen>
 368     </para>
 369     <note>
 370       <para>The MDT, OSTs and clients in the new file system must use the same
 371       file system name (prepended to the device name). For example, for a new
 372       file system named
 373       <literal>foo</literal>, the MDT and two OSTs would be named
 374       <literal>foo-MDT0000</literal>,
 375       <literal>foo-OST0000</literal>, and
 376       <literal>foo-OST0001</literal>.</para>
 377     </note>
 378     <para>To mount a client on the file system, run:</para>
 379     <screen>
 380 client# mount -t lustre
 381 <replaceable>mgsnode</replaceable>:
 382 <replaceable>/new_fsname</replaceable>
 383 <replaceable>/mount_point</replaceable>
 384 </screen>
 385     <para>For example, to mount a client on file system foo at mount point
 386     /mnt/foo, run:</para>
 387     <screen>
 388 client# mount -t lustre mgsnode:/foo /mnt/foo
 389 </screen>
 390     <note>
 391       <para>If a client will be mounted on several file systems, add the
 392       following line to the
 393       <literal>/etc/xattr.conf</literal> file to avoid problems when files are
 394       moved between the file systems:
 395       <literal>lustre.* skip</literal></para>
 396     </note>
 397     <note>
 398       <para>To ensure that a new MDT is added to an existing MGS, create the MDT
 399       by specifying:
 400       <literal>--mdt --mgsnode=
 401       <replaceable>mgs_NID</replaceable></literal>.</para>
 402     </note>
 403     <para>A Lustre installation with two file systems (
 404     <literal>foo</literal> and
 405     <literal>bar</literal>) could look like this, where the MGS node is
 406     <literal>mgsnode@tcp0</literal> and the mount points are
 407     <literal>/mnt/foo</literal> and
 408     <literal>/mnt/bar</literal>.</para>
 409     <screen>
 410 mgsnode# mkfs.lustre --mgs /dev/sda
 411 mdtfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --mdt --index=0
 412 /dev/sdb
 413 ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=0
 414 /dev/sda
 415 ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=1
 416 /dev/sdb
 417 mdtbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --mdt --index=0
 418 /dev/sda
 419 ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=0
 420 /dev/sdc
 421 ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=1
 422 /dev/sdd
 423 </screen>
 424     <para>To mount a client on file system foo at mount point
 425     <literal>/mnt/foo</literal>, run:</para>
 426     <screen>
 427 client# mount -t lustre mgsnode@tcp0:/foo /mnt/foo
 428 </screen>
 429     <para>To mount a client on file system bar at mount point
 430     <literal>/mnt/bar</literal>, run:</para>
 431     <screen>
 432 client# mount -t lustre mgsnode@tcp0:/bar /mnt/bar
 433 </screen>
 434   </section>
 435   <section xml:id="dbdoclet.lfsmkdir" condition='l24'>
 436     <title>
 437     <indexterm>
 438       <primary>operations</primary>
 439       <secondary>remote directory</secondary>
 440     </indexterm>Creating a sub-directory on a given MDT</title>
 441     <para>Lustre 2.4 enables individual sub-directories to be serviced by
 442     unique MDTs. An administrator can allocate a sub-directory to a given MDT
 443     using the command:</para>
 444     <screen>
 445 client# lfs mkdir -i
 446 <replaceable>mdt_index</replaceable>
 447 <replaceable>/mount_point/remote_dir</replaceable>
 448
 449 </screen>
 450     <para>This command will allocate the sub-directory
 451     <literal>remote_dir</literal> onto the MDT of index
 452     <literal>mdt_index</literal>. For more information on adding additional MDTs
 453     and
 454     <literal>mdt_index</literal>, see
 455     <xref linkend='dbdoclet.addmdtindex' />.</para>
 456     <warning>
 457       <para>An administrator can allocate remote sub-directories to separate
 458       MDTs. Creating remote sub-directories in parent directories not hosted on
 459       MDT0 is not recommended. This is because the failure of the parent MDT
 460       will leave the namespace below it inaccessible. For this reason, by
 461       default it is only possible to create remote sub-directories off MDT0. To
 462       relax this restriction and enable remote sub-directories off any MDT, an
 463       administrator must issue the command
 464       <literal>lctl set_param mdd.*.enable_remote_dir=1</literal>.</para>
 465     </warning>
 466   </section>
 467   <section xml:id="dbdoclet.50438194_88980">
 468     <title>
 469     <indexterm>
 470       <primary>operations</primary>
 471       <secondary>parameters</secondary>
 472     </indexterm>Setting and Retrieving Lustre Parameters</title>
 473     <para>Several options are available for setting parameters in
 474     Lustre:</para>
 475     <itemizedlist>
 476       <listitem>
 477         <para>When creating a file system, use mkfs.lustre. See
 478         <xref linkend="dbdoclet.50438194_17237" /> below.</para>
 479       </listitem>
 480       <listitem>
 481         <para>When a server is stopped, use tunefs.lustre. See
 482         <xref linkend="dbdoclet.50438194_55253" /> below.</para>
 483       </listitem>
 484       <listitem>
 485         <para>When the file system is running, use lctl to set or retrieve
 486         Lustre parameters. See
 487         <xref linkend="dbdoclet.50438194_51490" /> and
 488         <xref linkend="dbdoclet.50438194_63247" /> below.</para>
 489       </listitem>
 490     </itemizedlist>
 491     <section xml:id="dbdoclet.50438194_17237">
 492       <title>Setting Tunable Parameters with
 493       <literal>mkfs.lustre</literal></title>
 494       <para>When the file system is first formatted, parameters can simply be
 495       added as a
 496       <literal>--param</literal> option to the
 497       <literal>mkfs.lustre</literal> command. For example:</para>
 498       <screen>
 499 mds# mkfs.lustre --mdt --param="sys.timeout=50" /dev/sda
 500 </screen>
 501       <para>For more details about creating a file system, see
 502       <xref linkend="configuringlustre" />. For more details about
 503       <literal>mkfs.lustre</literal>, see
 504       <xref linkend="systemconfigurationutilities" />.</para>
 505     </section>
 506     <section xml:id="dbdoclet.50438194_55253">
 507       <title>Setting Parameters with
 508       <literal>tunefs.lustre</literal></title>
 509       <para>If a server (OSS or MDS) is stopped, parameters can be added to an
 510       existing file system using the
 511       <literal>--param</literal> option to the
 512       <literal>tunefs.lustre</literal> command. For example:</para>
 513       <screen>
 514 oss# tunefs.lustre --param=failover.node=192.168.0.13@tcp0 /dev/sda
 515 </screen>
 516       <para>With
 517       <literal>tunefs.lustre</literal>, parameters are
 518       <emphasis>additive</emphasis>: new parameters are specified in addition
 519       to old parameters; they do not replace them. To erase all old
 520       <literal>tunefs.lustre</literal> parameters and just use newly-specified
 521       parameters, run:</para>
 522       <screen>
 523 mds# tunefs.lustre --erase-params --param=
 524 <replaceable>new_parameters</replaceable>
 525 </screen>
 526       <para>The tunefs.lustre command can be used to set any parameter settable
 527       in a /proc/fs/lustre file and that has its own OBD device, so it can be
 528       specified as
 529       <literal>
 530       <replaceable>obdname|fsname</replaceable>.
 531       <replaceable>obdtype</replaceable>.
 532       <replaceable>proc_file_name</replaceable>=
 533       <replaceable>value</replaceable></literal>. For example:</para>
 534       <screen>
 535 mds# tunefs.lustre --param mdt.identity_upcall=NONE /dev/sda1
 536 </screen>
 537       <para>For more details about
 538       <literal>tunefs.lustre</literal>, see
 539       <xref linkend="systemconfigurationutilities" />.</para>
 540     </section>
 541     <section xml:id="dbdoclet.50438194_51490">
 542       <title>Setting Parameters with
 543       <literal>lctl</literal></title>
 544       <para>When the file system is running, the
 545       <literal>lctl</literal> command can be used to set parameters (temporary
 546       or permanent) and report current parameter values. Temporary parameters
 547       are active as long as the server or client is not shut down. Permanent
 548       parameters live through server and client reboots.</para>
 549       <note>
 550         <para>The lctl list_param command enables users to list all parameters
 551         that can be set. See
 552         <xref linkend="dbdoclet.50438194_88217" />.</para>
 553       </note>
 554       <para>For more details about the
 555       <literal>lctl</literal> command, see the examples in the sections below
 556       and
 557       <xref linkend="systemconfigurationutilities" />.</para>
 558       <section remap="h4">
 559         <title>Setting Temporary Parameters</title>
 560         <para>Use
 561         <literal>lctl set_param</literal> to set temporary parameters on the
 562         node where it is run. These parameters map to items in
 563         <literal>/proc/{fs,sys}/{lnet,lustre}</literal>. The
 564         <literal>lctl set_param</literal> command uses this syntax:</para>
 565         <screen>
 566 lctl set_param [-n]
 567 <replaceable>obdtype</replaceable>.
 568 <replaceable>obdname</replaceable>.
 569 <replaceable>proc_file_name</replaceable>=
 570 <replaceable>value</replaceable>
 571 </screen>
 572         <para>For example:</para>
 573         <screen>
 574 # lctl set_param osc.*.max_dirty_mb=32
 575 osc.myth-OST0000-osc.max_dirty_mb=32
 576 osc.myth-OST0001-osc.max_dirty_mb=32
 577 osc.myth-OST0002-osc.max_dirty_mb=32
 578 osc.myth-OST0003-osc.max_dirty_mb=32
 579 osc.myth-OST0004-osc.max_dirty_mb=32
 580 </screen>
 581       </section>
 582       <section xml:id="dbdoclet.50438194_64195">
 583         <title>Setting Permanent Parameters</title>
 584         <para>Use the
 585         <literal>lctl conf_param</literal> command to set permanent parameters.
 586         In general, the
 587         <literal>lctl conf_param</literal> command can be used to specify any
 588         parameter settable in a
 589         <literal>/proc/fs/lustre</literal> file, with its own OBD device. The
 590         <literal>lctl conf_param</literal> command uses this syntax (same as the
 591
 592         <literal>mkfs.lustre</literal> and
 593         <literal>tunefs.lustre</literal> commands):</para>
 594         <screen>
 595 <replaceable>obdname|fsname</replaceable>.
 596 <replaceable>obdtype</replaceable>.
 597 <replaceable>proc_file_name</replaceable>=
 598 <replaceable>value</replaceable>
 599 </screen>
 600         <para>Here are a few examples of
 601         <literal>lctl conf_param</literal> commands:</para>
 602         <screen>
 603 mgs# lctl conf_param testfs-MDT0000.sys.timeout=40
 604 mgs# lctl conf_param testfs-MDT0000.mdt.identity_upcall=NONE
 605 mgs# lctl conf_param testfs.llite.max_read_ahead_mb=16
 606 mgs# lctl conf_param testfs-MDT0000.lov.stripesize=2M
 607 mgs# lctl conf_param testfs-OST0000.osc.max_dirty_mb=29.15
 608 mgs# lctl conf_param testfs-OST0000.ost.client_cache_seconds=15
 609 mgs# lctl conf_param testfs.sys.timeout=40
 610 </screen>
 611         <caution>
 612           <para>Parameters specified with the
 613           <literal>lctl conf_param</literal> command are set permanently in the
 614           file system's configuration file on the MGS.</para>
 615         </caution>
 616       </section>
 617       <section xml:id="dbdoclet.setparamp" condition='l25'>
 618         <title>Setting Permanent Parameters with lctl set_param -P</title>
 619         <para>Use the
 620         <literal>lctl set_param -P</literal> command to set parameters permanently. This
 621         command must be issued on the MGS. The given parameter is set on every
 622         host using an
 623         <literal>lctl</literal> upcall. Parameters map to items in
 624         <literal>/proc/{fs,sys}/{lnet,lustre}</literal>. The
 625         <literal>lctl set_param -P</literal> command uses this syntax:</para>
 626         <screen>
 627 lctl set_param -P
 628 <replaceable>obdtype</replaceable>.
 629 <replaceable>obdname</replaceable>.
 630 <replaceable>proc_file_name</replaceable>=
 631 <replaceable>value</replaceable>
 632 </screen>
 633         <para>For example:</para>
 634         <screen>
 635 # lctl set_param -P osc.*.max_dirty_mb=32
 636 osc.myth-OST0000-osc.max_dirty_mb=32
 637 osc.myth-OST0001-osc.max_dirty_mb=32
 638 osc.myth-OST0002-osc.max_dirty_mb=32
 639 osc.myth-OST0003-osc.max_dirty_mb=32
 640 osc.myth-OST0004-osc.max_dirty_mb=32
 641 </screen>
 642         <para>Use the
 643         <literal>-d</literal> option (valid only with <literal>-P</literal>) to delete a permanent
 644         parameter. The syntax is:</para>
 645         <screen>
 646 lctl set_param -P -d
 647 <replaceable>obdtype</replaceable>.
 648 <replaceable>obdname</replaceable>.
 649 <replaceable>proc_file_name</replaceable>
 650 </screen>
 651         <para>For example:</para>
 652         <screen>
 653 # lctl set_param -P -d osc.*.max_dirty_mb
 654 </screen>
 655       </section>
 656       <section xml:id="dbdoclet.50438194_88217">
 657         <title>Listing Parameters</title>
 658         <para>To list Lustre or LNET parameters that are available to set, use
 659         the
 660         <literal>lctl list_param</literal> command with this syntax:</para>
 661         <screen>
 662 lctl list_param [-FR]
 663 <replaceable>obdtype</replaceable>.
 664 <replaceable>obdname</replaceable>
 665 </screen>
 666         <para>The following arguments are available for the
 667         <literal>lctl list_param</literal> command.</para>
 668         <para>
 669         <literal>-F</literal> Adds '
 670         <literal>/</literal>', '
 671         <literal>@</literal>' or '
 672         <literal>=</literal>' for directories, symlinks and writeable files,
 673         respectively</para>
 674         <para>
 675         <literal>-R</literal> Recursively lists all parameters under the
 676         specified path</para>
 677         <para>For example:</para>
 678         <screen>
 679 oss# lctl list_param obdfilter.lustre-OST0000
 680 </screen>
 681       </section>
 682       <section xml:id="dbdoclet.50438194_63247">
 683         <title>Reporting Current Parameter Values</title>
 684         <para>To report current Lustre parameter values, use the
 685         <literal>lctl get_param</literal> command with this syntax:</para>
 686         <screen>
 687 lctl get_param [-n]
 688 <replaceable>obdtype</replaceable>.
 689 <replaceable>obdname</replaceable>.
 690 <replaceable>proc_file_name</replaceable>
 691 </screen>
 692         <para>This example reports data on RPC service times.</para>
 693         <screen>
 694 oss# lctl get_param -n ost.*.ost_io.timeouts
 695 service : cur 1 worst 30 (at 1257150393, 85d23h58m54s ago) 1 1 1 1
 696 </screen>
 697         <para>This example reports the amount of space this client has reserved
 698         for writeback cache with each OST:</para>
 699         <screen>
 700 client# lctl get_param osc.*.cur_grant_bytes
 701 osc.myth-OST0000-osc-ffff8800376bdc00.cur_grant_bytes=2097152
 702 osc.myth-OST0001-osc-ffff8800376bdc00.cur_grant_bytes=33890304
 703 osc.myth-OST0002-osc-ffff8800376bdc00.cur_grant_bytes=35418112
 704 osc.myth-OST0003-osc-ffff8800376bdc00.cur_grant_bytes=2097152
 705 osc.myth-OST0004-osc-ffff8800376bdc00.cur_grant_bytes=33808384
 706 </screen>
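        <para>Because
        <literal>lctl get_param</literal> prints one
        <literal>name=value</literal> line per device, its output is easy to
        post-process. The following sketch, with the hypothetical helper name
        <literal>total_grant_bytes</literal>, sums the
        <literal>cur_grant_bytes</literal> values; the sample lines are taken
        from the output shown above.</para>
        <screen>
```shell
# Hypothetical helper: sum the cur_grant_bytes values found in
# "lctl get_param"-style output (name=value lines) on standard input.
total_grant_bytes() {
    awk -F= '/cur_grant_bytes=/ { sum += $2 } END { printf "%d\n", sum }'
}

# On a client, the live values could be piped in like this:
#   lctl get_param osc.*.cur_grant_bytes | total_grant_bytes
printf '%s\n' \
    "osc.myth-OST0000-osc-ffff8800376bdc00.cur_grant_bytes=2097152" \
    "osc.myth-OST0001-osc-ffff8800376bdc00.cur_grant_bytes=33890304" |
    total_grant_bytes
```
</screen>
        <para>For the two sample lines, the helper prints
        <literal>35987456</literal>.</para>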
 707       </section>
 708     </section>
 709   </section>
 710   <section xml:id="dbdoclet.50438194_41817">
 711     <title>
 712     <indexterm>
 713       <primary>operations</primary>
 714       <secondary>failover</secondary>
 715     </indexterm>Specifying NIDs and Failover</title>
 716     <para>If a node has multiple network interfaces, it may have multiple NIDs,
 717     which must all be identified so other nodes can choose the NID that is
 718     appropriate for their network interfaces. Typically, NIDs are specified in
 719     a list delimited by commas (<literal>,</literal>). However, when failover
 720     nodes are specified, the NIDs are delimited by a colon
 721     (<literal>:</literal>) or by repeating a keyword such as
 722     <literal>--mgsnode=</literal> or
 723     <literal>--servicenode=</literal>.
 724     </para>
 725     <para>To display the NIDs of all servers in networks configured to work
 726     with the Lustre file system, run (while LNET is running):</para>
 727     <screen>
 728 lctl list_nids
 729 </screen>
 730     <para>In the example below,
 731     <literal>mds0</literal> and
 732     <literal>mds1</literal> are configured as a combined MGS/MDT failover pair
 733     and
 734     <literal>oss0</literal> and
 735     <literal>oss1</literal> are configured as an OST failover pair. The Ethernet
 736     address for
 737     <literal>mds0</literal> is 192.168.10.1, and for
 738     <literal>mds1</literal> is 192.168.10.2. The Ethernet addresses for
 739     <literal>oss0</literal> and
 740     <literal>oss1</literal> are 192.168.10.20 and 192.168.10.21
 741     respectively.</para>
 742     <screen>
 743 mds0# mkfs.lustre --fsname=testfs --mdt --mgs \
 744         --servicenode=192.168.10.2@tcp0 \
 745         --servicenode=192.168.10.1@tcp0 /dev/sda1
 746 mds0# mount -t lustre /dev/sda1 /mnt/test/mdt
 747 oss0# mkfs.lustre --fsname=testfs --servicenode=192.168.10.20@tcp0 \
 748         --servicenode=192.168.10.21@tcp0 --ost --index=0 \
 749         --mgsnode=192.168.10.1@tcp0 --mgsnode=192.168.10.2@tcp0 \
 750         /dev/sdb
 751 oss0# mount -t lustre /dev/sdb /mnt/test/ost0
 752 client# mount -t lustre 192.168.10.1@tcp0:192.168.10.2@tcp0:/testfs \
 753         /mnt/testfs
 754 mds0# umount /mnt/test/mdt
 755 mds1# mount -t lustre /dev/sda1 /mnt/test/mdt
 756 mds1# cat /proc/fs/lustre/mds/testfs-MDT0000/recovery_status
 757 </screen>
 758     <para>Where multiple NIDs are specified separated by commas (for example,
 759     <literal>10.67.73.200@tcp,192.168.10.1@tcp</literal>), the two NIDs refer
 760     to the same host, and the Lustre software chooses the
 761     <emphasis>best</emphasis> one for communication. When a pair of NIDs is
 762     separated by a colon (for example,
 763     <literal>10.67.73.200@tcp:10.67.73.201@tcp</literal>), the two NIDs refer
 764     to two different hosts and are treated as a failover pair (the Lustre
 765     software tries the first one, and if that fails, it tries the second
 766     one).</para>
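    <para>The comma and colon rules can be illustrated with a small shell
    sketch. The helper name
    <literal>describe_nids</literal> is hypothetical and is not a Lustre
    command. It assumes IPv4-style NIDs, which do not themselves contain a
    colon or a comma, and it only splits the text; it does not contact any
    node.</para>
    <screen><![CDATA[
```shell
# Hypothetical helper: split a NID specification into failover hosts.
# Colons separate different hosts; commas separate NIDs of the same host.
# Assumes IPv4-style NIDs, which do not themselves contain ":" or ",".
describe_nids() {
    printf '%s\n' "$1" | tr ':' '\n' | tr ',' ' ' |
        awk '{ printf "host %d: %s\n", NR, $0 }'
}

# Example:
#   describe_nids '10.67.73.200@tcp,192.168.10.1@tcp:10.67.73.201@tcp'
```
]]></screen>
    <para>For the example in the comment, the sketch prints two lines: the
    first host with its two NIDs, and the second (failover) host with its
    single NID.</para>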
 767     <para>Two options to
 768     <literal>mkfs.lustre</literal> can be used to specify failover nodes.
 769     Introduced in Lustre software release 2.0, the
 770     <literal>--servicenode</literal> option is used to specify all service NIDs,
 771     including those for primary nodes and failover nodes. When the
 772     <literal>--servicenode</literal> option is used, the first service node to
 773     load the target device becomes the primary service node, while nodes
 774     corresponding to the other specified NIDs become failover locations for the
 775     target device. An older option,
 776     <literal>--failnode</literal>, specifies just the NIDs of failover nodes.
 777     For more information about the
 778     <literal>--servicenode</literal> and
 779     <literal>--failnode</literal> options, see
 780     <xref xmlns:xlink="http://www.w3.org/1999/xlink"
 781     linkend="configuringfailover" />.</para>
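    <para>Because every node that can serve a target is listed with its own
    <literal>--servicenode</literal> option, the option list can be generated
    from a list of NIDs. The following is a minimal sketch; the helper name
    <literal>servicenode_opts</literal> is hypothetical, and it only prints
    text for use on a
    <literal>mkfs.lustre</literal> command line without running anything.</para>
    <screen><![CDATA[
```shell
# Hypothetical helper: print one --servicenode option per NID given as
# an argument, in the order given.
servicenode_opts() {
    opts=""
    for nid in "$@"; do
        opts="${opts:+$opts }--servicenode=$nid"
    done
    printf '%s\n' "$opts"
}

# Example:
#   servicenode_opts 192.168.10.1@tcp0 192.168.10.2@tcp0
```
]]></screen>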
 782   </section>
 783   <section xml:id="dbdoclet.50438194_70905">
 784     <title>
 785     <indexterm>
 786       <primary>operations</primary>
 787       <secondary>erasing a file system</secondary>
 788     </indexterm>Erasing a File System</title>
 789     <para>If you want to erase a file system and permanently delete all the
 790     data in the file system, run this command on your targets:</para>
 791     <screen>
 792 $ mkfs.lustre --reformat
 793 </screen>
 794     <para>If you are using a separate MGS and want to keep other file systems
 795     defined on that MGS, then set the
 796     <literal>writeconf</literal> flag on the MDT for that file system. The
 797     <literal>writeconf</literal> flag causes the configuration logs to be
 798     erased; they are regenerated the next time the servers start.</para>
 799     <para>To set the
 800     <literal>writeconf</literal> flag on the MDT:</para>
 801     <orderedlist>
 802       <listitem>
 803         <para>Unmount all clients and servers using this file system:</para>
 804         <screen>
 805 $ umount /mnt/lustre
 806 </screen>
 807       </listitem>
 808       <listitem>
 809         <para>To permanently erase the file system and, presumably, replace
 810         it with another file system, run:</para>
 811         <screen>
 812 $ mkfs.lustre --reformat --fsname spfs --mgs --mdt --index=0 /dev/<emphasis>{mdsdev}</emphasis>
 814 </screen>
 815       </listitem>
 816       <listitem>
 817         <para>If you have a separate MGS that you do not want to reformat,
 818         add the
 819         <literal>--writeconf</literal> flag to
 820         <literal>mkfs.lustre</literal> when reformatting the MDT:</para>
 821         <screen>
 822 $ mkfs.lustre --reformat --writeconf --fsname spfs --mgsnode=<replaceable>mgs_nid</replaceable> --mdt --index=0 <replaceable>/dev/mds_device</replaceable>
 825 </screen>
 826       </listitem>
 827     </orderedlist>
 828     <note>
 829       <para>If you have a combined MGS/MDT, reformatting the MDT reformats the
 830       MGS as well, causing all configuration information to be lost. You can
 831       then start building your new file system. Nothing needs to be done with
 832       old disks that will not be part of the new file system; just do not
 833       mount them.</para>
 834     </note>
 835   </section>
 836   <section xml:id="dbdoclet.50438194_16954">
 837     <title>
 838     <indexterm>
 839       <primary>operations</primary>
 840       <secondary>reclaiming space</secondary>
 841     </indexterm>Reclaiming Reserved Disk Space</title>
 842     <para>All current Lustre installations run the ldiskfs file system
 843     internally on service nodes. By default, ldiskfs reserves 5% of the disk
 844     space to avoid file system fragmentation. In order to reclaim this space,
 845     run the following command on your OSS for each OST in the file
 846     system:</para>
 847     <screen>
 848 tune2fs [-m reserved_blocks_percent] /dev/<emphasis>{ostdev}</emphasis>
 850 </screen>
 851     <para>You do not need to shut down Lustre before running this command or
 852     restart it afterwards.</para>
 853     <warning>
 854       <para>Reducing the space reservation can cause severe performance
 855       degradation as the OST file system becomes more than 95% full, due to
 856       difficulty in locating large areas of contiguous free space. This
 857       performance degradation may persist even if the space usage drops below
 858       95% again. It is recommended NOT to reduce the reserved disk space below
 859       5%.</para>
 860     </warning>
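    <para>To judge how much space is at stake before changing the reservation,
    the reserved amount can be estimated with simple integer arithmetic. The
    following is a minimal sketch; the helper name
    <literal>reserved_bytes</literal> is hypothetical, and it assumes a shell
    with 64-bit arithmetic. It only computes a number and does not touch any
    device.</para>
    <screen><![CDATA[
```shell
# Hypothetical helper: print how many bytes a given reservation
# percentage represents on a device of the given size.
# Usage: reserved_bytes SIZE_IN_BYTES PERCENT   (integer arithmetic)
reserved_bytes() {
    echo $(( $1 * $2 / 100 ))
}

# Example: default 5% reservation on a 10^12-byte OST
#   reserved_bytes 1000000000000 5
```
]]></screen>
    <para>For the example in the comment, the default 5% reservation amounts
    to 50000000000 bytes, or about 50 GB.</para>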
 861   </section>
 862   <section xml:id="dbdoclet.50438194_69998">
 863     <title>
 864     <indexterm>
 865       <primary>operations</primary>
 866       <secondary>replacing an OST or MDS</secondary>
 867     </indexterm>Replacing an Existing OST or MDT</title>
 868     <para>To copy the contents of an existing OST to a new OST (or an old MDT
 869     to a new MDT), follow the process for either OST/MDT backups in
 870     <xref linkend='dbdoclet.50438207_71633' /> or
 871     <xref linkend='dbdoclet.50438207_21638' />. For more information on
 872     removing an MDT, see
 873     <xref linkend='dbdoclet.rmremotedir' />.</para>
 874   </section>
 875   <section xml:id="dbdoclet.50438194_30872">
 876     <title>
 877     <indexterm>
 878       <primary>operations</primary>
 879       <secondary>identifying OSTs</secondary>
 880     </indexterm>Identifying To Which Lustre File an OST Object Belongs</title>
 881     <para>Use this procedure to identify the file containing a given object on
 882     a given OST.</para>
 883     <orderedlist>
 884       <listitem>
 885         <para>On the OST (as root), run
 886         <literal>debugfs</literal> to display the file identifier (
 887         <literal>FID</literal>) of the file associated with the object.</para>
 888         <para>For example, if the object is
 889         <literal>34976</literal> on
 890         <literal>/dev/lustre/ost_test2</literal>, the <literal>debugfs</literal> command is:
 891         <screen>
 892 # debugfs -c -R "stat /O/0/d$((34976 % 32))/34976" /dev/lustre/ost_test2
 893 </screen></para>
 894         <para>The command output is:
 895         <screen>
 896 debugfs 1.42.3.wc3 (15-Aug-2012)
 897 /dev/lustre/ost_test2: catastrophic mode - not reading inode or group bitmaps
 898 Inode: 352365   Type: regular    Mode:  0666   Flags: 0x80000
 899 Generation: 2393149953    Version: 0x0000002a:00005f81
 900 User:  1000   Group:  1000   Size: 260096
 901 File ACL: 0    Directory ACL: 0
 902 Links: 1   Blockcount: 512
 903 Fragment:  Address: 0    Number: 0    Size: 0
 904 ctime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
 905 atime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
 906 mtime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
 907 crtime: 0x4a216b3c:975870dc -- Sat May 30 13:22:04 2009
 908 Size of extra inode fields: 24
 909 Extended attributes stored in inode body:
 910   fid = "b9 da 24 00 00 00 00 00 6a fa 0d 3f 01 00 00 00 eb 5b 0b 00 00 00 00 00
 911 00 00 00 00 00 00 00 00 " (32)
 912   fid: objid=34976 seq=0 parent=[0x24dab9:0x3f0dfa6a:0x0] stripe=1
 913 EXTENTS:
 914 (0-64):4620544-4620607
 915 </screen></para>
 916       </listitem>
 917       <listitem>
 918         <para>For Lustre software release 2.x file systems, the parent FID will
 919         be of the form [0x200000400:0x122:0x0] and can be resolved directly
 920         by running the
 921         <literal>lfs fid2path /mnt/lustre [0x200000400:0x122:0x0]</literal>
 922         command on any Lustre client, and the process is
 923         complete.</para>
 924       </listitem>
 925       <listitem>
 926         <para>In this example, the parent FID belongs to an upgraded 1.x
 927         inode, because the first part of the FID is below 0x200000400. The
 928         MDT inode number is
 929         <literal>0x24dab9</literal> and the generation is
 930         <literal>0x3f0dfa6a</literal>, so the pathname must be resolved
 931         using
 932         <literal>debugfs</literal>.</para>
 933       </listitem>
 934       <listitem>
 935         <para>On the MDS (as root), use
 936         <literal>debugfs</literal> to find the file associated with the
 937         inode:</para>
 938         <screen>
 939 # debugfs -c -R "ncheck 0x24dab9" /dev/lustre/mdt_test
 940 </screen>
 941         <para>Here is the command output:</para>
 942         <screen>
 943 debugfs 1.42.3.wc2 (15-Aug-2012)
 944 /dev/lustre/mdt_test: catastrophic mode - not reading inode or group bitmaps
 946 Inode      Pathname
 947 2415289    /ROOT/brian-laptop-guest/clients/client11/~dmtmp/PWRPNT/ZD16.BMP
 948 </screen>
 949       </listitem>
 950     </orderedlist>
 951     <para>The command lists the inode and pathname associated with the
 952     object.</para>
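    <para>The arithmetic used in this procedure can be sketched in shell. The
    helper names below are hypothetical and are not Lustre commands. The
    sketch assumes the object layout implied by the example
    <literal>debugfs</literal> command (objects spread over 32
    <literal>d</literal> directories) and a shell with 64-bit arithmetic.</para>
    <screen><![CDATA[
```shell
# Hypothetical helpers illustrating the arithmetic used in this procedure.

# Path of an object inside the OST backing file system, as passed to
# "debugfs stat": the directory is chosen by object ID modulo 32.
ost_object_path() {
    echo "/O/0/d$(( $1 % 32 ))/$1"
}

# Succeeds if the first field of a parent FID is below 0x200000400,
# meaning an upgraded 1.x inode that must be resolved with "debugfs ncheck".
is_upgraded_1x_seq() {
    [ $(( $1 )) -lt $(( 0x200000400 )) ]
}

# Convert a hexadecimal inode number to the decimal form printed by ncheck.
hex_to_dec() {
    echo $(( $1 ))
}
```
]]></screen>
    <para>For the example above,
    <literal>ost_object_path 34976</literal> prints
    <literal>/O/0/d0/34976</literal>, and
    <literal>hex_to_dec 0x24dab9</literal> prints
    <literal>2415289</literal>, which matches the inode number in the
    <literal>ncheck</literal> output.</para>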
 953     <note>
 954       <para>
 955       The <literal>debugfs</literal> <literal>ncheck</literal> command is a
 956       brute-force search that may take a long time to complete.</para>
 957     </note>
 958     <note>
 959       <para>To find the Lustre file from a disk LBA, follow the steps listed in
 960       the document at this URL:
 961       <link xl:href="http://smartmontools.sourceforge.net/badblockhowto.html">
 962       http://smartmontools.sourceforge.net/badblockhowto.html</link>. Then,
 963       follow the steps above to resolve the Lustre filename.</para>
 964     </note>
 965   </section>
 966 </chapter>