LustreOperations.xml

   1 <?xml version='1.0' encoding='utf-8'?>
   2 <chapter xmlns="http://docbook.org/ns/docbook"
   3  xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
   4  xml:id="lustreoperations">
   5   <title xml:id="lustreoperations.title">Lustre Operations</title>
   6   <para>Once you have the Lustre file system up and running, you can use the
   7   procedures in this section to perform these basic Lustre administration
   8   tasks.</para>
   9   <section xml:id="mount_by_label">
  10     <title>
  11     <indexterm>
  12       <primary>operations</primary>
  13     </indexterm>
  14     <indexterm>
  15       <primary>operations</primary>
  16       <secondary>mounting by label</secondary>
  17     </indexterm>Mounting by Label</title>
  18     <para>The file system name is limited to 8 characters. We have encoded the
  19     file system and target information in the disk label, so you can mount by
  20     label. This allows system administrators to move disks around without
  21     worrying about issues such as SCSI disk reordering or getting the
  22     <literal>/dev/device</literal> wrong for a shared target. Soon, file system
  23     naming will be made as fail-safe as possible. Currently, Linux disk labels
  24     are limited to 16 characters. To identify the target within the file
  25     system, 8 characters are reserved, leaving 8 characters for the file system
  26     name:</para>
  27     <screen>
  28 <replaceable>fsname</replaceable>-MDT0000 or
  29 <replaceable>fsname</replaceable>-OST0a19
  30 </screen>
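    <para>As a rough illustration of how the 16-character limit is split between
    the file system name and the target identifier, the following shell sketch
    builds a label from its parts. The <literal>lustre_label</literal> function
    is a hypothetical helper written for this example, not a Lustre utility. It
    assumes that the target index is written as four lowercase hexadecimal
    digits, as in the labels shown above.</para>
    <screen>
# Hypothetical helper: build a target label such as testfs-OST0a19.
# Arguments: file system name, target type (MDT or OST), decimal index.
lustre_label() {
    fsname=$1
    # Only 8 of the 16 label characters are available for the fsname.
    if [ "${#fsname}" -gt 8 ]; then
        echo "fsname '$fsname' is longer than 8 characters"
        return 1
    fi
    # "-" plus a 3-letter type plus 4 hex digits fills the other 8 characters.
    printf '%s-%s%04x\n' "$fsname" "$2" "$3"
}

lustre_label testfs OST 2585    # prints testfs-OST0a19
</screen>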
  31     <para>To mount by label, use this command:</para>
  32     <screen>
  33 mount -t lustre -L
  34 <replaceable>file_system_label</replaceable>
  35 <replaceable>/mount_point</replaceable>
  36 </screen>
  37     <para>This is an example of mount-by-label:</para>
  38     <screen>
  39 mds# mount -t lustre -L testfs-MDT0000 /mnt/mdt
  40 </screen>
  41     <caution>
  42       <para>Mount-by-label should NOT be used in a multi-path environment or
  43       when snapshots are being created of the device, since multiple block
  44       devices will have the same label.</para>
  45     </caution>
  46     <para>Although the file system name is internally limited to 8 characters,
  47     you can mount the clients at any mount point, so file system users are not
  48     subjected to short names. Here is an example:</para>
  49     <screen>
  50 client# mount -t lustre mds0@tcp0:/short
  51 <replaceable>/mnt/long_mountpoint_name</replaceable>
  52 </screen>
  53   </section>
  54   <section xml:id="starting_lustre">
  55     <title>
  56     <indexterm>
  57       <primary>operations</primary>
  58       <secondary>starting</secondary>
  59     </indexterm>Starting Lustre</title>
  60     <para>On the first start of a Lustre file system, the components must be
  61     started in the following order:</para>
  62     <orderedlist>
  63       <listitem>
  64         <para>Mount the MGT.</para>
  65         <note>
  66           <para>If a combined MGT/MDT is present, Lustre will correctly mount
  67           the MGT and MDT automatically.</para>
  68         </note>
  69       </listitem>
  70       <listitem>
  71         <para>Mount the MDT.</para>
  72         <note>
  73           <para>Mount all MDTs if multiple MDTs are present.</para>
  74         </note>
  75       </listitem>
  76       <listitem>
  77         <para>Mount the OST(s).</para>
  78       </listitem>
  79       <listitem>
  80         <para>Mount the client(s).</para>
  81       </listitem>
  82     </orderedlist>
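    <para>A minimal sketch of this order, for a small file system with a
    separate MGT, one MDT and one OST, might look like the following. All node
    names, device names and mount points, and the file system name
    <literal>testfs</literal>, are placeholders assumed for this example:</para>
    <screen>
mgs# mount -t lustre /dev/sda /mnt/mgt
mds# mount -t lustre /dev/sdb /mnt/mdt
oss# mount -t lustre /dev/sdc /mnt/ost0
client# mount -t lustre mgs@tcp0:/testfs /mnt/testfs
</screen>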
  83   </section>
  84   <section xml:id="mounting_server">
  85     <title>
  86     <indexterm>
  87       <primary>operations</primary>
  88       <secondary>mounting</secondary>
  89     </indexterm>Mounting a Server</title>
  90     <para>Starting a Lustre server is straightforward and only involves the
  91     mount command. Lustre servers can also be added to
  92     <literal>/etc/fstab</literal>. To list the Lustre mounts on a node, run:</para>
  93     <screen>
  94 mount -t lustre
  95 </screen>
  96     <para>The mount command generates output similar to this:</para>
  97     <screen>
  98 /dev/sda1 on /mnt/test/mdt type lustre (rw)
  99 /dev/sda2 on /mnt/test/ost0 type lustre (rw)
 100 192.168.0.21@tcp:/testfs on /mnt/testfs type lustre (rw)
 101 </screen>
 102     <para>In this example, the MDT, an OST (ost0) and file system (testfs) are
 103     mounted. The matching <literal>/etc/fstab</literal> entries for the two server targets are:</para>
 104     <screen>
 105 LABEL=testfs-MDT0000 /mnt/test/mdt lustre defaults,_netdev,noauto 0 0
 106 LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0
 107 </screen>
 108     <para>In general, it is wise to specify <literal>noauto</literal> and let your
 109     high-availability (HA) package manage when to mount the device. If you are
 110     not using failover, make sure that networking has been started before
 111     mounting a Lustre server. On Red Hat Enterprise Linux, SUSE Linux
 112     Enterprise Server, Debian (and perhaps other distributions), use
 113     the
 114     <literal>_netdev</literal> flag to ensure that these disks are mounted after
 115     the network is up.</para>
 116     <para>We are mounting by disk label here. The label of a device can be read
 117     with
 118     <literal>e2label</literal>. The label of a newly-formatted Lustre server
 119     may end in
 120     <literal>FFFF</literal> if the
 121     <literal>--index</literal> option is not specified to
 122     <literal>mkfs.lustre</literal>, meaning that it has yet to be assigned. The
 123     assignment takes place when the server is first started, and the disk label
 124     is updated. It is recommended that the
 125     <literal>--index</literal> option always be used, which will also ensure
 126     that the label is set at format time.</para>
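    <para>For example, assuming an ldiskfs-formatted target, the label could be
    checked as follows before the target is added to
    <literal>/etc/fstab</literal>. The device name and the label shown are taken
    from the example above and are illustrative only:</para>
    <screen>
mds# e2label /dev/sda1
testfs-MDT0000
</screen>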
 127     <caution>
 128       <para>Do not do this when the client and OSS are on the same node, as
 129       memory pressure between the client and OSS can lead to deadlocks.</para>
 130     </caution>
 131     <caution>
 132       <para>Mount-by-label should NOT be used in a multi-path
 133       environment.</para>
 134     </caution>
 135   </section>
 136   <section xml:id="shutdownLustre">
 137       <title>
 138           <indexterm>
 139               <primary>operations</primary>
 140               <secondary>stopping</secondary>
 141           </indexterm>Stopping the Filesystem</title>
 142       <para>A complete Lustre filesystem shutdown occurs by unmounting all
 143       clients and servers in the order shown below. Unmounting
 144       a block device causes the Lustre software to be shut down on that node.
 145       </para>
 146       <note><para>The <literal>-a -t lustre</literal> in the
 147           commands below is not the name of a filesystem; it tells
 148           <literal>umount</literal> to unmount all entries in /etc/mtab that are of type
 149           <literal>lustre</literal>.</para></note>
 150       <orderedlist>
 151           <listitem><para>Unmount the clients</para>
 152               <para>On each client node, unmount the filesystem on that client
 153               using the <literal>umount</literal> command:</para>
 154               <para><literal>umount -a -t lustre</literal></para>
 155               <para>The example below shows the unmount of the
 156               <literal>testfs</literal> filesystem on a client node:</para>
 157               <para><screen>[root@client1 ~]# mount |grep testfs
 158 XXX.XXX.0.11@tcp:/testfs on /mnt/testfs type lustre (rw,lazystatfs)
 159
 160 [root@client1 ~]# umount -a -t lustre
 161 [154523.177714] Lustre: Unmounted testfs-client</screen></para>
 162           </listitem>
 163           <listitem><para>Unmount the MDT and MGT</para>
 164               <para>On the MGS and MDS node(s), run the
 165               <literal>umount</literal> command:</para>
 166               <para><literal>umount -a -t lustre</literal></para>
 167               <para>The example below shows the unmount of the MDT and MGT for
 168               the <literal>testfs</literal> filesystem on a combined MGS/MDS:
 169               </para>
 170               <para><screen>[root@mds1 ~]# mount |grep lustre
 171 /dev/sda on /mnt/mgt type lustre (ro)
 172 /dev/sdb on /mnt/mdt type lustre (ro)
 173
 174 [root@mds1 ~]# umount -a -t lustre
 175 [155263.566230] Lustre: Failing over testfs-MDT0000
 176 [155263.775355] Lustre: server umount testfs-MDT0000 complete
 177 [155269.843862] Lustre: server umount MGS complete</screen></para>
 178           <para>For a separate MGS and MDS, the same command is used, first on
 179           the MDS and then on the MGS.</para>
 180           </listitem>
 181           <listitem><para>Unmount all the OSTs</para>
 182               <para>On each OSS node, use the <literal>umount</literal> command:
 183               </para>
 184               <para><literal>umount -a -t lustre</literal></para>
 185               <para>The example below shows the unmount of all OSTs for the
 186               <literal>testfs</literal> filesystem on server
 187               <literal>OSS1</literal>:
 188               </para>
 189               <para><screen>[root@oss1 ~]# mount |grep lustre
 190 /dev/sda on /mnt/ost0 type lustre (ro)
 191 /dev/sdb on /mnt/ost1 type lustre (ro)
 192 /dev/sdc on /mnt/ost2 type lustre (ro)
 193
 194 [root@oss1 ~]# umount -a -t lustre
 195 [155336.491445] Lustre: Failing over testfs-OST0002
 196 [155336.556752] Lustre: server umount testfs-OST0002 complete</screen></para>
 197           </listitem>
 198       </orderedlist>
 199       <para>For the unmount command syntax for a single OST, MDT, or MGT target,
 200       refer to <xref linkend="umountTarget"/>.</para>
 201   </section>
 202   <section xml:id="umountTarget">
 203     <title>
 204     <indexterm>
 205       <primary>operations</primary>
 206       <secondary>unmounting</secondary>
 207     </indexterm>Unmounting a Specific Target on a Server</title>
 208     <para>To stop a Lustre OST, MDT, or MGT, use the
 209     <literal>umount
 210     <replaceable>/mount_point</replaceable></literal> command.</para>
 211     <para>The example below stops an OST, <literal>ost0</literal>, on mount
 212     point <literal>/mnt/ost0</literal> for the <literal>testfs</literal>
 213     filesystem:</para>
 214     <screen>[root@oss1 ~]# umount /mnt/ost0
 215 [  385.142264] Lustre: Failing over testfs-OST0000
 216 [  385.210810] Lustre: server umount testfs-OST0000 complete</screen>
 217     <para>Gracefully stopping a server with the
 218     <literal>umount</literal> command preserves the state of the connected
 219     clients. The next time the server is started, it waits for clients to
 220     reconnect, and then goes through the recovery procedure.</para>
 221     <para>If the force (
 222     <literal>-f</literal>) flag is used, then the server evicts all clients and
 223     stops WITHOUT recovery. Upon restart, the server does not wait for
 224     recovery. Any currently connected clients receive I/O errors until they
 225     reconnect.</para>
 226     <note>
 227       <para>If you are using loopback devices, use the
 228       <literal>-d</literal> flag. This flag cleans up loop devices and can
 229       always be safely specified.</para>
 230     </note>
 231   </section>
 232   <section xml:id="failover_ost">
 233     <title>
 234     <indexterm>
 235       <primary>operations</primary>
 236       <secondary>failover</secondary>
 237     </indexterm>Specifying Failout/Failover Mode for OSTs</title>
 238     <para>In a Lustre file system, an OST that has become unreachable because
 239     it fails, is taken off the network, or is unmounted can be handled in one
 240     of two ways:</para>
 241     <itemizedlist>
 242       <listitem>
 243         <para>In
 244         <literal>failout</literal> mode, Lustre clients immediately receive
 245         errors (EIOs) after a timeout, instead of waiting for the OST to
 246         recover.</para>
 247       </listitem>
 248       <listitem>
 249         <para>In
 250         <literal>failover</literal> mode, Lustre clients wait for the OST to
 251         recover.</para>
 252       </listitem>
 253     </itemizedlist>
 254     <para>By default, the Lustre file system uses
 255     <literal>failover</literal> mode for OSTs. To specify
 256     <literal>failout</literal> mode instead, use the
 257     <literal>--param="failover.mode=failout"</literal> option as shown below
 258     (entered on one line):</para>
 259     <screen>
 260 oss# mkfs.lustre --fsname=
 261 <replaceable>fsname</replaceable> --mgsnode=
 262 <replaceable>mgs_NID</replaceable> --param=failover.mode=failout
 263       --ost --index=
 264 <replaceable>ost_index</replaceable>
 265 <replaceable>/dev/ost_block_device</replaceable>
 266 </screen>
 267     <para>In the example below,
 268     <literal>failout</literal> mode is specified for an OST registered with the MGS
 269     <literal>mds0</literal> in the file system
 270     <literal>testfs</literal> (entered on one line).</para>
 271     <screen>
 272 oss# mkfs.lustre --fsname=testfs --mgsnode=mds0 --param=failover.mode=failout
 273       --ost --index=3 /dev/sdb
 274 </screen>
 275     <caution>
 276       <para>Before running this command, unmount all OSTs that will be affected
 277       by a change in
 278       <literal>failover</literal>/
 279       <literal>failout</literal> mode.</para>
 280     </caution>
 281     <note>
 282       <para>After initial file system configuration, use the
 283       <literal>tunefs.lustre</literal> utility to change the mode. For example,
 284       to set the
 285       <literal>failout</literal> mode, run:</para>
 286       <para>
 287         <screen>
 288 oss# tunefs.lustre --param failover.mode=failout
 289 <replaceable>/dev/ost_device</replaceable>
 290 </screen>
 291       </para>
 292     </note>
 293   </section>
 294   <section xml:id="degraded_ost">
 295     <title>
 296     <indexterm>
 297       <primary>operations</primary>
 298       <secondary>degraded OST RAID</secondary>
 299     </indexterm>Handling Degraded OST RAID Arrays</title>
 300     <para>The Lustre software can be notified if an external
 301     RAID array has degraded performance (resulting in reduced overall file
 302     system performance), either because a disk has failed and not been
 303     replaced, or because a disk was replaced and is undergoing a rebuild. To
 304     avoid a global performance slowdown due to a degraded OST, the MDS can
 305     avoid the OST for new object allocation if it is notified of the degraded
 306     state.</para>
 307     <para>A parameter for each OST, called
 308     <literal>degraded</literal>, specifies whether the OST is running in
 309     degraded mode or not.</para>
 310     <para>To mark the OST as degraded, use:</para>
 311     <screen>
 312 lctl set_param obdfilter.{OST_name}.degraded=1
 313 </screen>
 314     <para>To mark that the OST is back in normal operation, use:</para>
 315     <screen>
 316 lctl set_param obdfilter.{OST_name}.degraded=0
 317 </screen>
 318     <para>To determine if OSTs are currently in degraded mode, use:</para>
 319     <screen>
 320 lctl get_param obdfilter.*.degraded
 321 </screen>
 322     <para>If the OST is remounted due to a reboot or other condition, the flag
 323     resets to
 324     <literal>0</literal>.</para>
 325     <para>It is recommended that this be implemented by an automated script
 326     that monitors the status of individual RAID devices, such as MD-RAID's
 327     <literal>mdadm(8)</literal> command with the <literal>--monitor</literal>
 328     option to mark an affected device degraded or restored.</para>
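    <para>One possible shape for such a script is sketched below as a handler
    for <literal>mdadm --monitor --program</literal>, which runs the program
    with the event name and the MD device as its first two arguments. This is
    only a sketch under stated assumptions: the function names are
    hypothetical, the choice of which events mark the OST degraded or restored
    is an assumption that should be reviewed for the local RAID setup, and
    <literal>OST_NAME</literal> (for example <literal>testfs-OST0000</literal>)
    is expected to be set by the administrator for the OST that resides on the
    monitored array.</para>
    <screen>
#!/bin/sh
# Hypothetical handler for "mdadm --monitor --program=...".
# mdadm passes: $1 = event name, $2 = md device, $3 = component device (optional).

# Print the value to write to the "degraded" parameter for an mdadm event:
# 1 (degraded), 0 (restored), or nothing if the event is ignored.
degraded_value_for_event() {
    case "$1" in
        # A member failed, the array is degraded, or a rebuild has started.
        Fail|DegradedArray|RebuildStarted) echo 1 ;;
        # A rebuilt spare became active. RebuildFinished is deliberately not
        # listed, since mdadm also reports it for an aborted rebuild.
        SpareActive) echo 0 ;;
    esac
}

# Assumption: OST_NAME is set to the OST backed by the monitored array.
handle_event() {
    value=$(degraded_value_for_event "$1")
    if [ -n "$value" ]; then
        lctl set_param "obdfilter.${OST_NAME}.degraded=${value}"
    fi
}

handle_event "$@"
</screen>
    <para>Assuming the script is installed under a path such as
    <literal>/usr/local/sbin/ost-degraded-handler</literal> (a placeholder), it
    could be registered with
    <literal>mdadm --monitor --scan --daemonise
    --program=/usr/local/sbin/ost-degraded-handler</literal>.</para>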
 329   </section>
 330   <section xml:id="lustre_configure_multiple_fs">
 331     <title>
 332     <indexterm>
 333       <primary>operations</primary>
 334       <secondary>multiple file systems</secondary>
 335     </indexterm>Running Multiple Lustre File Systems</title>
 336     <para>Lustre supports multiple file systems provided the combination of
 337     <literal>NID:fsname</literal> is unique. Each file system must be allocated
 338     a unique name during creation with the
 339     <literal>--fsname</literal> parameter. Unique names for file systems are
 340     enforced if a single MGS is present. If multiple MGSs are present (for
 341     example if you have an MGS on every MDS) the administrator is responsible
 342     for ensuring file system names are unique. A single MGS and unique file
 343     system names provide a single point of administration and allow commands
 344     to be issued against the file system even if it is not mounted.</para>
 345     <para>Lustre supports multiple file systems on a single MGS. With a single
 346     MGS, fsnames are guaranteed to be unique. Lustre also allows multiple MGSs
 347     to co-exist. For example, multiple MGSs will be necessary if multiple file
 348     systems on different Lustre software versions are to be concurrently
 349     available. With multiple MGSs, additional care must be taken to ensure file
 350     system names are unique. Each file system should have a unique fsname among
 351     all systems that may interoperate in the future.</para>
 352     <para>By default, the
 353     <literal>mkfs.lustre</literal> command creates a file system named
 354     <literal>lustre</literal>. To specify a different file system name (limited
 355     to 8 characters) at format time, use the
 356     <literal>--fsname</literal> option:</para>
 357     <para>
 358       <screen>
 359 mkfs.lustre --fsname=
 360 <replaceable>file_system_name</replaceable>
 361 </screen>
 362     </para>
 363     <note>
 364       <para>The MDT, OSTs and clients in the new file system must use the same
 365       file system name (prepended to the device name). For example, for a new
 366       file system named
 367       <literal>foo</literal>, the MDT and two OSTs would be named
 368       <literal>foo-MDT0000</literal>,
 369       <literal>foo-OST0000</literal>, and
 370       <literal>foo-OST0001</literal>.</para>
 371     </note>
 372     <para>To mount a client on the file system, run:</para>
 373     <screen>
 374 client# mount -t lustre
 375 <replaceable>mgsnode</replaceable>:
 376 <replaceable>/new_fsname</replaceable>
 377 <replaceable>/mount_point</replaceable>
 378 </screen>
 379     <para>For example, to mount a client on file system foo at mount point
 380     /mnt/foo, run:</para>
 381     <screen>
 382 client# mount -t lustre mgsnode:/foo /mnt/foo
 383 </screen>
 384     <note>
 385       <para>If a client will mount several file systems, add the
 386       following line to the
 387       <literal>/etc/xattr.conf</literal> file to avoid problems when files are
 388       moved between the file systems:
 389       <literal>lustre.* skip</literal></para>
 390     </note>
 391     <note>
 392       <para>To ensure that a new MDT is added to an existing MGS, create the MDT
 393       by specifying:
 394       <literal>--mdt --mgsnode=
 395       <replaceable>mgs_NID</replaceable></literal>.</para>
 396     </note>
 397     <para>A Lustre installation with two file systems (
 398     <literal>foo</literal> and
 399     <literal>bar</literal>) could look like this, where the MGS node is
 400     <literal>mgsnode@tcp0</literal> and the mount points are
 401     <literal>/mnt/foo</literal> and
 402     <literal>/mnt/bar</literal>.</para>
 403     <screen>
 404 mgsnode# mkfs.lustre --mgs /dev/sda
 405 mdtfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --mdt --index=0
 406 /dev/sdb
 407 ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=0
 408 /dev/sda
 409 ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=1
 410 /dev/sdb
 411 mdtbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --mdt --index=0
 412 /dev/sda
 413 ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=0
 414 /dev/sdc
 415 ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=1
 416 /dev/sdd
 417 </screen>
 418     <para>To mount a client on file system foo at mount point
 419     <literal>/mnt/foo</literal>, run:</para>
 420     <screen>
 421 client# mount -t lustre mgsnode@tcp0:/foo /mnt/foo
 422 </screen>
 423     <para>To mount a client on file system bar at mount point
 424     <literal>/mnt/bar</literal>, run:</para>
 425     <screen>
 426 client# mount -t lustre mgsnode@tcp0:/bar /mnt/bar
 427 </screen>
 428   </section>
 429   <section xml:id="lfsmkdir">
 430     <title>
 431     <indexterm>
 432       <primary>operations</primary>
 433       <secondary>remote directory</secondary>
 434     </indexterm>Creating a sub-directory on a specific MDT</title>
 435     <para>It is possible to create individual directories, along with their
 436       files and sub-directories, to be stored on specific MDTs. To create
 437       a sub-directory on a given MDT, use the command:
 438     </para>
 439     <screen>
 440 client# lfs mkdir -i
 441 <replaceable>mdt_index</replaceable>
 442 <replaceable>/mount_point/remote_dir</replaceable>
 443 </screen>
 444     <para>This command will allocate the sub-directory
 445     <literal>remote_dir</literal> onto the MDT of index
 446     <literal>mdt_index</literal>. For more information on adding additional MDTs
 447     and
 448     <literal>mdt_index</literal> see
 449     <xref linkend='addmdtindex' />.</para>
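    <para>As a concrete illustration, the following creates a directory on the
    MDT with index 1 and then displays the resulting directory layout. The mount
    point <literal>/mnt/testfs</literal> and the directory name are placeholders
    chosen for this example, and it assumes that an MDT with index 1 already
    exists in the file system:</para>
    <screen>
client# lfs mkdir -i 1 /mnt/testfs/remote_dir
client# lfs getdirstripe /mnt/testfs/remote_dir
</screen>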
 450     <warning>
 451       <para>An administrator can allocate remote sub-directories to separate
 452       MDTs. Creating remote sub-directories in parent directories not hosted on
 453       MDT0000 is not recommended. This is because the failure of the parent MDT
 454       will leave the namespace below it inaccessible. For this reason, by
 455       default it is only possible to create remote sub-directories off MDT0000.
 456       To relax this restriction and enable remote sub-directories off any MDT,
 457       an administrator must issue the following command on the MGS:
 458       <screen>mgs# lctl conf_param <replaceable>fsname</replaceable>.mdt.enable_remote_dir=1</screen>
 459       For Lustre filesystem 'scratch', the command executed is:
 460       <screen>mgs# lctl conf_param scratch.mdt.enable_remote_dir=1</screen>
 461       To verify the configuration setting execute the following command on any
 462       MDS:
 463           <screen>mds# lctl get_param mdt.*.enable_remote_dir</screen></para>
 464     </warning>
 465     <para condition='l28'>With Lustre software version 2.8, a new
 466     tunable is available to allow users with a specific group ID to create
 467     and delete remote and striped directories. This tunable is
 468     <literal>enable_remote_dir_gid</literal>. For example, setting this
 469     parameter to the 'wheel' or 'admin' group ID allows users with that GID
 470     to create and delete remote and striped directories. Setting this
 471     parameter to <literal>-1</literal> on MDT0000 permanently allows any
 472     non-root user to create and delete remote and striped directories.
 473     On the MGS, execute the following command:
 474     <screen>mgs# lctl conf_param <replaceable>fsname</replaceable>.mdt.enable_remote_dir_gid=-1</screen>
 475     For the Lustre filesystem 'scratch', the command expands to:
 476     <screen>mgs# lctl conf_param scratch.mdt.enable_remote_dir_gid=-1</screen>
 477     The change can be verified by executing the following command on every MDS:
 478     <screen>mds# lctl get_param mdt.<replaceable>*</replaceable>.enable_remote_dir_gid</screen>
 479     </para>
 480   </section>
 481   <section xml:id="lfsmkdirdne2" condition='l28'>
 482     <title>
 483     <indexterm>
 484       <primary>operations</primary>
 485       <secondary>striped directory</secondary>
 486     </indexterm>
 487     <indexterm>
 488       <primary>operations</primary>
 489       <secondary>mkdir</secondary>
 490     </indexterm>
 491     <indexterm>
 492       <primary>operations</primary>
 493       <secondary>setdirstripe</secondary>
 494     </indexterm>
 495     <indexterm>
 496       <primary>striping</primary>
 497       <secondary>metadata</secondary>
 498     </indexterm>Creating a directory striped across multiple MDTs</title>
 499     <para>The Lustre 2.8 DNE feature enables individual files in a given
 500     directory to store their metadata on separate MDTs (a <emphasis>striped
 501     directory</emphasis>) once additional MDTs have been added to the
 502     filesystem (see <xref linkend="lustremaint.adding_new_mdt"/>).
 503     As a result, metadata requests for
 504     files in a striped directory are serviced by multiple MDTs and metadata
 505     service load is distributed over all the MDTs that service a given
 506     directory. By distributing metadata service load over multiple MDTs,
 507     performance can be improved beyond the limit of single MDT
 508     performance. Prior to the development of this feature, all files in a
 509     directory had to record their metadata on a single MDT.</para>
 510     <para>The command to stripe a directory over
 511     <replaceable>mdt_count</replaceable> MDTs is:
 512     </para>
 513     <screen>
 514 client# lfs mkdir -c
 515 <replaceable>mdt_count</replaceable>
 516 <replaceable>/mount_point/new_directory</replaceable>
 517 </screen>
 518     <para>The striped directory feature is most useful for distributing
 519     single large directories (50k entries or more) across multiple MDTs,
 520     since it incurs more overhead than non-striped directories.</para>
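    <para>For instance, assuming a file system with at least two MDTs mounted at
    the placeholder mount point <literal>/mnt/testfs</literal>, a directory
    striped over two MDTs could be created and then inspected as
    follows:</para>
    <screen>
client# lfs mkdir -c 2 /mnt/testfs/striped_dir
client# lfs getdirstripe /mnt/testfs/striped_dir
</screen>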
 521     <section xml:id="lfsmkdirbyspace" condition='l2D'>
 522       <title>Directory creation by space/inode usage</title>
 523       <para>If the starting MDT is not specified when creating a new directory,
 524       this directory and its stripes will be distributed on MDTs by space usage.
 525       For example the following will create a directory and its stripes on MDTs
 526       with balanced space usage:</para>
 527       <screen>lfs mkdir -c 2 &lt;dir1&gt;</screen>
 528       <para>Alternatively, if a default directory stripe is set on a directory,
 529       the subsequent syscall <literal>mkdir</literal> under
 530       <literal>&lt;dir1&gt;</literal> will have the same effect:
 531       <screen>lfs setdirstripe -D -c 2 &lt;dir1&gt;</screen></para>
 532       <para>The policy is:</para>
 533       <itemizedlist>
 534         <listitem><para>If free inodes/blocks on all MDTs are almost the same,
 535         i.e. <literal>max_inodes_avail * 84% &lt; min_inodes_avail</literal> and
 536         <literal>max_blocks_avail * 84% &lt; min_blocks_avail</literal>, then
 537         choose MDTs in round-robin order.</para></listitem>
 538         <listitem><para>Otherwise, create more subdirectories on MDTs with more
 539         free inodes/blocks.</para></listitem>
 540       </itemizedlist>
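      <para>The balance test in the first rule can be sketched in shell
      arithmetic as shown below. This only illustrates the two inequalities
      stated above and is not the code Lustre itself runs; the function name
      <literal>mdts_balanced</literal> and the sample numbers are hypothetical.
      </para>
      <screen>
# Return success (0) if the MDTs count as balanced under the 84% rule.
# Arguments: max_inodes_avail min_inodes_avail max_blocks_avail min_blocks_avail
mdts_balanced() {
    # Compare max * 84 with min * 100 to stay in integer arithmetic.
    if [ $(( $1 * 84 )) -lt $(( $2 * 100 )) ]; then
        if [ $(( $3 * 84 )) -lt $(( $4 * 100 )) ]; then
            return 0
        fi
    fi
    return 1
}

if mdts_balanced 1000 900 5000 4500; then
    echo "balanced: round-robin"
else
    echo "unbalanced: prefer MDTs with more free space"
fi
</screen>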
 541     </section>
 542     <section xml:id="fsdefaultlmv" condition='l2E'>
 543       <title>Filesystem-wide default directory striping</title>
 544       <para>Similar to file objects allocation, the directory objects are
 545       allocated on MDTs by a round-robin algorithm or a weighted algorithm. For
 546       the top three levels of directories from the root of the filesystem, if the
 547       amount of free inodes and blocks is well balanced (i.e., by default, when
 548       the free inodes and blocks across MDTs differ by less than 5%), the
 549       round-robin algorithm is used to select the next MDT on which a directory
 550       is to be created.
 551       </para>
 552       <para>If the directory is more than three levels below the root directory,
 553       or MDTs are not balanced, then the weighted algorithm is used to randomly
 554       select an MDT with more free inodes and blocks.
 555       </para>
 556       <para>To avoid creating unnecessary remote directories, if the MDT where
 557       the parent directory is located is not too full (in terms of free inodes
 558       and blocks, the parent MDT is not more than 5% fuller than the average of all
 559       MDTs), the new directory will be created on the parent MDT.
 560       </para>
 561       <para>If the administrator wants to change this default filesystem-wide
 562       directory striping, run the following command to limit this striping to
 563       the top level below the root directory:</para>
 564       <screen>lfs setdirstripe -D -i -1 -c 1 --max-inherit 0 &lt;mountpoint&gt;
 565       </screen>
 566       <para>To revert to the pre-2.15 behavior of all directories being created
 567       only on MDT0000 by default (deleting this striping won't work because it
 568       will be recreated if missing):</para>
 569       <screen>lfs setdirstripe -D -i 0 -c 1 --max-inherit 0 &lt;mountpoint&gt;
 570       </screen>
 571     </section>
 572   </section>
 573   <section xml:id="set_get_lustre_params">
 574     <title>
 575     <indexterm>
 576       <primary>operations</primary>
 577       <secondary>parameters</secondary>
 578     </indexterm>Setting and Retrieving Lustre Parameters</title>
 579     <para>Several options are available for setting parameters in
 580     Lustre:</para>
 581     <itemizedlist>
 582       <listitem>
 583         <para>When creating a file system, use mkfs.lustre. See
 584         <xref linkend="tuning_params_mkfs_lustre" /> below.</para>
 585       </listitem>
 586       <listitem>
 587         <para>When a server is stopped, use tunefs.lustre. See
 588         <xref linkend="setting_param_tunefs" /> below.</para>
 589       </listitem>
 590       <listitem>
 591         <para>When the file system is running, use lctl to set or retrieve
 592         Lustre parameters. See
 593         <xref linkend="setting_param_with_lctl" /> and
 594         <xref linkend="reporting_current_param" /> below.</para>
 595       </listitem>
 596     </itemizedlist>
 597     <section xml:id="tuning_params_mkfs_lustre">
 598       <title>Setting Tunable Parameters with
 599       <literal>mkfs.lustre</literal></title>
 600       <para>When the file system is first formatted, parameters can simply be
 601       added as a
 602       <literal>--param</literal> option to the
 603       <literal>mkfs.lustre</literal> command. For example:</para>
 604       <screen>
 605 mds# mkfs.lustre --mdt --param="sys.timeout=50" /dev/sda
 606 </screen>
 607       <para>For more details about creating a file system, see
 608       <xref linkend="configuringlustre" />. For more details about
 609       <literal>mkfs.lustre</literal>, see
 610       <xref linkend="systemconfigurationutilities" />.</para>
 611     </section>
 612     <section xml:id="setting_param_tunefs">
 613       <title>Setting Parameters with
 614       <literal>tunefs.lustre</literal></title>
 615       <para>If a server (OSS or MDS) is stopped, parameters can be added to an
 616       existing file system using the
 617       <literal>--param</literal> option to the
 618       <literal>tunefs.lustre</literal> command. For example:</para>
 619       <screen>
 620 oss# tunefs.lustre --param=failover.node=192.168.0.13@tcp0 /dev/sda
 621 </screen>
 622       <para>With
 623       <literal>tunefs.lustre</literal>, parameters are
 624       <emphasis>additive</emphasis>: new parameters are specified in addition
 625       to old parameters; they do not replace them. To erase all old
 626       <literal>tunefs.lustre</literal> parameters and just use newly-specified
 627       parameters, run:</para>
 628       <screen>
 629 mds# tunefs.lustre --erase-params --param=
 630 <replaceable>new_parameters</replaceable>
 631 </screen>
      <para>The <literal>tunefs.lustre</literal> command can be used to set any
      parameter that is settable via <literal>lctl conf_param</literal> and
      has its own OBD device, so it can be specified as
      <literal><replaceable>obdname|fsname</replaceable>.<replaceable>obdtype</replaceable>.<replaceable>proc_file_name</replaceable>=<replaceable>value</replaceable></literal>.
      For example:</para>
 640       <screen>
 641 mds# tunefs.lustre --param mdt.identity_upcall=NONE /dev/sda1
 642 </screen>
 643       <para>For more details about
 644       <literal>tunefs.lustre</literal>, see
 645       <xref linkend="systemconfigurationutilities" />.</para>
 646     </section>
 647     <section xml:id="setting_param_with_lctl">
 648       <title>Setting Parameters with
 649       <literal>lctl</literal></title>
 650       <para>When the file system is running, the
 651       <literal>lctl</literal> command can be used to set parameters (temporary
 652       or permanent) and report current parameter values. Temporary parameters
 653       are active as long as the server or client is not shut down. Permanent
 654       parameters live through server and client reboots.</para>
 655       <note>
 656         <para>The <literal>lctl list_param</literal> command enables users to
 657           list all parameters that can be set. See
 658         <xref linkend="list_params" />.</para>
 659       </note>
 660       <para>For more details about the
 661       <literal>lctl</literal> command, see the examples in the sections below
 662       and
 663       <xref linkend="systemconfigurationutilities" />.</para>
 664       <section remap="h4">
 665         <title>Setting Temporary Parameters</title>
 666         <para>Use
 667         <literal>lctl set_param</literal> to set temporary parameters on the
 668         node where it is run. These parameters internally map to corresponding
 669         items in the kernel <literal>/proc/{fs,sys}/{lnet,lustre}</literal> and
 670         <literal>/sys/{fs,kernel/debug}/lustre</literal> virtual filesystems.
 671         However, since the mapping between a particular parameter name and the
 672         underlying virtual pathname may change, it is <emphasis>not</emphasis>
 673         recommended to access the virtual pathname directly. The
 674         <literal>lctl set_param</literal> command uses this syntax:</para>
 675         <screen>
lctl set_param [-n] [-P] <replaceable>obdtype</replaceable>.<replaceable>obdname</replaceable>.<replaceable>proc_file_name</replaceable>=<replaceable>value</replaceable>
 681 </screen>
 682         <para>For example:</para>
 683         <screen>
# lctl set_param osc.*.max_dirty_mb=1024
osc.myth-OST0000-osc.max_dirty_mb=1024
osc.myth-OST0001-osc.max_dirty_mb=1024
osc.myth-OST0002-osc.max_dirty_mb=1024
osc.myth-OST0003-osc.max_dirty_mb=1024
osc.myth-OST0004-osc.max_dirty_mb=1024
 690 </screen>
 691       </section>
 692       <section xml:id="setting_permanent_params">
 693         <title>Setting Permanent Parameters</title>
        <para>Use the <literal>lctl set_param -P</literal> or
        <literal>lctl conf_param</literal> command to set permanent parameters.
 696         In general, the
 697         <literal>lctl conf_param</literal> command can be used to specify any
 698         settable parameter with its own OBD device. The
 699         <literal>lctl conf_param</literal> command uses the following syntax
 700         (the same as the <literal>mkfs.lustre</literal> and
 701         <literal>tunefs.lustre</literal> commands):</para>
 702         <screen>
<replaceable>obdname|fsname</replaceable>.<replaceable>obdtype</replaceable>.<replaceable>proc_file_name</replaceable>=<replaceable>value</replaceable>
 707 </screen>
 708         <note><para>The <literal>lctl conf_param</literal> and
 709         <literal>lctl set_param</literal> syntax is <emphasis>not</emphasis>
 710         the same.</para></note>
 711         <para>Here are a few examples of
 712         <literal>lctl conf_param</literal> commands:</para>
 713         <screen>
 714 mgs# lctl conf_param testfs-MDT0000.sys.timeout=40
 715 $ lctl conf_param testfs-MDT0000.mdt.identity_upcall=NONE
 716 $ lctl conf_param testfs.llite.max_read_ahead_mb=16
 717 $ lctl conf_param testfs-MDT0000.lov.stripesize=2M
 718 $ lctl conf_param testfs-OST0000.osc.max_dirty_mb=29.15
 719 $ lctl conf_param testfs-OST0000.ost.client_cache_seconds=15
 720 $ lctl conf_param testfs.sys.timeout=40
 721 </screen>
 722         <caution>
 723           <para>Parameters specified with the
 724           <literal>lctl conf_param</literal> command are set permanently in the
 725           file system's configuration file on the MGS.</para>
 726         </caution>
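        <para>The difference in field order between the two syntaxes can be
        illustrated with a small shell sketch. It only splits a
        <literal>conf_param</literal>-style setting into the fields described
        above and prints them in both orders. The helper name
        <literal>split_conf_param</literal> is hypothetical, and the reordered
        string is not guaranteed to be accepted by
        <literal>lctl set_param</literal>, because the device name on a given
        node may differ from the target name.</para>
        <screen>
#!/bin/sh
# Split a conf_param-style setting into its documented fields.
# split_conf_param is a hypothetical helper name used for illustration only.
split_conf_param() {
    name=${1%%=*}        # text before the first "="
    value=${1#*=}        # text after the first "=" (may itself contain ".")
    obdname=${name%%.*}  # first field: obdname or fsname
    rest=${name#*.}
    obdtype=${rest%%.*}  # second field: obdtype
    param=${rest#*.}     # remainder: parameter name
    echo "conf_param order: $obdname.$obdtype.$param=$value"
    echo "set_param order:  $obdtype.$obdname.$param=$value"
}

split_conf_param "testfs-MDT0000.mdt.identity_upcall=NONE"
</screen>
        <para>Before using a reordered name, it is safer to confirm the exact
        parameter name on the node with
        <literal>lctl list_param</literal>.</para>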
 727       </section>
 728       <section xml:id="setparamp" condition='l25'>
        <title>Setting Permanent Parameters with
        <literal>lctl set_param -P</literal></title>
        <para>The <literal>lctl set_param -P</literal> command can also
          set parameters permanently using the same syntax as the
          <literal>lctl set_param</literal> and <literal>lctl
          get_param</literal> commands. This command must be issued on the MGS.
          The given parameter is then set on every host using an
          <literal>lctl</literal> upcall. The
          <literal>lctl set_param -P</literal> command uses the following
          syntax:</para>
 737         <screen>
lctl set_param -P <replaceable>obdtype</replaceable>.<replaceable>obdname</replaceable>.<replaceable>proc_file_name</replaceable>=<replaceable>value</replaceable>
 743 </screen>
 744         <para>For example:</para>
 745         <screen>
# lctl set_param -P osc.*.max_dirty_mb=1024
osc.myth-OST0000-osc.max_dirty_mb=1024
osc.myth-OST0001-osc.max_dirty_mb=1024
osc.myth-OST0002-osc.max_dirty_mb=1024
osc.myth-OST0003-osc.max_dirty_mb=1024
osc.myth-OST0004-osc.max_dirty_mb=1024
 752 </screen>
        <para>Use the
        <literal>-d</literal> option (valid only with
        <literal>-P</literal>) to delete a permanent parameter. The syntax
        is:</para>
 756         <screen>
lctl set_param -P -d <replaceable>obdtype</replaceable>.<replaceable>obdname</replaceable>.<replaceable>parameter_name</replaceable>
 761 </screen>
 762         <para>For example:</para>
 763         <screen>
 764 # lctl set_param -P -d osc.*.max_dirty_mb
 765 </screen>
        <note condition='l2c'><para>Starting in Lustre 2.12, the
 767         <literal>lctl get_param</literal> command can provide
 768         <emphasis>tab completion</emphasis> when using an interactive shell
 769         with <literal>bash-completion</literal> installed.  This simplifies
 770         the use of <literal>get_param</literal> significantly, since it
 771         provides an interactive list of available parameters.
 772         </para></note>
 773       </section>
 774       <section xml:id="list_params">
 775         <title>Listing Parameters</title>
        <para>To list Lustre or LNet parameters that are available to set, use
        the
        <literal>lctl list_param</literal> command with this syntax:</para>
 779         <screen>
lctl list_param [-FR] <replaceable>obdtype</replaceable>.<replaceable>obdname</replaceable>
 783 </screen>
        <para>The following options are available for the
        <literal>lctl list_param</literal> command:</para>
        <para>
        <literal>-F</literal> Appends '<literal>/</literal>',
        '<literal>@</literal>' or '<literal>=</literal>' to directories,
        symlinks and writeable files, respectively.</para>
        <para>
        <literal>-R</literal> Recursively lists all parameters under the
        specified path.</para>
 795         <para>For example:</para>
 796         <screen>
 797 oss# lctl list_param obdfilter.lustre-OST0000
 798 </screen>
 799       </section>
 800       <section xml:id="reporting_current_param">
 801         <title>Reporting Current Parameter Values</title>
 802         <para>To report current Lustre parameter values, use the
 803         <literal>lctl get_param</literal> command with this syntax:</para>
 804         <screen>
lctl get_param [-n] <replaceable>obdtype</replaceable>.<replaceable>obdname</replaceable>.<replaceable>proc_file_name</replaceable>
 809 </screen>
        <note condition='l2c'><para>Starting in Lustre 2.12, the
 811         <literal>lctl get_param</literal> command can provide
 812         <emphasis>tab completion</emphasis> when using an interactive shell
 813         with <literal>bash-completion</literal> installed.  This simplifies
 814         the use of <literal>get_param</literal> significantly, since it
 815         provides an interactive list of available parameters.
 816         </para></note>
 817         <para>This example reports data on RPC service times.</para>
 818         <screen>
 819 oss# lctl get_param -n ost.*.ost_io.timeouts
 820 service : cur 1 worst 30 (at 1257150393, 85d23h58m54s ago) 1 1 1 1
 821 </screen>
 822         <para>This example reports the amount of space this client has reserved
 823         for writeback cache with each OST:</para>
 824         <screen>
 825 client# lctl get_param osc.*.cur_grant_bytes
 826 osc.myth-OST0000-osc-ffff8800376bdc00.cur_grant_bytes=2097152
 827 osc.myth-OST0001-osc-ffff8800376bdc00.cur_grant_bytes=33890304
 828 osc.myth-OST0002-osc-ffff8800376bdc00.cur_grant_bytes=35418112
 829 osc.myth-OST0003-osc-ffff8800376bdc00.cur_grant_bytes=2097152
 830 osc.myth-OST0004-osc-ffff8800376bdc00.cur_grant_bytes=33808384
 831 </screen>
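        <para>Because <literal>lctl get_param</literal> prints one
        <literal>name=value</literal> pair per line, its output is easy to
        post-process. As a minimal sketch, the following script sums the values
        from the example output above using <literal>awk</literal>. The
        function name <literal>sample_output</literal> is hypothetical; on a
        live client it would presumably be replaced by the
        <literal>lctl get_param</literal> command itself.</para>
        <screen>
#!/bin/sh
# Sample output copied from the example above. On a client, this function
# would be replaced by: lctl get_param osc.*.cur_grant_bytes
sample_output() {
    printf '%s\n' \
        "osc.myth-OST0000-osc-ffff8800376bdc00.cur_grant_bytes=2097152" \
        "osc.myth-OST0001-osc-ffff8800376bdc00.cur_grant_bytes=33890304" \
        "osc.myth-OST0002-osc-ffff8800376bdc00.cur_grant_bytes=35418112" \
        "osc.myth-OST0003-osc-ffff8800376bdc00.cur_grant_bytes=2097152" \
        "osc.myth-OST0004-osc-ffff8800376bdc00.cur_grant_bytes=33808384"
}

# Split each line on "=" and add up the second field.
total=$(sample_output | awk -F= '{ sum += $2 } END { printf "%d\n", sum }')
echo "total grant: $total bytes"
</screen>
        <para>For the sample values shown, the script reports a total of
        107311104 bytes.</para>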
 832       </section>
 833     </section>
 834   </section>
 835   <section xml:id="failover_nids">
 836     <title>
 837     <indexterm>
 838       <primary>operations</primary>
 839       <secondary>failover</secondary>
 840     </indexterm>Specifying NIDs and Failover</title>
 841     <para>If a node has multiple network interfaces, it may have multiple NIDs,
 842     which must all be identified so other nodes can choose the NID that is
    appropriate for their network interfaces. Typically, NIDs are specified in
    a list delimited by commas (<literal>,</literal>). However, when failover
    nodes are specified, the NIDs are delimited by a colon
    (<literal>:</literal>) or by repeating a keyword such as
    <literal>--mgsnode=</literal> or
    <literal>--servicenode=</literal>.</para>
    <para>To display the NIDs configured on the local node, run the following
    command on that node (LNet must be running):</para>
 852     <screen>
 853 lctl list_nids
 854 </screen>
 855     <para>In the example below,
 856     <literal>mds0</literal> and
 857     <literal>mds1</literal> are configured as a combined MGS/MDT failover pair
 858     and
 859     <literal>oss0</literal> and
 860     <literal>oss1</literal> are configured as an OST failover pair. The Ethernet
 861     address for
 862     <literal>mds0</literal> is 192.168.10.1, and for
 863     <literal>mds1</literal> is 192.168.10.2. The Ethernet addresses for
 864     <literal>oss0</literal> and
 865     <literal>oss1</literal> are 192.168.10.20 and 192.168.10.21
 866     respectively.</para>
 867     <screen>
mds0# mkfs.lustre --fsname=testfs --mdt --mgs \
        --servicenode=192.168.10.2@tcp0 \
        --servicenode=192.168.10.1@tcp0 /dev/sda1
mds0# mount -t lustre /dev/sda1 /mnt/test/mdt
oss0# mkfs.lustre --fsname=testfs --servicenode=192.168.10.20@tcp0 \
        --servicenode=192.168.10.21@tcp0 --ost --index=0 \
        --mgsnode=192.168.10.1@tcp0 --mgsnode=192.168.10.2@tcp0 \
        /dev/sdb
oss0# mount -t lustre /dev/sdb /mnt/test/ost0
client# mount -t lustre 192.168.10.1@tcp0:192.168.10.2@tcp0:/testfs \
        /mnt/testfs
mds0# umount /mnt/test/mdt
mds1# mount -t lustre /dev/sda1 /mnt/test/mdt
mds1# lctl get_param mdt.testfs-MDT0000.recovery_status
 882 </screen>
 883     <para>Where multiple NIDs are specified separated by commas (for example,
 884     <literal>10.67.73.200@tcp,192.168.10.1@tcp</literal>), the two NIDs refer
 885     to the same host, and the Lustre software chooses the
 886     <emphasis>best</emphasis> one for communication. When a pair of NIDs is
 887     separated by a colon (for example,
 888     <literal>10.67.73.200@tcp:10.67.73.201@tcp</literal>), the two NIDs refer
    to two different hosts and are treated as a failover pair (the Lustre
    software tries the first one, and if that fails, it tries the second
    one).</para>
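    <para>As a rough illustration of these delimiter rules, the following shell
    sketch splits a NID specification into failover hosts (separated by
    colons) and lists the NIDs of each host (separated by commas). It
    illustrates the syntax only, not how the Lustre software itself parses
    NIDs. The function name <literal>show_nid_groups</literal> and the
    combined sample specification are hypothetical.</para>
    <screen>
#!/bin/sh
# Print each failover host and its NIDs from a NID specification.
# show_nid_groups is a hypothetical helper name used for illustration only.
show_nid_groups() {
    host=0
    old_ifs=$IFS
    IFS=:                       # colons separate failover hosts
    for group in $1; do
        host=$((host + 1))
        # Commas separate alternative NIDs of the same host.
        echo "host $host: $(echo "$group" | tr ',' ' ')"
    done
    IFS=$old_ifs
}

show_nid_groups "10.67.73.200@tcp,192.168.10.1@tcp:10.67.73.201@tcp,192.168.10.2@tcp"
</screen>
    <para>For this sample string, the sketch reports two hosts with two NIDs
    each.</para>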
 892     <para>Two options to
 893     <literal>mkfs.lustre</literal> can be used to specify failover nodes.  The
 894     <literal>--servicenode</literal> option is used to specify all service NIDs,
 895     including those for primary nodes and failover nodes. When the
 896     <literal>--servicenode</literal> option is used, the first service node to
 897     load the target device becomes the primary service node, while nodes
 898     corresponding to the other specified NIDs become failover locations for the
 899     target device. An older option, <literal>--failnode</literal>, specifies
 900     just the NIDs of failover nodes.  For more information about the
 901     <literal>--servicenode</literal> and
 902     <literal>--failnode</literal> options, see
 903     <xref xmlns:xlink="http://www.w3.org/1999/xlink"
 904     linkend="configuringfailover" />.</para>
 905   </section>
 906   <section xml:id="erasing_filesystem">
 907     <title>
 908     <indexterm>
 909       <primary>operations</primary>
 910       <secondary>erasing a file system</secondary>
 911     </indexterm>Erasing a File System</title>
 912     <para>If you want to erase a file system and permanently delete all the
 913     data in the file system, run this command on your targets:</para>
 914     <screen>
$ mkfs.lustre --reformat
 916 </screen>
 917     <para>If you are using a separate MGS and want to keep other file systems
 918     defined on that MGS, then set the
 919     <literal>writeconf</literal> flag on the MDT for that file system. The
 920     <literal>writeconf</literal> flag causes the configuration logs to be
 921     erased; they are regenerated the next time the servers start.</para>
 922     <para>To set the
 923     <literal>writeconf</literal> flag on the MDT:</para>
 924     <orderedlist>
 925       <listitem>
        <para>Unmount all clients and servers using this file system:</para>
 927         <screen>
 928 $ umount /mnt/lustre
 929 </screen>
 930       </listitem>
 931       <listitem>
        <para>Permanently erase the file system and, if desired, replace it
        with another file system:</para>
        <screen>
$ mkfs.lustre --reformat --fsname spfs --mgs --mdt --index=0 /dev/<emphasis>{mdsdev}</emphasis>
 937 </screen>
 938       </listitem>
 939       <listitem>
        <para>If you have a separate MGS (that you do not want to reformat),
        then add the
        <literal>--writeconf</literal> flag to
        <literal>mkfs.lustre</literal> on the MDT:</para>
        <screen>
$ mkfs.lustre --reformat --writeconf --fsname spfs --mgsnode=<replaceable>mgs_nid</replaceable> --mdt --index=0 <replaceable>/dev/mds_device</replaceable>
 948 </screen>
 949       </listitem>
 950     </orderedlist>
 951     <note>
      <para>If you have a combined MGS/MDT, reformatting the MDT reformats the
      MGS as well, causing all configuration information to be lost; you can
      then start building your new file system. Nothing needs to be done with
      old disks that will not be part of the new file system; simply do not
      mount them.</para>
 957     </note>
 958   </section>
 959   <section xml:id="reclaiming_reserved_disk_space">
 960     <title>
 961     <indexterm>
 962       <primary>operations</primary>
 963       <secondary>reclaiming space</secondary>
 964     </indexterm>Reclaiming Reserved Disk Space</title>
 965     <para>All current Lustre installations run the ldiskfs file system
 966     internally on service nodes. By default, ldiskfs reserves 5% of the disk
 967     space to avoid file system fragmentation. In order to reclaim this space,
 968     run the following command on your OSS for each OST in the file
 969     system:</para>
 970     <screen>
tune2fs [-m reserved_blocks_percent] /dev/<emphasis>{ostdev}</emphasis>
 973 </screen>
 974     <para>You do not need to shut down Lustre before running this command or
 975     restart it afterwards.</para>
 976     <warning>
 977       <para>Reducing the space reservation can cause severe performance
 978       degradation as the OST file system becomes more than 95% full, due to
 979       difficulty in locating large areas of contiguous free space. This
 980       performance degradation may persist even if the space usage drops below
 981       95% again. It is recommended NOT to reduce the reserved disk space below
 982       5%.</para>
 983     </warning>
 984   </section>
 985   <section xml:id="replacing_existing_ost_mdt">
 986     <title>
 987     <indexterm>
 988       <primary>operations</primary>
 989       <secondary>replacing an OST or MDS</secondary>
 990     </indexterm>Replacing an Existing OST or MDT</title>
 991     <para>To copy the contents of an existing OST to a new OST (or an old MDT
 992     to a new MDT), follow the process for either OST/MDT backups in
    <xref linkend='backup_device' /> or
    <xref linkend='backup_fs_level' />.
    For more information on removing an MDT, see
 996     <xref linkend='lustremaint.rmremotedir' />.</para>
 997   </section>
 998   <section xml:id="identifying_file_objects">
 999     <title>
1000     <indexterm>
1001       <primary>operations</primary>
1002       <secondary>identifying OSTs</secondary>
1003     </indexterm>Identifying To Which Lustre File an OST Object Belongs</title>
1004     <para>Use this procedure to identify the file containing a given object on
1005     a given OST.</para>
1006     <orderedlist>
1007       <listitem>
1008         <para>On the OST (as root), run
1009         <literal>debugfs</literal> to display the file identifier (
1010         <literal>FID</literal>) of the file associated with the object.</para>
1011         <para>For example, if the object is
1012         <literal>34976</literal> on
        <literal>/dev/lustre/ost_test2</literal>, the debugfs command is:
1014         <screen>
1015 # debugfs -c -R "stat /O/0/d$((34976 % 32))/34976" /dev/lustre/ost_test2
1016 </screen></para>
1017         <para>The command output is:
1018         <screen>
debugfs 1.45.6.wc1 (20-Mar-2020)
/dev/lustre/ost_test2: catastrophic mode - not reading inode or group bitmaps
Inode: 352365   Type: regular    Mode:  0666   Flags: 0x80000
Generation: 2393149953    Version: 0x0000002a:00005f81
User:  1000   Group:  1000   Size: 260096
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 512
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
atime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
mtime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
crtime: 0x4a216b3c:975870dc -- Sat May 30 13:22:04 2009
Size of extra inode fields: 24
Extended attributes stored in inode body:
  fid = "b9 da 24 00 00 00 00 00 6a fa 0d 3f 01 00 00 00 eb 5b 0b 00 00 00 00 00 00 00 00 00 00 00 00 00 " (32)
  fid: objid=34976 seq=0 parent=[0x200000400:0x122:0x0] stripe=1
EXTENTS:
(0-64):4620544-4620607
1038 </screen></para>
1039       </listitem>
1040       <listitem>
        <para>The parent FID will be of the form
        <literal>[0x200000400:0x122:0x0]</literal> and can be resolved directly
        using the command <literal>lfs fid2path /mnt/lustre
        [0x200000400:0x122:0x0]</literal> on any Lustre client, and the process
        is complete.</para>
1046       </listitem>
1047       <listitem>
        <para>In the case of an upgraded 1.x inode (if the first part of the
        FID is below 0x200000400), the MDT inode number and generation appear
        as little-endian values at the start of the
        <literal>fid</literal> attribute. In this example, the inode number is
        <literal>0x24dab9</literal> and the generation is
        <literal>0x3f0dfa6a</literal>, and the pathname can also be resolved
        using
        <literal>debugfs</literal>.</para>
1054       </listitem>
1055       <listitem>
1056         <para>On the MDS (as root), use
1057         <literal>debugfs</literal> to find the file associated with the
1058         inode:</para>
1059         <screen>
1060 # debugfs -c -R "ncheck 0x24dab9" /dev/lustre/mdt_test
1061 </screen>
1062         <para>Here is the command output:</para>
1063         <screen>
debugfs 1.42.3.wc3 (15-Aug-2012)
/dev/lustre/mdt_test: catastrophic mode - not reading inode or group bitmaps
Inode      Pathname
2415289    /ROOT/brian-laptop-guest/clients/client11/~dmtmp/PWRPNT/ZD16.BMP
1069 </screen>
1070       </listitem>
1071     </orderedlist>
1072     <para>The command lists the inode and pathname associated with the
1073     object.</para>
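    <para>The arithmetic used in the steps above can be sketched in shell. The
    following is an illustration only, assuming a POSIX-compatible shell. The
    variable names are hypothetical, and the byte string is copied from the
    first twelve bytes of the <literal>fid</literal> attribute in the example
    <literal>debugfs</literal> output. It computes the object's subdirectory
    (object ID modulo 32) and decodes the little-endian inode and generation
    numbers of an upgraded 1.x object.</para>
    <screen>
#!/bin/sh
# Object ID and leading "fid" attribute bytes, taken from the example above.
objid=34976
fid_bytes="b9 da 24 00 00 00 00 00 6a fa 0d 3f"

# The debugfs command above looks the object up under d(objid % 32).
echo "object path: /O/0/d$((objid % 32))/$objid"

# Split the byte string into positional parameters $1..$12.
set -- $fid_bytes

# The bytes are little-endian, so reverse them to form each 32-bit value.
inode=$((0x$4$3$2$1))
generation=$((0x${12}${11}${10}$9))

printf 'MDT inode: 0x%x (%d)\n' "$inode" "$inode"
printf 'generation: 0x%x\n' "$generation"
</screen>
    <para>For the example values, this yields inode
    <literal>0x24dab9</literal> (decimal 2415289, the inode reported by
    <literal>ncheck</literal>) and generation
    <literal>0x3f0dfa6a</literal>.</para>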
1074     <note>
      <para>The <literal>debugfs</literal> <literal>ncheck</literal> command
      is a brute-force search that may take a long time to complete.</para>
1078     </note>
1079     <note>
1080       <para>To find the Lustre file from a disk LBA, follow the steps listed in
1081       the document at this URL:
1082       <link xl:href="https://www.smartmontools.org/wiki/BadBlockHowto">
1083       https://www.smartmontools.org/wiki/BadBlockHowto</link>. Then,
1084       follow the steps above to resolve the Lustre filename.</para>
1085     </note>
1086   </section>
1087 </chapter>
1088 <!--
1089   vim:expandtab:shiftwidth=2:tabstop=8:
1090   -->