1 <?xml version='1.0' encoding='utf-8'?>
2 <chapter xmlns="http://docbook.org/ns/docbook"
3 xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
4 xml:id="lustreoperations">
5 <title xml:id="lustreoperations.title">Lustre Operations</title>
6 <para>Once you have the Lustre file system up and running, you can use the
7 procedures in this section to perform these basic Lustre administration
12 <xref linkend="dbdoclet.50438194_42877" />
17 <xref linkend="dbdoclet.50438194_24122" />
22 <xref linkend="dbdoclet.50438194_84876" />
27 <xref linkend="dbdoclet.50438194_69255" />
32 <xref linkend="dbdoclet.50438194_57420" />
37 <xref linkend="dbdoclet.50438194_54138" />
42 <xref linkend="dbdoclet.50438194_88063" />
47 <xref linkend="dbdoclet.lfsmkdir" />
52 <xref linkend="dbdoclet.50438194_88980" />
57 <xref linkend="dbdoclet.50438194_41817" />
62 <xref linkend="dbdoclet.50438194_70905" />
67 <xref linkend="dbdoclet.50438194_16954" />
72 <xref linkend="dbdoclet.50438194_69998" />
77 <xref linkend="dbdoclet.50438194_30872" />
81 <section xml:id="dbdoclet.50438194_42877">
84 <primary>operations</primary>
87 <primary>operations</primary>
88 <secondary>mounting by label</secondary>
89 </indexterm>Mounting by Label</title>
90 <para>The file system name is limited to 8 characters. We have encoded the
91 file system and target information in the disk label, so you can mount by
92 label. This allows system administrators to move disks around without
93 worrying about issues such as SCSI disk reordering or getting the
94 <literal>/dev/device</literal> wrong for a shared target. Soon, file system
95 naming will be made as fail-safe as possible. Currently, Linux disk labels
96 are limited to 16 characters. To identify the target within the file
97 system, 8 characters are reserved, leaving 8 characters for the file system
100 <replaceable>fsname</replaceable>-MDT0000 or
101 <replaceable>fsname</replaceable>-OST0a19
103 <para>To mount by label, use this command:</para>
106 <replaceable>file_system_label</replaceable>
107 <replaceable>/mount_point</replaceable>
109 <para>This is an example of mount-by-label:</para>
111 mds# mount -t lustre -L testfs-MDT0000 /mnt/mdt
<para>Mount-by-label should NOT be used in a multi-path environment or
when snapshots of the device are being created, since multiple block
devices will then have the same label.</para>
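<para>As a rough illustration of how the 16-character label is composed, the
sketch below builds an OST label from a file system name and a target index
written as four hexadecimal digits. The name
<literal>testfs</literal> and the index 2585 (0x0a19) are illustrative values
only; the Lustre software itself sets the label when the target is formatted
or first started:</para>
<screen>
client# printf '%s-OST%04x\n' testfs 2585
testfs-OST0a19
</screen>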
118 <para>Although the file system name is internally limited to 8 characters,
119 you can mount the clients at any mount point, so file system users are not
120 subjected to short names. Here is an example:</para>
122 client# mount -t lustre mds0@tcp0:/short
<replaceable>/mnt/long_mountpoint_name</replaceable>
126 <section xml:id="dbdoclet.50438194_24122">
129 <primary>operations</primary>
130 <secondary>starting</secondary>
131 </indexterm>Starting Lustre</title>
132 <para>On the first start of a Lustre file system, the components must be
133 started in the following order:</para>
136 <para>Mount the MGT.</para>
138 <para>If a combined MGT/MDT is present, Lustre will correctly mount
139 the MGT and MDT automatically.</para>
143 <para>Mount the MDT.</para>
145 <para condition='l24'>Mount all MDTs if multiple MDTs are
150 <para>Mount the OST(s).</para>
153 <para>Mount the client(s).</para>
157 <section xml:id="dbdoclet.50438194_84876">
160 <primary>operations</primary>
161 <secondary>mounting</secondary>
162 </indexterm>Mounting a Server</title>
163 <para>Starting a Lustre server is straightforward and only involves the
164 mount command. Lustre servers can be added to
165 <literal>/etc/fstab</literal>:</para>
169 <para>The mount command generates output similar to this:</para>
171 /dev/sda1 on /mnt/test/mdt type lustre (rw)
172 /dev/sda2 on /mnt/test/ost0 type lustre (rw)
173 192.168.0.21@tcp:/testfs on /mnt/testfs type lustre (rw)
175 <para>In this example, the MDT, an OST (ost0) and file system (testfs) are
178 LABEL=testfs-MDT0000 /mnt/test/mdt lustre defaults,_netdev,noauto 0 0
179 LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0
181 <para>In general, it is wise to specify noauto and let your
182 high-availability (HA) package manage when to mount the device. If you are
183 not using failover, make sure that networking has been started before
184 mounting a Lustre server. If you are running Red Hat Enterprise Linux, SUSE
185 Linux Enterprise Server, Debian operating system (and perhaps others), use
187 <literal>_netdev</literal> flag to ensure that these disks are mounted after
188 the network is up.</para>
189 <para>We are mounting by disk label here. The label of a device can be read
191 <literal>e2label</literal>. The label of a newly-formatted Lustre server
193 <literal>FFFF</literal> if the
194 <literal>--index</literal> option is not specified to
195 <literal>mkfs.lustre</literal>, meaning that it has yet to be assigned. The
196 assignment takes place when the server is first started, and the disk label
197 is updated. It is recommended that the
198 <literal>--index</literal> option always be used, which will also ensure
199 that the label is set at format time.</para>
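<para>For instance, assuming the OST from the example above is on
<literal>/dev/sda2</literal>, its label could be checked as follows (the
output shown is illustrative):</para>
<screen>
oss# e2label /dev/sda2
testfs-OST0000
</screen>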
201 <para>Do not do this when the client and OSS are on the same node, as
202 memory pressure between the client and OSS can lead to deadlocks.</para>
205 <para>Mount-by-label should NOT be used in a multi-path
209 <section xml:id="dbdoclet.50438194_69255">
212 <primary>operations</primary>
213 <secondary>unmounting</secondary>
214 </indexterm>Unmounting a Server</title>
215 <para>To stop a Lustre server, use the
<replaceable>/mount_point</replaceable></literal> command.</para>
219 <para>For example, to stop
220 <literal>ost0</literal> on mount point
221 <literal>/mnt/test</literal>, run:</para>
225 <para>Gracefully stopping a server with the
226 <literal>umount</literal> command preserves the state of the connected
227 clients. The next time the server is started, it waits for clients to
228 reconnect, and then goes through the recovery procedure.</para>
230 <literal>-f</literal>) flag is used, then the server evicts all clients and
231 stops WITHOUT recovery. Upon restart, the server does not wait for
232 recovery. Any currently connected clients receive I/O errors until they
235 <para>If you are using loopback devices, use the
236 <literal>-d</literal> flag. This flag cleans up loop devices and can
237 always be safely specified.</para>
240 <section xml:id="dbdoclet.50438194_57420">
243 <primary>operations</primary>
244 <secondary>failover</secondary>
245 </indexterm>Specifying Failout/Failover Mode for OSTs</title>
246 <para>In a Lustre file system, an OST that has become unreachable because
247 it fails, is taken off the network, or is unmounted can be handled in one
<literal>failout</literal> mode, Lustre clients receive errors (EIOs)
as soon as a timeout expires, instead of waiting for the OST to
258 <literal>failover</literal> mode, Lustre clients wait for the OST to
262 <para>By default, the Lustre file system uses
263 <literal>failover</literal> mode for OSTs. To specify
264 <literal>failout</literal> mode instead, use the
265 <literal>--param="failover.mode=failout"</literal> option as shown below
266 (entered on one line):</para>
268 oss# mkfs.lustre --fsname=
269 <replaceable>fsname</replaceable> --mgsnode=
270 <replaceable>mgs_NID</replaceable> --param=failover.mode=failout
272 <replaceable>ost_index</replaceable>
273 <replaceable>/dev/ost_block_device</replaceable>
<para>In the example below,
<literal>failout</literal> mode is specified for the OSTs in the file system
<literal>testfs</literal>, whose MGS is
<literal>mds0</literal> (entered on one line).</para>
280 oss# mkfs.lustre --fsname=testfs --mgsnode=mds0 --param=failover.mode=failout
281 --ost --index=3 /dev/sdb
284 <para>Before running this command, unmount all OSTs that will be affected
286 <literal>failover</literal>/
287 <literal>failout</literal> mode.</para>
290 <para>After initial file system configuration, use the
291 <literal>tunefs.lustre</literal> utility to change the mode. For example,
293 <literal>failout</literal> mode, run:</para>
296 $ tunefs.lustre --param failover.mode=failout
297 <replaceable>/dev/ost_device</replaceable>
302 <section xml:id="dbdoclet.50438194_54138">
305 <primary>operations</primary>
306 <secondary>degraded OST RAID</secondary>
307 </indexterm>Handling Degraded OST RAID Arrays</title>
<para>Lustre includes functionality for notifying the file system when an
external RAID array has degraded performance (resulting in reduced overall
file system performance), either because a disk has failed and not been
replaced, or because a disk was replaced and is undergoing a rebuild. To
avoid a global performance slowdown due to a degraded OST, the MDS can
avoid the OST for new object allocation if it is notified of the degraded
315 <para>A parameter for each OST, called
316 <literal>degraded</literal>, specifies whether the OST is running in
317 degraded mode or not.</para>
318 <para>To mark the OST as degraded, use:</para>
320 lctl set_param obdfilter.{OST_name}.degraded=1
322 <para>To mark that the OST is back in normal operation, use:</para>
324 lctl set_param obdfilter.{OST_name}.degraded=0
326 <para>To determine if OSTs are currently in degraded mode, use:</para>
328 lctl get_param obdfilter.*.degraded
330 <para>If the OST is remounted due to a reboot or other condition, the flag
332 <literal>0</literal>.</para>
333 <para>It is recommended that this be implemented by an automated script
334 that monitors the status of individual RAID devices.</para>
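<para>A minimal sketch of such a script follows. It assumes that the OST is
backed by a Linux software RAID (md) array whose sysfs
<literal>degraded</literal> attribute is available; an external hardware RAID
controller would need its vendor's monitoring tool instead. The names
<literal>testfs-OST0000</literal> and
<literal>md0</literal> are hypothetical placeholders. The script could be run
periodically, for example from cron:</para>
<screen>
#!/bin/sh
# Hypothetical names: adjust to the OST and the md array that backs it.
OST_NAME=testfs-OST0000
MD_DEV=md0

# The md sysfs attribute "degraded" holds the number of missing member
# devices; 0 means the array is fully populated.
missing=$(cat /sys/block/$MD_DEV/md/degraded)

if [ "$missing" -gt 0 ]; then
    lctl set_param obdfilter.$OST_NAME.degraded=1
else
    lctl set_param obdfilter.$OST_NAME.degraded=0
fi
</screen>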
336 <section xml:id="dbdoclet.50438194_88063">
339 <primary>operations</primary>
340 <secondary>multiple file systems</secondary>
341 </indexterm>Running Multiple Lustre File Systems</title>
<para>Lustre supports multiple file systems provided the combination of
<literal>NID:fsname</literal> is unique. Each file system must be allocated
a unique name during creation with the
<literal>--fsname</literal> parameter. Unique names for file systems are
enforced if a single MGS is present. If multiple MGSs are present (for
example, if you have an MGS on every MDS), the administrator is responsible
for ensuring file system names are unique. A single MGS and unique file
system names provide a single point of administration and allow commands
to be issued against the file system even if it is not mounted.</para>
<para>Lustre supports multiple file systems on a single MGS. With a single
MGS, fsnames are guaranteed to be unique. Lustre also allows multiple MGSs
to co-exist. For example, multiple MGSs will be necessary if multiple file
systems on different Lustre software versions are to be concurrently
available. With multiple MGSs, additional care must be taken to ensure file
system names are unique. Each file system should have a unique fsname among
all systems that may interoperate in the future.</para>
358 <para>By default, the
359 <literal>mkfs.lustre</literal> command creates a file system named
360 <literal>lustre</literal>. To specify a different file system name (limited
361 to 8 characters) at format time, use the
362 <literal>--fsname</literal> option:</para>
365 mkfs.lustre --fsname=
366 <replaceable>file_system_name</replaceable>
370 <para>The MDT, OSTs and clients in the new file system must use the same
371 file system name (prepended to the device name). For example, for a new
373 <literal>foo</literal>, the MDT and two OSTs would be named
374 <literal>foo-MDT0000</literal>,
375 <literal>foo-OST0000</literal>, and
376 <literal>foo-OST0001</literal>.</para>
378 <para>To mount a client on the file system, run:</para>
380 client# mount -t lustre
381 <replaceable>mgsnode</replaceable>:
382 <replaceable>/new_fsname</replaceable>
383 <replaceable>/mount_point</replaceable>
385 <para>For example, to mount a client on file system foo at mount point
386 /mnt/foo, run:</para>
388 client# mount -t lustre mgsnode:/foo /mnt/foo
<para>If a client will be mounted on several file systems, add the
393 <literal>/etc/xattr.conf</literal> file to avoid problems when files are
394 moved between the file systems:
395 <literal>lustre.* skip</literal></para>
<para>To ensure that a new MDT is added to an existing MGS, create the MDT
400 <literal>--mdt --mgsnode=
401 <replaceable>mgs_NID</replaceable></literal>.</para>
403 <para>A Lustre installation with two file systems (
404 <literal>foo</literal> and
405 <literal>bar</literal>) could look like this, where the MGS node is
406 <literal>mgsnode@tcp0</literal> and the mount points are
407 <literal>/mnt/foo</literal> and
408 <literal>/mnt/bar</literal>.</para>
410 mgsnode# mkfs.lustre --mgs /dev/sda
411 mdtfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --mdt --index=0
413 ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=0
415 ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=1
417 mdtbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --mdt --index=0
419 ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=0
421 ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=1
424 <para>To mount a client on file system foo at mount point
425 <literal>/mnt/foo</literal>, run:</para>
427 client# mount -t lustre mgsnode@tcp0:/foo /mnt/foo
429 <para>To mount a client on file system bar at mount point
430 <literal>/mnt/bar</literal>, run:</para>
432 client# mount -t lustre mgsnode@tcp0:/bar /mnt/bar
435 <section xml:id="dbdoclet.lfsmkdir" condition='l24'>
438 <primary>operations</primary>
439 <secondary>remote directory</secondary>
440 </indexterm>Creating a sub-directory on a given MDT</title>
441 <para>Lustre 2.4 enables individual sub-directories to be serviced by
442 unique MDTs. An administrator can allocate a sub-directory to a given MDT
443 using the command:</para>
client# lfs mkdir -i
446 <replaceable>mdt_index</replaceable>
447 <replaceable>/mount_point/remote_dir</replaceable>
<para>This command will allocate the sub-directory
<literal>remote_dir</literal> onto the MDT of index
<literal>mdt_index</literal>. For more information on adding additional MDTs
<literal>mdt_index</literal> see
455 <xref linkend='dbdoclet.addmdtindex' />.</para>
457 <para>An administrator can allocate remote sub-directories to separate
458 MDTs. Creating remote sub-directories in parent directories not hosted on
459 MDT0 is not recommended. This is because the failure of the parent MDT
460 will leave the namespace below it inaccessible. For this reason, by
461 default it is only possible to create remote sub-directories off MDT0. To
462 relax this restriction and enable remote sub-directories off any MDT, an
463 administrator must issue the command
464 <literal>lctl set_param mdd.*.enable_remote_dir=1</literal>.</para>
467 <section xml:id="dbdoclet.50438194_88980">
470 <primary>operations</primary>
471 <secondary>parameters</secondary>
472 </indexterm>Setting and Retrieving Lustre Parameters</title>
473 <para>Several options are available for setting parameters in
<para>When creating a file system, use mkfs.lustre. See
<xref linkend="dbdoclet.50438194_17237" /> below.</para>
<para>When a server is stopped, use tunefs.lustre. See
<xref linkend="dbdoclet.50438194_55253" /> below.</para>
<para>When the file system is running, use lctl to set or retrieve
Lustre parameters. See
<xref linkend="dbdoclet.50438194_51490" /> and
<xref linkend="dbdoclet.50438194_63247" /> below.</para>
491 <section xml:id="dbdoclet.50438194_17237">
492 <title>Setting Tunable Parameters with
493 <literal>mkfs.lustre</literal></title>
494 <para>When the file system is first formatted, parameters can simply be
496 <literal>--param</literal> option to the
497 <literal>mkfs.lustre</literal> command. For example:</para>
499 mds# mkfs.lustre --mdt --param="sys.timeout=50" /dev/sda
<para>For more details about creating a file system, see
<xref linkend="configuringlustre" />. For more details about
<literal>mkfs.lustre</literal>, see
<xref linkend="systemconfigurationutilities" />.</para>
506 <section xml:id="dbdoclet.50438194_55253">
507 <title>Setting Parameters with
508 <literal>tunefs.lustre</literal></title>
509 <para>If a server (OSS or MDS) is stopped, parameters can be added to an
510 existing file system using the
511 <literal>--param</literal> option to the
512 <literal>tunefs.lustre</literal> command. For example:</para>
514 oss# tunefs.lustre --param=failover.node=192.168.0.13@tcp0 /dev/sda
517 <literal>tunefs.lustre</literal>, parameters are
<emphasis>additive</emphasis>: new parameters are specified in addition
to old parameters and do not replace them. To erase all old
<literal>tunefs.lustre</literal> parameters and just use newly-specified
parameters, run:</para>
523 mds# tunefs.lustre --erase-params --param=
524 <replaceable>new_parameters</replaceable>
526 <para>The tunefs.lustre command can be used to set any parameter settable
527 in a /proc/fs/lustre file and that has its own OBD device, so it can be
530 <replaceable>obdname|fsname</replaceable>.
531 <replaceable>obdtype</replaceable>.
532 <replaceable>proc_file_name</replaceable>=
533 <replaceable>value</replaceable></literal>. For example:</para>
535 mds# tunefs.lustre --param mdt.identity_upcall=NONE /dev/sda1
537 <para>For more details about
538 <literal>tunefs.lustre</literal>, see
539 <xref linkend="systemconfigurationutilities" />.</para>
541 <section xml:id="dbdoclet.50438194_51490">
542 <title>Setting Parameters with
543 <literal>lctl</literal></title>
544 <para>When the file system is running, the
545 <literal>lctl</literal> command can be used to set parameters (temporary
546 or permanent) and report current parameter values. Temporary parameters
547 are active as long as the server or client is not shut down. Permanent
548 parameters live through server and client reboots.</para>
550 <para>The lctl list_param command enables users to list all parameters
552 <xref linkend="dbdoclet.50438194_88217" />.</para>
554 <para>For more details about the
555 <literal>lctl</literal> command, see the examples in the sections below
557 <xref linkend="systemconfigurationutilities" />.</para>
559 <title>Setting Temporary Parameters</title>
561 <literal>lctl set_param</literal> to set temporary parameters on the
562 node where it is run. These parameters map to items in
563 <literal>/proc/{fs,sys}/{lnet,lustre}</literal>. The
564 <literal>lctl set_param</literal> command uses this syntax:</para>
567 <replaceable>obdtype</replaceable>.
568 <replaceable>obdname</replaceable>.
569 <replaceable>proc_file_name</replaceable>=
570 <replaceable>value</replaceable>
572 <para>For example:</para>
# lctl set_param osc.*.max_dirty_mb=1024
osc.myth-OST0000-osc.max_dirty_mb=1024
osc.myth-OST0001-osc.max_dirty_mb=1024
osc.myth-OST0002-osc.max_dirty_mb=1024
osc.myth-OST0003-osc.max_dirty_mb=1024
osc.myth-OST0004-osc.max_dirty_mb=1024
582 <section xml:id="dbdoclet.50438194_64195">
583 <title>Setting Permanent Parameters</title>
585 <literal>lctl conf_param</literal> command to set permanent parameters.
587 <literal>lctl conf_param</literal> command can be used to specify any
588 parameter settable in a
589 <literal>/proc/fs/lustre</literal> file, with its own OBD device. The
590 <literal>lctl conf_param</literal> command uses this syntax (same as the
592 <literal>mkfs.lustre</literal> and
593 <literal>tunefs.lustre</literal> commands):</para>
595 <replaceable>obdname|fsname</replaceable>.
596 <replaceable>obdtype</replaceable>.
597 <replaceable>proc_file_name</replaceable>=
<replaceable>value</replaceable>
600 <para>Here are a few examples of
601 <literal>lctl conf_param</literal> commands:</para>
mgs# lctl conf_param testfs-MDT0000.sys.timeout=40
mgs# lctl conf_param testfs-MDT0000.mdt.identity_upcall=NONE
mgs# lctl conf_param testfs.llite.max_read_ahead_mb=16
mgs# lctl conf_param testfs-MDT0000.lov.stripesize=2M
mgs# lctl conf_param testfs-OST0000.osc.max_dirty_mb=29.15
mgs# lctl conf_param testfs-OST0000.ost.client_cache_seconds=15
mgs# lctl conf_param testfs.sys.timeout=40
612 <para>Parameters specified with the
613 <literal>lctl conf_param</literal> command are set permanently in the
614 file system's configuration file on the MGS.</para>
617 <section xml:id="dbdoclet.setparamp" condition='l25'>
618 <title>Setting Permanent Parameters with lctl set_param -P</title>
620 <literal>lctl set_param -P</literal> to set parameters permanently. This
621 command must be issued on the MGS. The given parameter is set on every
623 <literal>lctl</literal> upcall. Parameters map to items in
624 <literal>/proc/{fs,sys}/{lnet,lustre}</literal>. The
625 <literal>lctl set_param</literal> command uses this syntax:</para>
628 <replaceable>obdtype</replaceable>.
629 <replaceable>obdname</replaceable>.
630 <replaceable>proc_file_name</replaceable>=
631 <replaceable>value</replaceable>
633 <para>For example:</para>
635 # lctl set_param -P osc.*.max_dirty_mb=1024
636 osc.myth-OST0000-osc.max_dirty_mb=32
637 osc.myth-OST0001-osc.max_dirty_mb=32
638 osc.myth-OST0002-osc.max_dirty_mb=32
639 osc.myth-OST0003-osc.max_dirty_mb=32
640 osc.myth-OST0004-osc.max_dirty_mb=32
<literal>-d</literal> option (only with
<literal>-P</literal>) to delete a permanent parameter. Syntax:</para>
647 <replaceable>obdtype</replaceable>.
648 <replaceable>obdname</replaceable>.
649 <replaceable>proc_file_name</replaceable>
651 <para>For example:</para>
653 # lctl set_param -P -d osc.*.max_dirty_mb
656 <section xml:id="dbdoclet.50438194_88217">
657 <title>Listing Parameters</title>
658 <para>To list Lustre or LNET parameters that are available to set, use
660 <literal>lctl list_param</literal> command. For example:</para>
662 lctl list_param [-FR]
663 <replaceable>obdtype</replaceable>.
664 <replaceable>obdname</replaceable>
666 <para>The following arguments are available for the
667 <literal>lctl list_param</literal> command.</para>
669 <literal>-F</literal> Add '
670 <literal>/</literal>', '
671 <literal>@</literal>' or '
672 <literal>=</literal>' for directories, symlinks and writeable files,
675 <literal>-R</literal> Recursively lists all parameters under the
676 specified path</para>
677 <para>For example:</para>
679 oss# lctl list_param obdfilter.lustre-OST0000
682 <section xml:id="dbdoclet.50438194_63247">
683 <title>Reporting Current Parameter Values</title>
684 <para>To report current Lustre parameter values, use the
685 <literal>lctl get_param</literal> command with this syntax:</para>
688 <replaceable>obdtype</replaceable>.
689 <replaceable>obdname</replaceable>.
690 <replaceable>proc_file_name</replaceable>
692 <para>This example reports data on RPC service times.</para>
694 oss# lctl get_param -n ost.*.ost_io.timeouts
695 service : cur 1 worst 30 (at 1257150393, 85d23h58m54s ago) 1 1 1 1
697 <para>This example reports the amount of space this client has reserved
698 for writeback cache with each OST:</para>
700 client# lctl get_param osc.*.cur_grant_bytes
701 osc.myth-OST0000-osc-ffff8800376bdc00.cur_grant_bytes=2097152
702 osc.myth-OST0001-osc-ffff8800376bdc00.cur_grant_bytes=33890304
703 osc.myth-OST0002-osc-ffff8800376bdc00.cur_grant_bytes=35418112
704 osc.myth-OST0003-osc-ffff8800376bdc00.cur_grant_bytes=2097152
705 osc.myth-OST0004-osc-ffff8800376bdc00.cur_grant_bytes=33808384
710 <section xml:id="dbdoclet.50438194_41817">
713 <primary>operations</primary>
714 <secondary>failover</secondary>
715 </indexterm>Specifying NIDs and Failover</title>
<para>If a node has multiple network interfaces, it may have multiple NIDs,
which must all be identified so other nodes can choose the NID that is
appropriate for their network interfaces. Typically, NIDs are specified in
a list delimited by commas (
<literal>,</literal>). However, when failover nodes are specified, the NIDs
are delimited by a colon (
<literal>:</literal>) or by repeating a keyword such as
<literal>--mgsnode=</literal> or
<literal>--servicenode=</literal>.</para>
725 <para>To display the NIDs of all servers in networks configured to work
726 with the Lustre file system, run (while LNET is running):</para>
730 <para>In the example below,
731 <literal>mds0</literal> and
732 <literal>mds1</literal> are configured as a combined MGS/MDT failover pair
734 <literal>oss0</literal> and
735 <literal>oss1</literal> are configured as an OST failover pair. The Ethernet
737 <literal>mds0</literal> is 192.168.10.1, and for
738 <literal>mds1</literal> is 192.168.10.2. The Ethernet addresses for
739 <literal>oss0</literal> and
740 <literal>oss1</literal> are 192.168.10.20 and 192.168.10.21
mds0# mkfs.lustre --fsname=testfs --mdt --mgs \
--servicenode=192.168.10.2@tcp0 \
--servicenode=192.168.10.1@tcp0 /dev/sda1
746 mds0# mount -t lustre /dev/sda1 /mnt/test/mdt
747 oss0# mkfs.lustre --fsname=testfs --servicenode=192.168.10.20@tcp0 \
--servicenode=192.168.10.21@tcp0 --ost --index=0 \
749 --mgsnode=192.168.10.1@tcp0 --mgsnode=192.168.10.2@tcp0 \
751 oss0# mount -t lustre /dev/sdb /mnt/test/ost0
752 client# mount -t lustre 192.168.10.1@tcp0:192.168.10.2@tcp0:/testfs \
mds0# umount /mnt/test/mdt
755 mds1# mount -t lustre /dev/sda1 /mnt/test/mdt
756 mds1# cat /proc/fs/lustre/mds/testfs-MDT0000/recovery_status
758 <para>Where multiple NIDs are specified separated by commas (for example,
759 <literal>10.67.73.200@tcp,192.168.10.1@tcp</literal>), the two NIDs refer
760 to the same host, and the Lustre software chooses the
<emphasis>best</emphasis> one for communication. When a pair of NIDs is
762 separated by a colon (for example,
763 <literal>10.67.73.200@tcp:10.67.73.201@tcp</literal>), the two NIDs refer
764 to two different hosts and are treated as a failover pair (the Lustre
765 software tries the first one, and if that fails, it tries the second
768 <literal>mkfs.lustre</literal> can be used to specify failover nodes.
769 Introduced in Lustre software release 2.0, the
770 <literal>--servicenode</literal> option is used to specify all service NIDs,
771 including those for primary nodes and failover nodes. When the
772 <literal>--servicenode</literal> option is used, the first service node to
773 load the target device becomes the primary service node, while nodes
774 corresponding to the other specified NIDs become failover locations for the
775 target device. An older option,
<literal>--failnode</literal>, specifies just the NIDs of failover nodes.
777 For more information about the
778 <literal>--servicenode</literal> and
779 <literal>--failnode</literal> options, see
780 <xref xmlns:xlink="http://www.w3.org/1999/xlink"
781 linkend="configuringfailover" />.</para>
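<para>As an illustration of the two separators, a client mount that names two
MGS hosts, the first of which has two NIDs, might look like the sketch below.
The NIDs are the ones used in the text above; the file system name
<literal>testfs</literal> and the mount point are assumptions made for the
example:</para>
<screen>
client# mount -t lustre 10.67.73.200@tcp,192.168.10.1@tcp:10.67.73.201@tcp:/testfs /mnt/testfs
</screen>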
783 <section xml:id="dbdoclet.50438194_70905">
786 <primary>operations</primary>
787 <secondary>erasing a file system</secondary>
788 </indexterm>Erasing a File System</title>
789 <para>If you want to erase a file system and permanently delete all the
790 data in the file system, run this command on your targets:</para>
$ mkfs.lustre --reformat
794 <para>If you are using a separate MGS and want to keep other file systems
795 defined on that MGS, then set the
796 <literal>writeconf</literal> flag on the MDT for that file system. The
797 <literal>writeconf</literal> flag causes the configuration logs to be
798 erased; they are regenerated the next time the servers start.</para>
800 <literal>writeconf</literal> flag on the MDT:</para>
<para>To unmount all clients and servers using this file system, run:</para>
<para>To permanently erase the file system and, presumably, replace it
with another file system, run:</para>
812 $ mkfs.lustre --reformat --fsname spfs --mgs --mdt --index=0 /dev/
813 <emphasis>{mdsdev}</emphasis>
817 <para>If you have a separate MGS (that you do not want to reformat),
819 <literal>--writeconf</literal> flag to
820 <literal>mkfs.lustre</literal> on the MDT, run:</para>
822 $ mkfs.lustre --reformat --writeconf --fsname spfs --mgsnode=
823 <replaceable>mgs_nid</replaceable> --mdt --index=0
824 <replaceable>/dev/mds_device</replaceable>
<para>If you have a combined MGS/MDT, reformatting the MDT reformats the
MGS as well, causing all configuration information to be lost; you can
then start building your new file system. Nothing needs to be done with old
disks that will not be part of the new file system; just do not mount
836 <section xml:id="dbdoclet.50438194_16954">
839 <primary>operations</primary>
840 <secondary>reclaiming space</secondary>
841 </indexterm>Reclaiming Reserved Disk Space</title>
842 <para>All current Lustre installations run the ldiskfs file system
843 internally on service nodes. By default, ldiskfs reserves 5% of the disk
844 space to avoid file system fragmentation. In order to reclaim this space,
845 run the following command on your OSS for each OST in the file
848 tune2fs [-m reserved_blocks_percent] /dev/
849 <emphasis>{ostdev}</emphasis>
851 <para>You do not need to shut down Lustre before running this command or
852 restart it afterwards.</para>
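<para>To see the current reservation before changing it, the superblock
summary can be listed with
<literal>tune2fs -l</literal>. The following is only a sketch, with the
device name as a placeholder; the reserved percentage can be derived by
comparing the reported reserved block count with the total block
count:</para>
<screen>
oss# tune2fs -l /dev/<emphasis>{ostdev}</emphasis> | grep -i "reserved block count"
</screen>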
854 <para>Reducing the space reservation can cause severe performance
855 degradation as the OST file system becomes more than 95% full, due to
856 difficulty in locating large areas of contiguous free space. This
857 performance degradation may persist even if the space usage drops below
858 95% again. It is recommended NOT to reduce the reserved disk space below
862 <section xml:id="dbdoclet.50438194_69998">
865 <primary>operations</primary>
866 <secondary>replacing an OST or MDS</secondary>
867 </indexterm>Replacing an Existing OST or MDT</title>
868 <para>To copy the contents of an existing OST to a new OST (or an old MDT
869 to a new MDT), follow the process for either OST/MDT backups in
<xref linkend='dbdoclet.50438207_71633' /> or
871 <xref linkend='dbdoclet.50438207_21638' />. For more information on
873 <xref linkend='dbdoclet.rmremotedir' />.</para>
875 <section xml:id="dbdoclet.50438194_30872">
878 <primary>operations</primary>
879 <secondary>identifying OSTs</secondary>
880 </indexterm>Identifying To Which Lustre File an OST Object Belongs</title>
881 <para>Use this procedure to identify the file containing a given object on
885 <para>On the OST (as root), run
886 <literal>debugfs</literal> to display the file identifier (
887 <literal>FID</literal>) of the file associated with the object.</para>
888 <para>For example, if the object is
889 <literal>34976</literal> on
890 <literal>/dev/lustre/ost_test2</literal>, the debug command is:
892 # debugfs -c -R "stat /O/0/d$((34976 % 32))/34976" /dev/lustre/ost_test2
894 <para>The command output is:
896 debugfs 1.42.3.wc3 (15-Aug-2012)
897 /dev/lustre/ost_test2: catastrophic mode - not reading inode or group bitmaps
898 Inode: 352365 Type: regular Mode: 0666 Flags: 0x80000
899 Generation: 2393149953 Version: 0x0000002a:00005f81
900 User: 1000 Group: 1000 Size: 260096
901 File ACL: 0 Directory ACL: 0
902 Links: 1 Blockcount: 512
903 Fragment: Address: 0 Number: 0 Size: 0
904 ctime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
905 atime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
906 mtime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
907 crtime: 0x4a216b3c:975870dc -- Sat May 30 13:22:04 2009
908 Size of extra inode fields: 24
909 Extended attributes stored in inode body:
fid = "b9 da 24 00 00 00 00 00 6a fa 0d 3f 01 00 00 00 eb 5b 0b 00 00 00 00 00
911 00 00 00 00 00 00 00 00 " (32)
912 fid: objid=34976 seq=0 parent=[0x24dab9:0x3f0dfa6a:0x0] stripe=1
914 (0-64):4620544-4620607
918 <para>For Lustre software release 2.x file systems, the parent FID will
919 be of the form [0x200000400:0x122:0x0] and can be resolved directly
<literal>lfs fid2path [0x200000400:0x122:0x0]
922 /mnt/lustre</literal> command on any Lustre client, and the process is
926 <para>In this example the parent inode FID is an upgraded 1.x inode
927 (due to the first part of the FID being below 0x200000400), the MDT
929 <literal>0x24dab9</literal> and generation
930 <literal>0x3f0dfa6a</literal> and the pathname needs to be resolved
932 <literal>debugfs</literal>.</para>
935 <para>On the MDS (as root), use
936 <literal>debugfs</literal> to find the file associated with the
939 # debugfs -c -R "ncheck 0x24dab9" /dev/lustre/mdt_test
941 <para>Here is the command output:</para>
943 debugfs 1.42.3.wc2 (15-Aug-2012)
944 /dev/lustre/mdt_test: catastrophic mode - not reading inode or group bitmap\
947 2415289 /ROOT/brian-laptop-guest/clients/client11/~dmtmp/PWRPNT/ZD16.BMP
951 <para>The command lists the inode and pathname associated with the
The <literal>debugfs</literal>
<literal>ncheck</literal> command is a brute-force search that may
take a long time to complete.</para>
959 <para>To find the Lustre file from a disk LBA, follow the steps listed in
960 the document at this URL:
961 <link xl:href="http://smartmontools.sourceforge.net/badblockhowto.html">
962 http://smartmontools.sourceforge.net/badblockhowto.html</link>. Then,
963 follow the steps above to resolve the Lustre filename.</para>