1 <?xml version='1.0' encoding='utf-8'?>
2 <chapter xmlns="http://docbook.org/ns/docbook"
3 xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
4 xml:id="lustreoperations">
5 <title xml:id="lustreoperations.title">Lustre Operations</title>
6 <para>Once you have the Lustre file system up and running, you can use the
7 procedures in this section to perform these basic Lustre administration
9 <section xml:id="dbdoclet.50438194_42877">
12 <primary>operations</primary>
15 <primary>operations</primary>
16 <secondary>mounting by label</secondary>
17 </indexterm>Mounting by Label</title>
18 <para>The file system name is limited to 8 characters. We have encoded the
19 file system and target information in the disk label, so you can mount by
20 label. This allows system administrators to move disks around without
21 worrying about issues such as SCSI disk reordering or getting the
22 <literal>/dev/device</literal> wrong for a shared target. Soon, file system
23 naming will be made as fail-safe as possible. Currently, Linux disk labels
24 are limited to 16 characters. To identify the target within the file
25 system, 8 characters are reserved, leaving 8 characters for the file system
28 <replaceable>fsname</replaceable>-MDT0000 or
29 <replaceable>fsname</replaceable>-OST0a19
31 <para>To mount by label, use this command:</para>
34 <replaceable>file_system_label</replaceable>
35 <replaceable>/mount_point</replaceable>
37 <para>This is an example of mount-by-label:</para>
39 mds# mount -t lustre -L testfs-MDT0000 /mnt/mdt
42 <para>Mount-by-label should NOT be used in a multi-path environment or
43 when snapshots are being created of the device, since multiple block
44 devices will have the same label.</para>
46 <para>Although the file system name is internally limited to 8 characters,
47 you can mount the clients at any mount point, so file system users are not
48 subjected to short names. Here is an example:</para>
50 client# mount -t lustre mds0@tcp0:/short
51 <replaceable>/dev/long_mountpoint_name</replaceable>
54 <section xml:id="dbdoclet.50438194_24122">
57 <primary>operations</primary>
58 <secondary>starting</secondary>
59 </indexterm>Starting Lustre</title>
60 <para>On the first start of a Lustre file system, the components must be
61 started in the following order:</para>
64 <para>Mount the MGT.</para>
66 <para>If a combined MGT/MDT is present, Lustre will correctly mount
67 the MGT and MDT automatically.</para>
71 <para>Mount the MDT.</para>
73 <para condition='l24'>Mount all MDTs if multiple MDTs are
78 <para>Mount the OST(s).</para>
81 <para>Mount the client(s).</para>
85 <section xml:id="dbdoclet.50438194_84876">
88 <primary>operations</primary>
89 <secondary>mounting</secondary>
90 </indexterm>Mounting a Server</title>
91 <para>Starting a Lustre server is straightforward and only involves the
92 mount command. Lustre servers can be added to
93 <literal>/etc/fstab</literal>:</para>
97 <para>The mount command generates output similar to this:</para>
99 /dev/sda1 on /mnt/test/mdt type lustre (rw)
100 /dev/sda2 on /mnt/test/ost0 type lustre (rw)
101 192.168.0.21@tcp:/testfs on /mnt/testfs type lustre (rw)
103 <para>In this example, the MDT, an OST (ost0) and file system (testfs) are
106 LABEL=testfs-MDT0000 /mnt/test/mdt lustre defaults,_netdev,noauto 0 0
107 LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0
109 <para>In general, it is wise to specify noauto and let your
110 high-availability (HA) package manage when to mount the device. If you are
111 not using failover, make sure that networking has been started before
112 mounting a Lustre server. If you are running Red Hat Enterprise Linux, SUSE
113 Linux Enterprise Server, Debian operating system (and perhaps others), use
115 <literal>_netdev</literal> flag to ensure that these disks are mounted after
116 the network is up.</para>
117 <para>We are mounting by disk label here. The label of a device can be read
119 <literal>e2label</literal>. The label of a newly-formatted Lustre server
121 <literal>FFFF</literal> if the
122 <literal>--index</literal> option is not specified to
123 <literal>mkfs.lustre</literal>, meaning that it has yet to be assigned. The
124 assignment takes place when the server is first started, and the disk label
125 is updated. It is recommended that the
126 <literal>--index</literal> option always be used, which will also ensure
127 that the label is set at format time.</para>
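<para>As a minimal illustration, the label can be read back before the
target is added to <literal>/etc/fstab</literal>. The device name and the
label shown are assumptions that reuse the example values from this section
rather than output captured from a real system:</para>
<screen>
mds# e2label /dev/sda1
testfs-MDT0000
</screen>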
129 <para>Do not do this when the client and OSS are on the same node, as
130 memory pressure between the client and OSS can lead to deadlocks.</para>
133 <para>Mount-by-label should NOT be used in a multi-path
137 <section xml:id="dbdoclet.shutdownLustre">
140 <primary>operations</primary>
141 <secondary>shutdownLustre</secondary>
142 </indexterm>Stopping the Filesystem</title>
143 <para>A complete Lustre filesystem shutdown occurs by unmounting all
144 clients and servers in the order shown below. Please note that unmounting
145 a block device causes the Lustre software to be shut down on that node.
147 <note><para>The <literal>-a -t lustre</literal> options in the
148 commands below do not name a filesystem; they specify that all
149 entries in /etc/mtab of type <literal>lustre</literal> are to be
150 unmounted.</para></note>
152 <listitem><para>Unmount the clients</para>
153 <para>On each client node, unmount the filesystem on that client
154 using the <literal>umount</literal> command:</para>
155 <para><literal>umount -a -t lustre</literal></para>
156 <para>The example below shows the unmount of the
157 <literal>testfs</literal> filesystem on a client node:</para>
158 <para><screen>[root@client1 ~]# mount |grep testfs
159 XXX.XXX.0.11@tcp:/testfs on /mnt/testfs type lustre (rw,lazystatfs)
161 [root@client1 ~]# umount -a -t lustre
162 [154523.177714] Lustre: Unmounted testfs-client</screen></para>
164 <listitem><para>Unmount the MDT and MGT</para>
165 <para>On the MGS and MDS node(s), use the <literal>umount</literal>
167 <para><literal>umount -a -t lustre</literal></para>
168 <para>The example below shows the unmount of the MDT and MGT for
169 the <literal>testfs</literal> filesystem on a combined MGS/MDS:
171 <para><screen>[root@mds1 ~]# mount |grep lustre
172 /dev/sda on /mnt/mgt type lustre (ro)
173 /dev/sdb on /mnt/mdt type lustre (ro)
175 [root@mds1 ~]# umount -a -t lustre
176 [155263.566230] Lustre: Failing over testfs-MDT0000
177 [155263.775355] Lustre: server umount testfs-MDT0000 complete
178 [155269.843862] Lustre: server umount MGS complete</screen></para>
179 <para>For a separate MGS and MDS, the same command is used, first on
180 the MDS and then followed by the MGS.</para>
182 <listitem><para>Unmount all the OSTs</para>
183 <para>On each OSS node, use the <literal>umount</literal> command:
185 <para><literal>umount -a -t lustre</literal></para>
186 <para>The example below shows the unmount of all OSTs for the
187 <literal>testfs</literal> filesystem on server
188 <literal>OSS1</literal>:
190 <para><screen>[root@oss1 ~]# mount |grep lustre
191 /dev/sda on /mnt/ost0 type lustre (ro)
192 /dev/sdb on /mnt/ost1 type lustre (ro)
193 /dev/sdc on /mnt/ost2 type lustre (ro)
195 [root@oss1 ~]# umount -a -t lustre
196 [155336.491445] Lustre: Failing over testfs-OST0002
197 [155336.556752] Lustre: server umount testfs-OST0002 complete</screen></para>
200 <para>For unmount command syntax for a single OST, MDT, or MGT target
201 please refer to <xref linkend="dbdoclet.umountTarget"/>.</para>
203 <section xml:id="dbdoclet.umountTarget">
206 <primary>operations</primary>
207 <secondary>unmounting</secondary>
208 </indexterm>Unmounting a Specific Target on a Server</title>
209 <para>To stop a Lustre OST, MDT, or MGT, use the
211 <replaceable>/mount_point</replaceable></literal> command.</para>
212 <para>The example below stops an OST, <literal>ost0</literal>, on mount
213 point <literal>/mnt/ost0</literal> for the <literal>testfs</literal>
215 <screen>[root@oss1 ~]# umount /mnt/ost0
216 [ 385.142264] Lustre: Failing over testfs-OST0000
217 [ 385.210810] Lustre: server umount testfs-OST0000 complete</screen>
218 <para>Gracefully stopping a server with the
219 <literal>umount</literal> command preserves the state of the connected
220 clients. The next time the server is started, it waits for clients to
221 reconnect, and then goes through the recovery procedure.</para>
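<para>As a sketch, recovery progress after such a restart can be checked on
the server with <literal>lctl get_param</literal>. The target name
<literal>testfs-OST0000</literal> is taken from the example above and is
only an assumption for other systems:</para>
<screen>
oss# lctl get_param obdfilter.testfs-OST0000.recovery_status
</screen>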
223 <literal>-f</literal>) flag is used, then the server evicts all clients and
224 stops WITHOUT recovery. Upon restart, the server does not wait for
225 recovery. Any currently connected clients receive I/O errors until they
228 <para>If you are using loopback devices, use the
229 <literal>-d</literal> flag. This flag cleans up loop devices and can
230 always be safely specified.</para>
233 <section xml:id="dbdoclet.50438194_57420">
236 <primary>operations</primary>
237 <secondary>failover</secondary>
238 </indexterm>Specifying Failout/Failover Mode for OSTs</title>
239 <para>In a Lustre file system, an OST that has become unreachable because
240 it fails, is taken off the network, or is unmounted can be handled in one
245 <literal>failout</literal> mode, Lustre clients immediately receive
246 errors (EIOs) after a timeout, instead of waiting for the OST to
251 <literal>failover</literal> mode, Lustre clients wait for the OST to
255 <para>By default, the Lustre file system uses
256 <literal>failover</literal> mode for OSTs. To specify
257 <literal>failout</literal> mode instead, use the
258 <literal>--param="failover.mode=failout"</literal> option as shown below
259 (entered on one line):</para>
261 oss# mkfs.lustre --fsname=
262 <replaceable>fsname</replaceable> --mgsnode=
263 <replaceable>mgs_NID</replaceable> --param=failover.mode=failout
265 <replaceable>ost_index</replaceable>
266 <replaceable>/dev/ost_block_device</replaceable>
268 <para>In the example below,
269 <literal>failout</literal> mode is specified for the OSTs on the MGS
270 <literal>mds0</literal> in the file system
271 <literal>testfs</literal> (entered on one line).</para>
273 oss# mkfs.lustre --fsname=testfs --mgsnode=mds0 --param=failover.mode=failout
274 --ost --index=3 /dev/sdb
277 <para>Before running this command, unmount all OSTs that will be affected
279 <literal>failover</literal>/
280 <literal>failout</literal> mode.</para>
283 <para>After initial file system configuration, use the
284 <literal>tunefs.lustre</literal> utility to change the mode. For example,
286 <literal>failout</literal> mode, run:</para>
289 $ tunefs.lustre --param failover.mode=failout
290 <replaceable>/dev/ost_device</replaceable>
295 <section xml:id="dbdoclet.50438194_54138">
298 <primary>operations</primary>
299 <secondary>degraded OST RAID</secondary>
300 </indexterm>Handling Degraded OST RAID Arrays</title>
301 <para>Lustre includes functionality that notifies Lustre if an external
302 RAID array has degraded performance (resulting in reduced overall file
303 system performance), either because a disk has failed and not been
304 replaced, or because a disk was replaced and is undergoing a rebuild. To
305 avoid a global performance slowdown due to a degraded OST, the MDS can
306 avoid the OST for new object allocation if it is notified of the degraded
308 <para>A parameter for each OST, called
309 <literal>degraded</literal>, specifies whether the OST is running in
310 degraded mode or not.</para>
311 <para>To mark the OST as degraded, use:</para>
313 lctl set_param obdfilter.{OST_name}.degraded=1
315 <para>To mark that the OST is back in normal operation, use:</para>
317 lctl set_param obdfilter.{OST_name}.degraded=0
319 <para>To determine if OSTs are currently in degraded mode, use:</para>
321 lctl get_param obdfilter.*.degraded
323 <para>If the OST is remounted due to a reboot or other condition, the flag
325 <literal>0</literal>.</para>
326 <para>It is recommended that this be implemented by an automated script
327 that monitors the status of individual RAID devices.</para>
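<para>The following is only a sketch of such a script, not a supported
tool. It assumes that the OST is backed by a Linux software RAID (MD)
device, whose <literal>degraded</literal> sysfs attribute reports the number
of missing member devices. The OST name <literal>testfs-OST0000</literal>
and the device <literal>md0</literal> are hypothetical. Hardware RAID
controllers would need their vendor's status tool instead.</para>
<screen>
#!/bin/sh
# Sketch only: OST name and MD device are hypothetical placeholders.
OST=testfs-OST0000
MD=md0

# For Linux MD arrays, this sysfs file holds the number of missing
# member devices (0 means the array is not degraded).
missing=$(cat /sys/block/$MD/md/degraded)

if [ "$missing" -gt 0 ]; then
    # Ask the MDS to avoid this OST for new object allocation.
    lctl set_param obdfilter.$OST.degraded=1
else
    # Array is healthy again: return the OST to normal operation.
    lctl set_param obdfilter.$OST.degraded=0
fi
</screen>
<para>A script of this kind would typically be run periodically on the OSS,
for example from cron, so that the flag is set again after a remount.</para>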
329 <section xml:id="dbdoclet.50438194_88063">
332 <primary>operations</primary>
333 <secondary>multiple file systems</secondary>
334 </indexterm>Running Multiple Lustre File Systems</title>
335 <para>Lustre supports multiple file systems provided the combination of
336 <literal>NID:fsname</literal> is unique. Each file system must be allocated
337 a unique name during creation with the
338 <literal>--fsname</literal> parameter. Unique names for file systems are
339 enforced if a single MGS is present. If multiple MGSs are present (for
340 example if you have an MGS on every MDS) the administrator is responsible
341 for ensuring file system names are unique. A single MGS and unique file
342 system names provide a single point of administration and allow commands
343 to be issued against the file system even if it is not mounted.</para>
344 <para>Lustre supports multiple file systems on a single MGS. With a single
345 MGS, fsnames are guaranteed to be unique. Lustre also allows multiple MGSs
346 to co-exist. For example, multiple MGSs will be necessary if multiple file
347 systems on different Lustre software versions are to be concurrently
348 available. With multiple MGSs, additional care must be taken to ensure file
349 system names are unique. Each file system should have a unique fsname among
350 all systems that may interoperate in the future.</para>
351 <para>By default, the
352 <literal>mkfs.lustre</literal> command creates a file system named
353 <literal>lustre</literal>. To specify a different file system name (limited
354 to 8 characters) at format time, use the
355 <literal>--fsname</literal> option:</para>
358 mkfs.lustre --fsname=
359 <replaceable>file_system_name</replaceable>
363 <para>The MDT, OSTs and clients in the new file system must use the same
364 file system name (prepended to the device name). For example, for a new
366 <literal>foo</literal>, the MDT and two OSTs would be named
367 <literal>foo-MDT0000</literal>,
368 <literal>foo-OST0000</literal>, and
369 <literal>foo-OST0001</literal>.</para>
371 <para>To mount a client on the file system, run:</para>
373 client# mount -t lustre
374 <replaceable>mgsnode</replaceable>:
375 <replaceable>/new_fsname</replaceable>
376 <replaceable>/mount_point</replaceable>
378 <para>For example, to mount a client on file system foo at mount point
379 /mnt/foo, run:</para>
381 client# mount -t lustre mgsnode:/foo /mnt/foo
384 <para>If clients will be mounted on several file systems, add the
386 <literal>/etc/xattr.conf</literal> file to avoid problems when files are
387 moved between the file systems:
388 <literal>lustre.* skip</literal></para>
391 <para>To ensure that a new MDT is added to an existing MGS, create the MDT
393 <literal>--mdt --mgsnode=
394 <replaceable>mgs_NID</replaceable></literal>.</para>
396 <para>A Lustre installation with two file systems (
397 <literal>foo</literal> and
398 <literal>bar</literal>) could look like this, where the MGS node is
399 <literal>mgsnode@tcp0</literal> and the mount points are
400 <literal>/mnt/foo</literal> and
401 <literal>/mnt/bar</literal>.</para>
403 mgsnode# mkfs.lustre --mgs /dev/sda
404 mdtfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --mdt --index=0
406 ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=0
408 ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=1
410 mdtbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --mdt --index=0
412 ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=0
414 ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=1
417 <para>To mount a client on file system foo at mount point
418 <literal>/mnt/foo</literal>, run:</para>
420 client# mount -t lustre mgsnode@tcp0:/foo /mnt/foo
422 <para>To mount a client on file system bar at mount point
423 <literal>/mnt/bar</literal>, run:</para>
425 client# mount -t lustre mgsnode@tcp0:/bar /mnt/bar
428 <section xml:id="dbdoclet.lfsmkdir" condition='l24'>
431 <primary>operations</primary>
432 <secondary>remote directory</secondary>
433 </indexterm>Creating a sub-directory on a given MDT</title>
434 <para>Lustre 2.4 enables individual sub-directories to be serviced by
435 unique MDTs. An administrator can allocate a sub-directory to a given MDT
436 using the command:</para>
438 client# lfs mkdir -i
439 <replaceable>mdt_index</replaceable>
440 <replaceable>/mount_point/remote_dir</replaceable>
442 <para>This command will allocate the sub-directory
443 <literal>remote_dir</literal> onto the MDT of index
444 <literal>mdt_index</literal>. For more information on adding additional MDTs
446 <literal>mdt_index</literal> see
447 <xref linkend='dbdoclet.addmdtindex' />.</para>
449 <para>An administrator can allocate remote sub-directories to separate
450 MDTs. Creating remote sub-directories in parent directories not hosted on
451 MDT0 is not recommended. This is because the failure of the parent MDT
452 will leave the namespace below it inaccessible. For this reason, by
453 default it is only possible to create remote sub-directories off MDT0. To
454 relax this restriction and enable remote sub-directories off any MDT, an
455 administrator must issue the following command on the MGS:
456 <screen>mgs# lctl conf_param <replaceable>fsname</replaceable>.mdt.enable_remote_dir=1</screen>
457 For Lustre filesystem 'scratch', the command executed is:
458 <screen>mgs# lctl conf_param scratch.mdt.enable_remote_dir=1</screen>
459 To verify the configuration setting execute the following command on any
461 <screen>mds# lctl get_param mdt.*.enable_remote_dir</screen></para>
463 <para condition='l28'>With Lustre software version 2.8, a new
464 tunable is available to allow users with a specific group ID to create
465 and delete remote and striped directories. This tunable is
466 <literal>enable_remote_dir_gid</literal>. For example, setting this
467 parameter to the 'wheel' or 'admin' group ID allows users with that GID
468 to create and delete remote and striped directories. Setting this
469 parameter to <literal>-1</literal> on MDT0 permanently allows any
470 non-root user to create and delete remote and striped directories.
471 On the MGS execute the following command:
472 <screen>mgs# lctl conf_param <replaceable>fsname</replaceable>.mdt.enable_remote_dir_gid=-1</screen>
473 For the Lustre filesystem 'scratch', the command expands to:
474 <screen>mgs# lctl conf_param scratch.mdt.enable_remote_dir_gid=-1</screen>
475 The change can be verified by executing the following command on every MDS:
476 <screen>mds# lctl get_param mdt.<replaceable>*</replaceable>.enable_remote_dir_gid</screen>
479 <section xml:id="dbdoclet.lfsmkdirdne2" condition='l28'>
482 <primary>operations</primary>
483 <secondary>striped directory</secondary>
486 <primary>operations</primary>
487 <secondary>mkdir</secondary>
490 <primary>operations</primary>
491 <secondary>setdirstripe</secondary>
494 <primary>striping</primary>
495 <secondary>metadata</secondary>
496 </indexterm>Creating a directory striped across multiple MDTs</title>
497 <para>The Lustre 2.8 DNE feature enables individual files in a given
498 directory to store their metadata on separate MDTs (a <emphasis>striped
499 directory</emphasis>) once additional MDTs have been added to the
500 filesystem, see <xref linkend="dbdoclet.addingamdt"/>.
501 The result of this is that metadata requests for
502 files in a striped directory are serviced by multiple MDTs and metadata
503 service load is distributed over all the MDTs that service a given
504 directory. By distributing metadata service load over multiple MDTs,
505 performance can be improved beyond the limit of single MDT
506 performance. Prior to the development of this feature, all files in a
507 directory had to record their metadata on a single MDT.</para>
508 <para>The command to stripe a directory over
509 <replaceable>mdt_count</replaceable> MDTs is:
513 <replaceable>mdt_count</replaceable>
514 <replaceable>/mount_point/new_directory</replaceable>
516 <para>The striped directory feature is most useful for distributing
517 single large directories (50k entries or more) across multiple MDTs,
518 since it incurs more overhead than non-striped directories.</para>
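<para>As an illustration, the commands below create a directory striped
over two MDTs and then display its layout. The mount point
<literal>/mnt/testfs</literal> and the directory name are assumptions, and
the file system is assumed to have at least two MDTs:</para>
<screen>
client# lfs mkdir -c 2 /mnt/testfs/striped_dir
client# lfs getdirstripe /mnt/testfs/striped_dir
</screen>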
520 <section xml:id="dbdoclet.50438194_88980">
523 <primary>operations</primary>
524 <secondary>parameters</secondary>
525 </indexterm>Setting and Retrieving Lustre Parameters</title>
526 <para>Several options are available for setting parameters in
530 <para>When creating a file system, use mkfs.lustre. See
531 <xref linkend="dbdoclet.50438194_17237" /> below.</para>
534 <para>When a server is stopped, use tunefs.lustre. See
535 <xref linkend="dbdoclet.50438194_55253" /> below.</para>
538 <para>When the file system is running, use lctl to set or retrieve
539 Lustre parameters. See
540 <xref linkend="dbdoclet.50438194_51490" /> and
541 <xref linkend="dbdoclet.50438194_63247" /> below.</para>
544 <section xml:id="dbdoclet.50438194_17237">
545 <title>Setting Tunable Parameters with
546 <literal>mkfs.lustre</literal></title>
547 <para>When the file system is first formatted, parameters can simply be
549 <literal>--param</literal> option to the
550 <literal>mkfs.lustre</literal> command. For example:</para>
552 mds# mkfs.lustre --mdt --param="sys.timeout=50" /dev/sda
554 <para>For more details about creating a file system, see
555 <xref linkend="configuringlustre" />. For more details about
556 <literal>mkfs.lustre</literal>, see
557 <xref linkend="systemconfigurationutilities" />.</para>
559 <section xml:id="dbdoclet.50438194_55253">
560 <title>Setting Parameters with
561 <literal>tunefs.lustre</literal></title>
562 <para>If a server (OSS or MDS) is stopped, parameters can be added to an
563 existing file system using the
564 <literal>--param</literal> option to the
565 <literal>tunefs.lustre</literal> command. For example:</para>
567 oss# tunefs.lustre --param=failover.node=192.168.0.13@tcp0 /dev/sda
570 <literal>tunefs.lustre</literal>, parameters are
571 <emphasis>additive</emphasis>: new parameters are specified in addition
572 to old parameters and do not replace them. To erase all old
573 <literal>tunefs.lustre</literal> parameters and just use newly-specified
574 parameters, run:</para>
576 mds# tunefs.lustre --erase-params --param=
577 <replaceable>new_parameters</replaceable>
579 <para>The tunefs.lustre command can be used to set any parameter settable
580 in a /proc/fs/lustre file and that has its own OBD device, so it can be
583 <replaceable>obdname|fsname</replaceable>.
584 <replaceable>obdtype</replaceable>.
585 <replaceable>proc_file_name</replaceable>=
586 <replaceable>value</replaceable></literal>. For example:</para>
588 mds# tunefs.lustre --param mdt.identity_upcall=NONE /dev/sda1
590 <para>For more details about
591 <literal>tunefs.lustre</literal>, see
592 <xref linkend="systemconfigurationutilities" />.</para>
594 <section xml:id="dbdoclet.50438194_51490">
595 <title>Setting Parameters with
596 <literal>lctl</literal></title>
597 <para>When the file system is running, the
598 <literal>lctl</literal> command can be used to set parameters (temporary
599 or permanent) and report current parameter values. Temporary parameters
600 are active as long as the server or client is not shut down. Permanent
601 parameters live through server and client reboots.</para>
603 <para>The lctl list_param command enables users to list all parameters
605 <xref linkend="dbdoclet.50438194_88217" />.</para>
607 <para>For more details about the
608 <literal>lctl</literal> command, see the examples in the sections below
610 <xref linkend="systemconfigurationutilities" />.</para>
612 <title>Setting Temporary Parameters</title>
614 <literal>lctl set_param</literal> to set temporary parameters on the
615 node where it is run. These parameters map to items in
616 <literal>/proc/{fs,sys}/{lnet,lustre}</literal>. The
617 <literal>lctl set_param</literal> command uses this syntax:</para>
620 <replaceable>obdtype</replaceable>.
621 <replaceable>obdname</replaceable>.
622 <replaceable>proc_file_name</replaceable>=
623 <replaceable>value</replaceable>
625 <para>For example:</para>
627 # lctl set_param osc.*.max_dirty_mb=1024
628 osc.myth-OST0000-osc.max_dirty_mb=32
629 osc.myth-OST0001-osc.max_dirty_mb=32
630 osc.myth-OST0002-osc.max_dirty_mb=32
631 osc.myth-OST0003-osc.max_dirty_mb=32
632 osc.myth-OST0004-osc.max_dirty_mb=32
635 <section xml:id="dbdoclet.50438194_64195">
636 <title>Setting Permanent Parameters</title>
638 <literal>lctl conf_param</literal> command to set permanent parameters.
640 <literal>lctl conf_param</literal> command can be used to specify any
641 parameter settable in a
642 <literal>/proc/fs/lustre</literal> file, with its own OBD device. The
643 <literal>lctl conf_param</literal> command uses this syntax (same as the
645 <literal>mkfs.lustre</literal> and
646 <literal>tunefs.lustre</literal> commands):</para>
648 <replaceable>obdname|fsname</replaceable>.
649 <replaceable>obdtype</replaceable>.
650 <replaceable>proc_file_name</replaceable>=
651 <replaceable>value</replaceable>
653 <para>Here are a few examples of
654 <literal>lctl conf_param</literal> commands:</para>
656 mgs# lctl conf_param testfs-MDT0000.sys.timeout=40
657 $ lctl conf_param testfs-MDT0000.mdt.identity_upcall=NONE
658 $ lctl conf_param testfs.llite.max_read_ahead_mb=16
659 $ lctl conf_param testfs-MDT0000.lov.stripesize=2M
660 $ lctl conf_param testfs-OST0000.osc.max_dirty_mb=29.15
661 $ lctl conf_param testfs-OST0000.ost.client_cache_seconds=15
662 $ lctl conf_param testfs.sys.timeout=40
665 <para>Parameters specified with the
666 <literal>lctl conf_param</literal> command are set permanently in the
667 file system's configuration file on the MGS.</para>
670 <section xml:id="dbdoclet.setparamp" condition='l25'>
671 <title>Setting Permanent Parameters with lctl set_param -P</title>
673 <literal>lctl set_param -P</literal> to set parameters permanently. This
674 command must be issued on the MGS. The given parameter is set on every
676 <literal>lctl</literal> upcall. Parameters map to items in
677 <literal>/proc/{fs,sys}/{lnet,lustre}</literal>. The
678 <literal>lctl set_param</literal> command uses this syntax:</para>
681 <replaceable>obdtype</replaceable>.
682 <replaceable>obdname</replaceable>.
683 <replaceable>proc_file_name</replaceable>=
684 <replaceable>value</replaceable>
686 <para>For example:</para>
688 # lctl set_param -P osc.*.max_dirty_mb=1024
689 osc.myth-OST0000-osc.max_dirty_mb=32
690 osc.myth-OST0001-osc.max_dirty_mb=32
691 osc.myth-OST0002-osc.max_dirty_mb=32
692 osc.myth-OST0003-osc.max_dirty_mb=32
693 osc.myth-OST0004-osc.max_dirty_mb=32
696 <literal>-d</literal> (only with -P) option to delete a permanent
697 parameter. Syntax:</para>
700 <replaceable>obdtype</replaceable>.
701 <replaceable>obdname</replaceable>.
702 <replaceable>proc_file_name</replaceable>
704 <para>For example:</para>
706 # lctl set_param -P -d osc.*.max_dirty_mb
709 <section xml:id="dbdoclet.50438194_88217">
710 <title>Listing Parameters</title>
711 <para>To list Lustre or LNet parameters that are available to set, use
713 <literal>lctl list_param</literal> command. For example:</para>
715 lctl list_param [-FR]
716 <replaceable>obdtype</replaceable>.
717 <replaceable>obdname</replaceable>
719 <para>The following arguments are available for the
720 <literal>lctl list_param</literal> command.</para>
722 <literal>-F</literal> Add '
723 <literal>/</literal>', '
724 <literal>@</literal>' or '
725 <literal>=</literal>' for directories, symlinks and writeable files,
728 <literal>-R</literal> Recursively lists all parameters under the
729 specified path</para>
730 <para>For example:</para>
732 oss# lctl list_param obdfilter.lustre-OST0000
735 <section xml:id="dbdoclet.50438194_63247">
736 <title>Reporting Current Parameter Values</title>
737 <para>To report current Lustre parameter values, use the
738 <literal>lctl get_param</literal> command with this syntax:</para>
741 <replaceable>obdtype</replaceable>.
742 <replaceable>obdname</replaceable>.
743 <replaceable>proc_file_name</replaceable>
745 <para>This example reports data on RPC service times.</para>
747 oss# lctl get_param -n ost.*.ost_io.timeouts
748 service : cur 1 worst 30 (at 1257150393, 85d23h58m54s ago) 1 1 1 1
750 <para>This example reports the amount of space this client has reserved
751 for writeback cache with each OST:</para>
753 client# lctl get_param osc.*.cur_grant_bytes
754 osc.myth-OST0000-osc-ffff8800376bdc00.cur_grant_bytes=2097152
755 osc.myth-OST0001-osc-ffff8800376bdc00.cur_grant_bytes=33890304
756 osc.myth-OST0002-osc-ffff8800376bdc00.cur_grant_bytes=35418112
757 osc.myth-OST0003-osc-ffff8800376bdc00.cur_grant_bytes=2097152
758 osc.myth-OST0004-osc-ffff8800376bdc00.cur_grant_bytes=33808384
763 <section xml:id="dbdoclet.50438194_41817">
766 <primary>operations</primary>
767 <secondary>failover</secondary>
768 </indexterm>Specifying NIDs and Failover</title>
769 <para>If a node has multiple network interfaces, it may have multiple NIDs,
770 which must all be identified so other nodes can choose the NID that is
771 appropriate for their network interfaces. Typically, NIDs are specified in
772 a list delimited by commas (
773 <literal>,</literal>). However, when failover nodes are specified, the NIDs
774 are delimited by a colon (
775 <literal>:</literal>) or by repeating a keyword such as
776 <literal>--mgsnode=</literal> or
777 <literal>--servicenode=</literal>.</para>
778 <para>To display the NIDs of all servers in networks configured to work
779 with the Lustre file system, run (while LNet is running):</para>
783 <para>In the example below,
784 <literal>mds0</literal> and
785 <literal>mds1</literal> are configured as a combined MGS/MDT failover pair
787 <literal>oss0</literal> and
788 <literal>oss1</literal> are configured as an OST failover pair. The Ethernet
790 <literal>mds0</literal> is 192.168.10.1, and for
791 <literal>mds1</literal> is 192.168.10.2. The Ethernet addresses for
792 <literal>oss0</literal> and
793 <literal>oss1</literal> are 192.168.10.20 and 192.168.10.21
796 mds0# mkfs.lustre --fsname=testfs --mdt --mgs \
797 --servicenode=192.168.10.2@tcp0 \
798 --servicenode=192.168.10.1@tcp0 /dev/sda1
799 mds0# mount -t lustre /dev/sda1 /mnt/test/mdt
800 oss0# mkfs.lustre --fsname=testfs --servicenode=192.168.10.20@tcp0 \
801 --servicenode=192.168.10.21 --ost --index=0 \
802 --mgsnode=192.168.10.1@tcp0 --mgsnode=192.168.10.2@tcp0 \
804 oss0# mount -t lustre /dev/sdb /mnt/test/ost0
805 client# mount -t lustre 192.168.10.1@tcp0:192.168.10.2@tcp0:/testfs \
807 mds0# umount /mnt/test/mdt
808 mds1# mount -t lustre /dev/sda1 /mnt/test/mdt
809 mds1# lctl get_param mdt.testfs-MDT0000.recovery_status
811 <para>Where multiple NIDs are specified separated by commas (for example,
812 <literal>10.67.73.200@tcp,192.168.10.1@tcp</literal>), the two NIDs refer
813 to the same host, and the Lustre software chooses the
814 <emphasis>best</emphasis> one for communication. When a pair of NIDs is
815 separated by a colon (for example,
816 <literal>10.67.73.200@tcp:10.67.73.201@tcp</literal>), the two NIDs refer
817 to two different hosts and are treated as a failover pair (the Lustre
818 software tries the first one, and if that fails, it tries the second
821 <literal>mkfs.lustre</literal> can be used to specify failover nodes.
822 Introduced in Lustre software release 2.0, the
823 <literal>--servicenode</literal> option is used to specify all service NIDs,
824 including those for primary nodes and failover nodes. When the
825 <literal>--servicenode</literal> option is used, the first service node to
826 load the target device becomes the primary service node, while nodes
827 corresponding to the other specified NIDs become failover locations for the
828 target device. An older option,
829 <literal>--failnode</literal>, specifies just the NIDs of failover nodes.
830 For more information about the
831 <literal>--servicenode</literal> and
832 <literal>--failnode</literal> options, see
833 <xref xmlns:xlink="http://www.w3.org/1999/xlink"
834 linkend="configuringfailover" />.</para>
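<para>The comma and colon rules described above can be illustrated with a
small bash sketch. The function name
<literal>describe_nids</literal> is hypothetical and is not part of the Lustre
tools. It only splits a NID string the way the text describes: colon-separated
groups are treated as different hosts, and comma-separated NIDs within a
group as alternative addresses of the same host.</para>
<screen>
# Hypothetical helper: show how a NID string breaks down into hosts.
describe_nids() {
    local spec=$1 n=0 group
    local IFS=:                 # colons separate different hosts (failover)
    for group in $spec; do
        n=$((n + 1))
        # commas separate NIDs that belong to the same host
        echo "host $n: ${group//,/ }"
    done
}

describe_nids "10.67.73.200@tcp,192.168.10.1@tcp:10.67.73.201@tcp"
</screen>
<para>For the sample string, the sketch reports two hosts: the first is
reachable through either of two NIDs, and the second is its failover
partner.</para>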
836 <section xml:id="dbdoclet.50438194_70905">
839 <primary>operations</primary>
840 <secondary>erasing a file system</secondary>
841 </indexterm>Erasing a File System</title>
842 <para>If you want to erase a file system and permanently delete all the
843 data in the file system, run this command on your targets:</para>
$ mkfs.lustre --reformat
847 <para>If you are using a separate MGS and want to keep other file systems
848 defined on that MGS, then set the
849 <literal>writeconf</literal> flag on the MDT for that file system. The
850 <literal>writeconf</literal> flag causes the configuration logs to be
851 erased; they are regenerated the next time the servers start.</para>
853 <literal>writeconf</literal> flag on the MDT:</para>
<para>To unmount all clients and servers using this file system, run:</para>
<para>To permanently erase the file system and, presumably, replace it
with another file system, run:</para>
$ mkfs.lustre --reformat --fsname spfs --mgs --mdt --index=0 /dev/<emphasis>{mdsdev}</emphasis>
870 <para>If you have a separate MGS (that you do not want to reformat),
872 <literal>--writeconf</literal> flag to
873 <literal>mkfs.lustre</literal> on the MDT, run:</para>
875 $ mkfs.lustre --reformat --writeconf --fsname spfs --mgsnode=
876 <replaceable>mgs_nid</replaceable> --mdt --index=0
877 <replaceable>/dev/mds_device</replaceable>
<para>If you have a combined MGS/MDT, reformatting the MDT also reformats
the MGS, so all configuration information is lost and you can start
building your new file system. Nothing needs to be done with old
disks that will not be part of the new file system; just do not mount
889 <section xml:id="dbdoclet.50438194_16954">
892 <primary>operations</primary>
893 <secondary>reclaiming space</secondary>
894 </indexterm>Reclaiming Reserved Disk Space</title>
895 <para>All current Lustre installations run the ldiskfs file system
896 internally on service nodes. By default, ldiskfs reserves 5% of the disk
897 space to avoid file system fragmentation. In order to reclaim this space,
898 run the following command on your OSS for each OST in the file
901 tune2fs [-m reserved_blocks_percent] /dev/
902 <emphasis>{ostdev}</emphasis>
904 <para>You do not need to shut down Lustre before running this command or
905 restart it afterwards.</para>
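<para>Before changing the reservation, it can help to check how much space
is currently reserved. The sketch below assumes the
<literal>Block count</literal> and
<literal>Reserved block count</literal> fields printed by
<literal>tune2fs -l</literal>. The helper name
<literal>reserved_percent</literal> and the sample numbers are illustrative
only.</para>
<screen>
# The two input values can be read on the OSS with, for example:
#   tune2fs -l /dev/{ostdev} | grep -i "block count"

# Hypothetical helper: reserved percentage from the two block counts.
reserved_percent() {
    awk -v total="$1" -v reserved="$2" \
        'BEGIN { printf "%.1f\n", 100 * reserved / total }'
}

# Sample values only: 50000 reserved blocks out of 1000000
reserved_percent 1000000 50000
</screen>
<para>With the sample values, the sketch prints
<literal>5.0</literal>, which corresponds to the default reservation.</para>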
907 <para>Reducing the space reservation can cause severe performance
908 degradation as the OST file system becomes more than 95% full, due to
909 difficulty in locating large areas of contiguous free space. This
910 performance degradation may persist even if the space usage drops below
911 95% again. It is recommended NOT to reduce the reserved disk space below
915 <section xml:id="dbdoclet.50438194_69998">
918 <primary>operations</primary>
919 <secondary>replacing an OST or MDS</secondary>
920 </indexterm>Replacing an Existing OST or MDT</title>
<para>To copy the contents of an existing OST to a new OST (or an old MDT
to a new MDT), follow one of the OST/MDT backup procedures in
<xref linkend='dbdoclet.backup_device' /> or
<xref linkend='dbdoclet.backup_target_filesystem' />.
For more information on removing an MDT, see
<xref linkend='dbdoclet.rmremotedir' />.</para>
928 <section xml:id="dbdoclet.50438194_30872">
931 <primary>operations</primary>
932 <secondary>identifying OSTs</secondary>
933 </indexterm>Identifying To Which Lustre File an OST Object Belongs</title>
934 <para>Use this procedure to identify the file containing a given object on
938 <para>On the OST (as root), run
939 <literal>debugfs</literal> to display the file identifier (
940 <literal>FID</literal>) of the file associated with the object.</para>
941 <para>For example, if the object is
942 <literal>34976</literal> on
<literal>/dev/lustre/ost_test2</literal>, the <literal>debugfs</literal> command is:
945 # debugfs -c -R "stat /O/0/d$((34976 % 32))/34976" /dev/lustre/ost_test2
947 <para>The command output is:
949 debugfs 1.42.3.wc3 (15-Aug-2012)
950 /dev/lustre/ost_test2: catastrophic mode - not reading inode or group bitmaps
951 Inode: 352365 Type: regular Mode: 0666 Flags: 0x80000
952 Generation: 2393149953 Version: 0x0000002a:00005f81
953 User: 1000 Group: 1000 Size: 260096
954 File ACL: 0 Directory ACL: 0
955 Links: 1 Blockcount: 512
956 Fragment: Address: 0 Number: 0 Size: 0
957 ctime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
958 atime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
959 mtime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
960 crtime: 0x4a216b3c:975870dc -- Sat May 30 13:22:04 2009
961 Size of extra inode fields: 24
962 Extended attributes stored in inode body:
fid = "b9 da 24 00 00 00 00 00 6a fa 0d 3f 01 00 00 00 eb 5b 0b 00 00 00 00 00
964 00 00 00 00 00 00 00 00 " (32)
965 fid: objid=34976 seq=0 parent=[0x24dab9:0x3f0dfa6a:0x0] stripe=1
967 (0-64):4620544-4620607
971 <para>For Lustre software release 2.x file systems, the parent FID will
972 be of the form [0x200000400:0x122:0x0] and can be resolved directly
<literal>lfs fid2path [0x200000400:0x122:0x0]
975 /mnt/lustre</literal> command on any Lustre client, and the process is
<para>In this example, the parent inode FID is an upgraded 1.x inode
980 (due to the first part of the FID being below 0x200000400), the MDT
982 <literal>0x24dab9</literal> and generation
983 <literal>0x3f0dfa6a</literal> and the pathname needs to be resolved
985 <literal>debugfs</literal>.</para>
988 <para>On the MDS (as root), use
989 <literal>debugfs</literal> to find the file associated with the
992 # debugfs -c -R "ncheck 0x24dab9" /dev/lustre/mdt_test
994 <para>Here is the command output:</para>
996 debugfs 1.42.3.wc2 (15-Aug-2012)
/dev/lustre/mdt_test: catastrophic mode - not reading inode or group bitmaps
1000 2415289 /ROOT/brian-laptop-guest/clients/client11/~dmtmp/PWRPNT/ZD16.BMP
1004 <para>The command lists the inode and pathname associated with the
The <literal>debugfs</literal> <literal>ncheck</literal> command is a brute-force search that may
1009 take a long time to complete.</para>
1012 <para>To find the Lustre file from a disk LBA, follow the steps listed in
1013 the document at this URL:
1014 <link xl:href="http://smartmontools.sourceforge.net/badblockhowto.html">
1015 http://smartmontools.sourceforge.net/badblockhowto.html</link>. Then,
1016 follow the steps above to resolve the Lustre filename.</para>
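<para>The arithmetic used in the procedure above can be reproduced in the
shell. This is a minimal sketch: the helper names
<literal>ost_object_path</literal> and
<literal>is_upgraded_1x_fid</literal> are hypothetical, and the
<literal>/O/0</literal> prefix and the modulus of 32 are taken from the
example <literal>debugfs</literal> command rather than from a general
rule.</para>
<screen>
# Hypothetical helper: path of an OST object as used in the debugfs example.
ost_object_path() {
    objid=$1
    echo "/O/0/d$((objid % 32))/$objid"
}

# Hypothetical helper: succeeds when the first field of the parent FID is
# below 0x200000400, which the text treats as an upgraded 1.x inode.
is_upgraded_1x_fid() {
    [ $(( $1 )) -lt $(( 0x200000400 )) ]
}

ost_object_path 34976
if is_upgraded_1x_fid 0x24dab9; then
    echo "upgraded 1.x inode $((0x24dab9)): resolve with debugfs ncheck"
else
    echo "2.x FID: resolve with lfs fid2path"
fi
</screen>
<para>For object
<literal>34976</literal> the sketch prints
<literal>/O/0/d0/34976</literal>. The decimal value of
<literal>0x24dab9</literal> is
<literal>2415289</literal>, which is the inode number shown in the
<literal>ncheck</literal> output.</para>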