1 <?xml version='1.0' encoding='utf-8'?>
2 <chapter xmlns="http://docbook.org/ns/docbook"
3 xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
4 xml:id="lustreoperations">
5 <title xml:id="lustreoperations.title">Lustre Operations</title>
6 <para>Once you have the Lustre file system up and running, you can use the
7 procedures in this section to perform these basic Lustre administration
9 <section xml:id="dbdoclet.50438194_42877">
12 <primary>operations</primary>
15 <primary>operations</primary>
16 <secondary>mounting by label</secondary>
17 </indexterm>Mounting by Label</title>
<para>The file system name is limited to 8 characters. The file system and
target information is encoded in the disk label, so you can mount by
label. This allows system administrators to move disks around without
worrying about issues such as SCSI disk reordering or specifying the wrong
<literal>/dev/device</literal> for a shared target. Soon, file system
naming will be made as fail-safe as possible. Currently, Linux disk labels
are limited to 16 characters. Of these, 8 characters are reserved to
identify the target within the file system, leaving 8 characters for the file system
28 <replaceable>fsname</replaceable>-MDT0000 or
29 <replaceable>fsname</replaceable>-OST0a19
31 <para>To mount by label, use this command:</para>
34 <replaceable>file_system_label</replaceable>
35 <replaceable>/mount_point</replaceable>
37 <para>This is an example of mount-by-label:</para>
39 mds# mount -t lustre -L testfs-MDT0000 /mnt/mdt
42 <para>Mount-by-label should NOT be used in a multi-path environment or
43 when snapshots are being created of the device, since multiple block
44 devices will have the same label.</para>
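<para>The label layout described in this section can be sketched as a small
shell helper. This is only an illustration, not part of the Lustre tools: the
function name <literal>make_label</literal> is hypothetical, and the sketch
assumes that the target index is written as four lowercase hexadecimal digits,
as in the <literal>OST0a19</literal> example.</para>

```shell
# Hypothetical helper (not a Lustre command): build a target label from a
# file system name, a target type (MDT or OST), and a decimal target index.
make_label() {
    fsname=$1
    type=$2
    index=$3
    # The file system name may use at most 8 of the 16 label characters.
    if [ "${#fsname}" -gt 8 ]; then
        return 1
    fi
    # Assumption: the index is formatted as four lowercase hex digits.
    printf '%s-%s%04x\n' "$fsname" "$type" "$index"
}

make_label testfs MDT 0      # prints testfs-MDT0000
make_label testfs OST 2585   # prints testfs-OST0a19
```

<para>A label built this way has the same form as the argument passed to
<literal>mount -t lustre -L</literal> in the example above.</para>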
46 <para>Although the file system name is internally limited to 8 characters,
47 you can mount the clients at any mount point, so file system users are not
48 subjected to short names. Here is an example:</para>
50 client# mount -t lustre mds0@tcp0:/short
51 <replaceable>/dev/long_mountpoint_name</replaceable>
54 <section xml:id="dbdoclet.50438194_24122">
57 <primary>operations</primary>
58 <secondary>starting</secondary>
59 </indexterm>Starting Lustre</title>
60 <para>On the first start of a Lustre file system, the components must be
61 started in the following order:</para>
64 <para>Mount the MGT.</para>
66 <para>If a combined MGT/MDT is present, Lustre will correctly mount
67 the MGT and MDT automatically.</para>
71 <para>Mount the MDT.</para>
73 <para>Mount all MDTs if multiple MDTs are present.</para>
77 <para>Mount the OST(s).</para>
80 <para>Mount the client(s).</para>
84 <section xml:id="dbdoclet.50438194_84876">
87 <primary>operations</primary>
88 <secondary>mounting</secondary>
89 </indexterm>Mounting a Server</title>
90 <para>Starting a Lustre server is straightforward and only involves the
91 mount command. Lustre servers can be added to
92 <literal>/etc/fstab</literal>:</para>
96 <para>The mount command generates output similar to this:</para>
98 /dev/sda1 on /mnt/test/mdt type lustre (rw)
99 /dev/sda2 on /mnt/test/ost0 type lustre (rw)
100 192.168.0.21@tcp:/testfs on /mnt/testfs type lustre (rw)
102 <para>In this example, the MDT, an OST (ost0) and file system (testfs) are
105 LABEL=testfs-MDT0000 /mnt/test/mdt lustre defaults,_netdev,noauto 0 0
106 LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0
108 <para>In general, it is wise to specify noauto and let your
109 high-availability (HA) package manage when to mount the device. If you are
110 not using failover, make sure that networking has been started before
111 mounting a Lustre server. If you are running Red Hat Enterprise Linux, SUSE
112 Linux Enterprise Server, Debian operating system (and perhaps others), use
114 <literal>_netdev</literal> flag to ensure that these disks are mounted after
115 the network is up.</para>
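<para>As a rough sketch, the <literal>/etc/fstab</literal> entries shown
above could be generated from a label and a mount point. The helper name
<literal>fstab_entry</literal> is hypothetical; the mount options are the
ones used in the example entries.</para>

```shell
# Hypothetical helper: print an /etc/fstab line that mounts a Lustre target
# by label. noauto leaves the mount to the HA package, and _netdev delays
# the mount until the network is up.
fstab_entry() {
    label=$1
    mount_point=$2
    printf 'LABEL=%s %s lustre defaults,_netdev,noauto 0 0\n' "$label" "$mount_point"
}

fstab_entry testfs-MDT0000 /mnt/test/mdt
fstab_entry testfs-OST0000 /mnt/test/ost0
```

<para>The output is intended to be reviewed before it is appended to
<literal>/etc/fstab</literal> on the server that hosts the target.</para>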
116 <para>We are mounting by disk label here. The label of a device can be read
118 <literal>e2label</literal>. The label of a newly-formatted Lustre server
120 <literal>FFFF</literal> if the
121 <literal>--index</literal> option is not specified to
122 <literal>mkfs.lustre</literal>, meaning that it has yet to be assigned. The
123 assignment takes place when the server is first started, and the disk label
124 is updated. It is recommended that the
125 <literal>--index</literal> option always be used, which will also ensure
126 that the label is set at format time.</para>
128 <para>Do not do this when the client and OSS are on the same node, as
129 memory pressure between the client and OSS can lead to deadlocks.</para>
132 <para>Mount-by-label should NOT be used in a multi-path
136 <section xml:id="dbdoclet.shutdownLustre">
139 <primary>operations</primary>
140 <secondary>shutdownLustre</secondary>
141 </indexterm>Stopping the Filesystem</title>
142 <para>A complete Lustre filesystem shutdown occurs by unmounting all
143 clients and servers in the order shown below. Please note that unmounting
144 a block device causes the Lustre software to be shut down on that node.
<note><para>The <literal>-a -t lustre</literal> arguments in the
commands below are not the name of a filesystem. They instruct
<literal>umount</literal> to unmount all entries in
<literal>/etc/mtab</literal> that are of type
<literal>lustre</literal>.</para></note>
151 <listitem><para>Unmount the clients</para>
152 <para>On each client node, unmount the filesystem on that client
153 using the <literal>umount</literal> command:</para>
154 <para><literal>umount -a -t lustre</literal></para>
155 <para>The example below shows the unmount of the
156 <literal>testfs</literal> filesystem on a client node:</para>
157 <para><screen>[root@client1 ~]# mount |grep testfs
158 XXX.XXX.0.11@tcp:/testfs on /mnt/testfs type lustre (rw,lazystatfs)
160 [root@client1 ~]# umount -a -t lustre
161 [154523.177714] Lustre: Unmounted testfs-client</screen></para>
163 <listitem><para>Unmount the MDT and MGT</para>
164 <para>On the MGS and MDS node(s), run the
165 <literal>umount</literal> command:</para>
166 <para><literal>umount -a -t lustre</literal></para>
167 <para>The example below shows the unmount of the MDT and MGT for
168 the <literal>testfs</literal> filesystem on a combined MGS/MDS:
170 <para><screen>[root@mds1 ~]# mount |grep lustre
171 /dev/sda on /mnt/mgt type lustre (ro)
172 /dev/sdb on /mnt/mdt type lustre (ro)
174 [root@mds1 ~]# umount -a -t lustre
175 [155263.566230] Lustre: Failing over testfs-MDT0000
176 [155263.775355] Lustre: server umount testfs-MDT0000 complete
177 [155269.843862] Lustre: server umount MGS complete</screen></para>
<para>For a separate MGS and MDS, the same command is used, first on
the MDS and then on the MGS.</para>
181 <listitem><para>Unmount all the OSTs</para>
182 <para>On each OSS node, use the <literal>umount</literal> command:
184 <para><literal>umount -a -t lustre</literal></para>
185 <para>The example below shows the unmount of all OSTs for the
186 <literal>testfs</literal> filesystem on server
187 <literal>OSS1</literal>:
189 <para><screen>[root@oss1 ~]# mount |grep lustre
190 /dev/sda on /mnt/ost0 type lustre (ro)
191 /dev/sdb on /mnt/ost1 type lustre (ro)
192 /dev/sdc on /mnt/ost2 type lustre (ro)
194 [root@oss1 ~]# umount -a -t lustre
195 [155336.491445] Lustre: Failing over testfs-OST0002
196 [155336.556752] Lustre: server umount testfs-OST0002 complete</screen></para>
<para>For unmount command syntax for a single OST, MDT, or MGT target,
refer to <xref linkend="dbdoclet.umountTarget"/>.</para>
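<para>The shutdown order above can be sketched as a dry-run script that only
prints the command for each node. The host names and the use of
<literal>ssh root@host</literal> are assumptions for illustration; they are
not taken from a real configuration.</para>

```shell
# Hypothetical host lists; replace them with the nodes of your own cluster.
# For a separate MGS and MDS, list the MDS node(s) before the MGS node.
CLIENTS="client1 client2"
MDS_NODES="mds1"
OSS_NODES="oss1 oss2"

# Print (rather than run) the unmount command for each node in shutdown
# order: clients first, then MDS/MGS nodes, then OSS nodes.
shutdown_plan() {
    for host in $CLIENTS $MDS_NODES $OSS_NODES; do
        echo "ssh root@$host umount -a -t lustre"
    done
}

shutdown_plan
```

<para>Printing the plan first makes it possible to check the order before
any target is actually unmounted.</para>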
202 <section xml:id="dbdoclet.umountTarget">
205 <primary>operations</primary>
206 <secondary>unmounting</secondary>
207 </indexterm>Unmounting a Specific Target on a Server</title>
<para>To stop a Lustre OST, MDT, or MGT, use the
210 <replaceable>/mount_point</replaceable></literal> command.</para>
211 <para>The example below stops an OST, <literal>ost0</literal>, on mount
212 point <literal>/mnt/ost0</literal> for the <literal>testfs</literal>
214 <screen>[root@oss1 ~]# umount /mnt/ost0
215 [ 385.142264] Lustre: Failing over testfs-OST0000
216 [ 385.210810] Lustre: server umount testfs-OST0000 complete</screen>
217 <para>Gracefully stopping a server with the
218 <literal>umount</literal> command preserves the state of the connected
219 clients. The next time the server is started, it waits for clients to
220 reconnect, and then goes through the recovery procedure.</para>
222 <literal>-f</literal>) flag is used, then the server evicts all clients and
223 stops WITHOUT recovery. Upon restart, the server does not wait for
224 recovery. Any currently connected clients receive I/O errors until they
227 <para>If you are using loopback devices, use the
228 <literal>-d</literal> flag. This flag cleans up loop devices and can
229 always be safely specified.</para>
232 <section xml:id="failover_ost">
235 <primary>operations</primary>
236 <secondary>failover</secondary>
237 </indexterm>Specifying Failout/Failover Mode for OSTs</title>
238 <para>In a Lustre file system, an OST that has become unreachable because
239 it fails, is taken off the network, or is unmounted can be handled in one
<literal>failout</literal> mode, Lustre clients receive
errors (EIOs) as soon as a timeout expires, instead of waiting for the OST to
250 <literal>failover</literal> mode, Lustre clients wait for the OST to
254 <para>By default, the Lustre file system uses
255 <literal>failover</literal> mode for OSTs. To specify
256 <literal>failout</literal> mode instead, use the
257 <literal>--param="failover.mode=failout"</literal> option as shown below
258 (entered on one line):</para>
260 oss# mkfs.lustre --fsname=
261 <replaceable>fsname</replaceable> --mgsnode=
262 <replaceable>mgs_NID</replaceable> --param=failover.mode=failout
264 <replaceable>ost_index</replaceable>
265 <replaceable>/dev/ost_block_device</replaceable>
267 <para>In the example below,
268 <literal>failout</literal> mode is specified for the OSTs on the MGS
269 <literal>mds0</literal> in the file system
<literal>testfs</literal> (entered on one line).</para>
272 oss# mkfs.lustre --fsname=testfs --mgsnode=mds0 --param=failover.mode=failout
273 --ost --index=3 /dev/sdb
276 <para>Before running this command, unmount all OSTs that will be affected
278 <literal>failover</literal>/
279 <literal>failout</literal> mode.</para>
282 <para>After initial file system configuration, use the
283 <literal>tunefs.lustre</literal> utility to change the mode. For example,
285 <literal>failout</literal> mode, run:</para>
288 $ tunefs.lustre --param failover.mode=failout
289 <replaceable>/dev/ost_device</replaceable>
294 <section xml:id="dbdoclet.degraded_ost">
297 <primary>operations</primary>
298 <secondary>degraded OST RAID</secondary>
299 </indexterm>Handling Degraded OST RAID Arrays</title>
<para>The Lustre software can be notified when an external
301 RAID array has degraded performance (resulting in reduced overall file
302 system performance), either because a disk has failed and not been
303 replaced, or because a disk was replaced and is undergoing a rebuild. To
304 avoid a global performance slowdown due to a degraded OST, the MDS can
305 avoid the OST for new object allocation if it is notified of the degraded
307 <para>A parameter for each OST, called
308 <literal>degraded</literal>, specifies whether the OST is running in
309 degraded mode or not.</para>
310 <para>To mark the OST as degraded, use:</para>
312 lctl set_param obdfilter.{OST_name}.degraded=1
314 <para>To mark that the OST is back in normal operation, use:</para>
316 lctl set_param obdfilter.{OST_name}.degraded=0
318 <para>To determine if OSTs are currently in degraded mode, use:</para>
320 lctl get_param obdfilter.*.degraded
322 <para>If the OST is remounted due to a reboot or other condition, the flag
324 <literal>0</literal>.</para>
325 <para>It is recommended that this be implemented by an automated script
326 that monitors the status of individual RAID devices, such as MD-RAID's
327 <literal>mdadm(8)</literal> command with the <literal>--monitor</literal>
328 option to mark an affected device degraded or restored.</para>
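<para>A minimal sketch of such a script is shown below, assuming it is
invoked by <literal>mdadm --monitor</literal> with the event name as its
first argument. The function names, the choice of which events count as
degraded, and the way an OST name is supplied are assumptions; mapping an MD
device to the OST it backs is site-specific and is not shown.</para>

```shell
# Map an mdadm monitor event name to a value for the OST "degraded" flag.
# Prints 1 (degraded), 0 (restored), or nothing for events that are ignored.
# Assumption: this particular grouping of events suits the site.
degraded_flag_for_event() {
    case "$1" in
        Fail|DegradedArray|RebuildStarted) echo 1 ;;
        RebuildFinished|SpareActive)       echo 0 ;;
    esac
}

# Print the lctl command for an event and an OST name (dry run only).
degraded_command() {
    flag=$(degraded_flag_for_event "$1")
    if [ -n "$flag" ]; then
        echo "lctl set_param obdfilter.$2.degraded=$flag"
    fi
}

degraded_command Fail testfs-OST0000
degraded_command RebuildFinished testfs-OST0000
```

<para>A production script would run the printed command on the OSS instead
of echoing it, and would confirm that a finished rebuild actually succeeded
before clearing the flag.</para>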
330 <section xml:id="dbdoclet.50438194_88063">
333 <primary>operations</primary>
334 <secondary>multiple file systems</secondary>
335 </indexterm>Running Multiple Lustre File Systems</title>
336 <para>Lustre supports multiple file systems provided the combination of
337 <literal>NID:fsname</literal> is unique. Each file system must be allocated
338 a unique name during creation with the
339 <literal>--fsname</literal> parameter. Unique names for file systems are
340 enforced if a single MGS is present. If multiple MGSs are present (for
341 example if you have an MGS on every MDS) the administrator is responsible
for ensuring file system names are unique. A single MGS and unique file
system names provide a single point of administration and allow commands
to be issued against the file system even if it is not mounted.</para>
345 <para>Lustre supports multiple file systems on a single MGS. With a single
346 MGS fsnames are guaranteed to be unique. Lustre also allows multiple MGSs
347 to co-exist. For example, multiple MGSs will be necessary if multiple file
348 systems on different Lustre software versions are to be concurrently
349 available. With multiple MGSs additional care must be taken to ensure file
350 system names are unique. Each file system should have a unique fsname among
351 all systems that may interoperate in the future.</para>
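<para>When several MGSs are in use, a simple check such as the following
sketch can help catch a duplicate name before formatting. The function name
is hypothetical, and the list of names is assumed to be collected by the
administrator from each MGS.</para>

```shell
# Print every file system name that appears more than once in the arguments.
duplicate_fsnames() {
    printf '%s\n' "$@" | sort | uniq -d
}

# Example: "foo" is used on two different MGSs.
duplicate_fsnames foo bar foo
```

<para>An empty result means that all of the supplied names are unique.</para>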
352 <para>By default, the
353 <literal>mkfs.lustre</literal> command creates a file system named
354 <literal>lustre</literal>. To specify a different file system name (limited
355 to 8 characters) at format time, use the
356 <literal>--fsname</literal> option:</para>
359 mkfs.lustre --fsname=
360 <replaceable>file_system_name</replaceable>
364 <para>The MDT, OSTs and clients in the new file system must use the same
365 file system name (prepended to the device name). For example, for a new
367 <literal>foo</literal>, the MDT and two OSTs would be named
368 <literal>foo-MDT0000</literal>,
369 <literal>foo-OST0000</literal>, and
370 <literal>foo-OST0001</literal>.</para>
372 <para>To mount a client on the file system, run:</para>
374 client# mount -t lustre
375 <replaceable>mgsnode</replaceable>:
376 <replaceable>/new_fsname</replaceable>
377 <replaceable>/mount_point</replaceable>
379 <para>For example, to mount a client on file system foo at mount point
380 /mnt/foo, run:</para>
382 client# mount -t lustre mgsnode:/foo /mnt/foo
<para>If a client will be mounted on several file systems, add the
387 <literal>/etc/xattr.conf</literal> file to avoid problems when files are
388 moved between the file systems:
389 <literal>lustre.* skip</literal></para>
<para>To ensure that a new MDT is added to an existing MGS, create the MDT
394 <literal>--mdt --mgsnode=
395 <replaceable>mgs_NID</replaceable></literal>.</para>
397 <para>A Lustre installation with two file systems (
398 <literal>foo</literal> and
399 <literal>bar</literal>) could look like this, where the MGS node is
400 <literal>mgsnode@tcp0</literal> and the mount points are
401 <literal>/mnt/foo</literal> and
402 <literal>/mnt/bar</literal>.</para>
404 mgsnode# mkfs.lustre --mgs /dev/sda
405 mdtfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --mdt --index=0
407 ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=0
409 ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=1
411 mdtbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --mdt --index=0
413 ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=0
415 ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=1
418 <para>To mount a client on file system foo at mount point
419 <literal>/mnt/foo</literal>, run:</para>
421 client# mount -t lustre mgsnode@tcp0:/foo /mnt/foo
423 <para>To mount a client on file system bar at mount point
424 <literal>/mnt/bar</literal>, run:</para>
426 client# mount -t lustre mgsnode@tcp0:/bar /mnt/bar
429 <section xml:id="dbdoclet.lfsmkdir">
432 <primary>operations</primary>
433 <secondary>remote directory</secondary>
434 </indexterm>Creating a sub-directory on a specific MDT</title>
<para>It is possible to create individual directories, along with their
files and sub-directories, to be stored on specific MDTs. To create
a sub-directory on a given MDT, use the command:
client# lfs mkdir -i
441 <replaceable>mdt_index</replaceable>
442 <replaceable>/mount_point/remote_dir</replaceable>
444 <para>This command will allocate the sub-directory
445 <literal>remote_dir</literal> onto the MDT of index
446 <literal>mdt_index</literal>. For more information on adding additional MDTs
448 <literal>mdt_index</literal> see
449 <xref linkend='dbdoclet.addmdtindex' />.</para>
451 <para>An administrator can allocate remote sub-directories to separate
452 MDTs. Creating remote sub-directories in parent directories not hosted on
453 MDT0000 is not recommended. This is because the failure of the parent MDT
454 will leave the namespace below it inaccessible. For this reason, by
455 default it is only possible to create remote sub-directories off MDT0000.
456 To relax this restriction and enable remote sub-directories off any MDT,
457 an administrator must issue the following command on the MGS:
458 <screen>mgs# lctl conf_param <replaceable>fsname</replaceable>.mdt.enable_remote_dir=1</screen>
459 For Lustre filesystem 'scratch', the command executed is:
460 <screen>mgs# lctl conf_param scratch.mdt.enable_remote_dir=1</screen>
461 To verify the configuration setting execute the following command on any
463 <screen>mds# lctl get_param mdt.*.enable_remote_dir</screen></para>
465 <para condition='l28'>With Lustre software version 2.8, a new
466 tunable is available to allow users with a specific group ID to create
467 and delete remote and striped directories. This tunable is
468 <literal>enable_remote_dir_gid</literal>. For example, setting this
469 parameter to the 'wheel' or 'admin' group ID allows users with that GID
470 to create and delete remote and striped directories. Setting this
parameter to <literal>-1</literal> on MDT0000 permanently allows any
non-root user to create and delete remote and striped directories.
On the MGS, execute the following command:
<screen>mgs# lctl conf_param <replaceable>fsname</replaceable>.mdt.enable_remote_dir_gid=-1</screen>
For the Lustre filesystem 'scratch', the command expands to:
<screen>mgs# lctl conf_param scratch.mdt.enable_remote_dir_gid=-1</screen>
477 The change can be verified by executing the following command on every MDS:
478 <screen>mds# lctl get_param mdt.<replaceable>*</replaceable>.enable_remote_dir_gid</screen>
481 <section xml:id="dbdoclet.lfsmkdirdne2" condition='l28'>
484 <primary>operations</primary>
485 <secondary>striped directory</secondary>
488 <primary>operations</primary>
489 <secondary>mkdir</secondary>
492 <primary>operations</primary>
493 <secondary>setdirstripe</secondary>
496 <primary>striping</primary>
497 <secondary>metadata</secondary>
498 </indexterm>Creating a directory striped across multiple MDTs</title>
<para>The Lustre 2.8 DNE feature enables individual files in a given
directory to store their metadata on separate MDTs (a <emphasis>striped
directory</emphasis>) once additional MDTs have been added to the
filesystem (see <xref linkend="lustremaint.adding_new_mdt"/>).
As a result, metadata requests for
files in a striped directory are serviced by multiple MDTs, and the metadata
service load is distributed over all the MDTs that service a given
directory. Distributing the metadata service load over multiple MDTs
can improve performance beyond the limit of a single MDT.
Prior to the development of this feature, all files in a
directory had to record their metadata on a single MDT.</para>
<para>The command to stripe a directory over
511 <replaceable>mdt_count</replaceable> MDTs is:
515 <replaceable>mdt_count</replaceable>
516 <replaceable>/mount_point/new_directory</replaceable>
518 <para>The striped directory feature is most useful for distributing
519 single large directories (50k entries or more) across multiple MDTs,
520 since it incurs more overhead than non-striped directories.</para>
521 <section xml:id="dbdoclet.lfsmkdirbyspace" condition='l2D'>
522 <title>Directory creation by space/inode usage</title>
523 <para>If the starting MDT is not specified when creating a new directory,
524 this directory and its stripes will be distributed on MDTs by space usage.
525 For example the following will create a directory and its stripes on MDTs
526 with balanced space usage:</para>
<screen>lfs mkdir -c 2 &lt;dir1&gt;</screen>
<para>Alternatively, if a default directory stripe is set on a directory,
a subsequent <literal>mkdir</literal> system call under
<literal>&lt;dir1&gt;</literal> will have the same effect:
<screen>lfs setdirstripe -D -c 2 &lt;dir1&gt;</screen></para>
532 <para>The policy is:</para>
<listitem><para>If free inodes/blocks on all MDTs are almost the same,
i.e. <literal>max_inodes_avail * 84% &lt; min_inodes_avail</literal> and
<literal>max_blocks_avail * 84% &lt; min_blocks_avail</literal>, then
MDTs are chosen round-robin.</para></listitem>
538 <listitem><para>Otherwise, create more subdirectories on MDTs with more
539 free inodes/blocks.</para></listitem>
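<para>The balance test in the policy above can be sketched with integer
arithmetic. This is an illustration of the stated 84% rule only, not the
code used by the MDS; the function names and the label
<literal>weighted</literal> for the second case are hypothetical.</para>

```shell
# Succeed (return 0) when max_avail * 84% < min_avail.
# Integer-only form of the same comparison: max * 84 < min * 100.
is_balanced() {
    max_avail=$1
    min_avail=$2
    [ $((max_avail * 84)) -lt $((min_avail * 100)) ]
}

# Arguments: max_inodes_avail min_inodes_avail max_blocks_avail min_blocks_avail
# Both inodes and blocks must be balanced for round-robin selection.
choose_policy() {
    if is_balanced "$1" "$2"; then
        if is_balanced "$3" "$4"; then
            echo roundrobin
            return
        fi
    fi
    echo weighted
}

choose_policy 1000 900 1000 900   # prints roundrobin
choose_policy 1000 800 1000 900   # prints weighted
```

<para>In the second call the inode counts differ by more than the threshold
(840 is not less than 800), so the sketch falls back to preferring MDTs with
more free inodes/blocks.</para>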
543 <section xml:id="dbdoclet.50438194_88980">
546 <primary>operations</primary>
547 <secondary>parameters</secondary>
548 </indexterm>Setting and Retrieving Lustre Parameters</title>
549 <para>Several options are available for setting parameters in
<para>When creating a file system, use <literal>mkfs.lustre</literal>. See
<xref linkend="dbdoclet.50438194_17237" /> below.</para>
<para>When a server is stopped, use <literal>tunefs.lustre</literal>. See
<xref linkend="dbdoclet.50438194_55253" /> below.</para>
<para>When the file system is running, use <literal>lctl</literal> to set or retrieve
Lustre parameters. See
<xref linkend="dbdoclet.50438194_51490" /> and
<xref linkend="dbdoclet.50438194_63247" /> below.</para>
567 <section xml:id="dbdoclet.50438194_17237">
568 <title>Setting Tunable Parameters with
569 <literal>mkfs.lustre</literal></title>
570 <para>When the file system is first formatted, parameters can simply be
572 <literal>--param</literal> option to the
573 <literal>mkfs.lustre</literal> command. For example:</para>
575 mds# mkfs.lustre --mdt --param="sys.timeout=50" /dev/sda
<para>For more details about creating a file system, see
578 <xref linkend="configuringlustre" />. For more details about
579 <literal>mkfs.lustre</literal>, see
580 <xref linkend="systemconfigurationutilities" />.</para>
582 <section xml:id="dbdoclet.50438194_55253">
583 <title>Setting Parameters with
584 <literal>tunefs.lustre</literal></title>
585 <para>If a server (OSS or MDS) is stopped, parameters can be added to an
586 existing file system using the
587 <literal>--param</literal> option to the
588 <literal>tunefs.lustre</literal> command. For example:</para>
590 oss# tunefs.lustre --param=failover.node=192.168.0.13@tcp0 /dev/sda
593 <literal>tunefs.lustre</literal>, parameters are
<emphasis>additive</emphasis>: new parameters are specified in addition
to old parameters and do not replace them. To erase all old
596 <literal>tunefs.lustre</literal> parameters and just use newly-specified
597 parameters, run:</para>
599 mds# tunefs.lustre --erase-params --param=
600 <replaceable>new_parameters</replaceable>
<para>The <literal>tunefs.lustre</literal> command can be used to set any
parameter that is settable via <literal>lctl conf_param</literal> and that
has its own OBD device, so it can be specified as
606 <replaceable>obdname|fsname</replaceable>.
607 <replaceable>obdtype</replaceable>.
608 <replaceable>proc_file_name</replaceable>=
609 <replaceable>value</replaceable></literal>. For example:</para>
611 mds# tunefs.lustre --param mdt.identity_upcall=NONE /dev/sda1
613 <para>For more details about
614 <literal>tunefs.lustre</literal>, see
615 <xref linkend="systemconfigurationutilities" />.</para>
617 <section xml:id="dbdoclet.50438194_51490">
618 <title>Setting Parameters with
619 <literal>lctl</literal></title>
620 <para>When the file system is running, the
621 <literal>lctl</literal> command can be used to set parameters (temporary
622 or permanent) and report current parameter values. Temporary parameters
623 are active as long as the server or client is not shut down. Permanent
624 parameters live through server and client reboots.</para>
626 <para>The <literal>lctl list_param</literal> command enables users to
627 list all parameters that can be set. See
628 <xref linkend="dbdoclet.50438194_88217" />.</para>
630 <para>For more details about the
631 <literal>lctl</literal> command, see the examples in the sections below
633 <xref linkend="systemconfigurationutilities" />.</para>
635 <title>Setting Temporary Parameters</title>
637 <literal>lctl set_param</literal> to set temporary parameters on the
638 node where it is run. These parameters internally map to corresponding
639 items in the kernel <literal>/proc/{fs,sys}/{lnet,lustre}</literal> and
640 <literal>/sys/{fs,kernel/debug}/lustre</literal> virtual filesystems.
641 However, since the mapping between a particular parameter name and the
642 underlying virtual pathname may change, it is <emphasis>not</emphasis>
643 recommended to access the virtual pathname directly. The
644 <literal>lctl set_param</literal> command uses this syntax:</para>
646 lctl set_param [-n] [-P]
647 <replaceable>obdtype</replaceable>.
648 <replaceable>obdname</replaceable>.
649 <replaceable>proc_file_name</replaceable>=
650 <replaceable>value</replaceable>
652 <para>For example:</para>
654 # lctl set_param osc.*.max_dirty_mb=1024
osc.myth-OST0000-osc.max_dirty_mb=1024
osc.myth-OST0001-osc.max_dirty_mb=1024
osc.myth-OST0002-osc.max_dirty_mb=1024
osc.myth-OST0003-osc.max_dirty_mb=1024
osc.myth-OST0004-osc.max_dirty_mb=1024
662 <section xml:id="dbdoclet.50438194_64195">
663 <title>Setting Permanent Parameters</title>
<para>Use the <literal>lctl set_param -P</literal> or
<literal>lctl conf_param</literal> command to set permanent parameters.
667 <literal>lctl conf_param</literal> command can be used to specify any
668 settable parameter with its own OBD device. The
669 <literal>lctl conf_param</literal> command uses the following syntax
670 (the same as the <literal>mkfs.lustre</literal> and
671 <literal>tunefs.lustre</literal> commands):</para>
673 <replaceable>obdname|fsname</replaceable>.
674 <replaceable>obdtype</replaceable>.
675 <replaceable>proc_file_name</replaceable>=
<replaceable>value</replaceable>
678 <note><para>The <literal>lctl conf_param</literal> and
679 <literal>lctl set_param</literal> syntax is <emphasis>not</emphasis>
680 the same.</para></note>
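<para>The difference in argument order can be illustrated with two small
helpers that assemble the parameter string for each command. The helper
names are hypothetical, and both take the same four arguments so that only
the ordering differs.</para>

```shell
# Arguments for both helpers: obdtype obdname parameter value

# "lctl set_param" order: obdtype.obdname.parameter=value
set_param_arg() {
    printf '%s.%s.%s=%s\n' "$1" "$2" "$3" "$4"
}

# "lctl conf_param" order: obdname.obdtype.parameter=value
conf_param_arg() {
    printf '%s.%s.%s=%s\n' "$2" "$1" "$3" "$4"
}

set_param_arg  mdt testfs-MDT0000 identity_upcall NONE
conf_param_arg mdt testfs-MDT0000 identity_upcall NONE
```

<para>The first call prints
<literal>mdt.testfs-MDT0000.identity_upcall=NONE</literal> and the second
prints <literal>testfs-MDT0000.mdt.identity_upcall=NONE</literal>.</para>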
681 <para>Here are a few examples of
682 <literal>lctl conf_param</literal> commands:</para>
mgs# lctl conf_param testfs-MDT0000.sys.timeout=40
mgs# lctl conf_param testfs-MDT0000.mdt.identity_upcall=NONE
mgs# lctl conf_param testfs.llite.max_read_ahead_mb=16
mgs# lctl conf_param testfs-MDT0000.lov.stripesize=2M
mgs# lctl conf_param testfs-OST0000.osc.max_dirty_mb=29.15
mgs# lctl conf_param testfs-OST0000.ost.client_cache_seconds=15
mgs# lctl conf_param testfs.sys.timeout=40
693 <para>Parameters specified with the
694 <literal>lctl conf_param</literal> command are set permanently in the
695 file system's configuration file on the MGS.</para>
698 <section xml:id="dbdoclet.setparamp" condition='l25'>
699 <title>Setting Permanent Parameters with lctl set_param -P</title>
<para>The <literal>lctl set_param -P</literal> command can also
set parameters permanently using the same syntax as the
<literal>lctl set_param</literal> and <literal>lctl
get_param</literal> commands. This command must be issued on the MGS.
The given parameter is set on every host using an
<literal>lctl</literal> upcall. The <literal>lctl set_param -P</literal>
command uses the following syntax:</para>
709 <replaceable>obdtype</replaceable>.
710 <replaceable>obdname</replaceable>.
711 <replaceable>proc_file_name</replaceable>=
712 <replaceable>value</replaceable>
714 <para>For example:</para>
716 # lctl set_param -P osc.*.max_dirty_mb=1024
717 osc.myth-OST0000-osc.max_dirty_mb=32
718 osc.myth-OST0001-osc.max_dirty_mb=32
719 osc.myth-OST0002-osc.max_dirty_mb=32
720 osc.myth-OST0003-osc.max_dirty_mb=32
721 osc.myth-OST0004-osc.max_dirty_mb=32
<literal>-d</literal> (only with <literal>-P</literal>) option to delete a permanent
parameter. Syntax:</para>
728 <replaceable>obdtype</replaceable>.
729 <replaceable>obdname</replaceable>.
730 <replaceable>parameter_name</replaceable>
732 <para>For example:</para>
734 # lctl set_param -P -d osc.*.max_dirty_mb
<note condition='l2c'><para>Starting in Lustre 2.12, the
737 <literal>lctl get_param</literal> command can provide
738 <emphasis>tab completion</emphasis> when using an interactive shell
739 with <literal>bash-completion</literal> installed. This simplifies
740 the use of <literal>get_param</literal> significantly, since it
741 provides an interactive list of available parameters.
744 <section xml:id="dbdoclet.50438194_88217">
745 <title>Listing Parameters</title>
746 <para>To list Lustre or LNet parameters that are available to set, use
<literal>lctl list_param</literal> command with this syntax:</para>
750 lctl list_param [-FR]
751 <replaceable>obdtype</replaceable>.
752 <replaceable>obdname</replaceable>
754 <para>The following arguments are available for the
755 <literal>lctl list_param</literal> command.</para>
757 <literal>-F</literal> Add '
758 <literal>/</literal>', '
759 <literal>@</literal>' or '
760 <literal>=</literal>' for directories, symlinks and writeable files,
763 <literal>-R</literal> Recursively lists all parameters under the
764 specified path</para>
765 <para>For example:</para>
767 oss# lctl list_param obdfilter.lustre-OST0000
770 <section xml:id="dbdoclet.50438194_63247">
771 <title>Reporting Current Parameter Values</title>
772 <para>To report current Lustre parameter values, use the
773 <literal>lctl get_param</literal> command with this syntax:</para>
776 <replaceable>obdtype</replaceable>.
777 <replaceable>obdname</replaceable>.
778 <replaceable>proc_file_name</replaceable>
<note condition='l2c'><para>Starting in Lustre 2.12, the
781 <literal>lctl get_param</literal> command can provide
782 <emphasis>tab completion</emphasis> when using an interactive shell
783 with <literal>bash-completion</literal> installed. This simplifies
784 the use of <literal>get_param</literal> significantly, since it
785 provides an interactive list of available parameters.
787 <para>This example reports data on RPC service times.</para>
789 oss# lctl get_param -n ost.*.ost_io.timeouts
790 service : cur 1 worst 30 (at 1257150393, 85d23h58m54s ago) 1 1 1 1
792 <para>This example reports the amount of space this client has reserved
793 for writeback cache with each OST:</para>
795 client# lctl get_param osc.*.cur_grant_bytes
796 osc.myth-OST0000-osc-ffff8800376bdc00.cur_grant_bytes=2097152
797 osc.myth-OST0001-osc-ffff8800376bdc00.cur_grant_bytes=33890304
798 osc.myth-OST0002-osc-ffff8800376bdc00.cur_grant_bytes=35418112
799 osc.myth-OST0003-osc-ffff8800376bdc00.cur_grant_bytes=2097152
800 osc.myth-OST0004-osc-ffff8800376bdc00.cur_grant_bytes=33808384
805 <section xml:id="failover_nids">
808 <primary>operations</primary>
809 <secondary>failover</secondary>
810 </indexterm>Specifying NIDs and Failover</title>
811 <para>If a node has multiple network interfaces, it may have multiple NIDs,
812 which must all be identified so other nodes can choose the NID that is
813 appropriate for their network interfaces. Typically, NIDs are specified in
814 a list delimited by commas (
815 <literal>,</literal>). However, when failover nodes are specified, the NIDs
816 are delimited by a colon (
817 <literal>:</literal>) or by repeating a keyword such as
818 <literal>--mgsnode=</literal> or
<literal>--servicenode=</literal>.</para>
820 <para>To display the NIDs of all servers in networks configured to work
821 with the Lustre file system, run (while LNet is running):</para>
825 <para>In the example below,
826 <literal>mds0</literal> and
827 <literal>mds1</literal> are configured as a combined MGS/MDT failover pair
829 <literal>oss0</literal> and
830 <literal>oss1</literal> are configured as an OST failover pair. The Ethernet
832 <literal>mds0</literal> is 192.168.10.1, and for
833 <literal>mds1</literal> is 192.168.10.2. The Ethernet addresses for
834 <literal>oss0</literal> and
835 <literal>oss1</literal> are 192.168.10.20 and 192.168.10.21
838 mds0# mkfs.lustre --fsname=testfs --mdt --mgs \
839 --servicenode=192.168.10.2@tcp0 \
--servicenode=192.168.10.1@tcp0 /dev/sda1
841 mds0# mount -t lustre /dev/sda1 /mnt/test/mdt
842 oss0# mkfs.lustre --fsname=testfs --servicenode=192.168.10.20@tcp0 \
--servicenode=192.168.10.21@tcp0 --ost --index=0 \
844 --mgsnode=192.168.10.1@tcp0 --mgsnode=192.168.10.2@tcp0 \
846 oss0# mount -t lustre /dev/sdb /mnt/test/ost0
847 client# mount -t lustre 192.168.10.1@tcp0:192.168.10.2@tcp0:/testfs \
mds0# umount /mnt/test/mdt
850 mds1# mount -t lustre /dev/sda1 /mnt/test/mdt
851 mds1# lctl get_param mdt.testfs-MDT0000.recovery_status
853 <para>Where multiple NIDs are specified separated by commas (for example,
854 <literal>10.67.73.200@tcp,192.168.10.1@tcp</literal>), the two NIDs refer
855 to the same host, and the Lustre software chooses the
856 <emphasis>best</emphasis> one for communication. When a pair of NIDs is
857 separated by a colon (for example,
858 <literal>10.67.73.200@tcp:10.67.73.201@tcp</literal>), the two NIDs refer
859 to two different hosts and are treated as a failover pair (the Lustre
860 software tries the first one, and if that fails, it tries the second
863 <literal>mkfs.lustre</literal> can be used to specify failover nodes. The
864 <literal>--servicenode</literal> option is used to specify all service NIDs,
865 including those for primary nodes and failover nodes. When the
866 <literal>--servicenode</literal> option is used, the first service node to
867 load the target device becomes the primary service node, while nodes
868 corresponding to the other specified NIDs become failover locations for the
869 target device. An older option, <literal>--failnode</literal>, specifies
870 just the NIDs of failover nodes. For more information about the
871 <literal>--servicenode</literal> and
872 <literal>--failnode</literal> options, see
873 <xref xmlns:xlink="http://www.w3.org/1999/xlink"
874 linkend="configuringfailover" />.</para>
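<para>The separator rules above can be illustrated with a small shell
sketch. The function name <literal>split_nid_spec</literal> is hypothetical
and is not part of the Lustre tools. It only splits a string on colons and
commas to show which NIDs belong to the same host and which belong to a
failover peer, and it assumes NIDs that do not themselves contain colons,
commas, or whitespace. The combined NID string is made up from the examples
above.</para>
<screen>
# Hypothetical helper (not a Lustre command): group the NIDs in a NID list.
split_nid_spec() {
    host=0
    # Colons separate different hosts (failover peers).
    for group in $(printf '%s' "$1" | tr ':' ' '); do
        host=$((host + 1))
        # Commas separate alternative NIDs of the same host.
        for nid in $(printf '%s' "$group" | tr ',' ' '); do
            printf 'host %d: %s\n' "$host" "$nid"
        done
    done
}

split_nid_spec "10.67.73.200@tcp,192.168.10.1@tcp:10.67.73.201@tcp"
# host 1: 10.67.73.200@tcp
# host 1: 192.168.10.1@tcp
# host 2: 10.67.73.201@tcp
</screen>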
876 <section xml:id="dbdoclet.50438194_70905">
879 <primary>operations</primary>
880 <secondary>erasing a file system</secondary>
881 </indexterm>Erasing a File System</title>
882 <para>If you want to erase a file system and permanently delete all the
883 data in the file system, run this command on your targets:</para>
$ mkfs.lustre --reformat
887 <para>If you are using a separate MGS and want to keep other file systems
888 defined on that MGS, then set the
889 <literal>writeconf</literal> flag on the MDT for that file system. The
890 <literal>writeconf</literal> flag causes the configuration logs to be
891 erased; they are regenerated the next time the servers start.</para>
893 <literal>writeconf</literal> flag on the MDT:</para>
<para>Unmount all clients and servers that use this file system by running:</para>
<para>Permanently erase the file system and, if desired, replace it
with another file system by running:</para>
905 $ mkfs.lustre --reformat --fsname spfs --mgs --mdt --index=0 /dev/
906 <emphasis>{mdsdev}</emphasis>
910 <para>If you have a separate MGS (that you do not want to reformat),
912 <literal>--writeconf</literal> flag to
913 <literal>mkfs.lustre</literal> on the MDT, run:</para>
915 $ mkfs.lustre --reformat --writeconf --fsname spfs --mgsnode=
916 <replaceable>mgs_nid</replaceable> --mdt --index=0
917 <replaceable>/dev/mds_device</replaceable>
<para>If you have a combined MGS/MDT, reformatting the MDT reformats the
MGS as well, causing all configuration information to be lost; you can
then start building your new file system. Nothing needs to be done with old
disks that will not be part of the new file system; just do not mount
929 <section xml:id="dbdoclet.50438194_16954">
932 <primary>operations</primary>
933 <secondary>reclaiming space</secondary>
934 </indexterm>Reclaiming Reserved Disk Space</title>
935 <para>All current Lustre installations run the ldiskfs file system
936 internally on service nodes. By default, ldiskfs reserves 5% of the disk
937 space to avoid file system fragmentation. In order to reclaim this space,
938 run the following command on your OSS for each OST in the file
941 tune2fs [-m reserved_blocks_percent] /dev/
942 <emphasis>{ostdev}</emphasis>
944 <para>You do not need to shut down Lustre before running this command or
945 restart it afterwards.</para>
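<para>Before changing the reservation, it can help to check how much is
currently reserved. The following is a minimal sketch that assumes the
<literal>Block count:</literal> and
<literal>Reserved block count:</literal> fields printed by
<literal>tune2fs -l</literal>. The function name
<literal>reserved_percent</literal> is hypothetical, and the sample values
are for illustration only.</para>
<screen>
# Hypothetical helper: read "tune2fs -l" output on standard input and
# print the reserved blocks as a percentage of all blocks.
reserved_percent() {
    awk -F: '
        /^Block count:/          { total = $2 + 0 }
        /^Reserved block count:/ { reserved = $2 + 0 }
        END { if (total > 0) printf "%.1f\n", reserved * 100 / total }
    '
}

# Sample input standing in for real output (not taken from a real OST).
printf 'Block count:              1000000\nReserved block count:     50000\n' |
    reserved_percent
# 5.0

# On an OSS, the equivalent would be:
#   tune2fs -l /dev/{ostdev} | reserved_percent
</screen>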
947 <para>Reducing the space reservation can cause severe performance
948 degradation as the OST file system becomes more than 95% full, due to
949 difficulty in locating large areas of contiguous free space. This
950 performance degradation may persist even if the space usage drops below
951 95% again. It is recommended NOT to reduce the reserved disk space below
955 <section xml:id="dbdoclet.50438194_69998">
958 <primary>operations</primary>
959 <secondary>replacing an OST or MDS</secondary>
960 </indexterm>Replacing an Existing OST or MDT</title>
<para>To copy the contents of an existing OST to a new OST (or an old MDT
to a new MDT), follow one of the OST/MDT backup processes described in
<xref linkend='dbdoclet.backup_device' /> or
<xref linkend='backup_fs_level' />.
For more information on removing an MDT, see
<xref linkend='lustremaint.rmremotedir' />.</para>
968 <section xml:id="dbdoclet.50438194_30872">
971 <primary>operations</primary>
972 <secondary>identifying OSTs</secondary>
973 </indexterm>Identifying To Which Lustre File an OST Object Belongs</title>
974 <para>Use this procedure to identify the file containing a given object on
978 <para>On the OST (as root), run
979 <literal>debugfs</literal> to display the file identifier (
980 <literal>FID</literal>) of the file associated with the object.</para>
981 <para>For example, if the object is
982 <literal>34976</literal> on
<literal>/dev/lustre/ost_test2</literal>, the <literal>debugfs</literal> command is:
985 # debugfs -c -R "stat /O/0/d$((34976 % 32))/34976" /dev/lustre/ost_test2
987 <para>The command output is:
989 debugfs 1.45.6.wc1 (20-Mar-2020)
990 /dev/lustre/ost_test2: catastrophic mode - not reading inode or group bitmaps
991 Inode: 352365 Type: regular Mode: 0666 Flags: 0x80000
992 Generation: 2393149953 Version: 0x0000002a:00005f81
993 User: 1000 Group: 1000 Size: 260096
994 File ACL: 0 Directory ACL: 0
995 Links: 1 Blockcount: 512
996 Fragment: Address: 0 Number: 0 Size: 0
997 ctime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
998 atime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
999 mtime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009
1000 crtime: 0x4a216b3c:975870dc -- Sat May 30 13:22:04 2009
1001 Size of extra inode fields: 24
1002 Extended attributes stored in inode body:
1003 fid = "b9 da 24 00 00 00 00 00 6a fa 0d 3f 01 00 00 00 eb 5b 0b 00 00 00 0000
1004 00 00 00 00 00 00 00 00 " (32)
1005 fid: objid=34976 seq=0 parent=[0x200000400:0x122:0x0] stripe=1
1007 (0-64):4620544-4620607
<para>The parent FID will be of the form
<literal>[0x200000400:0x122:0x0]</literal> and can be resolved directly
using the command <literal>lfs fid2path [0x200000400:0x122:0x0]
/mnt/lustre</literal> on any Lustre client, and the process is
1018 <para>In cases of an upgraded 1.x inode (if the first part of the
1019 FID is below 0x200000400), the MDT inode number is
1020 <literal>0x24dab9</literal> and generation
1021 <literal>0x3f0dfa6a</literal> and the pathname can also be resolved
1023 <literal>debugfs</literal>.</para>
1026 <para>On the MDS (as root), use
1027 <literal>debugfs</literal> to find the file associated with the
1030 # debugfs -c -R "ncheck 0x24dab9" /dev/lustre/mdt_test
1032 <para>Here is the command output:</para>
1034 debugfs 1.42.3.wc3 (15-Aug-2012)
1035 /dev/lustre/mdt_test: catastrophic mode - not reading inode or group bitmap\
1038 2415289 /ROOT/brian-laptop-guest/clients/client11/~dmtmp/PWRPNT/ZD16.BMP
1042 <para>The command lists the inode and pathname associated with the
The <literal>debugfs</literal> <literal>ncheck</literal> command is a
brute-force search that may take a long time to complete.</para>
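<para>For the upgraded 1.x case, the inode number passed to
<literal>ncheck</literal> and the generation appear to be the leading
little-endian fields of the <literal>fid</literal> extended attribute
printed by <literal>debugfs</literal> on the OST. The sketch below reverses
those bytes. It assumes an 8-byte inode number followed by a 4-byte
generation, which is inferred from the example values in this section rather
than from a format specification. The function name
<literal>le_hex</literal> is hypothetical.</para>
<screen>
# Hypothetical helper: print little-endian bytes (given as separate
# two-digit hex arguments) as a single hexadecimal number.
le_hex() {
    out=""
    for byte in "$@"; do
        out="$byte$out"    # prepend, so the last byte is most significant
    done
    # Let printf reparse the value so that leading zeros are dropped.
    printf '0x%x\n' "0x$out"
}

le_hex b9 da 24 00 00 00 00 00    # inode number: 0x24dab9
le_hex 6a fa 0d 3f                # generation:   0x3f0dfa6a
</screen>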
1050 <para>To find the Lustre file from a disk LBA, follow the steps listed in
1051 the document at this URL:
1052 <link xl:href="https://www.smartmontools.org/wiki/BadBlockHowto">
1053 https://www.smartmontools.org/wiki/BadBlockHowto</link>. Then,
1054 follow the steps above to resolve the Lustre filename.</para>
1059 vim:expandtab:shiftwidth=2:tabstop=8: