1 <?xml version='1.0' encoding='UTF-8'?><chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="flr"
3 <title xml:id="flr.title">File Level Redundancy (FLR)</title>
4 <para>This chapter describes File Level Redundancy (FLR).</para>
5 <section xml:id="flr.intro">
6 <title>Introduction</title>
7 <para> The Lustre file system was initially designed and implemented for HPC
8 use. It has been working well on high-end storage that has internal
9 redundancy and fault-tolerance. However, despite the expense and
10 complexity of these storage systems, storage failures still occur, and
11 before Release 2.11, Lustre could not be more reliable than the
12 individual storage and server components on which it was based. The
13 Lustre file system had no mechanism to mitigate storage hardware
14 failures, and files would become inaccessible if a server was unavailable
15 or otherwise out of service.</para>
16 <para>With the File Level Redundancy (FLR) feature introduced in Lustre
17 Release 2.11, any Lustre file can store the same data on multiple OSTs in
18 order for the system to be robust in the event of storage failures or
19 other outages. With the choice of multiple mirrors, the best-suited
20 mirror can be chosen to satisfy an individual request, which has a direct
21 impact on I/O availability. Furthermore, for files that are concurrently
22 read by many clients (e.g. input decks, shared libraries, or executables)
23 the aggregate parallel read performance of a single file can be improved
24 by creating multiple mirrors of the file data.</para>
25 <para>The first phase of the FLR feature has been implemented with delayed
26 write (<xref linkend="flr.delayedwrite.fig"/>). While writing to a
27 mirrored file, only one primary or preferred mirror will be updated
28 directly during the write, while the other mirrors will simply be marked as
29 stale. The file can subsequently return to a fully mirrored state by
30 synchronizing the mirrors with command line tools (run by the user or
31 administrator directly or via automated monitoring tools).</para>
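<para>The delayed-write behavior described above can be illustrated with a small conceptual model. The following Python sketch is hypothetical and is not Lustre code. The class <literal>ToyMirroredFile</literal> and its methods are invented for illustration, and the model tracks whole mirrors only, ignoring components, extents, and locking. It assumes a write goes to the preferred mirror if that mirror is in sync, and otherwise to the first in-sync mirror.</para>

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyMirroredFile:
    """Hypothetical model of FLR delayed write; not Lustre's implementation."""

    mirror_ids: list[int]
    preferred: int | None = None  # mirror carrying the "prefer" flag, if any
    stale: set[int] = field(default_factory=set)

    def in_sync(self) -> list[int]:
        """Mirrors that still hold up-to-date data."""
        return [m for m in self.mirror_ids if m not in self.stale]

    def write(self) -> int:
        """Update one in-sync mirror directly and mark all others stale."""
        candidates = self.in_sync()
        if not candidates:
            raise RuntimeError("no in-sync mirror available for writing")
        target = self.preferred if self.preferred in candidates else candidates[0]
        self.stale = {m for m in self.mirror_ids if m != target}
        return target

    def resync(self, only: list[int] | None = None) -> list[int]:
        """Bring stale mirrors (all of them, or just `only`) back in sync."""
        chosen = self.stale if only is None else self.stale.intersection(only)
        resynced = sorted(chosen)
        self.stale.difference_update(resynced)
        return resynced


if __name__ == "__main__":
    f = ToyMirroredFile([1, 2, 3])
    print(f.write(), sorted(f.stale))   # mirror 1 written; 2 and 3 stale
    print(f.resync(only=[2]), sorted(f.stale))
    print(f.resync(), sorted(f.stale))
```

<para>With three mirrors and no preferred mirror, a write updates mirror 1 and leaves mirrors 2 and 3 stale. Resyncing only mirror 2 and then all remaining mirrors follows the same sequence as the resynchronization examples later in this chapter.</para>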
32 <figure xml:id="flr.delayedwrite.fig">
33 <title>FLR Delayed Write</title>
36 <imagedata scalefit="1" width="50%"
37 fileref="figures/FLR_DelayedWrite.png" />
39 <textobject><phrase>FLR Delayed Write Diagram</phrase></textobject>
43 <section xml:id="flr.operations">
44 <title>Operations</title>
45 <para>Lustre provides <literal>lfs mirror</literal> command line tools for
46 users to operate on mirrored files or directories.</para>
47 <section xml:id="flr.operations.createmirror">
48 <title>Creating a Mirrored File or Directory</title>
49 <para><emphasis role="strong">Command:</emphasis></para>
50 <screen>lfs mirror create <--mirror-count|-N[mirror_count]
51 [setstripe_options|[--flags<=flags>]]> ... <filename|directory></screen>
52 <para>The above command will create a mirrored file or directory specified
53 by <replaceable>filename</replaceable> or
54 <replaceable>directory</replaceable>, respectively.</para>
57 <colspec align="left" />
58 <colspec align="left" />
62 <entry>Description</entry>
67 <entry>--mirror-count|-N[mirror_count]</entry>
69 <para>Indicates the number of mirrors to be created with the
70 following setstripe options. It can be repeated multiple
71 times to separate mirrors that have different layouts.
73 <para>The <replaceable>mirror_count</replaceable> argument is
74 optional and defaults to <literal>1</literal> if it is not
75 specified; if specified, it must follow the option without a
80 <entry>setstripe_options</entry>
82 <para>Specifies the layout for the mirror. It can be a
83 plain layout with a specific striping pattern or a composite
84 layout, such as <xref linkend="pfl"/>. The options are
85 the same as those for the <literal>lfs setstripe</literal>
87 <para>If <replaceable>setstripe_options</replaceable> are not
88 specified, then the stripe options inherited from the previous
89 component will be used. If there is no previous component,
90 then the <literal>stripe_count</literal> and
91 <literal>stripe_size</literal> options inherited from the
92 filesystem-wide default values will be used, and the OST
93 <literal>pool_name</literal> inherited from the parent
94 directory will be used.</para>
98 <entry>--flags<=flags></entry>
100 <para>Sets flags on the mirror to be created.</para>
101 <para>Only the <literal>prefer</literal> flag is supported at
102 this time. This flag will be set on all components that belong
103 to the corresponding mirror. The <literal>prefer</literal>
104 flag gives Lustre a hint about which mirrors should be used
105 to serve I/O. When a mirrored file is being read, the
106 component(s) with the <literal>prefer</literal> flag are likely
107 to be picked to serve the read; when a mirrored file is being
108 prepared for writing, the MDT will tend to choose the
109 component with the <literal>prefer</literal> flag set and
110 mark the other components with overlapping extents as stale.
111 Because this flag is only a hint, Lustre
112 may still choose mirrors without this flag set, for instance
113 if all preferred mirrors are unavailable when the I/O occurs.
114 This flag can be set on multiple components.</para>
115 <para><emphasis role="strong">Note:</emphasis> This flag will
116 be set on all components that belong to the corresponding
117 mirror. The <literal>--comp-flags</literal> option also
118 exists, which sets flags on individual components at mirror
119 creation time.</para>
125 <para><emphasis role="strong">Note:</emphasis> For redundancy and
126 fault-tolerance, users need to make sure that different mirrors are
127 on different OSTs, and ideally on different OSSs and racks. An understanding
128 of cluster topology is necessary to achieve this architecture. In the initial
129 implementation, the existing OST pools mechanism allows OSTs to be
130 separated by arbitrary criteria, e.g. fault domain.
131 In practice, users can take advantage of OST pools by grouping OSTs
132 by topological information. Therefore, when creating a mirrored file,
133 users can indicate which OST pools can be used by mirrors.</para>
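<para>As a sketch of this approach, the hypothetical Python helper below builds an <literal>lfs mirror create</literal> command line that places each mirror in a different OST pool, using only the <literal>-N</literal>, <literal>-p</literal>, and <literal>-c</literal> options documented in this section. The pool names <literal>flash</literal> and <literal>archive</literal> are taken from the examples in this chapter. Distinct pool names only help if the pools themselves are configured along fault-domain boundaries.</para>

```python
from __future__ import annotations

import shlex


def mirror_create_argv(path: str, pools: list[str],
                       stripe_count: int | None = None) -> list[str]:
    """Return argv for `lfs mirror create` with one mirror per OST pool."""
    if len(set(pools)) != len(pools):
        # Two mirrors in the same pool may share OSTs, defeating redundancy.
        raise ValueError("each mirror should use a distinct OST pool")
    argv = ["lfs", "mirror", "create"]
    for pool in pools:
        argv += ["-N", "-p", pool]  # one mirror, allocated from this pool
        if stripe_count is not None:
            argv += ["-c", str(stripe_count)]
    argv.append(path)
    return argv


if __name__ == "__main__":
    # Print the command instead of running it; on a Lustre client the list
    # could be passed to subprocess.run(argv, check=True).
    print(shlex.join(mirror_create_argv("/mnt/testfs/file1", ["flash", "archive"])))
```

<para>Rejecting repeated pool names is a choice made by this sketch, not a restriction imposed by <literal>lfs</literal>.</para>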
134 <para><emphasis role="strong">Examples:</emphasis></para>
135 <para>The following command creates a mirrored file with 2 plain layout
137 <screen>client# lfs mirror create -N -S 4M -c 2 -p flash \
138 -N -c -1 -p archive /mnt/testfs/file1</screen>
139 <para>The following command displays the layout information of the
140 mirrored file <literal>/mnt/testfs/file1</literal>:</para>
141 <screen>client# lfs getstripe /mnt/testfs/file1
149 lcme_extent.e_start: 0
150 lcme_extent.e_end: EOF
152 lmm_stripe_size: 4194304
158 - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x2:0x0] }
159 - 1: { l_ost_idx: 0, l_fid: [0x100000000:0x2:0x0] }
164 lcme_extent.e_start: 0
165 lcme_extent.e_end: EOF
167 lmm_stripe_size: 4194304
173 - 0: { l_ost_idx: 3, l_fid: [0x100030000:0x2:0x0] }
174 - 1: { l_ost_idx: 4, l_fid: [0x100040000:0x2:0x0] }
175 - 2: { l_ost_idx: 5, l_fid: [0x100050000:0x2:0x0] }
176 - 3: { l_ost_idx: 6, l_fid: [0x100060000:0x2:0x0] }
177 - 4: { l_ost_idx: 7, l_fid: [0x100070000:0x2:0x0] }
178 - 5: { l_ost_idx: 2, l_fid: [0x100020000:0x2:0x0] }</screen>
179 <para>The first mirror has a 4MB stripe size and two stripes across OSTs in
180 the “flash” OST pool. The second mirror has a 4MB stripe size inherited
181 from the first mirror, and stripes across all of the available OSTs in
182 the “archive” OST pool.</para>
183 <para>As mentioned above, it is recommended to use the
184 <literal>--pool|-p</literal> option (one of the
185 <literal>lfs setstripe</literal> options) with OST pools configured with
186 independent fault domains to ensure different mirrors will be placed on
187 different OSTs, servers, and/or racks, thereby improving availability
188 and performance. If the setstripe options are not specified, it is
189 possible to create mirrors with objects on the same OST(s), which would
190 remove most of the benefit of using replication.</para>
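<para>One way to check for this after the fact is to compare the OST indices used by each mirror. The Python sketch below parses <literal>lfs getstripe</literal> output and reports any pair of mirrors that share an OST. It assumes the <literal>lcme_mirror_id:</literal> and <literal>l_ost_idx:</literal> fields appear as in the examples in this chapter. The function names are hypothetical, and the parser is not an official interface to the layout.</para>

```python
from __future__ import annotations

import re
from collections import defaultdict


def mirror_osts(getstripe_output: str) -> dict[int, set[int]]:
    """Map each lcme_mirror_id to the set of OST indices its components use."""
    osts: dict[int, set[int]] = defaultdict(set)
    mirror = None
    for line in getstripe_output.splitlines():
        found = re.search(r"lcme_mirror_id:\s*(\d+)", line)
        if found:
            mirror = int(found.group(1))  # following objects belong to this mirror
            continue
        found = re.search(r"l_ost_idx:\s*(\d+)", line)
        if found and mirror is not None:
            osts[mirror].add(int(found.group(1)))
    return dict(osts)


def shared_osts(osts: dict[int, set[int]]) -> list[tuple[int, int, set[int]]]:
    """List (mirror_a, mirror_b, common_osts) for every pair sharing an OST."""
    ids = sorted(osts)
    overlaps = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            common = osts[a].intersection(osts[b])
            if common:
                overlaps.append((a, b, common))
    return overlaps
```

<para>On a client, the input could be captured with <literal>subprocess.run(["lfs", "getstripe", path], capture_output=True, text=True, check=True).stdout</literal>. An empty result is necessary but not sufficient for independence, because mirrors on different OSTs can still share an OSS or a rack.</para>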
191 <para>In the layout information printed by <literal>lfs getstripe</literal>,
192 <literal>lcme_mirror_id</literal> shows the mirror ID, which is the unique
193 numerical identifier for a mirror, and <literal>lcme_flags</literal> shows
194 the mirrored component flags. Valid flag names are:</para>
197 <para><literal>init</literal> - indicates the mirrored component has been
198 initialized (has allocated OST objects).</para>
201 <para><literal>stale</literal> - indicates the mirrored component does not
202 have up-to-date data. Stale components will not be used for read or
203 write operations, and need to be resynchronized by running the
204 <literal>lfs mirror resync</literal> command before they can be
205 accessed again.</para>
208 <para><literal>prefer</literal> - indicates the mirrored component is
209 preferred for read or write. For example, the mirror may be located on
210 SSD-based OSTs, or be closer to the client on the network (fewer hops).
211 This flag can be set by users at mirror creation time.</para>
214 <para>The following command creates a mirrored file with 3 PFL mirrors:
216 <screen>client# lfs mirror create -N -E 4M -p flash --flags=prefer -E eof -c 2 \
217 -N -E 16M -S 8M -c 4 -p archive --comp-flags=prefer -E eof -c -1 \
218 -N -E 32M -c 1 -p none -E eof -c -1 /mnt/testfs/file2</screen>
219 <para>The following command displays the layout information of the
220 mirrored file <literal>/mnt/testfs/file2</literal>:</para>
221 <screen>client# lfs getstripe /mnt/testfs/file2
228 lcme_flags: init,prefer
229 lcme_extent.e_start: 0
230 lcme_extent.e_end: 4194304
232 lmm_stripe_size: 1048576
238 - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x3:0x0] }
243 lcme_extent.e_start: 4194304
244 lcme_extent.e_end: EOF
246 lmm_stripe_size: 1048576
249 lmm_stripe_offset: -1
254 lcme_flags: init,prefer
255 lcme_extent.e_start: 0
256 lcme_extent.e_end: 16777216
258 lmm_stripe_size: 8388608
264 - 0: { l_ost_idx: 4, l_fid: [0x100040000:0x3:0x0] }
265 - 1: { l_ost_idx: 5, l_fid: [0x100050000:0x3:0x0] }
266 - 2: { l_ost_idx: 6, l_fid: [0x100060000:0x3:0x0] }
267 - 3: { l_ost_idx: 7, l_fid: [0x100070000:0x3:0x0] }
272 lcme_extent.e_start: 16777216
273 lcme_extent.e_end: EOF
275 lmm_stripe_size: 8388608
278 lmm_stripe_offset: -1
284 lcme_extent.e_start: 0
285 lcme_extent.e_end: 33554432
287 lmm_stripe_size: 8388608
292 - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x3:0x0] }
297 lcme_extent.e_start: 33554432
298 lcme_extent.e_end: EOF
300 lmm_stripe_size: 8388608
303 lmm_stripe_offset: -1</screen>
304 <para>For the first mirror, the first component inherits the stripe count
305 and stripe size from filesystem-wide default values. The second
306 component inherits the stripe size and OST pool from the first
307 component, and has two stripes. Both of the components are allocated
308 from the “flash” OST pool. Also, the flag <literal>prefer</literal> is
309 applied to all the components of the first mirror, which hints to the
310 client that it should read data from those components whenever they are available.
312 <para>For the second mirror, the first component has an 8MB stripe size
313 and 4 stripes across OSTs in the “archive” OST pool. The second
314 component inherits the stripe size and OST pool from the first
315 component, and stripes across all of the available OSTs in the “archive”
316 OST pool. The flag <literal>prefer</literal> is only applied to the
317 first component.</para>
318 <para>For the third mirror, the first component inherits the stripe size
319 of 8MB from the last component of the second mirror, and has a single
320 stripe. The OST pool name is cleared by <literal>-p none</literal>, so the pool is inherited from the parent
321 directory (if the parent directory has an OST pool set). The second component
322 inherits stripe size from the first component, and stripes across all of
323 the available OSTs.</para>
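<para>The inheritance rules described above can be sketched as a small Python function that walks the components of all mirrors in command-line order. This is a hypothetical model, not <literal>lfs</literal> source code. It assumes that stripe size, stripe count, and pool are each inherited independently from the previous component, and that the first component falls back to filesystem-wide defaults (1MB and one stripe here, matching the example output) and to the parent directory's pool. It also assumes that <literal>-p none</literal> resets the pool to the parent directory's pool.</para>

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stripe:
    size: int          # stripe size in bytes
    count: int         # stripe count; -1 means all available OSTs
    pool: str | None   # OST pool name, or None for no pool


def resolve_layout(components: list[dict],
                   default_size: int = 1024 * 1024,
                   default_count: int = 1,
                   parent_pool: str | None = None) -> list[Stripe]:
    """Fill in each component's unspecified options from the previous one."""
    prev = Stripe(default_size, default_count, parent_pool)
    resolved = []
    for opts in components:
        pool = opts.get("pool", prev.pool)
        if pool == "none":  # "-p none" drops the inherited pool
            pool = parent_pool
        cur = Stripe(opts.get("size", prev.size), opts.get("count", prev.count), pool)
        resolved.append(cur)
        prev = cur
    return resolved


if __name__ == "__main__":
    MiB = 1024 * 1024
    # Components of /mnt/testfs/file2 from the example above, in order.
    file2 = [
        {"pool": "flash"},                                # mirror 1: -E 4M -p flash
        {"count": 2},                                     #           -E eof -c 2
        {"size": 8 * MiB, "count": 4, "pool": "archive"}, # mirror 2: -E 16M ...
        {"count": -1},                                    #           -E eof -c -1
        {"count": 1, "pool": "none"},                     # mirror 3: -E 32M ...
        {"count": -1},                                    #           -E eof -c -1
    ]
    for stripe in resolve_layout(file2):
        print(stripe)
```

<para>The resolved stripe sizes (1MB, 1MB, then 8MB for the remaining four components) agree with the <literal>lmm_stripe_size</literal> values in the <literal>/mnt/testfs/file2</literal> output above.</para>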
325 <section xml:id="flr.operations.extendmirror">
326 <title>Extending a Mirrored File</title>
327 <para><emphasis role="strong">Command:</emphasis></para>
328 <screen>lfs mirror extend [--no-verify] <--mirror-count|-N[mirror_count]
329 [setstripe_options|-f <victim_file>]> ... <filename></screen>
330 <para>The above command will append the mirror(s) described by
331 <replaceable>setstripe_options</replaceable>, or the layout taken from the
332 existing file <replaceable>victim_file</replaceable>, to the file
333 <replaceable>filename</replaceable>. The
334 <replaceable>filename</replaceable> must be an existing file; however,
335 it can be either a mirrored or a regular non-mirrored file. If it is a
336 non-mirrored file, the command will convert it to a mirrored file.
340 <colspec align="left" />
341 <colspec align="left" />
344 <entry>Option</entry>
345 <entry>Description</entry>
350 <entry>--mirror-count|-N[mirror_count]</entry>
352 <para>Indicates the number of mirrors to be added with the
353 following <literal>setstripe options</literal>. It can be
354 repeated multiple times to separate mirrors that have
355 different layouts.</para>
356 <para>The <replaceable>mirror_count</replaceable> argument is
357 optional and defaults to <literal>1</literal> if it is not
358 specified; if specified, it must follow the option without a
363 <entry>setstripe_options</entry>
365 <para>Specifies the layout for the mirror. It can be a
366 plain layout with a specific striping pattern or a composite
367 layout, such as <xref linkend="pfl"/>. The options are the
368 same as those for the <literal>lfs setstripe</literal>
370 <para>If <replaceable>setstripe_options</replaceable> are not
371 specified, then the stripe options inherited from the previous
372 component will be used. If there is no previous component,
373 then the <literal>stripe_count</literal> and
374 <literal>stripe_size</literal> options inherited from the
375 filesystem-wide default values will be used, and the OST
376 <literal>pool_name</literal> inherited from the parent directory
381 <entry>-f <victim_file></entry>
383 <para>If <replaceable>victim_file</replaceable> exists, the
384 command will split the layout from that file and use it as a
385 mirror added to the mirrored file. After the command is
386 finished, the <replaceable>victim_file</replaceable> will be
388 <para><emphasis role="strong">Note</emphasis>: The
389 <replaceable>setstripe_options</replaceable> cannot be
390 specified together with the <literal>-f <victim_file></literal>
391 option in the same command line.</para>
395 <entry>--no-verify</entry>
396 <entry>If <replaceable>victim_file</replaceable> is specified, the
397 command will verify that the file contents of
398 <replaceable>victim_file</replaceable> are the same as those of
399 <replaceable>filename</replaceable>; if they differ, the command
400 will return a failure. The
401 <literal>--no-verify</literal> option can be used to skip this
402 verification. This option can save significant time on file
403 comparison if the file size is large, but use it only when the
404 file contents are known to be the same.</entry>
409 <para><emphasis role="strong">Note</emphasis>: The
410 <literal>lfs mirror extend</literal> operation won't be applied to the
412 <para><emphasis role="strong">Examples:</emphasis></para>
413 <para>The following commands create a non-mirrored file, convert it to
414 a mirrored file, and extend it with a plain layout mirror:</para>
415 <screen># lfs setstripe -p flash /mnt/testfs/file1
416 # lfs getstripe /mnt/testfs/file1
419 lmm_stripe_size: 1048576
424 obdidx objid objid group
427 # lfs mirror extend -N -S 8M -c -1 -p archive /mnt/testfs/file1
428 # lfs getstripe /mnt/testfs/file1
436 lcme_extent.e_start: 0
437 lcme_extent.e_end: EOF
439 lmm_stripe_size: 1048576
445 - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x4:0x0] }
450 lcme_extent.e_start: 0
451 lcme_extent.e_end: EOF
453 lmm_stripe_size: 8388608
459 - 0: { l_ost_idx: 3, l_fid: [0x100030000:0x3:0x0] }
460 - 1: { l_ost_idx: 4, l_fid: [0x100040000:0x4:0x0] }
461 - 2: { l_ost_idx: 5, l_fid: [0x100050000:0x4:0x0] }
462 - 3: { l_ost_idx: 6, l_fid: [0x100060000:0x4:0x0] }
463 - 4: { l_ost_idx: 7, l_fid: [0x100070000:0x4:0x0] }
464 - 5: { l_ost_idx: 2, l_fid: [0x100020000:0x3:0x0] }</screen>
465 <para>The following commands split the PFL layout from a
466 <replaceable>victim_file</replaceable> and use it as a mirror added to
467 the mirrored file <literal>/mnt/testfs/file1</literal> created in the
468 above example without data verification:</para>
469 <screen># lfs setstripe -E 16M -c 2 -p none \
470 -E eof -c -1 /mnt/testfs/victim_file
471 # lfs getstripe /mnt/testfs/victim_file
472 /mnt/testfs/victim_file
479 lcme_extent.e_start: 0
480 lcme_extent.e_end: 16777216
482 lmm_stripe_size: 1048576
487 - 0: { l_ost_idx: 5, l_fid: [0x100050000:0x5:0x0] }
488 - 1: { l_ost_idx: 6, l_fid: [0x100060000:0x5:0x0] }
493 lcme_extent.e_start: 16777216
494 lcme_extent.e_end: EOF
496 lmm_stripe_size: 1048576
499 lmm_stripe_offset: -1
501 # lfs mirror extend --no-verify -N -f /mnt/testfs/victim_file \
503 # lfs getstripe /mnt/testfs/file1
511 lcme_extent.e_start: 0
512 lcme_extent.e_end: EOF
514 lmm_stripe_size: 1048576
520 - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x4:0x0] }
525 lcme_extent.e_start: 0
526 lcme_extent.e_end: EOF
528 lmm_stripe_size: 8388608
534 - 0: { l_ost_idx: 3, l_fid: [0x100030000:0x3:0x0] }
535 - 1: { l_ost_idx: 4, l_fid: [0x100040000:0x4:0x0] }
536 - 2: { l_ost_idx: 5, l_fid: [0x100050000:0x4:0x0] }
537 - 3: { l_ost_idx: 6, l_fid: [0x100060000:0x4:0x0] }
538 - 4: { l_ost_idx: 7, l_fid: [0x100070000:0x4:0x0] }
539 - 5: { l_ost_idx: 2, l_fid: [0x100020000:0x3:0x0] }
544 lcme_extent.e_start: 0
545 lcme_extent.e_end: 16777216
547 lmm_stripe_size: 1048576
552 - 0: { l_ost_idx: 5, l_fid: [0x100050000:0x5:0x0] }
553 - 1: { l_ost_idx: 6, l_fid: [0x100060000:0x5:0x0] }
558 lcme_extent.e_start: 16777216
559 lcme_extent.e_end: EOF
561 lmm_stripe_size: 1048576
564 lmm_stripe_offset: -1</screen>
565 <para>After extending, the <replaceable>victim_file</replaceable> was
567 <screen># ls /mnt/testfs/victim_file
568 ls: cannot access /mnt/testfs/victim_file: No such file or directory</screen>
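<para>The option rules for <literal>lfs mirror extend</literal> described in this section can be captured in a small command builder. The Python helper below is a hypothetical sketch. It rejects combining setstripe options with <literal>-f</literal>, as the note on that option requires. As a design choice of this sketch only, it also treats <literal>--no-verify</literal> without a victim file as a caller error.</para>

```python
from __future__ import annotations


def mirror_extend_argv(filename: str,
                       setstripe_opts: list[str] | None = None,
                       victim_file: str | None = None,
                       no_verify: bool = False,
                       mirror_count: int = 1) -> list[str]:
    """Build argv for `lfs mirror extend` following the documented rules."""
    if victim_file is not None and setstripe_opts:
        raise ValueError("setstripe options cannot be combined with -f victim_file")
    if no_verify and victim_file is None:
        # Sketch-specific check: --no-verify only matters when -f is used.
        raise ValueError("--no-verify is only meaningful with a victim file")
    argv = ["lfs", "mirror", "extend"]
    if no_verify:
        argv.append("--no-verify")
    # "-N" adds one mirror; "-N2", "-N3", ... add several (no space allowed).
    argv.append("-N" if mirror_count == 1 else f"-N{mirror_count}")
    if victim_file is not None:
        argv += ["-f", victim_file]
    else:
        argv += list(setstripe_opts or [])
    argv.append(filename)
    return argv


if __name__ == "__main__":
    # The two extend commands from the examples above.
    print(mirror_extend_argv("/mnt/testfs/file1",
                             ["-S", "8M", "-c", "-1", "-p", "archive"]))
    print(mirror_extend_argv("/mnt/testfs/file1",
                             victim_file="/mnt/testfs/victim_file",
                             no_verify=True))
```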
570 <section xml:id="flr.operations.splitmirror">
571 <title>Splitting a Mirrored File</title>
572 <para><emphasis role="strong">Command:</emphasis></para>
573 <screen>lfs mirror split <--mirror-id <mirror_id>>
574 [--destroy|-d] [-f <new_file>] <mirrored_file></screen>
575 <para>The above command will split a specified mirror with ID
576 <replaceable><mirror_id></replaceable> out of an existing mirrored
578 <replaceable>mirrored_file</replaceable>. By default, a new file named
579 <literal><mirrored_file>.mirror~<mirror_id></literal> will
580 be created with the layout of the split mirror. If the
581 <literal>--destroy|-d</literal> option is specified, then the split
582 mirror will be destroyed. If the <literal>-f <new_file></literal>
583 option is specified, then a file named
584 <replaceable>new_file</replaceable> will be created with the layout of
585 the split mirror. If only one mirror remains in
586 <replaceable>mirrored_file</replaceable> after the split, it will be converted to a regular
587 non-mirrored file. If the original
588 <replaceable>mirrored_file</replaceable> is not a mirrored file, then
589 the command will return an error.</para>
592 <colspec align="left" />
593 <colspec align="left" />
596 <entry>Option</entry>
597 <entry>Description</entry>
602 <entry>--mirror-id <mirror_id></entry>
603 <entry>The unique numerical identifier for a mirror. The mirror
604 ID is unique within a mirrored file and is automatically
605 assigned at file creation or extension time. It can be obtained
606 with the <literal>lfs getstripe</literal> command.
610 <entry>--destroy|-d</entry>
611 <entry>Indicates the split mirror will be destroyed.</entry>
614 <entry>-f <new_file></entry>
615 <entry>Indicates a file named <replaceable>new_file</replaceable>
616 will be created with the layout of the split mirror.</entry>
621 <para><emphasis role="strong">Examples:</emphasis></para>
622 <para>The following commands create a mirrored file with 4 mirrors, then
623 split 3 mirrors separately from the mirrored file.</para>
624 <para>Creating a mirrored file with 4 mirrors:</para>
625 <screen># lfs mirror create -N2 -E 4M -p flash -E eof -c -1 \
626 -N2 -S 8M -c 2 -p archive /mnt/testfs/file1
627 # lfs getstripe /mnt/testfs/file1
635 lcme_extent.e_start: 0
636 lcme_extent.e_end: 4194304
638 lmm_stripe_size: 1048576
644 - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x4:0x0] }
649 lcme_extent.e_start: 4194304
650 lcme_extent.e_end: EOF
652 lmm_stripe_size: 1048576
655 lmm_stripe_offset: -1
661 lcme_extent.e_start: 0
662 lcme_extent.e_end: 4194304
664 lmm_stripe_size: 1048576
670 - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x5:0x0] }
675 lcme_extent.e_start: 4194304
676 lcme_extent.e_end: EOF
678 lmm_stripe_size: 1048576
681 lmm_stripe_offset: -1
687 lcme_extent.e_start: 0
688 lcme_extent.e_end: EOF
690 lmm_stripe_size: 8388608
696 - 0: { l_ost_idx: 4, l_fid: [0x100040000:0x5:0x0] }
697 - 1: { l_ost_idx: 5, l_fid: [0x100050000:0x6:0x0] }
702 lcme_extent.e_start: 0
703 lcme_extent.e_end: EOF
705 lmm_stripe_size: 8388608
711 - 0: { l_ost_idx: 7, l_fid: [0x100070000:0x5:0x0] }
712 - 1: { l_ost_idx: 2, l_fid: [0x100020000:0x4:0x0] }</screen>
713 <para>Splitting the mirror with ID <literal>1</literal> from
714 <literal>/mnt/testfs/file1</literal> and creating
715 <literal>/mnt/testfs/file1.mirror~1</literal> with the layout of the
717 <screen># lfs mirror split --mirror-id 1 /mnt/testfs/file1
718 # lfs getstripe /mnt/testfs/file1.mirror~1
719 /mnt/testfs/file1.mirror~1
726 lcme_extent.e_start: 0
727 lcme_extent.e_end: 4194304
729 lmm_stripe_size: 1048576
735 - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x4:0x0] }
740 lcme_extent.e_start: 4194304
741 lcme_extent.e_end: EOF
743 lmm_stripe_size: 1048576
746 lmm_stripe_offset: -1
747 lmm_pool: flash</screen>
748 <para>Splitting the mirror with ID <literal>2</literal> from
749 <literal>/mnt/testfs/file1</literal> and destroying it:</para>
750 <screen># lfs mirror split --mirror-id 2 -d /mnt/testfs/file1
751 # lfs getstripe /mnt/testfs/file1
759 lcme_extent.e_start: 0
760 lcme_extent.e_end: EOF
762 lmm_stripe_size: 8388608
768 - 0: { l_ost_idx: 4, l_fid: [0x100040000:0x5:0x0] }
769 - 1: { l_ost_idx: 5, l_fid: [0x100050000:0x6:0x0] }
774 lcme_extent.e_start: 0
775 lcme_extent.e_end: EOF
777 lmm_stripe_size: 8388608
783 - 0: { l_ost_idx: 7, l_fid: [0x100070000:0x5:0x0] }
784 - 1: { l_ost_idx: 2, l_fid: [0x100020000:0x4:0x0] }</screen>
785 <para>Splitting the mirror with ID <literal>3</literal> from
786 <literal>/mnt/testfs/file1</literal> and creating
787 <literal>/mnt/testfs/file2</literal> with the layout of the split
789 <screen># lfs mirror split --mirror-id 3 -f /mnt/testfs/file2 \
791 # lfs getstripe /mnt/testfs/file2
799 lcme_extent.e_start: 0
800 lcme_extent.e_end: EOF
802 lmm_stripe_size: 8388608
808 - 0: { l_ost_idx: 4, l_fid: [0x100040000:0x5:0x0] }
809 - 1: { l_ost_idx: 5, l_fid: [0x100050000:0x6:0x0] }
811 # lfs getstripe /mnt/testfs/file1
819 lcme_extent.e_start: 0
820 lcme_extent.e_end: EOF
822 lmm_stripe_size: 8388608
828 - 0: { l_ost_idx: 7, l_fid: [0x100070000:0x5:0x0] }
829 - 1: { l_ost_idx: 2, l_fid: [0x100020000:0x4:0x0] }</screen>
830 <para>The above layout information shows that the mirrors with IDs
831 <literal>1</literal>, <literal>2</literal>, and <literal>3</literal> were all split from the mirrored file
832 <literal>/mnt/testfs/file1</literal>.</para>
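<para>The naming and option rules of <literal>lfs mirror split</literal> can be summarized in a short Python sketch. The helper names are hypothetical. The default file name follows the documented <literal>mirrored_file.mirror~mirror_id</literal> pattern. Rejecting <literal>-d</literal> together with <literal>-f</literal> is a choice of this sketch, because the two options ask for opposite outcomes.</para>

```python
from __future__ import annotations


def split_result_path(mirrored_file: str, mirror_id: int,
                      destroy: bool = False,
                      new_file: str | None = None) -> str | None:
    """Path that will hold the split mirror, or None if it is destroyed."""
    if destroy:
        return None
    if new_file is not None:
        return new_file
    return f"{mirrored_file}.mirror~{mirror_id}"  # documented default name


def mirror_split_argv(mirrored_file: str, mirror_id: int,
                      destroy: bool = False,
                      new_file: str | None = None) -> list[str]:
    """Build argv for `lfs mirror split`."""
    if destroy and new_file is not None:
        # Sketch-specific: destroying and keeping the mirror are contradictory.
        raise ValueError("use either --destroy or -f new_file, not both")
    argv = ["lfs", "mirror", "split", "--mirror-id", str(mirror_id)]
    if destroy:
        argv.append("-d")
    if new_file is not None:
        argv += ["-f", new_file]
    argv.append(mirrored_file)
    return argv


if __name__ == "__main__":
    print(split_result_path("/mnt/testfs/file1", 1))
    print(mirror_split_argv("/mnt/testfs/file1", 3, new_file="/mnt/testfs/file2"))
```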
834 <section xml:id="flr.operations.resyncmirror">
835 <title>Resynchronizing out-of-sync Mirrored File(s)</title>
836 <para><emphasis role="strong">Command:</emphasis></para>
837 <screen>lfs mirror resync [--only <mirror_id[,...]>]
838 <mirrored_file> [<mirrored_file2>...]</screen>
839 <para>The above command will resynchronize out-of-sync mirrored file(s)
840 specified by <replaceable>mirrored_file</replaceable>. It
841 supports specifying multiple mirrored files in one command line.</para>
842 <para>If there is no stale mirror for the specified mirrored file(s), then
843 the command does nothing. Otherwise, it will copy data from a synced
844 mirror to the stale mirror(s) and mark all successfully copied
845 mirror(s) as SYNC. If the
846 <literal>--only <mirror_id[,...]></literal> option is specified,
847 then the command will only resynchronize the mirror(s) specified by the
848 <replaceable>mirror_id(s)</replaceable>. This option cannot be used when
849 multiple mirrored files are specified.</para>
852 <colspec align="left" />
853 <colspec align="left" />
856 <entry>Option</entry>
857 <entry>Description</entry>
862 <entry>--only <mirror_id[,...]></entry>
863 <entry>Indicates which mirror(s) specified by
864 <replaceable>mirror_id(s)</replaceable> need to be
865 resynchronized. The <replaceable>mirror_id</replaceable> is the
866 unique numerical identifier for a mirror. Multiple
867 <replaceable>mirror_ids</replaceable> are separated by commas.
868 This option cannot be used when multiple mirrored files are
874 <para><emphasis role="strong">Note:</emphasis> With delayed write
875 implemented in FLR phase 1, after writing to a mirrored file, users
876 need to run the <literal>lfs mirror resync</literal> command to get all
877 mirrors synchronized.</para>
878 <para><emphasis role="strong">Examples:</emphasis></para>
879 <para>The following commands create a mirrored file with 3 mirrors, then
880 write some data into the file and resynchronize the stale mirrors.</para>
881 <para>Creating a mirrored file with 3 mirrors:</para>
882 <screen># lfs mirror create -N -E 4M -p flash -E eof \
883 -N2 -p archive /mnt/testfs/file1
884 # lfs getstripe /mnt/testfs/file1
892 lcme_extent.e_start: 0
893 lcme_extent.e_end: 4194304
895 lmm_stripe_size: 1048576
901 - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x5:0x0] }
906 lcme_extent.e_start: 4194304
907 lcme_extent.e_end: EOF
909 lmm_stripe_size: 1048576
912 lmm_stripe_offset: -1
918 lcme_extent.e_start: 0
919 lcme_extent.e_end: EOF
921 lmm_stripe_size: 1048576
927 - 0: { l_ost_idx: 3, l_fid: [0x100030000:0x4:0x0] }
932 lcme_extent.e_start: 0
933 lcme_extent.e_end: EOF
935 lmm_stripe_size: 1048576
941 - 0: { l_ost_idx: 4, l_fid: [0x100040000:0x6:0x0] }</screen>
942 <para>Writing some data into the mirrored file
943 <literal>/mnt/testfs/file1</literal>:</para>
944 <screen># yes | dd of=/mnt/testfs/file1 bs=1M count=2
947 2097152 bytes (2.1 MB) copied, 0.0320613 s, 65.4 MB/s
949 # lfs getstripe /mnt/testfs/file1
957 lcme_extent.e_start: 0
958 lcme_extent.e_end: 4194304
964 lcme_extent.e_start: 4194304
965 lcme_extent.e_end: EOF
970 lcme_flags: init,stale
971 lcme_extent.e_start: 0
972 lcme_extent.e_end: EOF
977 lcme_flags: init,stale
978 lcme_extent.e_start: 0
979 lcme_extent.e_end: EOF
982 <para>The above layout information shows that data was written into the
983 first component of the mirror with ID <literal>1</literal>, and the mirrors with
984 IDs <literal>2</literal> and <literal>3</literal> were marked with
986 <para>Resynchronizing the stale mirror with ID <literal>2</literal> for
987 the mirrored file <literal>/mnt/testfs/file1</literal>:</para>
988 <screen># lfs mirror resync --only 2 /mnt/testfs/file1
989 # lfs getstripe /mnt/testfs/file1
997 lcme_extent.e_start: 0
998 lcme_extent.e_end: 4194304
1004 lcme_extent.e_start: 4194304
1005 lcme_extent.e_end: EOF
1011 lcme_extent.e_start: 0
1012 lcme_extent.e_end: EOF
1017 lcme_flags: init,stale
1018 lcme_extent.e_start: 0
1019 lcme_extent.e_end: EOF
1022 <para>The above layout information shows that after resynchronizing, the
1023 “stale” flag was removed from the mirror with ID <literal>2</literal>.</para>
1024 <para>Resynchronizing all of the stale mirrors for the mirrored file
1025 <literal>/mnt/testfs/file1</literal>:</para>
1026 <screen># lfs mirror resync /mnt/testfs/file1
1027 # lfs getstripe /mnt/testfs/file1
1035 lcme_extent.e_start: 0
1036 lcme_extent.e_end: 4194304
1042 lcme_extent.e_start: 4194304
1043 lcme_extent.e_end: EOF
1049 lcme_extent.e_start: 0
1050 lcme_extent.e_end: EOF
1056 lcme_extent.e_start: 0
1057 lcme_extent.e_end: EOF
1060 <para>The above layout information shows that after resynchronizing, none
1061 of the mirrors were marked as stale.</para>
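<para>Because writes leave the other mirrors stale until they are resynchronized, it can be useful to script this check. The Python sketch below is hypothetical. It assumes the <literal>lcme_mirror_id:</literal> and <literal>lcme_flags:</literal> fields appear in <literal>lfs getstripe</literal> output as in the examples above, and it collects the IDs of mirrors that have a <literal>stale</literal> component. It then builds an <literal>lfs mirror resync --only</literal> command for a single file, since that option cannot be used with multiple files.</para>

```python
from __future__ import annotations

import re


def stale_mirror_ids(getstripe_output: str) -> list[int]:
    """Mirror IDs that have at least one component flagged "stale"."""
    stale = set()
    mirror = None
    for line in getstripe_output.splitlines():
        found = re.search(r"lcme_mirror_id:\s*(\d+)", line)
        if found:
            mirror = int(found.group(1))  # flags that follow belong to this mirror
            continue
        found = re.search(r"lcme_flags:\s*(\S+)", line)
        if found and mirror is not None and "stale" in found.group(1).split(","):
            stale.add(mirror)
    return sorted(stale)


def resync_argv(mirrored_file: str, mirror_ids: list[int]) -> list[str] | None:
    """Build `lfs mirror resync --only ...`, or None when nothing is stale."""
    if not mirror_ids:
        return None  # the command would do nothing anyway
    ids = ",".join(str(i) for i in mirror_ids)
    return ["lfs", "mirror", "resync", "--only", ids, mirrored_file]
```

<para>Passing an empty list returns <literal>None</literal>, which mirrors the documented behavior that <literal>lfs mirror resync</literal> does nothing when no mirror is stale.</para>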
1063 <section xml:id="flr.operations.verifymirror">
1064 <title>Verifying Mirrored File(s)</title>
1065 <para><emphasis role="strong">Command:</emphasis></para>
1066 <screen>lfs mirror verify [--only <mirror_id,mirror_id2[,...]>]
1067 [--verbose|-v] <mirrored_file> [<mirrored_file2> ...]</screen>
1068 <para>The above command will verify that each SYNC mirror (a mirror that contains
1069 up-to-date data) of a mirrored file, specified by
1070 <replaceable>mirrored_file</replaceable>, has exactly the same data. It
1071 supports specifying multiple mirrored files in one command line.</para>
1072 <para>This is a scrub tool that should be run on a regular basis to make
1073 sure that mirrored files are not corrupted. The command won't repair the
1074 file if it turns out to be corrupted. Usually, an administrator should
1075 check the file content from each mirror and decide which one is correct
1076 and then invoke <literal>lfs mirror resync</literal> to repair it
1080 <colspec align="left" />
1081 <colspec align="left" />
1084 <entry>Option</entry>
1085 <entry>Description</entry>
1090 <entry>--only <mirror_id,mirror_id2[,...]></entry>
1091 <entry><para>Indicates which mirrors specified by
1092 <replaceable>mirror_ids</replaceable> need to be verified. The
1093 <replaceable>mirror_id</replaceable> is the unique numerical
1094 identifier for a mirror. Multiple
1095 <replaceable>mirror_ids</replaceable> are separated by commas.
1097 <para>Note: At least two <replaceable>mirror_ids</replaceable>
1098 are required. This option cannot be used when multiple
1099 mirrored files are specified.</para>
1103 <entry>--verbose|-v</entry>
1104 <entry>Indicates that the command will print where the differences are
1105 if the data does not match. Without this option, the command will just
1106 return an error in that case. This option can be repeated
1107 multiple times to print more information.</entry>
1112 <para><emphasis role="strong">Note:</emphasis></para>
1113 <para>Mirror components that have “stale” or “offline” flags will be
1114 skipped and not verified.</para>
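<para>The chunk-by-chunk comparison reported by <literal>lfs mirror verify</literal> can be illustrated with an in-memory sketch. The Python function below is hypothetical and does not read Lustre mirrors. It assumes the in-sync mirror copies are already available as byte strings (stale or offline mirrors would simply be left out). It splits them into fixed 2MB chunks, although the real tool may choose chunk boundaries differently, and computes a CRC-32 per chunk with <literal>zlib.crc32</literal>. It then reports the mirrors whose checksum differs from that of the lowest-numbered mirror.</para>

```python
from __future__ import annotations

import zlib


def mismatched_chunks(mirrors: dict[int, bytes],
                      chunk_size: int = 0x200000) -> list[tuple[int, int, list[int]]]:
    """Return (start, end, differing_mirror_ids) for every mismatching chunk."""
    ids = sorted(mirrors)
    if len(ids) in (0, 1):
        raise ValueError("need at least two mirrors to compare")
    ref = ids[0]  # compare every other mirror against the lowest mirror ID
    size = len(mirrors[ref])
    if any(len(mirrors[m]) != size for m in ids):
        raise ValueError("mirror copies differ in length")
    bad = []
    for start in range(0, size, chunk_size):
        end = min(start + chunk_size, size)
        crcs = {m: zlib.crc32(mirrors[m][start:end]) for m in ids}
        differing = [m for m in ids[1:] if crcs[m] != crcs[ref]]
        if differing:
            bad.append((start, end, differing))
    return bad
```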
1115 <para><emphasis role="strong">Examples:</emphasis></para>
1116 <para>The following command verifies that each mirror of a mirrored file
1117 contains exactly the same data:</para>
1118 <screen># lfs mirror verify /mnt/testfs/file1</screen>
1119 <para>The following command has the <literal>-v</literal> option specified
1120 to print where the differences are if the data does not match:</para>
<screen># lfs mirror verify -vvv /mnt/testfs/file2
Chunks to be verified in /mnt/testfs/file2:
[0, 0x200000) [1, 2, 3, 4] 4
[0x200000, 0x400000) [1, 2, 3, 4] 4
[0x400000, 0x600000) [1, 2, 3, 4] 4
[0x600000, 0x800000) [1, 2, 3, 4] 4
[0x800000, 0xa00000) [1, 2, 3, 4] 4
[0xa00000, 0x1000000) [1, 2, 3, 4] 4
[0x1000000, 0xffffffffffffffff) [1, 2, 3, 4] 4

Verifying chunk [0, 0x200000) on mirror: 1 2 3 4
CRC-32 checksum value for chunk [0, 0x200000):
Mirror 1: 0x207b02f1
Mirror 2: 0x207b02f1
Mirror 3: 0x207b02f1
Mirror 4: 0x207b02f1

Verifying chunk [0, 0x200000) on mirror: 1 2 3 4 PASS

Verifying chunk [0x200000, 0x400000) on mirror: 1 2 3 4
CRC-32 checksum value for chunk [0x200000, 0x400000):
Mirror 1: 0x207b02f1
Mirror 2: 0x207b02f1
Mirror 3: 0x207b02f1
Mirror 4: 0x207b02f1

Verifying chunk [0x200000, 0x400000) on mirror: 1 2 3 4 PASS

Verifying chunk [0x400000, 0x600000) on mirror: 1 2 3 4
CRC-32 checksum value for chunk [0x400000, 0x600000):
Mirror 1: 0x42571b66
Mirror 2: 0x42571b66
Mirror 3: 0x42571b66

lfs mirror verify: chunk [0x400000, 0x600000) has different
checksum value on mirror 1 and mirror 4.
Verifying chunk [0x600000, 0x800000) on mirror: 1 2 3 4
CRC-32 checksum value for chunk [0x600000, 0x800000):
Mirror 1: 0x1f8ad0d8
Mirror 2: 0x1f8ad0d8
Mirror 3: 0x1f8ad0d8
Mirror 4: 0x18975bf9

lfs mirror verify: chunk [0x600000, 0x800000) has different
checksum value on mirror 1 and mirror 4.
Verifying chunk [0x800000, 0xa00000) on mirror: 1 2 3 4
CRC-32 checksum value for chunk [0x800000, 0xa00000):
Mirror 1: 0x69c17478
Mirror 2: 0x69c17478
Mirror 3: 0x69c17478
Mirror 4: 0x69c17478

Verifying chunk [0x800000, 0xa00000) on mirror: 1 2 3 4 PASS

lfs mirror verify: '/mnt/testfs/file2' chunk [0xa00000, 0x1000000]
exceeds file size 0xa00000: skipped</screen>
<para>The following command uses the <literal>--only</literal> option to
verify only the specified mirrors:</para>
<screen># lfs mirror verify -v --only 1,4 /mnt/testfs/file2
CRC-32 checksum value for chunk [0, 0x200000):
Mirror 1: 0x207b02f1
Mirror 4: 0x207b02f1

CRC-32 checksum value for chunk [0x200000, 0x400000):
Mirror 1: 0x207b02f1
Mirror 4: 0x207b02f1

CRC-32 checksum value for chunk [0x400000, 0x600000):
Mirror 1: 0x42571b66

lfs mirror verify: chunk [0x400000, 0x600000) has different
checksum value on mirror 1 and mirror 4.
CRC-32 checksum value for chunk [0x600000, 0x800000):
Mirror 1: 0x1f8ad0d8
Mirror 4: 0x18975bf9

lfs mirror verify: chunk [0x600000, 0x800000) has different
checksum value on mirror 1 and mirror 4.
CRC-32 checksum value for chunk [0x800000, 0xa00000):
Mirror 1: 0x69c17478
Mirror 4: 0x69c17478

lfs mirror verify: '/mnt/testfs/file2' chunk [0xa00000, 0x1000000]
exceeds file size 0xa00000: skipped</screen>
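<para>The chunk-by-chunk CRC-32 comparison shown in the output above can be illustrated with a short Python sketch. It is not the <literal>lfs</literal> implementation, and it rests on several assumptions. Each mirror is available as an in-memory byte string. The function name <literal>verify_mirrors</literal> is hypothetical. A fixed 2 MiB (<literal>0x200000</literal>) chunk size matches the first chunks in the example, although the chunk boundaries that <literal>lfs mirror verify</literal> actually uses may differ. The optional <literal>only</literal> argument plays the role of <literal>--only</literal>. The sketch does not model the skipping of stale or offline mirrors.</para>
<programlisting language="python"><![CDATA[import zlib

# Assumed chunk size: 2 MiB, matching the first chunks in the example output.
CHUNK = 0x200000


def verify_mirrors(mirrors, file_size, only=None, chunk=CHUNK):
    """Compare CRC-32 checksums of each chunk across mirrors (sketch only).

    mirrors   -- dict mapping a mirror ID to that mirror's data (bytes);
                 a hypothetical stand-in for reading each mirror from OSTs
    file_size -- chunks are only checked below this offset
    only      -- optional iterable of mirror IDs, similar to "--only 1,4"

    Returns a list of (start, end, reference_id, differing_id) tuples.
    """
    ids = sorted(only) if only is not None else sorted(mirrors)
    if len(ids) < 2:
        raise ValueError("at least two mirrors are needed for verification")
    reference = ids[0]
    mismatches = []
    for start in range(0, file_size, chunk):
        end = min(start + chunk, file_size)
        # CRC-32 of this chunk on every selected mirror
        crcs = {i: zlib.crc32(mirrors[i][start:end]) for i in ids}
        # Compare every other mirror against the lowest-numbered one
        for other in ids[1:]:
            if crcs[other] != crcs[reference]:
                mismatches.append((start, end, reference, other))
    return mismatches


if __name__ == "__main__":
    size = 5 * CHUNK  # 0xa00000, the file size in the example
    good = bytes(range(256)) * (size // 256)
    corrupted = bytearray(good)
    corrupted[2 * CHUNK] ^= 0xFF  # flip one byte in chunk [0x400000, 0x600000)
    mirrors = {1: good, 2: good, 3: good, 4: bytes(corrupted)}
    for start, end, ref, other in verify_mirrors(mirrors, size):
        print(f"chunk [{start:#x}, {end:#x}) differs on mirror {ref} and mirror {other}")
]]></programlisting>
<para>Running the sketch reports one mismatch, for chunk <literal>[0x400000, 0x600000)</literal> between mirror 1 and mirror 4. This sketch compares each selected mirror against the lowest-numbered one, which produces messages of the same form as those above.</para>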
</section>
<section xml:id="flr.operations.findingmirror">
<title>Finding Mirrored File(s)</title>
<para>The <literal>lfs find</literal> command lists files and directories
that have specific attributes. The following two parameters are specific
to mirrored files and directories:</para>
<screen>lfs find &lt;directory|filename ...&gt;
        [[!] --mirror-count|-N [+-]n]
        [[!] --mirror-state &lt;[^]state&gt;]</screen>
<informaltable frame="all">
<tgroup cols="2">
<colspec align="left" />
<colspec align="left" />
<thead>
<row>
<entry>Option</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>--mirror-count|-N [+-]n</entry>
<entry>Matches files with the specified mirror count.</entry>
</row>
<row>
<entry>--mirror-state &lt;[^]state&gt;</entry>
<entry>
<para>Matches files in the specified mirrored file state.</para>
<para>If <replaceable>^state</replaceable> is used, print only
files not matching <replaceable>state</replaceable>. Only one
state can be specified.</para>
<para>Valid state names are:</para>
<para><literal>ro</literal> – indicates the mirrored file is in the
read-only state. All of the mirrors contain the up-to-date
data.</para>
<para><literal>wp</literal> – indicates the mirrored file is in
a state of being written.</para>
<para><literal>sp</literal> – indicates the mirrored file is in
a state of being resynchronized.</para>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para><emphasis role="strong">Note:</emphasis></para>
<para>Specifying <literal>!</literal> before an option negates its meaning,
so that only files NOT matching the parameter are listed. Using
<literal>+</literal> before a numeric value means 'more than n', while
<literal>-</literal> before a numeric value means 'less than n'. If neither
is used, it means 'equal to n', within the bounds of the unit specified (if
any).</para>
<para><emphasis role="strong">Examples:</emphasis></para>
<para>The following command recursively lists all mirrored files that have
more than 2 mirrors under directory <literal>/mnt/testfs</literal>:</para>
<screen># lfs find --mirror-count +2 --type f /mnt/testfs</screen>
<para>The following command recursively lists all out-of-sync mirrored
files under directory <literal>/mnt/testfs</literal>:</para>
<screen># lfs find --mirror-state=^ro --type f /mnt/testfs</screen>
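<para>The matching rules for <literal>--mirror-count</literal> and <literal>--mirror-state</literal> described above can be expressed as two small predicates. The following Python sketch is an illustration only, not the code used by <literal>lfs find</literal>. The function names <literal>match_count</literal> and <literal>match_state</literal> are hypothetical. Each file's mirror count and state are passed in directly rather than read from its layout. A leading <literal>!</literal> on the option is modeled as a separate <literal>negate</literal> flag.</para>
<programlisting language="python"><![CDATA[VALID_STATES = {"ro", "wp", "sp"}


def match_count(count, spec, negate=False):
    """Match a mirror count against an lfs-find-style value (sketch only).

    spec "n" means equal to n, "+n" more than n, "-n" less than n.
    negate=True models a "!" placed before the option.
    """
    if spec.startswith("+"):
        result = count > int(spec[1:])
    elif spec.startswith("-"):
        result = count < int(spec[1:])
    else:
        result = count == int(spec)
    return result != negate


def match_state(state, spec, negate=False):
    """Match a mirrored file state; a leading "^" selects all other states."""
    invert = spec.startswith("^")
    name = spec[1:] if invert else spec
    if name not in VALID_STATES:
        raise ValueError(f"unknown mirror state: {name!r}")
    result = (state == name) != invert
    return result != negate


if __name__ == "__main__":
    # Hypothetical files: name -> (mirror count, mirror state)
    files = {"a": (1, "ro"), "b": (3, "ro"), "c": (4, "wp")}
    # Similar to: lfs find --mirror-count +2 --type f /mnt/testfs
    print([n for n, (count, _) in files.items() if match_count(count, "+2")])
    # Similar to: lfs find --mirror-state=^ro --type f /mnt/testfs
    print([n for n, (_, state) in files.items() if match_state(state, "^ro")])
]]></programlisting>
<para>Run as a script, the sketch prints <literal>['b', 'c']</literal> for the <literal>+2</literal> mirror-count filter and <literal>['c']</literal> for the <literal>^ro</literal> state filter.</para>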
</section>
</section>
<section xml:id="flr.interop">
<title>Interoperability</title>
<para>Introduced in Lustre release 2.11.0, the FLR feature is based on the
<xref linkend="pfl"/> feature introduced in Lustre 2.10.0.</para>
<para>Lustre release 2.9 and older clients, which do not understand the
PFL layout, can neither access nor open mirrored files created in a
Lustre 2.11 file system.</para>
<para>The following example shows the errors returned when accessing and
opening a mirrored file (created in a Lustre 2.11 file system) on a Lustre
2.9 client:</para>
<screen># ls /mnt/testfs/mirrored_file
ls: cannot access /mnt/testfs/mirrored_file: Invalid argument

# cat /mnt/testfs/mirrored_file
cat: /mnt/testfs/mirrored_file: Operation not supported</screen>
<para>Lustre release 2.10 clients, which understand the PFL layout but not
the mirrored layout, can access mirrored files created in a Lustre 2.11
file system, but they cannot open them. This is because Lustre 2.10
clients do not check for overlapping components, so they would read and
write a mirrored file as if it were a normal PFL file. Writing through such
a client could leave mirrors that are marked as synchronized holding
different data.</para>
<para>The following example shows the results returned when accessing and
opening a mirrored file (created in a Lustre 2.11 file system) on a Lustre
2.10 client:</para>
<screen># ls /mnt/testfs/mirrored_file
/mnt/testfs/mirrored_file

# cat /mnt/testfs/mirrored_file
cat: /mnt/testfs/mirrored_file: Operation not supported</screen>
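<para>The client behavior described in this section can be summarized as a lookup keyed on the client release. The following Python sketch encodes only the two examples above. The function name <literal>mirrored_file_access</literal> is hypothetical. Treating releases 2.11 and later as able to both access and open mirrored files is an assumption based on FLR being introduced in 2.11.</para>
<programlisting language="python"><![CDATA[def mirrored_file_access(major, minor):
    """Return (access_result, open_result) for a client of release major.minor.

    Each result is "ok" or the error message shown in the examples above.
    Releases 2.11 and later are assumed to support mirrored files.
    """
    release = (major, minor)
    if release <= (2, 9):
        # No PFL support: even accessing the file fails
        return ("Invalid argument", "Operation not supported")
    if release == (2, 10):
        # PFL support but no FLR support: access works, open is refused
        return ("ok", "Operation not supported")
    return ("ok", "ok")


if __name__ == "__main__":
    for release in [(2, 9), (2, 10), (2, 11)]:
        print(release, mirrored_file_access(*release))
]]></programlisting>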