<?xml version='1.0' encoding='UTF-8'?>
<chapter xmlns="http://docbook.org/ns/docbook"
 xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
 xml:id="flr" condition="l2B">
  <title xml:id="flr.title">File Level Redundancy (FLR)</title>
  <para>This chapter describes File Level Redundancy (FLR).</para>
  <section xml:id="flr.intro">
    <title>Introduction</title>
    <para>The Lustre file system was initially designed and implemented for
    HPC use, and it has worked well on high-end storage that provides internal
    redundancy and fault tolerance. However, despite the expense and
    complexity of these storage systems, storage failures still occur.
    Before release 2.11, a Lustre file system could be no more reliable than
    the individual storage and server components on which it was based: it
    had no mechanism to mitigate storage hardware failures, and files became
    inaccessible if a server was unreachable or otherwise out of
    service.</para>
    <para>With the File Level Redundancy (FLR) feature introduced in Lustre
    Release 2.11, any Lustre file can store the same data on multiple OSTs so
    that the system remains robust in the event of storage failures or other
    outages. When a file has multiple mirrors, the best-suited mirror can be
    chosen to satisfy each individual request, which directly improves I/O
    availability. Furthermore, for files that are concurrently read by many
    clients (e.g. input decks, shared libraries, or executables), the
    aggregate parallel read performance of a single file can be improved by
    creating multiple mirrors of the file data.</para>
    <para>The first phase of the FLR feature has been implemented with delayed
    write (<xref linkend="flr.delayedwrite.fig"/>). When a mirrored file is
    written, only one primary or preferred mirror is updated directly during
    the write, and the other mirrors are simply marked as stale. The file can
    subsequently be returned to a fully mirrored state by synchronizing the
    mirrors with command line tools (run by the user or administrator
    directly, or via automated monitoring tools).</para>
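    <para>The delayed write behavior can be pictured with a small toy model.
    The following Python sketch is purely illustrative and is not Lustre
    code: the <literal>Mirror</literal> and <literal>MirroredFile</literal>
    classes and their methods are hypothetical names introduced here. It
    assumes that a write goes to one in-sync mirror (a preferred one when
    available), that all other mirrors become stale, and that a later
    resynchronization copies the data back.</para>
    <programlisting language="python"><![CDATA[from dataclasses import dataclass, field


@dataclass
class Mirror:
    mirror_id: int
    data: bytes = b""
    stale: bool = False
    prefer: bool = False


@dataclass
class MirroredFile:
    mirrors: list = field(default_factory=list)

    def write(self, data: bytes) -> int:
        """Update one in-sync mirror and mark every other mirror stale."""
        candidates = [m for m in self.mirrors if not m.stale]
        # Use a mirror carrying the 'prefer' hint if any, else the first in-sync one.
        target = next((m for m in candidates if m.prefer), candidates[0])
        target.data = data
        for m in self.mirrors:
            if m is not target:
                m.stale = True
        return target.mirror_id

    def resync(self) -> None:
        """Copy data from an in-sync mirror to every stale mirror."""
        source = next(m for m in self.mirrors if not m.stale)
        for m in self.mirrors:
            if m.stale:
                m.data = source.data
                m.stale = False


f = MirroredFile([Mirror(1), Mirror(2, prefer=True), Mirror(3)])
written_to = f.write(b"hello")
stale_after_write = [m.mirror_id for m in f.mirrors if m.stale]
f.resync()
stale_after_resync = [m.mirror_id for m in f.mirrors if m.stale]
print(written_to, stale_after_write, stale_after_resync)]]></programlisting>
    <para>In this model the write lands on mirror 2, mirrors 1 and 3 are
    stale until <literal>resync()</literal> runs, and afterwards every mirror
    holds the same data again.</para>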
    <figure xml:id="flr.delayedwrite.fig">
      <title>FLR Delayed Write</title>
      <mediaobject>
        <imageobject>
          <imagedata scalefit="1" width="50%"
           fileref="figures/FLR_DelayedWrite.png" />
        </imageobject>
        <textobject><phrase>FLR Delayed Write Diagram</phrase></textobject>
      </mediaobject>
    </figure>
  </section>
  <section xml:id="flr.operations">
    <title>Operations</title>
    <para>Lustre provides <literal>lfs mirror</literal> command line tools for
    users to operate on mirrored files or directories.</para>
    <section xml:id="flr.operations.createmirror">
      <title>Creating a Mirrored File or Directory</title>
      <para><emphasis role="strong">Command:</emphasis></para>
      <screen>lfs mirror create &lt;--mirror-count|-N[mirror_count]
[setstripe_options|[--flags&lt;=flags&gt;]]&gt; ... &lt;filename|directory&gt;</screen>
      <para>The above command will create a mirrored file or directory specified
      by <replaceable>filename</replaceable> or
      <replaceable>directory</replaceable>, respectively.</para>
      <informaltable frame="all">
        <tgroup cols="2">
          <colspec align="left" />
          <colspec align="left" />
          <thead>
            <row>
              <entry>Option</entry>
              <entry>Description</entry>
            </row>
          </thead>
          <tbody>
            <row>
              <entry>--mirror-count|-N[mirror_count]</entry>
              <entry>
                <para>Indicates the number of mirrors to be created with the
                following setstripe options. It can be repeated multiple
                times to separate mirrors that have different layouts.</para>
                <para>The <replaceable>mirror_count</replaceable> argument is
                optional and defaults to <literal>1</literal> if it is not
                specified; if specified, it must follow the option without a
                space.</para>
              </entry>
            </row>
            <row>
              <entry>setstripe_options</entry>
              <entry>
                <para>Specifies the layout for the mirror. It can be a
                plain layout with a specific striping pattern or a composite
                layout, such as <xref linkend="pfl"/>. The options are
                the same as those for the <literal>lfs setstripe</literal>
                command.</para>
                <para>If <replaceable>setstripe_options</replaceable> are not
                specified, then the stripe options inherited from the previous
                component will be used. If there is no previous component,
                then the <literal>stripe_count</literal> and
                <literal>stripe_size</literal> options inherited from the
                filesystem-wide default values will be used, and the OST
                <literal>pool_name</literal> inherited from the parent
                directory will be used.</para>
              </entry>
            </row>
            <row>
              <entry>--flags&lt;=flags&gt;</entry>
              <entry>
                <para>Sets flags on the mirror to be created.</para>
                <para>Only the <literal>prefer</literal> flag is supported at
                this time. This flag is set on all components that belong
                to the corresponding mirror. The <literal>prefer</literal>
                flag gives Lustre a hint about which mirrors should be used
                to serve I/O. When a mirrored file is read, the
                component(s) with the <literal>prefer</literal> flag are likely
                to be picked to serve the read; when a mirrored file is
                prepared for writing, the MDT tends to choose the
                component with the <literal>prefer</literal> flag set and
                marks the other components with overlapping extents as stale.
                Because the flag is only a hint, Lustre may still choose
                mirrors without it, for instance if all preferred mirrors
                are unavailable when the I/O occurs.
                This flag can be set on multiple components.</para>
                <para><emphasis role="strong">Note:</emphasis> Because
                <literal>--flags</literal> applies to every component of the
                corresponding mirror, the <literal>--comp-flags</literal>
                option also exists to set flags on individual components at
                mirror creation time.</para>
              </entry>
            </row>
          </tbody>
        </tgroup>
      </informaltable>
      <para><emphasis role="strong">Note:</emphasis> For redundancy and
      fault tolerance, users need to make sure that different mirrors are
      placed on different OSTs, and ideally on different OSSs and racks. An
      understanding of the cluster topology is necessary to achieve this. In
      the initial implementation, the existing OST pools mechanism allows
      OSTs to be separated by arbitrary criteria, such as fault domain.
      In practice, users can take advantage of OST pools by grouping OSTs
      according to topological information. When creating a mirrored file,
      users can then indicate which OST pools each mirror may use.</para>
      <para><emphasis role="strong">Examples:</emphasis></para>
      <para>The following command creates a mirrored file with 2 plain layout
      mirrors:</para>
      <screen>client# lfs mirror create -N -S 4M -c 2 -p flash \
                          -N -c -1 -p archive /mnt/testfs/file1</screen>
      <para>The following command displays the layout information of the
      mirrored file <literal>/mnt/testfs/file1</literal>:</para>
      <screen>client# lfs getstripe /mnt/testfs/file1
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_size: 4194304
- 0: { l_ost_idx: 1, l_fid: [0x100010000:0x2:0x0] }
- 1: { l_ost_idx: 0, l_fid: [0x100000000:0x2:0x0] }
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_size: 4194304
- 0: { l_ost_idx: 3, l_fid: [0x100030000:0x2:0x0] }
- 1: { l_ost_idx: 4, l_fid: [0x100040000:0x2:0x0] }
- 2: { l_ost_idx: 5, l_fid: [0x100050000:0x2:0x0] }
- 3: { l_ost_idx: 6, l_fid: [0x100060000:0x2:0x0] }
- 4: { l_ost_idx: 7, l_fid: [0x100070000:0x2:0x0] }
- 5: { l_ost_idx: 2, l_fid: [0x100020000:0x2:0x0] }</screen>
      <para>The first mirror has a 4MB stripe size and two stripes across OSTs
      in the “flash” OST pool. The second mirror has a 4MB stripe size
      inherited from the first mirror, and stripes across all of the available
      OSTs in the “archive” OST pool.</para>
      <para>As mentioned above, it is recommended to use the
      <literal>--pool|-p</literal> option (one of the
      <literal>lfs setstripe</literal> options) with OST pools configured with
      independent fault domains to ensure that different mirrors are placed on
      different OSTs, servers, and/or racks, thereby improving availability
      and performance. If the setstripe options are not specified, it is
      possible to create mirrors with objects on the same OST(s), which would
      remove most of the benefit of using replication.</para>
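      <para>One way to check the placement afterwards is to compare the OST
      indices used by each mirror. The following Python sketch is only an
      illustration: the <literal>shared_osts</literal> helper is a
      hypothetical name, and the OST lists are typed in by hand from the
      <literal>l_ost_idx</literal> values of the two mirrors of
      <literal>/mnt/testfs/file1</literal> shown earlier rather than
      parsed from live output.</para>
      <programlisting language="python"><![CDATA[from itertools import combinations


def shared_osts(mirror_osts):
    """Return {(mirror_a, mirror_b): [shared OST indices]} for overlapping mirrors."""
    overlaps = {}
    for (a, osts_a), (b, osts_b) in combinations(sorted(mirror_osts.items()), 2):
        common = set(osts_a) & set(osts_b)
        if common:
            overlaps[(a, b)] = sorted(common)
    return overlaps


# OST indices of the two mirrors of /mnt/testfs/file1 in the example above.
file1 = {1: [1, 0], 2: [3, 4, 5, 6, 7, 2]}
print(shared_osts(file1))

# Hypothetical layout in which both mirrors have an object on OST 0.
bad = {1: [0, 1], 2: [0, 2]}
print(shared_osts(bad))]]></programlisting>
      <para>An empty result means that no OST holds objects of more than one
      mirror. It says nothing about whether the OSTs share a server or rack,
      which still has to be checked against the cluster topology.</para>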
      <para>In the layout information printed by <literal>lfs getstripe</literal>,
      <literal>lcme_mirror_id</literal> shows the mirror ID, which is the unique
      numerical identifier for a mirror, and <literal>lcme_flags</literal> shows
      the mirrored component flags. Valid flag names are:</para>
      <itemizedlist>
        <listitem>
          <para><literal>init</literal> - indicates that the mirrored component
          has been initialized (has allocated OST objects).</para>
        </listitem>
        <listitem>
          <para><literal>stale</literal> - indicates that the mirrored component
          does not have up-to-date data. Stale components will not be used for
          read or write operations, and need to be resynchronized by running
          the <literal>lfs mirror resync</literal> command before they can be
          accessed again.</para>
        </listitem>
        <listitem>
          <para><literal>prefer</literal> - indicates that the mirrored component
          is preferred for read or write. For example, the mirror is located on
          SSD-based OSTs or is closer to the client on the network (fewer
          hops). This flag can be set by users at mirror creation time.</para>
        </listitem>
      </itemizedlist>
      <para>The following command creates a mirrored file with 3 PFL mirrors:
      </para>
      <screen>client# lfs mirror create -N -E 4M -p flash --flags=prefer -E eof -c 2 \
  -N -E 16M -S 8M -c 4 -p archive --comp-flags=prefer -E eof -c -1 \
  -N -E 32M -c 1 -p none -E eof -c -1 /mnt/testfs/file2</screen>
      <para>The following command displays the layout information of the
      mirrored file <literal>/mnt/testfs/file2</literal>:</para>
      <screen>client# lfs getstripe /mnt/testfs/file2
lcme_flags: init,prefer
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
lmm_stripe_size: 1048576
- 0: { l_ost_idx: 1, l_fid: [0x100010000:0x3:0x0] }
lcme_extent.e_start: 4194304
lcme_extent.e_end: EOF
lmm_stripe_size: 1048576
lmm_stripe_offset: -1
lcme_flags: init,prefer
lcme_extent.e_start: 0
lcme_extent.e_end: 16777216
lmm_stripe_size: 8388608
- 0: { l_ost_idx: 4, l_fid: [0x100040000:0x3:0x0] }
- 1: { l_ost_idx: 5, l_fid: [0x100050000:0x3:0x0] }
- 2: { l_ost_idx: 6, l_fid: [0x100060000:0x3:0x0] }
- 3: { l_ost_idx: 7, l_fid: [0x100070000:0x3:0x0] }
lcme_extent.e_start: 16777216
lcme_extent.e_end: EOF
lmm_stripe_size: 8388608
lmm_stripe_offset: -1
lcme_extent.e_start: 0
lcme_extent.e_end: 33554432
lmm_stripe_size: 8388608
- 0: { l_ost_idx: 0, l_fid: [0x100000000:0x3:0x0] }
lcme_extent.e_start: 33554432
lcme_extent.e_end: EOF
lmm_stripe_size: 8388608
lmm_stripe_offset: -1</screen>
      <para>For the first mirror, the first component inherits the stripe count
      and stripe size from the filesystem-wide default values. The second
      component inherits the stripe size and OST pool from the first
      component, and has two stripes. Both components are allocated
      from the “flash” OST pool. Also, the <literal>prefer</literal> flag is
      applied to all the components of the first mirror, which tells the
      client to read data from those components whenever they are available.
      </para>
      <para>For the second mirror, the first component has an 8MB stripe size
      and 4 stripes across OSTs in the “archive” OST pool. The second
      component inherits the stripe size and OST pool from the first
      component, and stripes across all of the available OSTs in the “archive”
      OST pool. The <literal>prefer</literal> flag is applied only to the
      first component.</para>
      <para>For the third mirror, the first component inherits the stripe size
      of 8MB from the last component of the second mirror, and has a single
      stripe. The OST pool name is cleared by <literal>-p none</literal> and is
      then inherited from the parent directory (if the parent directory has an
      OST pool name set). The second component inherits the stripe size from
      the first component, and stripes across all of the available OSTs.</para>
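      <para>The inheritance rules walked through above can be modelled in a
      few lines. The following Python sketch is a simplified illustration,
      not the logic used by <literal>lfs</literal>: the
      <literal>resolve</literal> function, the dictionary keys, the assumed
      filesystem-wide defaults (1MB stripe size, stripe count 1) and the
      assumption that the parent directory has no OST pool are all
      hypothetical. Component extents and flags are left out.</para>
      <programlisting language="python"><![CDATA[# Assumed filesystem-wide defaults; real values depend on the file system.
FS_DEFAULTS = {"stripe_size": 1 << 20, "stripe_count": 1}
PARENT_POOL = None  # assumed: the parent directory has no OST pool set


def resolve(mirrors):
    """Fill in unspecified options for each component, mirror by mirror."""
    resolved, prev = [], None
    for mirror in mirrors:
        components = []
        for explicit in mirror:
            if prev is None:
                # No previous component: fall back to defaults and parent pool.
                comp = {**FS_DEFAULTS, "pool": PARENT_POOL}
            else:
                # Otherwise start from the previous component's options.
                comp = dict(prev)
            comp.update(explicit)
            if comp["pool"] == "none":
                # '-p none' clears the pool, which then comes from the parent.
                comp["pool"] = PARENT_POOL
            components.append(comp)
            prev = comp
        resolved.append(components)
    return resolved


# Explicit options of the three PFL mirrors of /mnt/testfs/file2 above.
file2 = [
    [{"pool": "flash"}, {"stripe_count": 2}],
    [{"stripe_size": 8 << 20, "stripe_count": 4, "pool": "archive"},
     {"stripe_count": -1}],
    [{"stripe_count": 1, "pool": "none"}, {"stripe_count": -1}],
]
layout = resolve(file2)
for number, components in enumerate(layout, start=1):
    for comp in components:
        print(number, comp["stripe_size"], comp["stripe_count"], comp["pool"])]]></programlisting>
      <para>Under these assumptions the resolved stripe sizes are 1MB for both
      components of the first mirror and 8MB for all components of the second
      and third mirrors, which agrees with the
      <literal>lmm_stripe_size</literal> values in the output above.</para>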
    </section>
    <section xml:id="flr.operations.extendmirror">
      <title>Extending a Mirrored File</title>
      <para><emphasis role="strong">Command:</emphasis></para>
      <screen>lfs mirror extend [--no-verify] &lt;--mirror-count|-N[mirror_count]
[setstripe_options|-f &lt;victim_file&gt;]&gt; ... &lt;filename&gt;</screen>
      <para>The above command will append the mirror(s) indicated by
      <literal>setstripe options</literal>, or take the layout from the
      existing file <replaceable>victim_file</replaceable> and add it as a
      mirror, to the file <replaceable>filename</replaceable>. The
      <replaceable>filename</replaceable> must be an existing file; it can be
      either a mirrored file or a regular non-mirrored file. If it is a
      non-mirrored file, the command will convert it to a mirrored file.
      </para>
      <informaltable frame="all">
        <tgroup cols="2">
          <colspec align="left" />
          <colspec align="left" />
          <thead>
            <row>
              <entry>Option</entry>
              <entry>Description</entry>
            </row>
          </thead>
          <tbody>
            <row>
              <entry>--mirror-count|-N[mirror_count]</entry>
              <entry>
                <para>Indicates the number of mirrors to be added with the
                following <literal>setstripe options</literal>. It can be
                repeated multiple times to separate mirrors that have
                different layouts.</para>
                <para>The <replaceable>mirror_count</replaceable> argument is
                optional and defaults to <literal>1</literal> if it is not
                specified; if specified, it must follow the option without a
                space.</para>
              </entry>
            </row>
            <row>
              <entry>setstripe_options</entry>
              <entry>
                <para>Specifies the layout for the mirror. It can be a
                plain layout with a specific striping pattern or a composite
                layout, such as <xref linkend="pfl"/>. The options are the
                same as those for the <literal>lfs setstripe</literal>
                command.</para>
                <para>If <replaceable>setstripe_options</replaceable> are not
                specified, then the stripe options inherited from the previous
                component will be used. If there is no previous component,
                then the <literal>stripe_count</literal> and
                <literal>stripe_size</literal> options inherited from the
                filesystem-wide default values will be used, and the OST
                <literal>pool_name</literal> inherited from the parent
                directory will be used.</para>
              </entry>
            </row>
            <row>
              <entry>-f &lt;victim_file&gt;</entry>
              <entry>
                <para>If <replaceable>victim_file</replaceable> exists, the
                command will split the layout from that file and use it as a
                mirror added to the mirrored file. After the command is
                finished, the <replaceable>victim_file</replaceable> will be
                removed.</para>
                <para><emphasis role="strong">Note</emphasis>: The
                <replaceable>setstripe_options</replaceable> cannot be
                specified together with the
                <literal>-f &lt;victim_file&gt;</literal>
                option in one command line.</para>
              </entry>
            </row>
            <row>
              <entry>--no-verify</entry>
              <entry>If <replaceable>victim_file</replaceable> is specified, the
              command verifies that the contents of
              <replaceable>victim_file</replaceable> are the same as those of
              <replaceable>filename</replaceable>, and returns a failure if
              they differ. The <literal>--no-verify</literal> option skips
              this verification. This can save significant time on file
              comparison if the file is large, but use it only when the
              file contents are known to be the same.</entry>
            </row>
          </tbody>
        </tgroup>
      </informaltable>
      <para><emphasis role="strong">Note</emphasis>: The
      <literal>lfs mirror extend</literal> operation will not be applied to a
      directory.</para>
      <para><emphasis role="strong">Examples:</emphasis></para>
      <para>The following commands create a non-mirrored file, convert it to
      a mirrored file, and extend it with a plain layout mirror:</para>
      <screen># lfs setstripe -p flash /mnt/testfs/file1
# lfs getstripe /mnt/testfs/file1
lmm_stripe_size: 1048576
obdidx objid objid group
# lfs mirror extend -N -S 8M -c -1 -p archive /mnt/testfs/file1
# lfs getstripe /mnt/testfs/file1
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_size: 1048576
- 0: { l_ost_idx: 0, l_fid: [0x100000000:0x4:0x0] }
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_size: 8388608
- 0: { l_ost_idx: 3, l_fid: [0x100030000:0x3:0x0] }
- 1: { l_ost_idx: 4, l_fid: [0x100040000:0x4:0x0] }
- 2: { l_ost_idx: 5, l_fid: [0x100050000:0x4:0x0] }
- 3: { l_ost_idx: 6, l_fid: [0x100060000:0x4:0x0] }
- 4: { l_ost_idx: 7, l_fid: [0x100070000:0x4:0x0] }
- 5: { l_ost_idx: 2, l_fid: [0x100020000:0x3:0x0] }</screen>
      <para>The following commands take the PFL layout from a
      <replaceable>victim_file</replaceable> and add it, without data
      verification, as a mirror of the mirrored file
      <literal>/mnt/testfs/file1</literal> created in the example
      above:</para>
      <screen># lfs setstripe -E 16M -c 2 -p none \
                -E eof -c -1 /mnt/testfs/victim_file
# lfs getstripe /mnt/testfs/victim_file
/mnt/testfs/victim_file
lcme_extent.e_start: 0
lcme_extent.e_end: 16777216
lmm_stripe_size: 1048576
- 0: { l_ost_idx: 5, l_fid: [0x100050000:0x5:0x0] }
- 1: { l_ost_idx: 6, l_fid: [0x100060000:0x5:0x0] }
lcme_extent.e_start: 16777216
lcme_extent.e_end: EOF
lmm_stripe_size: 1048576
lmm_stripe_offset: -1
# lfs mirror extend --no-verify -N -f /mnt/testfs/victim_file \
                    /mnt/testfs/file1
# lfs getstripe /mnt/testfs/file1
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_size: 1048576
- 0: { l_ost_idx: 0, l_fid: [0x100000000:0x4:0x0] }
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_size: 8388608
- 0: { l_ost_idx: 3, l_fid: [0x100030000:0x3:0x0] }
- 1: { l_ost_idx: 4, l_fid: [0x100040000:0x4:0x0] }
- 2: { l_ost_idx: 5, l_fid: [0x100050000:0x4:0x0] }
- 3: { l_ost_idx: 6, l_fid: [0x100060000:0x4:0x0] }
- 4: { l_ost_idx: 7, l_fid: [0x100070000:0x4:0x0] }
- 5: { l_ost_idx: 2, l_fid: [0x100020000:0x3:0x0] }
lcme_extent.e_start: 0
lcme_extent.e_end: 16777216
lmm_stripe_size: 1048576
- 0: { l_ost_idx: 5, l_fid: [0x100050000:0x5:0x0] }
- 1: { l_ost_idx: 6, l_fid: [0x100060000:0x5:0x0] }
lcme_extent.e_start: 16777216
lcme_extent.e_end: EOF
lmm_stripe_size: 1048576
lmm_stripe_offset: -1</screen>
      <para>After extending, the <replaceable>victim_file</replaceable> was
      removed:</para>
      <screen># ls /mnt/testfs/victim_file
ls: cannot access /mnt/testfs/victim_file: No such file or directory</screen>
    </section>
    <section xml:id="flr.operations.splitmirror">
      <title>Splitting a Mirrored File</title>
      <para><emphasis role="strong">Command:</emphasis></para>
      <screen>lfs mirror split &lt;--mirror-id &lt;mirror_id&gt;&gt;
[--destroy|-d] [-f &lt;new_file&gt;] &lt;mirrored_file&gt;</screen>
      <para>The above command splits the mirror with ID
      <replaceable>mirror_id</replaceable> out of the existing mirrored file
      <replaceable>mirrored_file</replaceable>. By default, a new file named
      <literal>&lt;mirrored_file&gt;.mirror~&lt;mirror_id&gt;</literal> will
      be created with the layout of the split mirror. If the
      <literal>--destroy|-d</literal> option is specified, then the split
      mirror will be destroyed. If the <literal>-f &lt;new_file&gt;</literal>
      option is specified, then a file named
      <replaceable>new_file</replaceable> will be created with the layout of
      the split mirror. If <replaceable>mirrored_file</replaceable> has only
      one mirror left after the split, it will be converted to a regular
      non-mirrored file. If the original
      <replaceable>mirrored_file</replaceable> is not a mirrored file, then
      the command will return an error.</para>
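      <para>The outcomes described above can be summarized as a small decision
      function. The following Python sketch only models the documented
      behavior and does not call Lustre: <literal>split_mirror</literal> and
      its parameters are hypothetical names, the mirror IDs 1 to 4 are made up
      for the illustration, and the precedence of the destroy option over a
      new file name is an assumption of the sketch.</para>
      <programlisting language="python"><![CDATA[def split_mirror(mirror_ids, mirrored_file, mirror_id, new_file=None, destroy=False):
    """Return (remaining_ids, target, still_mirrored) for one split operation.

    target is the path that receives the split mirror's layout, or None when
    the split mirror is destroyed.
    """
    if len(mirror_ids) < 2:
        raise ValueError(f"{mirrored_file} is not a mirrored file")
    if mirror_id not in mirror_ids:
        raise ValueError(f"no mirror with ID {mirror_id}")
    if destroy:
        target = None  # assumption: destroy takes precedence in this sketch
    elif new_file is not None:
        target = new_file
    else:
        target = f"{mirrored_file}.mirror~{mirror_id}"
    remaining = [m for m in mirror_ids if m != mirror_id]
    # A file left with a single mirror becomes a regular non-mirrored file.
    return remaining, target, len(remaining) > 1


ids = [1, 2, 3, 4]
ids, t1, _ = split_mirror(ids, "/mnt/testfs/file1", 1)
ids, t2, _ = split_mirror(ids, "/mnt/testfs/file1", 2, destroy=True)
ids, t3, mirrored = split_mirror(ids, "/mnt/testfs/file1", 3,
                                 new_file="/mnt/testfs/file2")
print(t1, t2, t3, ids, mirrored)]]></programlisting>
      <para>After the three calls only mirror 4 remains, so the model reports
      that the file is no longer mirrored.</para>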
      <informaltable frame="all">
        <tgroup cols="2">
          <colspec align="left" />
          <colspec align="left" />
          <thead>
            <row>
              <entry>Option</entry>
              <entry>Description</entry>
            </row>
          </thead>
          <tbody>
            <row>
              <entry>--mirror-id &lt;mirror_id&gt;</entry>
              <entry>The unique numerical identifier for a mirror. The mirror
              ID is unique within a mirrored file and is automatically
              assigned at file creation or extension time. It can be fetched
              by the <literal>lfs getstripe</literal> command.
              </entry>
            </row>
            <row>
              <entry>--destroy|-d</entry>
              <entry>Indicates the split mirror will be destroyed.</entry>
            </row>
            <row>
              <entry>-f &lt;new_file&gt;</entry>
              <entry>Indicates a file named <replaceable>new_file</replaceable>
              will be created with the layout of the split mirror.</entry>
            </row>
          </tbody>
        </tgroup>
      </informaltable>
      <para><emphasis role="strong">Examples:</emphasis></para>
      <para>The following commands create a mirrored file with 4 mirrors, then
      split 3 mirrors separately from the mirrored file.</para>
      <para>Creating a mirrored file with 4 mirrors:</para>
      <screen># lfs mirror create -N2 -E 4M -p flash -E eof -c -1 \
                    -N2 -S 8M -c 2 -p archive /mnt/testfs/file1
# lfs getstripe /mnt/testfs/file1
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
lmm_stripe_size: 1048576
- 0: { l_ost_idx: 1, l_fid: [0x100010000:0x4:0x0] }
lcme_extent.e_start: 4194304
lcme_extent.e_end: EOF
lmm_stripe_size: 1048576
lmm_stripe_offset: -1
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
lmm_stripe_size: 1048576
- 0: { l_ost_idx: 0, l_fid: [0x100000000:0x5:0x0] }
lcme_extent.e_start: 4194304
lcme_extent.e_end: EOF
lmm_stripe_size: 1048576
lmm_stripe_offset: -1
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_size: 8388608
- 0: { l_ost_idx: 4, l_fid: [0x100040000:0x5:0x0] }
- 1: { l_ost_idx: 5, l_fid: [0x100050000:0x6:0x0] }
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_size: 8388608
- 0: { l_ost_idx: 7, l_fid: [0x100070000:0x5:0x0] }
- 1: { l_ost_idx: 2, l_fid: [0x100020000:0x4:0x0] }</screen>
      <para>Splitting the mirror with ID <literal>1</literal> from
      <literal>/mnt/testfs/file1</literal> and creating
      <literal>/mnt/testfs/file1.mirror~1</literal> with the layout of the
      split mirror:</para>
      <screen># lfs mirror split --mirror-id 1 /mnt/testfs/file1
# lfs getstripe /mnt/testfs/file1.mirror~1
/mnt/testfs/file1.mirror~1
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
lmm_stripe_size: 1048576
- 0: { l_ost_idx: 1, l_fid: [0x100010000:0x4:0x0] }
lcme_extent.e_start: 4194304
lcme_extent.e_end: EOF
lmm_stripe_size: 1048576
lmm_stripe_offset: -1
lmm_pool: flash</screen>
      <para>Splitting the mirror with ID <literal>2</literal> from
      <literal>/mnt/testfs/file1</literal> and destroying it:</para>
      <screen># lfs mirror split --mirror-id 2 -d /mnt/testfs/file1
# lfs getstripe /mnt/testfs/file1
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_size: 8388608
- 0: { l_ost_idx: 4, l_fid: [0x100040000:0x5:0x0] }
- 1: { l_ost_idx: 5, l_fid: [0x100050000:0x6:0x0] }
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_size: 8388608
- 0: { l_ost_idx: 7, l_fid: [0x100070000:0x5:0x0] }
- 1: { l_ost_idx: 2, l_fid: [0x100020000:0x4:0x0] }</screen>
      <para>Splitting the mirror with ID <literal>3</literal> from
      <literal>/mnt/testfs/file1</literal> and creating
      <literal>/mnt/testfs/file2</literal> with the layout of the split
      mirror:</para>
      <screen># lfs mirror split --mirror-id 3 -f /mnt/testfs/file2 \
                   /mnt/testfs/file1
# lfs getstripe /mnt/testfs/file2
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_size: 8388608
- 0: { l_ost_idx: 4, l_fid: [0x100040000:0x5:0x0] }
- 1: { l_ost_idx: 5, l_fid: [0x100050000:0x6:0x0] }
# lfs getstripe /mnt/testfs/file1
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_size: 8388608
- 0: { l_ost_idx: 7, l_fid: [0x100070000:0x5:0x0] }
- 1: { l_ost_idx: 2, l_fid: [0x100020000:0x4:0x0] }</screen>
      <para>The layout information above shows that the mirrors with IDs
      <literal>1</literal>, <literal>2</literal>, and <literal>3</literal>
      were all split from the mirrored file
      <literal>/mnt/testfs/file1</literal>.</para>
    </section>
    <section xml:id="flr.operations.resyncmirror">
      <title>Resynchronizing out-of-sync Mirrored File(s)</title>
      <para><emphasis role="strong">Command:</emphasis></para>
      <screen>lfs mirror resync [--only &lt;mirror_id[,...]&gt;]
&lt;mirrored_file&gt; [&lt;mirrored_file2&gt;...]</screen>
      <para>The above command will resynchronize the out-of-sync mirrored
      file(s) specified by <replaceable>mirrored_file</replaceable>. It
      supports specifying multiple mirrored files in one command line.</para>
      <para>If there is no stale mirror for the specified mirrored file(s), then
      the command does nothing. Otherwise, it will copy data from an in-sync
      mirror to the stale mirror(s), and mark all successfully copied
      mirror(s) as SYNC. If the
      <literal>--only &lt;mirror_id[,...]&gt;</literal> option is specified,
      then the command will only resynchronize the mirror(s) specified by the
      <replaceable>mirror_id(s)</replaceable>. This option cannot be used when
      multiple mirrored files are specified.</para>
      <informaltable frame="all">
        <tgroup cols="2">
          <colspec align="left" />
          <colspec align="left" />
          <thead>
            <row>
              <entry>Option</entry>
              <entry>Description</entry>
            </row>
          </thead>
          <tbody>
            <row>
              <entry>--only &lt;mirror_id[,...]&gt;</entry>
              <entry>Indicates which mirror(s), specified by
              <replaceable>mirror_id(s)</replaceable>, need to be
              resynchronized. The <replaceable>mirror_id</replaceable> is the
              unique numerical identifier for a mirror. Multiple
              <replaceable>mirror_ids</replaceable> are separated by commas.
              This option cannot be used when multiple mirrored files are
              specified.</entry>
            </row>
          </tbody>
        </tgroup>
      </informaltable>
      <para><emphasis role="strong">Note:</emphasis> With delayed write
      implemented in FLR phase 1, after writing to a mirrored file, users
      need to run the <literal>lfs mirror resync</literal> command to get all
      mirrors synchronized.</para>
      <para><emphasis role="strong">Examples:</emphasis></para>
      <para>The following commands create a mirrored file with 3 mirrors, then
      write some data into the file and resynchronize the stale mirrors.</para>
      <para>Creating a mirrored file with 3 mirrors:</para>
      <screen># lfs mirror create -N -E 4M -p flash -E eof \
                    -N2 -p archive /mnt/testfs/file1
# lfs getstripe /mnt/testfs/file1
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
lmm_stripe_size: 1048576
- 0: { l_ost_idx: 1, l_fid: [0x100010000:0x5:0x0] }
lcme_extent.e_start: 4194304
lcme_extent.e_end: EOF
lmm_stripe_size: 1048576
lmm_stripe_offset: -1
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_size: 1048576
- 0: { l_ost_idx: 3, l_fid: [0x100030000:0x4:0x0] }
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_size: 1048576
- 0: { l_ost_idx: 4, l_fid: [0x100040000:0x6:0x0] }</screen>
      <para>Writing some data into the mirrored file
      <literal>/mnt/testfs/file1</literal>:</para>
      <screen># yes | dd of=/mnt/testfs/file1 bs=1M count=2
2097152 bytes (2.1 MB) copied, 0.0320613 s, 65.4 MB/s
# lfs getstripe /mnt/testfs/file1
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
lcme_extent.e_start: 4194304
lcme_extent.e_end: EOF
lcme_flags: init,stale
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lcme_flags: init,stale
lcme_extent.e_start: 0
lcme_extent.e_end: EOF</screen>
      <para>The layout information above shows that data was written into the
      first component of the mirror with ID <literal>1</literal>, and that the
      mirrors with IDs <literal>2</literal> and <literal>3</literal> were
      marked with the “stale” flag.</para>
      <para>Resynchronizing the stale mirror with ID <literal>2</literal> for
      the mirrored file <literal>/mnt/testfs/file1</literal>:</para>
      <screen># lfs mirror resync --only 2 /mnt/testfs/file1
# lfs getstripe /mnt/testfs/file1
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
lcme_extent.e_start: 4194304
lcme_extent.e_end: EOF
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lcme_flags: init,stale
lcme_extent.e_start: 0
lcme_extent.e_end: EOF</screen>
      <para>The layout information above shows that after resynchronizing, the
      “stale” flag was removed from the mirror with ID
      <literal>2</literal>.</para>
      <para>Resynchronizing all of the stale mirrors for the mirrored file
      <literal>/mnt/testfs/file1</literal>:</para>
      <screen># lfs mirror resync /mnt/testfs/file1
# lfs getstripe /mnt/testfs/file1
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
lcme_extent.e_start: 4194304
lcme_extent.e_end: EOF
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lcme_extent.e_start: 0
lcme_extent.e_end: EOF</screen>
      <para>The layout information above shows that after resynchronizing, none
      of the mirrors were marked as stale.</para>
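      <para>Automated monitoring tools typically need to find out which
      mirrors are stale before running a resync. The following Python sketch
      shows one possible way to do that by scanning
      <literal>lfs getstripe</literal> output for the
      <literal>lcme_mirror_id</literal> and <literal>lcme_flags</literal>
      fields. The <literal>stale_mirrors</literal> helper is a hypothetical
      name, and the abbreviated sample text (including the assumption that
      each <literal>lcme_mirror_id</literal> line precedes its
      <literal>lcme_flags</literal> line) is made up for the illustration
      rather than captured from a real system.</para>
      <programlisting language="python"><![CDATA[import re


def stale_mirrors(getstripe_output):
    """Return sorted mirror IDs with at least one component flagged 'stale'."""
    stale, current = set(), None
    for line in getstripe_output.splitlines():
        m = re.match(r"\s*lcme_mirror_id:\s*(\d+)", line)
        if m:
            current = int(m.group(1))
            continue
        m = re.match(r"\s*lcme_flags:\s*(\S+)", line)
        if m and current is not None and "stale" in m.group(1).split(","):
            stale.add(current)
    return sorted(stale)


# Abbreviated, hand-written sample (assumed field order, not real output).
sample = """\
    lcme_mirror_id:      1
    lcme_flags:          init
    lcme_mirror_id:      1
    lcme_flags:          0
    lcme_mirror_id:      2
    lcme_flags:          init,stale
    lcme_mirror_id:      3
    lcme_flags:          init,stale
"""
ids = stale_mirrors(sample)
print(ids)
if ids:
    # Build (but do not run) the matching resync command line.
    print("lfs mirror resync --only " + ",".join(map(str, ids))
          + " /mnt/testfs/file1")]]></programlisting>
      <para>The sketch only prints the command it would suggest. A real tool
      would also have to handle files without stale mirrors and errors from
      the command itself.</para>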
    </section>
    <section xml:id="flr.operations.verifymirror">
      <title>Verifying Mirrored File(s)</title>
      <para><emphasis role="strong">Command:</emphasis></para>
      <screen>lfs mirror verify [--only &lt;mirror_id,mirror_id2[,...]&gt;]
[--verbose|-v] &lt;mirrored_file&gt; [&lt;mirrored_file2&gt; ...]</screen>
      <para>The above command will verify that every SYNC mirror (that is,
      every mirror that contains up-to-date data) of the mirrored file
      specified by <replaceable>mirrored_file</replaceable> contains exactly
      the same data. It supports specifying multiple mirrored files in one
      command line.</para>
      <para>This is a scrub tool that should be run on a regular basis to make
      sure that mirrored files are not corrupted. The command will not repair
      the file if it turns out to be corrupted. Usually, an administrator
      should check the file content from each mirror, decide which one is
      correct, and then invoke <literal>lfs mirror resync</literal> to repair
      it.</para>
      <informaltable frame="all">
        <tgroup cols="2">
          <colspec align="left" />
          <colspec align="left" />
          <thead>
            <row>
              <entry>Option</entry>
              <entry>Description</entry>
            </row>
          </thead>
          <tbody>
            <row>
              <entry>--only &lt;mirror_id,mirror_id2[,...]&gt;</entry>
              <entry><para>Indicates which mirrors, specified by
              <replaceable>mirror_ids</replaceable>, need to be verified. The
              <replaceable>mirror_id</replaceable> is the unique numerical
              identifier for a mirror. Multiple
              <replaceable>mirror_ids</replaceable> are separated by commas.
              </para>
              <para>Note: At least two <replaceable>mirror_ids</replaceable>
              are required. This option cannot be used when multiple
              mirrored files are specified.</para>
              </entry>
            </row>
            <row>
              <entry>--verbose|-v</entry>
              <entry>Indicates the command will print where the differences are
              if the data do not match. Otherwise, the command will just
              return an error in that case. This option can be repeated
              multiple times to print more information.</entry>
            </row>
          </tbody>
        </tgroup>
      </informaltable>
      <para><emphasis role="strong">Note:</emphasis></para>
      <para>Mirror components that have “stale” or “offline” flags will be
      skipped and not verified.</para>
      <para><emphasis role="strong">Examples:</emphasis></para>
      <para>The following command verifies that each mirror of a mirrored file
      contains exactly the same data:</para>
      <screen># lfs mirror verify /mnt/testfs/file1</screen>
      <para>The following command has the <literal>-v</literal> option specified
      to print where the differences are if the data does not match:</para>
      <screen># lfs mirror verify -vvv /mnt/testfs/file2
Chunks to be verified in /mnt/testfs/file2:
[0, 0x200000) [1, 2, 3, 4] 4
[0x200000, 0x400000) [1, 2, 3, 4] 4
[0x400000, 0x600000) [1, 2, 3, 4] 4
[0x600000, 0x800000) [1, 2, 3, 4] 4
[0x800000, 0xa00000) [1, 2, 3, 4] 4
[0xa00000, 0x1000000) [1, 2, 3, 4] 4
[0x1000000, 0xffffffffffffffff) [1, 2, 3, 4] 4
Verifying chunk [0, 0x200000) on mirror: 1 2 3 4
CRC-32 checksum value for chunk [0, 0x200000):
Mirror 1: 0x207b02f1
Mirror 2: 0x207b02f1
Mirror 3: 0x207b02f1
Mirror 4: 0x207b02f1
Verifying chunk [0, 0x200000) on mirror: 1 2 3 4 PASS
Verifying chunk [0x200000, 0x400000) on mirror: 1 2 3 4
CRC-32 checksum value for chunk [0x200000, 0x400000):
Mirror 1: 0x207b02f1
Mirror 2: 0x207b02f1
Mirror 3: 0x207b02f1
Mirror 4: 0x207b02f1
Verifying chunk [0x200000, 0x400000) on mirror: 1 2 3 4 PASS
Verifying chunk [0x400000, 0x600000) on mirror: 1 2 3 4
CRC-32 checksum value for chunk [0x400000, 0x600000):
Mirror 1: 0x42571b66
Mirror 2: 0x42571b66
Mirror 3: 0x42571b66
lfs mirror verify: chunk [0x400000, 0x600000) has different
checksum value on mirror 1 and mirror 4.
Verifying chunk [0x600000, 0x800000) on mirror: 1 2 3 4
CRC-32 checksum value for chunk [0x600000, 0x800000):
Mirror 1: 0x1f8ad0d8
Mirror 2: 0x1f8ad0d8
Mirror 3: 0x1f8ad0d8
Mirror 4: 0x18975bf9
lfs mirror verify: chunk [0x600000, 0x800000) has different
checksum value on mirror 1 and mirror 4.
Verifying chunk [0x800000, 0xa00000) on mirror: 1 2 3 4
CRC-32 checksum value for chunk [0x800000, 0xa00000):
Mirror 1: 0x69c17478
Mirror 2: 0x69c17478
Mirror 3: 0x69c17478
Mirror 4: 0x69c17478
Verifying chunk [0x800000, 0xa00000) on mirror: 1 2 3 4 PASS
lfs mirror verify: '/mnt/testfs/file2' chunk [0xa00000, 0x1000000]
exceeds file size 0xa00000: skipped</screen>
<para>The following command uses the <literal>--only</literal> option to
verify only the specified mirrors:</para>
<screen># lfs mirror verify -v --only 1,4 /mnt/testfs/file2
CRC-32 checksum value for chunk [0, 0x200000):
Mirror 1: 0x207b02f1
Mirror 4: 0x207b02f1
CRC-32 checksum value for chunk [0x200000, 0x400000):
Mirror 1: 0x207b02f1
Mirror 4: 0x207b02f1
CRC-32 checksum value for chunk [0x400000, 0x600000):
Mirror 1: 0x42571b66
lfs mirror verify: chunk [0x400000, 0x600000) has different
checksum value on mirror 1 and mirror 4.
CRC-32 checksum value for chunk [0x600000, 0x800000):
Mirror 1: 0x1f8ad0d8
Mirror 4: 0x18975bf9
lfs mirror verify: chunk [0x600000, 0x800000) has different
checksum value on mirror 1 and mirror 4.
CRC-32 checksum value for chunk [0x800000, 0xa00000):
Mirror 1: 0x69c17478
Mirror 4: 0x69c17478
lfs mirror verify: '/mnt/testfs/file2' chunk [0xa00000, 0x1000000]
exceeds file size 0xa00000: skipped</screen>
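<para>As a rough illustration of the chunk-by-chunk comparison visible in
the verification output, the following Python sketch computes a CRC-32 value
for each chunk of several in-memory copies and reports the chunks whose
values differ. It is not the implementation used by
<literal>lfs mirror verify</literal>: the function name
<literal>verify_mirrors</literal>, the fixed 2 MiB chunk size, and the use of
byte strings in place of real mirror data are all assumptions made for the
example. The <literal>only</literal> parameter loosely imitates the
<literal>--only</literal> option.</para>

```python
import zlib

# Assumed chunk size (2 MiB). The real tool determines its own chunk
# boundaries; this constant only resembles the sample output.
CHUNK_SIZE = 0x200000


def verify_mirrors(mirrors, chunk_size=CHUNK_SIZE, only=None):
    """Compare per-chunk CRC-32 values across in-memory mirror copies.

    mirrors: dict mapping a mirror ID to that mirror's bytes (a hypothetical
             stand-in for data read from each mirror).
    only:    optional iterable of mirror IDs to compare.
    Returns a list of (start, end, {mirror_id: crc}) for mismatching chunks.
    """
    ids = sorted(only if only is not None else mirrors)
    size = max(len(mirrors[i]) for i in ids)
    mismatches = []
    for start in range(0, size, chunk_size):
        end = start + chunk_size
        crcs = {i: zlib.crc32(mirrors[i][start:end]) for i in ids}
        if len(set(crcs.values())) > 1:
            mismatches.append((start, end, crcs))
    return mismatches


if __name__ == "__main__":
    good = bytes(3 * CHUNK_SIZE)
    bad = bytearray(good)
    bad[CHUNK_SIZE + 5] = 0xFF  # corrupt one byte in the second chunk
    data = {1: good, 2: good, 3: bytes(bad)}
    for start, end, crcs in verify_mirrors(data):
        print(f"chunk [{start:#x}, {end:#x}) differs:",
              ", ".join(f"mirror {i}: {c:#010x}" for i, c in crcs.items()))
```

<para>Comparing checksums per chunk rather than per file narrows a mismatch
down to a byte range, which is the same kind of information the verbose
output above provides.</para>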
<section xml:id="flr.operations.findingmirror">
<title>Finding Mirrored File(s)</title>
<para>The <literal>lfs find</literal> command is used to list files and
directories with specific attributes. The following two attribute
parameters are specific to a mirrored file or directory:</para>
<screen>lfs find &lt;directory|filename ...&gt;
[[!] --mirror-count|-N [+-]n]
[[!] --mirror-state &lt;[^]state&gt;]</screen>
<colspec align="left" />
<colspec align="left" />
<entry>Option</entry>
<entry>Description</entry>
<entry>--mirror-count|-N [+-]n</entry>
<entry>Selects files by the number of mirrors they have.</entry>
<entry>--mirror-state &lt;[^]state&gt;</entry>
<para>Indicates mirrored file state.</para>
<para>If <replaceable>^state</replaceable> is used, print only
files not matching <replaceable>state</replaceable>. Only one
state can be specified.</para>
<para>Valid state names are:</para>
<para><literal>ro</literal> – indicates the mirrored file is in
read-only state. All of the mirrors contain the up-to-date
data.</para>
<para><literal>wp</literal> – indicates the mirrored file is in
a state of being written.</para>
<para><literal>sp</literal> – indicates the mirrored file is in
a state of being resynchronized.</para>
<para><emphasis role="strong">Note:</emphasis></para>
<para>Specifying <literal>!</literal> before an option negates its meaning
(files NOT matching the parameter). Using <literal>+</literal> before a
numeric value means 'more than n', while <literal>-</literal> before a
numeric value means 'less than n'. If neither is used, it means
'equal to n', within the bounds of the unit specified (if any).</para>
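<para>The selection rules described in this note can be sketched in a few
lines of Python. This is only a model of the documented semantics, not code
taken from <literal>lfs find</literal>: the function names
<literal>count_matches</literal> and <literal>state_matches</literal> are
hypothetical, the <literal>negate</literal> argument stands in for a leading
<literal>!</literal>, and units are ignored because a mirror count has
none.</para>

```python
import operator

# Prefix of the numeric argument -> comparison applied as cmp(actual, n).
_NUMERIC_OPS = {"+": operator.gt, "-": operator.lt, "": operator.eq}


def count_matches(actual, spec, negate=False):
    """Evaluate a '[+-]n' specification against an actual mirror count."""
    prefix = spec[0] if spec[0] in "+-" else ""
    n = int(spec[len(prefix):])
    result = _NUMERIC_OPS[prefix](actual, n)
    return not result if negate else result


def state_matches(actual, spec, negate=False):
    """Evaluate a '[^]state' specification against an actual state name."""
    invert = spec.startswith("^")
    result = (actual == spec.lstrip("^")) != invert
    return not result if negate else result


if __name__ == "__main__":
    print(count_matches(3, "+2"))      # more than 2 mirrors
    print(state_matches("wp", "^ro"))  # any state other than ro
```

<para>Under this model, a file with three mirrors satisfies
<literal>+2</literal> but not <literal>-2</literal> or <literal>2</literal>,
and a file in state <literal>wp</literal> satisfies
<literal>^ro</literal>.</para>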
<para><emphasis role="strong">Examples:</emphasis></para>
<para>The following command recursively lists all mirrored files that have
more than 2 mirrors under directory <literal>/mnt/testfs</literal>:</para>
<screen># lfs find --mirror-count +2 --type f /mnt/testfs</screen>
<para>The following command recursively lists all out-of-sync mirrored
files under directory <literal>/mnt/testfs</literal>:</para>
<screen># lfs find --mirror-state=^ro --type f /mnt/testfs</screen>
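<para>When these searches are driven from a script, the command line can be
assembled programmatically. The Python sketch below builds the same argument
lists as the two examples above and, optionally, runs them. The helper names
<literal>build_find_command</literal> and <literal>find_files</literal> are
hypothetical, only options shown in this section are used, and
<literal>find_files</literal> assumes that the <literal>lfs</literal> utility
is installed and that the path is on a mounted Lustre file system.</para>

```python
import subprocess


def build_find_command(path, mirror_count=None, mirror_state=None):
    """Assemble an argv list for 'lfs find' restricted to regular files."""
    argv = ["lfs", "find"]
    if mirror_count is not None:
        argv += ["--mirror-count", mirror_count]
    if mirror_state is not None:
        argv.append(f"--mirror-state={mirror_state}")
    argv += ["--type", "f", path]
    return argv


def find_files(path, **filters):
    """Run the command and return matching paths (needs a Lustre client)."""
    out = subprocess.run(build_find_command(path, **filters),
                         check=True, capture_output=True, text=True).stdout
    return out.splitlines()


if __name__ == "__main__":
    # Print the command lines only; nothing is executed here.
    print(" ".join(build_find_command("/mnt/testfs", mirror_count="+2")))
    print(" ".join(build_find_command("/mnt/testfs", mirror_state="^ro")))
```

<para>Passing the arguments as a list avoids shell interpretation of
characters such as <literal>^</literal> and <literal>!</literal>.</para>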
<section xml:id="flr.interop">
<title>Interoperability</title>
<para>Introduced in Lustre release 2.11.0, the FLR feature is based on the
<xref linkend="pfl"/> feature introduced in Lustre 2.10.0.</para>
<para>Clients running Lustre release 2.9 or older do not understand the
PFL layout, so they can neither access nor open mirrored files created in a
Lustre 2.11 file system.</para>
<para>The following example shows the errors returned when accessing and
opening a mirrored file (created in a Lustre 2.11 file system) on a Lustre
2.9 client:</para>
<screen># ls /mnt/testfs/mirrored_file
ls: cannot access /mnt/testfs/mirrored_file: Invalid argument
# cat /mnt/testfs/mirrored_file
cat: /mnt/testfs/mirrored_file: Operation not supported</screen>
<para>Clients running Lustre release 2.10 understand the PFL layout but not
the mirrored layout. They can access mirrored files created in a Lustre 2.11
file system, but they cannot open them. This restriction exists because
Lustre 2.10 clients do not verify overlapping components: they would read
and write a mirrored file as if it were a normal PFL file, which could leave
mirrors that are marked as synchronized containing different data.</para>
<para>The following example shows the results returned when accessing and
opening a mirrored file (created in a Lustre 2.11 file system) on a Lustre
2.10 client:</para>
<screen># ls /mnt/testfs/mirrored_file
/mnt/testfs/mirrored_file
# cat /mnt/testfs/mirrored_file
cat: /mnt/testfs/mirrored_file: Operation not supported</screen>
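<para>The client behavior described in this section can be summarized as a
small lookup, sketched here in Python. The function name
<literal>mirrored_file_support</literal> is hypothetical, the version string
is assumed to have the form <literal>major.minor[.patch]</literal>, and the
result for release 2.11 and later is inferred from FLR having been introduced
in that release rather than stated explicitly above.</para>

```python
def mirrored_file_support(client_version):
    """Return (can_access, can_open) for a 'major.minor[.patch]' version."""
    major, minor = (int(part) for part in client_version.split(".")[:2])
    if (major, minor) >= (2, 11):
        return (True, True)    # FLR-aware client (inferred, see lead-in)
    if (major, minor) == (2, 10):
        return (True, False)   # understands PFL but not mirrored layouts
    return (False, False)      # 2.9 and older: no PFL support


if __name__ == "__main__":
    for version in ("2.9.0", "2.10.8", "2.11.0"):
        can_access, can_open = mirrored_file_support(version)
        print(f"{version}: access={can_access} open={can_open}")
```

<para>A site running mixed client versions could use a check of this kind to
decide which nodes may be given workloads that read mirrored files.</para>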
vim:expandtab:shiftwidth=2:tabstop=8: