1 <?xml version='1.0' encoding='utf-8'?>
2 <chapter xmlns="http://docbook.org/ns/docbook"
3 xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
4 xml:id="troubleshootingrecovery">
5 <title xml:id="troubleshootingrecovery.title">Troubleshooting
7 <para>This chapter describes what to do if something goes wrong during
8 recovery. It describes:</para>
12 <xref linkend="dbdoclet.50438225_71141" />
17 <xref linkend="dbdoclet.50438225_37365" />
22 <xref linkend="dbdoclet.50438225_12316" />
27 <xref linkend="dbdoclet.lfsckadmin" />
31 <section xml:id="dbdoclet.50438225_71141">
34 <primary>recovery</primary>
35 <secondary>corruption of backing ldiskfs file system</secondary>
36 </indexterm>Recovering from Errors or Corruption on a Backing ldiskfs File
38 <para>When an OSS, MDS, or MGS server crash occurs, it is not necessary to
39 run e2fsck on the file system.
40 <literal>ldiskfs</literal> journaling ensures that the file system remains
41 consistent over a system crash. The backing file systems are never accessed
42 directly from the client, so client crashes are not relevant for server
43 file system consistency.</para>
44 <para>The only time it is REQUIRED that
45 <literal>e2fsck</literal> be run on a device is when an event causes
46 problems that ldiskfs journaling is unable to handle, such as a hardware
47 device failure or I/O error. If the ldiskfs kernel code detects corruption
48 on the disk, it mounts the file system as read-only to prevent further
49 corruption, but still allows read access to the device. This appears as
51 <literal>EROFS</literal>) in the syslogs on the server, e.g.:</para>
52 <screen>Dec 29 14:11:32 mookie kernel: LDISKFS-fs error (device sdz):
53 ldiskfs_lookup: unlinked inode 5384166 in dir #145170469
54 Dec 29 14:11:32 mookie kernel: Remounting filesystem read-only </screen>
<para>In such a situation, it is normally only necessary to run
<literal>e2fsck</literal> on the bad device before placing it back into service.</para>
57 <para>In the vast majority of cases, the Lustre software can cope with any
58 inconsistencies found on the disk and between other devices in the file
60 <para>For problem analysis, it is strongly recommended that
61 <literal>e2fsck</literal> be run under a logger, like
62 <literal>script</literal>, to record all
63 of the output and changes that are made to the file system in case this
64 information is needed later.</para>
65 <para>If time permits, it is also a good idea to first run
66 <literal>e2fsck</literal> in non-fixing mode (-n option) to assess the type
67 and extent of damage to the file system. The drawback is that in this mode,
68 <literal>e2fsck</literal> does not recover the file system journal, so there
69 may appear to be file system corruption when none really exists.</para>
<para>To determine whether the corruption is real or only an artifact of
the journal not being replayed, you can briefly mount and unmount the
<literal>ldiskfs</literal> file system directly on the node with the Lustre
file system stopped, using a command similar to:</para>
74 <screen>mount -t ldiskfs /dev/{ostdev} /mnt/ost; umount /mnt/ost</screen>
75 <para>This causes the journal to be recovered.</para>
77 <literal>e2fsck</literal> utility works well when fixing file system
78 corruption (better than similar file system recovery tools and a primary
80 <literal>ldiskfs</literal> was chosen over other file systems). However, it
81 is often useful to identify the type of damage that has occurred so an
82 <literal>ldiskfs</literal> expert can make intelligent decisions about what
83 needs fixing, in place of
84 <literal>e2fsck</literal>.</para>
85 <screen>root# {stop lustre services for this device, if running}
86 root# script /tmp/e2fsck.sda
87 Script started, file is /tmp/e2fsck.sda
88 root# mount -t ldiskfs /dev/sda /mnt/ost
90 root# e2fsck -fn /dev/sda # don't fix file system, just check for corruption
94 root# e2fsck -fp /dev/sda # fix errors with prudent answers (usually <literal>yes</literal>)</screen>
96 <section xml:id="dbdoclet.50438225_37365">
99 <primary>recovery</primary>
100 <secondary>corruption of Lustre file system</secondary>
101 </indexterm>Recovering from Corruption in the Lustre File System</title>
102 <para>In cases where an ldiskfs MDT or OST becomes corrupt, you need to run
103 <literal>e2fsck</literal> to ensure local filesystem consistency, then use
104 <literal>LFSCK</literal> to run a distributed check on the file system to
105 resolve any inconsistencies between the MDTs and OSTs, or among MDTs.</para>
108 <para>Stop the Lustre file system.</para>
112 <literal>e2fsck -f</literal> on the individual MDT/OST that had
113 problems to fix any local file system damage.</para>
114 <para>We recommend running
115 <literal>e2fsck</literal> under script, to create a log of changes made
116 to the file system in case it is needed later. After
117 <literal>e2fsck</literal> is run, bring up the file system, if
118 necessary, to reduce the outage window.</para>
121 <section xml:id="dbdoclet.50438225_13916">
124 <primary>recovery</primary>
125 <secondary>orphaned objects</secondary>
126 </indexterm>Working with Orphaned Objects</title>
127 <para>The simplest problem to resolve is that of orphaned objects. When
128 the LFSCK layout check is run, these objects are linked to new files and
130 <literal>.lustre/lost+found/MDT<replaceable>xxxx</replaceable></literal>
131 in the Lustre file system
132 (where MDTxxxx is the index of the MDT on which the orphan was found),
133 where they can be examined and saved or deleted as necessary.</para>
134 <para condition='l27'>With Lustre version 2.7 and later, LFSCK will
135 identify and process orphan objects found on MDTs as well.</para>
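<para>As an illustration, orphans recovered by a layout check could be
inspected from a client as sketched below. The mount point
<literal>/mnt/lustre</literal> and the MDT index
<literal>MDT0000</literal> are assumptions made for this example;
substitute the values for your file system.</para>
<screen># list recovered orphan objects for the (assumed) MDT index 0000
ls -l /mnt/lustre/.lustre/lost+found/MDT0000/</screen>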
138 <section xml:id="dbdoclet.50438225_12316">
141 <primary>recovery</primary>
142 <secondary>unavailable OST</secondary>
143 </indexterm>Recovering from an Unavailable OST</title>
144 <para>One problem encountered in a Lustre file system environment is when
145 an OST becomes unavailable due to a network partition, OSS node crash, etc.
146 When this happens, the OST's clients pause and wait for the OST to become
147 available again, either on the primary OSS or a failover OSS. When the OST
148 comes back online, the Lustre file system starts a recovery process to
149 enable clients to reconnect to the OST. Lustre servers put a limit on the
150 time they will wait in recovery for clients to reconnect.</para>
151 <para>During recovery, clients reconnect and replay their requests
152 serially, in the same order they were done originally. Until a client
153 receives a confirmation that a given transaction has been written to stable
154 storage, the client holds on to the transaction, in case it needs to be
155 replayed. Periodically, a progress message prints to the log, stating
156 how_many/expected clients have reconnected. If the recovery is aborted,
157 this log shows how many clients managed to reconnect. When all clients have
158 completed recovery, or if the recovery timeout is reached, the recovery
159 period ends and the OST resumes normal request processing.</para>
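<para>Recovery progress can also be checked directly on the server. The
following is a minimal sketch, assuming the
<literal>recovery_status</literal> parameter is exposed under
<literal>obdfilter</literal> for OSTs on your release; the wildcard
matches every OST served by the node.</para>
<screen># run on the OSS; shows the recovery state of each local OST
lctl get_param obdfilter.*.recovery_status</screen>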
160 <para>If some clients fail to replay their requests during the recovery
161 period, this will not stop the recovery from completing. You may have a
162 situation where the OST recovers, but some clients are not able to
163 participate in recovery (e.g. network problems or client failure), so they
164 are evicted and their requests are not replayed. This would result in any
165 operations on the evicted clients failing, including in-progress writes,
166 which would cause cached writes to be lost. This is a normal outcome; the
167 recovery cannot wait indefinitely, or the file system would be hung any
168 time a client failed. The lost transactions are an unfortunate result of
169 the recovery process.</para>
171 <para>The failure of client recovery does not indicate or lead to
172 filesystem corruption. This is a normal event that is handled by the MDT
173 and OST, and should not result in any inconsistencies between
<para>The version-based recovery (VBR) feature enables a failed client to
be <quote>skipped</quote>, so the remaining clients can replay their requests,
resulting in a more successful recovery from a downed OST. For more information
about the VBR feature, see
<xref linkend="lustrerecovery" /> (Version-based Recovery).</para>
184 <section xml:id="dbdoclet.lfsckadmin">
187 <primary>recovery</primary>
188 <secondary>oiscrub</secondary>
191 <primary>recovery</primary>
192 <secondary>LFSCK</secondary>
</indexterm>Checking the File System with LFSCK</title>
<para>LFSCK is an administrative tool for checking and repairing the
attributes specific to a mounted Lustre file system. It is similar
in concept to an offline fsck repair tool for a local filesystem,
but LFSCK is implemented to run as part of the Lustre file system
while the file system is mounted and in use. This allows consistency
checking and repair of Lustre-specific metadata without unnecessary
downtime, and can be run on the largest Lustre file systems with
minimal impact on normal operations.</para>
<para>LFSCK can verify and repair the Object Index (OI) table, an internal
table that maps Lustre File Identifiers (FIDs) to the ldiskfs inode numbers
used internally by the MDT. An OI Scrub traverses the OI table
and makes corrections where necessary. An OI Scrub is required after
restoring from a file-level MDT backup
(<xref linkend="dbdoclet.backup_device" />), or in case the OI Table is
otherwise corrupted. Later phases of LFSCK will add further checks to the
Lustre distributed file system state.</para>
211 <para condition='l24'>In Lustre software release 2.4, LFSCK namespace
212 scanning can verify and repair the directory FID-in-dirent and LinkEA
214 <para condition='l26'>In Lustre software release 2.6, LFSCK layout scanning
215 can verify and repair MDT-OST file layout inconsistencies. File layout
216 inconsistencies between MDT-objects and OST-objects that are checked and
217 corrected include dangling reference, unreferenced OST-objects, mismatched
218 references and multiple references.</para>
<para condition='l27'>In Lustre software release 2.7, LFSCK layout scanning
is enhanced to verify and repair inconsistencies between multiple
222 <para>Control and monitoring of LFSCK is through LFSCK and the
223 <literal>/proc</literal> file system interfaces. LFSCK supports three types
224 of interface: switch interface, status interface, and adjustment interface.
225 These interfaces are detailed below.</para>
227 <title>LFSCK switch interface</title>
229 <title>Manually Starting LFSCK</title>
231 <title>Description</title>
232 <para>LFSCK can be started after the MDT is mounted using the
233 <literal>lctl lfsck_start</literal> command.</para>
237 <screen>lctl lfsck_start <-M | --device <replaceable>[MDT,OST]_device</replaceable>> \
239 [-c | --create_ostobj <replaceable>on | off</replaceable>] \
240 [-C | --create_mdtobj <replaceable>on | off</replaceable>] \
241 [-d | --delay_create_ostobj <replaceable>on | off</replaceable>] \
242 [-e | --error <replaceable>{continue | abort}</replaceable>] \
244 [-n | --dryrun <replaceable>on | off</replaceable>] \
247 [-s | --speed <replaceable>ops_per_sec_limit</replaceable>] \
248 [-t | --type <replaceable>check_type[,check_type...]</replaceable>] \
249 [-w | --window_size <replaceable>size</replaceable>]</screen>
252 <title>Options</title>
254 <literal>lfsck_start</literal> options are listed and described below.
255 For a complete list of available options, type
256 <literal>lctl lfsck_start -h</literal>.</para>
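<para>For example, a trial run that only reports problems might look like
the following sketch. The target name
<literal>testfs-MDT0000</literal> and the rate limit of 1000 are
assumptions chosen for illustration; only options described in this
section are used.</para>
<screen># restart from the beginning (-r), check namespace and layout (-t),
# make no changes (-n on), and limit the scan to 1000 objects per second (-s)
lctl lfsck_start -M testfs-MDT0000 -r -t namespace,layout -n on -s 1000</screen>
<para>Running with <literal>-n on</literal> first mirrors the advice given
for <literal>e2fsck -n</literal>: it shows the extent of the damage before
any repair is attempted.</para>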
257 <informaltable frame="all">
259 <colspec colname="c1" colwidth="3*" />
260 <colspec colname="c2" colwidth="7*" />
265 <emphasis role="bold">Option</emphasis>
270 <emphasis role="bold">Description</emphasis>
279 <literal>-M | --device</literal>
283 <para>The MDT or OST target to start LFSCK on.</para>
289 <literal>-A | --all</literal>
293 <para condition='l26'>Start LFSCK on all
294 targets on all servers simultaneously.
295 By default, both layout and namespace
296 consistency checking and repair are started.</para>
302 <literal>-c | --create_ostobj</literal>
306 <para condition='l26'>Create the lost OST-object for
<literal>off</literal> (default) or
<literal>on</literal>. If not specified, then the default
behavior is to keep the dangling LOV EA there without
creating the lost OST-object.</para>
317 <literal>-C | --create_mdtobj</literal>
321 <para condition='l27'>Create the lost MDT-object for
<literal>off</literal> (default) or
<literal>on</literal>. If not specified, then the default
behavior is to keep the dangling name entry there without
creating the lost MDT-object.</para>
332 <literal>-d | --delay_create_ostobj</literal>
336 <para condition='l29'>
337 Delay creating the lost OST-object for dangling LOV EA
338 until the orphan OST-objects are handled.
<literal>off</literal> (default) or
340 <literal>on</literal>.
347 <literal>-e | --error</literal>
<literal>continue</literal> (default) or
<literal>abort</literal>. Specifies whether LFSCK stops or
continues when it fails to repair something. If this option is not
specified, the saved value (when resuming from a checkpoint)
is used if present. This option cannot be changed
while LFSCK is running.</para>
363 <literal>-h | --help</literal>
<para>Display help information.</para>
373 <literal>-n | --dryrun</literal>
377 <para>Perform a trial without making any changes.
<literal>off</literal> (default) or
379 <literal>on</literal>.</para>
385 <literal>-o | --orphan</literal>
389 <para condition='l26'>Repair orphan OST-objects for layout
396 <literal>-r | --reset</literal>
400 <para>Reset the start position for the object iteration to
401 the beginning for the specified MDT. By default the
402 iterator will resume scanning from the last checkpoint
403 (saved periodically by LFSCK) provided it is
410 <literal>-s | --speed</literal>
414 <para>Set the upper speed limit of LFSCK processing in
415 objects per second. If it is not specified, the saved value
416 (when resuming from checkpoint) or default value of 0 (0 =
417 run as fast as possible) is used. Speed can be adjusted
418 while LFSCK is running with the adjustment
425 <literal>-t | --type</literal>
<para>The type of checking/repairing that should be
performed. The new LFSCK framework provides a single
interface for a variety of system consistency
checking/repairing operations including:</para>
<para condition='l24'>
<literal>namespace</literal>: check and repair
FID-in-dirent and LinkEA consistency.</para>
<para condition='l27'>Lustre software release 2.7 enhances
namespace consistency verification under DNE mode.</para>
<para condition='l26'>
<literal>layout</literal>: check and repair MDT-OST
inconsistency.</para>
<para>If no type is specified, LFSCK starts the component(s)
that ran last time and did not finish, or the component(s)
corresponding to a known system inconsistency. Whenever
LFSCK is triggered, the OI scrub runs automatically, so there
is no need to specify OI_scrub.</para>
452 <literal>-w | --window_size</literal>
<para condition='l26'>The window size for the async request
pipeline. The input and output of the LFSCK async request pipeline
may have quite different processing speeds, so requests can
accumulate in the pipeline and cause abnormal
memory/network pressure. If not specified, the default
window size for the async request pipeline is 1024.</para>
470 <title>Manually Stopping LFSCK</title>
472 <title>Description</title>
473 <para>To stop LFSCK when the MDT is mounted, use the
474 <literal>lctl lfsck_stop</literal> command.</para>
478 <screen>lctl lfsck_stop <-M | --device <replaceable>[MDT,OST]_device</replaceable>> \
480 [-h | --help]</screen>
483 <title>Options</title>
485 <literal>lfsck_stop</literal> options are listed and described below.
486 For a complete list of available options, type
487 <literal>lctl lfsck_stop -h</literal>.</para>
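<para>A minimal sketch of stopping a running check follows. The target
name <literal>testfs-MDT0000</literal> is an assumption for
illustration.</para>
<screen># stop LFSCK on a single target
lctl lfsck_stop -M testfs-MDT0000

# stop LFSCK on all targets on all servers
lctl lfsck_stop -M testfs-MDT0000 -A</screen>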
488 <informaltable frame="all">
490 <colspec colname="c1" colwidth="3*" />
491 <colspec colname="c2" colwidth="7*" />
496 <emphasis role="bold">Option</emphasis>
501 <emphasis role="bold">Description</emphasis>
510 <literal>-M | --device</literal>
514 <para>The MDT or OST target to stop LFSCK on.</para>
520 <literal>-A | --all</literal>
524 <para>Stop LFSCK on all targets on all servers
525 simultaneously.</para>
531 <literal>-h | --help</literal>
<para>Display help information.</para>
545 <title>Check the LFSCK global status</title>
547 <title>Description</title>
548 <para>Check the LFSCK global status via a single
549 <literal>lctl lfsck_query</literal> command on the MDS.</para>
553 <screen>lctl lfsck_query <-M | --device <replaceable>MDT_device</replaceable>> \
555 [-t | --type <replaceable>lfsck_type[,lfsck_type...]</replaceable>] \
556 [-w | --wait]</screen>
559 <title>Options</title>
561 <literal>lfsck_query</literal> options are listed and described below.
562 For a complete list of available options, type
563 <literal>lctl lfsck_query -h</literal>.</para>
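<para>As a sketch, the global status of both check types could be queried
from the MDS as follows. The target name
<literal>testfs-MDT0000</literal> is an assumption for
illustration.</para>
<screen># query layout and namespace status; -w waits if LFSCK is still scanning
lctl lfsck_query -M testfs-MDT0000 -t layout,namespace -w</screen>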
564 <informaltable frame="all">
566 <colspec colname="c1" colwidth="3*" />
567 <colspec colname="c2" colwidth="7*" />
572 <emphasis role="bold">Option</emphasis>
577 <emphasis role="bold">Description</emphasis>
586 <literal>-M | --device</literal>
590 <para>The device to query for LFSCK status.</para>
596 <literal>-h | --help</literal>
<para>Display help information.</para>
606 <literal>-t | --type</literal>
610 <para>The LFSCK type(s) that should be queried,
611 including: layout, namespace.</para>
617 <literal>-w | --wait</literal>
<para>Wait if LFSCK is currently scanning.</para>
630 <title>LFSCK status interface</title>
632 <title>LFSCK status of OI Scrub via
633 <literal>procfs</literal></title>
635 <title>Description</title>
636 <para>For each LFSCK component there is a dedicated procfs interface
637 to trace the corresponding LFSCK component status. For OI Scrub, the
638 interface is the OSD layer procfs interface, named
639 <literal>oi_scrub</literal>. To display OI Scrub status, the standard
640 <literal>lctl get_param</literal> command is used as shown in the
645 <screen>lctl get_param -n osd-ldiskfs.<replaceable>FSNAME</replaceable>-[<replaceable>MDT_target|OST_target</replaceable>].oi_scrub</screen>
648 <title>Output</title>
649 <informaltable frame="all">
651 <colspec colname="c1" colwidth="3*" />
652 <colspec colname="c2" colwidth="7*" />
657 <emphasis role="bold">Information</emphasis>
662 <emphasis role="bold">Detail</emphasis>
670 <para>General Information</para>
675 <para>Name: OI_scrub.</para>
678 <para>OI scrub magic id (an identifier unique to OI
682 <para>OI files count.</para>
<para>Status: one of
686 <literal>init</literal>,
687 <literal>scanning</literal>,
688 <literal>completed</literal>,
689 <literal>failed</literal>,
690 <literal>stopped</literal>,
691 <literal>paused</literal>, or
692 <literal>crashed</literal>.</para>
695 <para>Flags: including -
696 <literal>recreated</literal>(OI file(s) is/are
698 <literal>inconsistent</literal>(restored from
700 <literal>auto</literal>(triggered by non-UI mechanism),
<literal>upgrade</literal> (from Lustre software release
1.8 IGIF format).</para>
706 <para>Parameters: OI scrub parameters, like
707 <literal>failout</literal>.</para>
710 <para>Time Since Last Completed.</para>
713 <para>Time Since Latest Start.</para>
716 <para>Time Since Last Checkpoint.</para>
<para>Latest Start Position: the position from which the
latest scrub started.</para>
723 <para>Last Checkpoint Position.</para>
726 <para>First Failure Position: the position for the
727 first object to be repaired.</para>
730 <para>Current Position.</para>
737 <para>Statistics</para>
743 <literal>Checked</literal> total number of objects
748 <literal>Updated</literal> total number of objects
753 <literal>Failed</literal> total number of objects that
754 failed to be repaired.</para>
758 <literal>No Scrub</literal> total number of objects
760 <literal>LDISKFS_STATE_LUSTRE_NOSCRUB and
761 skipped</literal>.</para>
765 <literal>IGIF</literal> total number of objects IGIF
770 <literal>Prior Updated</literal> how many objects have
771 been repaired which are triggered by parallel
776 <literal>Success Count</literal> total number of
777 completed OI_scrub runs on the target.</para>
<literal>Run Time</literal> how long the scrub has run,
counted from the start of scanning on
the specified MDT target and excluding any
paused or failed time between checkpoints.</para>
788 <literal>Average Speed</literal> calculated by dividing
789 <literal>Checked</literal> by
790 <literal>run_time</literal>.</para>
794 <literal>Real-Time Speed</literal> the speed since last
795 checkpoint if the OI_scrub is running.</para>
799 <literal>Scanned</literal> total number of objects under
800 /lost+found that have been scanned.</para>
804 <literal>Repaired</literal> total number of objects
805 under /lost+found that have been recovered.</para>
809 <literal>Failed</literal> total number of objects under
810 /lost+found failed to be scanned or failed to be
821 <section condition='l24'>
822 <title>LFSCK status of namespace via
823 <literal>procfs</literal></title>
825 <title>Description</title>
827 <literal>namespace</literal> component is responsible for checks
828 described in <xref linkend="dbdoclet.lfsckadmin" />. The
829 <literal>procfs</literal> interface for this component is in the
831 <literal>lfsck_namespace</literal>. To show the status of this
833 <literal>lctl get_param</literal> should be used as described in the
835 <para>The LFSCK namespace status output refers to phase 1 and phase 2.
836 Phase 1 is when the LFSCK main engine, which runs on each MDT,
837 linearly scans its local device, guaranteeing that all local objects
838 are checked. However, there are certain cases in which LFSCK cannot
839 know whether an object is consistent or cannot repair an inconsistency
840 until the phase 1 scanning is completed. During phase 2 of the
841 namespace check, objects with multiple hard-links, objects with remote
842 parents, and other objects which couldn't be verified during phase 1
843 will be checked.</para>
<screen>lctl get_param -n mdd.<replaceable>FSNAME</replaceable>-<replaceable>MDT_target</replaceable>.lfsck_namespace</screen>
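<para>When only the overall state is of interest, the output can be
filtered. This is a sketch under two assumptions: the target is named
<literal>testfs-MDT0000</literal>, and the status and flags fields are
printed as lines beginning with <literal>status:</literal> and
<literal>flags:</literal> on your release.</para>
<screen># show only the status and flags lines of the namespace check
lctl get_param -n mdd.testfs-MDT0000.lfsck_namespace | grep -E "^(status|flags):"</screen>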
850 <title>Output</title>
851 <informaltable frame="all">
853 <colspec colname="c1" colwidth="3*" />
854 <colspec colname="c2" colwidth="7*" />
859 <emphasis role="bold">Information</emphasis>
864 <emphasis role="bold">Detail</emphasis>
872 <para>General Information</para>
878 <literal>lfsck_namespace</literal></para>
881 <para>LFSCK namespace magic.</para>
<para>LFSCK namespace version.</para>
<para>Status: one of
888 <literal>init</literal>,
889 <literal>scanning-phase1</literal>,
890 <literal>scanning-phase2</literal>,
891 <literal>completed</literal>,
892 <literal>failed</literal>,
893 <literal>stopped</literal>,
894 <literal>paused</literal>,
895 <literal>partial</literal>,
896 <literal>co-failed</literal>,
897 <literal>co-stopped</literal> or
898 <literal>co-paused</literal>.</para>
901 <para>Flags: including -
902 <literal>scanned-once</literal>(the first cycle
903 scanning has been completed),
904 <literal>inconsistent</literal>(one or more
905 inconsistent FID-in-dirent or LinkEA entries that have
<literal>upgrade</literal> (from Lustre software release
1.8 IGIF format).</para>
911 <para>Parameters: including
912 <literal>dryrun</literal>,
913 <literal>all_targets</literal>,
914 <literal>failout</literal>,
915 <literal>broadcast</literal>,
916 <literal>orphan</literal>,
917 <literal>create_ostobj</literal> and
918 <literal>create_mdtobj</literal>.</para>
921 <para>Time Since Last Completed.</para>
924 <para>Time Since Latest Start.</para>
927 <para>Time Since Last Checkpoint.</para>
930 <para>Latest Start Position: the position the checking
931 began most recently.</para>
934 <para>Last Checkpoint Position.</para>
937 <para>First Failure Position: the position for the
938 first object to be repaired.</para>
941 <para>Current Position.</para>
948 <para>Statistics</para>
954 <literal>Checked Phase1</literal> total number of
955 objects scanned during
956 <literal>scanning-phase1</literal>.</para>
960 <literal>Checked Phase2</literal> total number of
961 objects scanned during
962 <literal>scanning-phase2</literal>.</para>
966 <literal>Updated Phase1</literal> total number of
967 objects repaired during
968 <literal>scanning-phase1</literal>.</para>
972 <literal>Updated Phase2</literal> total number of
973 objects repaired during
974 <literal>scanning-phase2</literal>.</para>
<literal>Failed Phase1</literal> total number of objects
that failed to be repaired during
<literal>scanning-phase1</literal>.</para>
<literal>Failed Phase2</literal> total number of objects
that failed to be repaired during
<literal>scanning-phase2</literal>.</para>
990 <literal>directories</literal> total number of
991 directories scanned.</para>
995 <literal>multiple_linked_checked</literal> total number
996 of multiple-linked objects that have been
1001 <literal>dirent_repaired</literal> total number of
1002 FID-in-dirent entries that have been repaired.</para>
1006 <literal>linkea_repaired</literal> total number of
1007 linkEA entries that have been repaired.</para>
1011 <literal>unknown_inconsistency</literal> total number of
1012 undefined inconsistencies found in
1013 scanning-phase2.</para>
1017 <literal>unmatched_pairs_repaired</literal> total number
1018 of unmatched pairs that have been repaired.</para>
1022 <literal>dangling_repaired</literal> total number of
1023 dangling name entries that have been
1024 found/repaired.</para>
1028 <literal>multi_referenced_repaired</literal> total
1029 number of multiple referenced name entries that have
1030 been found/repaired.</para>
1034 <literal>bad_file_type_repaired</literal> total number
1035 of name entries with bad file type that have been
1040 <literal>lost_dirent_repaired</literal> total number of
1041 lost name entries that have been re-inserted.</para>
1045 <literal>striped_dirs_scanned</literal> total number of
1046 striped directories (master) that have been
1051 <literal>striped_dirs_repaired</literal> total number of
1052 striped directories (master) that have been
1057 <literal>striped_dirs_failed</literal> total number of
1058 striped directories (master) that have failed to be
1063 <literal>striped_dirs_disabled</literal> total number of
1064 striped directories (master) that have been
1069 <literal>striped_dirs_skipped</literal> total number of
1070 striped directories (master) that have been skipped
1071 (for shards verification) because of lost master LMV
1076 <literal>striped_shards_scanned</literal> total number
1077 of striped directory shards (slave) that have been
1082 <literal>striped_shards_repaired</literal> total number
1083 of striped directory shards (slave) that have been
1088 <literal>striped_shards_failed</literal> total number of
1089 striped directory shards (slave) that have failed to be
1094 <literal>striped_shards_skipped</literal> total number
1095 of striped directory shards (slave) that have been
1096 skipped (for name hash verification) because LFSCK does
1097 not know whether the slave LMV EA is valid or
1102 <literal>name_hash_repaired</literal> total number of
1103 name entries under striped directory with bad name hash
1104 that have been repaired.</para>
1108 <literal>nlinks_repaired</literal> total number of
1109 objects with nlink fixed.</para>
1113 <literal>mul_linked_repaired</literal> total number of
1114 multiple-linked objects that have been repaired.</para>
1118 <literal>local_lost_found_scanned</literal> total number
1119 of objects under /lost+found that have been
1124 <literal>local_lost_found_moved</literal> total number
1125 of objects under /lost+found that have been moved to
1126 namespace visible directory.</para>
1130 <literal>local_lost_found_skipped</literal> total number
1131 of objects under /lost+found that have been
1136 <literal>local_lost_found_failed</literal> total number
1137 of objects under /lost+found that have failed to be
1142 <literal>Success Count</literal> the total number of
1143 completed LFSCK runs on the target.</para>
1147 <literal>Run Time Phase1</literal> the duration of the
<literal>scanning-phase1</literal>, excluding the time
spent paused between checkpoints.</para>
1154 <literal>Run Time Phase2</literal> the duration of the
<literal>scanning-phase2</literal>, excluding the time
spent paused between checkpoints.</para>
1161 <literal>Average Speed Phase1</literal> calculated by
1163 <literal>checked_phase1</literal> by
1164 <literal>run_time_phase1</literal>.</para>
1168 <literal>Average Speed Phase2</literal> calculated by
1170 <literal>checked_phase2</literal> by
<literal>run_time_phase2</literal>.</para>
1175 <literal>Real-Time Speed Phase1</literal> the speed
1176 since the last checkpoint if the LFSCK is running
1177 <literal>scanning-phase1</literal>.</para>
1181 <literal>Real-Time Speed Phase2</literal> the speed
1182 since the last checkpoint if the LFSCK is running
1183 <literal>scanning-phase2</literal>.</para>
1193 <section condition='l26'>
1194 <title>LFSCK status of layout via
1195 <literal>procfs</literal></title>
1197 <title>Description</title>
1199 <literal>layout</literal> component is responsible for checking and
1200 repairing MDT-OST inconsistency. The
1201 <literal>procfs</literal> interface for this component is in the MDD
1203 <literal>lfsck_layout</literal>, and in the OBD layer, named
1204 <literal>lfsck_layout</literal>. To show the status of this component
1205 <literal>lctl get_param</literal> should be used as described in the
1207 <para>The LFSCK layout status output refers to phase 1 and phase 2.
1208 Phase 1 is when the LFSCK main engine, which runs on each MDT/OST,
1209 linearly scans its local device, guaranteeing that all local objects
1210 are checked. During phase 1 of layout LFSCK, the OST-objects which are
1211 not referenced by any MDT-object are recorded in a bitmap. During
1212 phase 2 of the layout check, the OST-objects in the bitmap will be
1213 re-scanned to check whether they are really orphan objects.</para>
1216 <title>Usage</title>
<screen>lctl get_param -n mdd.<replaceable>FSNAME</replaceable>-<replaceable>MDT_target</replaceable>.lfsck_layout
lctl get_param -n obdfilter.<replaceable>FSNAME</replaceable>-<replaceable>OST_target</replaceable>.lfsck_layout</screen>
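<para>To check every target on a node in one pass, a wildcard can stand
in for the target name, as in the sketch below. Omitting
<literal>-n</literal> keeps the parameter name in the output so that each
result can be attributed to its target; the choice of wildcard pattern is
an assumption for illustration.</para>
<screen># on the MDS: layout status for every local MDT
lctl get_param mdd.*.lfsck_layout

# on an OSS: layout status for every local OST
lctl get_param obdfilter.*.lfsck_layout</screen>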
1225 <title>Output</title>
1226 <informaltable frame="all">
1228 <colspec colname="c1" colwidth="3*" />
1229 <colspec colname="c2" colwidth="7*" />
1234 <emphasis role="bold">Information</emphasis>
1239 <emphasis role="bold">Detail</emphasis>
1247 <para>General Information</para>
1253 <literal>lfsck_layout</literal></para>
<para>LFSCK layout magic.</para>
<para>LFSCK layout version.</para>
<para>Status: one of
1263 <literal>init</literal>,
1264 <literal>scanning-phase1</literal>,
1265 <literal>scanning-phase2</literal>,
1266 <literal>completed</literal>,
1267 <literal>failed</literal>,
1268 <literal>stopped</literal>,
1269 <literal>paused</literal>,
1270 <literal>crashed</literal>,
1271 <literal>partial</literal>,
1272 <literal>co-failed</literal>,
1273 <literal>co-stopped</literal>, or
1274 <literal>co-paused</literal>.</para>
1277 <para>Flags: including
1278 <literal>scanned-once</literal> (the first scanning
1279 cycle has been completed),
1280 <literal>inconsistent</literal> (one or more MDT-OST
1281 inconsistencies have been discovered),
1282 <literal>incomplete</literal> (some MDT or OST did not
1283 participate in the LFSCK or failed to finish the LFSCK)
1285 <literal>crashed_lastid</literal> (the lastid files on
1286 the OST crashed and need to be rebuilt).</para>
1289 <para>Parameters: including
1290 <literal>dryrun</literal>,
1291 <literal>all_targets</literal> and
1292 <literal>failout</literal>.</para>
1295 <para>Time Since Last Completed.</para>
1298 <para>Time Since Latest Start.</para>
1301 <para>Time Since Last Checkpoint.</para>
1304 <para>Latest Start Position: the position the checking
1305 began most recently.</para>
1308 <para>Last Checkpoint Position.</para>
1311 <para>First Failure Position: the position for the
1312 first object to be repaired.</para>
1315 <para>Current Position.</para>
1322 <para>Statistics</para>
1328 <literal>Success Count:</literal> the total number of
1329 completed LFSCK runs on the target.</para>
1333 <literal>Repaired Dangling:</literal> total number of
1334 MDT-objects with dangling references that have been repaired
1335 during scanning-phase1.</para>
1339 <literal>Repaired Unmatched Pairs:</literal> total number
1340 of unmatched MDT and OST-object pairs that have been
1341 repaired during scanning-phase1.</para>
1345 <literal>Repaired Multiple Referenced:</literal> total
1346 number of OST-objects with multiple references that have been
1347 repaired during scanning-phase1.</para>
1351 <literal>Repaired Orphan:</literal> total number of
1352 orphan OST-objects that have been repaired during
1353 scanning-phase2.</para>
1357 <literal>Repaired Inconsistent Owner:</literal> total
1358 number of OST-objects with incorrect owner information
1359 that have been repaired during scanning-phase1.</para>
1363 <literal>Repaired Others:</literal> total number of other
1364 inconsistencies repaired during the scanning phases.</para>
1368 <literal>Skipped:</literal> number of skipped
1373 <literal>Failed Phase1:</literal> total number of objects
1374 that failed to be repaired during
1375 <literal>scanning-phase1</literal>.</para>
1379 <literal>Failed Phase2:</literal> total number of objects
1380 that failed to be repaired during
1381 <literal>scanning-phase2</literal>.</para>
1385 <literal>Checked Phase1:</literal> total number of
1386 objects scanned during
1387 <literal>scanning-phase1</literal>.</para>
1391 <literal>Checked Phase2:</literal> total number of
1392 objects scanned during
1393 <literal>scanning-phase2</literal>.</para>
1397 <literal>Run Time Phase1:</literal> the duration of the
1399 <literal>scanning-phase1</literal>, excluding the time
1400 spent paused between checkpoints.</para>
1404 <literal>Run Time Phase2:</literal> the duration of the
1406 <literal>scanning-phase2</literal>, excluding the time
1407 spent paused between checkpoints.</para>
1411 <literal>Average Speed Phase1:</literal> calculated by
1413 <literal>checked_phase1</literal> by
1414 <literal>run_time_phase1</literal>.</para>
1418 <literal>Average Speed Phase2:</literal> calculated by
1420 <literal>checked_phase2</literal> by
1421 <literal>run_time_phase2</literal>.</para>
1425 <literal>Real-Time Speed Phase1:</literal> the speed
1426 since the last checkpoint if the LFSCK is running
1427 <literal>scanning-phase1</literal>.</para>
1431 <literal>Real-Time Speed Phase2:</literal> the speed
1432 since the last checkpoint if the LFSCK is running
1433 <literal>scanning-phase2</literal>.</para>
1445 <title>LFSCK adjustment interface</title>
1446 <section condition='l26'>
1447 <title>Rate control</title>
1449 <title>Description</title>
1450 <para>The LFSCK upper speed limit can be changed using
1451 <literal>lctl set_param</literal> as shown in the usage below.</para>
1454 <title>Usage</title>
1455 <screen>lctl set_param mdd.${FSNAME}-${MDT_target}.lfsck_speed_limit=
1456 <replaceable>N</replaceable>
1457 lctl set_param obdfilter.${FSNAME}-${OST_target}.lfsck_speed_limit=
1458 <replaceable>N</replaceable></screen>
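<para>A minimal sketch of how the limit might be adjusted, assuming
hypothetical targets named <literal>testfs-MDT0000</literal> and
<literal>testfs-OST0000</literal> (placeholder names). The value
<literal>1000</literal> is an arbitrary example, and
<literal>0</literal> is used here on the understanding that it removes
the limit.</para>
<screen># Limit LFSCK on this MDT to at most 1000 objects per second (example value)
lctl set_param mdd.testfs-MDT0000.lfsck_speed_limit=1000

# Read the value back to confirm the change
lctl get_param mdd.testfs-MDT0000.lfsck_speed_limit

# Remove the speed limit on an OST so that LFSCK runs at maximum speed
lctl set_param obdfilter.testfs-OST0000.lfsck_speed_limit=0</screen>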
1461 <title>Values</title>
1462 <informaltable frame="all">
1464 <colspec colname="c1" colwidth="3*" />
1465 <colspec colname="c2" colwidth="7*" />
1472 <para>No speed limit (run at maximum speed).</para>
1477 <para>positive integer</para>
1480 <para>Maximum number of objects to scan per second.</para>
1488 <section xml:id="dbdoclet.lfsck_auto_scrub">
1489 <title>Auto scrub</title>
1491 <title>Description</title>
1493 <literal>auto_scrub</literal> parameter controls whether OI scrub will
1494 be triggered when an inconsistency is detected during OI lookup. It
1495 can be set as described in the usage and values sections
1497 <para>There is also a
1498 <literal>noscrub</literal> mount option (see
1499 <xref linkend="dbdoclet.50438219_12635" />) which can be used to
1500 disable automatic OI scrub upon detection of a file-level backup at
1502 <literal>noscrub</literal> mount option is specified,
1503 <literal>auto_scrub</literal> will also be disabled, so OI scrub will
1504 not be triggered when an OI inconsistency is detected. Auto scrub can
1505 be re-enabled after the mount using the command shown in the usage.
1506 Manually starting LFSCK after mounting provides finer control over
1507 the starting conditions.</para>
1510 <title>Usage</title>
1511 <screen>lctl set_param osd_ldiskfs.${FSNAME}-${MDT_target}.auto_scrub=<replaceable>N</replaceable></screen>
1513 <replaceable>N</replaceable> is an integer as described below.</para>
1514 <note condition='l25'><para>Lustre software 2.5 and later supports the
1515 <literal>-P</literal> option, which makes the
1516 <literal>set_param</literal> setting permanent.</para></note>
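<para>For illustration, assuming a hypothetical target named
<literal>testfs-MDT0000</literal> (a placeholder) and the parameter
path exactly as written in the usage above, the setting might be
toggled and checked as sketched here. The values <literal>1</literal>
and <literal>0</literal> are example choices for enabling and
disabling automatic OI scrub.</para>
<screen># Enable automatic OI scrub on this target (any positive integer)
lctl set_param osd_ldiskfs.testfs-MDT0000.auto_scrub=1

# Disable automatic OI scrub on this target
lctl set_param osd_ldiskfs.testfs-MDT0000.auto_scrub=0

# Read the current setting back
lctl get_param osd_ldiskfs.testfs-MDT0000.auto_scrub

# Lustre 2.5 and later: make the setting permanent with -P
lctl set_param -P osd_ldiskfs.testfs-MDT0000.auto_scrub=1</screen>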
1519 <title>Values</title>
1520 <informaltable frame="all">
1522 <colspec colname="c1" colwidth="3*" />
1523 <colspec colname="c2" colwidth="7*" />
1530 <para>Do not start OI Scrub automatically.</para>
1535 <para>positive integer</para>
1538 <para>Automatically start OI Scrub if an inconsistency is
1539 detected during OI lookup.</para>