1 <?xml version='1.0' encoding='utf-8'?>
2 <chapter xmlns="http://docbook.org/ns/docbook"
3  xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
4  xml:id="troubleshootingrecovery">
5   <title xml:id="troubleshootingrecovery.title">Troubleshooting
6   Recovery</title>
7   <para>This chapter describes what to do if something goes wrong during
8   recovery. It describes:</para>
9   <itemizedlist>
10     <listitem>
11       <para>
12         <xref linkend="dbdoclet.50438225_71141" />
13       </para>
14     </listitem>
15     <listitem>
16       <para>
17         <xref linkend="dbdoclet.50438225_37365" />
18       </para>
19     </listitem>
20     <listitem>
21       <para>
22         <xref linkend="dbdoclet.50438225_12316" />
23       </para>
24     </listitem>
25     <listitem>
26       <para>
27         <xref linkend="dbdoclet.lfsckadmin" />
28       </para>
29     </listitem>
30   </itemizedlist>
31   <section xml:id="dbdoclet.50438225_71141">
32     <title>
33     <indexterm>
34       <primary>recovery</primary>
35       <secondary>corruption of backing ldiskfs file system</secondary>
36     </indexterm>Recovering from Errors or Corruption on a Backing ldiskfs File
37     System</title>
38     <para>When an OSS, MDS, or MGS server crash occurs, it is not necessary to
39     run e2fsck on the file system.
40     <literal>ldiskfs</literal> journaling ensures that the file system remains
41     consistent over a system crash. The backing file systems are never accessed
42     directly from the client, so client crashes are not relevant for server
43     file system consistency.</para>
44     <para>The only time it is REQUIRED that
45     <literal>e2fsck</literal> be run on a device is when an event causes
46     problems that ldiskfs journaling is unable to handle, such as a hardware
47     device failure or I/O error. If the ldiskfs kernel code detects corruption
48     on the disk, it mounts the file system as read-only to prevent further
49     corruption, but still allows read access to the device. This appears as
50     error "-30" (
51     <literal>EROFS</literal>) in the syslogs on the server, e.g.:</para>
52     <screen>Dec 29 14:11:32 mookie kernel: LDISKFS-fs error (device sdz):
53             ldiskfs_lookup: unlinked inode 5384166 in dir #145170469
54 Dec 29 14:11:32 mookie kernel: Remounting filesystem read-only </screen>
    <para>In such a situation, it is normally only necessary to run
    <literal>e2fsck</literal> on the bad device before placing the device back
    into service.</para>
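    <para>To check whether a server has hit this condition, the kernel log can
    be searched for the messages shown above. This is a minimal sketch; the
    log file path
    <literal>/var/log/messages</literal> is an assumption and varies between
    distributions.</para>
    <screen># Search the persistent syslog (the path is an assumption)
root# grep -E 'LDISKFS-fs error|Remounting filesystem read-only' /var/log/messages

# Or search the kernel ring buffer of the running system
root# dmesg | grep -E 'LDISKFS-fs error|Remounting filesystem read-only'</screen>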
57     <para>In the vast majority of cases, the Lustre software can cope with any
58     inconsistencies found on the disk and between other devices in the file
59     system.</para>
60     <para>For problem analysis, it is strongly recommended that
61     <literal>e2fsck</literal> be run under a logger, like
62     <literal>script</literal>, to record all
63     of the output and changes that are made to the file system in case this
64     information is needed later.</para>
65     <para>If time permits, it is also a good idea to first run
66     <literal>e2fsck</literal> in non-fixing mode (-n option) to assess the type
67     and extent of damage to the file system. The drawback is that in this mode,
68     <literal>e2fsck</literal> does not recover the file system journal, so there
69     may appear to be file system corruption when none really exists.</para>
70     <para>To address concern about whether corruption is real or only due to
71     the journal not being replayed, you can briefly mount and unmount the
72     <literal>ldiskfs</literal> file system directly on the node with the Lustre
73     file system stopped, using a command similar to:</para>
74     <screen>mount -t ldiskfs /dev/{ostdev} /mnt/ost; umount /mnt/ost</screen>
75     <para>This causes the journal to be recovered.</para>
76     <para>The
77     <literal>e2fsck</literal> utility works well when fixing file system
78     corruption (better than similar file system recovery tools and a primary
79     reason why
80     <literal>ldiskfs</literal> was chosen over other file systems). However, it
81     is often useful to identify the type of damage that has occurred so an
82     <literal>ldiskfs</literal> expert can make intelligent decisions about what
83     needs fixing, in place of
84     <literal>e2fsck</literal>.</para>
85     <screen>root# {stop lustre services for this device, if running}
86 root# script /tmp/e2fsck.sda
87 Script started, file is /tmp/e2fsck.sda
88 root# mount -t ldiskfs /dev/sda /mnt/ost
89 root# umount /mnt/ost
90 root# e2fsck -fn /dev/sda   # don't fix file system, just check for corruption
91 :
92 [e2fsck output]
93 :
94 root# e2fsck -fp /dev/sda   # fix errors with prudent answers (usually <literal>yes</literal>)</screen>
95   </section>
96   <section xml:id="dbdoclet.50438225_37365">
97     <title>
98     <indexterm>
99       <primary>recovery</primary>
100       <secondary>corruption of Lustre file system</secondary>
101     </indexterm>Recovering from Corruption in the Lustre File System</title>
102     <para>In cases where an ldiskfs MDT or OST becomes corrupt, you need to run
103     <literal>e2fsck</literal> to ensure local filesystem consistency, then use
104     <literal>LFSCK</literal> to run a distributed check on the file system to
105     resolve any inconsistencies between the MDTs and OSTs, or among MDTs.</para>
106     <orderedlist>
107       <listitem>
108         <para>Stop the Lustre file system.</para>
109       </listitem>
110       <listitem>
111         <para>Run
112         <literal>e2fsck -f</literal> on the individual MDT/OST that had
113         problems to fix any local file system damage.</para>
114         <para>We recommend running
115         <literal>e2fsck</literal> under script, to create a log of changes made
116         to the file system in case it is needed later. After
117         <literal>e2fsck</literal> is run, bring up the file system, if
118         necessary, to reduce the outage window.</para>
119       </listitem>
120     </orderedlist>
121     <section xml:id="dbdoclet.50438225_13916">
122       <title>
123       <indexterm>
124         <primary>recovery</primary>
125         <secondary>orphaned objects</secondary>
126       </indexterm>Working with Orphaned Objects</title>
      <para>The simplest problem to resolve is that of orphaned objects. When
      the LFSCK layout check is run, these objects are linked to new files and
      put into
      <literal>.lustre/lost+found/MDT<replaceable>xxxx</replaceable></literal>
      in the Lustre file system, where
      <literal>MDT<replaceable>xxxx</replaceable></literal> is the index of the
      MDT on which the orphan was found. There they can be examined and saved
      or deleted as necessary.</para>
134       <para condition='l27'>With Lustre version 2.7 and later, LFSCK will
135        identify and process orphan objects found on MDTs as well.</para>
136     </section>
137   </section>
138   <section xml:id="dbdoclet.50438225_12316">
139     <title>
140     <indexterm>
141       <primary>recovery</primary>
142       <secondary>unavailable OST</secondary>
143     </indexterm>Recovering from an Unavailable OST</title>
144     <para>One problem encountered in a Lustre file system environment is when
145     an OST becomes unavailable due to a network partition, OSS node crash, etc.
146     When this happens, the OST's clients pause and wait for the OST to become
147     available again, either on the primary OSS or a failover OSS. When the OST
148     comes back online, the Lustre file system starts a recovery process to
149     enable clients to reconnect to the OST. Lustre servers put a limit on the
150     time they will wait in recovery for clients to reconnect.</para>
151     <para>During recovery, clients reconnect and replay their requests
152     serially, in the same order they were done originally. Until a client
153     receives a confirmation that a given transaction has been written to stable
154     storage, the client holds on to the transaction, in case it needs to be
155     replayed. Periodically, a progress message prints to the log, stating
156     how_many/expected clients have reconnected. If the recovery is aborted,
157     this log shows how many clients managed to reconnect. When all clients have
158     completed recovery, or if the recovery timeout is reached, the recovery
159     period ends and the OST resumes normal request processing.</para>
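    <para>Recovery progress can also be inspected on the server while recovery
    is under way. The following sketch assumes the
    <literal>recovery_status</literal> parameter that Lustre servers expose
    under
    <literal>obdfilter</literal>; the wildcard matches every OST served by the
    OSS on which the command is run.</para>
    <screen># Show recovery state and client counts for each local OST
root# lctl get_param obdfilter.*.recovery_status</screen>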
160     <para>If some clients fail to replay their requests during the recovery
161     period, this will not stop the recovery from completing. You may have a
162     situation where the OST recovers, but some clients are not able to
163     participate in recovery (e.g. network problems or client failure), so they
164     are evicted and their requests are not replayed. This would result in any
165     operations on the evicted clients failing, including in-progress writes,
166     which would cause cached writes to be lost. This is a normal outcome; the
167     recovery cannot wait indefinitely, or the file system would be hung any
168     time a client failed. The lost transactions are an unfortunate result of
169     the recovery process.</para>
170     <note>
171       <para>The failure of client recovery does not indicate or lead to
172       filesystem corruption. This is a normal event that is handled by the MDT
173       and OST, and should not result in any inconsistencies between
174       servers.</para>
175     </note>
176     <note>
      <para>The version-based recovery (VBR) feature enables a failed client to
      be <quote>skipped</quote>, so remaining clients can replay their
      requests, resulting in a more successful recovery from a downed OST. For
      more information about the VBR feature, see
      <xref linkend="lustrerecovery" /> (Version-based Recovery).</para>
182     </note>
183   </section>
184   <section xml:id="dbdoclet.lfsckadmin">
185     <title>
186     <indexterm>
187       <primary>recovery</primary>
188       <secondary>oiscrub</secondary>
189     </indexterm>
190     <indexterm>
191       <primary>recovery</primary>
192       <secondary>LFSCK</secondary>
193     </indexterm>Checking the file system with LFSCK</title>
194     <para>LFSCK is an administrative tool for checking and repair of the
195       attributes specific to a mounted Lustre file system. It is similar
196       in concept to an offline fsck repair tool for a local filesystem,
197       but LFSCK is implemented to run as part of the Lustre file system
198       while the file system is mounted and in use. This allows consistency
199       checking and repair of Lustre-specific metadata without unnecessary
200       downtime, and can be run on the largest Lustre file systems with
201       minimal impact to normal operations.</para>
    <para>LFSCK can verify and repair the Object Index (OI) Table, an internal
      table that maps Lustre File Identifiers (FIDs) to MDT internal ldiskfs
      inode numbers. An OI Scrub traverses the OI Table and makes corrections
      where necessary. An OI Scrub is required after restoring from a
      file-level MDT backup (
      <xref linkend="dbdoclet.backup_device" />), or in case the OI Table is
      otherwise corrupted. Later phases of LFSCK will add further checks to the
      Lustre distributed file system state. LFSCK namespace scanning can verify
      and repair the directory FID-in-dirent and LinkEA consistency.</para>
212     <para condition='l26'>In Lustre software release 2.6, LFSCK layout scanning
213       can verify and repair MDT-OST file layout inconsistencies. File layout
214       inconsistencies between MDT-objects and OST-objects that are checked and
215       corrected include dangling reference, unreferenced OST-objects, mismatched
216       references and multiple references.</para>
    <para condition='l27'>In Lustre software release 2.7, LFSCK layout scanning
      is enhanced to verify and repair inconsistencies between multiple
      MDTs.</para>
    <para>LFSCK is controlled and monitored through
    <literal>lctl</literal> commands, such as
    <literal>lctl lfsck_start</literal> and
    <literal>lctl get_param</literal>. LFSCK supports three types
    of interface: switch interface, status interface, and adjustment interface.
    These interfaces are detailed below.</para>
224     <section>
225       <title>LFSCK switch interface</title>
226       <section>
227         <title>Manually Starting LFSCK</title>
228         <section>
229           <title>Description</title>
230           <para>LFSCK can be started after the MDT is mounted using the
231           <literal>lctl lfsck_start</literal> command.</para>
232         </section>
233         <section>
234           <title>Usage</title>
235 <screen>lctl lfsck_start &lt;-M | --device <replaceable>[MDT,OST]_device</replaceable>&gt; \
236                     [-A | --all] \
237                     [-c | --create_ostobj <replaceable>on | off</replaceable>] \
238                     [-C | --create_mdtobj <replaceable>on | off</replaceable>] \
239                     [-d | --delay_create_ostobj <replaceable>on | off</replaceable>] \
240                     [-e | --error <replaceable>{continue | abort}</replaceable>] \
241                     [-h | --help] \
242                     [-n | --dryrun <replaceable>on | off</replaceable>] \
243                     [-o | --orphan] \
244                     [-r | --reset] \
245                     [-s | --speed <replaceable>ops_per_sec_limit</replaceable>] \
246                     [-t | --type <replaceable>check_type[,check_type...]</replaceable>] \
247                     [-w | --window_size <replaceable>size</replaceable>]</screen>
248         </section>
249         <section>
250           <title>Options</title>
251           <para>The various
252           <literal>lfsck_start</literal> options are listed and described below.
253           For a complete list of available options, type
254           <literal>lctl lfsck_start -h</literal>.</para>
255           <informaltable frame="all">
256             <tgroup cols="2">
257               <colspec colname="c1" colwidth="3*" />
258               <colspec colname="c2" colwidth="7*" />
259               <thead>
260                 <row>
261                   <entry>
262                     <para>
263                       <emphasis role="bold">Option</emphasis>
264                     </para>
265                   </entry>
266                   <entry>
267                     <para>
268                       <emphasis role="bold">Description</emphasis>
269                     </para>
270                   </entry>
271                 </row>
272               </thead>
273               <tbody>
274                 <row>
275                   <entry>
276                     <para>
277                       <literal>-M | --device</literal>
278                     </para>
279                   </entry>
280                   <entry>
281                     <para>The MDT or OST target to start LFSCK on.</para>
282                   </entry>
283                 </row>
284                 <row>
285                   <entry>
286                     <para>
287                       <literal>-A | --all</literal>
288                     </para>
289                   </entry>
290                   <entry>
291                     <para condition='l26'>Start LFSCK on all
292                     targets on all servers simultaneously.
293                     By default, both layout and namespace
294                     consistency checking and repair are started.</para>
295                   </entry>
296                 </row>
297                 <row>
298                   <entry>
299                     <para>
300                       <literal>-c | --create_ostobj</literal>
301                     </para>
302                   </entry>
303                   <entry>
304                     <para condition='l26'>Create the lost OST-object for
305                     dangling LOV EA,
306                     <literal>off</literal>(default) or
307                     <literal>on</literal>. If not specified, then the default
308                     behaviour is to keep the dangling LOV EA there without
309                     creating the lost OST-object.</para>
310                   </entry>
311                 </row>
312                 <row>
313                   <entry>
314                     <para>
315                       <literal>-C | --create_mdtobj</literal>
316                     </para>
317                   </entry>
318                   <entry>
319                     <para condition='l27'>Create the lost MDT-object for
320                     dangling name entry,
321                     <literal>off</literal>(default) or
322                     <literal>on</literal>. If not specified, then the default
323                     behaviour is to keep the dangling name entry there without
324                     creating the lost MDT-object.</para>
325                   </entry>
326                 </row>
327                 <row>
328                   <entry>
329                     <para>
330                       <literal>-d | --delay_create_ostobj</literal>
331                     </para>
332                   </entry>
333                   <entry>
334                     <para condition='l29'>
335                       Delay creating the lost OST-object for dangling LOV EA
336                       until the orphan OST-objects are handled.
337                       <literal>off</literal>(default) or
338                       <literal>on</literal>.
339                     </para>
340                   </entry>
341                 </row>
342                 <row>
343                   <entry>
344                     <para>
345                       <literal>-e | --error</literal>
346                     </para>
347                   </entry>
348                   <entry>
                    <para>Error handling,
                    <literal>continue</literal>(default) or
                    <literal>abort</literal>. Specifies whether LFSCK stops
                    if it fails to repair something. If it is not
                    specified, the saved value (when resuming from checkpoint)
                    will be used if present. This option cannot be changed
                    while LFSCK is running.</para>
356                   </entry>
357                 </row>
358                 <row>
359                   <entry>
360                     <para>
361                       <literal>-h | --help</literal>
362                     </para>
363                   </entry>
364                   <entry>
                    <para>Display help information.</para>
366                   </entry>
367                 </row>
368                 <row>
369                   <entry>
370                     <para>
371                       <literal>-n | --dryrun</literal>
372                     </para>
373                   </entry>
374                   <entry>
375                     <para>Perform a trial without making any changes.
376                     <literal>off</literal>(default) or
377                     <literal>on</literal>.</para>
378                   </entry>
379                 </row>
380                 <row>
381                   <entry>
382                     <para>
383                       <literal>-o | --orphan</literal>
384                     </para>
385                   </entry>
386                   <entry>
387                     <para condition='l26'>Repair orphan OST-objects for layout
388                     LFSCK.</para>
389                   </entry>
390                 </row>
391                 <row>
392                   <entry>
393                     <para>
394                       <literal>-r | --reset</literal>
395                     </para>
396                   </entry>
397                   <entry>
398                     <para>Reset the start position for the object iteration to
399                     the beginning for the specified MDT. By default the
400                     iterator will resume scanning from the last checkpoint
401                     (saved periodically by LFSCK) provided it is
402                     available.</para>
403                   </entry>
404                 </row>
405                 <row>
406                   <entry>
407                     <para>
408                       <literal>-s | --speed</literal>
409                     </para>
410                   </entry>
411                   <entry>
412                     <para>Set the upper speed limit of LFSCK processing in
413                     objects per second. If it is not specified, the saved value
414                     (when resuming from checkpoint) or default value of 0 (0 =
415                     run as fast as possible) is used. Speed can be adjusted
416                     while LFSCK is running with the adjustment
417                     interface.</para>
418                   </entry>
419                 </row>
420                 <row>
421                   <entry>
422                     <para>
423                       <literal>-t | --type</literal>
424                     </para>
425                   </entry>
426                   <entry>
                    <para>The type of checking/repairing that should be
                    performed. The new LFSCK framework provides a single
                    interface for a variety of system consistency
                    checking/repairing operations including:</para>
                    <para><literal>namespace</literal>: check and repair
                    FID-in-dirent and LinkEA consistency.</para>
                    <para condition='l27'>Lustre 2.7 enhances
                    namespace consistency verification under DNE mode.</para>
                    <para condition='l26'>
                    <literal>layout</literal>: check and repair MDT-OST
                    inconsistency.</para>
                    <para>If no type is specified, the LFSCK component(s)
                    that ran last time and did not finish, or the component(s)
                    corresponding to a known system inconsistency, will be
                    started. Any time LFSCK is triggered, the OI scrub
                    runs automatically, so there is no need to specify
                    OI_scrub in that case.</para>
444                   </entry>
445                 </row>
446                 <row>
447                   <entry>
448                     <para>
449                       <literal>-w | --window_size</literal>
450                     </para>
451                   </entry>
452                   <entry>
                    <para condition='l26'>The window size for the async request
                    pipeline. The input and output of the LFSCK async request
                    pipeline may have quite different processing speeds, so
                    too many requests may accumulate in the pipeline and cause
                    abnormal memory/network pressure. If not specified, the
                    default window size for the async request pipeline is
                    1024.</para>
459                   </entry>
460                 </row>
461               </tbody>
462             </tgroup>
463           </informaltable>
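          <para>As an illustration of how these options combine, the following
          sketch shows two separate invocations: a rate-limited dry run of the
          namespace check, and a full layout and namespace check restarted from
          the beginning on all targets. The target name
          <literal>testfs-MDT0000</literal> and the limit of 1000 objects per
          second are assumptions chosen for this example; substitute values
          for your own file system.</para>
          <screen># Dry run of the namespace check, limited to 1000 objects per second
root# lctl lfsck_start -M testfs-MDT0000 -t namespace -n on -s 1000

# Alternatively: full check on all targets, ignoring any saved checkpoint
# and repairing orphan OST-objects during the layout check
root# lctl lfsck_start -M testfs-MDT0000 -A -r -o -t layout,namespace</screen>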
464         </section>
465       </section>
466       <section>
467         <title>Manually Stopping LFSCK</title>
468         <section>
469           <title>Description</title>
470           <para>To stop LFSCK when the MDT is mounted, use the
471           <literal>lctl lfsck_stop</literal> command.</para>
472         </section>
473         <section>
474           <title>Usage</title>
475 <screen>lctl lfsck_stop &lt;-M | --device <replaceable>[MDT,OST]_device</replaceable>&gt; \
476                     [-A | --all] \
477                     [-h | --help]</screen>
478         </section>
479         <section>
480           <title>Options</title>
481           <para>The various
482           <literal>lfsck_stop</literal> options are listed and described below.
483           For a complete list of available options, type
484           <literal>lctl lfsck_stop -h</literal>.</para>
485           <informaltable frame="all">
486             <tgroup cols="2">
487               <colspec colname="c1" colwidth="3*" />
488               <colspec colname="c2" colwidth="7*" />
489               <thead>
490                 <row>
491                   <entry>
492                     <para>
493                       <emphasis role="bold">Option</emphasis>
494                     </para>
495                   </entry>
496                   <entry>
497                     <para>
498                       <emphasis role="bold">Description</emphasis>
499                     </para>
500                   </entry>
501                 </row>
502               </thead>
503               <tbody>
504                 <row>
505                   <entry>
506                     <para>
507                       <literal>-M | --device</literal>
508                     </para>
509                   </entry>
510                   <entry>
511                     <para>The MDT or OST target to stop LFSCK on.</para>
512                   </entry>
513                 </row>
514                 <row>
515                   <entry>
516                     <para>
517                       <literal>-A | --all</literal>
518                     </para>
519                   </entry>
520                   <entry>
521                     <para>Stop LFSCK on all targets on all servers
522                     simultaneously.</para>
523                   </entry>
524                 </row>
525                 <row>
526                   <entry>
527                     <para>
528                       <literal>-h | --help</literal>
529                     </para>
530                   </entry>
531                   <entry>
                    <para>Display help information.</para>
533                   </entry>
534                 </row>
535               </tbody>
536             </tgroup>
537           </informaltable>
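          <para>For example, assuming a target named
          <literal>testfs-MDT0000</literal> (a placeholder for illustration),
          LFSCK could be stopped on that target alone or on all targets:</para>
          <screen># Stop LFSCK on a single target
root# lctl lfsck_stop -M testfs-MDT0000

# Stop LFSCK on all targets on all servers
root# lctl lfsck_stop -M testfs-MDT0000 -A</screen>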
538         </section>
539       </section>
540     </section>
541     <section condition="l29">
542       <title>Check the LFSCK global status</title>
543       <section>
544         <title>Description</title>
545         <para>Check the LFSCK global status via a single
546         <literal>lctl lfsck_query</literal> command on the MDS.</para>
547       </section>
548       <section>
549         <title>Usage</title>
550 <screen>lctl lfsck_query &lt;-M | --device <replaceable>MDT_device</replaceable>&gt; \
551                     [-h | --help] \
552                     [-t | --type <replaceable>lfsck_type[,lfsck_type...]</replaceable>] \
553                     [-w | --wait]</screen>
554       </section>
555       <section>
556         <title>Options</title>
557         <para>The various
558         <literal>lfsck_query</literal> options are listed and described below.
559         For a complete list of available options, type
560         <literal>lctl lfsck_query -h</literal>.</para>
561         <informaltable frame="all">
562           <tgroup cols="2">
563             <colspec colname="c1" colwidth="3*" />
564             <colspec colname="c2" colwidth="7*" />
565             <thead>
566               <row>
567                 <entry>
568                   <para>
569                     <emphasis role="bold">Option</emphasis>
570                   </para>
571                 </entry>
572                 <entry>
573                   <para>
574                     <emphasis role="bold">Description</emphasis>
575                   </para>
576                 </entry>
577               </row>
578             </thead>
579             <tbody>
580               <row>
581                 <entry>
582                   <para>
583                     <literal>-M | --device</literal>
584                   </para>
585                 </entry>
586                 <entry>
587                   <para>The device to query for LFSCK status.</para>
588                 </entry>
589               </row>
590               <row>
591                 <entry>
592                   <para>
593                     <literal>-h | --help</literal>
594                   </para>
595                 </entry>
596                 <entry>
                  <para>Display help information.</para>
598                 </entry>
599               </row>
600               <row>
601                 <entry>
602                   <para>
603                     <literal>-t | --type</literal>
604                   </para>
605                 </entry>
606                 <entry>
607                   <para>The LFSCK type(s) that should be queried,
608                   including: layout, namespace.</para>
609                 </entry>
610               </row>
611               <row>
612                 <entry>
613                   <para>
614                     <literal>-w | --wait</literal>
615                   </para>
616                 </entry>
617                 <entry>
                  <para>Wait if LFSCK is currently scanning.</para>
619                 </entry>
620               </row>
621             </tbody>
622             </tgroup>
623           </informaltable>
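        <para>A sketch of a typical query, run on the MDS and assuming a
        placeholder MDT named
        <literal>testfs-MDT0000</literal>, might restrict the report to the
        layout and namespace components and wait while a scan is in
        progress:</para>
        <screen>root# lctl lfsck_query -M testfs-MDT0000 -t layout,namespace -w</screen>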
624       </section>
625     </section>
626     <section>
627       <title>LFSCK status interface</title>
628       <section>
629         <title>LFSCK status of OI Scrub via
630         <literal>procfs</literal></title>
631         <section>
632           <title>Description</title>
633           <para>For each LFSCK component there is a dedicated procfs interface
634           to trace the corresponding LFSCK component status. For OI Scrub, the
635           interface is the OSD layer procfs interface, named
636           <literal>oi_scrub</literal>. To display OI Scrub status, the standard
637           <literal>lctl get_param</literal> command is used as shown in the
638           usage below.</para>
639         </section>
640         <section>
641           <title>Usage</title>
642           <screen>lctl get_param -n osd-ldiskfs.<replaceable>FSNAME</replaceable>-[<replaceable>MDT_target|OST_target</replaceable>].oi_scrub</screen>
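          <para>For instance, with a hypothetical file system named
          <literal>testfs</literal>, the OI Scrub status of its first MDT might
          be displayed as follows (the target name is an assumption for
          illustration):</para>
          <screen>root# lctl get_param -n osd-ldiskfs.testfs-MDT0000.oi_scrub</screen>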
643         </section>
644         <section>
645           <title>Output</title>
646           <informaltable frame="all">
647             <tgroup cols="2">
648               <colspec colname="c1" colwidth="3*" />
649               <colspec colname="c2" colwidth="7*" />
650               <thead>
651                 <row>
652                   <entry>
653                     <para>
654                       <emphasis role="bold">Information</emphasis>
655                     </para>
656                   </entry>
657                   <entry>
658                     <para>
659                       <emphasis role="bold">Detail</emphasis>
660                     </para>
661                   </entry>
662                 </row>
663               </thead>
664               <tbody>
665                 <row>
666                   <entry>
667                     <para>General Information</para>
668                   </entry>
669                   <entry>
670                     <itemizedlist>
671                       <listitem>
672                         <para>Name: OI_scrub.</para>
673                       </listitem>
674                       <listitem>
675                         <para>OI scrub magic id (an identifier unique to OI
676                         scrub).</para>
677                       </listitem>
678                       <listitem>
679                         <para>OI files count.</para>
680                       </listitem>
681                       <listitem>
682                         <para>Status: one of
683                         <literal>init</literal>,
684                         <literal>scanning</literal>,
685                         <literal>completed</literal>,
686                         <literal>failed</literal>,
687                         <literal>stopped</literal>,
688                         <literal>paused</literal>, or
689                         <literal>crashed</literal>.</para>
690                       </listitem>
691                       <listitem>
692                         <para>Flags: including
693                         <literal>recreated</literal> (OI file(s)
694                         removed/recreated),
695                         <literal>inconsistent</literal> (restored from
696                         file-level backup),
697                         <literal>auto</literal> (triggered by non-UI mechanism),
698                         and
699                         <literal>upgrade</literal> (from Lustre software release
700                         1.8 IGIF format).</para>
701                       </listitem>
702                       <listitem>
703                         <para>Parameters: OI scrub parameters, like
704                         <literal>failout</literal>.</para>
705                       </listitem>
706                       <listitem>
707                         <para>Time Since Last Completed.</para>
708                       </listitem>
709                       <listitem>
710                         <para>Time Since Latest Start.</para>
711                       </listitem>
712                       <listitem>
713                         <para>Time Since Last Checkpoint.</para>
714                       </listitem>
715                       <listitem>
716                         <para>Latest Start Position: the position from which
717                         the latest scrub started.</para>
718                       </listitem>
719                       <listitem>
720                         <para>Last Checkpoint Position.</para>
721                       </listitem>
722                       <listitem>
723                         <para>First Failure Position: the position of the
724                         first object to be repaired.</para>
725                       </listitem>
726                       <listitem>
727                         <para>Current Position.</para>
728                       </listitem>
729                     </itemizedlist>
730                   </entry>
731                 </row>
732                 <row>
733                   <entry>
734                     <para>Statistics</para>
735                   </entry>
736                   <entry>
737                     <itemizedlist>
738                       <listitem>
739                         <para>
740                         <literal>Checked</literal> total number of objects
741                         scanned.</para>
742                       </listitem>
743                       <listitem>
744                         <para>
745                         <literal>Updated</literal> total number of objects
746                         repaired.</para>
747                       </listitem>
748                       <listitem>
749                         <para>
750                         <literal>Failed</literal> total number of objects that
751                         failed to be repaired.</para>
752                       </listitem>
753                       <listitem>
754                         <para>
755                         <literal>No Scrub</literal> total number of objects
756                         marked
757                         <literal>LDISKFS_STATE_LUSTRE_NOSCRUB</literal> and
758                         skipped.</para>
759                       </listitem>
760                       <listitem>
761                         <para>
762                         <literal>IGIF</literal> total number of IGIF objects
763                         scanned.</para>
764                       </listitem>
765                       <listitem>
766                         <para>
767                         <literal>Prior Updated</literal> total number of objects
768                         whose repair was triggered by a parallel
769                         RPC.</para>
770                       </listitem>
771                       <listitem>
772                         <para>
773                         <literal>Success Count</literal> total number of
774                         completed OI_scrub runs on the target.</para>
775                       </listitem>
776                       <listitem>
777                         <para>
778                         <literal>Run Time</literal> how long the scrub has run,
779                         counted from when scanning started at the beginning of
780                         the specified MDT target, not including the
781                         paused/failure time between checkpoints.</para>
782                       </listitem>
783                       <listitem>
784                         <para>
785                         <literal>Average Speed</literal> calculated by dividing
786                         <literal>Checked</literal> by
787                         <literal>run_time</literal>.</para>
788                       </listitem>
789                       <listitem>
790                         <para>
791                         <literal>Real-Time Speed</literal> the speed since the last
792                         checkpoint if the OI_scrub is running.</para>
793                       </listitem>
794                       <listitem>
795                         <para>
796                         <literal>Scanned</literal> total number of objects under
797                         /lost+found that have been scanned.</para>
798                       </listitem>
799                       <listitem>
800                         <para>
801                         <literal>Repaired</literal> total number of objects
802                         under /lost+found that have been recovered.</para>
803                       </listitem>
804                       <listitem>
805                         <para>
806                         <literal>Failed</literal> total number of objects under
807                         /lost+found that failed to be scanned or failed to be
808                         recovered.</para>
809                       </listitem>
810                     </itemizedlist>
811                   </entry>
812                 </row>
813               </tbody>
814             </tgroup>
815           </informaltable>
816         </section>
817       </section>
818       <section>
819         <title>LFSCK status of namespace via
820         <literal>procfs</literal></title>
821         <section>
822           <title>Description</title>
823           <para>The
824           <literal>namespace</literal> component is responsible for checks
825           described in <xref linkend="dbdoclet.lfsckadmin" />. The
826           <literal>procfs</literal> interface for this component is in the
827           MDD layer, named
828           <literal>lfsck_namespace</literal>. To show the status of this
829           component,
830           <literal>lctl get_param</literal> should be used as described in the
831           usage below.</para>
832           <para>The LFSCK namespace status output refers to phase 1 and phase 2.
833           Phase 1 is when the LFSCK main engine, which runs on each MDT,
834           linearly scans its local device, guaranteeing that all local objects
835           are checked.  However, there are certain cases in which LFSCK cannot
836           know whether an object is consistent or cannot repair an inconsistency
837           until the phase 1 scanning is completed. During phase 2 of the
838           namespace check, objects with multiple hard-links, objects with remote
839           parents, and other objects which couldn't be verified during phase 1
840           will be checked.</para>
841         </section>
842         <section>
843           <title>Usage</title>
844           <screen>lctl get_param -n mdd.<replaceable>FSNAME</replaceable>-<replaceable>MDT_target</replaceable>.lfsck_namespace</screen>
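          <para>A minimal sketch of how the command might be used to watch only
          the current state of a namespace check. The names
          <literal>testfs</literal> and <literal>MDT0000</literal> are
          hypothetical placeholders, and the filter assumes that the status is
          printed on a line beginning with <literal>status:</literal>, which
          should be confirmed against the output on your system.</para>
          <screen>lctl get_param -n mdd.testfs-MDT0000.lfsck_namespace | grep '^status:'</screen>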
845         </section>
846         <section>
847           <title>Output</title>
848           <informaltable frame="all">
849             <tgroup cols="2">
850               <colspec colname="c1" colwidth="3*" />
851               <colspec colname="c2" colwidth="7*" />
852               <thead>
853                 <row>
854                   <entry>
855                     <para>
856                       <emphasis role="bold">Information</emphasis>
857                     </para>
858                   </entry>
859                   <entry>
860                     <para>
861                       <emphasis role="bold">Detail</emphasis>
862                     </para>
863                   </entry>
864                 </row>
865               </thead>
866               <tbody>
867                 <row>
868                   <entry>
869                     <para>General Information</para>
870                   </entry>
871                   <entry>
872                     <itemizedlist>
873                       <listitem>
874                         <para>Name:
875                         <literal>lfsck_namespace</literal></para>
876                       </listitem>
877                       <listitem>
878                         <para>LFSCK namespace magic.</para>
879                       </listitem>
880                       <listitem>
881                         <para>LFSCK namespace version.</para>
882                       </listitem>
883                       <listitem>
884                         <para>Status: one of
885                         <literal>init</literal>,
886                         <literal>scanning-phase1</literal>,
887                         <literal>scanning-phase2</literal>,
888                         <literal>completed</literal>,
889                         <literal>failed</literal>,
890                         <literal>stopped</literal>,
891                         <literal>paused</literal>,
892                         <literal>partial</literal>,
893                         <literal>co-failed</literal>,
894                         <literal>co-stopped</literal> or
895                         <literal>co-paused</literal>.</para>
896                       </listitem>
897                       <listitem>
898                         <para>Flags: including
899                         <literal>scanned-once</literal> (the first cycle
900                         scanning has been completed),
901                         <literal>inconsistent</literal> (one or more
902                         inconsistent FID-in-dirent or LinkEA entries have
903                         been discovered), and
904                         <literal>upgrade</literal> (from Lustre software release
905                         1.8 IGIF format).</para>
906                       </listitem>
907                       <listitem>
908                         <para>Parameters: including
909                         <literal>dryrun</literal>,
910                         <literal>all_targets</literal>,
911                         <literal>failout</literal>,
912                         <literal>broadcast</literal>,
913                         <literal>orphan</literal>,
914                         <literal>create_ostobj</literal> and
915                         <literal>create_mdtobj</literal>.</para>
916                       </listitem>
917                       <listitem>
918                         <para>Time Since Last Completed.</para>
919                       </listitem>
920                       <listitem>
921                         <para>Time Since Latest Start.</para>
922                       </listitem>
923                       <listitem>
924                         <para>Time Since Last Checkpoint.</para>
925                       </listitem>
926                       <listitem>
927                         <para>Latest Start Position: the position at which
928                         checking most recently began.</para>
929                       </listitem>
930                       <listitem>
931                         <para>Last Checkpoint Position.</para>
932                       </listitem>
933                       <listitem>
934                         <para>First Failure Position: the position of the
935                         first object to be repaired.</para>
936                       </listitem>
937                       <listitem>
938                         <para>Current Position.</para>
939                       </listitem>
940                     </itemizedlist>
941                   </entry>
942                 </row>
943                 <row>
944                   <entry>
945                     <para>Statistics</para>
946                   </entry>
947                   <entry>
948                     <itemizedlist>
949                       <listitem>
950                         <para>
951                         <literal>Checked Phase1</literal> total number of
952                         objects scanned during
953                         <literal>scanning-phase1</literal>.</para>
954                       </listitem>
955                       <listitem>
956                         <para>
957                         <literal>Checked Phase2</literal> total number of
958                         objects scanned during
959                         <literal>scanning-phase2</literal>.</para>
960                       </listitem>
961                       <listitem>
962                         <para>
963                         <literal>Updated Phase1</literal> total number of
964                         objects repaired during
965                         <literal>scanning-phase1</literal>.</para>
966                       </listitem>
967                       <listitem>
968                         <para>
969                         <literal>Updated Phase2</literal> total number of
970                         objects repaired during
971                         <literal>scanning-phase2</literal>.</para>
972                       </listitem>
973                       <listitem>
974                         <para>
975                         <literal>Failed Phase1</literal> total number of objects
976                         that failed to be repaired during
977                         <literal>scanning-phase1</literal>.</para>
978                       </listitem>
979                       <listitem>
980                         <para>
981                         <literal>Failed Phase2</literal> total number of objects
982                         that failed to be repaired during
983                         <literal>scanning-phase2</literal>.</para>
984                       </listitem>
985                       <listitem>
986                         <para>
987                         <literal>directories</literal> total number of
988                         directories scanned.</para>
989                       </listitem>
990                       <listitem>
991                         <para>
992                         <literal>multiple_linked_checked</literal> total number
993                         of multiple-linked objects that have been
994                         scanned.</para>
995                       </listitem>
996                       <listitem>
997                         <para>
998                         <literal>dirent_repaired</literal> total number of
999                         FID-in-dirent entries that have been repaired.</para>
1000                       </listitem>
1001                       <listitem>
1002                         <para>
1003                         <literal>linkea_repaired</literal> total number of
1004                         linkEA entries that have been repaired.</para>
1005                       </listitem>
1006                       <listitem>
1007                         <para>
1008                         <literal>unknown_inconsistency</literal> total number of
1009                         undefined inconsistencies found in
1010                         scanning-phase2.</para>
1011                       </listitem>
1012                       <listitem>
1013                         <para>
1014                         <literal>unmatched_pairs_repaired</literal> total number
1015                         of unmatched pairs that have been repaired.</para>
1016                       </listitem>
1017                       <listitem>
1018                         <para>
1019                         <literal>dangling_repaired</literal> total number of
1020                         dangling name entries that have been
1021                         found/repaired.</para>
1022                       </listitem>
1023                       <listitem>
1024                         <para>
1025                         <literal>multi_referenced_repaired</literal> total
1026                         number of multiple referenced name entries that have
1027                         been found/repaired.</para>
1028                       </listitem>
1029                       <listitem>
1030                         <para>
1031                         <literal>bad_file_type_repaired</literal> total number
1032                         of name entries with bad file type that have been
1033                         repaired.</para>
1034                       </listitem>
1035                       <listitem>
1036                         <para>
1037                         <literal>lost_dirent_repaired</literal> total number of
1038                         lost name entries that have been re-inserted.</para>
1039                       </listitem>
1040                       <listitem>
1041                         <para>
1042                         <literal>striped_dirs_scanned</literal> total number of
1043                         striped directories (master) that have been
1044                         scanned.</para>
1045                       </listitem>
1046                       <listitem>
1047                         <para>
1048                         <literal>striped_dirs_repaired</literal> total number of
1049                         striped directories (master) that have been
1050                         repaired.</para>
1051                       </listitem>
1052                       <listitem>
1053                         <para>
1054                         <literal>striped_dirs_failed</literal> total number of
1055                         striped directories (master) that have failed to be
1056                         verified.</para>
1057                       </listitem>
1058                       <listitem>
1059                         <para>
1060                         <literal>striped_dirs_disabled</literal> total number of
1061                         striped directories (master) that have been
1062                         disabled.</para>
1063                       </listitem>
1064                       <listitem>
1065                         <para>
1066                         <literal>striped_dirs_skipped</literal> total number of
1067                         striped directories (master) that have been skipped
1068                         (for shards verification) because of lost master LMV
1069                         EA.</para>
1070                       </listitem>
1071                       <listitem>
1072                         <para>
1073                         <literal>striped_shards_scanned</literal> total number
1074                         of striped directory shards (slave) that have been
1075                         scanned.</para>
1076                       </listitem>
1077                       <listitem>
1078                         <para>
1079                         <literal>striped_shards_repaired</literal> total number
1080                         of striped directory shards (slave) that have been
1081                         repaired.</para>
1082                       </listitem>
1083                       <listitem>
1084                         <para>
1085                         <literal>striped_shards_failed</literal> total number of
1086                         striped directory shards (slave) that have failed to be
1087                         verified.</para>
1088                       </listitem>
1089                       <listitem>
1090                         <para>
1091                         <literal>striped_shards_skipped</literal> total number
1092                         of striped directory shards (slave) that have been
1093                         skipped (for name hash verification) because LFSCK does
1094                         not know whether the slave LMV EA is valid or
1095                         not.</para>
1096                       </listitem>
1097                       <listitem>
1098                         <para>
1099                         <literal>name_hash_repaired</literal> total number of
1100                         name entries under striped directory with bad name hash
1101                         that have been repaired.</para>
1102                       </listitem>
1103                       <listitem>
1104                         <para>
1105                         <literal>nlinks_repaired</literal> total number of
1106                         objects with nlink fixed.</para>
1107                       </listitem>
1108                       <listitem>
1109                         <para>
1110                         <literal>mul_linked_repaired</literal> total number of
1111                         multiple-linked objects that have been repaired.</para>
1112                       </listitem>
1113                       <listitem>
1114                         <para>
1115                         <literal>local_lost_found_scanned</literal> total number
1116                         of objects under /lost+found that have been
1117                         scanned.</para>
1118                       </listitem>
1119                       <listitem>
1120                         <para>
1121                         <literal>local_lost_found_moved</literal> total number
1122                         of objects under /lost+found that have been moved to
1123                         a namespace-visible directory.</para>
1124                       </listitem>
1125                       <listitem>
1126                         <para>
1127                         <literal>local_lost_found_skipped</literal> total number
1128                         of objects under /lost+found that have been
1129                         skipped.</para>
1130                       </listitem>
1131                       <listitem>
1132                         <para>
1133                         <literal>local_lost_found_failed</literal> total number
1134                         of objects under /lost+found that have failed to be
1135                         processed.</para>
1136                       </listitem>
1137                       <listitem>
1138                         <para>
1139                         <literal>Success Count</literal> the total number of
1140                         completed LFSCK runs on the target.</para>
1141                       </listitem>
1142                       <listitem>
1143                         <para>
1144                         <literal>Run Time Phase1</literal> the duration of the
1145                         LFSCK run during
1146                         <literal>scanning-phase1</literal>, excluding the time
1147                         spent paused between checkpoints.</para>
1148                       </listitem>
1149                       <listitem>
1150                         <para>
1151                         <literal>Run Time Phase2</literal> the duration of the
1152                         LFSCK run during
1153                         <literal>scanning-phase2</literal>, excluding the time
1154                         spent paused between checkpoints.</para>
1155                       </listitem>
1156                       <listitem>
1157                         <para>
1158                         <literal>Average Speed Phase1</literal> calculated by
1159                         dividing
1160                         <literal>checked_phase1</literal> by
1161                         <literal>run_time_phase1</literal>.</para>
1162                       </listitem>
1163                       <listitem>
1164                         <para>
1165                         <literal>Average Speed Phase2</literal> calculated by
1166                         dividing
1167                         <literal>checked_phase2</literal> by
1168                         <literal>run_time_phase2</literal>.</para>
1169                       </listitem>
1170                       <listitem>
1171                         <para>
1172                         <literal>Real-Time Speed Phase1</literal> the speed
1173                         since the last checkpoint if the LFSCK is running
1174                         <literal>scanning-phase1</literal>.</para>
1175                       </listitem>
1176                       <listitem>
1177                         <para>
1178                         <literal>Real-Time Speed Phase2</literal> the speed
1179                         since the last checkpoint if the LFSCK is running
1180                         <literal>scanning-phase2</literal>.</para>
1181                       </listitem>
1182                     </itemizedlist>
1183                   </entry>
1184                 </row>
1185               </tbody>
1186             </tgroup>
1187           </informaltable>
1188         </section>
1189       </section>
1190       <section condition='l26'>
1191         <title>LFSCK status of layout via
1192         <literal>procfs</literal></title>
1193         <section>
1194           <title>Description</title>
1195           <para>The
1196           <literal>layout</literal> component is responsible for checking and
1197           repairing MDT-OST inconsistency. The
1198           <literal>procfs</literal> interface for this component is in the MDD
1199           layer, named
1200           <literal>lfsck_layout</literal>, and in the OBD layer, named
1201           <literal>lfsck_layout</literal>. To show the status of this component,
1202           <literal>lctl get_param</literal> should be used as described in the
1203           usage below.</para>
1204           <para>The LFSCK layout status output refers to phase 1 and phase 2.
1205           Phase 1 is when the LFSCK main engine, which runs on each MDT/OST,
1206           linearly scans its local device, guaranteeing that all local objects
1207           are checked. During phase 1 of layout LFSCK, the OST-objects which are
1208           not referenced by any MDT-object are recorded in a bitmap. During
1209           phase 2 of the layout check, the OST-objects in the bitmap will be
1210           re-scanned to check whether they are really orphan objects.</para>
1211         </section>
1212         <section>
1213           <title>Usage</title>
1214           <screen>lctl get_param -n mdd.<replaceable>FSNAME</replaceable>-<replaceable>MDT_target</replaceable>.lfsck_layout
1217 lctl get_param -n obdfilter.<replaceable>FSNAME</replaceable>-<replaceable>OST_target</replaceable>.lfsck_layout</screen>
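          <para>As an illustration only, the two forms above might be invoked
          as follows, the first on the MDS and the second on the OSS that hosts
          the respective target. The names <literal>testfs</literal>,
          <literal>MDT0000</literal> and <literal>OST0000</literal> are
          hypothetical placeholders; substitute the names used on your
          system.</para>
          <screen>lctl get_param -n mdd.testfs-MDT0000.lfsck_layout
lctl get_param -n obdfilter.testfs-OST0000.lfsck_layout</screen>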
1220         </section>
1221         <section>
1222           <title>Output</title>
1223           <informaltable frame="all">
1224             <tgroup cols="2">
1225               <colspec colname="c1" colwidth="3*" />
1226               <colspec colname="c2" colwidth="7*" />
1227               <thead>
1228                 <row>
1229                   <entry>
1230                     <para>
1231                       <emphasis role="bold">Information</emphasis>
1232                     </para>
1233                   </entry>
1234                   <entry>
1235                     <para>
1236                       <emphasis role="bold">Detail</emphasis>
1237                     </para>
1238                   </entry>
1239                 </row>
1240               </thead>
1241               <tbody>
1242                 <row>
1243                   <entry>
1244                     <para>General Information</para>
1245                   </entry>
1246                   <entry>
1247                     <itemizedlist>
1248                       <listitem>
1249                         <para>Name:
1250                         <literal>lfsck_layout</literal></para>
1251                       </listitem>
1252                       <listitem>
1253                         <para>LFSCK namespace magic.</para>
1254                       </listitem>
1255                       <listitem>
1256                         <para>LFSCK namespace version.</para>
1257                       </listitem>
1258                       <listitem>
1259                         <para>Status: one of
1260                         <literal>init</literal>,
1261                         <literal>scanning-phase1</literal>,
1262                         <literal>scanning-phase2</literal>,
1263                         <literal>completed</literal>,
1264                         <literal>failed</literal>,
1265                         <literal>stopped</literal>,
1266                         <literal>paused</literal>,
1267                         <literal>crashed</literal>,
1268                         <literal>partial</literal>,
1269                         <literal>co-failed</literal>,
1270                         <literal>co-stopped</literal>, or
1271                         <literal>co-paused</literal>.</para>
1272                       </listitem>
1273                       <listitem>
1274                         <para>Flags: including
1275                         <literal>scanned-once</literal> (the first cycle
1276                         scanning has been completed),
1277                         <literal>inconsistent</literal> (one or more MDT-OST
1278                         inconsistencies have been discovered),
1279                         <literal>incomplete</literal> (some MDT or OST did not
1280                         participate in the LFSCK or failed to finish the LFSCK),
1281                         or
1282                         <literal>crashed_lastid</literal> (the lastid files on
1283                         the OST crashed and need to be rebuilt).</para>
1284                       </listitem>
1285                       <listitem>
1286                         <para>Parameters: including
1287                         <literal>dryrun</literal>,
1288                         <literal>all_targets</literal> and
1289                         <literal>failout</literal>.</para>
1290                       </listitem>
1291                       <listitem>
1292                         <para>Time Since Last Completed.</para>
1293                       </listitem>
1294                       <listitem>
1295                         <para>Time Since Latest Start.</para>
1296                       </listitem>
1297                       <listitem>
1298                         <para>Time Since Last Checkpoint.</para>
1299                       </listitem>
1300                       <listitem>
1301                         <para>Latest Start Position: the position at which
1302                         checking most recently began.</para>
1303                       </listitem>
1304                       <listitem>
1305                         <para>Last Checkpoint Position.</para>
1306                       </listitem>
1307                       <listitem>
1308                         <para>First Failure Position: the position for the
1309                         first object to be repaired.</para>
1310                       </listitem>
1311                       <listitem>
1312                         <para>Current Position.</para>
1313                       </listitem>
1314                     </itemizedlist>
1315                   </entry>
1316                 </row>
1317                 <row>
1318                   <entry>
1319                     <para>Statistics</para>
1320                   </entry>
1321                   <entry>
1322                     <itemizedlist>
1323                       <listitem>
1324                         <para>
1325                         <literal>Success Count:</literal> the total number of
1326                         completed LFSCK runs on the target.</para>
1327                       </listitem>
1328                       <listitem>
1329                         <para>
1330                         <literal>Repaired Dangling:</literal> total number of
1331                         MDT-objects with dangling references that have been
1332                         repaired in scanning-phase1.</para>
1333                       </listitem>
1334                       <listitem>
1335                         <para>
1336                         <literal>Repaired Unmatched Pairs</literal> total number
1337                         of unmatched MDT and OST-object pairs that have been
1338                         repaired in scanning-phase1.</para>
1339                       </listitem>
1340                       <listitem>
1341                         <para>
1342                         <literal>Repaired Multiple Referenced</literal> total
1343                         number of OST-objects with multiple references that have
1344                         been repaired in scanning-phase1.</para>
1345                       </listitem>
1346                       <listitem>
1347                         <para>
1348                         <literal>Repaired Orphan</literal> total number of
1349                         orphan OST-objects that have been repaired in
1350                         scanning-phase2.</para>
1351                       </listitem>
1352                       <listitem>
1353                         <para>
1354                         <literal>Repaired Inconsistent Owner</literal> total
1355                         number of OST-objects with incorrect owner information
1356                         that have been repaired in scanning-phase1.</para>
1357                       </listitem>
1358                       <listitem>
1359                         <para>
1360                         <literal>Repaired Others</literal> total number of other
1361                         inconsistencies repaired in the scanning phases.</para>
1362                       </listitem>
1363                       <listitem>
1364                         <para>
1365                         <literal>Skipped</literal> Number of skipped
1366                         objects.</para>
1367                       </listitem>
1368                       <listitem>
1369                         <para>
1370                         <literal>Failed Phase1</literal> total number of objects
1371                         that failed to be repaired during
1372                         <literal>scanning-phase1</literal>.</para>
1373                       </listitem>
1374                       <listitem>
1375                         <para>
1376                         <literal>Failed Phase2</literal> total number of objects
1377                         that failed to be repaired during
1378                         <literal>scanning-phase2</literal>.</para>
1379                       </listitem>
1380                       <listitem>
1381                         <para>
1382                         <literal>Checked Phase1</literal> total number of
1383                         objects scanned during
1384                         <literal>scanning-phase1</literal>.</para>
1385                       </listitem>
1386                       <listitem>
1387                         <para>
1388                         <literal>Checked Phase2</literal> total number of
1389                         objects scanned during
1390                         <literal>scanning-phase2</literal>.</para>
1391                       </listitem>
1392                       <listitem>
1393                         <para>
1394                         <literal>Run Time Phase1</literal> the duration of the
1395                         LFSCK run during
1396                         <literal>scanning-phase1</literal>, excluding the time
1397                         spent paused between checkpoints.</para>
1398                       </listitem>
1399                       <listitem>
1400                         <para>
1401                         <literal>Run Time Phase2</literal> the duration of the
1402                         LFSCK run during
1403                         <literal>scanning-phase2</literal>, excluding the time
1404                         spent paused between checkpoints.</para>
1405                       </listitem>
1406                       <listitem>
1407                         <para>
1408                         <literal>Average Speed Phase1</literal> calculated by
1409                         dividing
1410                         <literal>checked_phase1</literal> by
1411                         <literal>run_time_phase1</literal>.</para>
1412                       </listitem>
1413                       <listitem>
1414                         <para>
1415                         <literal>Average Speed Phase2</literal> calculated by
1416                         dividing
1417                         <literal>checked_phase2</literal> by
1418                         <literal>run_time_phase2</literal>.</para>
1419                       </listitem>
1420                       <listitem>
1421                         <para>
1422                         <literal>Real-Time Speed Phase1</literal> the speed
1423                         since the last checkpoint if the LFSCK is running
1424                         <literal>scanning-phase1</literal>.</para>
1425                       </listitem>
1426                       <listitem>
1427                         <para>
1428                         <literal>Real-Time Speed Phase2</literal> the speed
1429                         since the last checkpoint if the LFSCK is running
1430                         <literal>scanning-phase2</literal>.</para>
1431                       </listitem>
1432                     </itemizedlist>
1433                   </entry>
1434                 </row>
1435               </tbody>
1436             </tgroup>
1437           </informaltable>
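          <para>As a worked illustration of the average speed fields, the
          following sketch divides a hypothetical
          <literal>checked_phase1</literal> value by a hypothetical
          <literal>run_time_phase1</literal> value in seconds. The numbers are
          invented for illustration, and the use of integer division is an
          assumption rather than a statement of how LFSCK rounds the
          result.</para>
          <screen># Hypothetical values, not taken from a real LFSCK run
checked_phase1=120000      # objects scanned during scanning-phase1
run_time_phase1=60         # seconds spent scanning, pauses excluded
# Average speed in objects per second (integer division)
echo $(( checked_phase1 / run_time_phase1 ))</screen>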
1438         </section>
1439       </section>
1440     </section>
1441     <section>
1442       <title>LFSCK adjustment interface</title>
1443       <section condition='l26'>
1444         <title>Rate control</title>
1445         <section>
1446           <title>Description</title>
1447           <para>The LFSCK upper speed limit can be changed using
1448           <literal>lctl set_param</literal> as shown in the usage below.</para>
1449         </section>
1450         <section>
1451           <title>Usage</title>
1452           <screen>lctl set_param mdd.${FSNAME}-${MDT_target}.lfsck_speed_limit=<replaceable>N</replaceable>
1454 lctl set_param obdfilter.${FSNAME}-${OST_target}.lfsck_speed_limit=<replaceable>N</replaceable></screen>
1456         </section>
1457         <section>
1458           <title>Values</title>
1459           <informaltable frame="all">
1460             <tgroup cols="2">
1461               <colspec colname="c1" colwidth="3*" />
1462               <colspec colname="c2" colwidth="7*" />
1463               <tbody>
1464                 <row>
1465                   <entry>
1466                     <para>0</para>
1467                   </entry>
1468                   <entry>
1469                     <para>No speed limit (run at maximum speed).</para>
1470                   </entry>
1471                 </row>
1472                 <row>
1473                   <entry>
1474                     <para>positive integer</para>
1475                   </entry>
1476                   <entry>
1477                     <para>Maximum number of objects to scan per second.</para>
1478                   </entry>
1479                 </row>
1480               </tbody>
1481             </tgroup>
1482           </informaltable>
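          <para>For example, assuming a hypothetical file system named
          <literal>testfs</literal> with an MDT target
          <literal>MDT0000</literal>, the limit might be set, checked, and
          then removed as follows. The file system name, the target name, and
          the value of 1000 are placeholders chosen for illustration.</para>
          <screen># Limit LFSCK on this MDT to at most 1000 objects per second
lctl set_param mdd.testfs-MDT0000.lfsck_speed_limit=1000
# Read the current limit back
lctl get_param mdd.testfs-MDT0000.lfsck_speed_limit
# Remove the limit (run at maximum speed)
lctl set_param mdd.testfs-MDT0000.lfsck_speed_limit=0</screen>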
1483         </section>
1484       </section>
1485       <section xml:id="dbdoclet.lfsck_auto_scrub">
1486         <title>Auto scrub</title>
1487         <section>
1488           <title>Description</title>
1489           <para>The
1490           <literal>auto_scrub</literal> parameter controls whether OI scrub will
1491           be triggered when an inconsistency is detected during OI lookup. It
1492           can be set as described in the usage and values sections
1493           below.</para>
1494           <para>There is also a
1495           <literal>noscrub</literal> mount option (see
1496           <xref linkend="dbdoclet.50438219_12635" />) which can be used to
1497           disable automatic OI scrub upon detection of a file-level backup at
1498           mount time. If the
1499           <literal>noscrub</literal> mount option is specified,
1500           <literal>auto_scrub</literal> will also be disabled, so OI scrub will
1501           not be triggered when an OI inconsistency is detected. Auto scrub can
1502           be re-enabled after the mount using the command shown in the usage.
1503           Manually starting LFSCK after mounting provides finer control over
1504           the starting conditions.</para>
1505         </section>
1506         <section>
1507           <title>Usage</title>
1508           <screen>lctl set_param osd-ldiskfs.${FSNAME}-${MDT_target}.auto_scrub=<replaceable>N</replaceable></screen>
1509           <para>where
1510           <replaceable>N</replaceable> is an integer as described below.</para>
1511           <note condition='l25'><para>Lustre software 2.5 and later supports the
1512           <literal>-P</literal> option, which makes the
1513           <literal>set_param</literal> setting permanent.</para></note>
1514         </section>
1515         <section>
1516           <title>Values</title>
1517           <informaltable frame="all">
1518             <tgroup cols="2">
1519               <colspec colname="c1" colwidth="3*" />
1520               <colspec colname="c2" colwidth="7*" />
1521               <tbody>
1522                 <row>
1523                   <entry>
1524                     <para>0</para>
1525                   </entry>
1526                   <entry>
1527                     <para>Do not start OI Scrub automatically.</para>
1528                   </entry>
1529                 </row>
1530                 <row>
1531                   <entry>
1532                     <para>positive integer</para>
1533                   </entry>
1534                   <entry>
1535                     <para>Automatically start OI Scrub if an inconsistency is
1536                     detected during OI lookup.</para>
1537                   </entry>
1538                 </row>
1539               </tbody>
1540             </tgroup>
1541           </informaltable>
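          <para>For example, assuming a hypothetical file system named
          <literal>testfs</literal> with an MDT target
          <literal>MDT0000</literal>, automatic OI scrub might be disabled and
          later re-enabled as follows. The file system and target names are
          placeholders. The
          <literal>osd-ldiskfs</literal> parameter prefix used here is an
          assumption; the exact parameter path on a given server can be
          confirmed with
          <literal>lctl list_param</literal>.</para>
          <screen># Do not start OI scrub automatically on this MDT
lctl set_param osd-ldiskfs.testfs-MDT0000.auto_scrub=0
# Check the current setting
lctl get_param osd-ldiskfs.testfs-MDT0000.auto_scrub
# Start OI scrub automatically when an inconsistency is detected
lctl set_param osd-ldiskfs.testfs-MDT0000.auto_scrub=1</screen>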
1542         </section>
1543       </section>
1544     </section>
1545   </section>
1546 </chapter>
1547 <!--
1548   vim:expandtab:shiftwidth=2:tabstop=8:
1549   -->