1 <?xml version='1.0' encoding='UTF-8'?><chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="troubleshootingrecovery">
2     <title xml:id="troubleshootingrecovery.title">Troubleshooting Recovery</title>
3     <para>This chapter describes what to do if something goes wrong during recovery. It describes:</para>
4     <itemizedlist>
5         <listitem>
6             <para><xref linkend="dbdoclet.50438225_71141"/></para>
7         </listitem>
8         <listitem>
9             <para><xref linkend="dbdoclet.50438225_37365"/></para>
10         </listitem>
11         <listitem>
12             <para><xref linkend="dbdoclet.50438225_12316"/></para>
13         </listitem>
14         <listitem>
15             <para><xref linkend="dbdoclet.lfsckadmin"/></para>
16         </listitem>
17     </itemizedlist>
18     <section xml:id="dbdoclet.50438225_71141">
19         <title><indexterm><primary>recovery</primary><secondary>corruption of backing file system</secondary></indexterm>Recovering from Errors or Corruption on a Backing File System</title>
20         <para>When an OSS, MDS, or MGS server crash occurs, it is not necessary to run e2fsck on the
21             file system. <literal>ldiskfs</literal> journaling ensures that the file system remains
22             consistent over a system crash. The backing file systems are never accessed directly
23             from the client, so client crashes are not relevant for server file system
24             consistency.</para>
25         <para>The only time it is REQUIRED that <literal>e2fsck</literal> be run on a device is when an event causes problems that ldiskfs journaling is unable to handle, such as a hardware device failure or I/O error. If the ldiskfs kernel code detects corruption on the disk, it mounts the file system as read-only to prevent further corruption, but still allows read access to the device. This appears as error &quot;-30&quot; (<literal>EROFS</literal>) in the syslogs on the server, e.g.:</para>
26         <screen>Dec 29 14:11:32 mookie kernel: LDISKFS-fs error (device sdz):
27             ldiskfs_lookup: unlinked inode 5384166 in dir #145170469
28 Dec 29 14:11:32 mookie kernel: Remounting filesystem read-only</screen>
29         <para>In such a situation, it is normally only required that <literal>e2fsck</literal> be run on the bad device before placing the device back into service.</para>
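        <para>As a rough sketch of how these messages might be located, the shell function below searches a log file for the two message patterns shown in the syslog excerpt above. The function name <literal>find_ldiskfs_errors</literal> and the default log path <literal>/var/log/messages</literal> are assumptions made for illustration; substitute whichever file the syslog daemon on the server writes to.</para>
        <screen># Hypothetical helper, not part of the Lustre tools.
# Prints every line that reports an ldiskfs error or a read-only remount.
# The default log path is an assumption; pass the real path as the first argument.
find_ldiskfs_errors() {
    logfile="${1:-/var/log/messages}"
    grep -E 'LDISKFS-fs error|Remounting filesystem read-only' "$logfile"
}</screen>
        <para>For example, <literal>find_ldiskfs_errors /var/log/messages</literal> prints the matching lines, and prints nothing if the log contains no such messages.</para>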
30         <para>In the vast majority of cases, the Lustre software can cope with any inconsistencies
31             found on the disk and between other devices in the file system.</para>
32         <note>
33             <para>The offline LFSCK tool included with e2fsprogs is rarely required for Lustre file
34                 system operation.</para>
35         </note>
36         <para>For problem analysis, it is strongly recommended that <literal>e2fsck</literal> be run under a logger, like script, to record all of the output and changes that are made to the file system in case this information is needed later.</para>
37         <para>If time permits, it is also a good idea to first run <literal>e2fsck</literal> in non-fixing mode (-n option) to assess the type and extent of damage to the file system. The drawback is that in this mode, <literal>e2fsck</literal> does not recover the file system journal, so there may appear to be file system corruption when none really exists.</para>
38         <para>To address concern about whether corruption is real or only due to the journal not
39             being replayed, you can briefly mount and unmount the <literal>ldiskfs</literal> file
40             system directly on the node with the Lustre file system stopped, using a command similar
41             to:</para>
42         <screen>mount -t ldiskfs /dev/{ostdev} /mnt/ost; umount /mnt/ost</screen>
43         <para>This causes the journal to be recovered.</para>
44         <para>The <literal>e2fsck</literal> utility works well when fixing file system corruption
45             (better than similar file system recovery tools and a primary reason why
46                 <literal>ldiskfs</literal> was chosen over other file systems). However, it is often
47             useful to identify the type of damage that has occurred so an <literal>ldiskfs</literal>
48             expert can make intelligent decisions about what needs fixing, in place of
49                 <literal>e2fsck</literal>.</para>
50         <screen>root# {stop lustre services for this device, if running}
51 root# script /tmp/e2fsck.sda
52 Script started, file is /tmp/e2fsck.sda
53 root# mount -t ldiskfs /dev/sda /mnt/ost
54 root# umount /mnt/ost
55 root# e2fsck -fn /dev/sda   # don&apos;t fix file system, just check for corruption
56 :
57 [e2fsck output]
58 :
59 root# e2fsck -fp /dev/sda   # fix errors with prudent answers (usually <literal>yes</literal>)
60         </screen>
61         <para>In addition, the <literal>e2fsprogs</literal> package contains the LFSCK tool, which
62             does distributed coherency checking for the Lustre file system after
63                 <literal>e2fsck</literal> has been run. Running LFSCK is NOT required in a large
64             majority of cases, at a small risk of having some leaked space in the file system. To
65             avoid a lengthy downtime, it can be run (with care) after the Lustre file system is
66             started.</para>
67     </section>
68     <section xml:id="dbdoclet.50438225_37365">
69         <title><indexterm><primary>recovery</primary><secondary>corruption of Lustre file system</secondary></indexterm>Recovering from Corruption in the Lustre File System</title>
70         <para>In cases where the MDS or an OST becomes corrupt, you can run a distributed check on the file system to determine what sort of problems exist. Use LFSCK to correct any defects found.</para>
71         <orderedlist>
72             <listitem>
73                 <para>Stop the Lustre file system.</para>
74             </listitem>
75             <listitem>
76                 <para>Run <literal>e2fsck -f</literal> on the individual MDS / OST that had problems to fix any local file system damage.</para>
77                 <para>We recommend running <literal>e2fsck</literal> under script, to create a log of changes made to the file system in case it is needed later. After <literal>e2fsck</literal> is run, bring up the file system, if necessary, to reduce the outage window.</para>
78             </listitem>
79             <listitem>
80                 <para>Run a full <literal>e2fsck</literal> of the MDS to create a database for LFSCK. You <emphasis>must</emphasis> use the <literal>-n</literal> option for a mounted file system, otherwise you will corrupt the file system.</para>
81                 <screen>e2fsck -n -v --mdsdb /tmp/mdsdb /dev/{mdsdev}
82                 </screen>
83                 <para>The <literal>mdsdb</literal> file can grow fairly large, depending on the number of files in the file system (10 GB or more for millions of files, though the apparent file size is larger because the file is sparse). It is quicker to write the file to a local file system due to seeking and small writes. Depending on the number of files, this step can take several hours to complete.</para>
84                 <para><emphasis role="bold">Example</emphasis></para>
85                 <screen>e2fsck -n -v --mdsdb /tmp/mdsdb /dev/sdb
86 e2fsck 1.42.5.wc3 (15-Sep-2012)
87 Warning: skipping journal recovery because doing a read-only filesystem check.
88 lustre-MDT0000 contains a file system with errors, check forced.
89 Pass 1: Checking inodes, blocks, and sizes
90 MDS: ost_idx 0 max_id 288
91 MDS: got 8 bytes = 1 entries in lov_objids
92 MDS: max_files = 13
93 MDS: num_osts = 1
94 mds info db file written
95 Pass 2: Checking directory structure
96 Pass 3: Checking directory connectivity
97 Pass 4: Checking reference counts
98 Pass 5: Checking group summary information
99 Free blocks count wrong (656160, counted=656058).
100 Fix? no
101
102 Free inodes count wrong (786419, counted=786036).
103 Fix? no
104
105 Pass 6: Acquiring information for lfsck
106 MDS: max_files = 13
107 MDS: num_osts = 1
108 MDS: &apos;lustre-MDT0000_UUID&apos; mdt idx 0: compat 0x4 rocomp 0x1 incomp 0x4
109 lustre-MDT0000: ******* WARNING: Filesystem still has errors *******
110 13 inodes used (0%)
111 2 non-contiguous inodes (15.4%)
112 # of inodes with ind/dind/tind blocks: 0/0/0
113 130272 blocks used (16%)
114 0 bad blocks
115 1 large file
116 296 regular files
117 91 directories
118 0 character device files
119 0 block device files
120 0 fifos
121 0 links
122 0 symbolic links (0 fast symbolic links)
123 0 sockets
124 --------
125 387 files
126                 </screen>
127             </listitem>
128             <listitem>
129                 <para>Make this file accessible on all OSTs, either by using a shared file system or copying the file to the OSTs. The <literal>pdcp</literal> command is useful here.</para>
130                 <para>The <literal>pdcp</literal> command (installed with <literal>pdsh</literal>), can be used to copy files to groups of hosts. <literal>pdcp</literal> is available here:</para>
131                 <para><link xl:href="http://sourceforge.net/projects/pdsh">http://sourceforge.net/projects/pdsh</link></para>
132             </listitem>
133             <listitem>
134                 <para>Run a similar <literal>e2fsck</literal> step on the OSTs. The <literal>e2fsck --ostdb</literal> command can be run in parallel on all OSTs.</para>
135                 <screen>e2fsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/{ostNdb} /dev/{ostNdev}
136                 </screen>
137                 <para>The <literal>mdsdb</literal> file is read-only in this step; a single copy can be shared by all OSTs.</para>
138                 <note>
139                     <para>If the OSTs do not have shared file system access to the MDS, a stub <literal>mdsdb</literal> file, <literal>{mdsdb}.mdshdr</literal>, is generated. This can be used instead of the full <literal>mdsdb</literal> file.</para>
140                 </note>
141                 <para><emphasis role="bold">Example:</emphasis></para>
142                 <screen>[root@oss161 ~]# e2fsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb /dev/sda
143 e2fsck 1.42.5.wc3 (15-Sep-2012)
144 Warning: skipping journal recovery because doing a read-only filesystem check.
145 lustre-OST0000 contains a file system with errors, check forced.
146 Pass 1: Checking inodes, blocks, and sizes
147 Pass 2: Checking directory structure
148 Pass 3: Checking directory connectivity
149 Pass 4: Checking reference counts
150 Pass 5: Checking group summary information
151 Free blocks count wrong (989015, counted=817968).
152 Fix? no
153
154 Free inodes count wrong (262088, counted=261767).
155 Fix? no
156
157 Pass 6: Acquiring information for lfsck
158 OST: &apos;lustre-OST0000_UUID&apos; ost idx 0: compat 0x2 rocomp 0 incomp 0x2
159 OST: num files = 321
160 OST: last_id = 321
161
162 lustre-OST0000: ******* WARNING: Filesystem still has errors *******
163
164 56 inodes used (0%)
165 27 non-contiguous inodes (48.2%)
166 # of inodes with ind/dind/tind blocks: 13/0/0
167 59561 blocks used (5%)
168 0 bad blocks
169 1 large file
170 329 regular files
171 39 directories
172 0 character device files
173 0 block device files
174 0 fifos
175 0 links
176 0 symbolic links (0 fast symbolic links)
177 0 sockets
178 --------
179 368 files
180                 </screen>
181             </listitem>
182             <listitem>
183                 <para>Make the <literal>mdsdb</literal> file and all <literal>ostdb</literal> files available on a mounted client and run LFSCK to examine the file system. Optionally, correct the defects found by LFSCK.</para>
184                 <screen>script /root/lfsck.lustre.log
185 lfsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/{ost1db} /tmp/{ost2db} ... /lustre/mount/point</screen>
186                 <para><emphasis role="bold">Example:</emphasis></para>
187                 <screen>script /root/lfsck.lustre.log
188 lfsck -n -v --mdsdb /home/mdsdb --ostdb /home/ostdb /mnt/lustre/client/
189 MDSDB: /home/mdsdb
190 OSTDB[0]: /home/ostdb
191 MOUNTPOINT: /mnt/lustre/client/
192 MDS: max_id 288 OST: max_id 321
193 lfsck: ost_idx 0: pass1: check for duplicate objects
194 lfsck: ost_idx 0: pass1 OK (287 files total)
195 lfsck: ost_idx 0: pass2: check for missing inode objects
196 lfsck: ost_idx 0: pass2 OK (287 objects)
197 lfsck: ost_idx 0: pass3: check for orphan objects
198 [0] uuid lustre-OST0000_UUID
199 [0] last_id 288
200 [0] zero-length orphan objid 1
201 lfsck: ost_idx 0: pass3 OK (321 files total)
202 lfsck: pass4: check for duplicate object references
203 lfsck: pass4 OK (no duplicates)
204 lfsck: fixed 0 errors</screen>
205                 <para>By default, LFSCK reports errors, but it does not repair any inconsistencies found. LFSCK checks for three kinds of inconsistencies:</para>
206                 <itemizedlist>
207                     <listitem>
208                         <para>Inode exists but has missing objects (dangling inode). This normally happens if there was a problem with an OST.</para>
209                     </listitem>
210                     <listitem>
211                         <para>Inode is missing but OST has unreferenced objects (orphan object). Normally, this happens if there was a problem with the MDS.</para>
212                     </listitem>
213                     <listitem>
214                         <para>Multiple inodes reference the same objects. This can happen if the MDS is corrupted or if the MDS storage is cached and loses some, but not all, writes.</para>
215                     </listitem>
216                 </itemizedlist>
217                 <para>If the file system is in use and being modified while the <literal>--mdsdb</literal> and <literal>--ostdb</literal> steps are running, LFSCK may report inconsistencies where none exist due to files and objects being created/removed after the database files were collected. Examine the LFSCK results closely. You may want to re-run the test.</para>
218             </listitem>
219         </orderedlist>
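        <para>The database collection and checking steps above can be summarized with a small dry-run sketch. The function below only prints the commands and runs nothing. The function name <literal>print_lfsck_plan</literal> and the <literal>/tmp/ostNdb</literal> naming scheme are assumptions made for illustration. In practice, each <literal>e2fsck</literal> command is run on the server that owns the device, and <literal>lfsck</literal> is run on a mounted client.</para>
        <screen># Hypothetical helper: print (do not run) the offline LFSCK commands.
# Arguments: MDS device, client mount point, then one or more OST devices.
# Database paths under /tmp are placeholders chosen for this sketch.
print_lfsck_plan() {
    mdsdev="$1"
    mountpoint="$2"
    shift 2
    # Step 3: build the MDS database in read-only (-n) mode.
    echo "e2fsck -n -v --mdsdb /tmp/mdsdb $mdsdev"
    ostdbs=""
    i=0
    for ostdev in "$@"; do
        # Step 5: build one OST database per OST device.
        echo "e2fsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ost${i}db $ostdev"
        ostdbs="$ostdbs /tmp/ost${i}db"
        i=$((i + 1))
    done
    # Step 6: read-only LFSCK run on a client with all databases available.
    echo "lfsck -n -v --mdsdb /tmp/mdsdb --ostdb$ostdbs $mountpoint"
}</screen>
        <para>Calling <literal>print_lfsck_plan /dev/sdb /mnt/lustre/client /dev/sda /dev/sdc</literal> prints one MDS command, two OST commands, and a final <literal>lfsck</literal> command that lists both OST databases.</para>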
220         <section xml:id="dbdoclet.50438225_13916">
221             <title><indexterm><primary>recovery</primary><secondary>orphaned objects</secondary></indexterm>Working with Orphaned Objects</title>
222             <para>The easiest problem to resolve is that of orphaned objects. When the <literal>-l</literal> option for LFSCK is used, these objects are linked to new files and put into <literal>lost+found</literal> in the Lustre file system, where they can be examined and saved or deleted as necessary. If you are certain the objects are not useful, run LFSCK with the <literal>-d</literal> option to delete orphaned objects and free up any space they are using.</para>
223             <para>To fix dangling inodes, use LFSCK with the <literal>-c</literal> option to create new, zero-length objects on the OSTs. These files read back with binary zeros for stripes that had objects re-created. Even without LFSCK repair, these files can be read by entering:</para>
224             <screen>dd if=/lustre/bad/file of=/new/file bs=4k conv=sync,noerror</screen>
225             <para>Because it is rarely useful to have files with large holes in them, most users delete these files after reading them (if useful) and/or restoring them from backup.</para>
226             <note>
227                 <para>You cannot write to the holes of such files without having LFSCK re-create the objects. Generally, it is easier to delete these files and restore them from backup.</para>
228             </note>
229             <para>To fix inodes with duplicate objects, use LFSCK with the <literal>-c</literal> option to copy the duplicate object to a new object and assign it to a file. One file will be okay and the duplicate will likely contain garbage. By itself, LFSCK cannot tell which file is the usable one.</para>
230         </section>
231     </section>
232     <section xml:id="dbdoclet.50438225_12316">
233         <title><indexterm><primary>recovery</primary><secondary>unavailable OST</secondary></indexterm>Recovering from an Unavailable OST</title>
234         <para>One of the most common problems encountered in a Lustre file system environment is
235             when an OST becomes unavailable due to a network partition, OSS node crash, etc. When
236             this happens, the OST&apos;s clients pause and wait for the OST to become available
237             again, either on the primary OSS or a failover OSS. When the OST comes back online, the
238             Lustre file system starts a recovery process to enable clients to reconnect to the OST.
239             Lustre servers put a limit on the time they will wait in recovery for clients to
240             reconnect. The timeout length is determined by the <literal>obd_timeout</literal>
241             parameter.</para>
242         <para>During recovery, clients reconnect and replay their requests serially, in the same order they were done originally. Until a client receives a confirmation that a given transaction has been written to stable storage, the client holds on to the transaction, in case it needs to be replayed. Periodically, a progress message prints to the log, stating how_many/expected clients have reconnected. If the recovery is aborted, this log shows how many clients managed to reconnect. When all clients have completed recovery, or if the recovery timeout is reached, the recovery period ends and the OST resumes normal request processing.</para>
243         <para>If some clients fail to replay their requests during the recovery period, this will not stop the recovery from completing. You may have a situation where the OST recovers, but some clients are not able to participate in recovery (e.g. network problems or client failure), so they are evicted and their requests are not replayed. This would result in any operations on the evicted clients failing, including in-progress writes, which would cause cached writes to be lost. This is a normal outcome; the recovery cannot wait indefinitely, or the file system would be hung any time a client failed. The lost transactions are an unfortunate result of the recovery process.</para>
244         <note>
245             <para>The version-based recovery (VBR) feature enables a failed client to be &quot;skipped&quot;, so remaining clients can replay their requests, resulting in a more successful recovery from a downed OST. For more information about the VBR feature, see <xref linkend="lustrerecovery"/> (Version-based Recovery).</para>
246         </note>
247     </section>
248     <section xml:id="dbdoclet.lfsckadmin" condition='l23'>
249         <title><indexterm><primary>recovery</primary><secondary>oiscrub</secondary></indexterm><indexterm><primary>recovery</primary><secondary>lfsck</secondary></indexterm>Checking the file system with LFSCK</title>
250         <para>LFSCK is an administrative tool introduced in Lustre software release 2.3 for checking
251             and repairing the attributes specific to a mounted Lustre file system. It is similar in
252             concept to the offline LFSCK Lustre repair tool that is included with the Lustre
253                 <literal>e2fsprogs</literal> package (see <xref linkend="dbdoclet.50438225_37365"
254             />), but LFSCK is implemented to run as part of the Lustre file system while the file
255             system is mounted and in use. This allows consistency checking and repair by the
256             Lustre software without unnecessary downtime, and LFSCK can be run on the largest
257             Lustre file systems.</para>
258         <para>In Lustre software release 2.3, LFSCK can verify and repair the Object Index (OI)
259             table that is used internally to map Lustre File Identifiers (FIDs) to MDT internal
260             inode numbers, through a process called OI Scrub. An OI Scrub is required after
261             restoring from a file-level MDT backup (<xref linkend="dbdoclet.50438207_71633"/>), or
262             in case the OI table is otherwise corrupted. Later phases of LFSCK will add further
263             checks to the Lustre distributed file system state.</para>
264         <para condition='l24'>In Lustre software release 2.4, LFSCK can verify and repair FID-in-Dirent and LinkEA consistency.
265 </para>
266         <para condition='l26'>In Lustre software release 2.6, LFSCK can verify and repair MDT-OST file layout inconsistency. File layout inconsistencies between MDT-objects and OST-objects that are checked and corrected include dangling references, unreferenced OST-objects, mismatched references and multiple references.
267 </para>
268         <para>Control and monitoring of LFSCK is through LFSCK and the <literal>/proc</literal> file system
269             interfaces. LFSCK supports three types of interface: switch interfaces, status
270             interfaces and adjustment interfaces. These interfaces are detailed below.</para>
271     <section>
272         <title>LFSCK switch interface</title>
273         <section>
274             <title>Manually Starting LFSCK</title>
275             <section>
276                 <title>Synopsis</title>
277                 <screen>lctl lfsck_start -M | --device <replaceable>[MDT,OST]_device</replaceable> \
278                     [-e | --error <replaceable>error_handle</replaceable>] \
279                     [-h | --help] \
280                     [-n | --dryrun <replaceable>switch</replaceable>] \
281                     [-r | --reset] \
282                     [-s | --speed <replaceable>speed_limit</replaceable>] \
283                     [-A | --all] \
284                     [-t | --type <replaceable>lfsck_type[,lfsck_type...]</replaceable>] \
285                     [-w | --windows <replaceable>win_size</replaceable>] \
286                     [-o | --orphan]
287                 </screen>
288             </section>
289             <section>
290                 <title>Description</title>
291                 <para>This command is used to start LFSCK after the MDT is mounted.</para>
292             </section>
293             <section>
294                 <title>Options</title>
295                 <para>The various <literal>lfsck_start</literal> options are listed and described below. For a complete list of available options, type <literal>lctl lfsck_start -h</literal>.</para>
296                 <informaltable frame="all">
297                     <tgroup cols="2">
298                         <colspec colname="c1" colwidth="3*"/>
299                         <colspec colname="c2" colwidth="7*"/>
300                         <thead>
301                             <row>
302                                 <entry>
303                                     <para><emphasis role="bold">Option</emphasis></para>
304                                 </entry>
305                                 <entry>
306                                     <para><emphasis role="bold">Description</emphasis></para>
307                                 </entry>
308                             </row>
309                         </thead>
310                         <tbody>
311                             <row>
312                                 <entry>
313                                     <para><literal>-M | --device</literal> </para>
314                                 </entry>
315                                 <entry>
316                                     <para>The MDT or OST device to start LFSCK/scrub on.</para>
317                                 </entry>
318                             </row>
319                             <row>
320                                 <entry>
321                                     <para><literal>-e | --error</literal> </para>
322                                 </entry>
323                                 <entry>
324                                     <para>Error handling behavior: <literal>continue</literal> (default) or <literal>abort</literal>. Specifies whether LFSCK stops if it fails to repair something. If it is not specified, the saved value (when resuming from checkpoint) will be used if present. This option cannot be changed while LFSCK is running.</para>
325                                 </entry>
326                             </row>
327                             <row>
328                                 <entry>
329                                     <para><literal>-h | --help</literal> </para>
330                                 </entry>
331                                 <entry>
332                                     <para>Display help information.</para>
333                                 </entry>
334                             </row>
335                             <row>
336                                 <entry>
337                                     <para><literal>-n | --dryrun</literal> </para>
338                                 </entry>
339                                 <entry>
340                                     <para>Perform a trial without making any changes. <literal>off</literal> (default) or <literal>on</literal>.</para>
341                                 </entry>
342                             </row>
343                             <row>
344                                 <entry>
345                                     <para><literal>-r | --reset</literal> </para>
346                                 </entry>
347                                 <entry>
348                                     <para>Reset the start position for the object iteration to the beginning for the specified MDT. By default the iterator will resume scanning from the last checkpoint (saved periodically by LFSCK) provided it is available.</para>
349                                 </entry>
350                             </row>
351                             <row>
352                                 <entry>
353                                     <para><literal>-s | --speed</literal> </para>
354                                 </entry>
355                                 <entry>
356                                     <para>Set the upper speed limit of LFSCK processing in objects per second. If it is not specified, the saved value (when resuming from checkpoint) or default value of 0 (0 = run as fast as possible) is used. Speed can be adjusted while LFSCK is running with the adjustment interface.</para>
357                                 </entry>
358                             </row>
359                             <row>
360                                 <entry>
361                                     <para><literal>-A | --all</literal> </para>
362                                 </entry>
363                                 <entry>
364                                     <para condition='l26'>Start LFSCK on all devices via a single lctl command. It is not only used for layout consistency check/repair, but also for other LFSCK components, such as LFSCK for namespace consistency (LFSCK 1.5) and for DNE consistency check/repair in the future.</para>
365                                 </entry>
366                             </row>
367                             <row>
368                                 <entry>
369                                     <para><literal>-t | --type</literal> </para>
370                                 </entry>
371                                 <entry>
372                                     <para>The type of checking/repairing that should be performed. The new LFSCK framework provides a single interface for a variety of system consistency checking/repairing operations including:</para>
373 <para condition='l24'><literal>namespace</literal>: check and repair FID-in-Dirent and LinkEA consistency.</para>
374 <para condition='l26'><literal>layout</literal>: check and repair MDT-OST inconsistency.</para>
375 <para>If no type is specified, LFSCK starts the component(s) that ran last time and did not finish, or the component(s) corresponding to a known system inconsistency. Whenever LFSCK is triggered, OI Scrub runs automatically, so there is no need to specify OI_scrub.</para>
376                                 </entry>
377                             </row>
378                             <row>
379                                 <entry>
380                                     <para><literal>-w | --windows</literal> </para>
381                                 </entry>
382                                 <entry>
383                                     <para condition='l26'>The window size for the asynchronous request pipeline.</para>
384                                 </entry>
385                             </row>
386                             <row>
387                                 <entry>
388                                     <para><literal>-o | --orphan</literal> </para>
389                                 </entry>
390                                 <entry>
391                                     <para condition='l26'>Handle orphan objects, such as orphan OST-objects for layout LFSCK.</para>
392                                 </entry>
393                             </row>
394                         </tbody>
395                     </tgroup>
396                 </informaltable>
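                <para>As an illustration of how the options above combine, the following sketch assembles an <literal>lctl lfsck_start</literal> command line from a device name, an optional LFSCK type, and an optional speed limit. The function name <literal>lfsck_start_cmd</literal> and the device name <literal>lustre-MDT0000</literal> are assumptions made for illustration. The function only prints the command, so it can be reviewed before it is run.</para>
                <screen># Hypothetical helper: print an lctl lfsck_start command line.
# Arguments: device, optional type (-t), optional speed limit (-s).
# Only options listed in the table above are used.
lfsck_start_cmd() {
    device="$1"
    type="${2:-}"
    speed="${3:-}"
    cmd="lctl lfsck_start -M $device"
    if [ -n "$type" ]; then
        cmd="$cmd -t $type"
    fi
    if [ -n "$speed" ]; then
        cmd="$cmd -s $speed"
    fi
    echo "$cmd"
}</screen>
                <para>For example, <literal>lfsck_start_cmd lustre-MDT0000 namespace 1000</literal> prints <literal>lctl lfsck_start -M lustre-MDT0000 -t namespace -s 1000</literal>, which requests a namespace check limited to 1000 objects per second.</para>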
397             </section>
398         </section>
399         <section>
400             <title>Manually Stopping LFSCK</title>
401             <section>
402                 <title>Synopsis</title>
403                 <screen>lctl lfsck_stop -M | --device <replaceable>[MDT,OST]_device</replaceable> \
404                     [-A | --all] \
405                     [-h | --help]
406                 </screen>
407             </section>
408             <section>
409                 <title>Description</title>
410                 <para>This command is used to stop LFSCK after the MDT is mounted.</para>
411             </section>
412             <section>
413                 <title>Options</title>
414                 <para>The various <literal>lfsck_stop</literal> options are listed and described below. For a complete list of available options, type <literal>lctl lfsck_stop -h</literal>.</para>
415                 <informaltable frame="all">
416                     <tgroup cols="2">
417                         <colspec colname="c1" colwidth="3*"/>
418                         <colspec colname="c2" colwidth="7*"/>
419                         <thead>
420                             <row>
421                                 <entry>
422                                     <para><emphasis role="bold">Option</emphasis></para>
423                                 </entry>
424                                 <entry>
425                                     <para><emphasis role="bold">Description</emphasis></para>
426                                 </entry>
427                             </row>
428                         </thead>
429                         <tbody>
430                             <row>
431                                 <entry>
432                                     <para><literal>-M | --device</literal> </para>
433                                 </entry>
434                                 <entry>
435                                     <para>The MDT or OST device to stop LFSCK/scrub on.</para>
436                                 </entry>
437                             </row>
438                             <row>
439                                 <entry>
440                                     <para><literal>-A | --all</literal> </para>
441                                 </entry>
442                                 <entry>
443                                     <para>Stop LFSCK on all devices.</para>
444                                 </entry>
445                             </row>
446                             <row>
447                                 <entry>
448                                     <para><literal>-h | --help</literal> </para>
449                                 </entry>
450                                 <entry>
451                                     <para>Display help information.</para>
452                                 </entry>
453                             </row>
454                         </tbody>
455                     </tgroup>
456                 </informaltable>
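                <para>As an illustration only, assuming a file system named <literal>testfs</literal> with an MDT device <literal>testfs-MDT0000</literal> (both names are hypothetical), the options above might be combined as follows. The first command stops LFSCK on a single MDT; the second also passes <literal>-A</literal> to stop it on all devices.</para>
                <screen>lctl lfsck_stop -M testfs-MDT0000
lctl lfsck_stop -M testfs-MDT0000 -A</screen>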
457             </section>
458         </section>
459     </section>
460     <section>
461         <title>LFSCK status interface</title>
462         <section>
463             <title>LFSCK status of OI Scrub via <literal>procfs</literal></title>
464             <section >
465                 <title>Synopsis</title>
466                 <screen>lctl get_param -n osd-ldiskfs.<replaceable>FSNAME</replaceable>-<replaceable>MDT_device</replaceable>.oi_scrub
467                 </screen>
468             </section>
469             <section>
470                 <title>Description</title>
471                 <para>Each LFSCK component has a dedicated procfs interface for tracing its status. For OI Scrub, this is the OSD layer procfs interface named <literal>oi_scrub</literal>. To display OI Scrub status, use the standard <literal>lctl get_param</literal> command as shown in the synopsis.</para>
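                <para>For example, on a hypothetical file system <literal>testfs</literal> with MDT <literal>testfs-MDT0000</literal>, the full status could be displayed with the first command below. The second command is a sketch that filters for the status line; it assumes the field is printed at the start of a line, which should be checked against the actual output.</para>
                <screen>lctl get_param -n osd-ldiskfs.testfs-MDT0000.oi_scrub
lctl get_param -n osd-ldiskfs.testfs-MDT0000.oi_scrub | grep -i '^status'</screen>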
472             </section>
473             <section>
474                 <title>Output</title>
475                 <informaltable frame="all">
476                     <tgroup cols="2">
477                         <colspec colname="c1" colwidth="3*"/>
478                         <colspec colname="c2" colwidth="7*"/>
479                         <thead>
480                             <row>
481                                 <entry>
482                                     <para><emphasis role="bold">Information</emphasis></para>
483                                 </entry>
484                                 <entry>
485                                     <para><emphasis role="bold">Detail</emphasis></para>
486                                 </entry>
487                             </row>
488                         </thead>
489                         <tbody>
490                             <row>
491                                 <entry>
492                                     <para>General Information</para>
493                                 </entry>
494                                 <entry>
495                                     <itemizedlist>
496                                         <listitem><para>Name: OI_scrub.</para></listitem>
497                                         <listitem><para>OI scrub magic id (an identifier unique to OI scrub).</para></listitem>
498                                         <listitem><para>OI files count.</para></listitem>
499                                         <listitem><para>Status: one of <literal>init</literal>, <literal>scanning</literal>, <literal>completed</literal>, <literal>failed</literal>, <literal>stopped</literal>, <literal>paused</literal>, or <literal>crashed</literal>.</para></listitem>
500                                         <listitem><para>Flags: including <literal>recreated</literal> (one or more OI files were removed or recreated),
501                                                   <literal>inconsistent</literal> (restored from
502                                                   file-level backup), <literal>auto</literal>
503                                                   (triggered by non-UI mechanism), and
504                                                   <literal>upgrade</literal> (from Lustre software
505                                                   release 1.8 IGIF format).</para></listitem>
506                                         <listitem><para>Parameters: OI scrub parameters, like <literal>failout</literal>.</para></listitem>
507                                         <listitem><para>Time Since Last Completed.</para></listitem>
508                                         <listitem><para>Time Since Latest Start.</para></listitem>
509                                         <listitem><para>Time Since Last Checkpoint.</para></listitem>
510                                         <listitem><para>Latest Start Position: the position from which the latest scrub started.</para></listitem>
511                                         <listitem><para>Last Checkpoint Position.</para></listitem>
512                                         <listitem><para>First Failure Position: the position of the first object to be repaired.</para></listitem>
513                                         <listitem><para>Current Position.</para></listitem>
514                                     </itemizedlist>
515                                 </entry>
516                             </row>
517                             <row>
518                                 <entry>
519                                     <para>Statistics</para>
520                                 </entry>
521                                 <entry>
522                                     <itemizedlist>
523                                         <listitem><para><literal>Checked</literal> total number of objects scanned.</para></listitem>
524                                         <listitem><para><literal>Updated</literal> total number of objects repaired.</para></listitem>
525                                         <listitem><para><literal>Failed</literal> total number of objects that failed to be repaired.</para></listitem>
526                                         <listitem><para><literal>No Scrub</literal> total number of objects marked <literal>LDISKFS_STATE_LUSTRE_NOSCRUB</literal> and skipped.</para></listitem>
527                                         <listitem><para><literal>IGIF</literal> total number of IGIF objects scanned.</para></listitem>
528                                         <listitem><para><literal>Prior Updated</literal> total number of objects whose repair was triggered by a parallel RPC.</para></listitem>
529                                         <listitem><para><literal>Success Count</literal> total number of completed OI_scrub runs on the device.</para></listitem>
530                                         <listitem><para><literal>Run Time</literal> how long the scrub has run, measured from the start of scanning at the beginning of the specified MDT device and excluding paused or failed time between checkpoints.</para></listitem>
531                                         <listitem><para><literal>Average Speed</literal> calculated by dividing <literal>Checked</literal> by <literal>run_time</literal>.</para></listitem>
532                                         <listitem><para><literal>Real-Time Speed</literal> the speed since the last checkpoint if the OI_scrub is running.</para></listitem>
533                                         <listitem><para><literal>Scanned</literal> total number of objects under /lost+found that have been scanned.</para></listitem>
534                                         <listitem><para><literal>Repaired</literal> total number of objects under /lost+found that have been recovered.</para></listitem>
535                                         <listitem><para><literal>Failed</literal> total number of objects under /lost+found that failed to be scanned or recovered.</para></listitem>
536                                     </itemizedlist>
537                                 </entry>
538                             </row>
539                         </tbody>
540                     </tgroup>
541                 </informaltable>
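                <para>As a worked example of the <literal>Average Speed</literal> calculation, assume hypothetical values of 1200000 for <literal>Checked</literal> and 600 for <literal>run_time</literal>, with the run time taken to be in seconds. Dividing the two gives 2000 objects per second, which can be reproduced with shell arithmetic:</para>
                <screen>echo $((1200000 / 600))</screen>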
542             </section>
543         </section>
544         <section condition='l24'>
545             <title>LFSCK status of namespace via <literal>procfs</literal></title>
546             <section >
547                 <title>Synopsis</title>
548                 <screen>lctl get_param -n mdd.<replaceable>FSNAME</replaceable>-<replaceable>MDT_device</replaceable>.lfsck_namespace
549                 </screen>
550             </section>
551             <section>
552                 <title>Description</title>
553                 <para>The <literal>namespace</literal> component is responsible for checking and repairing FID-in-Dirent and LinkEA consistency. The <literal>procfs</literal> interface for this component is in the MDD layer, named <literal>lfsck_namespace</literal>. To show the status of this component, use <literal>lctl get_param</literal> as shown in the synopsis.</para>
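                <para>As a sketch, assuming a hypothetical file system named <literal>testfs</literal>, a wildcard in the parameter name could be used to show the namespace status of every MDT on the local server at once. Omitting <literal>-n</literal> keeps the parameter name in the output so that each MDT can be told apart, and the quotes keep the shell from expanding the wildcard.</para>
                <screen>lctl get_param 'mdd.testfs-MDT*.lfsck_namespace'</screen>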
554             </section>
555             <section>
556                 <title>Output</title>
557                 <informaltable frame="all">
558                     <tgroup cols="2">
559                         <colspec colname="c1" colwidth="3*"/>
560                         <colspec colname="c2" colwidth="7*"/>
561                         <thead>
562                             <row>
563                                 <entry>
564                                     <para><emphasis role="bold">Information</emphasis></para>
565                                 </entry>
566                                 <entry>
567                                     <para><emphasis role="bold">Detail</emphasis></para>
568                                 </entry>
569                             </row>
570                         </thead>
571                         <tbody>
572                             <row>
573                                 <entry>
574                                     <para>General Information</para>
575                                 </entry>
576                                 <entry>
577                                     <itemizedlist>
578                                         <listitem><para>Name: <literal>lfsck_namespace</literal></para></listitem>
579                                         <listitem><para>LFSCK namespace magic.</para></listitem>
580                                         <listitem><para>LFSCK namespace version.</para></listitem>
581                                         <listitem><para>Status: one of <literal>init</literal>, <literal>scanning-phase1</literal>, <literal>scanning-phase2</literal>, <literal>completed</literal>, <literal>failed</literal>, <literal>stopped</literal>, <literal>paused</literal>, or <literal>crashed</literal>.</para></listitem>
582                                         <listitem><para>Flags: including <literal>scanned-once</literal> (the first scanning cycle has been
583                                                   completed), <literal>inconsistent</literal> (one
584                                                   or more inconsistent FID-in-Dirent or LinkEA
585                                                   entries have been discovered), and
586                                                   <literal>upgrade</literal> (from Lustre software
587                                                   release 1.8 IGIF format).</para></listitem>
588                                         <listitem><para>Parameters: including <literal>dryrun</literal>, <literal>all_targets</literal> and <literal>failout</literal>.</para></listitem>
589                                         <listitem><para>Time Since Last Completed.</para></listitem>
590                                         <listitem><para>Time Since Latest Start.</para></listitem>
591                                         <listitem><para>Time Since Last Checkpoint.</para></listitem>
592                                         <listitem><para>Latest Start Position: the position from which checking most recently began.</para></listitem>
593                                         <listitem><para>Last Checkpoint Position.</para></listitem>
594                                         <listitem><para>First Failure Position: the position of the first object to be repaired.</para></listitem>
595                                         <listitem><para>Current Position.</para></listitem>
596                                     </itemizedlist>
597                                 </entry>
598                             </row>
599                             <row>
600                                 <entry>
601                                     <para>Statistics</para>
602                                 </entry>
603                                 <entry>
604                                     <itemizedlist>
605                                         <listitem><para><literal>Checked Phase1</literal> total number of objects scanned during <literal>scanning-phase1</literal>.</para></listitem>
606                                         <listitem><para><literal>Checked Phase2</literal> total number of objects scanned during <literal>scanning-phase2</literal>.</para></listitem>
607                                         <listitem><para><literal>Updated Phase1</literal> total number of objects repaired during <literal>scanning-phase1</literal>.</para></listitem>
608                                         <listitem><para><literal>Updated Phase2</literal> total number of objects repaired during <literal>scanning-phase2</literal>.</para></listitem>
609                                         <listitem><para><literal>Failed Phase1</literal> total number of objects that failed to be repaired during <literal>scanning-phase1</literal>.</para></listitem>
610                                         <listitem><para><literal>Failed Phase2</literal> total number of objects that failed to be repaired during <literal>scanning-phase2</literal>.</para></listitem>
611                                         <listitem><para><literal>Dirs</literal> total number of directories scanned.</para></listitem>
612                                         <listitem><para><literal>M-linked</literal> total number of multiple-linked objects that have been scanned.</para></listitem>
613                                         <listitem><para><literal>Nlinks Repaired</literal> total number of objects with nlink attributes that have been repaired.</para></listitem>
614                                         <listitem><para><literal>Lost_found</literal> total number of objects that have had a name entry added back to the namespace.</para></listitem>
615                                         <listitem><para><literal>Success Count</literal> the total number of completed LFSCK runs on the device.</para></listitem>
616                                         <listitem><para><literal>Run Time Phase1</literal> the duration of the LFSCK run during <literal>scanning-phase1</literal>, excluding the time spent paused between checkpoints.</para></listitem>
617                                         <listitem><para><literal>Run Time Phase2</literal> the duration of the LFSCK run during <literal>scanning-phase2</literal>, excluding the time spent paused between checkpoints.</para></listitem>
618                                         <listitem><para><literal>Average Speed Phase1</literal> calculated by dividing <literal>checked_phase1</literal> by <literal>run_time_phase1</literal>.</para></listitem>
619                                         <listitem><para><literal>Average Speed Phase2</literal> calculated by dividing <literal>checked_phase2</literal> by <literal>run_time_phase2</literal>.</para></listitem>
620                                         <listitem><para><literal>Real-Time Speed Phase1</literal> the speed since the last checkpoint if the LFSCK is running <literal>scanning-phase1</literal>.</para></listitem>
621                                         <listitem><para><literal>Real-Time Speed Phase2</literal> the speed since the last checkpoint if the LFSCK is running <literal>scanning-phase2</literal>.</para></listitem>
622                                     </itemizedlist>
623                                 </entry>
624                             </row>
625                         </tbody>
626                     </tgroup>
627                 </informaltable>
628             </section>
629         </section>
630         <section condition='l26'>
631             <title>LFSCK status of layout via <literal>procfs</literal></title>
632             <section >
633                 <title>Synopsis</title>
634                 <screen>lctl get_param -n mdd.<replaceable>FSNAME</replaceable>-<replaceable>MDT_device</replaceable>.lfsck_layout
635 lctl get_param -n obdfilter.<replaceable>FSNAME</replaceable>-<replaceable>OST_device</replaceable>.lfsck_layout
636                 </screen>
637             </section>
638             <section>
639                 <title>Description</title>
640                 <para>The <literal>layout</literal> component is responsible for checking and repairing MDT-OST inconsistency. The <literal>procfs</literal> interface for this component is named <literal>lfsck_layout</literal> and is available in both the MDD layer and the OBD layer. To show the status of this component, use <literal>lctl get_param</literal> as shown in the synopsis.</para>
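                <para>For illustration, assuming a hypothetical file system <literal>testfs</literal> with targets <literal>testfs-MDT0000</literal> and <literal>testfs-OST0000</literal>, the layout status might be queried as follows. Because <literal>lctl get_param</literal> reads parameters on the local node, the first command would be run on the MDS that serves the MDT and the second on the OSS that serves the OST.</para>
                <screen>lctl get_param -n mdd.testfs-MDT0000.lfsck_layout
lctl get_param -n obdfilter.testfs-OST0000.lfsck_layout</screen>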
641             </section>
642             <section>
643                 <title>Output</title>
644                 <informaltable frame="all">
645                     <tgroup cols="2">
646                         <colspec colname="c1" colwidth="3*"/>
647                         <colspec colname="c2" colwidth="7*"/>
648                         <thead>
649                             <row>
650                                 <entry>
651                                     <para><emphasis role="bold">Information</emphasis></para>
652                                 </entry>
653                                 <entry>
654                                     <para><emphasis role="bold">Detail</emphasis></para>
655                                 </entry>
656                             </row>
657                         </thead>
658                         <tbody>
659                             <row>
660                                 <entry>
661                                     <para>General Information</para>
662                                 </entry>
663                                 <entry>
664                                     <itemizedlist>
665                                         <listitem><para>Name: <literal>lfsck_layout</literal></para></listitem>
666                                         <listitem><para>LFSCK layout magic.</para></listitem>
667                                         <listitem><para>LFSCK layout version.</para></listitem>
668                                         <listitem><para>Status: one of <literal>init</literal>, <literal>scanning-phase1</literal>, <literal>scanning-phase2</literal>, <literal>completed</literal>, <literal>failed</literal>, <literal>stopped</literal>, <literal>paused</literal>, <literal>crashed</literal>, <literal>partial</literal>, <literal>co-failed</literal>, <literal>co-stopped</literal>, or <literal>co-paused</literal>.</para></listitem>
669                                         <listitem><para>Flags: including <literal>scanned-once</literal> (the first scanning cycle has been
670                                                   completed), <literal>inconsistent</literal> (one
671                                                   or more MDT-OST inconsistencies
672                                                   have been discovered),
673                                                   <literal>incomplete</literal> (some MDT or OST did not participate in the LFSCK or failed to finish the LFSCK), and <literal>crashed_lastid</literal> (the lastid files on the OST crashed and need to be rebuilt).</para></listitem>
674                                         <listitem><para>Parameters: including <literal>dryrun</literal>, <literal>all_targets</literal> and <literal>failout</literal>.</para></listitem>
675                                         <listitem><para>Time Since Last Completed.</para></listitem>
676                                         <listitem><para>Time Since Latest Start.</para></listitem>
677                                         <listitem><para>Time Since Last Checkpoint.</para></listitem>
678                                         <listitem><para>Latest Start Position: the position from which checking most recently began.</para></listitem>
679                                         <listitem><para>Last Checkpoint Position.</para></listitem>
680                                         <listitem><para>First Failure Position: the position of the first object to be repaired.</para></listitem>
681                                         <listitem><para>Current Position.</para></listitem>
682                                     </itemizedlist>
683                                 </entry>
684                             </row>
685                             <row>
686                                 <entry>
687                                     <para>Statistics</para>
688                                 </entry>
689                                 <entry>
690                                     <itemizedlist>
691                                         <listitem><para><literal>Success Count</literal> the total number of completed LFSCK runs on the device.</para></listitem>
692                                         <listitem><para><literal>Repaired Dangling</literal> total number of MDT-objects with dangling references that have been repaired during <literal>scanning-phase1</literal>.</para></listitem>
693                                         <listitem><para><literal>Repaired Unmatched Pairs</literal> total number of unmatched MDT and OST-object pairs that have been repaired during <literal>scanning-phase1</literal>.</para></listitem>
694                                         <listitem><para><literal>Repaired Multiple Referenced</literal> total number of OST-objects with multiple references that have been repaired during <literal>scanning-phase1</literal>.</para></listitem>
695                                         <listitem><para><literal>Repaired Orphan</literal> total number of orphan OST-objects that have been repaired during <literal>scanning-phase2</literal>.</para></listitem>
696                                         <listitem><para><literal>Repaired Inconsistent Owner</literal> total number of OST-objects with incorrect owner information that have been repaired during <literal>scanning-phase1</literal>.</para></listitem>
697                                         <listitem><para><literal>Repaired Others</literal> total number of other inconsistencies repaired during the scanning phases.</para></listitem>
698                                         <listitem><para><literal>Skipped</literal> total number of skipped objects.</para></listitem>
699                                         <listitem><para><literal>Failed Phase1</literal> total number of objects that failed to be repaired during <literal>scanning-phase1</literal>.</para></listitem>
700                                         <listitem><para><literal>Failed Phase2</literal> total number of objects that failed to be repaired during <literal>scanning-phase2</literal>.</para></listitem>
701                                         <listitem><para><literal>Checked Phase1</literal> total number of objects scanned during <literal>scanning-phase1</literal>.</para></listitem>
702                                         <listitem><para><literal>Checked Phase2</literal> total number of objects scanned during <literal>scanning-phase2</literal>.</para></listitem>
703                                         <listitem><para><literal>Run Time Phase1</literal> the duration of the LFSCK run during <literal>scanning-phase1</literal>, excluding the time spent paused between checkpoints.</para></listitem>
704                                         <listitem><para><literal>Run Time Phase2</literal> the duration of the LFSCK run during <literal>scanning-phase2</literal>, excluding the time spent paused between checkpoints.</para></listitem>
705                                         <listitem><para><literal>Average Speed Phase1</literal> calculated by dividing <literal>checked_phase1</literal> by <literal>run_time_phase1</literal>.</para></listitem>
706                                         <listitem><para><literal>Average Speed Phase2</literal> calculated by dividing <literal>checked_phase2</literal> by <literal>run_time_phase2</literal>.</para></listitem>
707                                         <listitem><para><literal>Real-Time Speed Phase1</literal> the speed since the last checkpoint if the LFSCK is running <literal>scanning-phase1</literal>.</para></listitem>
708                                         <listitem><para><literal>Real-Time Speed Phase2</literal> the speed since the last checkpoint if the LFSCK is running <literal>scanning-phase2</literal>.</para></listitem>
709                                     </itemizedlist>
710                                 </entry>
711                             </row>
712                         </tbody>
713                     </tgroup>
714                 </informaltable>
715             </section>
716         </section>
717     </section>
718     <section>
719         <title>LFSCK adjustment interface</title>
720         <section condition='l26'>
721             <title>Rate control</title>
722             <section>
723                 <title>Synopsis</title>
724                 <screen>lctl set_param mdd.${FSNAME}-${MDT_device}.lfsck_speed_limit=N
725 lctl set_param obdfilter.${FSNAME}-${OST_device}.lfsck_speed_limit=N</screen>
726             </section>
727             <section>
728                 <title>Description</title>
729                 <para>Change the LFSCK upper speed limit.</para>
730             </section>
731             <section>
732                 <title>Values</title>
733                 <informaltable frame="all">
734                     <tgroup cols="2">
735                         <colspec colname="c1" colwidth="3*"/>
736                         <colspec colname="c2" colwidth="7*"/>
737                         <tbody>
738                             <row>
739                                 <entry>
740                                     <para>0</para>
741                                 </entry>
742                                 <entry>
743                                     <para>No speed limit (run at maximum speed).</para>
744                                 </entry>
745                             </row>
746                             <row>
747                                 <entry>
748                                     <para>positive integer</para>
749                                 </entry>
750                                 <entry>
751                                     <para>Maximum number of objects to scan per second.</para>
752                                 </entry>
753                             </row>
754                         </tbody>
755                     </tgroup>
756                 </informaltable>
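                <para>As an example under assumed names (a hypothetical file system <literal>testfs</literal> with MDT <literal>testfs-MDT0000</literal>), the first command below would cap LFSCK on that MDT at 1000 objects per second, and the second would remove the limit again. The value 1000 is arbitrary and chosen only for illustration.</para>
                <screen>lctl set_param mdd.testfs-MDT0000.lfsck_speed_limit=1000
lctl set_param mdd.testfs-MDT0000.lfsck_speed_limit=0</screen>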
757             </section>
758         </section>
759         <section>
760             <title>Mount options</title>
761             <section>
762                 <title>Synopsis</title>
763                 <screen>lctl set_param osd-ldiskfs.${FSNAME}-${MDT_device}.auto_scrub=N
764                 </screen>
765             </section>
766             <section>
767                 <title>Description</title>
768                 <para>Typically, the MDT will detect restoration from a file-level backup during mount. For convenience, a mount option <literal>noscrub</literal> is provided for MDTs. <literal>noscrub</literal> prevents OI Scrub from starting automatically when the MDT is mounted. The administrator can then start LFSCK manually with <literal>lctl</literal> after the MDT is mounted, which provides finer control over the starting conditions.</para>
769             </section>
770             <section>
771                 <title>Values</title>
772                 <informaltable frame="all">
773                     <tgroup cols="2">
774                         <colspec colname="c1" colwidth="3*"/>
775                         <colspec colname="c2" colwidth="7*"/>
776                         <tbody>
777                             <row>
778                                 <entry>
779                                     <para>0</para>
780                                 </entry>
781                                 <entry>
782                                     <para>Do not start OI Scrub automatically.</para>
783                                 </entry>
784                             </row>
785                             <row>
786                                 <entry>
787                                     <para>positive integer</para>
788                                 </entry>
789                                 <entry>
790                                     <para>Automatically start OI Scrub if needed.</para>
791                                 </entry>
792                             </row>
793                         </tbody>
794                     </tgroup>
795                 </informaltable>
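                <para>A minimal sketch, assuming a hypothetical MDT named <literal>testfs-MDT0000</literal> and using the <literal>osd-ldiskfs</literal> spelling that appears in the OI Scrub status synopsis: the first command sets <literal>auto_scrub</literal> to 0, and the second reads the parameter back to confirm the change.</para>
                <screen>lctl set_param osd-ldiskfs.testfs-MDT0000.auto_scrub=0
lctl get_param osd-ldiskfs.testfs-MDT0000.auto_scrub</screen>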
796             </section>
797         </section>
798     </section>
799     </section>
800 </chapter>