From 689465077486fa6fcc1865cf923e8650fef75f65 Mon Sep 17 00:00:00 2001 From: Andreas Dilger Date: Thu, 20 Mar 2014 18:57:08 -0600 Subject: [PATCH] LUDOC-232 lfsck: remove mention of old lfsck The old e2fsprogs lfsck has been completely replaced by the new in-kernel online LFSCK. Since the old lfsck is uncommonly used, and is no longer usable on new filesystems, it is removed from the manual completely instead of keeping both in the manual. Signed-off-by: Andreas Dilger Change-Id: I828be37acbc7972aa8798b9852598737d897a911 Reviewed-on: http://review.whamcloud.com/9741 Tested-by: Jenkins Reviewed-by: James Nunez Reviewed-by: Fan Yong Reviewed-by: Richard Henwood --- BackupAndRestore.xml | 2 +- TroubleShootingRecovery.xml | 181 ++++---------------------------------------- UnderstandingLustre.xml | 4 +- UserUtilities.xml | 120 ----------------------------- 4 files changed, 19 insertions(+), 288 deletions(-) diff --git a/BackupAndRestore.xml b/BackupAndRestore.xml index eb30d77..b41c8db 100644 --- a/BackupAndRestore.xml +++ b/BackupAndRestore.xml @@ -353,7 +353,7 @@ trusted.fid= \ [oss]# umount /mnt/ost - If the file system was used between the time the backup was made and when it was restored, then the lfsck tool (part of Lustre e2fsprogs) can optionally be run to ensure the file system is coherent. If all of the device file systems were backed up at the same time after the entire Lustre file system was stopped, this is not necessary. In either case, the file system should be immediately usable even if lfsck is not run, though there may be I/O errors reading from files that are present on the MDT but not the OSTs, and files that were created after the MDT backup will not be accessible/visible. + If the file system was used between the time the backup was made and when it was restored, then the online LFSCK tool (part of Lustre code) will automatically be run to ensure the file system is coherent. 
If all of the device file systems were backed up at the same time after the entire Lustre file system was stopped, this is not necessary. In either case, the file system should be immediately usable even if LFSCK is not run, though there may be I/O errors reading from files that are present on the MDT but not the OSTs, and files that were created after the MDT backup will not be accessible/visible. See for details on using LFSCK.
<indexterm> diff --git a/TroubleShootingRecovery.xml b/TroubleShootingRecovery.xml index 319fcfb..db703e8 100644 --- a/TroubleShootingRecovery.xml +++ b/TroubleShootingRecovery.xml @@ -58,16 +58,10 @@ root# e2fsck -fn /dev/sda # don't fix file system, just check for corrupt : root# e2fsck -fp /dev/sda # fix errors with prudent answers (usually <literal>yes</literal>) </screen> - <para>In addition, the <literal>e2fsprogs</literal> package contains the LFSCK tool, which - does distributed coherency checking for the Lustre file system after - <literal>e2fsck</literal> has been run. Running LFSCK is NOT required in a large - majority of cases, at a small risk of having some leaked space in the file system. To - avoid a lengthy downtime, it can be run (with care) after the Lustre file system is - started.</para> </section> <section xml:id="dbdoclet.50438225_37365"> <title><indexterm><primary>recovery</primary><secondary>corruption of Lustre file system</secondary></indexterm>Recovering from Corruption in the Lustre File System - In cases where the MDS or an OST becomes corrupt, you can run a distributed check on the file system to determine what sort of problems exist. Use LFSCK to correct any defects found. + In cases where an ldiskfs MDT or OST becomes corrupt, you need to run e2fsck to correct the local filesystem consistency, then use LFSCK to run a distributed check on the file system to resolve any inconsistencies between the MDTs and OSTs. Stop the Lustre file system. @@ -76,172 +70,30 @@ root# e2fsck -fp /dev/sda # fix errors with prudent answers (usually Run e2fsck -f on the individual MDS / OST that had problems to fix any local file system damage. We recommend running e2fsck under script, to create a log of changes made to the file system in case it is needed later. After e2fsck is run, bring up the file system, if necessary, to reduce the outage window. - - Run a full e2fsck of the MDS to create a database for LFSCK. 
You must use the -n option for a mounted file system, otherwise you will corrupt the file system. - e2fsck -n -v --mdsdb /tmp/mdsdb /dev/{mdsdev} - - The mdsdb file can grow fairly large, depending on the number of files in the file system (10 GB or more for millions of files, though the actual file size is larger because the file is sparse). It is quicker to write the file to a local file system due to seeking and small writes. Depending on the number of files, this step can take several hours to complete. - Example - e2fsck -n -v --mdsdb /tmp/mdsdb /dev/sdb -e2fsck 1.42.5.wc3 (15-Sep-2012) -Warning: skipping journal recovery because doing a read-only filesystem check. -lustre-MDT0000 contains a file system with errors, check forced. -Pass 1: Checking inodes, blocks, and sizes -MDS: ost_idx 0 max_id 288 -MDS: got 8 bytes = 1 entries in lov_objids -MDS: max_files = 13 -MDS: num_osts = 1 -mds info db file written -Pass 2: Checking directory structure -Pass 3: Checking directory connectivity -Pass 4: Checking reference counts -Pass 5: Checking group summary information -Free blocks count wrong (656160, counted=656058). -Fix? no - -Free inodes count wrong (786419, counted=786036). -Fix? no - -Pass 6: Acquiring information for lfsck -MDS: max_files = 13 -MDS: num_osts = 1 -MDS: 'lustre-MDT0000_UUID' mdt idx 0: compat 0x4 rocomp 0x1 incomp 0x4 -lustre-MDT0000: ******* WARNING: Filesystem still has errors ******* -13 inodes used (0%) -2 non-contiguous inodes (15.4%) -# of inodes with ind/dind/tind blocks: 0/0/0 -130272 blocks used (16%) -0 bad blocks -1 large file -296 regular files -91 directories -0 character device files -0 block device files -0 fifos -0 links -0 symbolic links (0 fast symbolic links) -0 sockets --------- -387 files - - - - Make this file accessible on all OSTs, either by using a shared file system or copying the file to the OSTs. The pdcp command is useful here. - The pdcp command (installed with pdsh), can be used to copy files to groups of hosts. 
pdcp is available here: - http://sourceforge.net/projects/pdsh - - - Run a similar e2fsck step on the OSTs. The e2fsck --ostdb command can be run in parallel on all OSTs. - e2fsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/{ostNdb} \/dev/{ostNdev} - - The mdsdb file is read-only in this step; a single copy can be shared by all OSTs. - - If the OSTs do not have shared file system access to the MDS, a stub mdsdb file, {mdsdb}.mdshdr, is generated. This can be used instead of the full mdsdb file. - - Example: - [root@oss161 ~]# e2fsck -n -v --mdsdb /tmp/mdsdb --ostdb \ /tmp/ostdb /dev/sda -e2fsck 1.42.5.wc3 (15-Sep-2012) -Warning: skipping journal recovery because doing a read-only filesystem check. -lustre-OST0000 contains a file system with errors, check forced. -Pass 1: Checking inodes, blocks, and sizes -Pass 2: Checking directory structure -Pass 3: Checking directory connectivity -Pass 4: Checking reference counts -Pass 5: Checking group summary information -Free blocks count wrong (989015, counted=817968). -Fix? no - -Free inodes count wrong (262088, counted=261767). -Fix? no - -Pass 6: Acquiring information for lfsck -OST: 'lustre-OST0000_UUID' ost idx 0: compat 0x2 rocomp 0 incomp 0x2 -OST: num files = 321 -OST: last_id = 321 - -lustre-OST0000: ******* WARNING: Filesystem still has errors ******* - -56 inodes used (0%) -27 non-contiguous inodes (48.2%) -# of inodes with ind/dind/tind blocks: 13/0/0 -59561 blocks used (5%) -0 bad blocks -1 large file -329 regular files -39 directories -0 character device files -0 block device files -0 fifos -0 links -0 symbolic links (0 fast symbolic links) -0 sockets --------- -368 files - - - - Make the mdsdb file and all ostdb files available on a mounted client and run LFSKC to examine the file system. Optionally, correct the defects found by LFSCK. - script /root/lfsck.lustre.log - lfsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/{ost1db} /tmp/{ost2db} ... 
/lustre/mount/point\ - Example: - script /root/lfsck.lustre.log -lfsck -n -v --mdsdb /home/mdsdb --ostdb /home/{ost1db} /mnt/lustre/client/ -MDSDB: /home/mdsdb -OSTDB[0]: /home/ostdb -MOUNTPOINT: /mnt/lustre/client/ -MDS: max_id 288 OST: max_id 321 -lfsck: ost_idx 0: pass1: check for duplicate objects -lfsck: ost_idx 0: pass1 OK (287 files total) -lfsck: ost_idx 0: pass2: check for missing inode objects -lfsck: ost_idx 0: pass2 OK (287 objects) -lfsck: ost_idx 0: pass3: check for orphan objects -[0] uuid lustre-OST0000_UUID -[0] last_id 288 -[0] zero-length orphan objid 1 -lfsck: ost_idx 0: pass3 OK (321 files total) -lfsck: pass4: check for duplicate object references -lfsck: pass4 OK (no duplicates) -lfsck: fixed 0 errors - By default, LFSCK reports errors, but it does not repair any inconsistencies found. LFSCK checks for three kinds of inconsistencies: - - - Inode exists but has missing objects (dangling inode). This normally happens if there was a problem with an OST. - - - Inode is missing but OST has unreferenced objects (orphan object). Normally, this happens if there was a problem with the MDS. - - - Multiple inodes reference the same objects. This can happen if the MDS is corrupted or if the MDS storage is cached and loses some, but not all, writes. - - - If the file system is in use and being modified while the --mdsdb and --ostdb steps are running, LFSCK may report inconsistencies where none exist due to files and objects being created/removed after the database files were collected. Examine the LFSCK results closely. You may want to re-run the test. -
<indexterm><primary>recovery</primary><secondary>orphaned objects</secondary></indexterm>Working with Orphaned Objects - The easiest problem to resolve is that of orphaned objects. When the -l option for LFSCK is used, these objects are linked to new files and put into lost+found in the Lustre file system, where they can be examined and saved or deleted as necessary. If you are certain the objects are not useful, run LFSCK with the -d option to delete orphaned objects and free up any space they are using. - To fix dangling inodes, use LFSCK with the -c option to create new, zero-length objects on the OSTs. These files read back with binary zeros for stripes that had objects re-created. Even without LFSCK repair, these files can be read by entering: - dd if=/lustre/bad/file of=/new/file bs=4k conv=sync,noerror - Because it is rarely useful to have files with large holes in them, most users delete these files after reading them (if useful) and/or restoring them from backup. - - You cannot write to the holes of such files without having LFSCK re-create the objects. Generally, it is easier to delete these files and restore them from backup. - - To fix inodes with duplicate objects, use LFSCK with the -c option to copy the duplicate object to a new object and assign it to a file. One file will be okay and the duplicate will likely contain garbage. By itself, LFSCK cannot tell which file is the usable one. + The easiest problem to resolve is that of orphaned objects. When the LFSCK layout check is run, these objects are linked to new files and put into .lustre/lost+found in the Lustre file system, where they can be examined and saved or deleted as necessary.
<indexterm><primary>recovery</primary><secondary>unavailable OST</secondary></indexterm>Recovering from an Unavailable OST - One of the most common problems encountered in a Lustre file system environment is + One problem encountered in a Lustre file system environment is when an OST becomes unavailable due to a network partition, OSS node crash, etc. When this happens, the OST's clients pause and wait for the OST to become available again, either on the primary OSS or a failover OSS. When the OST comes back online, the Lustre file system starts a recovery process to enable clients to reconnect to the OST. Lustre servers put a limit on the time they will wait in recovery for clients to - reconnect. The timeout length is determined by the obd_timeout - parameter. + reconnect. During recovery, clients reconnect and replay their requests serially, in the same order they were done originally. Until a client receives a confirmation that a given transaction has been written to stable storage, the client holds on to the transaction, in case it needs to be replayed. Periodically, a progress message prints to the log, stating how_many/expected clients have reconnected. If the recovery is aborted, this log shows how many clients managed to reconnect. When all clients have completed recovery, or if the recovery timeout is reached, the recovery period ends and the OST resumes normal request processing. If some clients fail to replay their requests during the recovery period, this will not stop the recovery from completing. You may have a situation where the OST recovers, but some clients are not able to participate in recovery (e.g. network problems or client failure), so they are evicted and their requests are not replayed. This would result in any operations on the evicted clients failing, including in-progress writes, which would cause cached writes to be lost. 
This is a normal outcome; the recovery cannot wait indefinitely, or the file system would be hung any time a client failed. The lost transactions are an unfortunate result of the recovery process. + The failure of client recovery does not indicate or lead to + filesystem corruption. This is a normal event that is handled by + the MDT and OST, and should not result in any inconsistencies + between servers. + + The version-based recovery (VBR) feature enables a failed client to be ''skipped'', so remaining clients can replay their requests, resulting in a more successful recovery from a downed OST. For more information about the VBR feature, see (Version-based Recovery).
@@ -249,9 +101,8 @@ lfsck: fixed 0 errors <indexterm><primary>recovery</primary><secondary>oiscrub</secondary></indexterm><indexterm><primary>recovery</primary><secondary>lfsck</secondary></indexterm>Checking the file system with LFSCK LFSCK is an administrative tool introduced in Lustre software release 2.3 for checking and repair of the attributes specific to a mounted Lustre file system. It is similar in - concept to the offline LFSCK Lustre repair tool that is included with the Lustre - e2fsprogs package (see ), but LFSCK is implemented to run as part of the Lustre file system while the file + concept to an offline fsck repair tool for a local filesystem, + but LFSCK is implemented to run as part of the Lustre file system while the file system is mounted and in use. This allows consistency of checking and repair by the Lustre software without unnecessary downtime, and can be run on the largest Lustre file systems. @@ -261,13 +112,13 @@ lfsck: fixed 0 errors restoring from a file-level MDT backup (), or in case the OI table is otherwise corrupted. Later phases of LFSCK will add further checks to the Lustre distributed file system state. - In Lustre software release 2.4, LFSCK can verify and repairing FID-in-Dirent and LinkEA consistency. + In Lustre software release 2.4, LFSCK namespace scanning can verify and repair the directory FID-in-Dirent and LinkEA consistency. - In Lustre software release 2.6, LFSCK can verify and repair MDT-OST file layout inconsistency. File layout inconsistencies between MDT-objects and OST-objects that are checked and corrected include dangling reference, unreferenced OST-objects, mismatched references and multiple references. + In Lustre software release 2.6, LFSCK layout scanning can verify and repair MDT-OST file layout inconsistency. File layout inconsistencies between MDT-objects and OST-objects that are checked and corrected include dangling reference, unreferenced OST-objects, mismatched references and multiple references. 
Control and monitoring of LFSCK is through LFSCK and the /proc file system - interfaces. LFSCK supports three types of interface: switch interfaces, status - interfaces and adjustments interfaces. These interfaces are detailed below. + interfaces. LFSCK supports three types of interface: switch interface, status + interface and adjustment interface. These interfaces are detailed below.
LFSCK switch interface
diff --git a/UnderstandingLustre.xml b/UnderstandingLustre.xml
index 279dedc..0a2f0b0 100644
--- a/UnderstandingLustre.xml
+++ b/UnderstandingLustre.xml
@@ -328,9 +328,9 @@
 Disaster recovery tool: The Lustre file system
- provides a distributed file system check (lfsck) that can restore consistency between
+ provides an online distributed file system check (LFSCK) that can restore consistency between
 storage components in case of a major file system error. A Lustre file system can
- operate even in the presence of file system inconsistencies, so lfsck is not required
+ operate even in the presence of file system inconsistencies, and LFSCK can run while the filesystem is in use, so LFSCK is not required
 to complete before returning the file system to production.
diff --git a/UserUtilities.xml b/UserUtilities.xml
index 20f8319..3f6d562 100644
--- a/UserUtilities.xml
+++ b/UserUtilities.xml
@@ -9,9 +9,6 @@
-
-
-
@@ -802,123 +799,6 @@ lfs help
-
- <indexterm><primary>lfsck</primary></indexterm> - <literal>lfsck</literal> - - lfsck ensures that objects are not referenced by multiple MDS files, - that there are no orphan objects on the OSTs (objects that do not have any file on the MDS - which references them), and that all of the objects referenced by the MDS exist. Under normal - circumstances, the Lustre software maintains such coherency by distributed logging mechanisms, - but under exceptional circumstances that may fail (e.g. disk failure, file system corruption - leading to e2fsck repair). To avoid lengthy downtime, you can also run - lfsck once the Lustre file system is already started. - The e2fsck utility is run on each of the local MDS and OST device file systems and verifies that the underlying ldiskfs is consistent. After e2fsck is run, lfsck does distributed coherency checking for the Lustre file system. In most cases, e2fsck is sufficient to repair any file system issues and lfsck is not required. -
- Synopsis - lfsck [-c|--create] [-d|--delete] [-f|--force] [-h|--help] [-l|--lostfound] [-n|--nofix] [-v|--verbose] --mdsdb mds_database_file --ostdb ost1_database_file[ost2_database_file...] /mount_point - - - As shown, the /mount_point parameter refers to the Lustre file system mount point. The default mount point is /mnt/lustre. - - - For lfsck, database filenames must be provided as absolute pathnames. Relative paths do not work, the databases cannot be properly opened. - -
-
- Options - The options and descriptions for the lfsck command are listed below. - - - - - - - - Option - - - Description - - - - - - - -c - - - Creates (empty) missing OST objects referenced by MDS inodes. - - - - - -d - - - Deletes orphaned objects from the file system. Since objects on the OST are often only one of several stripes of a file, it can be difficult to compile multiple objects together in a single, usable file. - - - - - -h - - - Prints a brief help message. - - - - - -l - - - Puts orphaned objects into a lost+found directory in the root of the file system. - - - - - -n - - - Does not repair the file system, just performs a read-only check (default). - - - - - -v - - - Verbose operation - more verbosity by specifying the option multiple times. - - - - - --mdsdb - mds_database_file - - - MDT database file created by running e2fsck --mdsdb mds_database_file /dev/mdt_device on the MDT backing device. This is required. - - - - - --ostdb ost1_database_file - [ost2_database_file...] - - - OST database files created by running e2fsck --ostdb ost_database_file /dev/ost_device on each of the OST backing devices. These are required unless an OST is unavailable, in which case all objects thereon are considered missing. - - - - - -
-
- Description - The lfsck utility is used to check and repair the distributed coherency of a Lustre file system. If an MDS or an OST becomes corrupt, run a distributed check on the file system to determine what sort of problems exist. Use lfsck to correct any defects found. - For more information on using e2fsck and lfsck, including examples, see (Commit on Share). For information on resolving orphaned objects, see (Working with Orphaned Objects). -
-
<indexterm><primary>filefrag</primary></indexterm> <literal>filefrag</literal>
--
1.8.3.1