From d34fe4730402dfed2b48ceb14b4a4061be09350b Mon Sep 17 00:00:00 2001
From: Andreas Dilger
Date: Thu, 4 Aug 2016 16:06:39 -0600
Subject: [PATCH] LUDOC-11 backup: recommend MDT device-level backup

Add text recommending keeping a full MDT device-level backup for
all filesystems. This is a small amount of storage, but it can
save a lot of grief if the MDT fails for some reason.

Signed-off-by: Andreas Dilger
Change-Id: I28748111ae366ca180ac2125230b23063057ae8e
Reviewed-on: http://review.whamcloud.com/21726
Tested-by: Jenkins
Reviewed-by: Zhiqi Tao
---
 BackupAndRestore.xml        | 105 +++++++++++++++++++++++++++++---------------
 LustreMaintenance.xml       |   5 ++-
 LustreOperations.xml        |   6 +--
 TroubleShootingRecovery.xml |   2 +-
 4 files changed, 76 insertions(+), 42 deletions(-)

diff --git a/BackupAndRestore.xml b/BackupAndRestore.xml
index 77f3fb1..fe280c9 100644
--- a/BackupAndRestore.xml
+++ b/BackupAndRestore.xml
@@ -10,31 +10,49 @@ xml:id="backupandrestore">
-
+
-
+
-
+
-
+
-
+
-
+  It is strongly recommended that sites perform
+  periodic device-level backup of the MDT(s)
+  (<xref linkend="dbdoclet.backup_device"/>),
+  for example twice a week with alternate backups going to a separate
+  device, even if there is not enough capacity to do a full backup of all
+  of the filesystem data. Even if there are separate file-level backups of
+  some or all files in the filesystem, having a device-level backup of the
+  MDT can be very useful in case of MDT failure or corruption. Being able
+  to restore a device-level MDT backup can avoid the significantly longer
+  process of restoring the entire filesystem from backup. Since the MDT is
+  required for access to all files, its loss would otherwise force a full
+  restore of the filesystem (if that is even possible) even if the OSTs are
+  still intact. Performing a periodic device-level MDT backup can be done
+  relatively inexpensively because the storage need only be connected to
+  the primary MDS (it can be manually connected to the backup MDS in the
+  rare case it is needed), and only needs good linear read/write
+  performance. While a device-level MDT backup is not useful for restoring
+  individual files, it is the most efficient way to handle MDT failure or
+  corruption.
+
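+  <para>As a purely illustrative sketch (the backup script, schedule, and
+  image paths here are hypothetical, and the script must unmount or
+  snapshot the MDT to get a consistent copy), such a twice-weekly schedule
+  could alternate between two backup images from the primary MDS
+  crontab:</para>
+  <screen># back up the MDT early Sunday and Wednesday, alternating images
+0 1 * * 0 /usr/local/sbin/mdt_backup.sh /backup/mdt-a.img
+0 1 * * 3 /usr/local/sbin/mdt_backup.sh /backup/mdt-b.img</screen>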
 <indexterm>
 <primary>backup</primary>
 <secondary>rsync</secondary>
 </indexterm>
@@ -59,17 +77,15 @@ xml:id="backupandrestore">
 clients working in parallel in different directories) rather than on
 individual server nodes; this is no different than backing up any other
 file system.</para>
-    <para>However, due to the large size of most Lustre file systems, it is not
-    always possible to get a complete backup. We recommend that you back up
-    subsets of a file system. This includes subdirectories of the entire file
-    system, filesets for a single user, files incremented by date, and so
-    on.</para>
+    <para>However, due to the large size of most Lustre file systems, it is
+    not always possible to get a complete backup. We recommend that you back
+    up subsets of a file system. This includes subdirectories of the entire
+    file system, filesets for a single user, files incremented by date, and
+    so on, so that restores can be done more efficiently.</para>
     <note>
-      <para>In order to allow the file system namespace to scale for future
-      applications, Lustre software release 2.x internally uses a 128-bit file
-      identifier for all files. To interface with user applications, the Lustre
-      software presents 64-bit inode numbers for the
-      <literal>stat()</literal>,
+      <para>Lustre internally uses a 128-bit file identifier (FID) for all
+      files. To interface with user applications, 64-bit inode numbers
+      are returned by the <literal>stat()</literal>,
       <literal>fstat()</literal>, and
       <literal>readdir()</literal> system calls on 64-bit applications, and
       32-bit inode numbers to 32-bit applications.</para>
@@ -87,9 +103,8 @@ xml:id="backupandrestore">
       they return
       <literal>EOVERFLOW</literal> errors when accessing the Lustre files. To
       avoid this problem, Linux NFS clients can use the kernel command-line
-      option "
-      <literal>nfs.enable_ino64=0</literal>" in order to force the NFS client
-      to export 32-bit inode numbers to the client.</para>
+      option "<literal>nfs.enable_ino64=0</literal>" in order to force the
+      NFS client to export 32-bit inode numbers to the client.</para>
       <para>
       <emphasis role="bold">Workaround</emphasis>: We very strongly recommend
       that backups using
@@ -97,7 +112,9 @@ xml:id="backupandrestore">
       number to uniquely identify an inode to be run on 64-bit clients. The
       128-bit Lustre file identifiers cannot be uniquely mapped to a 32-bit
      inode number, and as a result these utilities may operate incorrectly on
-      32-bit clients.</para>
+      32-bit clients. While there is still a small chance of inode number
+      collisions with 64-bit inodes, the FID allocation pattern is designed
+      to avoid collisions over long periods of usage.</para>
     </note>
     <section remap="h3">
       <title>
@@ -386,12 +403,12 @@ Changelog records consumed: 42</screen>
       </section>
     </section>
   </section>
-  <section xml:id="dbdoclet.50438207_71633">
+  <section xml:id="dbdoclet.backup_device">
     <title>
       <indexterm>
         <primary>backup</primary>
         <secondary>MDT/OST device level</secondary>
-      </indexterm>Backing Up and Restoring an MDT or OST (Device Level)
+      </indexterm>Backing Up and Restoring an MDT or OST (ldiskfs Device Level)
     In some cases, it is useful to do a full device-level backup of an
     individual device (MDT or OST), before replacing hardware, performing
     maintenance, etc. Doing full device-level backups ensures that all of the
@@ -402,13 +419,16 @@ underlying devices.
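+    <para>For illustration only (the device and file names below are
+    placeholders, and the commands assume the MDT is not in use), a
+    device-level backup can be captured to a compressed image file on
+    separate storage and restored onto a replacement device with the same
+    tools:</para>
+    <screen>mds# umount /mnt/mdt
+mds# dd if=/dev/{mdtdev} bs=4M | gzip -1 > /backup/mdt.img.gz
+mds# mount -t lustre /dev/{mdtdev} /mnt/mdt
+mds# gzip -dc /backup/mdt.img.gz | dd of=/dev/{newdev} bs=4M</screen>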
     Keeping an updated full backup of the MDT is especially important
-    because a permanent failure of the MDT file system renders the much
-    larger amount of data in all the OSTs largely inaccessible and
-    unusable.
+    because permanent failure or corruption of the MDT file system renders
+    the much larger amount of data in all the OSTs largely inaccessible and
+    unusable. The storage needed for one or two full MDT device backups
+    is much smaller than doing a full filesystem backup, and can use less
+    expensive storage than the actual MDT device(s) since it only needs to
+    have good streaming read/write speed instead of high random IOPS.
-    In Lustre software release 2.0 through 2.2, the only successful way
-    to backup and restore an MDT is to do a device-level backup as is
+    In Lustre software release 2.0 through 2.2, the only successful
+    way to back up and restore an MDT is to do a device-level backup as is
     described in this section. File-level restore of an MDT is not possible
     before Lustre software release 2.3, as the Object Index (OI) file cannot
     be rebuilt after restore without the OI Scrub functionality.
     restore is detected (see
     LU-957), and file-level backup is supported (see
-    ).
+    ).
     If hardware replacement is the reason for the backup or if a spare
     storage device is available, it is possible to do a raw copy of the MDT
     or OST from one block device to the other, as long as the new device is
     at least as large as the original device. To do this, run:
-    dd if=/dev/{original} of=/dev/{newdev} bs=1M
+    dd if=/dev/{original} of=/dev/{newdev} bs=4M
     If hardware errors cause read problems on the original device, use the
     command below to allow as much data as possible to be read from the
     original device while skipping sections of the disk with errors:
     dd if=/dev/{original} of=/dev/{newdev} bs=4k conv=sync,noerror \
       count={original size in 4kB blocks}
-    Even in the face of hardware errors, the
-    ldiskfs file system is very robust and it may be possible
+    Even in the face of hardware errors, the ldiskfs
+    file system is very robust and it may be possible
     to recover the file system data after running
     e2fsck -fy /dev/{newdev} on the new device, along with
     ll_recover_lost_found_objs for OST devices.
     With Lustre software version 2.6 and later, there is no longer a need
     to run ll_recover_lost_found_objs on the OSTs, since the
     LFSCK scanning will automatically move objects from
     lost+found back into their correct locations on the OST
     after directory corruption.
+    In order to ensure that the backup is fully consistent, the MDT or
+    OST must be unmounted, so that there are no changes being made to the
+    device while the data is being transferred. If the reason for the
+    backup is preventative (e.g. an MDT backup on a running MDS in case of
+    future failures), then it is possible to perform a consistent backup
+    from an LVM snapshot. If an LVM snapshot is not available, and taking
+    the MDS offline for a backup is unacceptable, it is also possible to
+    perform a backup from the raw MDT block device. While a backup from the
+    raw device will not be fully consistent due to ongoing changes, the vast
+    majority of ldiskfs metadata is statically allocated, and inconsistencies
+    in the backup can be fixed by running e2fsck on the
+    backup device; such a backup is still much better than not having any
+    backup at all.
+
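+    <para>A minimal sketch of such an LVM snapshot backup follows (the
+    volume group, volume, and file names are hypothetical, and the snapshot
+    must be large enough to absorb changes made while the copy runs):</para>
+    <screen>mds# lvcreate --snapshot --size 1G --name mdtsnap /dev/{vgname}/{mdtlv}
+mds# dd if=/dev/{vgname}/mdtsnap of=/backup/mdt_snap.img bs=4M
+mds# lvremove -f /dev/{vgname}/mdtsnap
+mds# e2fsck -fy /backup/mdt_snap.img  # repair in-flight inconsistencies</screen>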
-
+
       <indexterm>
         <primary>backup</primary>
         <secondary>OST file system</secondary>
       </indexterm>
       <indexterm>
         <primary>backup</primary>
         <secondary>MDT file system</secondary>
-      </indexterm>Making a File-Level Backup of an OST or MDT File System
+      </indexterm>Backing Up an OST or MDT (ldiskfs File System Level)
     This procedure provides an alternative way to back up or migrate the
     data of an OST or MDT at the file level. At the file level, unused
     space is omitted from the backup and the process may be completed
     more quickly with a smaller total backup size.
     Prior to Lustre software release 2.3, the only successful way to
     perform an MDT backup and restore is to do a device-level backup as is
     described in
-    <xref linkend="dbdoclet.50438207_71633" />. The ability to do MDT
+    <xref linkend="dbdoclet.backup_device" />. The ability to do MDT
     file-level backups is not available for Lustre software release 2.0
     through 2.2, because restoration of the Object Index (OI) file does not
     return the MDT to a functioning state.
@@ -583,7 +616,7 @@ trusted.fid= \
-
+
       <indexterm>
         <primary>backup</primary>
         <secondary>restoring file system backup</secondary>
       </indexterm>
@@ -647,7 +680,7 @@
       were created after the MDT backup will not be accessible or visible.
       See <xref linkend="dbdoclet.lfsckadmin" /> for details on using
       LFSCK.</para>
     </section>
-    <section xml:id="dbdoclet.50438207_31553">
+    <section xml:id="dbdoclet.backup_lvm_snapshot">
      <title>
        <indexterm>
          <primary>backup</primary>
diff --git a/LustreMaintenance.xml b/LustreMaintenance.xml
index b640810..99372b9 100644
--- a/LustreMaintenance.xml
+++ b/LustreMaintenance.xml
@@ -533,8 +533,9 @@ client$ lfs getstripe -M /mnt/lustre/local_dir0
         <secondary>restoring OST config</secondary>
       </indexterm> Restoring OST Configuration Files
      If the original OST is still available, it is best to follow the OST
      backup and restore
-      procedure given in either , or and .
+      procedure given in either , or
+      and
+      .
      To replace an OST that was removed from service due to corruption or
      hardware failure, the file system needs to be formatted using
      mkfs.lustre, and the Lustre file system
      configuration should be restored, if available.
diff --git a/LustreOperations.xml b/LustreOperations.xml
index a70364f..8615fb9 100644
--- a/LustreOperations.xml
+++ b/LustreOperations.xml
@@ -846,9 +846,9 @@ tune2fs [-m reserved_blocks_percent] /dev/
 Replacing an Existing OST or MDT
      To copy the contents of an existing OST to a new OST (or an old
      MDT to a new MDT), follow the process for either OST/MDT backups in
-      or
-      . For more information on
-      removing a MDT, see
+      or
+      .
+      For more information on removing an MDT, see
      .
diff --git a/TroubleShootingRecovery.xml b/TroubleShootingRecovery.xml
index c2c901d..fde2b4e 100644
--- a/TroubleShootingRecovery.xml
+++ b/TroubleShootingRecovery.xml
@@ -210,7 +210,7 @@ root# e2fsck -fp /dev/sda # fix errors with prudent answers (usually
 an internal table called the OI Table. An OI Scrub traverses the OI Table
 and makes corrections where necessary. An OI Scrub is required after
 restoring from a file-level MDT backup (
-), or in case the OI Table is
+), or in case the OI Table is
 otherwise corrupted. Later phases of LFSCK will add further checks to the
 Lustre distributed file system state. In Lustre software release 2.4, LFSCK
 namespace
--
1.8.3.1