1 <?xml version='1.0' encoding='utf-8'?>
2 <chapter xmlns="http://docbook.org/ns/docbook"
3 xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
4 xml:id="backupandrestore">
<title xml:id="backupandrestore.title">Backing Up and Restoring a File
System</title>
<para>This chapter describes how to back up and restore a Lustre file
system at the file system level, device level, and file level. Each
backup approach is described in the following sections:</para>
<itemizedlist>
<listitem>
<para><xref linkend="dbdoclet.backup_file"/></para>
</listitem>
<listitem>
<para><xref linkend="dbdoclet.backup_device"/></para>
</listitem>
<listitem>
<para><xref linkend="backup_fs_level"/></para>
</listitem>
<listitem>
<para><xref linkend="backup_fs_level.restore"/></para>
</listitem>
<listitem>
<para><xref linkend="dbdoclet.backup_lvm_snapshot"/></para>
</listitem>
</itemizedlist>
37 <para>It is <emphasis>strongly</emphasis> recommended that sites perform
38 periodic device-level backup of the MDT(s)
39 (<xref linkend="dbdoclet.backup_device"/>),
40 for example twice a week with alternate backups going to a separate
41 device, even if there is not enough capacity to do a full backup of all
42 of the filesystem data. Even if there are separate file-level backups of
43 some or all files in the filesystem, having a device-level backup of the
44 MDT can be very useful in case of MDT failure or corruption. Being able to
45 restore a device-level MDT backup can avoid the significantly longer process
46 of restoring the entire filesystem from backup. Since the MDT is required
47 for access to all files, its loss would otherwise force full restore of the
48 filesystem (if that is even possible) even if the OSTs are still OK.</para>
49 <para>Performing a periodic device-level MDT backup can be done relatively
50 inexpensively because the storage need only be connected to the primary
51 MDS (it can be manually connected to the backup MDS in the rare case
it is needed), and only needs good linear read/write performance. While
a device-level MDT backup is not useful for restoring individual files,
it is the most efficient way to handle MDT failure or corruption.</para>
55 <section xml:id="dbdoclet.backup_file">
58 <primary>backup</primary>
61 <primary>restoring</primary>
65 <primary>LVM</primary>
69 <primary>rsync</primary>
71 </indexterm>Backing up a File System</title>
72 <para>Backing up a complete file system gives you full control over the
73 files to back up, and allows restoration of individual files as needed.
74 File system-level backups are also the easiest to integrate into existing
75 backup solutions.</para>
<para>File system backups are performed from a Lustre client (or many
clients working in parallel in different directories) rather than on
individual server nodes; this is no different than backing up any other
file system.</para>
80 <para>However, due to the large size of most Lustre file systems, it is
81 not always possible to get a complete backup. We recommend that you back
82 up subsets of a file system. This includes subdirectories of the entire
file system, filesets for a single user, incremental backups of files
changed since a given date, and
84 so on, so that restores can be done more efficiently.</para>
<para>Lustre internally uses a 128-bit file identifier (FID) for all
files. To interface with user applications, 64-bit inode numbers
are returned by the <literal>stat()</literal>,
<literal>fstat()</literal>, and
<literal>readdir()</literal> system calls to 64-bit applications, and
32-bit inode numbers are returned to 32-bit applications.</para>
92 <para>Some 32-bit applications accessing Lustre file systems (on both
93 32-bit and 64-bit CPUs) may experience problems with the
94 <literal>stat()</literal>,
95 <literal>fstat()</literal> or
<literal>readdir()</literal> system calls under certain circumstances,
though the Lustre client should return 32-bit inode numbers to these
applications.</para>
<para>In particular, if the Lustre file system is exported from a 64-bit
client via NFS to a 32-bit client, the Linux NFS server will export
64-bit inode numbers to applications running on the NFS client. If the
32-bit applications are not compiled with Large File Support (LFS), then
they return
<literal>EOVERFLOW</literal> errors when accessing the Lustre files. To
avoid this problem, Linux NFS clients can use the kernel command-line
option "<literal>nfs.enable_ino64=0</literal>" to force the
NFS client to present 32-bit inode numbers to applications.</para>
<emphasis role="bold">Workaround</emphasis>: We very strongly recommend
that backups using
<literal>tar(1)</literal> and other utilities that depend on the inode
number to uniquely identify an inode be run on 64-bit clients. The
128-bit Lustre file identifiers cannot be uniquely mapped to a 32-bit
inode number, and as a result these utilities may operate incorrectly on
32-bit clients. While there is still a small chance of inode number
collisions with 64-bit inodes, the FID allocation pattern is designed
to avoid collisions for long periods of usage.</para>
122 <primary>backup</primary>
123 <secondary>rsync</secondary>
124 </indexterm>Lustre_rsync</title>
126 <literal>lustre_rsync</literal> feature keeps the entire file system in
127 sync on a backup by replicating the file system's changes to a second
128 file system (the second file system need not be a Lustre file system, but
129 it must be sufficiently large).
130 <literal>lustre_rsync</literal> uses Lustre changelogs to efficiently
131 synchronize the file systems without having to scan (directory walk) the
132 Lustre file system. This efficiency is critically important for large
file systems, and distinguishes the Lustre
<literal>lustre_rsync</literal> feature from other replication/backup
solutions.</para>
139 <primary>backup</primary>
140 <secondary>rsync</secondary>
141 <tertiary>using</tertiary>
142 </indexterm>Using Lustre_rsync</title>
144 <literal>lustre_rsync</literal> feature works by periodically running
145 <literal>lustre_rsync</literal>, a userspace program used to
synchronize changes in the Lustre file system onto the target file
system. The
<literal>lustre_rsync</literal> utility keeps a status file, which
149 enables it to be safely interrupted and restarted without losing
150 synchronization between the file systems.</para>
151 <para>The first time that
152 <literal>lustre_rsync</literal> is run, the user must specify a set of
153 parameters for the program to use. These parameters are described in
154 the following table and in
155 <xref linkend="dbdoclet.50438219_63667" />. On subsequent runs, these
parameters are stored in the status file, and only the name of the
157 status file needs to be passed to
158 <literal>lustre_rsync</literal>.</para>
<para>Before using <literal>lustre_rsync</literal>:</para>
<para>Register the changelog user. For details, see the
<literal>changelog_register</literal> parameter in
<xref linkend="systemconfigurationutilities" /> (
<literal>lctl</literal>).</para>
173 <para>Verify that the Lustre file system (source) and the replica
174 file system (target) are identical
175 <emphasis>before</emphasis> registering the changelog user. If the
file systems differ, use a utility such as regular
<literal>rsync</literal> (not
<literal>lustre_rsync</literal>) to make them identical.</para>
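<para>For example, an initial synchronization with plain
<literal>rsync</literal> might look like the following sketch (the
mount points are placeholders for your own source and target
paths):</para>
<screen>[client]# rsync -aHAX /mnt/lustre/ /mnt/target/</screen>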
<literal>lustre_rsync</literal> utility uses the following
parameters:</para>
184 <informaltable frame="all">
186 <colspec colname="c1" colwidth="3*" />
187 <colspec colname="c2" colwidth="10*" />
192 <emphasis role="bold">Parameter</emphasis>
197 <emphasis role="bold">Description</emphasis>
<literal>--source=<replaceable>src</replaceable></literal>
211 <para>The path to the root of the Lustre file system (source)
212 which will be synchronized. This is a mandatory option if a
valid status log created during a previous synchronization operation (
<literal>--statuslog</literal>) is not specified.</para>
<literal>--target=<replaceable>tgt</replaceable></literal>
226 <para>The path to the root where the source file system will
227 be synchronized (target). This is a mandatory option if the
status log created during a previous synchronization operation (
<literal>--statuslog</literal>) is not specified. This option
can be repeated if multiple synchronization targets are
desired.</para>
<literal>--mdt=<replaceable>mdt</replaceable></literal>
243 <para>The metadata device to be synchronized. A changelog
244 user must be registered for this device. This is a mandatory
245 option if a valid status log created during a previous
246 synchronization operation (
247 <literal>--statuslog</literal>) is not specified.</para>
<literal>--user=<replaceable>userid</replaceable></literal>
258 <para>The changelog user ID for the specified MDT. To use
259 <literal>lustre_rsync</literal>, the changelog user must be
260 registered. For details, see the
261 <literal>changelog_register</literal> parameter in
262 <xref linkend="systemconfigurationutilities" />(
263 <literal>lctl</literal>). This is a mandatory option if a
valid status log created during a previous synchronization operation (
<literal>--statuslog</literal>) is not specified.</para>
272 <literal>--statuslog=
273 <replaceable>log</replaceable></literal>
<para>A log file to which synchronization status is saved. When the
<literal>lustre_rsync</literal> utility starts, if the status
log from a previous synchronization operation is specified,
then the state is read from the log, and the otherwise mandatory
<literal>--source</literal>,
<literal>--target</literal> and
<literal>--mdt</literal> options can be skipped. Specifying the
<literal>--source</literal>,
<literal>--target</literal> and/or
<literal>--mdt</literal> options, in addition to the
<literal>--statuslog</literal> option, causes the specified
parameters in the status log to be overridden. Command line
options take precedence over options in the status
log.</para>
<literal>--xattr=<replaceable>yes|no</replaceable></literal>
301 <para>Specifies whether extended attributes (
302 <literal>xattrs</literal>) are synchronized or not. The
303 default is to synchronize extended attributes.</para>
306 <para>Disabling xattrs causes Lustre striping information
307 not to be synchronized.</para>
315 <literal>--verbose</literal>
319 <para>Produces verbose output.</para>
325 <literal>--dry-run</literal>
329 <para>Shows the output of
330 <literal>lustre_rsync</literal> commands (
331 <literal>copy</literal>,
332 <literal>mkdir</literal>, etc.) on the target file system
333 without actually executing them.</para>
339 <literal>--abort-on-err</literal>
343 <para>Stops processing the
344 <literal>lustre_rsync</literal> operation if an error occurs.
345 The default is to continue the operation.</para>
355 <primary>backup</primary>
356 <secondary>rsync</secondary>
357 <tertiary>examples</tertiary>
359 <literal>lustre_rsync</literal> Examples</title>
<para>Sample <literal>lustre_rsync</literal> commands are listed
below.</para>
362 <para>Register a changelog user for an MDT (e.g.
363 <literal>testfs-MDT0000</literal>).</para>
<screen># lctl --device testfs-MDT0000 changelog_register
Registered changelog userid 'cl1'</screen>
366 <para>Synchronize a Lustre file system (
367 <literal>/mnt/lustre</literal>) to a target file system (
368 <literal>/mnt/target</literal>).</para>
369 <screen>$ lustre_rsync --source=/mnt/lustre --target=/mnt/target \
370 --mdt=testfs-MDT0000 --user=cl1 --statuslog sync.log --verbose
371 Lustre filesystem: testfs
372 MDT device: testfs-MDT0000
376 Changelog registration: cl1
377 Starting changelog record: 0
379 lustre_rsync took 1 seconds
380 Changelog records consumed: 22</screen>
381 <para>After the file system undergoes changes, synchronize the changes
382 onto the target file system. Only the
383 <literal>statuslog</literal> name needs to be specified, as it has all
384 the parameters passed earlier.</para>
385 <screen>$ lustre_rsync --statuslog sync.log --verbose
386 Replicating Lustre filesystem: testfs
387 MDT device: testfs-MDT0000
391 Changelog registration: cl1
392 Starting changelog record: 22
394 lustre_rsync took 2 seconds
395 Changelog records consumed: 42</screen>
<para>Synchronize a Lustre file system (
<literal>/mnt/lustre</literal>) to two target file systems (
<literal>/mnt/target1</literal> and
<literal>/mnt/target2</literal>).</para>
400 <screen>$ lustre_rsync --source=/mnt/lustre --target=/mnt/target1 \
401 --target=/mnt/target2 --mdt=testfs-MDT0000 --user=cl1 \
402 --statuslog sync.log</screen>
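<para>When a replication target is retired, the changelog user should be
deregistered so that the MDT does not retain changelog records
indefinitely. A sketch, assuming the changelog user
<literal>cl1</literal> registered above:</para>
<screen># lctl --device testfs-MDT0000 changelog_deregister cl1</screen>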
406 <section xml:id="dbdoclet.backup_device">
409 <primary>backup</primary>
410 <secondary>MDT/OST device level</secondary>
411 </indexterm>Backing Up and Restoring an MDT or OST (ldiskfs Device Level)</title>
412 <para>In some cases, it is useful to do a full device-level backup of an
413 individual device (MDT or OST), before replacing hardware, performing
maintenance, etc. Doing full device-level backups ensures that all of the
data and configuration files are preserved in their original state and is the
416 easiest method of doing a backup. For the MDT file system, it may also be
417 the fastest way to perform the backup and restore, since it can do large
418 streaming read and write operations at the maximum bandwidth of the
419 underlying devices.</para>
421 <para>Keeping an updated full backup of the MDT is especially important
422 because permanent failure or corruption of the MDT file system renders
423 the much larger amount of data in all the OSTs largely inaccessible and
unusable. The storage needed for one or two full MDT device backups
is much smaller than that needed for a full filesystem backup, and can use less
426 expensive storage than the actual MDT device(s) since it only needs to
427 have good streaming read/write speed instead of high random IOPS.</para>
429 <warning condition='l23'>
<para>In Lustre software release 2.0 through 2.2, the only successful
way to back up and restore an MDT is to do a device-level backup as
432 described in this section. File-level restore of an MDT is not possible
433 before Lustre software release 2.3, as the Object Index (OI) file cannot
434 be rebuilt after restore without the OI Scrub functionality.
435 Since Lustre 2.3, Object Index files are automatically rebuilt at first
436 mount after a restore is detected (see
437 <link xl:href="http://jira.whamcloud.com/browse/LU-957">LU-957</link>),
438 and file-level backup is supported (see
439 <xref linkend="backup_fs_level"/>).</para>
441 <para>If hardware replacement is the reason for the backup or if a spare
442 storage device is available, it is possible to do a raw copy of the MDT or
443 OST from one block device to the other, as long as the new device is at
444 least as large as the original device. To do this, run:</para>
445 <screen>dd if=/dev/{original} of=/dev/{newdev} bs=4M</screen>
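<para>If time permits, the copy can be verified by comparing the two
devices directly. This is only a sanity check; if the new device is
larger than the original, <literal>cmp</literal> will report reaching
EOF on the original after all of its bytes have matched:</para>
<screen>cmp /dev/{original} /dev/{newdev}</screen>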
446 <para>If hardware errors cause read problems on the original device, use
447 the command below to allow as much data as possible to be read from the
448 original device while skipping sections of the disk with errors:</para>
<screen>dd if=/dev/{original} of=/dev/{newdev} bs=4k conv=sync,noerror \
count={original size in 4kB blocks}</screen>
451 <para>Even in the face of hardware errors, the <literal>ldiskfs</literal>
452 file system is very robust and it may be possible
453 to recover the file system data after running
454 <literal>e2fsck -fy /dev/{newdev}</literal> on the new device, along with
455 <literal>ll_recover_lost_found_objs</literal> for OST devices.</para>
456 <para condition="l26">With Lustre software version 2.6 and later, there is
457 no longer a need to run
458 <literal>ll_recover_lost_found_objs</literal> on the OSTs, since the
<literal>LFSCK</literal> scanning will automatically move objects from
<literal>lost+found</literal> back into their correct locations on the OST
after directory corruption.</para>
462 <para>In order to ensure that the backup is fully consistent, the MDT or
463 OST must be unmounted, so that there are no changes being made to the
464 device while the data is being transferred. If the reason for the
backup is preventative (e.g. an MDT backup on a running MDS in case of
future failures), then it is possible to perform a consistent backup from
an LVM snapshot. If an LVM snapshot is not available, and taking the
MDS offline for a backup is unacceptable, it is also possible to perform
a backup from the raw MDT block device. While the backup from the raw
device will not be fully consistent due to ongoing changes, the vast
majority of ldiskfs metadata is statically allocated, and inconsistencies
in the backup can be fixed by running <literal>e2fsck</literal> on the
backup device; such a backup is still much better than no backup at all.
</para>
476 <section xml:id="backup_fs_level">
479 <primary>backup</primary>
480 <secondary>OST file system</secondary>
483 <primary>backup</primary>
484 <secondary>MDT file system</secondary>
485 </indexterm>Backing Up an OST or MDT (Backend File System Level)</title>
<para>This procedure provides an alternative way to back up or migrate
the data of an OST or MDT at the file level. At the file level, unused
space is omitted from the backup, and the process may complete more quickly with a
489 smaller total backup size. Backing up a single OST device is not
490 necessarily the best way to perform backups of the Lustre file system,
491 since the files stored in the backup are not usable without metadata stored
492 on the MDT and additional file stripes that may be on other OSTs. However,
493 it is the preferred method for migration of OST devices, especially when it
494 is desirable to reformat the underlying file system with different
495 configuration options or to reduce fragmentation.</para>
497 <emphasis role="bold">Prior to Lustre software release 2.3</emphasis>, the
498 only successful way to perform an MDT backup and restore was to do a
499 device-level backup as described in
500 <xref linkend="dbdoclet.backup_device" />. The ability to do MDT
501 file-level backups is not available for Lustre software release 2.0
502 through 2.2, because restoration of the Object Index (OI) file does not
503 return the MDT to a functioning state.</para>
504 <para><emphasis role="bold">Since Lustre software release 2.3</emphasis>,
505 Object Index files are automatically rebuilt at first mount after a
506 restore is detected (see
507 <link xl:href="http://jira.whamcloud.com/browse/LU-957">LU-957</link>),
508 so file-level MDT restore is supported.</para></note>
509 <section xml:id="backup_fs_level.index_objects" condition="l2B">
512 <primary>backup</primary>
513 <secondary>index objects</secondary>
</indexterm>Backing Up Index Objects</title>
<para>Prior to Lustre software release 2.11.0, the backend file system
level backup and restore process was only possible for ldiskfs-based
systems. The ability to perform a zfs-based MDT/OST file system level
backup and restore was introduced in Lustre software release 2.11.0.
Unlike with an ldiskfs-based system, index objects must be backed up
before the target (MDT or OST) is unmounted in order to be able to
restore the file system successfully. To enable index backup on the
target, execute the following command on the target server:</para>
523 <screen># lctl set_param osd-zfs.${fsname}-${target}.index_backup=1</screen>
524 <para><replaceable>${target}</replaceable> is composed of the target type
525 (MDT or OST) plus the target index, such as <literal>MDT0000</literal>,
526 <literal>OST0001</literal>, and so on.</para>
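<para>For example, to enable index backup on the first MDT of a
hypothetical file system named <literal>testfs</literal>:</para>
<screen># lctl set_param osd-zfs.testfs-MDT0000.index_backup=1</screen>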
<note><para>The <literal>index_backup</literal> parameter is also valid
for ldiskfs-based systems, and is used when migrating data between
ldiskfs-based and zfs-based systems as described in
<xref linkend="migrate_backends"/>.</para></note>
532 <section xml:id="backup_fs_level.ost_mdt">
535 <primary>backup</primary>
536 <secondary>OST and MDT</secondary>
537 </indexterm>Backing Up an OST or MDT</title>
538 <para>For Lustre software release 2.3 and newer with MDT file-level backup
539 support, substitute <literal>mdt</literal> for <literal>ost</literal>
540 in the instructions below.</para>
<para><emphasis role="bold">Unmount the target.</emphasis></para>
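<para>For an ldiskfs-based target, the device can be passed directly to
<literal>umount</literal>; for example:</para>
<screen>[oss]# umount /dev/<emphasis>{ostdev}</emphasis></screen>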
<para><emphasis role="bold">Make a mountpoint for the file
system.</emphasis></para>
548 <screen>[oss]# mkdir -p /mnt/ost</screen>
551 <para><emphasis role="bold">Mount the file system.</emphasis></para>
552 <para>For ldiskfs-based systems:</para>
553 <screen>[oss]# mount -t ldiskfs /dev/<emphasis>{ostdev}</emphasis> /mnt/ost</screen>
554 <para>For zfs-based systems:</para>
557 <para>Import the pool for the target if it is exported. For example:
558 <screen>[oss]# zpool import lustre-ost [-d ${ostdev_dir}]</screen>
562 <para>Enable the <literal>canmount</literal> property on the target
563 filesystem. For example:
564 <screen>[oss]# zfs set canmount=on ${fsname}-ost/ost</screen>
You can also specify the <literal>mountpoint</literal> property. By
default, it will be <literal>/${fsname}-ost/ost</literal>.
570 <para>Mount the target as 'zfs'. For example:
571 <screen>[oss]# zfs mount ${fsname}-ost/ost</screen>
<emphasis role="bold">Change to the mountpoint being backed
up.</emphasis>
581 <screen>[oss]# cd /mnt/ost</screen>
585 <emphasis role="bold">Back up the extended attributes.</emphasis>
587 <screen>[oss]# getfattr -R -d -m '.*' -e hex -P . > ea-$(date +%Y%m%d).bak</screen>
<para>If the <literal>tar(1)</literal> command supports the
<literal>--xattrs</literal> option (see below), the
591 <literal>getfattr</literal> step may be unnecessary as long as tar
592 correctly backs up the <literal>trusted.*</literal> attributes.
593 However, completing this step is not harmful and can serve as an
594 added safety measure.</para>
597 <para>In most distributions, the
598 <literal>getfattr</literal> command is part of the
599 <literal>attr</literal> package. If the
600 <literal>getfattr</literal> command returns errors like
601 <literal>Operation not supported</literal>, then the kernel does not
602 correctly support EAs. Stop and use a different backup method.</para>
607 <emphasis role="bold">Verify that the
608 <literal>ea-$date.bak</literal> file has properly backed up the EA
609 data on the OST.</emphasis>
611 <para>Without this attribute data, the MDT restore process will fail
612 and result in an unusable filesystem. The OST restore process may be
613 missing extra data that can be very useful in case of later file system
614 corruption. Look at this file with <literal>more</literal> or a text
editor. Each object file should have a corresponding item similar to
this:</para>
<screen>[oss]# file: O/0/d0/100992
trusted.fid= \
0x0d822200000000004a8a73e500000000808a0100000000000000000000000000</screen>
623 <emphasis role="bold">Back up all file system data.</emphasis>
<screen>[oss]# tar czvf {backup file}.tgz [--xattrs] [--xattrs-include="trusted.*"] --sparse .</screen>
<para>The <literal>--sparse</literal> option is vital for backing up an MDT.
629 Very old versions of tar may not support the
630 <literal>--sparse</literal> option correctly, which may cause the
631 MDT backup to take a long time. Known-working versions include
632 the tar from Red Hat Enterprise Linux distribution (RHEL version
633 6.3 or newer) or GNU tar version 1.25 and newer.</para>
636 <para>The tar <literal>--xattrs</literal> option is only available
637 in GNU tar version 1.27 or later or in RHEL 6.3 or newer. The
638 <literal>--xattrs-include="trusted.*"</literal> option is
639 <emphasis>required</emphasis> for correct restoration of the xattrs
640 when using GNU tar 1.27 or RHEL 7 and newer.</para>
<emphasis role="bold">Change directory out of the file
system.</emphasis>
648 <screen>[oss]# cd -</screen>
652 <emphasis role="bold">Unmount the file system.</emphasis>
654 <screen>[oss]# umount /mnt/ost</screen>
<para>When restoring an OST backup on a different node as part of an
OST migration, you also have to change server NIDs and use the
<literal>--writeconf</literal> option to re-generate the
configuration logs. See
<xref linkend="lustremaintenance" /> (Changing a Server NID).</para>
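<para>A minimal sketch of the <literal>--writeconf</literal> step, run
against the restored device (see the referenced section for the full
procedure, which involves all servers in the file system):</para>
<screen>[oss]# tunefs.lustre --writeconf /dev/<emphasis>{newdev}</emphasis></screen>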
666 <section xml:id="backup_fs_level.restore">
669 <primary>backup</primary>
670 <secondary>restoring file system backup</secondary>
671 </indexterm>Restoring a File-Level Backup</title>
672 <para>To restore data from a file-level backup, you need to format the
673 device, restore the file data and then restore the EA data.</para>
676 <para>Format the new device.</para>
677 <screen>[oss]# mkfs.lustre --ost --index {<emphasis>OST index</emphasis>}
678 --replace --fstype=${fstype} {<emphasis>other options</emphasis>} /dev/<emphasis>{newdev}</emphasis></screen>
681 <para>Set the file system label (<emphasis role="bold">ldiskfs-based
682 systems only</emphasis>).</para>
<screen>[oss]# e2label /dev/<emphasis>{newdev}</emphasis> {fsname}-OST{index in hex}</screen>
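<para>For example, with a hypothetical device name and the second OST of
a file system named <literal>testfs</literal>:</para>
<screen>[oss]# e2label /dev/sdb testfs-OST0001</screen>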
686 <para>Mount the file system.</para>
687 <para>For ldiskfs-based systems:</para>
688 <screen>[oss]# mount -t ldiskfs /dev/<emphasis>{newdev}</emphasis> /mnt/ost</screen>
689 <para>For zfs-based systems:</para>
692 <para>Import the pool for the target if it is exported. For example:
694 <screen>[oss]# zpool import lustre-ost [-d ${ostdev_dir}]</screen>
<para>Enable the <literal>canmount</literal> property on the target
filesystem. For example:</para>
699 <screen>[oss]# zfs set canmount=on ${fsname}-ost/ost</screen>
<para>You can also specify the <literal>mountpoint</literal>
property. By default, it will be
<literal>/${fsname}-ost/ost</literal>.</para>
705 <para>Mount the target as 'zfs'. For example:</para>
706 <screen>[oss]# zfs mount ${fsname}-ost/ost</screen>
711 <para>Change to the new file system mount point.</para>
712 <screen>[oss]# cd /mnt/ost</screen>
715 <para>Restore the file system backup.</para>
716 <screen>[oss]# tar xzvpf <emphasis>{backup file}</emphasis> [--xattrs] [--xattrs-include="trusted.*"] --sparse</screen>
718 <para>The tar <literal>--xattrs</literal> option is only available
719 in GNU tar version 1.27 or later or in RHEL 6.3 or newer. The
720 <literal>--xattrs-include="trusted.*"</literal> option is
721 <emphasis>required</emphasis> for correct restoration of the
722 MDT xattrs when using GNU tar 1.27 or RHEL 7 and newer. Otherwise,
723 the <literal>setfattr</literal> step below should be used.
728 <para>If not using a version of tar that supports direct xattr
729 backups, restore the file system extended attributes.</para>
730 <screen>[oss]# setfattr --restore=ea-${date}.bak</screen>
<para>If the <literal>--xattrs</literal> option is supported by tar and
was specified in the step above, this step is redundant.</para>
738 <para>Verify that the extended attributes were restored.</para>
<screen>[oss]# getfattr -d -m ".*" -e hex O/0/d0/100992
trusted.fid= \
0x0d822200000000004a8a73e500000000808a0100000000000000000000000000</screen>
743 <para>Remove old OI and LFSCK files.</para>
744 <screen>[oss]# rm -rf oi.16* lfsck_* LFSCK</screen>
747 <para>Remove old CATALOGS.</para>
748 <screen>[oss]# rm -f CATALOGS</screen>
<para>This step is optional and applies to the MDT only. The CATALOGS
file records the llog file handlers that are used for recovering
cross-server updates. Before OI scrub rebuilds the OI mappings for the
llog files, the related recovery will fail if it runs faster than the
background OI scrub, which will result in a failure of the whole mount
process. OI scrub is an online tool, so a mount failure means
that the OI scrub will be stopped. Removing the old CATALOGS
avoids this potential trouble. The side effect of removing old
CATALOGS is that the recovery of related cross-server updates will
be aborted. However, this can be handled by LFSCK after the system
is mounted.</para>
764 <para>Change directory out of the file system.</para>
765 <screen>[oss]# cd -</screen>
768 <para>Unmount the new file system.</para>
769 <screen>[oss]# umount /mnt/ost</screen>
<note><para>If the restored system has a different NID from the backup
system, change the NID. For details, see
<xref linkend="dbdoclet.changingservernid" />. For example:</para>
773 <screen>[oss]# mount -t lustre -o nosvc ${fsname}-ost/ost /mnt/ost
774 [oss]# lctl replace_nids ${fsname}-OSTxxxx $new_nids
775 [oss]# umount /mnt/ost</screen></note>
778 <para>Mount the target as <literal>lustre</literal>.</para>
<para>Usually, the <literal>-o abort_recov</literal> option is used
to skip unnecessary recovery. For example:</para>
<screen>[oss]# mount -t lustre -o abort_recov ${fsname}-ost/ost /mnt/ost</screen>
782 <para>Lustre can detect the restore automatically when mounting the
783 target, and then trigger OI scrub to rebuild the OIs and index objects
784 asynchronously in the background. You can check the OI scrub status
785 with the following command:</para>
786 <screen>[oss]# lctl get_param -n osd-${fstype}.${fsname}-${target}.oi_scrub</screen>
789 <para>If the file system was used between the time the backup was made and
790 when it was restored, then the online <literal>LFSCK</literal> tool will
automatically be run to ensure the filesystem is coherent. If all of the
device filesystems were backed up at the same time after Lustre was
stopped, this step is unnecessary. In either case, the filesystem
will be immediately usable, although there may be I/O errors reading
from files that are present on the MDT but not the OSTs, and files that
were created after the MDT backup will not be accessible or visible. See
<xref linkend="dbdoclet.lfsckadmin" /> for details on using LFSCK.</para>
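<para>For example, a full LFSCK scan can be started manually from the
MDS (assuming a file system named <literal>testfs</literal>):</para>
<screen>[mds]# lctl lfsck_start -M testfs-MDT0000 -t all</screen>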
799 <section xml:id="dbdoclet.backup_lvm_snapshot">
802 <primary>backup</primary>
803 <secondary>using LVM</secondary>
804 </indexterm>Using LVM Snapshots with the Lustre File System</title>
805 <para>If you want to perform disk-based backups (because, for example,
806 access to the backup system needs to be as fast as to the primary Lustre
807 file system), you can use the Linux LVM snapshot tool to maintain multiple,
808 incremental file system backups.</para>
809 <para>Because LVM snapshots cost CPU cycles as new files are written,
810 taking snapshots of the main Lustre file system will probably result in
811 unacceptable performance losses. You should create a new, backup Lustre
812 file system and periodically (e.g., nightly) back up new/changed files to
813 it. Periodic snapshots can be taken of this backup file system to create a
814 series of "full" backups.</para>
816 <para>Creating an LVM snapshot is not as reliable as making a separate
817 backup, because the LVM snapshot shares the same disks as the primary MDT
818 device, and depends on the primary MDT device for much of its data. If
819 the primary MDT device becomes corrupted, this may result in the snapshot
820 being corrupted.</para>
825 <primary>backup</primary>
826 <secondary>using LVM</secondary>
827 <tertiary>creating</tertiary>
828 </indexterm>Creating an LVM-based Backup File System</title>
829 <para>Use this procedure to create a backup Lustre file system for use
830 with the LVM snapshot mechanism.</para>
833 <para>Create LVM volumes for the MDT and OSTs.</para>
834 <para>Create LVM devices for your MDT and OST targets. Make sure not
835 to use the entire disk for the targets; save some room for the
836 snapshots. The snapshots start out as 0 size, but grow as you make
837 changes to the current file system. If you expect to change 20% of
838 the file system between backups, the most recent snapshot will be 20%
839 of the target size, the next older one will be 40%, etc. Here is an
841 <screen>cfs21:~# pvcreate /dev/sda1
842 Physical volume "/dev/sda1" successfully created
843 cfs21:~# vgcreate vgmain /dev/sda1
844 Volume group "vgmain" successfully created
845 cfs21:~# lvcreate -L200G -nMDT0 vgmain
846 Logical volume "MDT0" created
847 cfs21:~# lvcreate -L200G -nOST0 vgmain
848 Logical volume "OST0" created
cfs21:~# lvscan
ACTIVE '/dev/vgmain/MDT0' [200.00 GB] inherit
851 ACTIVE '/dev/vgmain/OST0' [200.00 GB] inherit</screen>
854 <para>Format the LVM volumes as Lustre targets.</para>
855 <para>In this example, the backup file system is called
<literal>main</literal> and designates the current, most up-to-date
backup.</para>
858 <screen>cfs21:~# mkfs.lustre --fsname=main --mdt --index=0 /dev/vgmain/MDT0
859 No management node specified, adding MGS to this MDT.
866 (MDT MGS first_time update )
867 Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
869 checking for existing Lustre data
871 formatting backing filesystem ldiskfs on /dev/vgmain/MDT0
872 target name main-MDT0000
874 options -i 4096 -I 512 -q -O dir_index -F
875 mkfs_cmd = mkfs.ext2 -j -b 4096 -L main-MDT0000 -i 4096 -I 512 -q
876 -O dir_index -F /dev/vgmain/MDT0
877 Writing CONFIGS/mountdata
cfs21:~# mkfs.lustre --mgsnode=cfs21 --fsname=main --ost --index=0 /dev/vgmain/OST0
886 (OST first_time update )
887 Persistent mount opts: errors=remount-ro,extents,mballoc
888 Parameters: mgsnode=192.168.0.21@tcp
889 checking for existing Lustre data
891 formatting backing filesystem ldiskfs on /dev/vgmain/OST0
892 target name main-OST0000
894 options -I 256 -q -O dir_index -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L main-OST0000 -J size=400 -I 256
896 -i 262144 -O extents,uninit_bg,dir_nlink,huge_file,flex_bg -G 256
897 -E resize=4290772992,lazy_journal_init, -F /dev/vgmain/OST0
898 Writing CONFIGS/mountdata
899 cfs21:~# mount -t lustre /dev/vgmain/MDT0 /mnt/mdt
900 cfs21:~# mount -t lustre /dev/vgmain/OST0 /mnt/ost
901 cfs21:~# mount -t lustre cfs21:/main /mnt/main
909 <primary>backup</primary>
910 <secondary>new/changed files</secondary>
</indexterm>Backing up New/Changed Files to the Backup File
System</title>
<para>At periodic intervals (e.g., nightly), back up new and changed files
914 to the LVM-based backup file system.</para>
915 <screen>cfs21:~# cp /etc/passwd /mnt/main
917 cfs21:~# cp /etc/fstab /mnt/main
919 cfs21:~# ls /mnt/main
920 fstab passwd</screen>
925 <primary>backup</primary>
926 <secondary>using LVM</secondary>
927 <tertiary>creating snapshots</tertiary>
928 </indexterm>Creating Snapshot Volumes</title>
929 <para>Whenever you want to make a "checkpoint" of the main Lustre file
930 system, create LVM snapshots of all target MDT and OSTs in the LVM-based
931 backup file system. You must decide the maximum size of a snapshot ahead
932 of time, although you can dynamically change this later. The size of a
933 daily snapshot is dependent on the amount of data changed daily in the
934 main Lustre file system. It is likely that a two-day old snapshot will be
935 twice as big as a one-day old snapshot.</para>
936 <para>You can create as many snapshots as you have room for in the volume
group. If necessary, you can dynamically add disks to the volume
group.</para>
939 <para>The snapshots of the target MDT and OSTs should be taken at the
940 same point in time. Make sure that the cronjob updating the backup file
941 system is not running, since that is the only thing writing to the disks.
942 Here is an example:</para>
943 <screen>cfs21:~# modprobe dm-snapshot
944 cfs21:~# lvcreate -L50M -s -n MDT0.b1 /dev/vgmain/MDT0
945 Rounding up size to full physical extent 52.00 MB
946 Logical volume "MDT0.b1" created
947 cfs21:~# lvcreate -L50M -s -n OST0.b1 /dev/vgmain/OST0
948 Rounding up size to full physical extent 52.00 MB
Logical volume "OST0.b1" created</screen>
951 <para>After the snapshots are taken, you can continue to back up
new/changed files to "main". The snapshots will not contain the new
files.</para>
954 <screen>cfs21:~# cp /etc/termcap /mnt/main
cfs21:~# ls /mnt/main
fstab  passwd  termcap</screen>
962 <primary>backup</primary>
963 <secondary>using LVM</secondary>
964 <tertiary>restoring</tertiary>
965 </indexterm>Restoring the File System From a Snapshot</title>
<para>Use this procedure to restore the file system from an LVM
snapshot.</para>
970 <para>Rename the LVM snapshot.</para>
<para>Rename the file system snapshot from "main" to "back" so you
can mount it without unmounting "main". This is recommended, but not
required. Use the
<literal>--reformat</literal> flag to
<literal>tunefs.lustre</literal> to force the name change. For
example:</para>
977 <screen>cfs21:~# tunefs.lustre --reformat --fsname=back --writeconf /dev/vgmain/MDT0.b1
978 checking for existing Lustre data
980 Reading CONFIGS/mountdata
981 Read previous values:
988 Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
997 Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
999 Writing CONFIGS/mountdata
1000 cfs21:~# tunefs.lustre --reformat --fsname=back --writeconf /dev/vgmain/OST0.b1
1001 checking for existing Lustre data
1003 Reading CONFIGS/mountdata
1004 Read previous values:
1005 Target: main-OST0000
1011 Persistent mount opts: errors=remount-ro,extents,mballoc
1012 Parameters: mgsnode=192.168.0.21@tcp
1013 Permanent disk data:
1014 Target: back-OST0000
1020 Persistent mount opts: errors=remount-ro,extents,mballoc
1021 Parameters: mgsnode=192.168.0.21@tcp
1022 Writing CONFIGS/mountdata
<para>When renaming a file system, you must also erase the
<literal>last_rcvd</literal> file from the snapshots:</para>
1026 <screen>cfs21:~# mount -t ldiskfs /dev/vgmain/MDT0.b1 /mnt/mdtback
1027 cfs21:~# rm /mnt/mdtback/last_rcvd
1028 cfs21:~# umount /mnt/mdtback
1029 cfs21:~# mount -t ldiskfs /dev/vgmain/OST0.b1 /mnt/ostback
1030 cfs21:~# rm /mnt/ostback/last_rcvd
1031 cfs21:~# umount /mnt/ostback</screen>
<para>Mount the file system from the LVM snapshot. For
example:</para>
1036 <screen>cfs21:~# mount -t lustre /dev/vgmain/MDT0.b1 /mnt/mdtback
1037 cfs21:~# mount -t lustre /dev/vgmain/OST0.b1 /mnt/ostback
1038 cfs21:~# mount -t lustre cfs21:/back /mnt/back</screen>
<para>Note the old directory contents, as of the snapshot time. For
example:</para>
<screen>cfs21:~/cfs/b1_5/lustre/utils# ls /mnt/back
fstab  passwd  termcap</screen>
1049 <section remap="h3">
1052 <primary>backup</primary>
1053 <secondary>using LVM</secondary>
1054 <tertiary>deleting</tertiary>
1055 </indexterm>Deleting Old Snapshots</title>
1056 <para>To reclaim disk space, you can erase old snapshots as your backup
1057 policy dictates. Run:</para>
1058 <screen>lvremove /dev/vgmain/MDT0.b1</screen>
1060 <section remap="h3">
1063 <primary>backup</primary>
1064 <secondary>using LVM</secondary>
1065 <tertiary>resizing</tertiary>
1066 </indexterm>Changing Snapshot Volume Size</title>
1067 <para>You can also extend or shrink snapshot volumes if you find your
1068 daily deltas are smaller or larger than expected. Run:</para>
1069 <screen>lvextend -L10G /dev/vgmain/MDT0.b1</screen>
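<para>Correspondingly, a snapshot volume that turned out larger than
necessary can be shrunk; for example:</para>
<screen>lvreduce -L5G /dev/vgmain/MDT0.b1</screen>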
<para>Extending snapshots is broken in older LVM releases. It
works in LVM v2.02.01 and later.</para>
1076 <section xml:id="migrate_backends" condition="l2B">
1079 <primary>backup</primary>
1080 <secondary>ZFS ZPL</secondary>
</indexterm>Migration Between ZFS and ldiskfs Target
Filesystems</title>
1083 <para>Beginning with Lustre 2.11.0, it is possible to migrate between
1084 ZFS and ldiskfs backends. For migrating OSTs, it is best to use
1085 <literal>lfs find</literal>/<literal>lfs_migrate</literal> to empty out
1086 an OST while the filesystem is in use and then reformat it with the new
1087 fstype. For instructions on removing the OST, please see
1088 <xref linkend="section_remove_ost"/>.</para>
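<para>A sketch of draining an OST before reformatting it, assuming the
file system is mounted at <literal>/mnt/testfs</literal> and the OST to
be emptied is <literal>testfs-OST0001</literal>:</para>
<screen>[client]# lfs find --ost testfs-OST0001 /mnt/testfs | lfs_migrate -y</screen>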
1089 <section remap="h3" xml:id="migrate_backends.zfs2ldiskfs">
1092 <primary>backup</primary>
1093 <secondary>ZFS to ldiskfs</secondary>
1094 </indexterm>Migrate from a ZFS to an ldiskfs based filesystem</title>
1095 <para>The first step of the process is to make a ZFS backend backup
1096 using <literal>tar</literal> as described in
1097 <xref linkend="backup_fs_level"/>.</para>
1098 <para>Next, restore the backup to an ldiskfs-based system as described
1099 in <xref linkend="backup_fs_level.restore"/>.</para>
1101 <section remap="h3" xml:id="migrate_backends.ldiskfs2zfs">
1104 <primary>backup</primary>
<secondary>ldiskfs to ZFS</secondary>
1106 </indexterm>Migrate from an ldiskfs to a ZFS based filesystem</title>
1107 <para>The first step of the process is to make an ldiskfs backend backup
1108 using <literal>tar</literal> as described in
1109 <xref linkend="backup_fs_level"/>.</para>
<para><emphasis role="strong">Caution:</emphasis> For a migration from
ldiskfs to zfs, it is required to enable <literal>index_backup</literal>
before the unmount of the target. Compared with a regular ldiskfs-based
backup/restore, this is an additional step that is easy to miss.</para>
<para>Next, restore the backup to a zfs-based system as described
in <xref linkend="backup_fs_level.restore"/>.</para>