BackupAndRestore.xml

   1 <?xml version='1.0' encoding='UTF-8'?>
   2 <chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="backupandrestore">
   3   <title xml:id="backupandrestore.title">Backing Up and Restoring a File System</title>
  <para>This chapter describes how to back up and restore at the file system level, device level, and
    file level in a Lustre file system. Each backup approach is described in the following
    sections:</para>
   7   <itemizedlist>
   8     <listitem>
   9       <para><xref linkend="dbdoclet.50438207_56395"/></para>
  10     </listitem>
  11     <listitem>
  12       <para><xref linkend="dbdoclet.50438207_71633"/></para>
  13     </listitem>
  14     <listitem>
  15       <para><xref linkend="dbdoclet.50438207_21638"/></para>
  16     </listitem>
  17     <listitem>
  18       <para><xref linkend="dbdoclet.50438207_22325"/></para>
  19     </listitem>
  20     <listitem>
  21       <para><xref linkend="dbdoclet.50438207_31553"/></para>
  22     </listitem>
  23   </itemizedlist>
  24   <section xml:id="dbdoclet.50438207_56395">
  25       <title>
  26           <indexterm><primary>backup</primary></indexterm>
  27           <indexterm><primary>restoring</primary><see>backup</see></indexterm>
  28           <indexterm><primary>LVM</primary><see>backup</see></indexterm>
  29           <indexterm><primary>rsync</primary><see>backup</see></indexterm>
  30           Backing up a File System</title>
  31     <para>Backing up a complete file system gives you full control over the files to back up, and
  32       allows restoration of individual files as needed. File system-level backups are also the
  33       easiest to integrate into existing backup solutions.</para>
    <para>File system backups are performed from a Lustre client (or from many clients working in parallel on different directories) rather than on individual server nodes; this is no different from backing up any other file system.</para>
  35     <para>However, due to the large size of most Lustre file systems, it is not always possible to get a complete backup. We recommend that you back up subsets of a file system. This includes subdirectories of the entire file system, filesets for a single user, files incremented by date, and so on.</para>
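    <para>As a minimal sketch of backing up one subset, the commands below archive a single user's directory from a Lustre client. The mount point <literal>/mnt/lustre</literal>, the directory <literal>home/user1</literal>, and the destination <literal>/backup</literal> are hypothetical names chosen for illustration; a second client could run the same commands against a different directory in parallel.</para>
    <screen>client# cd /mnt/lustre/home
client# tar czf /backup/user1-$(date +%Y%m%d).tgz user1</screen>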
  36     <note>
  37       <para>In order to allow the file system namespace to scale for future applications, Lustre
  38         software release 2.x internally uses a 128-bit file identifier for all files. To interface
  39         with user applications, the Lustre software presents 64-bit inode numbers for the
  40           <literal>stat()</literal>, <literal>fstat()</literal>, and <literal>readdir()</literal>
  41         system calls on 64-bit applications, and 32-bit inode numbers to 32-bit applications.</para>
      <para>Some 32-bit applications accessing Lustre file systems (on both 32-bit and 64-bit CPUs)
        may experience problems with the <literal>stat()</literal>, <literal>fstat()</literal>,
        or <literal>readdir()</literal> system calls under certain circumstances, though the
        Lustre client should return 32-bit inode numbers to these applications.</para>
  46       <para>In particular, if the Lustre file system is exported from a 64-bit client via NFS to a
  47         32-bit client, the Linux NFS server will export 64-bit inode numbers to applications running
  48         on the NFS client. If the 32-bit applications are not compiled with Large File Support
  49         (LFS), then they return <literal>EOVERFLOW</literal> errors when accessing the Lustre files.
  50         To avoid this problem, Linux NFS clients can use the kernel command-line option
  51           &quot;<literal>nfs.enable_ino64=0</literal>&quot; in order to force the NFS client to
  52         export 32-bit inode numbers to the client.</para>
      <para><emphasis role="bold">Workaround</emphasis>: We very strongly recommend that backups using <literal>tar(1)</literal> and other utilities that depend on the inode number to uniquely identify an inode be run on 64-bit clients. The 128-bit Lustre file identifiers cannot be uniquely mapped to a 32-bit inode number, and as a result these utilities may operate incorrectly on 32-bit clients.</para>
  54     </note>
  55     <section remap="h3">
  56       <title><indexterm><primary>backup</primary><secondary>rsync</secondary></indexterm>Lustre_rsync</title>
      <para>The <literal>lustre_rsync</literal> feature keeps the entire file system in sync on a backup by replicating the file system&apos;s changes to a second file system (the second file system need not be a Lustre file system, but it must be sufficiently large). <literal>lustre_rsync</literal> uses Lustre changelogs to efficiently synchronize the file systems without having to scan (directory walk) the Lustre file system. This efficiency is critically important for large file systems, and distinguishes the Lustre <literal>lustre_rsync</literal> feature from other replication/backup solutions.</para>
  58       <section remap="h4">
  59           <title><indexterm><primary>backup</primary><secondary>rsync</secondary><tertiary>using</tertiary></indexterm>Using Lustre_rsync</title>
  60         <para>The <literal>lustre_rsync</literal> feature works by periodically running <literal>lustre_rsync</literal>, a userspace program used to synchronize changes in the Lustre file system onto the target file system. The <literal>lustre_rsync</literal> utility keeps a status file, which enables it to be safely interrupted and restarted without losing synchronization between the file systems.</para>
        <para>The first time that <literal>lustre_rsync</literal> is run, the user must specify a set of parameters for the program to use. These parameters are described in the following table and in <xref linkend="dbdoclet.50438219_63667"/>. On subsequent runs, these parameters are stored in the status file, and only the name of the status file needs to be passed to <literal>lustre_rsync</literal>.</para>
  62         <para>Before using <literal>lustre_rsync</literal>:</para>
  63         <itemizedlist>
  64           <listitem>
            <para>Register the changelog user. For details, see the <literal>changelog_register</literal> parameter in <xref linkend="systemconfigurationutilities"/> (<literal>lctl</literal>).</para>
  66           </listitem>
  67         </itemizedlist>
  68         <para>- AND -</para>
  69         <itemizedlist>
  70           <listitem>
            <para>Verify that the Lustre file system (source) and the replica file system (target) are identical <emphasis>before</emphasis> registering the changelog user. If the file systems differ, use a utility such as regular <literal>rsync</literal> (not <literal>lustre_rsync</literal>) to make them identical.</para>
  72           </listitem>
  73         </itemizedlist>
  74         <para>The <literal>lustre_rsync</literal> utility uses the following parameters:</para>
  75         <informaltable frame="all">
  76           <tgroup cols="2">
  77             <colspec colname="c1" colwidth="3*"/>
  78             <colspec colname="c2" colwidth="10*"/>
  79             <thead>
  80               <row>
  81                 <entry>
  82                   <para><emphasis role="bold">Parameter</emphasis></para>
  83                 </entry>
  84                 <entry>
  85                   <para><emphasis role="bold">Description</emphasis></para>
  86                 </entry>
  87               </row>
  88             </thead>
  89             <tbody>
  90               <row>
  91                 <entry>
  92                   <para> <literal>--source=<replaceable>src</replaceable></literal></para>
  93                 </entry>
  94                 <entry>
  95                   <para>The path to the root of the Lustre file system (source) which will be synchronized. This is a mandatory option if a valid status log created during a previous synchronization operation (<literal>--statuslog</literal>) is not specified.</para>
  96                 </entry>
  97               </row>
  98               <row>
  99                 <entry>
 100                   <para> <literal>--target=<replaceable>tgt</replaceable></literal></para>
 101                 </entry>
 102                 <entry>
 103                   <para>The path to the root where the source file system will be synchronized (target). This is a mandatory option if the status log created during a previous synchronization operation (<literal>--statuslog</literal>) is not specified. This option can be repeated if multiple synchronization targets are desired.</para>
 104                 </entry>
 105               </row>
 106               <row>
 107                 <entry>
 108                   <para> <literal>--mdt=<replaceable>mdt</replaceable></literal></para>
 109                 </entry>
 110                 <entry>
 111                   <para>The metadata device to be synchronized. A changelog user must be registered for this device. This is a mandatory option if a valid status log created during a previous synchronization operation (<literal>--statuslog</literal>) is not specified.</para>
 112                 </entry>
 113               </row>
 114               <row>
 115                 <entry>
 116                   <para> <literal>--user=<replaceable>userid</replaceable></literal></para>
 117                 </entry>
 118                 <entry>
 119                   <para>The changelog user ID for the specified MDT. To use <literal>lustre_rsync</literal>, the changelog user must be registered. For details, see the <literal>changelog_register</literal> parameter in <xref linkend="systemconfigurationutilities"/> (<literal>lctl</literal>). This is a mandatory option if a valid status log created during a previous synchronization operation (<literal>--statuslog</literal>) is not specified.</para>
 120                 </entry>
 121               </row>
 122               <row>
 123                 <entry>
 124                   <para> <literal>--statuslog=<replaceable>log</replaceable></literal></para>
 125                 </entry>
 126                 <entry>
 127                   <para>A log file to which synchronization status is saved. When the <literal>lustre_rsync</literal> utility starts, if the status log from a previous synchronization operation is specified, then the state is read from the log and otherwise mandatory <literal>--source</literal>, <literal>--target</literal> and <literal>--mdt</literal> options can be skipped. Specifying the <literal>--source</literal>, <literal>--target</literal> and/or <literal>--mdt</literal> options, in addition to the <literal>--statuslog</literal> option, causes the specified parameters in the status log to be overridden. Command line options take precedence over options in the status log.</para>
 128                 </entry>
 129               </row>
 130               <row>
 131                 <entry>
                  <para> <literal>--xattr <replaceable>yes|no</replaceable></literal></para>
 133                 </entry>
 134                 <entry>
 135                   <para>Specifies whether extended attributes (<literal>xattrs</literal>) are synchronized or not. The default is to synchronize extended attributes.</para>
 136                   <para><note>
 137                       <para>Disabling xattrs causes Lustre striping information not to be synchronized.</para>
 138                     </note></para>
 139                 </entry>
 140               </row>
 141               <row>
 142                 <entry>
 143                   <para> <literal>--verbose</literal></para>
 144                 </entry>
 145                 <entry>
 146                   <para>Produces verbose output.</para>
 147                 </entry>
 148               </row>
 149               <row>
 150                 <entry>
 151                   <para> <literal>--dry-run</literal></para>
 152                 </entry>
 153                 <entry>
 154                   <para>Shows the output of <literal>lustre_rsync</literal> commands (<literal>copy</literal>, <literal>mkdir</literal>, etc.) on the target file system without actually executing them.</para>
 155                 </entry>
 156               </row>
 157               <row>
 158                 <entry>
 159                   <para> <literal>--abort-on-err</literal></para>
 160                 </entry>
 161                 <entry>
 162                   <para>Stops processing the <literal>lustre_rsync</literal> operation if an error occurs. The default is to continue the operation.</para>
 163                 </entry>
 164               </row>
 165             </tbody>
 166           </tgroup>
 167         </informaltable>
 168       </section>
 169       <section remap="h4">
 170           <title><indexterm><primary>backup</primary><secondary>rsync</secondary><tertiary>examples</tertiary></indexterm><literal>lustre_rsync</literal> Examples</title>
 171         <para>Sample <literal>lustre_rsync</literal> commands are listed below.</para>
 172         <para>Register a changelog user for an MDT (e.g. <literal>testfs-MDT0000</literal>).</para>
        <screen># lctl --device testfs-MDT0000 changelog_register testfs-MDT0000
Registered changelog userid &apos;cl1&apos;</screen>
 175         <para>Synchronize a Lustre file system (<literal>/mnt/lustre</literal>) to a target file system (<literal>/mnt/target</literal>).</para>
        <screen>$ lustre_rsync --source=/mnt/lustre --target=/mnt/target \
           --mdt=testfs-MDT0000 --user=cl1 --statuslog sync.log --verbose
Lustre filesystem: testfs
MDT device: testfs-MDT0000
Source: /mnt/lustre
Target: /mnt/target
Statuslog: sync.log
Changelog registration: cl1
Starting changelog record: 0
Errors: 0
lustre_rsync took 1 seconds
Changelog records consumed: 22</screen>
 188         <para>After the file system undergoes changes, synchronize the changes onto the target file system. Only the <literal>statuslog</literal> name needs to be specified, as it has all the parameters passed earlier.</para>
        <screen>$ lustre_rsync --statuslog sync.log --verbose
Replicating Lustre filesystem: testfs
MDT device: testfs-MDT0000
Source: /mnt/lustre
Target: /mnt/target
Statuslog: sync.log
Changelog registration: cl1
Starting changelog record: 22
Errors: 0
lustre_rsync took 2 seconds
Changelog records consumed: 42</screen>
        <para>Synchronize a Lustre file system (<literal>/mnt/lustre</literal>) to two target file systems (<literal>/mnt/target1</literal> and <literal>/mnt/target2</literal>).</para>
        <screen>$ lustre_rsync --source=/mnt/lustre --target=/mnt/target1 \
           --target=/mnt/target2 --mdt=testfs-MDT0000 --user=cl1 \
           --statuslog sync.log</screen>
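        <para>Because later runs need only the status log, periodic synchronization can be wrapped in a small script. The following is a sketch under stated assumptions, not a supported tool: the status log location <literal>/var/lib/lustre_rsync/sync.log</literal> is hypothetical, the other values are taken from the examples above, and a non-empty status log is assumed to be a valid one from an earlier run.</para>
        <screen>#!/bin/sh
# Hypothetical status log location; adjust for the local site.
STATUSLOG=/var/lib/lustre_rsync/sync.log

if [ -s "$STATUSLOG" ]; then
    # Later runs: source, target and MDT are read from the status log.
    lustre_rsync --statuslog "$STATUSLOG" --abort-on-err
else
    # First run: all mandatory parameters must be given explicitly.
    lustre_rsync --source=/mnt/lustre --target=/mnt/target \
        --mdt=testfs-MDT0000 --user=cl1 \
        --statuslog "$STATUSLOG" --abort-on-err
fi</screen>
        <para>Adding <literal>--dry-run</literal> to either command shows what would be done on the target without executing it.</para>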
 204       </section>
 205     </section>
 206   </section>
 207   <section xml:id="dbdoclet.50438207_71633">
 208       <title><indexterm><primary>backup</primary><secondary>MDS/OST device level</secondary></indexterm>Backing Up and Restoring an MDS or OST (Device Level)</title>
    <para>In some cases, it is useful to do a full device-level backup of an individual device (MDT or OST), before replacing hardware, performing maintenance, etc. Doing full device-level backups ensures that all of the data and configuration files are preserved in the original state and is the easiest method of doing a backup. For the MDT file system, it may also be the fastest way to perform the backup and restore, since it can do large streaming read and write operations at the maximum bandwidth of the underlying devices.</para>
 210     <note>
 211       <para>Keeping an updated full backup of the MDT is especially important because a permanent failure of the MDT file system renders the much larger amount of data in all the OSTs largely inaccessible and unusable.</para>
 212     </note>
 213     <warning condition='l23'>
        <para>In Lustre software release 2.0 through 2.2, the only successful way to back up and
 215         restore an MDT is to do a device-level backup as is described in this section. File-level
 216         restore of an MDT is not possible before Lustre software release 2.3, as the Object Index
 217         (OI) file cannot be rebuilt after restore without the OI Scrub functionality. <emphasis
 218           role="bold">Since Lustre software release 2.3</emphasis>, Object Index files are
 219         automatically rebuilt at first mount after a restore is detected (see <link
 220           xl:href="http://jira.hpdd.intel.com/browse/LU-957">LU-957</link>), and file-level backup
 221         is supported (see <xref linkend="dbdoclet.50438207_21638"/>).</para>
 222     </warning>
 223     <para>If hardware replacement is the reason for the backup or if a spare storage device is available, it is possible to do a raw copy of the MDT or OST from one block device to the other, as long as the new device is at least as large as the original device. To do this, run:</para>
 224     <screen>dd if=/dev/{original} of=/dev/{newdev} bs=1M</screen>
 225     <para>If hardware errors cause read problems on the original device, use the command below to allow as much data as possible to be read from the original device while skipping sections of the disk with errors:</para>
    <screen>dd if=/dev/{original} of=/dev/{newdev} bs=4k conv=sync,noerror \
      count={original size in 4kB blocks}</screen>
 228     <para>Even in the face of hardware errors, the <literal>ldiskfs</literal>
 229     file system is very robust and it may be possible to recover the file
 230     system data after running <literal>e2fsck -fy /dev/{newdev}</literal> on
 231     the new device, along with <literal>ll_recover_lost_found_objs</literal>
 232     for OST devices.</para>
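    <para>The <literal>count</literal> value in the <literal>dd</literal> command above is the device size divided by the 4 kB block size. The following is a minimal sketch of that arithmetic; the helper name <literal>blocks_4k</literal> is hypothetical, and the byte count is assumed to come from a tool such as <literal>blockdev --getsize64</literal>. The sample value is the size of a 200 GiB device.</para>
    <screen># Convert a device size in bytes to a count of 4096-byte blocks.
# The size could be obtained with: blockdev --getsize64 /dev/{original}
blocks_4k() {
    echo $(( $1 / 4096 ))
}

blocks_4k 214748364800    # prints 52428800</screen>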
 233   </section>
 234   <section xml:id="dbdoclet.50438207_21638">
 235       <title><indexterm><primary>backup</primary><secondary>OST file system</secondary></indexterm><indexterm><primary>backup</primary><secondary>MDT file system</secondary></indexterm>Making a File-Level Backup of an OST or MDT File System</title>
    <para>This procedure provides an alternative way to back up or migrate the data of an OST or MDT at the file level. At the file level, unused space is omitted from the backup, so the process may complete more quickly and produce a smaller total backup. Backing up a single OST device is not necessarily the best way to perform backups of the Lustre file system, since the files stored in the backup are not usable without metadata stored on the MDT and additional file stripes that may be on other OSTs. However, it is the preferred method for migration of OST devices, especially when it is desirable to reformat the underlying file system with different configuration options or to reduce fragmentation.</para>
 237     <note>
 238         <para>Prior to Lustre software release 2.3, the only successful way to perform an MDT backup
 239         and restore is to do a device-level backup as is described in <xref
 240           linkend="dbdoclet.50438207_71633"/>. The ability to do MDT file-level backups is not
 241         available for Lustre software release 2.0 through 2.2, because restoration of the Object
 242         Index (OI) file does not return the MDT to a functioning state. <emphasis role="bold">Since
 243           Lustre software release 2.3</emphasis>, Object Index files are automatically rebuilt at
 244         first mount after a restore is detected (see <link
 245           xl:href="http://jira.hpdd.intel.com/browse/LU-957">LU-957</link>), so file-level MDT
 246         restore is supported.</para>
 247     </note>
 248     <para>For Lustre software release 2.3 and newer with MDT file-level backup support, substitute
 249         <literal>mdt</literal> for <literal>ost</literal> in the instructions below.</para>
 250     <orderedlist>
 251       <listitem>
 252         <para><emphasis role="bold">Make a mountpoint for the file system.</emphasis></para>
 253         <screen>[oss]# mkdir -p /mnt/ost</screen>
 254       </listitem>
 255       <listitem>
 256         <para><emphasis role="bold">Mount the file system.</emphasis></para>
 257         <screen>[oss]# mount -t ldiskfs /dev/<emphasis>{ostdev}</emphasis> /mnt/ost</screen>
 258       </listitem>
 259       <listitem>
 260         <para><emphasis role="bold">Change to the mountpoint being backed up.</emphasis></para>
 261         <screen>[oss]# cd /mnt/ost</screen>
 262       </listitem>
 263       <listitem>
 264         <para><emphasis role="bold">Back up the extended attributes.</emphasis></para>
 265         <screen>[oss]# getfattr -R -d -m &apos;.*&apos; -e hex -P . &gt; ea-$(date +%Y%m%d).bak</screen>
 266         <note>
 267           <para>If the <literal>tar(1)</literal> command supports the <literal>--xattr</literal> option, the <literal>getfattr</literal> step may be unnecessary as long as tar does a backup of the <literal>trusted.*</literal> attributes. However, completing this step is not harmful and can serve as an added safety measure.</para>
 268         </note>
 269         <note>
 270           <para>In most distributions, the <literal>getfattr</literal> command is part of the <literal>attr</literal> package. If the <literal>getfattr</literal> command returns errors like <literal>Operation not supported</literal>, then the kernel does not correctly support EAs. Stop and use a different backup method.</para>
 271         </note>
 272       </listitem>
 273       <listitem>
 274         <para><emphasis role="bold">Verify that the <literal>ea-$date.bak</literal> file has properly backed up the EA data on the OST.</emphasis></para>
        <para>Without this attribute data, the restore process may be missing extra data that can be very useful in case of later file system corruption. Look at this file with <literal>more</literal> or a text editor. Each object file should have a corresponding item similar to this:</para>
        <screen># file: O/0/d0/100992
trusted.fid= \
0x0d822200000000004a8a73e500000000808a0100000000000000000000000000</screen>
 279       </listitem>
 280       <listitem>
 281         <para><emphasis role="bold">Back up all file system data.</emphasis></para>
 282         <screen>[oss]# tar czvf {backup file}.tgz [--xattrs] --sparse .</screen>
 283         <note>
            <para>The tar <literal>--sparse</literal> option is vital for backing up an MDT. In
            order to have <literal>--sparse</literal> behave correctly, and complete the backup of
            an MDT in finite time, a suitable version of tar must be used. Correctly functioning
 287             versions of tar include the Lustre software enhanced version of tar at <link
 288               xmlns:xlink="http://www.w3.org/1999/xlink"
 289               xlink:href="https://wiki.hpdd.intel.com/display/PUB/Lustre+Tools#LustreTools-lustre-tar"
 290             />, the tar from a Red Hat Enterprise Linux distribution (version 6.3 or more recent)
 291             and the GNU tar version 1.25 or more recent.</para>
 292         </note>
 293         <warning>
 294             <para>The tar <literal>--xattrs</literal> option is only available
 295             in GNU tar distributions from Red Hat or Intel.</para>
 296         </warning>
 297       </listitem>
 298       <listitem>
 299         <para><emphasis role="bold">Change directory out of the file system.</emphasis></para>
 300         <screen>[oss]# cd -</screen>
 301       </listitem>
 302       <listitem>
 303         <para><emphasis role="bold">Unmount the file system.</emphasis></para>
 304         <screen>[oss]# umount /mnt/ost</screen>
 305         <note>
 306           <para>When restoring an OST backup on a different node as part of an OST migration, you also have to change server NIDs and use the <literal>--writeconf</literal> command to re-generate the configuration logs. See <xref linkend="lustremaintenance"/> (Changing a Server NID).</para>
 307         </note>
 308       </listitem>
 309     </orderedlist>
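    <para>The check of the extended attribute backup in the procedure above can be partly scripted. The sketch below counts the per-file records and the <literal>trusted.fid</literal> entries in the dump. The function names are hypothetical, and the comparison is only a rough sanity check that assumes the <literal>getfattr</literal> output format shown above. For an OST backup, the two counts would be expected to be close, since each object file should carry a <literal>trusted.fid</literal> attribute.</para>
    <screen># Count the "# file: ..." records in a getfattr dump.
count_ea_files() {
    grep -c '^# file: ' "$1"
}

# Count the records that carry a trusted.fid attribute.
count_ea_fids() {
    grep -c '^trusted.fid=' "$1"
}

# Example use, with a hypothetical dump file name:
#   count_ea_files ea-20240101.bak
#   count_ea_fids ea-20240101.bak</screen>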
 310   </section>
 311   <section xml:id="dbdoclet.50438207_22325">
 312     <title><indexterm><primary>backup</primary><secondary>restoring file system backup</secondary></indexterm>Restoring a File-Level Backup</title>
 313     <para>To restore data from a file-level backup, you need to format the device, restore the file data and then restore the EA data.</para>
 314     <orderedlist>
 315       <listitem>
 316         <para>Format the new device.</para>
 317         <screen>[oss]# mkfs.lustre --ost --index {<emphasis>OST index</emphasis>} {<emphasis>other options</emphasis>} /dev/<emphasis>{newdev}</emphasis></screen>
 318       </listitem>
 319       <listitem>
 320         <para>Set the file system label.</para>
        <screen>[oss]# e2label /dev/<emphasis>{newdev}</emphasis> {fsname}-OST{index in hex}</screen>
 322       </listitem>
 323       <listitem>
 324         <para>Mount the file system.</para>
 325         <screen>[oss]# mount -t ldiskfs /dev/<emphasis>{newdev}</emphasis> /mnt/ost</screen>
 326       </listitem>
 327       <listitem>
 328         <para>Change to the new file system mount point.</para>
 329         <screen>[oss]# cd /mnt/ost</screen>
 330       </listitem>
 331       <listitem>
 332         <para>Restore the file system backup.</para>
 333         <screen>[oss]# tar xzvpf <emphasis>{backup file}</emphasis> [--xattrs] --sparse</screen>
 334       </listitem>
 335       <listitem>
 336         <para>Restore the file system extended attributes.</para>
 337         <screen>[oss]# setfattr --restore=ea-${date}.bak</screen>
 338         <note>
 339             <para>If <literal>--xattrs</literal> option is supported by tar and specified in the step above, this step is redundant.</para>
 340         </note>
 341       </listitem>
 342       <listitem>
 343         <para>Verify that the extended attributes were restored.</para>
        <screen>[oss]# getfattr -d -m &quot;.*&quot; -e hex O/0/d0/100992
trusted.fid= \
0x0d822200000000004a8a73e500000000808a0100000000000000000000000000</screen>
 346       </listitem>
 347       <listitem>
 348         <para>Change directory out of the file system.</para>
 349         <screen>[oss]# cd -</screen>
 350       </listitem>
 351       <listitem>
 352         <para>Unmount the new file system.</para>
 353         <screen>[oss]# umount /mnt/ost</screen>
 354       </listitem>
 355     </orderedlist>
 356     <para>If the file system was used between the time the backup was made and when it was restored, then the <literal>lfsck</literal> tool (part of Lustre <literal>e2fsprogs</literal>) can optionally be run to ensure the file system is coherent. If all of the device file systems were backed up at the same time after the entire Lustre file system was stopped, this is not necessary. In either case, the file system should be immediately usable even if <literal>lfsck</literal> is not run, though there may be I/O errors reading from files that are present on the MDT but not the OSTs, and files that were created after the MDT backup will not be accessible/visible.</para>
 357   </section>
 358   <section xml:id="dbdoclet.50438207_31553">
 359     <title><indexterm>
 360         <primary>backup</primary>
 361         <secondary>using LVM</secondary>
 362       </indexterm>Using LVM Snapshots with the Lustre File System</title>
 363     <para>If you want to perform disk-based backups (because, for example, access to the backup system needs to be as fast as to the primary Lustre file system), you can use the Linux LVM snapshot tool to maintain multiple, incremental file system backups.</para>
 364     <para>Because LVM snapshots cost CPU cycles as new files are written, taking snapshots of the main Lustre file system will probably result in unacceptable performance losses. You should create a new, backup Lustre file system and periodically (e.g., nightly) back up new/changed files to it. Periodic snapshots can be taken of this backup file system to create a series of &quot;full&quot; backups.</para>
 365     <note>
 366       <para>Creating an LVM snapshot is not as reliable as making a separate backup, because the LVM snapshot shares the same disks as the primary MDT device, and depends on the primary MDT device for much of its data. If the primary MDT device becomes corrupted, this may result in the snapshot being corrupted.</para>
 367     </note>
 368     <section remap="h3">
 369         <title><indexterm><primary>backup</primary><secondary>using LVM</secondary><tertiary>creating</tertiary></indexterm>Creating an LVM-based Backup File System</title>
 370       <para>Use this procedure to create a backup Lustre file system for use with the LVM snapshot mechanism.</para>
 371       <orderedlist>
 372         <listitem>
 373           <para>Create LVM volumes for the MDT and OSTs.</para>
 374           <para>Create LVM devices for your MDT and OST targets. Make sure not to use the entire disk for the targets; save some room for the snapshots. The snapshots start out as 0 size, but grow as you make changes to the current file system. If you expect to change 20% of the file system between backups, the most recent snapshot will be 20% of the target size, the next older one will be 40%, etc. Here is an example:</para>
          <screen>cfs21:~# pvcreate /dev/sda1
   Physical volume &quot;/dev/sda1&quot; successfully created
cfs21:~# vgcreate vgmain /dev/sda1
   Volume group &quot;vgmain&quot; successfully created
cfs21:~# lvcreate -L200G -nMDT0 vgmain
   Logical volume &quot;MDT0&quot; created
cfs21:~# lvcreate -L200G -nOST0 vgmain
   Logical volume &quot;OST0&quot; created
cfs21:~# lvscan
   ACTIVE                  &apos;/dev/vgmain/MDT0&apos; [200.00 GB] inherit
   ACTIVE                  &apos;/dev/vgmain/OST0&apos; [200.00 GB] inherit</screen>
 386         </listitem>
 387         <listitem>
 388           <para>Format the LVM volumes as Lustre targets.</para>
 389           <para>In this example, the backup file system is called <literal>main</literal> and
 390             designates the current, most up-to-date backup.</para>
 391           <screen>cfs21:~# mkfs.lustre --fsname=main --mdt --index=0 /dev/vgmain/MDT0
 392  No management node specified, adding MGS to this MDT.
 393     Permanent disk data:
 394  Target:     main-MDT0000
 395  Index:      0
 396  Lustre FS:  main
 397  Mount type: ldiskfs
 398  Flags:      0x75
 399                (MDT MGS first_time update )
 400  Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
 401  Parameters:
 402 checking for existing Lustre data
 403  device size = 200GB
 404  formatting backing filesystem ldiskfs on /dev/vgmain/MDT0
 405          target name  main-MDT0000
 406          4k blocks     0
 407          options        -i 4096 -I 512 -q -O dir_index -F
 408  mkfs_cmd = mkfs.ext2 -j -b 4096 -L main-MDT0000  -i 4096 -I 512 -q
 409   -O dir_index -F /dev/vgmain/MDT0
 410  Writing CONFIGS/mountdata
 411 cfs21:~# mkfs.lustre --mgsnode=cfs21 --fsname=main --ost --index=0 /dev/vgmain/OST0
 412     Permanent disk data:
 413  Target:     main-OST0000
 414  Index:      0
 415  Lustre FS:  main
 416  Mount type: ldiskfs
 417  Flags:      0x72
 418                (OST first_time update )
 419  Persistent mount opts: errors=remount-ro,extents,mballoc
 420  Parameters: mgsnode=192.168.0.21@tcp
 421 checking for existing Lustre data
 422  device size = 200GB
 423  formatting backing filesystem ldiskfs on /dev/vgmain/OST0
 424          target name  main-OST0000
 425          4k blocks     0
 426          options        -I 256 -q -O dir_index -F
 427  mkfs_cmd = mkfs.ext2 -j -b 4096 -L main-OST0000 -J size=400 -I 256
 428   -i 262144 -O extents,uninit_bg,dir_nlink,huge_file,flex_bg -G 256
 429   -E resize=4290772992,lazy_journal_init, -F /dev/vgmain/OST0
 430  Writing CONFIGS/mountdata
 431 cfs21:~# mount -t lustre /dev/vgmain/MDT0 /mnt/mdt
 432 cfs21:~# mount -t lustre /dev/vgmain/OST0 /mnt/ost
 433 cfs21:~# mount -t lustre cfs21:/main /mnt/main</screen>
 434         </listitem>
 435       </orderedlist>
 436     </section>
 437     <section remap="h3">
 438         <title><indexterm><primary>backup</primary><secondary>new/changed files</secondary></indexterm>Backing up New/Changed Files to the Backup File System</title>
 439       <para>At periodic intervals (for example, nightly), back up new and changed files to the LVM-based backup file system.</para>
 440       <screen>cfs21:~# cp /etc/passwd /mnt/main
 441
 442 cfs21:~# cp /etc/fstab /mnt/main
 443
 444 cfs21:~# ls /mnt/main
 445 fstab  passwd</screen>
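      <para>The <literal>cp</literal> commands above stand in for whatever job refreshes the backup. As a rough sketch, a nightly job could use <literal>rsync</literal> so that only new and changed files are transferred. The source path <literal>/mnt/lustre/</literal>, the <literal>DRY_RUN</literal> switch and the <literal>run</literal> helper are assumptions made for this illustration; they are not part of Lustre.</para>
      <screen>#!/bin/sh
# Sketch of a nightly refresh of the backup file system (for example, from cron).
# SRC is a hypothetical mount point for the data being protected;
# DEST is the backup file system mounted in the previous section.
SRC=${SRC:-/mnt/lustre/}
DEST=${DEST:-/mnt/main/}

# Print commands by default; set DRY_RUN=0 to execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# -a recurses into directories and preserves permissions, ownership and
# timestamps; rsync skips files whose size and modification time match.
sync_backup() {
    run rsync -a "$SRC" "$DEST"
}

sync_backup</screen>
      <para>With the defaults, the script only prints the <literal>rsync</literal> command it would run, which makes it easy to review before scheduling it with <literal>DRY_RUN=0</literal>.</para>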
 446     </section>
 447     <section remap="h3">
 448         <title><indexterm><primary>backup</primary><secondary>using LVM</secondary><tertiary>creating snapshots</tertiary></indexterm>Creating Snapshot Volumes</title>
 449       <para>Whenever you want to make a &quot;checkpoint&quot; of the main Lustre file system, create LVM snapshots of the MDT and all OSTs in the LVM-based backup file system. You must choose the maximum size of each snapshot ahead of time, although you can change it dynamically later. The size of a daily snapshot depends on the amount of data that changes each day in the main Lustre file system. A two-day-old snapshot is likely to be about twice as large as a one-day-old snapshot.</para>
 450       <para>You can create as many snapshots as you have room for in the volume group. If necessary, you can dynamically add disks to the volume group.</para>
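      <para>A minimal sketch of checking the free space in the volume group and adding a disk to it is shown here. The device name <literal>/dev/sdX</literal> is a placeholder, and the <literal>DRY_RUN</literal> switch and <literal>run</literal> helper are assumptions made for this illustration.</para>
      <screen>#!/bin/sh
# Sketch only: VG matches the example volume group; /dev/sdX is a placeholder.
VG=${VG:-vgmain}

# Print commands by default; set DRY_RUN=0 to execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Show the total and free space in the volume group.
show_free() {
    run vgs -o vg_name,vg_size,vg_free "$VG"
}

# Initialize a disk as an LVM physical volume and add it to the volume group.
add_disk() {
    run pvcreate "$1"
    run vgextend "$VG" "$1"
}

show_free</screen>
      <para>To add a disk, call <literal>add_disk</literal> with the real device name in place of <literal>/dev/sdX</literal>.</para>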
 451       <para>The snapshots of the MDT and OSTs should be taken at the same point in time. Make sure that the cron job that updates the backup file system is not running, since it is the only process that writes to the disks. Here is an example:</para>
 452       <screen>cfs21:~# modprobe dm-snapshot
 453 cfs21:~# lvcreate -L50M -s -n MDT0.b1 /dev/vgmain/MDT0
 454    Rounding up size to full physical extent 52.00 MB
 455    Logical volume &quot;MDT0.b1&quot; created
 456 cfs21:~# lvcreate -L50M -s -n OST0.b1 /dev/vgmain/OST0
 457    Rounding up size to full physical extent 52.00 MB
 458    Logical volume &quot;OST0.b1&quot; created</screen>
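      <para>The commands above can be wrapped in a small script so that every target is snapshotted back to back under one common suffix. This is only a sketch: the date-based suffix, the <literal>TARGETS</literal> list, the <literal>DRY_RUN</literal> switch and the <literal>run</literal> helper are assumptions made for this illustration.</para>
      <screen>#!/bin/sh
# Sketch only: names and sizes are taken from the example above.
VG=${VG:-vgmain}
TARGETS=${TARGETS:-"MDT0 OST0"}
SNAP_SIZE=${SNAP_SIZE:-50M}

# Print commands by default; set DRY_RUN=0 to execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Create one snapshot per target, all sharing the same suffix.
snapshot_all() {
    suffix=$1
    run modprobe dm-snapshot
    for t in $TARGETS; do
        run lvcreate -L"$SNAP_SIZE" -s -n "$t.$suffix" "/dev/$VG/$t"
    done
}

# Hypothetical naming scheme: "b" followed by the current date.
snapshot_all "b$(date +%Y%m%d)"</screen>
      <para>Stop the job that updates the backup file system before running the script, as described above.</para>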
 459       <para>After the snapshots are taken, you can continue to back up new/changed files to &quot;main&quot;. The snapshots will not contain the new files.</para>
 460       <screen>cfs21:~# cp /etc/termcap /mnt/main
 461 cfs21:~# ls /mnt/main
 462 fstab  passwd  termcap</screen>
 463     </section>
 464     <section remap="h3">
 465         <title><indexterm><primary>backup</primary><secondary>using LVM</secondary><tertiary>restoring</tertiary></indexterm>Restoring the File System From a Snapshot</title>
 466       <para>Use this procedure to restore the file system from an LVM snapshot.</para>
 467       <orderedlist>
 468         <listitem>
 469           <para>Rename the LVM snapshot.</para>
 470           <para>Rename the file system snapshot from &quot;main&quot; to &quot;back&quot; so you can mount it without unmounting &quot;main&quot;. This is recommended, but not required. Use the <literal>--reformat</literal> flag to <literal>tunefs.lustre</literal> to force the name change. For example:</para>
 471           <screen>cfs21:~# tunefs.lustre --reformat --fsname=back --writeconf /dev/vgmain/MDT0.b1
 472  checking for existing Lustre data
 473  found Lustre data
 474  Reading CONFIGS/mountdata
 475 Read previous values:
 476  Target:     main-MDT0000
 477  Index:      0
 478  Lustre FS:  main
 479  Mount type: ldiskfs
 480  Flags:      0x5
 481               (MDT MGS )
 482  Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
 483  Parameters:
 484 Permanent disk data:
 485  Target:     back-MDT0000
 486  Index:      0
 487  Lustre FS:  back
 488  Mount type: ldiskfs
 489  Flags:      0x105
 490               (MDT MGS writeconf )
 491  Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
 492  Parameters:
 493 Writing CONFIGS/mountdata
 494 cfs21:~# tunefs.lustre --reformat --fsname=back --writeconf /dev/vgmain/OST0.b1
 495  checking for existing Lustre data
 496  found Lustre data
 497  Reading CONFIGS/mountdata
 498 Read previous values:
 499  Target:     main-OST0000
 500  Index:      0
 501  Lustre FS:  main
 502  Mount type: ldiskfs
 503  Flags:      0x2
 504               (OST )
 505  Persistent mount opts: errors=remount-ro,extents,mballoc
 506  Parameters: mgsnode=192.168.0.21@tcp
 507 Permanent disk data:
 508  Target:     back-OST0000
 509  Index:      0
 510  Lustre FS:  back
 511  Mount type: ldiskfs
 512  Flags:      0x102
 513               (OST writeconf )
 514  Persistent mount opts: errors=remount-ro,extents,mballoc
 515  Parameters: mgsnode=192.168.0.21@tcp
 516 Writing CONFIGS/mountdata</screen>
 517         <para>When renaming a file system, you must also erase the <literal>last_rcvd</literal> file from the
 518             snapshots:</para>
 519           <screen>cfs21:~# mount -t ldiskfs /dev/vgmain/MDT0.b1 /mnt/mdtback
 520 cfs21:~# rm /mnt/mdtback/last_rcvd
 521 cfs21:~# umount /mnt/mdtback
 522 cfs21:~# mount -t ldiskfs /dev/vgmain/OST0.b1 /mnt/ostback
 523 cfs21:~# rm /mnt/ostback/last_rcvd
 524 cfs21:~# umount /mnt/ostback</screen>
 525         </listitem>
 526         <listitem>
 527           <para>Mount the file system from the LVM snapshot.  For example:</para>
 528           <screen>cfs21:~# mount -t lustre /dev/vgmain/MDT0.b1 /mnt/mdtback
 529 cfs21:~# mount -t lustre /dev/vgmain/OST0.b1 /mnt/ostback
 530 cfs21:~# mount -t lustre cfs21:/back /mnt/back</screen>
 531         </listitem>
 532         <listitem>
 533           <para>Verify that the directory contents reflect the state at the time of the snapshot.  For example:</para>
 534           <screen>cfs21:~/cfs/b1_5/lustre/utils# ls /mnt/back
 535 fstab  passwd</screen>
 536         </listitem>
 537       </orderedlist>
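      <para>The rename step in this procedure repeats the same four commands for each snapshot, so it can be sketched as a shell function. The function name, the <literal>DRY_RUN</literal> switch and the <literal>run</literal> helper are assumptions made for this illustration; the commands themselves are the ones shown in the procedure.</para>
      <screen>#!/bin/sh
# Sketch only: device names and mount points are taken from the procedure above.
set -e                           # stop at the first failing command
NEW_FSNAME=${NEW_FSNAME:-back}

# Print commands by default; set DRY_RUN=0 to execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Rename one snapshot target and erase its last_rcvd file.
# usage: prepare_snapshot DEVICE MOUNTPOINT
prepare_snapshot() {
    dev=$1
    mnt=$2
    run tunefs.lustre --reformat --fsname="$NEW_FSNAME" --writeconf "$dev"
    run mount -t ldiskfs "$dev" "$mnt"
    run rm "$mnt/last_rcvd"
    run umount "$mnt"
}

prepare_snapshot /dev/vgmain/MDT0.b1 /mnt/mdtback
prepare_snapshot /dev/vgmain/OST0.b1 /mnt/ostback</screen>
      <para>After both targets are prepared, mount them with <literal>mount -t lustre</literal> as shown in the procedure.</para>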
 538     </section>
 539     <section remap="h3">
 540         <title><indexterm><primary>backup</primary><secondary>using LVM</secondary><tertiary>deleting</tertiary></indexterm>Deleting Old Snapshots</title>
 541       <para>To reclaim disk space, you can erase old snapshots as your backup policy dictates. Run:</para>
 542       <screen>lvremove /dev/vgmain/MDT0.b1</screen>
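      <para>A checkpoint consists of one snapshot per target, so the matching OST snapshots are normally removed along with the MDT snapshot. A minimal sketch, assuming the <literal>.b1</literal> naming used in this chapter, is shown here. The <literal>TARGETS</literal> list, the <literal>DRY_RUN</literal> switch and the <literal>run</literal> helper are assumptions made for this illustration.</para>
      <screen>#!/bin/sh
# Sketch only: names are taken from the examples in this chapter.
VG=${VG:-vgmain}
TARGETS=${TARGETS:-"MDT0 OST0"}

# Print commands by default; set DRY_RUN=0 to execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Remove the snapshot with the given suffix from every target.
# -f skips the confirmation prompt, so check the suffix carefully.
remove_snapshots() {
    suffix=$1
    for t in $TARGETS; do
        run lvremove -f "/dev/$VG/$t.$suffix"
    done
}

remove_snapshots "${1:-b1}"</screen>
      <para>Unmount a snapshot before removing it.</para>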
 543     </section>
 544     <section remap="h3">
 545       <title><indexterm><primary>backup</primary><secondary>using LVM</secondary><tertiary>resizing</tertiary></indexterm>Changing Snapshot Volume Size</title>
 546       <para>You can also extend or shrink snapshot volumes if you find your daily deltas are smaller or larger than expected. Run:</para>
 547       <screen>lvextend -L10G /dev/vgmain/MDT0.b1</screen>
 548       <note>
 549         <para>Extending snapshots appears to be broken in older LVM releases. It works in LVM v2.02.01.</para>
 550       </note>
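      <para>To decide when a snapshot needs to grow, you can check how full it is. The following sketch reports snapshots that have reached a chosen threshold and suggests an <literal>lvextend</literal> command. The 80% threshold, the 1 GB increment and the function names are assumptions made for this illustration. The <literal>snap_percent</literal> field name may differ between LVM releases; <literal>lvs -o help</literal> lists the fields that your version supports.</para>
      <screen>#!/bin/sh
# Sketch only: threshold and increment are arbitrary example values.
VG=${VG:-vgmain}
THRESHOLD=${THRESHOLD:-80}       # percent full that triggers a report

# Succeed when a usage figure such as "83.12" is at or above the threshold.
# usage: is_nearly_full PERCENT THRESHOLD
is_nearly_full() {
    pct_int=${1%%[.,]*}          # keep the integer part only
    [ "${pct_int:-0}" -ge "$2" ]
}

# Read "name percent" pairs on stdin and report snapshots that are filling up.
# Origin volumes have an empty percent column and are skipped.
report_full() {
    while read -r name pct; do
        [ -n "$pct" ] || continue
        if is_nearly_full "$pct" "$THRESHOLD"; then
            echo "$name is ${pct}% full; consider: lvextend -L+1G /dev/$VG/$name"
        fi
    done
}

# Query LVM only when the tools are installed (requires root).
if command -v lvs >/dev/null 2>/dev/null; then
    lvs --noheadings -o lv_name,snap_percent "$VG" | report_full
fi</screen>
      <para>The script only prints a suggestion, so you can confirm that extending snapshots works with your LVM version before you resize anything.</para>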
 551     </section>
 552   </section>
 553 </chapter>