<?xml version='1.0' encoding='UTF-8'?>
<chapter xmlns="http://docbook.org/ns/docbook"
xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
xml:id="zfssnapshots" condition='l2A'>
<title xml:id="zfssnapshots.title">Lustre ZFS Snapshots</title>
<para>This chapter describes the ZFS Snapshot feature support in Lustre and
contains the following sections:</para>
<itemizedlist>
  <listitem><para><xref linkend="zfssnapshotIntro"/></para></listitem>
  <listitem><para><xref linkend="zfssnapshotConfig"/></para></listitem>
  <listitem><para><xref linkend="zfssnapshotOps"/></para></listitem>
  <listitem><para><xref linkend="zfssnapshotBarrier"/></para></listitem>
  <listitem><para><xref linkend="zfssnapshotLogs"/></para></listitem>
  <listitem><para><xref linkend="zfssnapshotLustreLogs"/></para></listitem>
</itemizedlist>
<section xml:id="zfssnapshotIntro">
<title><indexterm><primary>Introduction</primary>
</indexterm>Introduction</title>
<para>Snapshots provide fast recovery of files from a previously created
checkpoint without recourse to an offline backup or remote replica.
Snapshots also provide a means to version-control storage, and can be used
to recover lost files or previous versions of files.</para>
<para>Filesystem snapshots are intended to be mounted on user-accessible
nodes, such as login nodes, so that users can restore files (e.g., after
an accidental delete or overwrite) without administrator intervention.
Snapshot filesystems can also be mounted via automount when users access
them, rather than mounting all snapshots permanently, to reduce overhead
on login nodes when the snapshots are not in use.</para>
<para>Recovery of lost files from a snapshot is usually considerably
faster than from any offline backup or remote replica. However, note that
snapshots do not improve storage reliability and are just as exposed to
hardware failure as any other storage volume.</para>
<section xml:id="zfssnapshotsReq">
<title><indexterm><primary>Introduction</primary>
<secondary>Requirements</secondary></indexterm>Requirements</title>
<para>All Lustre server targets must be ZFS file systems running
Lustre version 2.10 or later. In addition, the MGS must be able to
communicate via ssh or another remote access protocol, without
password authentication, to all other servers.</para>
<para>The feature is enabled by default and cannot be disabled. The
management of snapshots is done through <literal>lctl</literal>
commands on the MGS.</para>
<para>Lustre snapshot is based on copy-on-write; the snapshot and file
system may share a single copy of the data until a file is changed on
the file system. A snapshot will prevent the space of deleted or
overwritten files from being released until all snapshots
referencing those files are deleted. The file system administrator
needs to establish a snapshot create/backup/remove policy according to
their system’s actual size and usage.</para>
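<para>When defining such a policy, the space currently held by snapshots
can be inspected on each server with the standard ZFS tools. For example
(the command below is a generic ZFS command, not Lustre-specific, and the
output columns shown are standard ZFS properties):</para>
<screen>oss# zfs list -t snapshot -o name,used,creation</screen>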
</section>
</section>
<section xml:id="zfssnapshotConfig">
<title><indexterm><primary>feature overview</primary>
<secondary>configuration</secondary></indexterm>Configuration</title>
<para>The snapshot tool loads the system configuration from the
<literal>/etc/ldev.conf</literal> file on the MGS and calls related
ZFS commands to maintain the Lustre snapshot pieces on all targets
(MGS/MDT/OST). Please note that the <literal>/etc/ldev.conf</literal>
file is used for other purposes as well.</para>
<para>The format of the file is:</para>
<screen>&lt;host&gt; foreign/- &lt;label&gt; &lt;device&gt; [journal-path]/- [raidtab]</screen>
<para>The format of <literal>&lt;label&gt;</literal> is:</para>
<screen>fsname-&lt;role&gt;&lt;index&gt; or &lt;role&gt;&lt;index&gt;</screen>
<para>The format of <literal>&lt;device&gt;</literal> is:</para>
<screen>[md|zfs:][pool_dir/]&lt;pool&gt;/&lt;filesystem&gt;</screen>
<para>Snapshot only uses the fields <literal>&lt;host&gt;</literal>,
<literal>&lt;label&gt;</literal> and <literal>&lt;device&gt;</literal>.</para>
<para>Example 1:</para>
<screen>mgs# cat /etc/ldev.conf
host-mdt1 - myfs-MDT0000 zfs:/tmp/myfs-mdt1/mdt1
host-mdt2 - myfs-MDT0001 zfs:myfs-mdt2/mdt2
host-ost1 - OST0000 zfs:/tmp/myfs-ost1/ost1
host-ost2 - OST0001 zfs:myfs-ost2/ost2</screen>
<para>Example 2:</para>
<para><emphasis role="bold">For the given mounted MGS/OST on a single node:</emphasis></para>
<screen>singlenode# mount | grep "lustre-[m|o]"
lustre-mdt1/mdt1 on /mnt/lustre-mds1 type lustre (rw,svname=lustre-MDT0000,mgs,osd=osd-zfs)
lustre-ost1/ost1 on /mnt/lustre-ost1 type lustre (rw,svname=lustre-OST0000, mgsnode=x.x.x.x@tcp, osd=osd-zfs)</screen>
<para><emphasis role="bold">The corresponding /etc/ldev.conf would be:</emphasis></para>
<screen>singlenode# cat /etc/ldev.conf
centos79z1 - lustre-MDT0000 zfs:/tmp/lustre-mdt1/mdt1 - -
centos79z1 - lustre-OST0000 zfs:/tmp/lustre-ost1/ost1 - -</screen>
<para><emphasis role="bold">Where:</emphasis></para>
<informaltable frame="all">
  <tgroup cols="2">
    <colspec colwidth="50*"/>
    <colspec colwidth="50*"/>
    <thead>
      <row>
        <entry><para><emphasis role="bold">Field</emphasis></para></entry>
        <entry><para><emphasis role="bold">Description</emphasis></para></entry>
      </row>
    </thead>
    <tbody>
      <row><entry>centos79z1</entry><entry>Hostname</entry></row>
      <row><entry>-</entry><entry>Not used</entry></row>
      <row><entry>lustre-OST0000</entry><entry>Device label</entry></row>
      <row><entry>zfs:/tmp/lustre-ost1/ost1</entry><entry>Device name</entry></row>
      <row><entry>-</entry><entry>Not used</entry></row>
      <row><entry>-</entry><entry>Not used</entry></row>
    </tbody>
  </tgroup>
</informaltable>
<para>The configuration file is edited manually.</para>
<para>Once the configuration file is updated to reflect the current
file system setup, you are ready to create a file system snapshot.</para>
</section>
<section xml:id="zfssnapshotOps">
<title><indexterm><primary>operations</primary>
</indexterm>Snapshot Operations</title>
<section xml:id="zfssnapshotCreate">
<title><indexterm><primary>operations</primary>
<secondary>create</secondary></indexterm>Creating a Snapshot</title>
<para>To create a snapshot of an existing Lustre file system, run the
following <literal>lctl</literal> command on the MGS:</para>
<screen>lctl snapshot_create [-b | --barrier [on | off]] [-c | --comment comment]
&lt;-F | --fsname fsname&gt; [-h | --help] &lt;-n | --name ssname&gt;
[-r | --rsh remote_shell] [-t | --timeout timeout]</screen>
<informaltable frame="all">
  <tgroup cols="2">
    <colspec colname="c1" colwidth="50*"/>
    <colspec colname="c2" colwidth="50*"/>
    <thead>
      <row>
        <entry><para><emphasis role="bold">Option</emphasis></para></entry>
        <entry><para><emphasis role="bold">Description</emphasis></para></entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry><para><literal>-b</literal></para></entry>
        <entry><para>set a write barrier before creating the snapshot. The
        default value is 'on'.</para></entry>
      </row>
      <row>
        <entry><para><literal>-c</literal></para></entry>
        <entry><para>a description of the purpose of the snapshot</para></entry>
      </row>
      <row>
        <entry><para><literal>-F</literal></para></entry>
        <entry><para>the filesystem name</para></entry>
      </row>
      <row>
        <entry><para><literal>-h</literal></para></entry>
        <entry><para>help information</para></entry>
      </row>
      <row>
        <entry><para><literal>-n</literal></para></entry>
        <entry><para>the name of the snapshot</para></entry>
      </row>
      <row>
        <entry><para><literal>-r</literal></para></entry>
        <entry><para>the remote shell used for communication with
        remote targets. The default value is 'ssh'.</para></entry>
      </row>
      <row>
        <entry><para><literal>-t</literal></para></entry>
        <entry><para>the lifetime (in seconds) of the write barrier. The
        default value is 30 seconds.</para></entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>
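<para>For example, to create a snapshot of a file system named
<replaceable>myfs</replaceable> with an illustrative snapshot name and a
comment recording why it was taken:</para>
<screen>mgs# lctl snapshot_create -F myfs -n snapshot_20170602 -c "nightly checkpoint"</screen>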
</section>
<section xml:id="zfssnapshotDelete">
<title><indexterm><primary>operations</primary>
<secondary>delete</secondary></indexterm>Deleting a Snapshot</title>
<para>To delete an existing snapshot, run the following
<literal>lctl</literal> command on the MGS:</para>
<screen>lctl snapshot_destroy [-f | --force] &lt;-F | --fsname fsname&gt;
&lt;-n | --name ssname&gt; [-r | --rsh remote_shell]</screen>
<informaltable frame="all">
  <tgroup cols="2">
    <colspec colname="c1" colwidth="50*"/>
    <colspec colname="c2" colwidth="50*"/>
    <thead>
      <row>
        <entry><para><emphasis role="bold">Option</emphasis></para></entry>
        <entry><para><emphasis role="bold">Description</emphasis></para></entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry><para><literal>-f</literal></para></entry>
        <entry><para>destroy the snapshot by force</para></entry>
      </row>
      <row>
        <entry><para><literal>-F</literal></para></entry>
        <entry><para>the filesystem name</para></entry>
      </row>
      <row>
        <entry><para><literal>-h</literal></para></entry>
        <entry><para>help information</para></entry>
      </row>
      <row>
        <entry><para><literal>-n</literal></para></entry>
        <entry><para>the name of the snapshot</para></entry>
      </row>
      <row>
        <entry><para><literal>-r</literal></para></entry>
        <entry><para>the remote shell used for communication with
        remote targets. The default value is 'ssh'.</para></entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>
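<para>For example, to delete a snapshot named
<replaceable>snapshot_20170602</replaceable> from the file system
<replaceable>myfs</replaceable> (the names are illustrative):</para>
<screen>mgs# lctl snapshot_destroy -F myfs -n snapshot_20170602</screen>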
</section>
<section xml:id="zfssnapshotMount">
<title><indexterm><primary>operations</primary>
<secondary>mount</secondary></indexterm>Mounting a Snapshot</title>
<para>Snapshots are treated as separate file systems and can be mounted on
Lustre clients. The snapshot file system must be mounted as a
read-only file system with the <literal>-o ro</literal> option.
If the <literal>mount</literal> command does not include the read-only
option, the mount will fail.</para>
<note><para>Before a snapshot can be mounted on the client, the snapshot
must first be mounted on the servers using the <literal>lctl</literal>
utility.</para></note>
<para>To mount a snapshot on the servers, run the following
<literal>lctl</literal> command on the MGS:</para>
<screen>lctl snapshot_mount &lt;-F | --fsname fsname&gt; [-h | --help]
&lt;-n | --name ssname&gt; [-r | --rsh remote_shell]</screen>
<informaltable frame="all">
  <tgroup cols="2">
    <colspec colname="c1" colwidth="50*"/>
    <colspec colname="c2" colwidth="50*"/>
    <thead>
      <row>
        <entry><para><emphasis role="bold">Option</emphasis></para></entry>
        <entry><para><emphasis role="bold">Description</emphasis></para></entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry><para><literal>-F</literal></para></entry>
        <entry><para>the filesystem name</para></entry>
      </row>
      <row>
        <entry><para><literal>-h</literal></para></entry>
        <entry><para>help information</para></entry>
      </row>
      <row>
        <entry><para><literal>-n</literal></para></entry>
        <entry><para>the name of the snapshot</para></entry>
      </row>
      <row>
        <entry><para><literal>-r</literal></para></entry>
        <entry><para>the remote shell used for communication with
        remote targets. The default value is 'ssh'.</para></entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>
<para>After the snapshot has been successfully mounted on the servers,
clients can mount it as a read-only filesystem. For example, to
mount a snapshot named <replaceable>snapshot_20170602</replaceable> for a
filesystem named <replaceable>myfs</replaceable>, the following mount
command would be used:</para>
<screen>mgs# lctl snapshot_mount -F myfs -n snapshot_20170602</screen>
<para>After mounting on the server, use
<literal>lctl snapshot_list</literal> to get the fsname of the snapshot
itself, as follows:</para>
<screen>ss_fsname=$(lctl snapshot_list -F myfs -n snapshot_20170602 |
awk '/^snapshot_fsname/ { print $2 }')</screen>
<para>Finally, mount the snapshot on the client:</para>
<screen>mount -t lustre -o ro $MGS_nid:/$ss_fsname $local_mount_point</screen>
</section>
<section xml:id="zfssnapshotUnmount">
<title><indexterm><primary>operations</primary>
<secondary>unmount</secondary></indexterm>Unmounting a Snapshot</title>
<para>To unmount a snapshot from the servers, first unmount the snapshot
file system from all clients, using the standard <literal>umount</literal>
command on each client. For example, to unmount the snapshot file system
named <replaceable>snapshot_20170602</replaceable>, run the following
command on each client that has it mounted:</para>
<screen>client# umount $local_mount_point</screen>
<para>After all clients have unmounted the snapshot file system, run the
following <literal>lctl</literal> command on a server node where the
snapshot is mounted:</para>
<screen>lctl snapshot_umount [-F | --fsname fsname] [-h | --help]
&lt;-n | --name ssname&gt; [-r | --rsh remote_shell]</screen>
<informaltable frame="all">
  <tgroup cols="2">
    <colspec colname="c1" colwidth="50*"/>
    <colspec colname="c2" colwidth="50*"/>
    <thead>
      <row>
        <entry><para><emphasis role="bold">Option</emphasis></para></entry>
        <entry><para><emphasis role="bold">Description</emphasis></para></entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry><para><literal>-F</literal></para></entry>
        <entry><para>the filesystem name</para></entry>
      </row>
      <row>
        <entry><para><literal>-h</literal></para></entry>
        <entry><para>help information</para></entry>
      </row>
      <row>
        <entry><para><literal>-n</literal></para></entry>
        <entry><para>the name of the snapshot</para></entry>
      </row>
      <row>
        <entry><para><literal>-r</literal></para></entry>
        <entry><para>the remote shell used for communication with
        remote targets. The default value is 'ssh'.</para></entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>
<para>For example:</para>
<screen>lctl snapshot_umount -F myfs -n snapshot_20170602</screen>
</section>
<section xml:id="zfssnapshotList">
<title><indexterm><primary>operations</primary>
<secondary>list</secondary></indexterm>Listing Snapshots</title>
<para>To list the available snapshots for a given file system, run the
following <literal>lctl</literal> command on the MGS:</para>
<screen>lctl snapshot_list [-d | --detail] &lt;-F | --fsname fsname&gt;
[-h | --help] [-n | --name ssname] [-r | --rsh remote_shell]</screen>
<informaltable frame="all">
  <tgroup cols="2">
    <colspec colname="c1" colwidth="50*"/>
    <colspec colname="c2" colwidth="50*"/>
    <thead>
      <row>
        <entry><para><emphasis role="bold">Option</emphasis></para></entry>
        <entry><para><emphasis role="bold">Description</emphasis></para></entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry><para><literal>-d</literal></para></entry>
        <entry><para>list every piece of the specified snapshot</para></entry>
      </row>
      <row>
        <entry><para><literal>-F</literal></para></entry>
        <entry><para>the filesystem name</para></entry>
      </row>
      <row>
        <entry><para><literal>-h</literal></para></entry>
        <entry><para>help information</para></entry>
      </row>
      <row>
        <entry><para><literal>-n</literal></para></entry>
        <entry><para>the snapshot's name. If the snapshot name is not
        supplied, all snapshots for this file system will be
        listed.</para></entry>
      </row>
      <row>
        <entry><para><literal>-r</literal></para></entry>
        <entry><para>the remote shell used for communication with
        remote targets. The default value is 'ssh'.</para></entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>
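<para>For example, to show the details of every snapshot of the file
system <replaceable>myfs</replaceable> (the name is illustrative):</para>
<screen>mgs# lctl snapshot_list -F myfs -d</screen>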
</section>
<section xml:id="zfssnapshotModify">
<title><indexterm><primary>operations</primary>
<secondary>modify</secondary></indexterm>Modifying Snapshot Attributes</title>
<para>Currently, a Lustre snapshot has five user-visible attributes:
snapshot name, snapshot comment, create time, modification time, and
snapshot file system name. Of these, the first two can be
modified. Renaming must follow the general ZFS snapshot naming rules; for
example, the maximum name length is 256 bytes and the name must not
conflict with reserved names.</para>
<para>To modify a snapshot’s attributes, use the following
<literal>lctl</literal> command on the MGS:</para>
<screen>lctl snapshot_modify [-c | --comment comment]
&lt;-F | --fsname fsname&gt; [-h | --help] &lt;-n | --name ssname&gt;
[-N | --new new_ssname] [-r | --rsh remote_shell]</screen>
<informaltable frame="all">
  <tgroup cols="2">
    <colspec colname="c1" colwidth="50*"/>
    <colspec colname="c2" colwidth="50*"/>
    <thead>
      <row>
        <entry><para><emphasis role="bold">Option</emphasis></para></entry>
        <entry><para><emphasis role="bold">Description</emphasis></para></entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry><para><literal>-c</literal></para></entry>
        <entry><para>update the snapshot's comment</para></entry>
      </row>
      <row>
        <entry><para><literal>-F</literal></para></entry>
        <entry><para>the filesystem name</para></entry>
      </row>
      <row>
        <entry><para><literal>-h</literal></para></entry>
        <entry><para>help information</para></entry>
      </row>
      <row>
        <entry><para><literal>-n</literal></para></entry>
        <entry><para>the snapshot's name</para></entry>
      </row>
      <row>
        <entry><para><literal>-N</literal></para></entry>
        <entry><para>rename the snapshot to
        <replaceable>new_ssname</replaceable></para></entry>
      </row>
      <row>
        <entry><para><literal>-r</literal></para></entry>
        <entry><para>the remote shell used for communication with
        remote targets. The default value is 'ssh'.</para></entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>
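<para>For example, to rename a snapshot and update its comment at the
same time (the names and comment are illustrative):</para>
<screen>mgs# lctl snapshot_modify -F myfs -n snapshot_20170602 \
     -N before_upgrade -c "checkpoint taken before the upgrade"</screen>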
</section>
</section>
<section xml:id="zfssnapshotBarrier">
<title><indexterm><primary>barrier</primary>
</indexterm>Global Write Barriers</title>
<para>Snapshots are non-atomic across multiple MDTs and OSTs, which means
that if there is activity on the file system while a snapshot is being
taken, there may be user-visible namespace inconsistencies for files
created or destroyed in the interval between the MDT and OST snapshots.
In order to create a consistent snapshot of the file system, a global
write barrier can be set to “freeze” the system. Once set, all
metadata modifications will be blocked until the write barrier is actively
removed (“thawed”) or expires. The user can set a timeout parameter on a
global barrier, or the barrier can be explicitly removed. The default
timeout period is 30 seconds.</para>
<para>It is important to note that snapshots are usable without the global
barrier. Only files that are being modified by clients (write,
create, unlink) at the time of the snapshot may be inconsistent, as noted
above, if the barrier is not used. Files not currently being modified are
usable even without the barrier.</para>
<para>The snapshot create command will set the write barrier internally
when requested via the <literal>-b</literal> option to
<literal>lctl snapshot_create</literal>. So, explicit use of the barrier
commands is not required when using snapshots; they are included here as
an option to quiesce the file system before a snapshot is created.</para>
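<para>As a sketch of how the barrier commands can be combined with
snapshot creation, the following sequence freezes the file system
explicitly, creates a snapshot with the internal barrier disabled, and
then thaws the barrier (the filesystem and snapshot names are
illustrative, and the generous 60-second timeout is an assumption for the
example):</para>
<screen>mgs# lctl barrier_freeze testfs 60
mgs# lctl snapshot_create -b off -F testfs -n snap_frozen
mgs# lctl barrier_thaw testfs</screen>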
<section xml:id="zfssnapshotBarrierImpose">
<title><indexterm><primary>barrier</primary>
<secondary>impose</secondary></indexterm>Impose Barrier</title>
<para>To impose a global write barrier, run the
<literal>lctl barrier_freeze</literal> command on the MGS:</para>
<screen>lctl barrier_freeze &lt;fsname&gt; [timeout (in seconds)]</screen>
<para>where the default timeout is 30 seconds.</para>
<para>For example, to freeze the filesystem
<replaceable>testfs</replaceable> for <literal>15</literal> seconds:</para>
<screen>mgs# lctl barrier_freeze testfs 15</screen>
<para>If the command is successful, there will be no output from
the command. Otherwise, an error message will be printed.</para>
</section>
<section xml:id="zfssnapshotBarrierRemove">
<title><indexterm><primary>barrier</primary>
<secondary>remove</secondary></indexterm>Remove Barrier</title>
<para>To remove a global write barrier, run the
<literal>lctl barrier_thaw</literal> command on the MGS:</para>
<screen>lctl barrier_thaw &lt;fsname&gt;</screen>
<para>For example, to thaw the write barrier for the filesystem
<replaceable>testfs</replaceable>:</para>
<screen>mgs# lctl barrier_thaw testfs</screen>
<para>If the command is successful, there will be no output from
the command. Otherwise, an error message will be printed.</para>
</section>
<section xml:id="zfssnapshotBarrierQuery">
<title><indexterm><primary>barrier</primary>
<secondary>query</secondary></indexterm>Query Barrier</title>
<para>To see how much time is left on a global write barrier, run the
<literal>lctl barrier_stat</literal> command on the MGS:</para>
<screen>lctl barrier_stat &lt;fsname&gt;</screen>
<para>For example, to stat the write barrier for the filesystem
<replaceable>testfs</replaceable>:</para>
<screen>mgs# lctl barrier_stat testfs
The barrier for testfs is in 'frozen'
The barrier will be expired after 7 seconds</screen>
<para>If the command is successful, a status from the table below
will be printed. Otherwise, an error message will be printed.</para>
<para>The possible statuses and their meanings for the write barrier
are as follows:</para>
<table frame="all" xml:id="writebarrierstatus.tab1">
  <title>Write Barrier Status</title>
  <tgroup cols="2">
    <colspec colname="c1" colwidth="50*"/>
    <colspec colname="c2" colwidth="50*"/>
    <thead>
      <row>
        <entry><para><emphasis role="bold">Status</emphasis></para></entry>
        <entry><para><emphasis role="bold">Meaning</emphasis></para></entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry><para><literal>init</literal></para></entry>
        <entry><para>the barrier has never been set on the system</para></entry>
      </row>
      <row>
        <entry><para><literal>freezing_p1</literal></para></entry>
        <entry><para>in the first stage of setting the write
        barrier</para></entry>
      </row>
      <row>
        <entry><para><literal>freezing_p2</literal></para></entry>
        <entry><para>in the second stage of setting the write
        barrier</para></entry>
      </row>
      <row>
        <entry><para><literal>frozen</literal></para></entry>
        <entry><para>the write barrier has been set successfully</para></entry>
      </row>
      <row>
        <entry><para><literal>thawing</literal></para></entry>
        <entry><para>in the process of thawing the write barrier</para></entry>
      </row>
      <row>
        <entry><para><literal>thawed</literal></para></entry>
        <entry><para>the write barrier has been thawed</para></entry>
      </row>
      <row>
        <entry><para><literal>failed</literal></para></entry>
        <entry><para>failed to set the write barrier</para></entry>
      </row>
      <row>
        <entry><para><literal>expired</literal></para></entry>
        <entry><para>the write barrier has expired</para></entry>
      </row>
      <row>
        <entry><para><literal>rescan</literal></para></entry>
        <entry><para>in the process of scanning the MDTs' status; see the
        command <literal>barrier_rescan</literal></para></entry>
      </row>
      <row>
        <entry><para><literal>unknown</literal></para></entry>
        <entry><para>other cases</para></entry>
      </row>
    </tbody>
  </tgroup>
</table>
<para>If the barrier is in <literal>freezing_p1</literal>,
<literal>freezing_p2</literal> or <literal>frozen</literal> status, the
remaining lifetime will also be returned.</para>
</section>
<section xml:id="zfssnapshotBarrierRescan">
<title><indexterm><primary>barrier</primary>
<secondary>rescan</secondary></indexterm>Rescan Barrier</title>
<para>To rescan a global write barrier to check which MDTs are
active, run the <literal>lctl barrier_rescan</literal> command on the
MGS:</para>
<screen>lctl barrier_rescan &lt;fsname&gt; [timeout (in seconds)]</screen>
<para>where the default timeout is 30 seconds.</para>
<para>For example, to rescan the barrier for the filesystem
<replaceable>testfs</replaceable>:</para>
<screen>mgs# lctl barrier_rescan testfs
1 of 4 MDT(s) in the filesystem testfs are inactive</screen>
<para>If the command is successful, the number of inactive MDTs out of
the total number of MDTs will be reported. Otherwise, an
error message will be printed.</para>
</section>
</section>
<section xml:id="zfssnapshotLogs">
<title><indexterm><primary>logs</primary>
</indexterm>Snapshot Logs</title>
<para>A log of all snapshot activity can be found in the
<literal>/var/log/lsnapshot.log</literal> file. This file contains
information on when a snapshot was created, when an attribute was changed,
when the snapshot was mounted, and other snapshot information.</para>
<para>The following is a sample <literal>/var/log/lsnapshot.log</literal>
file:</para>
<screen>Mon Mar 21 19:43:06 2016
(15826:jt_snapshot_create:1138:scratch:ssh): Create snapshot lss_0_0
successfully with comment &lt;(null)&gt;, barrier &lt;enable&gt;, timeout &lt;30&gt;
Mon Mar 21 19:43:11 2016 (13030:jt_snapshot_create:1138:scratch:ssh):
Create snapshot lss_0_1 successfully with comment &lt;(null)&gt;, barrier
&lt;disable&gt;, timeout &lt;-1&gt;
Mon Mar 21 19:44:38 2016 (17161:jt_snapshot_mount:2013:scratch:ssh):
The snapshot lss_1a_0 is mounted
Mon Mar 21 19:44:46 2016
(17662:jt_snapshot_umount:2167:scratch:ssh): the snapshot lss_1a_0
is unmounted
Mon Mar 21 19:47:12 2016
(20897:jt_snapshot_destroy:1312:scratch:ssh): Destroy snapshot
lss_2_0 successfully with force &lt;disable&gt;</screen>
</section>
<section xml:id="zfssnapshotLustreLogs">
<title><indexterm><primary>configlogs</primary>
</indexterm>Lustre Configuration Logs</title>
<para>A snapshot is independent from the original file system that it is
derived from and is treated as a new file system name that can be mounted
by Lustre client nodes. The file system name is part of the configuration
log names and exists in configuration log entries. Two commands exist to
manipulate configuration logs: <literal>lctl fork_lcfg</literal> and
<literal>lctl erase_lcfg</literal>.</para>
<para>The snapshot commands use this configuration log functionality
internally when needed, so explicit use of these commands is not required
when working with snapshots. The following configuration log commands are
independent of snapshots and can be used on their own.</para>
<para>To fork a configuration log, run the following
<literal>lctl</literal> command on the MGS:</para>
<screen>lctl fork_lcfg &lt;fsname&gt; &lt;newname&gt;</screen>
<para>To erase a configuration log, run the following
<literal>lctl</literal> command on the MGS:</para>
<screen>lctl erase_lcfg &lt;fsname&gt;</screen>
</section>
</chapter>
<!-- vim:expandtab:shiftwidth=2:tabstop=8: -->