1 <?xml version='1.0' encoding='UTF-8'?>
2 <chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="lustrehsm" condition='l25'>
3 <title xml:id="lustrehsm.title">Hierarchical Storage Management (HSM)</title>
4 <para>This chapter describes how to bind Lustre to a Hierarchical Storage Management (HSM) solution.</para>
5 <section xml:id="hsm_introduction">
7 <indexterm><primary>Hierarchical Storage Management (HSM)</primary><secondary>introduction</secondary></indexterm>Introduction</title>
8 <para>The Lustre file system can bind to a Hierarchical Storage Management (HSM)
9 solution using a specific set of functions. These functions enable connecting
10 a Lustre file system to one or more external storage systems, typically HSMs.
11 With a Lustre file system bound to an HSM solution, the Lustre file system acts
12 as a high-speed cache in front of these slower HSM storage systems.</para>
14 <para>The Lustre file system integration with HSM provides a mechanism for
15 files to exist simultaneously in an HSM solution and have a metadata entry in
16 the Lustre file system that can be examined. Reading, writing, or truncating the
17 file triggers the file data to be fetched from the HSM storage back into
18 the Lustre file system.</para>
20 <para>The process of copying a file into the HSM storage is known as
21 <emphasis>archive</emphasis>. Once the archive is complete, the Lustre file
22 data can be deleted (known as <emphasis>release</emphasis>). The process of
23 returning data from the HSM storage to the Lustre file system is called
24 <emphasis>restore</emphasis>. The archive and restore operations require a
25 Lustre file system component called an <emphasis>Agent</emphasis>. </para>
27 <para>An Agent is a specially designed Lustre client node that mounts the
28 Lustre file system in question. On an Agent, a user space program called a
29 copytool is run to coordinate the archive and restore of files between the
30 Lustre file system and the HSM solution.</para>
32 <para>Requests to restore a given file are registered and dispatched by a
33 facet on the MDT called the Coordinator.
35 <figure xml:id='hsmcopytoolfig'>
36 <title>Overview of the Lustre file system HSM</title>
39 <imagedata fileref='figures/HSM_copytool.svg' format='svg'/>
46 <section xml:id="hsmsetup">
48 <indexterm><primary>HSM</primary><secondary>setup</secondary></indexterm>Setup</title>
50 <section xml:id='hsmrequirements'>
52 <indexterm><primary>HSM</primary><secondary>requirements</secondary></indexterm>Requirements
54 <para>To set up a Lustre/HSM configuration, you need:</para>
57 <para>a standard Lustre file system (version 2.5.0 and above)</para>
60 <para>a minimum of 2 clients: 1 for the computation task that generates
61 the data, and 1 used as an agent.</para>
64 <para>Multiple agents can be employed. All the agents need to share access
65 to their backend storage. For the POSIX copytool, a POSIX namespace like NFS or
66 another Lustre file system is suitable.</para>
69 <section xml:id='hsmcoordinator'>
71 <indexterm><primary>HSM</primary><secondary>coordinator</secondary></indexterm>Coordinator
74 <para>To bind a Lustre file system to an HSM system, a coordinator
75 must be activated on each of the file system's MDTs. This can be achieved with the command:</para>
76 <screen>$ lctl set_param mdt.<replaceable>$FSNAME-MDT0000</replaceable>.hsm_control=enabled
77 mdt.lustre-MDT0000.hsm_control=enabled</screen>
78 <para>To verify that the coordinator is running correctly:</para>
80 <screen>$ lctl get_param mdt.<replaceable>$FSNAME-MDT0000</replaceable>.hsm_control
81 mdt.lustre-MDT0000.hsm_control=enabled</screen>
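<para>Because a coordinator must be enabled on every MDT, a file system with several MDTs needs one command per target. The following is a minimal sketch that only prints those commands so that they can be reviewed first. The helper name <literal>coordinator_cmds</literal>, the file system name <literal>lustre</literal>, and the MDT count of 2 are assumptions made for illustration and are not part of the Lustre tools.</para>

```shell
# Print (do not run) the command that enables the HSM coordinator on
# each MDT. coordinator_cmds is a hypothetical helper, not an lctl feature.
coordinator_cmds() {
    fsname=$1
    mdt_count=$2
    idx=0
    while [ "$idx" -lt "$mdt_count" ]; do
        # MDT targets are named with a four-digit hexadecimal index.
        printf 'lctl set_param mdt.%s-MDT%04x.hsm_control=enabled\n' "$fsname" "$idx"
        idx=$(( idx + 1 ))
    done
}

# Assumed example: a file system named "lustre" with two MDTs.
coordinator_cmds lustre 2
```

<para>Each printed command would then be run on the MDS that hosts the corresponding MDT, followed by the <literal>get_param</literal> check shown above.</para>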
84 <section xml:id='hsmagents'>
86 <indexterm><primary>HSM</primary><secondary>agents</secondary></indexterm>Agents
89 <para>Once a coordinator is started, launch the copytool on each agent node to connect to your HSM storage. If your HSM storage has POSIX access, this command will be of the form:</para>
90 <screen>lhsmtool_posix --daemon --hsm-root <replaceable>$HSMPATH</replaceable> --archive=1 <replaceable>$LUSTREPATH</replaceable></screen>
91 <para>The POSIX copytool must be stopped by sending it a TERM signal.</para>
96 <section xml:id="hsmagentsandcopytool">
98 <indexterm><primary>HSM</primary><secondary>agents and copytools</secondary></indexterm>Agents and copytool</title>
101 Agents are Lustre file system clients running a copytool. A copytool is a userspace
102 daemon that transfers data between Lustre and an HSM solution. Because different
103 HSM solutions use different APIs, a copytool can typically work only with a
104 specific HSM. Only one copytool can be run by an agent node.</para>
106 <para>The following rule applies regarding copytool instances: a Lustre file
107 system only supports a single copytool process, per ARCHIVE ID (see below),
108 per client node. Due to a Lustre software limitation, this constraint applies
109 regardless of the number of Lustre file systems mounted by the Agent.</para>
111 <para>Bundled with Lustre tools, the POSIX copytool can work with any HSM or
112 external storage that exports a POSIX API. </para>
114 <section xml:id='hsmarchivebackends'>
116 <indexterm><primary>HSM</primary><secondary>archiveID backends</secondary></indexterm>Archive ID, multiple backends
119 <para>A Lustre file system can be bound to several different HSM solutions.
120 Each bound HSM solution is identified by a number referred to as ARCHIVE ID. A
121 unique value of ARCHIVE ID must be chosen for each bound HSM solution. ARCHIVE
122 ID must be in the range 1 to 32.</para>
124 <para>A Lustre file system supports an unlimited number of copytool instances.
125 You need at least one copytool per ARCHIVE ID. When using the POSIX copytool,
126 this ID is defined using the <literal>--archive</literal> switch.</para>
128 <para>For example, if a single Lustre file system is bound to 2 different HSMs (A and B), ARCHIVE ID “1” can be chosen for HSM A and ARCHIVE ID “2” for HSM B. If you start 3 copytool instances for HSM A, all of them use ARCHIVE ID “1”. The same rule applies to copytool instances dealing with HSM B, which use ARCHIVE ID “2”.</para>
130 <para>When issuing HSM requests, you can use the <literal>--archive</literal> switch
131 to choose the backend you want to use. In this example, file <literal>foo</literal> will be
132 archived into backend ARCHIVE ID “5”:</para>
134 <screen>$ lfs hsm_archive --archive=5 /mnt/lustre/foo</screen>
136 <para>A default ARCHIVE ID can be defined which will be used when the <literal>--archive</literal> switch is not specified:</para>
138 <screen>$ lctl set_param -P mdt.<replaceable>lustre-MDT0000</replaceable>.hsm.default_archive_id=5</screen>
140 <para>The ARCHIVE ID of archived files can be checked using the <literal>lfs
141 hsm_state</literal> command:</para>
143 <screen>$ lfs hsm_state /mnt/lustre/foo
144 /mnt/lustre/foo: (0x00000009) exists archived, archive_id:5</screen>
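<para>When the ARCHIVE ID is needed in a script, it can be extracted from that output line. The following is a minimal sketch that assumes the output format shown above; <literal>hsm_archive_id</literal> is a hypothetical helper name, and the fallback value <literal>none</literal> for lines without an <literal>archive_id</literal> field is an illustrative choice.</para>

```shell
# Extract the archive ID from one line of "lfs hsm_state" output.
# hsm_archive_id is a hypothetical helper: it only parses text in the
# format shown above and does not call lfs itself.
hsm_archive_id() {
    case $1 in
        # Strip everything up to and including "archive_id:".
        *archive_id:*) echo "${1##*archive_id:}" ;;
        # No archive_id field on this line.
        *) echo "none" ;;
    esac
}

# Sample line copied from the example above; prints 5.
hsm_archive_id "/mnt/lustre/foo: (0x00000009) exists archived, archive_id:5"
```

<para>On a client, the helper could be fed live output, for example <literal>hsm_archive_id "$(lfs hsm_state /mnt/lustre/foo)"</literal>.</para>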
148 <section xml:id='hsmregisteredagents'>
150 <indexterm><primary>HSM</primary><secondary>registered agents</secondary></indexterm>Registered agents
153 <para>A Lustre file system allocates a unique UUID per client mount point, for each
154 filesystem. Only one copytool can be registered for each Lustre mount point.
155 As a consequence, the UUID uniquely identifies a copytool, per filesystem.</para>
157 <para>The currently registered copytool instances (agent UUIDs) can be retrieved by running the following command, per MDT, on MDS nodes:</para>
159 <screen>$ lctl get_param -n mdt.<replaceable>$FSNAME-MDT0000</replaceable>.hsm.agents
160 uuid=a19b2416-0930-fc1f-8c58-c985ba5127ad archive_id=1 requests=[current:0 ok:0 errors:0]</screen>
162 <para>The returned fields have the following meaning:</para>
165 <para><literal>uuid</literal> the client mount used by the corresponding copytool.</para>
168 <para><literal>archive_id</literal> comma-separated list of ARCHIVE IDs accessible by this copytool.</para>
171 <para><literal>requests</literal> various statistics on the number of requests processed by this copytool.</para>
177 <section xml:id='hsmtimeout'>
179 <indexterm><primary>HSM</primary><secondary>timeout</secondary></indexterm>Timeout
182 <para>One or more copytool instances may experience conditions that
183 cause them to become unresponsive. To avoid blocking access to the related
184 files a timeout value is defined for request processing. A copytool must be
185 able to fully complete a request within this time. The default is 3600 seconds.
187 <screen>$ lctl get_param -n mdt.<replaceable>lustre-MDT0000</replaceable>.hsm.active_request_timeout
194 <section xml:id='hsmrequests'>
196 <indexterm><primary>HSM</primary><secondary>requests</secondary></indexterm>Requests
199 <para>Data management between a Lustre file system and HSM solutions is driven by requests. There are five types:</para>
203 <para><literal>ARCHIVE</literal> Copy data from a Lustre file system file into the HSM solution.</para>
206 <para><literal>RELEASE</literal> Remove file data from the Lustre file system.</para>
209 <para><literal>RESTORE</literal> Copy back data from the HSM solution into the corresponding Lustre file system file.</para>
212 <para><literal>REMOVE</literal> Delete the copy of the data from the HSM solution.</para>
215 <para><literal>CANCEL</literal> Cancel an in-progress or pending request.</para>
219 <para>Only <literal>RELEASE</literal> is performed synchronously and
220 does not involve the coordinator. All other requests are handled by the
221 coordinator of each MDT, which manages them resiliently.</para>
223 <section xml:id='hsmcommands'>
225 <indexterm><primary>HSM</primary><secondary>commands</secondary></indexterm>Commands
228 <para>Requests are submitted using the <literal>lfs</literal> command:</para>
229 <screen>$ lfs hsm_archive [--archive=<replaceable>ID</replaceable>] <replaceable>FILE1</replaceable> [<replaceable>FILE2</replaceable>...]
230 $ lfs hsm_release <replaceable>FILE1</replaceable> [<replaceable>FILE2</replaceable>...]
231 $ lfs hsm_restore <replaceable>FILE1</replaceable> [<replaceable>FILE2</replaceable>...]
232 $ lfs hsm_remove <replaceable>FILE1</replaceable> [<replaceable>FILE2</replaceable>...]
235 <para>Requests are sent to the default ARCHIVE ID unless an ARCHIVE ID is specified with the <literal>--archive</literal> option (see <xref linkend="hsmarchivebackends"/>).</para>
238 <section xml:id='hsmautorestore'>
240 <indexterm><primary>HSM</primary><secondary>automatic restore</secondary></indexterm>Automatic restore
243 <para>Released files are automatically restored when a process tries to read or modify them. The corresponding I/O blocks until the file is restored. This is transparent to the process. For example, the following command automatically restores the file if it has been released.</para>
244 <screen>$ cat <replaceable>/mnt/lustre/released_file</replaceable></screen>
247 <section xml:id='hsmrequestmonitoring'>
249 <indexterm><primary>HSM</primary><secondary>request monitoring</secondary></indexterm>Request monitoring
252 <para>The list of registered requests and their status can be monitored, per MDT, with the following command:</para>
254 <screen>$ lctl get_param -n mdt.<replaceable>lustre-MDT0000</replaceable>.hsm.actions</screen>
256 <para>The list of requests currently being processed by a copytool is available with:</para>
258 <screen>$ lctl get_param -n mdt.<replaceable>lustre-MDT0000</replaceable>.hsm.active_requests</screen>
263 <section xml:id='hsmfilestates'>
265 <indexterm><primary>HSM</primary><secondary>file states</secondary></indexterm>File states
268 <para>When files are archived or released, their state in the Lustre file system changes. This state can be read using the following <literal>lfs</literal> command:</para>
270 <screen>$ lfs hsm_state <replaceable>FILE1</replaceable> [<replaceable>FILE2</replaceable>...]</screen>
272 <para>Policy flags can also be set on individual files to apply a per-file policy:
277 <para><literal>NOARCHIVE</literal> This file will never be archived.</para>
280 <para><literal>NORELEASE</literal> This file will never be released. This value cannot be set if the flag is currently set to <literal>RELEASED</literal>.</para>
283 <para><literal>DIRTY</literal> This file has been modified since a copy of it was made in the HSM solution. <literal>DIRTY</literal> files should be archived again. The <literal>DIRTY</literal> flag can only be set if <literal>EXIST</literal> is set.</para>
287 <para>The following options can only be set by the root user.</para>
292 <para><literal>LOST</literal> This file was previously archived, but the
293 copy in the HSM solution was lost in the backend (for example,
294 because of a corrupted tape) and cannot be restored. If the file is not in the
295 <literal>RELEASED</literal> state, it needs to be archived again. If the file
296 is in the <literal>RELEASED</literal> state, the file data is lost.</para>
301 <para>Some flags can be manually set or cleared using the following commands:</para>
303 <screen>$ lfs hsm_set [<replaceable>FLAGS</replaceable>] <replaceable>FILE1</replaceable> [<replaceable>FILE2</replaceable>...]
304 $ lfs hsm_clear [<replaceable>FLAGS</replaceable>] <replaceable>FILE1</replaceable> [<replaceable>FILE2</replaceable>...]</screen>
308 <section xml:id='hsmtuning'>
310 <indexterm><primary>HSM</primary><secondary>tuning</secondary></indexterm>Tuning
313 <section xml:id='hsmhsm_control'>
315 <indexterm><primary>HSM</primary><secondary>hsm_control</secondary></indexterm><literal>hsm_control</literal>
318 <para><literal>hsm_control</literal> controls coordinator activity and can also purge the action list.</para>
320 <screen>$ lctl set_param mdt.<replaceable>$FSNAME-MDT0000</replaceable>.hsm_control=purge</screen>
322 <para>Possible values are:</para>
326 <para><literal>enabled</literal> Start coordinator thread. Requests are dispatched on available copytool instances.</para>
329 <para><literal>disabled</literal> Pause coordinator activity. No new request will be scheduled. No timeout will be handled. New requests will be registered but will be handled only when the coordinator is enabled again.</para>
332 <para><literal>shutdown</literal> Stop coordinator thread. No request can be submitted.</para>
335 <para><literal>purge</literal> Clear all recorded requests. Do not change coordinator state.</para>
341 <section xml:id='hsmmax_requests'>
343 <indexterm><primary>HSM</primary><secondary>max_requests</secondary></indexterm><literal>max_requests</literal>
346 <para><literal>max_requests</literal> is the maximum number of active
347 requests at the same time. This is a per coordinator value, and independent of
348 the number of agents.</para>
350 <para>For example, if 2 MDTs and 4 agents are present, the agents will never have to handle more than 2 x <literal>max_requests</literal> requests at the same time.</para>
352 <screen>$ lctl set_param mdt.<replaceable>$FSNAME-MDT0000</replaceable>.hsm.max_requests=10</screen>
356 <section xml:id='hsmpolicy'>
358 <indexterm><primary>HSM</primary><secondary>policy</secondary></indexterm><literal>policy</literal>
361 <para><literal>policy</literal> changes system behavior. Values can be added or removed by prefixing them with '+' or '-'.</para>
363 <screen>$ lctl set_param mdt.<replaceable>$FSNAME-MDT0000</replaceable>.hsm.policy=+NRA</screen>
365 <para>Possible values are a combination of:</para>
369 <para><literal>NRA</literal> No Retry Action. If a restore fails, do not reschedule it automatically.</para>
372 <para><literal>NBR</literal> Non Blocking Restore. No automatic restore is triggered. Access to a released file returns <literal>ENODATA</literal>.</para>
377 <section xml:id='hsmgrace_delay'>
379 <indexterm><primary>HSM</primary><secondary>grace_delay</secondary></indexterm><literal>grace_delay</literal>
382 <para><literal>grace_delay</literal> is the delay, expressed in seconds,
383 before a successful or failed request is cleared from the whole request
386 <screen>$ lctl set_param mdt.<replaceable>$FSNAME-MDT0000</replaceable>.hsm.grace_delay=10</screen>
391 <section xml:id='hsmchangelogs'>
393 <indexterm><primary>HSM</primary><secondary>changelogs</secondary></indexterm>Change logs
396 <para>A changelog record type “HSM” was added for Lustre file system
397 logs that relate to HSM events.</para>
398 <screen>16HSM 13:49:47.469433938 2013.10.01 0x280 t=[0x200000400:0x1:0x0]</screen>
400 <para>Two items of information are available for each HSM record: the
401 FID of the modified file and a bit mask. The bit mask codes the following
402 information (lowest bits first):</para>
406 <para>Error code, if any (7 bits)</para>
409 <para>HSM event (3 bits)</para>
412 <para><literal>HE_ARCHIVE = 0</literal> File has been archived.</para>
415 <para><literal>HE_RESTORE = 1</literal> File has been restored.</para>
418 <para><literal>HE_CANCEL = 2</literal> A request for this file has been canceled.</para>
421 <para><literal>HE_RELEASE = 3</literal> File has been released.</para>
424 <para><literal>HE_REMOVE = 4</literal> A remove request has been executed automatically.</para>
427 <para><literal>HE_STATE = 5</literal> File flags have changed.</para>
432 <para>HSM flags (3 bits)</para>
435 <para><literal>CLF_HSM_DIRTY=0x1</literal></para>
440 <para>In the above example, <literal>0x280</literal> means the error code is 0 and the event is HE_STATE.</para>
442 <para>When using <literal>liblustreapi</literal>, helper functions are available to extract the different values from this bit mask, such as <literal>hsm_get_cl_event()</literal>, <literal>hsm_get_cl_flags()</literal>, and <literal>hsm_get_cl_error()</literal>.</para>
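<para>The decoding of the <literal>0x280</literal> example can also be worked through by hand. The following is a minimal sketch that assumes only the bit layout listed above (7 bits of error code, then 3 bits of event, then 3 bits of flags); <literal>decode_hsm_mask</literal> is a hypothetical helper name, not a Lustre command, and it uses division and remainder in place of shifts and masks.</para>

```shell
# Decode an HSM changelog bit mask using the layout described above:
# bits 0-6 error code, bits 7-9 HSM event, bits 10-12 HSM flags.
# decode_hsm_mask is a hypothetical helper, not part of the Lustre tools.
decode_hsm_mask() {
    mask=$(( $1 ))                    # accepts hexadecimal such as 0x280
    error=$(( mask % 128 ))           # lowest 7 bits
    event=$(( (mask / 128) % 8 ))     # next 3 bits
    flags=$(( (mask / 1024) % 8 ))    # next 3 bits
    # Map the event number to the names listed above.
    case $event in
        0) name=HE_ARCHIVE ;;
        1) name=HE_RESTORE ;;
        2) name=HE_CANCEL ;;
        3) name=HE_RELEASE ;;
        4) name=HE_REMOVE ;;
        5) name=HE_STATE ;;
        *) name=UNKNOWN ;;
    esac
    echo "error=$error event=$name flags=$flags"
}

# The mask from the sample changelog record above.
decode_hsm_mask 0x280
```

<para>For the sample record, this prints <literal>error=0 event=HE_STATE flags=0</literal>, which matches the interpretation given above.</para>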
446 <section xml:id='hsmpolicyengine'>
448 <indexterm><primary>HSM</primary><secondary>policy engine</secondary></indexterm>Policy engine
451 <para>A Lustre file system does not have an internal component responsible for automatically scheduling archive requests and release requests under any conditions (like low space). Automatically scheduling archive operations is the role of the policy engine.</para>
453 <para>It is recommended that the policy engine run on a dedicated client, similar to an agent node, running Lustre version 2.5 or later.</para>
455 <para>A policy engine is a userspace program using the Lustre file system HSM specific API to monitor the file system and schedule requests.</para>
457 <para>Robinhood is the recommended policy engine.</para>
460 <section xml:id='hsmrobinhood'>
462 <indexterm><primary>HSM</primary><secondary>robinhood</secondary></indexterm>Robinhood
465 <para>Robinhood is a policy engine and reporting tool for large file
466 systems. It maintains a replica of file system metadata in a database that
467 can be queried at will. Robinhood makes it possible to schedule mass actions on
468 file system entries by defining attribute-based policies, provides fast, enhanced clones of <literal>find</literal>
469 and <literal>du</literal>, and provides administrators with an overall
470 view of file system content through a web interface and command line tools.</para>
472 <para>Robinhood can be used for various configurations. Robinhood is an external project, and further information can be found on the project website: <link xl:href='https://sourceforge.net/apps/trac/robinhood/wiki/Doc'>https://sourceforge.net/apps/trac/robinhood/wiki/Doc</link>.</para>