1 <?xml version='1.0' encoding='UTF-8'?>
2 <!-- This document was created with Syntext Serna Free. -->
3 <chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="lustremonitoring">
4   <info>
5     <title xml:id="lustremonitoring.title">Lustre Monitoring</title>
6   </info>
7   <para>This chapter provides information on monitoring Lustre and includes the following sections:</para>
8   <itemizedlist>
9     <listitem>
10       <para><xref linkend="dbdoclet.50438273_18711"/>Lustre Changelogs</para>
11     </listitem>
12     <listitem>
13       <para><xref linkend="dbdoclet.50438273_81684"/>Lustre Monitoring Tool</para>
14     </listitem>
15     <listitem>
16       <para><xref linkend="dbdoclet.50438273_80593"/>CollectL</para>
17     </listitem>
18     <listitem>
19       <para><xref linkend="dbdoclet.50438273_44185"/>Other Monitoring Options</para>
20     </listitem>
21   </itemizedlist>
22   <section xml:id="dbdoclet.50438273_18711">
23     <title>12.1 Lustre <anchor xml:id="dbdoclet.50438273_marker-1297383" xreflabel=""/>Changelogs</title>
24     <para>The changelogs feature records events that change the file system namespace or file metadata. Changes such as file creation, deletion, renaming, and attribute changes are recorded with the target and parent file identifiers (FIDs), the name of the target, and a timestamp. These records can be used for a variety of purposes:</para>
25     <itemizedlist>
26       <listitem>
27         <para> Capture recent changes to feed into an archiving system.</para>
28       </listitem>
29       <listitem>
30         <para> Use changelog entries to exactly replicate changes in a file system mirror.</para>
31       </listitem>
32       <listitem>
33         <para> Set up &quot;watch scripts&quot; that take action on certain events or directories.</para>
34       </listitem>
35       <listitem>
36         <para> Maintain a rough audit trail (file/directory changes with timestamps, but no user information).</para>
37       </listitem>
38     </itemizedlist>
39     <para>Changelog record types are:</para>
40     <informaltable frame="all">
41       <tgroup cols="2">
42         <colspec colname="c1" colwidth="50*"/>
43         <colspec colname="c2" colwidth="50*"/>
44         <thead>
45           <row>
46             <entry>
47               <para><emphasis role="bold">Value</emphasis></para>
48             </entry>
49             <entry>
50               <para><emphasis role="bold">Description</emphasis></para>
51             </entry>
52           </row>
53         </thead>
54         <tbody>
55           <row>
56             <entry>
57               <para> MARK</para>
58             </entry>
59             <entry>
60               <para> Internal recordkeeping</para>
61             </entry>
62           </row>
63           <row>
64             <entry>
65               <para> CREAT</para>
66             </entry>
67             <entry>
68               <para> Regular file creation</para>
69             </entry>
70           </row>
71           <row>
72             <entry>
73               <para> MKDIR</para>
74             </entry>
75             <entry>
76               <para> Directory creation</para>
77             </entry>
78           </row>
79           <row>
80             <entry>
81               <para> HLINK</para>
82             </entry>
83             <entry>
84               <para> Hard link</para>
85             </entry>
86           </row>
87           <row>
88             <entry>
89               <para> SLINK</para>
90             </entry>
91             <entry>
92               <para> Soft link</para>
93             </entry>
94           </row>
95           <row>
96             <entry>
97               <para> MKNOD</para>
98             </entry>
99             <entry>
100               <para> Other file creation</para>
101             </entry>
102           </row>
103           <row>
104             <entry>
105               <para> UNLNK</para>
106             </entry>
107             <entry>
108               <para> Regular file removal</para>
109             </entry>
110           </row>
111           <row>
112             <entry>
113               <para> RMDIR</para>
114             </entry>
115             <entry>
116               <para> Directory removal</para>
117             </entry>
118           </row>
119           <row>
120             <entry>
121               <para> RNMFM</para>
122             </entry>
123             <entry>
124               <para> Rename, original</para>
125             </entry>
126           </row>
127           <row>
128             <entry>
129               <para> RNMTO</para>
130             </entry>
131             <entry>
132               <para> Rename, final</para>
133             </entry>
134           </row>
135           <row>
136             <entry>
137               <para> IOCTL</para>
138             </entry>
139             <entry>
140               <para> ioctl on file or directory</para>
141             </entry>
142           </row>
143           <row>
144             <entry>
145               <para> TRUNC</para>
146             </entry>
147             <entry>
148               <para> Regular file truncated</para>
149             </entry>
150           </row>
151           <row>
152             <entry>
153               <para> SATTR</para>
154             </entry>
155             <entry>
156               <para> Attribute change</para>
157             </entry>
158           </row>
159           <row>
160             <entry>
161               <para> XATTR</para>
162             </entry>
163             <entry>
164               <para> Extended attribute change</para>
165             </entry>
166           </row>
167           <row>
168             <entry>
169               <para> UNKNW</para>
170             </entry>
171             <entry>
172               <para> Unknown operation</para>
173             </entry>
174           </row>
175         </tbody>
176       </tgroup>
177     </informaltable>
178     <para>FID-to-full-pathname and pathname-to-FID functions are also included to map target and parent FIDs into the file system namespace.</para>
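    <para>A watch script of the kind mentioned above can be reduced to a small dispatcher that reads record lines and calls a handler for selected record types. The following Python sketch is an illustration only. It assumes that the record type is the second whitespace-separated field, prefixed by a two-digit code (as in the sample records shown later in this chapter). The names <literal>dispatch</literal> and <literal>handlers</literal> are hypothetical and are not part of any Lustre tool.</para>
    <programlisting language="python">
```python
def dispatch(lines, handlers):
    """Call handlers[type_name](line) for each record whose type has a handler.

    Returns the number of records that were handled.
    """
    handled = 0
    for line in lines:
        tokens = line.split()
        if len(tokens) in (0, 1):
            continue  # blank or truncated line, nothing to dispatch
        type_name = tokens[1][2:]  # "01CREAT" becomes "CREAT"
        handler = handlers.get(type_name)
        if handler is not None:
            handler(line)
            handled += 1
    return handled


# Example: collect every regular file creation from two sample records.
created = []
sample = [
    "2 02MKDIR 4298396676 0x0 t=[0x200000405:0x15f9:0x0] p=[0x13:0x15e5a7a3:0x0] pics",
    "3 01CREAT 4298402264 0x0 t=[0x200000405:0x15fa:0x0] p=[0x200000405:0x15f9:0x0] chloe.jpg",
]
dispatch(sample, {"CREAT": created.append})
print(len(created))  # prints 1
```
</programlisting>
    <para>In practice the lines would come from the output of <literal>lfs changelog</literal>, and a handler would do something more useful than appending to a list.</para>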
179     <section remap="h3">
180       <title>12.1.1 Working with Changelogs</title>
181       <para>Several commands are available to work with changelogs.</para>
182       <section remap="h5">
183         <title>lctl changelog_register</title>
184         <para>Because changelog records take up space on the MDT, the system administrator must register changelog users. Registered users specify which records they are &quot;done with&quot;, and the system purges records up to the greatest record that all users are done with.</para>
185         <para>To register a new changelog user, run:</para>
186         <screen>lctl --device &lt;mdt_device&gt; changelog_register
187 </screen>
188         <para>Changelog entries are not purged beyond a registered user&apos;s set point (see <literal>lfs changelog_clear</literal>).</para>
189       </section>
190       <section remap="h5">
191         <title>lfs changelog</title>
192         <para>To display the metadata changes on an MDT (the changelog records), run:</para>
193         <screen>lfs changelog &lt;MDT name&gt; [startrec [endrec]] 
194 </screen>
195         <para>The start and end records are optional.</para>
196         <para>These are sample changelog records:</para>
197         <screen>2 02MKDIR 4298396676 0x0 t=[0x200000405:0x15f9:0x0] p=[0x13:0x15e5a7a3:0x0]\
198  pics 
199 3 01CREAT 4298402264 0x0 t=[0x200000405:0x15fa:0x0] p=[0x200000405:0x15f9:0\
200 x0] chloe.jpg 
201 4 06UNLNK 4298404466 0x0 t=[0x200000405:0x15fa:0x0] p=[0x200000405:0x15f9:0\
202 x0] chloe.jpg 
203 5 07RMDIR 4298405394 0x0 t=[0x200000405:0x15f9:0x0] p=[0x13:0x15e5a7a3:0x0]\
204  pics 
205 </screen>
206       </section>
207       <section remap="h5">
208         <title>lfs changelog_clear</title>
209         <para>To clear old changelog records for a specific user (records that the user no longer needs), run:</para>
210         <screen>lfs changelog_clear &lt;MDT name&gt; &lt;user ID&gt; &lt;endrec&gt;
211 </screen>
212         <para>The <literal>changelog_clear</literal> command indicates that changelog records up to and including <literal>&lt;endrec&gt;</literal> are no longer of interest to a particular user <literal>&lt;user ID&gt;</literal>, potentially allowing the MDT to free up disk space. An <literal>&lt;endrec&gt;</literal> value of 0 indicates the current last record. To run <literal>changelog_clear</literal>, the changelog user must first be registered on the MDT node using <literal>lctl</literal>.</para>
213         <para>When all changelog users are done with records &lt; X, the records are deleted.</para>
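        <para>The purge rule can be modelled as taking the minimum of the registered users&apos; set points. The following Python sketch only illustrates that rule and does not show how the MDT implements purging. The function name <literal>purge_point</literal> and the example set points are hypothetical.</para>
        <programlisting language="python">
```python
def purge_point(set_points):
    """Return the highest record number that every registered user has cleared.

    set_points maps a changelog user ID to the last record that user is
    done with. With no registered users there is no common record, so the
    function returns None.
    """
    if not set_points:
        return None
    return min(set_points.values())


# cl1 has cleared up to record 3, cl2 up to record 8:
# only records up to 3 are no longer needed by anyone.
print(purge_point({"cl1": 3, "cl2": 8}))  # prints 3
```
</programlisting>
        <para>In this model a single user that never clears its records keeps the purge point from advancing, which is why unused changelog users should be deregistered.</para>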
214       </section>
215       <section remap="h5">
216         <title>lctl changelog_deregister</title>
217         <para>To deregister (unregister) a changelog user, run:</para>
218         <screen>lctl --device &lt;mdt_device&gt; changelog_deregister &lt;user ID&gt;</screen>
219         <para>Running <literal>changelog_deregister cl1</literal> effectively performs a <literal>changelog_clear cl1 0</literal> as it deregisters the user.</para>
220       </section>
221     </section>
222     <section remap="h3">
223       <title>12.1.2 Changelog Examples</title>
224       <para>This section provides examples of different changelog commands.</para>
225       <section remap="h5">
226         <title>Registering a Changelog User</title>
227         <para>To register a new changelog user for a device (<literal>lustre-MDT0000</literal>):</para>
228         <screen># lctl --device lustre-MDT0000 changelog_register
229 lustre-MDT0000: Registered changelog userid &apos;cl1&apos;
230 </screen>
231       </section>
232       <section remap="h5">
233         <title>Displaying Changelog Records</title>
234         <para>To display changelog records on an MDT (<literal>lustre-MDT0000</literal>):</para>
235         <screen>$ lfs changelog lustre-MDT0000
236 1 00MARK  19:08:20.890432813 2010.03.24 0x0 t=[0x10001:0x0:0x0] p=[0:0x0:0x\
237 0] mdd_obd-lustre-MDT0000-0 
238 2 02MKDIR 19:10:21.509659173 2010.03.24 0x0 t=[0x200000420:0x3:0x0] p=[0x61\
239 b4:0xca2c7dde:0x0] mydir 
240 3 14SATTR 19:10:27.329356533 2010.03.24 0x0 t=[0x200000420:0x3:0x0] 
241 4 01CREAT 19:10:37.113847713 2010.03.24 0x0 t=[0x200000420:0x4:0x0] p=[0x20\
242 0000420:0x3:0x0] hosts 
243 </screen>
244         <para>Changelog records include this information:</para>
245         <screen>rec# 
246 operation_type(numerical/text) 
247 timestamp 
248 datestamp 
249 flags 
250 t=target_FID 
251 p=parent_FID 
252 target_name
253 </screen>
254         <para>Displayed in this format:</para>
255         <screen>rec# operation_type(numerical/text) timestamp datestamp flags t=target_FID \
256 p=parent_FID target_name
257 </screen>
258         <para>For example:</para>
259         <screen>4 01CREAT 19:10:37.113847713 2010.03.24 0x0 t=[0x200000420:0x4:0x0] p=[0x20\
260 0000420:0x3:0x0] hosts
261 </screen>
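        <para>The record layout above can be split into named fields with a short script. The following Python sketch assumes the displayed format shown in this section, with any backslash-wrapped line already rejoined into one line. The function name <literal>parse_changelog_record</literal> and the dictionary keys are illustrative and are not part of any Lustre tool. Records that have no parent FID or name, such as the SATTR record above, yield <literal>None</literal> for those fields.</para>
        <programlisting language="python">
```python
def parse_changelog_record(line):
    """Split one displayed changelog record into named fields."""
    tokens = line.split()
    try:
        # The first six fields are always present in the displayed format.
        rec, op, timestamp, datestamp, flags, target = tokens[:6]
    except ValueError:
        raise ValueError("not a changelog record: %r" % line) from None
    if not (target.startswith("t=[") and target.endswith("]")):
        raise ValueError("missing target FID: %r" % line)

    rest = tokens[6:]
    parent_fid = None
    if rest and rest[0].startswith("p=["):
        parent_fid = rest.pop(0)[3:-1]  # strip "p=[" and "]"
    # Whatever remains is the target name; runs of spaces collapse to one.
    name = " ".join(rest) or None

    return {
        "rec": int(rec),
        "type_num": int(op[:2]),  # "01CREAT" gives 1
        "type_name": op[2:],      # "01CREAT" gives "CREAT"
        "timestamp": timestamp,
        "datestamp": datestamp,
        "flags": flags,
        "target_fid": target[3:-1],
        "parent_fid": parent_fid,
        "name": name,
    }


line = ("4 01CREAT 19:10:37.113847713 2010.03.24 0x0 "
        "t=[0x200000420:0x4:0x0] p=[0x200000420:0x3:0x0] hosts")
record = parse_changelog_record(line)
print(record["type_name"], record["name"])  # prints: CREAT hosts
```
</programlisting>
        <para>The FIDs are kept as strings because they are opaque identifiers that are passed back to other commands, not values to compute with.</para>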
262       </section>
263       <section remap="h5">
264         <title>Clearing Changelog Records</title>
265         <para>To notify a device that a specific user (<literal>cl1</literal>) no longer needs records (up to and including 3):</para>
266         <screen>$ lfs changelog_clear  lustre-MDT0000 cl1 3
267 </screen>
268         <para>To confirm that the <literal>changelog_clear</literal> operation was successful, run <literal>lfs changelog</literal>; only records after record 3 are listed:</para>
269         <screen>$ lfs changelog lustre-MDT0000
270 4 01CREAT 19:10:37.113847713 2010.03.24 0x0 t=[0x200000420:0x4:0x0] p=[0x20\
271 0000420:0x3:0x0] hosts
272 </screen>
273       </section>
274       <section remap="h5">
275         <title>Deregistering a Changelog User</title>
276         <para>To deregister a changelog user (<literal>cl1</literal>) for a specific device (<literal>lustre-MDT0000</literal>):</para>
277         <screen># lctl --device lustre-MDT0000 changelog_deregister cl1
278 lustre-MDT0000: Deregistered changelog user &apos;cl1&apos;
279 </screen>
280         <para>The deregistration operation clears all changelog records for the specified user (<literal>cl1</literal>).</para>
281         <screen>$ lfs changelog lustre-MDT0000
282 5 00MARK  19:13:40.858292517 2010.03.24 0x0 t=[0x40001:0x0:0x0] p=[0:0x0:0x\
283 0] mdd_obd-lustre-MDT0000-0 
284 </screen>
285         <informaltable frame="none">
286           <tgroup cols="1">
287             <colspec colname="c1" colwidth="100*"/>
288             <tbody>
289               <row>
290                 <entry>
291                   <para><emphasis role="bold">Note -</emphasis> MARK records typically indicate changelog recording status changes.</para>
292                 </entry>
293               </row>
294             </tbody>
295           </tgroup>
296         </informaltable>
297       </section>
298       <section remap="h5">
299         <title>Displaying the Changelog Index and Registered Users</title>
300         <para>To display the current maximum changelog index and the registered changelog users for a specific device (<literal>lustre-MDT0000</literal>):</para>
301         <screen># lctl get_param  mdd.lustre-MDT0000.changelog_users 
302 mdd.lustre-MDT0000.changelog_users=current index: 8 
303 ID    index 
304 cl2   8
305 </screen>
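        <para>A monitoring script can read the same output to find the current index and each user&apos;s set point. The following Python sketch assumes that the output has exactly the layout shown above (a <literal>current index:</literal> line, a header line starting with <literal>ID</literal>, then one line per user); other Lustre versions may print additional columns, which this sketch ignores. The function name <literal>parse_changelog_users</literal> is hypothetical.</para>
        <programlisting language="python">
```python
def parse_changelog_users(output):
    """Return (current_index, {user_id: index}) from the displayed output."""
    current_index = None
    users = {}
    for line in output.splitlines():
        line = line.strip()
        if "current index:" in line:
            # The number follows the "current index:" label.
            current_index = int(line.split("current index:")[1])
        elif line and not line.startswith("ID"):
            # One user per line: the ID first, then its index.
            fields = line.split()
            users[fields[0]] = int(fields[1])
    return current_index, users


output = (
    "mdd.lustre-MDT0000.changelog_users=current index: 8 \n"
    "ID    index \n"
    "cl2   8\n"
)
print(parse_changelog_users(output))  # prints (8, {'cl2': 8})
```
</programlisting>
        <para>Comparing each user&apos;s index with the current index shows how many records that user has not yet cleared.</para>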
306       </section>
307       <section remap="h5">
308         <title>Displaying the Changelog Mask</title>
309         <para>To show the current changelog mask on a specific device (<literal>lustre-MDT0000</literal>):</para>
310         <screen># lctl get_param  mdd.lustre-MDT0000.changelog_mask 
311
312 mdd.lustre-MDT0000.changelog_mask= 
313 MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RNMFM RNMTO OPEN CLOSE IOCTL\
314  TRUNC SATTR XATTR HSM 
315 </screen>
316       </section>
317       <section remap="h5">
318         <title>Setting the Changelog Mask</title>
319         <para>To set the current changelog mask on a specific device (<literal>lustre-MDT0000</literal>):</para>
320         <screen># lctl set_param mdd.lustre-MDT0000.changelog_mask=HLINK 
321 mdd.lustre-MDT0000.changelog_mask=HLINK 
322 $ lfs changelog_clear lustre-MDT0000 cl1 0 
323 $ mkdir /mnt/lustre/mydir/foo
324 $ cp /etc/hosts /mnt/lustre/mydir/foo/file
325 $ ln /mnt/lustre/mydir/foo/file /mnt/lustre/mydir/myhardlink
326 </screen>
327         <para> Only item types that are in the mask show up in the changelog.</para>
328         <screen>$ lfs changelog lustre-MDT0000
329 9 03HLINK 19:19:35.171867477 2010.03.24 0x0 t=[0x200000420:0x6:0x0] p=[0x20\
330 0000420:0x3:0x0] myhardlink
331 </screen>
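        <para>The effect of the mask in this example can be modelled as a simple set membership test. The following Python sketch is an illustration only: the MDT applies the mask when records are written, not when they are displayed. The sketch assumes that <literal>mkdir</literal>, <literal>cp</literal> and <literal>ln</literal> map to the MKDIR, CREAT and HLINK record types listed earlier in this chapter. The function names are hypothetical.</para>
        <programlisting language="python">
```python
def parse_mask(mask_text):
    """Turn a displayed mask value into a set of record type names.

    A trailing backslash used to wrap the displayed value is treated as
    whitespace.
    """
    return set(mask_text.replace("\\", " ").split())


def filter_by_mask(type_names, mask):
    """Keep only the record types that the mask allows."""
    return [name for name in type_names if name in mask]


mask = parse_mask("HLINK")
# mkdir, cp and ln from the example above, in order.
print(filter_by_mask(["MKDIR", "CREAT", "HLINK"], mask))  # prints ['HLINK']
```
</programlisting>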
332       </section>
333     </section>
334   </section>
335   <section xml:id="dbdoclet.50438273_81684">
336     <title>12.2 Lustre <anchor xml:id="dbdoclet.50438273_marker-1297386" xreflabel=""/>Monitoring Tool</title>
337     <para>The Lustre Monitoring Tool (LMT) is a Python-based, distributed system developed and maintained by Lawrence Livermore National Laboratory (LLNL). It provides a <literal>top</literal>-like display of activity on server-side nodes (MDS, OSS, and portals routers) on one or more Lustre file systems. It does not support monitoring clients. For more information on LMT, including the setup procedure, see:</para>
338     <para><link xl:href="http://code.google.com/p/lmt/">http://code.google.com/p/lmt/</link></para>
339     <para>LMT questions can be directed to:</para>
340     <para><link xl:href="mailto:lmt-discuss@googlegroups.com">lmt-discuss@googlegroups.com</link></para>
341   </section>
342   <section xml:id="dbdoclet.50438273_80593">
343     <title>12.3 Collect<anchor xml:id="dbdoclet.50438273_marker-1297391" xreflabel=""/>L</title>
344     <para>CollectL is another tool that can be used to monitor Lustre. You can run CollectL on a Lustre system that has any combination of MDSs, OSTs and clients. The collected data can be written to a file for continuous logging and played back at a later time. It can also be converted to a format suitable for plotting.</para>
345     <para>For more information about CollectL, see:</para>
346     <para><link xl:href="http://collectl.sourceforge.net">http://collectl.sourceforge.net</link></para>
347     <para>Lustre-specific documentation is also available. See:</para>
348     <para><link xl:href="http://collectl.sourceforge.net/Tutorial-Lustre.html">http://collectl.sourceforge.net/Tutorial-Lustre.html</link></para>
349   </section>
350   <section xml:id="dbdoclet.50438273_44185">
351     <title>12.4 Other Monitoring Options</title>
352     <para>A variety of standard tools are publicly available.</para>
353     <para>Another option is to script a simple monitoring solution that looks at various reports from <literal>ifconfig</literal>, as well as the procfs files generated by Lustre.</para>
354     <para><anchor xml:id="dbdoclet.50438273_67514" xreflabel=""/>&#160;</para>
355   </section>
356 </chapter>