LUDOC-431 lnet: doc for asymmetrical route checking
[doc/manual.git] / LustreDebugging.xml
index fcd030d..66f1960 100644 (file)
   <section xml:id="dbdoclet.50438274_15874">
     <title><indexterm><primary>debugging</primary></indexterm>
 Diagnostic and Debugging Tools</title>
-    <para>A variety of diagnostic and analysis tools are available to debug issues with the Lustre software. Some of these are provided in Linux distributions, while others have been developed and are made available by the Lustre project.</para>
+    <para>A variety of diagnostic and analysis tools are available to debug
+      issues with the Lustre software. Some of these are provided in Linux
+      distributions, while others have been developed and are made available
+      by the Lustre project.</para>
     <section remap="h3" xml:id="section_dms_q21_kk">
       <title><indexterm>
           <primary>debugging</primary>
           <secondary>tools</secondary>
         </indexterm> Lustre Debugging Tools</title>
-      <para>The following in-kernel debug mechanisms are incorporated into the Lustre
-        software:</para>
+      <para>The following in-kernel debug mechanisms are incorporated into
+        the Lustre software:</para>
       <itemizedlist>
         <listitem>
-          <para xml:id="para_fkj_rld_hk"><emphasis role="bold">Debug logs</emphasis> - A circular
-            debug buffer to which Lustre internal debug messages are written (in contrast to error
-            messages, which are printed to the syslog or console). Entries to the Lustre debug log
-            are controlled by the mask set by <literal>/proc/sys/lnet/debug</literal>. The log size
-            defaults to 5 MB per CPU but can be increased as a busy system will quickly overwrite 5
-            MB. When the buffer fills, the oldest information is discarded.</para>
+          <para xml:id="para_fkj_rld_hk"><emphasis role="bold">Debug logs</emphasis>
+            - A circular debug buffer to which Lustre internal debug messages
+            are written (in contrast to error messages, which are printed to the
+            syslog or console). Entries in the Lustre debug log are controlled
+            by a mask set by <literal>lctl set_param debug=<replaceable>mask</replaceable></literal>.
+            The log size defaults to 5 MB per CPU, but it can be increased
+            because a busy system will quickly overwrite 5 MB. When the
+            buffer fills, the oldest log records are discarded.</para>
         </listitem>
         <listitem>
-          <para><emphasis role="bold">Debug daemon</emphasis> - The debug daemon controls logging of
-            debug messages.</para>
+          <para><emphasis role="bold">
+              <literal>lctl get_param debug</literal>
+            </emphasis> - This shows the current debug mask used to delimit
+            the debugging information written out to the kernel debug logs.
+          </para>
+        </listitem>
+        <listitem>
+          <para><emphasis role="bold">
+              <literal>lctl debug_kernel <replaceable>file</replaceable></literal>
+            </emphasis> - Dump the Lustre kernel debug log to the specified
+            file as ASCII text for further debugging and analysis.
+          </para>
         </listitem>
         <listitem>
           <para><emphasis role="bold">
-              <literal>/proc/sys/lnet/debug</literal>
-            </emphasis> - This file contains a mask that can be used to delimit the debugging
-            information written out to the kernel debug logs.</para>
+              <literal>lctl set_param debug_mb=<replaceable>size</replaceable></literal>
+            </emphasis> - This sets the maximum size of the in-kernel Lustre
+            debug buffer, in units of MiB.
+          </para>
+        </listitem>
+        <listitem>
+          <para><emphasis role="bold">Debug daemon</emphasis>
+            - The debug daemon controls the continuous logging of debug
+            messages to a log file in userspace.</para>
         </listitem>
       </itemizedlist>
       <para>The following tools are also provided with the Lustre software:</para>
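The four `lctl` invocations named in the hunk above can be collected in one place. The following is a minimal sketch, not part of the patch: the helper name `lctl_debug_commands` and the example values (`trace`, `64`, `/tmp/lustre-debug.log`) are hypothetical, and the function only builds the argument lists; it does not run `lctl`.

```python
import shlex


def lctl_debug_commands(mask, size_mib, dump_file):
    """Build argv lists for the debug-log commands described in the hunk above.

    mask      -- value for `lctl set_param debug=<mask>`
    size_mib  -- value for `lctl set_param debug_mb=<size>`, in MiB
    dump_file -- target file for `lctl debug_kernel <file>`
    """
    if size_mib <= 0:
        raise ValueError("debug buffer size must be a positive number of MiB")
    return [
        ["lctl", "get_param", "debug"],               # show the current mask
        ["lctl", "set_param", f"debug={mask}"],       # set the mask
        ["lctl", "set_param", f"debug_mb={size_mib}"],  # resize the buffer
        ["lctl", "debug_kernel", dump_file],          # dump the log as text
    ]


if __name__ == "__main__":
    # Hypothetical example values; print the commands instead of running them.
    for argv in lctl_debug_commands("trace", 64, "/tmp/lustre-debug.log"):
        print(shlex.join(argv))
```

Returning argument lists rather than shell strings keeps quoting explicit if the lists are later passed to `subprocess.run`.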
@@ -49,9 +70,10 @@ Diagnostic and Debugging Tools</title>
         <listitem>
           <para><emphasis role="bold">
               <literal>lctl</literal>
-            </emphasis> - This tool is used with the debug_kernel option to manually dump the Lustre
-            debugging log or post-process debugging logs that are dumped automatically. For more
-            information about the lctl tool, see <xref linkend="dbdoclet.50438274_62472"/> and <xref
+            </emphasis> - This tool is used with the debug_kernel option to
+            manually dump the Lustre debugging log or post-process debugging
+            logs that are dumped automatically. For more information about the
+            lctl tool, see <xref linkend="dbdoclet.50438274_62472"/> and <xref
               linkend="dbdoclet.50438219_38274"/>.</para>
         </listitem>
         <listitem>
@@ -127,9 +149,9 @@ Diagnostic and Debugging Tools</title>
            packet inspection tool that allows debugging of information that was
            sent between the various Lustre nodes. This tool is built on top of
            <literal>tcpdump</literal> and can read packet dumps generated by
-           it.  There are plug-ins available to dissassemble the LNET and
+           it.  There are plug-ins available to disassemble the LNet and
            Lustre protocols.  They are located within the <link
-           xl:href="http://git.hpdd.intel.com/">Lustre git repository</link>
+           xl:href="http://git.whamcloud.com/">Lustre git repository</link>
            under <literal>lustre/contrib/wireshark/</literal>.  Installation
            instructions are included in that directory. See also <link
            xl:href="http://www.wireshark.org/">Wireshark Website</link> for
@@ -240,7 +262,7 @@ Diagnostic and Debugging Tools</title>
                         <para> <emphasis role="bold">trace</emphasis></para>
                       </entry>
                       <entry>
-                        <para> Entry/Exit markers</para>
+                        <para> Function entry/exit markers</para>
                       </entry>
                     </row>
                     <row>
@@ -248,7 +270,7 @@ Diagnostic and Debugging Tools</title>
                         <para> <emphasis role="bold">dlmtrace</emphasis></para>
                       </entry>
                       <entry>
-                        <para> Locking-related information</para>
+                        <para> Distributed locking-related information</para>
                       </entry>
                     </row>
                     <row>
@@ -269,50 +291,58 @@ Diagnostic and Debugging Tools</title>
                     </row>
                     <row>
                       <entry>
-                        <para> <emphasis role="bold">ext2</emphasis></para>
+                        <para> <emphasis role="bold">malloc</emphasis></para>
                       </entry>
                       <entry>
-                        <para> Anything from the ext2_debug</para>
+                        <para> Memory allocation or free information</para>
                       </entry>
                     </row>
                     <row>
                       <entry>
-                        <para> <emphasis role="bold">malloc</emphasis></para>
+                        <para> <emphasis role="bold">cache</emphasis></para>
                       </entry>
                       <entry>
-                        <para> Print malloc or free information</para>
+                        <para> Cache-related information</para>
                       </entry>
                     </row>
                     <row>
                       <entry>
-                        <para> <emphasis role="bold">cache</emphasis></para>
+                        <para> <emphasis role="bold">info</emphasis></para>
                       </entry>
                       <entry>
-                        <para> Cache-related information</para>
+                        <para> Non-critical general information</para>
                       </entry>
                     </row>
                     <row>
                       <entry>
-                        <para> <emphasis role="bold">info</emphasis></para>
+                        <para> <emphasis role="bold">dentry</emphasis></para>
                       </entry>
                       <entry>
-                        <para> General information</para>
+                        <para> Kernel namespace cache handling</para>
                       </entry>
                     </row>
                     <row>
                       <entry>
-                        <para> <emphasis role="bold">ioctl</emphasis></para>
+                        <para> <emphasis role="bold">mmap</emphasis></para>
                       </entry>
                       <entry>
-                        <para> IOCTL-related information</para>
+                        <para> Memory-mapped IO interface</para>
                       </entry>
                     </row>
                     <row>
                       <entry>
-                        <para> <emphasis role="bold">blocks</emphasis></para>
+                        <para> <emphasis role="bold">page</emphasis></para>
                       </entry>
                       <entry>
-                        <para> Ext2 block allocation information</para>
+                        <para> Page cache and bulk data transfers</para>
+                      </entry>
+                    </row>
+                    <row>
+                      <entry>
+                        <para> <emphasis role="bold">info</emphasis></para>
+                      </entry>
+                      <entry>
+                        <para> Miscellaneous informational messages</para>
                       </entry>
                     </row>
                     <row>
@@ -320,7 +350,15 @@ Diagnostic and Debugging Tools</title>
                         <para> <emphasis role="bold">net</emphasis></para>
                       </entry>
                       <entry>
-                        <para> Networking</para>
+                        <para> LNet network-related debugging</para>
+                      </entry>
+                    </row>
+                    <row>
+                      <entry>
+                        <para> <emphasis role="bold">console</emphasis></para>
+                      </entry>
+                      <entry>
+                        <para>Significant system events, printed to console</para>
                       </entry>
                     </row>
                     <row>
@@ -328,63 +366,106 @@ Diagnostic and Debugging Tools</title>
                         <para> <emphasis role="bold">warning</emphasis></para>
                       </entry>
                       <entry>
-                        <para> &#160;</para>
+                        <para>Significant but non-fatal exceptions, printed
+                          to console</para>
                       </entry>
                     </row>
                     <row>
                       <entry>
-                        <para> <emphasis role="bold">buffs</emphasis></para>
+                        <para> <emphasis role="bold">error</emphasis></para>
                       </entry>
                       <entry>
-                        <para> &#160;</para>
+                        <para>Critical error messages, printed to console</para>
                       </entry>
                     </row>
                     <row>
                       <entry>
-                        <para> <emphasis role="bold">other</emphasis></para>
+                        <para> <emphasis role="bold">neterror</emphasis></para>
                       </entry>
                       <entry>
-                        <para> &#160;</para>
+                        <para>Significant LNet error messages</para>
                       </entry>
                     </row>
                     <row>
                       <entry>
-                        <para> <emphasis role="bold">dentry</emphasis></para>
+                        <para> <emphasis role="bold">emerg</emphasis></para>
                       </entry>
                       <entry>
-                        <para> &#160;</para>
+                        <para>Fatal system errors, printed to console</para>
                       </entry>
                     </row>
                     <row>
                       <entry>
-                        <para> <emphasis role="bold">portals</emphasis></para>
+                        <para><emphasis role="bold">config</emphasis></para>
                       </entry>
                       <entry>
-                        <para> Entry/Exit markers</para>
+                        <para>Configuration and setup, enabled by default</para>
                       </entry>
                     </row>
                     <row>
                       <entry>
-                        <para> <emphasis role="bold">page</emphasis></para>
+                        <para><emphasis role="bold">ha</emphasis></para>
                       </entry>
                       <entry>
-                        <para> Bulk page handling</para>
+                        <para>Failover and recovery-related information,
+                        enabled by default</para>
                       </entry>
                     </row>
                     <row>
                       <entry>
-                        <para> <emphasis role="bold">error</emphasis></para>
+                        <para><emphasis role="bold">hsm</emphasis></para>
                       </entry>
                       <entry>
-                        <para> Error messages</para>
+                        <para>Hierarchical storage management/tiering</para>
                       </entry>
                     </row>
                     <row>
                       <entry>
-                        <para> <emphasis role="bold">emerg</emphasis></para>
+                        <para><emphasis role="bold">ioctl</emphasis></para>
                       </entry>
                       <entry>
-                        <para> &#160;</para>
+                        <para>IOCTL-related information, enabled by default</para>
+                      </entry>
+                    </row>
+                    <row>
+                      <entry>
+                        <para><emphasis role="bold">layout</emphasis></para>
+                      </entry>
+                      <entry>
+                        <para>File layout handling (PFL, FLR, DoM)</para>
+                      </entry>
+                    </row>
+                    <row>
+                      <entry>
+                        <para><emphasis role="bold">lfsck</emphasis></para>
+                      </entry>
+                      <entry>
+                        <para>Filesystem consistency checking, enabled by
+                        default</para>
+                      </entry>
+                    </row>
+                    <row>
+                      <entry>
+                        <para><emphasis role="bold">other</emphasis></para>
+                      </entry>
+                      <entry>
+                        <para>Miscellaneous other debug messages</para>
+                      </entry>
+                    </row>
+                    <row>
+                      <entry>
+                        <para><emphasis role="bold">quota</emphasis></para>
+                      </entry>
+                      <entry>
+                        <para>Space accounting and management</para>
+                      </entry>
+                    </row>
+                    <row>
+                      <entry>
+                        <para><emphasis role="bold">reada</emphasis></para>
+                      </entry>
+                      <entry>
+                        <para>Client readahead management</para>
                       </entry>
                     </row>
                     <row>
@@ -392,15 +473,31 @@ Diagnostic and Debugging Tools</title>
                         <para> <emphasis role="bold">rpctrace</emphasis></para>
                       </entry>
                       <entry>
-                        <para> For distributed debugging</para>
+                        <para>Remote request/reply tracing and debugging</para>
+                      </entry>
+                    </row>
+                    <row>
+                      <entry>
+                        <para><emphasis role="bold">sec</emphasis></para>
+                      </entry>
+                      <entry>
+                        <para>Security, Kerberos, Shared Secret Key handling</para>
+                      </entry>
+                    </row>
+                    <row>
+                      <entry>
+                        <para><emphasis role="bold">snapshot</emphasis></para>
+                      </entry>
+                      <entry>
+                        <para>Filesystem snapshot management</para>
                       </entry>
                     </row>
                     <row>
                       <entry>
-                        <para> <emphasis role="bold">ha</emphasis></para>
+                        <para><emphasis role="bold">vfstrace</emphasis></para>
                       </entry>
                       <entry>
-                        <para> Failover and recovery-related information</para>
+                        <para>Kernel VFS interface operations</para>
                       </entry>
                     </row>
                   </tbody>
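The flag names in the table rows above can be checked before they are handed to `lctl set_param debug=...`. The sketch below is illustrative only. `KNOWN_FLAGS` contains just the names visible in this diff excerpt, so it may be incomplete. The helper name `debug_delta` is hypothetical. The `+flag` / `-flag` form for adding or removing a single flag from the current mask is assumed here rather than taken from this patch.

```python
# Flag names copied from the table rows shown in this diff excerpt only;
# the full table in the manual may list more.
KNOWN_FLAGS = frozenset({
    "trace", "dlmtrace", "malloc", "cache", "info", "dentry", "mmap",
    "page", "net", "console", "warning", "error", "neterror", "emerg",
    "config", "ha", "hsm", "ioctl", "layout", "lfsck", "other", "quota",
    "reada", "rpctrace", "sec", "snapshot", "vfstrace",
})


def debug_delta(flag, enable=True):
    """Return the set_param argument that adds or removes one debug flag.

    Assumes the `debug=+flag` / `debug=-flag` syntax; raises ValueError
    for a name that is not in KNOWN_FLAGS.
    """
    if flag not in KNOWN_FLAGS:
        raise ValueError(f"unknown debug flag: {flag!r}")
    sign = "+" if enable else "-"
    return f"debug={sign}{flag}"


if __name__ == "__main__":
    # Example: the argument strings for enabling and then disabling rpctrace.
    print("lctl set_param", debug_delta("rpctrace"))
    print("lctl set_param", debug_delta("rpctrace", enable=False))
```

Rejecting unknown names early avoids sending a misspelled flag to a production node.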
@@ -839,7 +936,7 @@ lctl&gt; debug_kernel [<replaceable>filename</replaceable>] </screen>
                   </emphasis></para>
               </entry>
               <entry>
-                <para> Behaves similarly to <literal>CERROR()</literal>, but prints error messages for LNET if <literal>D_NETERR</literal> is set in the <literal>debug</literal> mask. This is appropriate for serious networking errors. Messages printed to the console are rate-limited.</para>
+                <para> Behaves similarly to <literal>CERROR()</literal>, but prints error messages for LNet if <literal>D_NETERR</literal> is set in the <literal>debug</literal> mask. This is appropriate for serious networking errors. Messages printed to the console are rate-limited.</para>
               </entry>
             </row>
             <row>
@@ -988,7 +1085,7 @@ lctl&gt; debug_kernel [<replaceable>filename</replaceable>] </screen>
     <section remap="h3">
       <title>Accessing the <literal>ptlrpc</literal> Request History</title>
       <para>Each service maintains a request history, which can be useful for first occurrence troubleshooting.</para>
-      <para><literal>ptlrpc</literal> is an RPC protocol layered on LNET that deals with stateful servers and has semantics and built-in support for recovery.</para>
+      <para><literal>ptlrpc</literal> is an RPC protocol layered on LNet that deals with stateful servers and has semantics and built-in support for recovery.</para>
       <para>The ptlrpc request history works as follows:</para>
       <orderedlist>
         <listitem>
@@ -1001,7 +1098,7 @@ lctl&gt; debug_kernel [<replaceable>filename</replaceable>] </screen>
           <para>Buffers are culled from the service request buffer history if it has grown above <literal>req_buffer_history_max</literal> and its reqs are removed from the service request history.</para>
         </listitem>
       </orderedlist>
-      <para>Request history is accessed and controlled using the following /proc files under the service directory:</para>
+      <para>Request history is accessed and controlled using the following parameters for each service:</para>
       <itemizedlist>
         <listitem>
           <para><literal>req_buffer_history_len </literal></para>