From ed90fdff764e0ccd032e1a141d4cf7fe13e58425 Mon Sep 17 00:00:00 2001 From: Richard Henwood Date: Wed, 18 May 2011 11:54:24 -0500 Subject: [PATCH] FIX: xrefs and tidying --- LustreDebugging.xml | 218 +++++++++++++++------------------------------------- 1 file changed, 64 insertions(+), 154 deletions(-) diff --git a/LustreDebugging.xml b/LustreDebugging.xml index aba6f5a..a50c983 100644 --- a/LustreDebugging.xml +++ b/LustreDebugging.xml @@ -1,32 +1,26 @@ -
+
- Lustre Debugging + Lustre Debugging + This chapter describes tips and information to debug Lustre, and includes the following sections: - Diagnostic and Debugging Tools + + - + + - Lustre Debugging Procedures - - - - - - Lustre Debugging for Developers - - - + + -
- <anchor xml:id="dbdoclet.50438274_pgfId-1295665" xreflabel=""/> -
- 28.1 <anchor xml:id="dbdoclet.50438274_15874" xreflabel=""/>Diagnostic and Debugging Tools + +
+ 28.1 Diagnostic and Debugging Tools A variety of diagnostic and analysis tools are available to debug issues with the Lustre software. Some of these are provided in Linux distributions, while others have been developed and are made available by the Lustre project.
<anchor xml:id="dbdoclet.50438274_pgfId-1295667" xreflabel=""/>28.1.1 Lustre Debugging Tools @@ -34,41 +28,29 @@ Debug logs - A circular debug buffer to which Lustre internal debug messages are written (in contrast to error messages, which are printed to the syslog or console). Entries to the Lustre debug log are controlled by the mask set by /proc/sys/lnet/debug. The log size defaults to 5 MB per CPU but can be increased as a busy system will quickly overwrite 5 MB. When the buffer fills, the oldest information is discarded. - - - + Debug daemon - The debug daemon controls logging of debug messages. - - - + /proc/sys/lnet/debug - This file contains a mask that can be used to delimit the debugging information written out to the kernel debug logs. - - - + The following tools are also provided with the Lustre software: - lctl - This tool is used with the debug_kernel option to manually dump the Lustre debugging log or post-process debugging logs that are dumped automatically. For more information about the lctl tool, see Using the lctl Tool to View Debug Messages and lctl. - - - + lctl - This tool is used with the debug_kernel option to manually dump the Lustre debugging log or post-process debugging logs that are dumped automatically. For more information about the lctl tool, see and (lctl). + Lustre subsystem asserts - A panic-style assertion (LBUG) in the kernel causes Lustre to dump the debug log to the file /tmp/lustre-log.<timestamp> where it can be retrieved after a reboot. For more information, see Viewing Error Messages. - - - + lfs - This utility provides access to the extended attributes (EAs) of a Lustre file (along with other information). For more information about lfs, see lfs. - - - +
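The circular behavior of the debug log described above (the oldest information is discarded once the buffer fills) can be illustrated with a minimal Python sketch. This is not Lustre code: the class name `DebugRing`, the entry-based capacity, and the sample messages are hypothetical, and the real buffer is sized in bytes (5 MB per CPU by default) rather than in entries.

```python
from collections import deque


class DebugRing:
    """Toy model of a circular debug log (hypothetical, not the Lustre implementation)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest item automatically when full.
        self._entries = deque(maxlen=capacity)

    def log(self, message):
        # Append a new message; may evict the oldest one.
        self._entries.append(message)

    def dump(self):
        # Return a copy of the current entries, oldest first.
        return list(self._entries)


ring = DebugRing(capacity=3)
for i in range(5):
    ring.log(f"msg {i}")

print(ring.dump())
```

With a capacity of three entries and five messages written, only the three most recent messages remain, which is why a busy system may need a larger buffer to retain the events of interest.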
@@ -80,47 +62,33 @@ strace . This tool allows a system call to be traced. - - - + /var/log/messages . syslogd prints fatal or serious messages at this log. - - - + Crash dumps . On crash-dump enabled kernels, sysrq c produces a crash dump. Lustre enhances this crash dump with a log dump (the last 64 KB of the log) to the console. - - - + debugfs . Interactive file system debugger. - - - + The following logging and data collection tools can be used to collect information for debugging Lustre kernel issues: kdump . A Linux kernel crash utility useful for debugging a system running Red Hat Enterprise Linux. For more information about kdump, see the Red Hat knowledge base article How do I configure kexec/kdump on Red Hat Enterprise Linux 5?. To download kdump, go to the Fedora Project Download site. - - - + netconsole . Enables kernel-level network logging over UDP. A system requires (SysRq) allows users to collect relevant data through netconsole. - - - + netdump . A crash dump utility from Red Hat that allows memory images to be dumped over a network to a central server for analysis. The netdump utility was replaced by kdump in RHEL 5. For more information about netdump, see Red Hat, Inc.'s Network Console and Crash Dump Facility. - - - +
@@ -130,97 +98,66 @@ leak_finder.pl . This program provided with Lustre is useful for finding memory leaks in the code. - - - + A virtual machine is often used to create an isolated development and test environment. Some commonly-used virtual machines are: VirtualBox Open Source Edition . Provides enterprise-class virtualization capability for all major platforms and is available free at Get Sun VirtualBox. - - - + VMware Server . Virtualization platform available as free introductory software at Download VMware Server. - - - + Xen . A para-virtualized environment with virtualization capabilities similar to VMware Server and Virtual Box. However, Xen allows the use of modified kernels to provide near-native performance and the ability to emulate shared storage. For more information, go to xen.org. - - - + A variety of debuggers and analysis tools are available including: kgdb . The Linux Kernel Source Level Debugger kgdb is used in conjunction with the GNU Debugger gdb for debugging the Linux kernel. For more information about using kgdb with gdb, see Chapter 6. Running Programs Under gdb in the Red Hat Linux 4 Debugging with GDB guide. - - - + crash . Used to analyze saved crash dump data when a system had panicked or locked up or appears unresponsive. For more information about using crash to analyze a crash dump, see: - - Red Hat Magazine article: A quick overview of Linux kernel crash dump analysis - - - + Crash Usage: A Case Study from the white paper Red Hat Crash Utility by David Anderson - - - + Kernel Trap forum entry: Linux: Kernel Crash Dumps - - - + White paper: A Quick Overview of Linux Kernel Crash Dump Analysis - - - +
-
- 28.2 <anchor xml:id="dbdoclet.50438274_23607" xreflabel=""/>Lustre Debugging Procedures +
+ 28.2 Lustre Debugging Procedures The procedures below may be useful to administrators or developers debugging a Lustre file system.
<anchor xml:id="dbdoclet.50438274_pgfId-1295735" xreflabel=""/>28.2.1 Understanding the Lustre Debug Messaging Format - Lustre debug messages are categorized by originating sybsystem, message type, and locaton in the source code. For a list of subsystems and message types, see Lustre Debug Messages. - - - - - - Note -For a current list of subsystems and debug message types, see lnet/include/libcfs/libcfs.h in the Lustre tree - - - - - The elements of a Lustre debug message are described in Format of Lustre Debug Messages. + Lustre debug messages are categorized by originating subsystem, message type, and location in the source code. For a list of subsystems and message types, see . + For a current list of subsystems and debug message types, see lnet/include/libcfs/libcfs.h in the Lustre tree. + The elements of a Lustre debug message are described in Format of Lustre Debug Messages.
<anchor xml:id="dbdoclet.50438274_pgfId-1295747" xreflabel=""/>28.2.1.1 <anchor xml:id="dbdoclet.50438274_57603" xreflabel=""/>Lustre <anchor xml:id="dbdoclet.50438274_marker-1295746" xreflabel=""/>Debug Messages Each Lustre debug message has the tag of the subsystem it originated in, the message type, and the location in the source code. The subsystems and debug types used in Lustre are as follows: Standard Subsystems: - - - + mdc, mds, osc, ost, obdclass, obdfilter, llite, ptlrpc, portals, lnd, ldlm, lov @@ -395,35 +332,21 @@ Obtain a list of all the types and subsystems: - - - + lctl > debug_list <subs | types> Filter the debug log: - - - + lctl > filter <subsystem name | debug type> - - - - - - Note -When lctl filters, it removes unwanted lines from the displayed output. This does not affect the contents of the debug log in the kernel's memory. As a result, you can print the log many times with different filtering levels without worrying about losing data. - - - - + When lctl filters, it removes unwanted lines from the displayed output. This does not affect the contents of the debug log in the kernel's memory. As a result, you can print the log many times with different filtering levels without worrying about losing data. + Show debug messages belonging to certain subsystem or type: - - - + lctl > show <subsystem name | debug type> debug_kernel pulls the data from the kernel logs, filters it appropriately, and displays or saves it as per the specified options @@ -433,9 +356,7 @@ Filter a log on disk, if you already have a debug log saved to disk (likely from a crash): - - - + lctl > debug_file <input filename> [output filename] @@ -448,22 +369,11 @@ Completely flush the kernel debug buffer: - - - + lctl > clear - - - - - - Note -Debug messages displayed with lctl are also subject to the kernel debug masks; the filters are additive. - - - - + Debug messages displayed with lctl are also subject to the kernel debug masks; the filters are additive.
<anchor xml:id="dbdoclet.50438274_pgfId-1295915" xreflabel=""/>28.2.2.1 Sample lctl<anchor xml:id="dbdoclet.50438274_marker-1295914" xreflabel=""/>Run Below is a sample run using the lctl command. @@ -611,8 +521,8 @@
-
- 28.3 <anchor xml:id="dbdoclet.50438274_80443" xreflabel=""/>Lustre Debugging for Developers +
+ 28.3 Lustre Debugging for Developers The procedures in this section may be useful to developers debugging Lustre code.
<anchor xml:id="dbdoclet.50438274_pgfId-1296027" xreflabel=""/>28.3.1 Adding Debugging to the <anchor xml:id="dbdoclet.50438274_marker-1296026" xreflabel=""/>Lustre Source Code @@ -701,33 +611,31 @@ Each service maintains a request history, which can be useful for first occurrence troubleshooting. Ptlrpc is an RPC protocol layered on LNET that deals with stateful servers and has semantics and built-in support for recovery. A ptlrpc request history works as follows: - 1. Request_in_callback() adds the new request to the service's request history. - 2. When a request buffer becomes idle, it is added to the service's request buffer history list. - 3. Buffers are culled from the service's request buffer history if it has grown above + + Request_in_callback() adds the new request to the service's request history. + + When a request buffer becomes idle, it is added to the service's request buffer history list. + + Buffers are culled from the service's request buffer history if it has grown above req_buffer_history_max and its reqs are removed from the service's request history. + Request history is accessed and controlled using the following /proc files under the service directory: req_buffer_history_len - - - + Number of request buffers currently in the history req_buffer_history_max - - - + Maximum number of request buffers to keep req_history - - - + The request history Requests in the history include "live" requests that are currently being handled. Each line in req_history looks like: @@ -792,10 +700,13 @@ sysctl -w lnet.debug=+malloc Then complete the following steps: + 1. Dump the log into a user-specified log file using lctl (see Using the lctl Tool to View Debug Messages). + 2. Run the leak finder on the newly-created log dump: perl leak_finder.pl <ascii-logname> + The output is: malloced 8bytes at a3116744 (called pathcopy) (lprocfs_status.c:lprocfs_add_vars:80) @@ -807,6 +718,5 @@ line 241)
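Assuming each allocation line in the leak finder output follows the shape of the sample line above (malloced Nbytes at ADDRESS (called NAME) (FILE:FUNCTION:LINE)), the output can be post-processed with a short script. The sketch below uses Python; the function name `parse_malloc_line` and the field names are illustrative only, and the regular expression is inferred from that single sample line rather than from a documented format, so lines of any other shape are skipped.

```python
import re

# Pattern inferred from the one sample line in the text; this is an
# assumption, not a documented leak_finder.pl output format.
MALLOC_RE = re.compile(
    r"malloced\s+(?P<size>\d+)bytes at (?P<addr>[0-9a-fA-F]+) "
    r"\(called (?P<name>\w+)\) "
    r"\((?P<file>[^:()]+):(?P<func>[^:()]+):(?P<line>\d+)\)"
)


def parse_malloc_line(text):
    """Return a dict of fields for a 'malloced' line, or None if it does not match."""
    match = MALLOC_RE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be summed or sorted.
    fields["size"] = int(fields["size"])
    fields["line"] = int(fields["line"])
    return fields


sample = ("malloced 8bytes at a3116744 (called pathcopy) "
          "(lprocfs_status.c:lprocfs_add_vars:80)")
record = parse_malloc_line(sample)
print(record)
```

Records parsed this way could, for example, be grouped by source file and function to see which call sites account for the most allocated bytes.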
-
-- 1.8.3.1