X-Git-Url: https://git.whamcloud.com/?a=blobdiff_plain;ds=sidebyside;f=LustreDebugging.xml;h=9938cb9c277cef663e0c644d01079335073278e0;hb=69ec928d39070d5836fdbaca1a2f62f2f378b63f;hp=32fd72ba00750f67700beac82f87029d57380d63;hpb=01116a68e1e017be5ee74e614429595cbb28ca6f;p=doc%2Fmanual.git diff --git a/LustreDebugging.xml b/LustreDebugging.xml index 32fd72b..9938cb9 100644 --- a/LustreDebugging.xml +++ b/LustreDebugging.xml @@ -1,4 +1,7 @@ - + + Debugging a Lustre File System This chapter describes tips and information to debug a Lustre file system, and includes the following sections: @@ -16,32 +19,53 @@
<indexterm><primary>debugging</primary></indexterm> Diagnostic and Debugging Tools - A variety of diagnostic and analysis tools are available to debug issues with the Lustre software. Some of these are provided in Linux distributions, while others have been developed and are made available by the Lustre project. + A variety of diagnostic and analysis tools are available to debug + issues with the Lustre software. Some of these are provided in Linux + distributions, while others have been developed and are made available + by the Lustre project.
<indexterm> <primary>debugging</primary> <secondary>tools</secondary> </indexterm> Lustre Debugging Tools - The following in-kernel debug mechanisms are incorporated into the Lustre - software: + The following in-kernel debug mechanisms are incorporated into + the Lustre software: - Debug logs - A circular - debug buffer to which Lustre internal debug messages are written (in contrast to error - messages, which are printed to the syslog or console). Entries to the Lustre debug log - are controlled by the mask set by /proc/sys/lnet/debug. The log size - defaults to 5 MB per CPU but can be increased as a busy system will quickly overwrite 5 - MB. When the buffer fills, the oldest information is discarded. + Debug logs + - A circular debug buffer to which Lustre internal debug messages + are written (in contrast to error messages, which are printed to the + syslog or console). Entries in the Lustre debug log are controlled + by a mask set by lctl set_param debug=mask. + The log size defaults to 5 MB per CPU but can be increased as a + busy system will quickly overwrite 5 MB. When the buffer fills, + the oldest log records are discarded. - Debug daemon - The debug daemon controls logging of - debug messages. + + lctl get_param debug + - This shows the current debug mask used to delimit + the debugging information written out to the kernel debug logs. + + + + + lctl debug_kernel file + - Dump the Lustre kernel debug log to the specified + file as ASCII text for further debugging and analysis. + - /proc/sys/lnet/debug - - This file contains a mask that can be used to delimit the debugging - information written out to the kernel debug logs. + lctl set_param debug_mb=size + - This sets the maximum size of the in-kernel Lustre + debug buffer, in units of MiB. + + + + Debug daemon + - The debug daemon controls the continuous logging of debug + messages to a log file in userspace. 
The following tools are also provided with the Lustre software: @@ -49,9 +73,10 @@ Diagnostic and Debugging Tools lctl - - This tool is used with the debug_kernel option to manually dump the Lustre - debugging log or post-process debugging logs that are dumped automatically. For more - information about the lctl tool, see and - This tool is used with the debug_kernel option to + manually dump the Lustre debugging log or post-process debugging + logs that are dumped automatically. For more information about the + lctl tool, see and . @@ -127,9 +152,9 @@ Diagnostic and Debugging Tools packet inspection tool that allows debugging of information that was sent between the various Lustre nodes. This tool is built on top of tcpdump and can read packet dumps generated by - it. There are plug-ins available to disassemble the LNET and + it. There are plug-ins available to disassemble the LNet and Lustre protocols. They are located within the Lustre git repository + xl:href="http://git.whamcloud.com/">Lustre git repository under lustre/contrib/wireshark/. Installation instructions are included in that directory. 
See also Wireshark Website for @@ -240,7 +265,7 @@ Diagnostic and Debugging Tools trace - Entry/Exit markers + Function entry/exit markers @@ -248,7 +273,7 @@ Diagnostic and Debugging Tools dlmtrace - Locking-related information + Distributed locking-related information @@ -269,50 +294,58 @@ Diagnostic and Debugging Tools - ext2 + malloc - Anything from the ext2_debug + Memory allocation or free information - malloc + cache - Print malloc or free information + Cache-related information - cache + info - Cache-related information + Non-critical general information - info + dentry - General information + kernel namespace cache handling - ioctl + mmap - IOCTL-related information + Memory-mapped IO interface - blocks + page - Ext2 block allocation information + Page cache and bulk data transfers + + + + + info + + + Miscellaneous informational messages @@ -320,7 +353,15 @@ Diagnostic and Debugging Tools net - Networking + LNet network related debugging + + + + + console + + + Significant system events, printed to console @@ -328,63 +369,106 @@ Diagnostic and Debugging Tools warning -   + Significant but non-fatal exceptions, printed + to console - buffs + error -   + Critical error messages, printed to console - other + neterror -   + Significant LNet error messages - dentry + emerg -   + Fatal system errors, printed to console - portals + config - Entry/Exit markers + Configuration and setup, enabled by default - page + ha - Bulk page handling + Failover and recovery-related information, + enabled by default - error + hsm - Error messages + Hierarchical space management/tiering - emerg + ioctl -   + IOCTL-related information, enabled by default + + + + + layout + + + File layout handling (PFL, FLR, DoM) + + + + + lfsck + + + Filesystem consistency checking, enabled by + default + + + + + other + + + Miscellaneous other debug messages + + + + + quota + + + Space accounting and management + + + + + reada + + + Client readahead management @@ -392,15 +476,31 @@ Diagnostic 
and Debugging Tools rpctrace - For distributed debugging + Remote request/reply tracing and debugging + + + + + sec + + + Security, Kerberos, Shared Secret Key handling + + + + + snapshot + + + Filesystem snapshot management - ha + vfstrace - Failover and recovery-related information + Kernel VFS interface operations @@ -646,24 +746,29 @@ Debug log: 324 lines, 258 kept, 66 dropped. attributes for a new file or directory. The lfs getstripe command takes a Lustre filename as input and lists all the objects that form a part of this file. To obtain this information for the file - /mnt/lustre/frog in a Lustre file system, run: - $ lfs getstripe /mnt/lustre/frog + /mnt/testfs/frog in a Lustre file system, run: + $ lfs getstripe /mnt/testfs/frog lmm_stripe_count: 2 lmm_stripe_size: 1048576 +lmm_pattern: 1 +lmm_layout_gen: 0 lmm_stripe_offset: 2 obdidx objid objid group 2 818855 0xc7ea7 0 0 873123 0xd52a3 0 - The debugfs tool is provided in the e2fsprogs - package. It can be used for interactive debugging of an ldiskfs file - system. The debugfs tool can either be used to check status or modify - information in the file system. In a Lustre file system, all objects that belong to a file - are stored in an underlying ldiskfs file system on the OSTs. The file - system uses the object IDs as the file names. Once the object IDs are known, use the - debugfs tool to obtain the attributes of all objects from different - OSTs. - A sample run for the /mnt/lustre/frog file used in the above example is shown here: + The debugfs tool is provided in the + e2fsprogs package. It can be used for interactive + debugging of an ldiskfs file system. The + debugfs tool can either be used to check status or + modify information in the file system. In a Lustre file system, all + objects that belong to a file are stored in an underlying + ldiskfs file system on the OSTs. The file system + uses the object IDs as the file names. 
Once the object IDs are known, + use the debugfs tool to obtain the attributes of + all objects from different OSTs. + A sample run for the /mnt/testfs/frog file used + in the above example is shown here: $ debugfs -c -R "stat O/0/d$((818855 % 32))/818855" /dev/vgmyth/lvmythost2 debugfs 1.41.90.wc3 (28-May-2011) @@ -834,7 +939,7 @@ lctl> debug_kernel [filename] - Behaves similarly to CERROR(), but prints error messages for LNET if D_NETERR is set in the debug mask. This is appropriate for serious networking errors. Messages printed to the console are rate-limited. + Behaves similarly to CERROR(), but prints error messages for LNet if D_NETERR is set in the debug mask. This is appropriate for serious networking errors. Messages printed to the console are rate-limited. @@ -983,7 +1088,7 @@ lctl> debug_kernel [filename]
Accessing the <literal>ptlrpc</literal> Request History Each service maintains a request history, which can be useful for first occurrence troubleshooting. - ptlrpc is an RPC protocol layered on LNET that deals with stateful servers and has semantics and built-in support for recovery. + ptlrpc is an RPC protocol layered on LNet that deals with stateful servers and has semantics and built-in support for recovery. The ptlrpc request history works as follows: @@ -996,7 +1101,7 @@ lctl> debug_kernel [filename] Buffers are culled from the service request buffer history if it has grown above req_buffer_history_max and its reqs are removed from the service request history. - Request history is accessed and controlled using the following /proc files under the service directory: + Request history is accessed and controlled using the following parameters for each service: req_buffer_history_len @@ -1144,3 +1249,4 @@ freed 8bytes at a3116744 (called pathcopy)
+
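Reviewer note on the debugfs hunk: the sample command builds the object path as O/0/d$((818855 % 32))/818855. The sketch below wraps that pattern in a small shell function so the arithmetic can be checked for both object IDs in the lfs getstripe sample. The function name ost_object_path is hypothetical and is not a Lustre tool. The default group of 0 and the assumption that the same pattern applies to other objects are taken only from the sample shown in the diff.

```shell
#!/bin/sh
# Build the ldiskfs path of an OST object from its object ID, following the
# pattern used in the debugfs sample: O/<group>/d<objid % 32>/<objid>.
# ost_object_path is a hypothetical helper name, not part of Lustre.
ost_object_path() {
    objid=$1
    group=${2:-0}   # assumption: group 0, as in the sample output
    echo "O/${group}/d$((objid % 32))/${objid}"
}

# Object IDs taken from the lfs getstripe sample in the diff.
ost_object_path 818855
ost_object_path 873123
```

The resulting path can then be passed to the command shown in the diff, for example debugfs -c -R "stat $(ost_object_path 818855)" followed by the OST device.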