From a239b0876f76e85a259765f2b47b1ddd588f1bcd Mon Sep 17 00:00:00 2001 From: Andreas Dilger Date: Mon, 19 Nov 2018 17:01:47 -0700 Subject: [PATCH] LU-8066 misc: replace /proc with "lctl get/set_param" Replace most remaining references to /proc or /sys with equivalent "lctl get_param" or "lctl set_param" usage. Discourage direct use of /proc and /sys values from user scripts. The description of l_getidentity and associated parameters was duplicated in multiple places, differentiate the l_getidentity documentation from the parameter interface for identity_upcall and identity_info. Add in descriptions of the identity_expire and identity_acquire_expire parameters as well. Add better descriptions of the debug masks, adding in missing mask values. Replace use of sysctl for setting and getting the debug mask with lctl. Rename a few section labels to give them human-readable names. Signed-off-by: Andreas Dilger Change-Id: I69e27511563a948f2148d9d39480f83a0f67eca6 Reviewed-on: https://review.whamcloud.com/33686 Tested-by: Jenkins Reviewed-by: James Simmons Reviewed-by: Ben Evans Reviewed-by: Joseph Gmitter --- BenchmarkingTests.xml | 4 +- ConfiguringQuotas.xml | 8 +- InstallingLustre.xml | 5 +- LustreDebugging.xml | 199 ++++++++++++++++++++++++++--------- LustreNodemap.xml | 21 ++-- LustreOperations.xml | 25 +++-- LustreProc.xml | 220 +++++++++++++++++++-------------------- LustreProgrammingInterfaces.xml | 191 ++++++++++++--------------------- SystemConfigurationUtilities.xml | 28 +++-- 9 files changed, 370 insertions(+), 331 deletions(-) diff --git a/BenchmarkingTests.xml b/BenchmarkingTests.xml index 83676b2..3830830 100644 --- a/BenchmarkingTests.xml +++ b/BenchmarkingTests.xml @@ -387,7 +387,7 @@ over all OSTs. On the server side, view the statistics at: - /proc/fs/lustre/obdecho/echo_srv/stats + lctl get_param obdecho.echo_srv.stats where echo_srv is the obdecho server created by the script. @@ -620,7 +620,7 @@ over all OSTs. numbers of I/Os in flight. It is also useful to monitor and record average disk I/O sizes during each test using the 'disk io size' histogram in the - file /proc/fs/lustre/obdfilter/*/brw_stats + file lctl get_param obdfilter.*.brw_stats (see for details). These numbers help identify problems in the system when full-sized I/Os are not submitted to the underlying disk. This may be caused by problems in diff --git a/ConfiguringQuotas.xml b/ConfiguringQuotas.xml index 046b484..a5292fe 100644 --- a/ConfiguringQuotas.xml +++ b/ConfiguringQuotas.xml @@ -485,8 +485,9 @@ Total allocated inode limit: 2560, total allocated block limit: 24576 Global quota limits are stored in dedicated index files (there is one such index per quota type) on the quota master target (aka QMT). The QMT - runs on MDT0000 and exports the global indexes via /proc. The global - indexes can thus be dumped via the following command: + runs on MDT0000 and exports the global indices via lctl + get_param. The global indices can thus be dumped via the + following command: # lctl get_param qmt.testfs-QMT0000.*.glb-* The format of global indexes depends on the OSD type. The ldiskfs OSD @@ -499,8 +500,7 @@ uses an IAM files while the ZFS OSD creates dedicated ZAPs. slave is disconnected, the index version is used to determine whether the slave copy of the global index isn't up to date any more. If so, the slave fetches the whole index again and updates the local copy. 
The slave copy of - the global index is also exported via /proc and can be accessed via the - following command: + the global index can also be accessed via the following command: lctl get_param osd-*.*.quota_slave.limit* diff --git a/InstallingLustre.xml b/InstallingLustre.xml index 294bdc1..103e427 100644 --- a/InstallingLustre.xml +++ b/InstallingLustre.xml @@ -331,9 +331,8 @@ xml:id="installinglustre"> Use the same user IDs (UID) and group IDs (GID) on all clients. If use of supplemental groups is required, see - for information about - supplementary user and group cache upcall ( - identity_upcall). + for information about + supplementary user and group cache upcall (identity_upcall). diff --git a/LustreDebugging.xml b/LustreDebugging.xml index 6d8d07b..66f1960 100644 --- a/LustreDebugging.xml +++ b/LustreDebugging.xml @@ -16,32 +16,53 @@
<indexterm><primary>debugging</primary></indexterm> Diagnostic and Debugging Tools - A variety of diagnostic and analysis tools are available to debug issues with the Lustre software. Some of these are provided in Linux distributions, while others have been developed and are made available by the Lustre project. + A variety of diagnostic and analysis tools are available to debug + issues with the Lustre software. Some of these are provided in Linux + distributions, while others have been developed and are made available + by the Lustre project.
<indexterm> <primary>debugging</primary> <secondary>tools</secondary> </indexterm> Lustre Debugging Tools - The following in-kernel debug mechanisms are incorporated into the Lustre - software: + The following in-kernel debug mechanisms are incorporated into + the Lustre software: - Debug logs - A circular - debug buffer to which Lustre internal debug messages are written (in contrast to error - messages, which are printed to the syslog or console). Entries to the Lustre debug log - are controlled by the mask set by /proc/sys/lnet/debug. The log size - defaults to 5 MB per CPU but can be increased as a busy system will quickly overwrite 5 - MB. When the buffer fills, the oldest information is discarded. + Debug logs + - A circular debug buffer to which Lustre internal debug messages + are written (in contrast to error messages, which are printed to the + syslog or console). Entries in the Lustre debug log are controlled + by a mask set by lctl set_param debug=mask. + The log size defaults to 5 MB per CPU but can be increased as a + busy system will quickly overwrite 5 MB. When the buffer fills, + the oldest log records are discarded. - Debug daemon - The debug daemon controls logging of - debug messages. + + lctl get_param debug + - This shows the current debug mask used to delimit + the debugging information written out to the kernel debug logs. + + + + + lctl debug_kernel file + - Dump the Lustre kernel debug log to the specified + file as ASCII text for further debugging and analysis. + - /proc/sys/lnet/debug - - This file contains a mask that can be used to delimit the debugging - information written out to the kernel debug logs. + lctl set_param debug_mb=size + - This sets the maximum size of the in-kernel Lustre + debug buffer, in units of MiB. + + + + Debug daemon + - The debug daemon controls the continuous logging of debug + messages to a log file in userspace. The following tools are also provided with the Lustre software: @@ -49,9 +70,10 @@ Diagnostic and Debugging Tools lctl - - This tool is used with the debug_kernel option to manually dump the Lustre - debugging log or post-process debugging logs that are dumped automatically. For more - information about the lctl tool, see and - This tool is used with the debug_kernel option to + manually dump the Lustre debugging log or post-process debugging + logs that are dumped automatically. For more information about the + lctl tool, see and . 
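For example, the mechanisms above can all be driven from lctl on a running node (the mask value, dump file name, and buffer size below are illustrative only, not recommended settings):
# lctl set_param debug=+rpctrace
# lctl debug_kernel /tmp/lustre-debug.txt
# lctl set_param debug_mb=128
The first command adds a mask bit to the current debug mask, the second dumps the accumulated kernel debug buffer to a text file for analysis, and the third enlarges the in-kernel buffer so that a busy node overwrites old records less quickly.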
@@ -240,7 +262,7 @@ Diagnostic and Debugging Tools trace - Entry/Exit markers + Function entry/exit markers @@ -248,7 +270,7 @@ Diagnostic and Debugging Tools dlmtrace - Locking-related information + Distributed locking-related information @@ -269,50 +291,58 @@ Diagnostic and Debugging Tools - ext2 + malloc - Anything from the ext2_debug + Memory allocation or free information - malloc + cache - Print malloc or free information + Cache-related information - cache + info - Cache-related information + Non-critical general information - info + dentry - General information + kernel namespace cache handling - ioctl + mmap - IOCTL-related information + Memory-mapped IO interface - blocks + page - Ext2 block allocation information + Page cache and bulk data transfers + + + + + info + + + Miscellaneous informational messages @@ -320,7 +350,15 @@ Diagnostic and Debugging Tools net - Networking + LNet network related debugging + + + + + console + + + Significant system events, printed to console @@ -328,63 +366,106 @@ Diagnostic and Debugging Tools warning -   + Significant but non-fatal exceptions, printed + to console - buffs + error -   + Critical error messages, printed to console - other + neterror -   + Significant LNet error messages - dentry + emerg -   + Fatal system errors, printed to console - portals + config - Entry/Exit markers + Configuration and setup, enabled by default - page + ha - Bulk page handling + Failover and recovery-related information, + enabled by default - error + hsm - Error messages + Hierarchical space management/tiering - emerg + ioctl -   + IOCTL-related information, enabled by default + + + + + layout + + + File layout handling (PFL, FLR, DoM) + + + + + lfsck + + + Filesystem consistency checking, enabled by + default + + + + + other + + + Miscellaneious other debug messages + + + + + quota + + + Space accounting and management + + + + + reada + + + Client readahead management @@ -392,15 +473,31 @@ Diagnostic and Debugging Tools rpctrace - For distributed debugging + Remote request/reply tracing and debugging + + + + + sec + + + Security, Kerberos, Shared Secret Key handling + + + + + snapshot + + + Filesystem snapshot management - ha + vfstrace - Failover and recovery-related information + Kernel VFS interface operations @@ -1001,7 +1098,7 @@ lctl> debug_kernel [filename] Buffers are culled from the service request buffer history if it has grown above req_buffer_history_max and its reqs are removed from the service request history. - Request history is accessed and controlled using the following /proc files under the service directory: + Request history is accessed and controlled using the following parameters for each service: req_buffer_history_len diff --git a/LustreNodemap.xml b/LustreNodemap.xml index f5ce501..0efd877 100644 --- a/LustreNodemap.xml +++ b/LustreNodemap.xml @@ -197,7 +197,7 @@ drwxr-xr-x 3 root root 4096 Jul 23 09:02 .. If UID 11002 or GID 11001 do not exist on the Lustre MDS or MGS, create them in LDAP or other data sources, or trust clients by setting identity_upcall to NONE. For more - information, see . + information, see . Building a larger and more complex configuration is possible by iterating through the lctl commands above. In @@ -366,19 +366,18 @@ oss# lctl set_param nodemap.add_nodemap_idmap='SiteName Verifying Settings - By using lctl nodemap_info all, existing - nodemap configuration is listed for easy export. This command - acts as a shortcut into the /proc interface for nodemap. 
- Within /proc/fs/lustre/nodemap/ on the Lustre MGS, the - file active contains a 1 if nodemap is active on the - system. Each policy group creates a directory containing the - following parameters: + By using lctl nodemap_info all, existing nodemap + configuration is listed for easy export. This command acts as a shortcut + into the configuration interface for nodemap. On the Lustre MGS, the + nodemap.active parameter contains a 1 + if nodemap is active on the system. Each policy group + creates a directory containing the following parameters: - admin and - trusted each contain a ‘1’ if the values - are set, and a ‘0’ otherwise. + admin and trusted each + contain a 1 if the values are set, and + 0 otherwise. diff --git a/LustreOperations.xml b/LustreOperations.xml index 7a6b82c..51daff3 100644 --- a/LustreOperations.xml +++ b/LustreOperations.xml @@ -579,8 +579,8 @@ mds# tunefs.lustre --erase-params --param= new_parameters The tunefs.lustre command can be used to set any parameter settable - in a /proc/fs/lustre file and that has its own OBD device, so it can be - specified as + via lctl conf_param and that has its own OBD device, + so it can be specified as obdname|fsname. obdtype. @@ -602,8 +602,8 @@ mds# tunefs.lustre --param mdt.identity_upcall=NONE /dev/sda1 are active as long as the server or client is not shut down. Permanent parameters live through server and client reboots. - The lctl list_param command enables users to list all parameters - that can be set. See + The lctl list_param command enables users to + list all parameters that can be set. See . For more details about the @@ -618,7 +618,7 @@ mds# tunefs.lustre --param mdt.identity_upcall=NONE /dev/sda1 /proc/{fs,sys}/{lnet,lustre}. The lctl set_param command uses this syntax: -lctl set_param [-n] +lctl set_param [-n] [-P] obdtype. obdname. proc_file_name= @@ -636,7 +636,7 @@ osc.myth-OST0004-osc.max_dirty_mb=32
Setting Permanent Parameters
- Use the
+ Use the lctl set_param -P or lctl conf_param command to set permanent parameters.
In general, the lctl conf_param command can be used to specify any
@@ -671,13 +671,12 @@ $ lctl conf_param testfs.sys.timeout=40
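Once the MGS has processed a lctl conf_param change such as the timeout example above, the new value can be checked from any node with lctl get_param; the output below is illustrative for the testfs example:
$ lctl get_param timeout
timeout=40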
Setting Permanent Parameters with lctl set_param -P - Use the - lctl set_param -P to set parameters permanently. This - command must be issued on the MGS. The given parameter is set on every - host using - lctl upcall. Parameters map to items in - /proc/{fs,sys}/{lnet,lustre}. The - lctl set_param command uses this syntax: + The lctl set_param -P command can also + set parameters permanently. This command must be issued on the MGS. + The given parameter is set on every host using + lctl upcall. Parameters map to items in + /proc/{fs,sys}/{lnet,lustre}. The + lctl set_param command uses this syntax: lctl set_param -P obdtype. diff --git a/LustreProc.xml b/LustreProc.xml index 95cb9a3..64b3f9f 100644 --- a/LustreProc.xml +++ b/LustreProc.xml @@ -23,6 +23,10 @@ Typically, metrics are accessed via lctl get_param files and settings are changed by via lctl set_param. + While it is possible to access parameters in /proc + and /sys directly, the location of these parameters may + change between releases, so it is recommended to always use + lctl to access the parameters from userspace scripts. Some data is server-only, some data is client-only, and some data is exported from the client to the server and is thus duplicated in both locations. @@ -47,7 +51,7 @@ osc.testfs-OST0006-osc-ffff881071d5cc00 osc.testfs-OST0007-osc-ffff881071d5cc00 osc.testfs-OST0008-osc-ffff881071d5cc00 In this example, information about OST connections available - on a client is displayed (indicated by "osc"). + on a client is displayed (indicated by "osc"). @@ -92,7 +96,7 @@ osc.testfs-OST0000-osc-ffff881071d5cc00.rpc_stats Prepend the path with the appropriate directory component: - /{proc,sys}/{fs,sys}/{lustre,lnet} + /{proc,sys}/{fs,sys}/{lustre,lnet} For example, an lctl get_param command may look like @@ -121,7 +125,7 @@ osc.testfs-OST0001-osc-ffff881071d5cc00.uuid=594db456-0685-bd16-f59b-e72ee90e981 # hash ldlm_stats stats uuid
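As a brief sketch combining lctl set_param -P with the read-side commands shown above (the file system name testfs and the value 256 are assumptions for illustration only):
mgs# lctl set_param -P osc.testfs-OST*.max_dirty_mb=256
client# lctl get_param osc.testfs-OST0000-osc-*.max_dirty_mb
osc.testfs-OST0000-osc-ffff881071d5cc00.max_dirty_mb=256
The setting is recorded on the MGS and applied to all matching clients, so the same get_param check can be repeated after a client remounts.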
Identifying Lustre File Systems and Servers - Several /proc files on the MGS list existing + Several parameter files on the MGS list existing Lustre file systems and file system servers. The examples below are for a Lustre file system called testfs with one MDT and three OSTs. @@ -155,8 +159,7 @@ imperative_recovery_state: notify_count: 4 - To view the names of all live servers in the file system as listed in - /proc/fs/lustre/devices, enter: + To list all configured devices on the local node, enter: # lctl device_list 0 UP mgs MGS MGS 11 1 UP mgc MGC192.168.10.34@tcp 1f45bb57-d9be-2ddb-c0b0-5431a49226705 @@ -284,7 +287,7 @@ testfs-MDT0000 - mb_prealloc_table + prealloc_table A table of values used to preallocate space when a new request is received. By @@ -316,7 +319,7 @@ testfs-MDT0000 Buddy group cache information found in - /proc/fs/ldiskfs/disk_device/mb_groups may + /sys/fs/ldiskfs/disk_device/mb_groups may be useful for assessing on-disk fragmentation. For example:cat /proc/fs/ldiskfs/loop0/mb_groups #group: free free frags first pa [ 2^0 2^1 2^2 2^3 2^4 2^5 2^6 2^7 2^8 2^9 @@ -1856,23 +1859,26 @@ rpcs in flight rpcs % cum %
Interpreting Adaptive Timeout Information - Adaptive timeout information can be obtained from the timeouts - files in /proc/fs/lustre/*/ on each server and client using the - lctl command. To read information from a timeouts - file, enter a command similar to: + Adaptive timeout information can be obtained via + lctl get_param {osc,mdc}.*.timeouts files on each + client and lctl get_param {ost,mds}.*.*.timeouts + on each server. To read information from a + timeouts file, enter a command similar to: # lctl get_param -n ost.*.ost_io.timeouts -service : cur 33 worst 34 (at 1193427052, 0d0h26m40s ago) 1 1 33 2 - In this example, the ost_io service on this node is currently - reporting an estimated RPC service time of 33 seconds. The worst RPC service time was 34 - seconds, which occurred 26 minutes ago. - The output also provides a history of service times. Four "bins" of adaptive - timeout history are shown, with the maximum RPC time in each bin reported. In both the - 0-150s bin and the 150-300s bin, the maximum RPC time was 1. The 300-450s bin shows the - worst (maximum) RPC time at 33 seconds, and the 450-600s bin shows a maximum of RPC time - of 2 seconds. The estimated service time is the maximum value across the four bins (33 - seconds in this example). - Service times (as reported by the servers) are also tracked in the client OBDs, as - shown in this example: +service : cur 33 worst 34 (at 1193427052, 1600s ago) 1 1 33 2 + In this example, the ost_io service on this + node is currently reporting an estimated RPC service time of 33 + seconds. The worst RPC service time was 34 seconds, which occurred + 26 minutes ago. + The output also provides a history of service times. + Four "bins" of adaptive timeout history are shown, with the + maximum RPC time in each bin reported. In both the 0-150s bin and the + 150-300s bin, the maximum RPC time was 1. The 300-450s bin shows the + worst (maximum) RPC time at 33 seconds, and the 450-600s bin shows a + maximum of RPC time of 2 seconds. The estimated service time is the + maximum value in the four bins (33 seconds in this example). + Service times (as reported by the servers) are also tracked in + the client OBDs, as shown in this example: # lctl get_param osc.*.timeouts last reply : 1193428639, 0d0h00m00s ago network : cur 1 worst 2 (at 1193427053, 0d0h26m26s ago) 1 1 1 1 @@ -1881,10 +1887,11 @@ portal 28 : cur 1 worst 1 (at 1193426141, 0d0h41m38s ago) 1 1 1 1 portal 7 : cur 1 worst 1 (at 1193426141, 0d0h41m38s ago) 1 0 1 1 portal 17 : cur 1 worst 1 (at 1193426177, 0d0h41m02s ago) 1 0 0 1 - In this example, portal 6, the ost_io service portal, shows the - history of service estimates reported by the portal. - Server statistic files also show the range of estimates including min, max, sum, and - sumsq. For example: + In this example, portal 6, the ost_io service + portal, shows the history of service estimates reported by the portal. + + Server statistic files also show the range of estimates including + min, max, sum, and sum-squared. For example: # lctl get_param mdt.*.mdt.stats ... req_timeout 6 samples [sec] 1 10 15 105 @@ -2007,10 +2014,12 @@ req_timeout 6 samples [sec] 1 10 15 105 LNet proc Monitoring LNet - LNet information is located in /proc/sys/lnet in these files: + LNet information is located via lctl get_param + in these parameters: + - peers - Shows all NIDs known to this node and provides - information on the queue state. + peers - Shows all NIDs known to this node + and provides information on the queue state. 
Example: # lctl get_param peers nid refs state max rtr min tx min queue @@ -2250,18 +2259,18 @@ nid refs peer max tx min maximizes network balancing. The weighted allocator is used when any two OSTs are out of balance by more than a specified threshold. Free space distribution can be tuned using these two - /proc tunables: + tunable parameters: - qos_threshold_rr - The threshold at which + lod.*.qos_threshold_rr - The threshold at which the allocation method switches from round-robin to weighted is set in this file. The default is to switch to the weighted algorithm when any two OSTs are out of balance by more than 17 percent. - qos_prio_free - The weighting priority used - by the weighted allocator can be adjusted in this file. Increasing the - value of qos_prio_free puts more weighting on the + lod.*.qos_prio_free - The weighting priority + used by the weighted allocator can be adjusted in this file. Increasing + the value of qos_prio_free puts more weighting on the amount of free space available on each OST and less on how stripes are distributed across OSTs. The default value is 91 percent weighting for free space rebalancing and 9 percent for OST balancing. When the @@ -2269,14 +2278,14 @@ nid refs peer max tx min space and location is no longer used by the striping algorithm. - reserved_mb_low - The low - watermark used to stop object allocation if available space is less - than it. The default is 0.1 percent of total OST size. + osp.*.reserved_mb_low + - The low watermark used to stop object allocation if available space + is less than this. The default is 0.1% of total OST size. - reserved_mb_high - The high watermark used to start - object allocation if available space is more than it. The default is 0.2 percent of total - OST size. + osp.*.reserved_mb_high + - The high watermark used to start object allocation if available + space is more than this. The default is 0.2% of total OST size. For more information about monitoring and managing free space, see - For each service, an entry as shown below is - created:/proc/fs/lustre/service/*/threads_min|max|started + For each service, tunable parameters as shown below are available. + - To temporarily set this tunable, run: - # lctl get|set_param service.threads_min|max|started + To temporarily set these tunables, run: + # lctl set_param service.threads_min|max|started=num To permanently set this tunable, run: @@ -2480,8 +2489,8 @@ ost.OSS.ost_io.threads_max=256 To set the maximum thread count to 256 instead of 512 permanently, run: # lctl conf_param testfs.ost.ost_io.threads_max=256 - For version 2.5 or later, run: - # lctl set_param -P ost.OSS.ost_io.threads_max=256 + For version 2.5 or later, run: + # lctl set_param -P ost.OSS.ost_io.threads_max=256 ost.OSS.ost_io.threads_max=256 @@ -2503,79 +2512,69 @@ ost.OSS.ost_io.threads_max=256 proc debug Enabling and Interpreting Debugging Logs - By default, a detailed log of all operations is generated to aid in debugging. Flags that - control debugging are found in /proc/sys/lnet/debug. - The overhead of debugging can affect the performance of Lustre file system. Therefore, to - minimize the impact on performance, the debug level can be lowered, which affects the amount - of debugging information kept in the internal log buffer but does not alter the amount of - information to goes into syslog. You can raise the debug level when you need to collect logs - to debug problems. - The debugging mask can be set using "symbolic names". The symbolic format is - shown in the examples below. 
+ By default, a detailed log of all operations is generated to aid in
+ debugging. Flags that control debugging are found via
+ lctl get_param debug.
+ The overhead of debugging can affect the performance of the Lustre file
+ system. Therefore, to minimize the impact on performance, the debug level
+ can be lowered, which affects the amount of debugging information kept in
+ the internal log buffer but does not alter the amount of information that
+ goes into syslog. You can raise the debug level when you need to collect
+ logs to debug problems.
+ The debugging mask can be set using "symbolic names". The
+ symbolic format is shown in the examples below.
+
- To verify the debug level used, examine the sysctl that controls
- debugging by running:
- # sysctl lnet.debug
-lnet.debug = ioctl neterror warning error emerg ha config console
+ To verify the debug level used, examine the parameter that
+ controls debugging by running:
+ # lctl get_param debug
+debug=
+ioctl neterror warning error emerg ha config console
- To turn off debugging (except for network error debugging), run the following
- command on all nodes concerned:
+ To turn off debugging except for network error debugging, run
+ the following command on all nodes concerned:
# lctl set_param debug=neterror
-lnet.debug = neterror
+debug=neterror
- + +
- To turn off debugging completely, run the following command on all nodes
+ To turn off debugging completely (except for the minimum error
+ reporting to the console), run the following command on all nodes
concerned:
- # sysctl -w lnet.debug=0
-lnet.debug = 0
-
- To set an appropriate debug level for a production environment, run:
- # sysctl -w lnet.debug="warning dlmtrace error emerg ha rpctrace vfstrace"
-lnet.debug = warning dlmtrace error emerg ha rpctrace vfstrace
- The flags shown in this example collect enough high-level information to aid
- debugging, but they do not cause any serious performance impact.
+ # lctl set_param debug=0
+debug=0
-
- To clear all flags and set new flags, run:
- # sysctl -w lnet.debug="warning"
-lnet.debug = warning
+ To set an appropriate debug level for a production environment,
+ run:
+ # lctl set_param debug="warning dlmtrace error emerg ha rpctrace vfstrace"
+debug=warning dlmtrace error emerg ha rpctrace vfstrace
+ The flags shown in this example collect enough high-level
+ information to aid debugging, but they do not cause any serious
+ performance impact.
- + +
- To add new flags to flags that have already been set, precede each one with a
- "+":
- # sysctl -w lnet.debug="+neterror +ha"
-lnet.debug = +neterror +ha
-# sysctl lnet.debug
-lnet.debug = neterror warning ha
+ To add new flags to flags that have already been set,
+ precede each one with a "+":
+ # lctl set_param debug="+neterror +ha"
+debug=+neterror +ha
+# lctl get_param debug
+debug=neterror warning error emerg ha console
To remove individual flags, precede them with a "-":
- # sysctl -w lnet.debug="-ha"
-lnet.debug = -ha
-# sysctl lnet.debug
-lnet.debug = neterror warning
-
-
- To verify or change the debug level, run commands such as the following: :
- # lctl get_param debug
-debug=
-neterror warning
-# lctl set_param debug=+ha
-# lctl get_param debug
-debug=
-neterror warning ha
-# lctl set_param debug=-warning
-# lctl get_param debug
-debug=
-neterror ha
+ # lctl set_param debug="-ha"
+debug=-ha
+# lctl get_param debug
+debug=neterror warning error emerg console
- + +
Debugging parameters include:
subsystem_debug - Limits the debug information collected by a particular debug subsystem.
debug_path - Indicates the location where the debug log is dumped when triggered by lctl dump_kernel. By default, the debug log is dumped to /tmp/lustre-log.
- These parameters are also set using:sysctl -w lnet.debug={value}
+ These parameters can also be set using:sysctl -w lnet.debug={value}
Additional useful parameters:
panic_on_lbug - Causes ''panic'' to be called
@@ -2623,11 +2622,12 @@ ost_set_info 1 obd_ping 212
Use the llstat utility to monitor statistics over time. To clear the statistics, use the -c option to
- llstat. To specify how frequently the statistics should be reported (in
- seconds), use the -i option. In the example below, the
- -c option clears the statistics and -i10 option
- reports statistics every 10 seconds:
- $ llstat -c -i10 /proc/fs/lustre/ost/OSS/ost_io/stats
+ llstat. To specify how frequently the statistics
+ should be reported (in seconds), use the -i option.
+ In the example below, the -c option clears the
+ statistics and the -i10 option reports statistics every
+ 10 seconds:
+$ llstat -c -i10 ost_io
/usr/bin/llstat: STATS on 06/06/07 /proc/fs/lustre/ost/OSS/ost_io/ stats on 192.168.16.35@tcp
diff --git a/LustreProgrammingInterfaces.xml b/LustreProgrammingInterfaces.xml
index 9765be2..cd3d44b 100644
--- a/LustreProgrammingInterfaces.xml
+++ b/LustreProgrammingInterfaces.xml
@@ -1,141 +1,109 @@
Programming Interfaces
- This chapter describes public programming interfaces to that can be used to control various
- aspects of a Lustre file system from userspace. This chapter includes the following
- sections:
+ This chapter describes public programming interfaces that can be
+ used to control various aspects of a Lustre file system from userspace.
+ This chapter includes the following sections:
- + - +
Lustre programming interface man pages are found in the lustre/doc folder.
-
+
<indexterm> <primary>programming</primary> <secondary>upcall</secondary> </indexterm>User/Group Upcall - This section describes the supplementary user/group upcall, which allows the MDS to - retrieve and verify the supplementary groups to which a particular user is assigned. This - avoids the need to pass all the supplementary groups from the client to the MDS with every - RPC. + This section describes the supplementary user/group upcall, which + allows the MDS to retrieve and verify the supplementary groups to which + a particular user is assigned. This avoids the need to pass all the + supplementary groups from the client to the MDS with every RPC. - For information about universal UID/GID requirements in a Lustre file system - environment, see . + For information about universal UID/GID requirements in a Lustre + file system environment, see + .
Synopsis
- The MDS uses the utility as specified by lctl get_param
- mdt.${FSNAME}-MDT{xxxx}.identity_upcall to look up the supplied UID in order to
- retrieve the user's supplementary group membership. The result is temporarily cached in the
- kernel (for five minutes, by default) to avoid the overhead of calling into userspace
- repeatedly.
+ The MDS uses the utility as specified by
+ lctl get_param mdt.${FSNAME}-MDT{xxxx}.identity_upcall
+ to look up the supplied UID in order to retrieve the user's supplementary
+ group membership. The result is temporarily cached in the kernel (for
+ 20 minutes by default, as set by the identity_expire
+ parameter) to avoid the overhead of calling into
+ userspace repeatedly.
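For example, the configured upcall and its cache timer can be read on the MDS as shown below; the file system name testfs and the values in the output are illustrative only:
mds# lctl get_param mdt.testfs-MDT0000.identity_upcall
mdt.testfs-MDT0000.identity_upcall=/usr/sbin/l_getidentity
mds# lctl get_param mdt.testfs-MDT0000.identity_expire
mdt.testfs-MDT0000.identity_expire=1200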
Description
- The identity upcall file contains the path to an executable that is invoked to resolve a
- numeric UID to a group membership list. This utility opens
- /proc/fs/lustre/mdt/${FSNAME}-MDT{xxxx}/identity_info and fills in the
- related identity_downcall_data data structure (see ). The data is persisted with lctl set_param
- mdt.${FSNAME}-MDT{xxxx}.identity_info.
- For a sample upcall program, see lustre/utils/l_getidentity.c in the Lustre source distribution.
+ The identity_upcall parameter contains the path
+ to an executable that is run to map a numeric UID to a group membership
+ list. This upcall executable opens the
+ mdt.${FSNAME}-MDT{xxxx}.identity_info parameter file
+ and writes the related identity_downcall_data data
+ structure (see ). The
+ upcall is configured with
+ lctl set_param mdt.${FSNAME}-MDT{xxxx}.identity_upcall.
+ The default identity upcall program installed is
+ /usr/sbin/l_getidentity, whose source is
+ lustre/utils/l_getidentity.c in the Lustre source
+ distribution.
Primary and Secondary Groups - The mechanism for the primary/secondary group is as follows: + The mechanism for the primary/secondary group is as follows: + - The MDS issues an upcall (set per MDS) to map the numeric UID to the supplementary - group(s). + The MDS issues an upcall (set per MDS) to map the numeric + UID to the supplementary group(s). - If there is no upcall or if there is an upcall and it fails, one supplementary - group at most will be added as supplied by the client. + If there is no upcall or if there is an upcall and it fails, + one supplementary group at most will be added as supplied by the + client. - The default upcall is /usr/sbin/l_getidentity, which can - interact with the user/group database to obtain UID/GID/suppgid. The user/group - database depends on how authentication is configured, such as local - /etc/passwd, Network Information Service (NIS), or Lightweight - Directory Access Protocol (LDAP). If necessary, the administrator can use a parse - utility to set - /proc/fs/lustre/mdt/${FSNAME}-MDT{xxxx}/identity_upcall. If the - upcall interface is set to NONE, then upcall is disabled. The MDS uses the - UID/GID/suppgid supplied by the client. + The default upcall /usr/sbin/l_getidentity + can interact with the user/group database on the MDS to map the + UID to the GID and supplementary GID. The user/group database + depends on how authentication is configured on the MDS, such as + local /etc/passwd, Network Information Service + (NIS), Lightweight Directory Access Protocol (LDAP), or SMB + Domain services, as configured. If the upcall interface is set + to NONE, then upcall is disabled, and the MDS uses only the UID, + GID, and one supplementary GID supplied by the client. - The default group upcall is set by mkfs.lustre. Use - tunefs.lustre --param or lctl set_param - mdt.${FSNAME}-MDT{xxxx}.identity_upcall={path} + The MDS will wait a limited time for the group upcall program + to complete, to avoid MDS threads and clients hanging due to + errors accessing a remote service node. The upcall must finish + within 30s before the MDS will continue without the supplementary + data. The upcall timeout can be set on the MDS using: + lctl set_param mdt.*.identity_acquire_expire=seconds - - - A Lustre file system administrator can specify permissions for a specific UID by - configuring /etc/lustre/perm.conf on the MDS. The - /usr/sbin/l_getidentity utility parses - /etc/lustre/perm.conf to obtain the permission mask for a specified - UID. - The permission file format - is:{nid} {uid} {perms}An - asterisk (*) in the nid column or - uid column matches any NID or UID respectively. When '*' is - specified as the NID, it is used for the default permissions for all NIDS, unless - permissions are specified for a particular NID. In this case the specified permissions - take precedence for that particular NID. Valid values for - perms are: - - setuid/setgid/setgrp/XXX - - enables the corresponding perm - - - nosetuid/nosetgid/nosetgrp/noXXX - - disables the corresponding perm - - Permissions can be specified in a comma-separated list. When a - perm and a noperm permission are - listed in the same line, the noperm permission takes - precedence. When they are listed on separate lines, the permission that appears later - takes precedence. - - The /usr/sbin/l_getidentity utility can parse - /etc/lustre/perm.conf to obtain the permission mask for the - specified UID. + The default group upcall is set permanently by + mkfs.lustre. 
To set a custom upcall for a + particular filesystem, use + tunefs.lustre --param or + lctl set_param -P mdt.FSNAME-MDTxxxx.identity_upcall=path - To avoid repeated upcalls, the MDS caches supplemental group information. Use - lctl set_param mdt.*.identity_expire=<seconds> to set the - cache time. The default cache time is 600 seconds. The kernel waits for the upcall to - complete (at most, 5 seconds) and takes the "failure" behavior as described. - Set the wait time using lctl set_param - mdt.*.identity_acquire_expire=<seconds> to change the length of time - that the kernel waits for the upcall to finish. Note that the client process is - blocked during this time. The default wait time is 15 seconds. - Cached entries are flushed using lctl set_param - mdt.${FSNAME}-MDT{xxxx}.identity_flush=0. + The group downcall data is cached by the kernel to avoid + repeated upcalls for the same user slowing down the MDS. This + cache is expired from the kernel after 1200s (20 minutes) by + default. The cache age can be set on the MDS using: + lctl set_param mdt.*.identity_expire=seconds
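A short sketch of adjusting these timers on the MDS follows; the file system name testfs is assumed, the values shown are simply the defaults described above, and the final command uses the identity_flush parameter, as described elsewhere in this manual, to drop the cached entries after a change:
mds# lctl set_param mdt.testfs-MDT0000.identity_acquire_expire=30
mds# lctl set_param mdt.testfs-MDT0000.identity_expire=1200
mds# lctl set_param mdt.testfs-MDT0000.identity_flush=0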
-
- Parameters - - - Name of the MDS service - - - Numeric UID - - -
-
+
Data Structures - struct perm_downcall_data{ + struct perm_downcall_data { __u64 pdd_nid; __u32 pdd_perm; __u32 pdd_padding; @@ -147,33 +115,4 @@ struct identity_downcall_data{ :
-
- <indexterm><primary>programming</primary><secondary>l_getidentity</secondary></indexterm><literal>l_getidentity</literal> Utility - The l_getidentity utility handles the Lustre supplementary group upcall - by default as described in the previous section. -
- Synopsis - l_getidentity ${FSNAME}-MDT{xxxx} {uid} -
-
- Description - The identity upcall file contains the path to an executable that is invoked to resolve a - numeric UID to a group membership list. This utility opens - /proc/fs/lustre/mdt/${FSNAME}-MDT{xxxx}/identity_info, completes the - identity_downcall_data data structure (see ) and writes the data to the - /proc/fs/lustre/mdt/${FSNAME}-MDT{xxxx}/identity_info pseudo file. The - data is persisted with lctl set_param - mdt.${FSNAME}-MDT{xxxx}.identity_info. - l_getidentity is the reference implementation of the user/group cache - upcall. -
-
- Files - - /proc/fs/lustre/mdt/${FSNAME}-MDT{xxxx}/identity_upcall - /proc/fs/lustre/mdt/${FSNAME}-MDT{xxxx}/identity_info - -
-
diff --git a/SystemConfigurationUtilities.xml b/SystemConfigurationUtilities.xml index 44a6426..d4a1aaf 100644 --- a/SystemConfigurationUtilities.xml +++ b/SystemConfigurationUtilities.xml @@ -7,7 +7,7 @@ - + @@ -130,23 +130,29 @@
-
+
<indexterm><primary>l_getidentity</primary></indexterm> l_getidentity - The l_getidentity utility handles Lustre user / group cache upcall. + The l_getidentity tool normally handles Lustre user/group mapping + upcall.
Synopsis
- l_getidentity ${FSNAME}-MDT{xxxx} {uid}
+ l_getidentity {$FSNAME-MDT{xxxx} | -d} {uid}
Description - The group upcall file contains the path to an executable file that is invoked to resolve - a numeric UID to a group membership list. This utility opens - /proc/fs/lustre/mdt/${FSNAME}-MDT{xxxx}/identity_info and writes the - related identity_downcall_data structure (see .) The data is persisted with lctl set_param - mdt.${FSNAME}-MDT{xxxx}.identity_info. - The l_getidentity utility is the reference implementation of the user or group cache upcall. + The l_getidentity utility is called from the + MDS to map a numeric UID value into the list of supplementary group + values for that UID, and writes this into the + mdt.*.identity_info parameter file. The list of + supplementary groups is cached in the kernel to avoid repeated + upcalls. See for more + details. + The l_getidentity utility can also be run + directly for debugging purposes to ensure that the UID mapping for a + particular user is configured correctly, by using the + -d argument instead of the MDT name. +
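For example (the UID below is an arbitrary illustration), the mapping for a single user can be checked directly on the MDS:
mds# l_getidentity -d 1001
The command resolves UID 1001 through the same user/group database the upcall would use and prints the result to the terminal rather than pushing it into the kernel, which makes it convenient for verifying NIS or LDAP configuration.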
Options -- 1.8.3.1