From: Andreas Dilger Date: Sat, 12 Jan 2013 01:40:27 +0000 (-0700) Subject: LUDOC-117 style: use in example text X-Git-Tag: 2.4.0~25 X-Git-Url: https://git.whamcloud.com/?a=commitdiff_plain;h=001111c84adebdce8985f9a26dba549b368d78f3;p=doc%2Fmanual.git LUDOC-117 style: use in example text Use the tag and {} instead of and <> for user-supplied text. The XML markup is intended to be semantic, so is more correct to use. The use of <> makes the XML ugly, since it needs to be written as >< in the .xml files. Some incorrect usages of were corrected. Signed-off-by: Andreas Dilger Signed-off-by: Richard Henwood Change-Id: I8edb59a100e20573e06f11ce10fe1642d45fbd4f Reviewed-on: http://review.whamcloud.com/5006 Tested-by: Hudson --- diff --git a/BackupAndRestore.xml b/BackupAndRestore.xml index d38bedd..0724bd9 100644 --- a/BackupAndRestore.xml +++ b/BackupAndRestore.xml @@ -72,7 +72,7 @@ - --source=<src> + --source=src The path to the root of the Lustre file system (source) which will be synchronized. This is a mandatory option if a valid status log created during a previous synchronization operation (--statuslog) is not specified. @@ -80,7 +80,7 @@ - --target=<tgt> + --target=tgt The path to the root where the source file system will be synchronized (target). This is a mandatory option if the status log created during a previous synchronization operation (--statuslog) is not specified. This option can be repeated if multiple synchronization targets are desired. @@ -88,7 +88,7 @@ - --mdt=<mdt> + --mdt=mdt The metadata device to be synchronized. A changelog user must be registered for this device. This is a mandatory option if a valid status log created during a previous synchronization operation (--statuslog) is not specified. @@ -96,7 +96,7 @@ - --user=<user id> + --user=userid The changelog user ID for the specified MDT. To use lustre_rsync, the changelog user must be registered. For details, see the changelog_register parameter in (lctl). This is a mandatory option if a valid status log created during a previous synchronization operation (--statuslog) is not specified. @@ -104,7 +104,7 @@ - --statuslog=<log> + --statuslog=log A log file to which synchronization status is saved. When the lustre_rsync utility starts, if the status log from a previous synchronization operation is specified, then the state is read from the log and otherwise mandatory --source, --target and --mdt options can be skipped. Specifying the --source, --target and/or --mdt options, in addition to the --statuslog option, causes the specified parameters in the status log to be overridden. Command line options take precedence over options in the status log. @@ -112,7 +112,7 @@ - --xattr <yes|no> + --xattr yes|no Specifies whether extended attributes (xattrs) are synchronized or not. The default is to synchronize extended attributes. diff --git a/BenchmarkingTests.xml b/BenchmarkingTests.xml index 728e908..8f17911 100644 --- a/BenchmarkingTests.xml +++ b/BenchmarkingTests.xml @@ -132,16 +132,16 @@ File containing standard output data (same as stdout) - ${rslt}_<date/time>.summary + rslt_date_time.summary Temporary (tmp) files - ${rslt}_<date/time>_* + rslt_date_time_* Collected tmp files for post-mortem - ${rslt}_<date/time>.detail + rslt_date_time.detail @@ -189,7 +189,7 @@ Network - In this mode, the Lustre client generates I/O requests over the network but these requests are not sent to the OST file system. The OSS node runs the obdecho server to receive the requests but discards them before they are sent to the disk. 
- Pass the parameters case=network and target=<hostname|IP_of_server> to the script. For each network case, the script does the required setup. + Pass the parameters case=network and target=hostname|IP_of_server to the script. For each network case, the script does the required setup. For more details, see @@ -252,9 +252,9 @@ List all OSTs you want to test. - Use the target=parameter to list the OSTs separated by spaces. List the individual OSTs by name using the format - <fsname>-<OSTnumber> - (for example, lustre-OST0001). You do not have to specify an MDS or LOV. + Use the target=parameter to list the OSTs separated by spaces. List the individual OSTs by name using the format + fsname-OSTnumber + (for example, lustre-OST0001). You do not have to specify an MDS or LOV. Run the obdfilter_survey script with the target=parameter. @@ -283,16 +283,15 @@ lctl dl - Run the obdfilter_survey script with the parameters case=network and targets=<hostname|ip_of_server>. For example: + Run the obdfilter_survey script with the parameters case=network and targets=hostname|ip_of_server. For example: $ nobjhi=2 thrhi=2 size=1024 targets="oss0 oss1" \ case=network sh odbfilter-survey On the server side, view the statistics at: - /proc/fs/lustre/obdecho/<echo_srv>/stats - where - <echo_srv> - is the obdecho server created by the script. + /proc/fs/lustre/obdecho/echo_srv/stats + where echo_srv + is the obdecho server created by the script. @@ -338,10 +337,10 @@ List all OSCs you want to test. - Use the target=parameter to list the OSCs separated by spaces. List the individual OSCs by name separated by spaces using the format <fsname>-<OST_name>-osc-<OSC_number> (for example, lustre-OST0000-osc-ffff88007754bc00). You do not have to specify an MDS or LOV. + Use the target=parameter to list the OSCs separated by spaces. List the individual OSCs by name separated by spaces using the format fsname-OST_name-osc-instance (for example, lustre-OST0000-osc-ffff88007754bc00). You do not have to specify an MDS or LOV. - Run the obdfilter_survey script with the target=parameter and case=netdisk. + Run the obdfilter_survey script with the target=osc and case=netdisk. An example of a local test run with up to two objects (nobjhi), up to two threads (thrhi), and 1024 Mb (size) transfer size is shown below: $ nobjhi=2 thrhi=2 size=1024 \ targets="lustre-OST0000-osc-ffff88007754bc00 \ @@ -779,7 +778,7 @@ performanceTesting MDS Performance (mds-survey<
Using <literal>stats-collect</literal> The stats-collect utility is configured by including profiling configuration variables in the config.sh script. Each configuration variable takes the following form, where 0 indicates statistics are to be collected only when the script starts and stops and n indicates the interval in seconds at which statistics are to be collected: - <statistic>_INTERVAL=[0|n] + statistic_INTERVAL=0|n Statistics that can be collected include: @@ -821,10 +820,9 @@ performanceTesting MDS Performance (mds-survey< Stop collecting statistics on each node, clean up the temporary file, and create a profiling tarball. Enter: - sh gather_stats_everywhere.sh config.sh stop <log_name.tgz> - When - <log_name.tgz> - is specified, a profile tarball /tmp/<log_name.tgz> is created. + sh gather_stats_everywhere.sh config.sh stop log_name.tgz + When log_name.tgz + is specified, a profile tarball /tmp/log_name.tgz is created. Analyze the collected statistics and create a csv tarball for the specified profiling data. diff --git a/ConfigurationFilesModuleParameters.xml b/ConfigurationFilesModuleParameters.xml index ba11c86..d521802 100644 --- a/ConfigurationFilesModuleParameters.xml +++ b/ConfigurationFilesModuleParameters.xml @@ -203,7 +203,7 @@ forwarding ("") The acceptor is a TCP/IP service that some LNDs use to establish communications. If a local network requires it and it has not been disabled, the acceptor listens on a single port for connection requests that it redirects to the appropriate local network. The acceptor is part of the LNET module and configured by the following options: - secure - Accept connections only from reserved TCP ports (< 1023). + secure - Accept connections only from reserved TCP ports (below 1023). all - Accept connections from any TCP port. @@ -428,7 +428,7 @@ forwarding ("") TX Descriptors The ptllnd has a pool of so-called "tx descriptors", which it uses not only for outgoing messages, but also to hold state for bulk transfers requested by incoming messages. This pool should scale with the total number of peers. To enable the building of the Portals LND (ptllnd.ko) configure with this option: - ./configure --with-portals=<path-to-portals-headers> + ./configure --with-portals=/path/to/portals/headers @@ -577,7 +577,7 @@ forwarding ("") cksum - Enables small message (< 4 KB) checksums if set to a non-zero value. + Enables small message (below 4 KB) checksums if set to a non-zero value. @@ -617,7 +617,7 @@ forwarding ("") polling - Use zero (0) to block (wait). A value > 0 will poll that many times before blocking. + Use zero (0) to block (wait). A value greater than 0 will poll that many times before blocking. diff --git a/ConfiguringLNET.xml b/ConfiguringLNET.xml index a0fe27d..48e49b8 100644 --- a/ConfiguringLNET.xml +++ b/ConfiguringLNET.xml @@ -42,7 +42,7 @@ Overview of LNET Module Parameters LNET kernel module (lnet) parameters specify how LNET is to be configured to work with Lustre, including which NICs will be configured to work with Lustre and the routing to be used with Lustre. Parameters for LNET can be specified in the /etc/modprobe.d/lustre.conf file. In some cases the parameters may have been stored in /etc/modprobe.conf, but this has been deprecated since before RHEL5 and SLES10, and having a separate /etc/modprobe.d/lustre.conf file simplifies administration and distribution of the Lustre networking configuration. 
This file contains one or more entries with the syntax: - options lnet <parameter>=<parameter value> + options lnet parameter=value To specify the network interfaces that are to be used for Lustre, set either the networks parameter or the ip2nets parameter (only one of these parameters can be used at a time): @@ -68,7 +68,7 @@
<indexterm><primary>LNET</primary><secondary>using NID</secondary></indexterm>Using a Lustre Network Identifier (NID) to Identify a Node A Lustre network identifier (NID) is used to uniquely identify a Lustre network endpoint by node ID and network type. The format of the NID is: - <networkid>@<networktype> + network_id@network_type Examples are: 10.67.73.200@tcp0 10.67.75.100@o2ib
 @@ -77,13 +77,13 @@ To determine the appropriate NID to specify in the mount command, use the lctl command. To display MDS NIDs, run on the MDS: lctl list_nids To determine if a client can reach the MDS using a particular NID, run on the client: - lctl which_nid <MDS NID> + lctl which_nid MDS_NID
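For illustration, displaying the NIDs advertised by an MDS and checking reachability from a client might look like the following (the address shown is hypothetical):
  mds# lctl list_nids
  192.168.0.10@tcp0
  client# lctl which_nid 192.168.0.10@tcp0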
<indexterm><primary>LNET</primary><secondary>module parameters</secondary></indexterm>Setting the LNET Module networks Parameter If a node has more than one network interface, you'll typically want to dedicate a specific interface to Lustre. You can do this by including an entry in the lustre.conf file on the node that sets the LNET module networks parameter: - options lnet networks=<comma-separated list of networks> + options lnet networks=comma-separated list of networks This example specifies that a Lustre node will use a TCP/IP interface and an InfiniBand interface: options lnet networks=tcp0(eth0),o2ib(ib0) This example specifies that the Lustre node will use the TCP/IP interface eth1: @@ -166,7 +166,7 @@ tcp0 192.168.0.*; o2ib0 132.6.[1-3].[2-8/2]"' <indexterm><primary>LNET</primary><secondary>routes</secondary></indexterm>Setting the LNET Module routes Parameter The LNET module routes parameter is used to identify routers in a Lustre configuration. These parameters are set in modprobe.conf on each Lustre node. The LNET routes parameter specifies a colon-separated list of router definitions. Each route is defined as a network number, followed by a list of routers: - routes=<net type> <router NID(s)> + routes=net_type router_NID(s) This example specifies bi-directional routing in which TCP clients can reach Lustre resources on the IB networks and IB servers can access the TCP networks: options lnet 'ip2nets="tcp0 192.168.0.*; \ o2ib0(ib0) 132.6.1.[1-128]"' 'routes="tcp0 132.6.1.[1-8]@o2ib0; \ @@ -194,7 +194,7 @@ lctl network configure <indexterm><primary>LNET</primary><secondary>route checker</secondary></indexterm>Configuring the Router Checker In a Lustre configuration in which different types of networks, such as a TCP/IP network and an Infiniband network, are connected by routers, a router checker can be run on the clients and servers in the routed configuration to monitor the status of the routers. In a multi-hop routing configuration, router checkers can be configured on routers to monitor the health of their next-hop routers. A router checker is configured by setting lnet parameters in lustre.conf by including an entry in this form: - options lnet <router checker parameter>=<parameter value> + options lnet router_checker_parameter=value The router checker parameters are: diff --git a/ConfiguringLustre.xml b/ConfiguringLustre.xml index a6ee935..6d4d443 100644 --- a/ConfiguringLustre.xml +++ b/ConfiguringLustre.xml @@ -25,7 +25,7 @@ . For more information about hardware requirements, see . - Downloaded and installed the Lustre software. For more information about preparing for and installing the Lustre software, see . + Downloaded and installed the Lustre software. For more information about preparing for and installing the Lustre software, see . The following optional steps should also be completed, if needed, before the Lustre software is configured: @@ -53,29 +53,29 @@ Create a combined MGS/MDT file system on a block device. On the MDS node, run: - mkfs.lustre --fsname=<fsname> --mgs --mdt --index=0 <block device name> + mkfs.lustre --fsname=fsname --mgs --mdt --index=0 /dev/block_device The default file system name (fsname) is lustre. If you plan to create multiple file systems, the MGS should be created separately on its own dedicated block device, by running: - mkfs.lustre --fsname=<fsname> --mgs <block device name> + mkfs.lustre --fsname=fsname --mgs /dev/block_device See for more details. Optional for Lustre 2.4 and later. Add in additional MDTs. 
- mkfs.lustre --fsname=<fsname> --mgsnode=<nid> --mdt --index=1 <block device name> + mkfs.lustre --fsname=fsname --mgsnode=nid --mdt --index=1 /dev/block_device Up to 4095 additional MDTs can be added. Mount the combined MGS/MDT file system on the block device. On the MDS node, run: - mount -t lustre <block device name> <mount point> + mount -t lustre /dev/block_device /mount_point If you have created and MGS and an MDT on separate block devices, mount them both. Create the OST. On the OSS node, run: - mkfs.lustre --fsname=<fsname> --mgsnode=<NID> --ost --index=<OST index> <block device name> + mkfs.lustre --fsname=fsname --mgsnode=MGS_NID --ost --index=OST_index /dev/block_device When you create an OST, you are formatting a ldiskfs file system on a block storage device like you would with any local file system. You can have as many OSTs per OSS as the hardware or drivers allow. For more information about storage and memory requirements for a Lustre file system, see . You can only configure one OST per block device. You should create an OST that uses the raw block device and does not use partitioning. @@ -87,7 +87,7 @@ Mount the OST. On the OSS node where the OST was created, run: - mount -t lustre <block device name> <mount point> + mount -t lustre /dev/block_device /mount_point To create additional OSTs, repeat Step and Step , specifying the next higher OST index number. @@ -95,7 +95,7 @@ Mount the Lustre file system on the client. On the client node, run: - mount -t lustre <MGS node>:/<fsname> <mount point> + mount -t lustre MGS_node:/fsname /mount_point To create additional clients, repeat Step . diff --git a/ConfiguringQuotas.xml b/ConfiguringQuotas.xml index b54c793..d31f1ea 100644 --- a/ConfiguringQuotas.xml +++ b/ConfiguringQuotas.xml @@ -77,8 +77,8 @@ The ldiskfs OSD relies on the standard Linux quota to maintain accounting information on disk. As a consequence, the Linux kernel running on the Lustre servers using ldiskfs backend must have CONFIG_QUOTA, CONFIG_QUOTACTL and CONFIG_QFMT_V2 enabled. - As of Lustre 2.4.0, quota enforcement is thus turned on/off independently of space accounting which is always enabled. lfs quota<on|off> as well as the per-target quota_type parameter are deprecated in favor of a single per-filesystem quota parameter controlling inode/block quota enforcement. Like all permanent parameters, this quota parameter can be set via lctl conf_param on the MGS via the following syntax: - lctl conf_param ${FSNAME}.quota.<ost|mdt>=<u|g|ug|none> + As of Lustre 2.4.0, quota enforcement is thus turned on/off independently of space accounting which is always enabled. lfs quotaon|off as well as the per-target quota_type parameter are deprecated in favor of a single per-filesystem quota parameter controlling inode/block quota enforcement. 
Like all permanent parameters, this quota parameter can be set via lctl conf_param on the MGS via the following syntax: + lctl conf_param fsname.quota.ost|mdt=u|g|ug|none ost -- to configure block quota managed by OSTs @@ -170,11 +170,11 @@ group uptodate: glb[1],slv[1],reint[0] Usage: - lfs quota [-q] [-v] [-o obd_uuid] [-u|-g <uname>|uid|gname|gid>] <filesystem> -lfs quota -t <-u|-g> <filesystem> -lfs setquota <-u|--user|-g|--group> <username|groupname> [-b <block-softlimit>] \ - [-B <block-hardlimit>] [-i <inode-softlimit>] \ - [-I <inode-hardlimit>] <filesystem> + lfs quota [-q] [-v] [-o obd_uuid] [-u|-g uname|uid|gname|gid] /mount_point +lfs quota -t -u|-g /mount_point +lfs setquota -u|--user|-g|--group username|groupname [-b block-softlimit] \ + [-B block_hardlimit] [-i inode_softlimit] \ + [-I inode_hardlimit] /mount_point To display general quota information (disk usage and limits) for the user running the command and his primary group, run: $ lfs quota /mnt/testfs To display general quota information for a specific user ("bob" in this example), run: diff --git a/ConfiguringStorage.xml b/ConfiguringStorage.xml index b0d1cf6..d9c8426 100644 --- a/ConfiguringStorage.xml +++ b/ConfiguringStorage.xml @@ -69,47 +69,44 @@ <indexterm><primary>storage</primary><secondary>configuring</secondary><tertiary>RAID options</tertiary></indexterm>Formatting Options for RAID Devices When formatting a file system on a RAID device, it is beneficial to ensure that I/O requests are aligned with the underlying RAID geometry. This ensures that the Lustre RPCs do not generate unnecessary disk operations which may reduce performance dramatically. Use the --mkfsoptions parameter to specify additional parameters when formatting the OST or MDT. For RAID 5, RAID 6, or RAID 1+0 storage, specifying the following option to the --mkfsoptions parameter option improves the layout of the file system metadata, ensuring that no single disk contains all of the allocation bitmaps: - -E stride = <chunk_blocks> - The <chunk_blocks> variable is in units of 4096-byte blocks and represents the amount of contiguous data written to a single disk before moving to the next disk. This is alternately referred to as the RAID stripe size. This is applicable to both MDT and OST file systems. + -E stride = chunk_blocks + The chunk_blocks variable is in units of 4096-byte blocks and represents the amount of contiguous data written to a single disk before moving to the next disk. This is alternately referred to as the RAID stripe size. This is applicable to both MDT and OST file systems. For more information on how to override the defaults while formatting MDT or OST file systems, see .
<indexterm><primary>storage</primary><secondary>configuring</secondary><tertiary>for mkfs</tertiary></indexterm>Computing file system parameters for mkfs - For best results, use RAID 5 with 5 or 9 disks or RAID 6 with 6 or 10 disks, each on a different controller. The stripe width is the optimal minimum I/O size. Ideally, the RAID configuration should allow 1 MB Lustre RPCs to fit evenly on a single RAID stripe without an expensive read-modify-write cycle. Use this formula to determine the - <stripe_width> - , where - <number_of_data_disks> - does not include the RAID parity disks (1 for RAID 5 and 2 for RAID 6): - <stripe_width_blocks> = <chunk_blocks> * <number_of_data_disks> = 1 MB - If the RAID configuration does not allow - <chunk_blocks> - to fit evenly into 1 MB, select - <chunkblocks> - - <stripe_width_blocks> - , such that is close to 1 MB, but not larger. - The - <stripe_width_blocks> - value must equal - <chunk_blocks>*<number_of_data_disks> - . Specifying the - <stripe_width_blocks> - parameter is only relevant for RAID 5 or RAID 6, and is not needed for RAID 1 plus 0. + For best results, use RAID 5 with 5 or 9 disks or RAID 6 with 6 or 10 disks, each on a different controller. The stripe width is the optimal minimum I/O size. Ideally, the RAID configuration should allow 1 MB Lustre RPCs to fit evenly on a single RAID stripe without an expensive read-modify-write cycle. Use this formula to determine the + stripe_width, where + number_of_data_disks + does not include the RAID parity disks (1 for RAID 5 and 2 for RAID 6): + stripe_width_blocks = chunk_blocks * number_of_data_disks = 1 MB + If the RAID configuration does not allow + chunk_blocks + to fit evenly into 1 MB, select + stripe_width_blocks, + such that is close to 1 MB, but not larger. + The + stripe_width_blocks + value must equal + chunk_blocks * number_of_data_disks. + Specifying the + stripe_width_blocks + parameter is only relevant for RAID 5 or RAID 6, and is not needed for RAID 1 plus 0. Run --reformat on the file system device (/dev/sdc), specifying the RAID geometry to the underlying ldiskfs file system, where: - --mkfsoptions "<other options> -E stride=<chunk_blocks>, stripe_width=<stripe_width_blocks>" + --mkfsoptions "other_options -E stride=chunk_blocks, stripe_width=stripe_width_blocks" - A RAID 6 configuration with 6 disks has 4 data and 2 parity disks. The - <chunk_blocks> - <= 1024KB/4 = 256KB. + A RAID 6 configuration with 6 disks has 4 data and 2 parity disks. The + chunk_blocks + <= 1024KB/4 = 256KB. Because the number of data disks is equal to the power of 2, the stripe width is equal to 1 MB. - --mkfsoptions "<other options> -E stride=<chunk_blocks>, stripe_width=<stripe_width_blocks>"... + --mkfsoptions "other_options -E stride=chunk_blocks, stripe_width=stripe_width_blocks"...
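As a worked example of the formula above (the disk count, chunk size, file system name, MGS NID and device are illustrative assumptions, not values from the original text): a RAID 6 array built from 10 disks has 8 data disks, so choosing chunk_blocks = 1024 KB / 8 = 128 KB gives stride = 128 KB / 4 KB = 32 blocks and stripe_width_blocks = 32 * 8 = 256 blocks (1 MB). The OST could then be formatted with:
  oss# mkfs.lustre --ost --fsname=testfs --mgsnode=10.2.0.1@tcp0 --index=0 \
       --mkfsoptions="-E stride=32,stripe_width=256" --reformat /dev/sdc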
<indexterm><primary>storage</primary><secondary>configuring</secondary><tertiary>external journal</tertiary></indexterm>Choosing Parameters for an External Journal If you have configured a RAID array and use it directly as an OST, it contains both data and metadata. For better performance, we recommend putting the OST journal on a separate device, by creating a small RAID 1 array and using it as an external journal for the OST. Lustre's default journal size is 400 MB. A journal size of up to 1 GB has shown increased performance but diminishing returns are seen for larger journals. Additionally, a copy of the journal is kept in RAM. Therefore, make sure you have enough memory available to hold copies of all the journals. The file system journal options are specified to mkfs.luster using the --mkfsoptions parameter. For example: - --mkfsoptions "<other options> -j -J device=/dev/mdJ" + --mkfsoptions "other_options -j -J device=/dev/mdJ" To create an external journal, perform these steps for each OST on the OSS: @@ -118,10 +115,10 @@ Create a journal device on the partition. Run: - [oss#] mke2fs -b 4096 -O journal_dev /dev/sdb <journal_size> - The value of - <journal_size> - is specified in units of 4096-byte blocks. For example, 262144 for a 1 GB journal size. + oss# mke2fs -b 4096 -O journal_dev /dev/sdb journal_size + The value of + journal_size + is specified in units of 4096-byte blocks. For example, 262144 for a 1 GB journal size. Create the OST. diff --git a/InstallingLustre.xml b/InstallingLustre.xml index 51e4ab3..2b163c2 100644 --- a/InstallingLustre.xml +++ b/InstallingLustre.xml @@ -89,7 +89,7 @@ Required Software   - kernel-<ver>_lustre.<arch> + kernel-ver_lustre.arch Lustre patched server kernel. @@ -120,7 +120,7 @@ Required Software   - lustre-modules-<ver> + lustre-modules-ver For Lustre-patched kernel. @@ -137,7 +137,7 @@ Required Software   - lustre-client-modules-<ver> + lustre-client-modules-ver For clients. @@ -168,7 +168,7 @@ Required Software   - lustre-<ver> + lustre-ver Lustre utilities package. This includes userspace utilities to configure and run Lustre. @@ -188,7 +188,7 @@ Required Software   - lustre-client-<ver> + lustre-client-ver Lustre utilities for clients. @@ -205,7 +205,7 @@ Required Software   - lustre-ldiskfs-<ver> + lustre-ldiskfs-ver Lustre-patched backing file system kernel module package for the ldiskfs file system. @@ -223,7 +223,7 @@ Required Software   - e2fsprogs-<ver> + e2fsprogs-ver Utilities package used to maintain the ldiskfs backing file system. @@ -361,8 +361,8 @@ Environmental Requirements It is not recommended that you use the rpm -Uvh command to install a kernel, because this may leave you with an unbootable system if the new kernel doesn't work for some reason. For example, the command in the following example would install required packages on a server with Infiniband networking - $ rpm -ivh kernel-<ver>_lustre-<ver> kernel-ib-<ver> \ -lustre-modules-<ver> lustre-ldiskfs-<ver> + $ rpm -ivh kernel-ver_lustre-ver kernel-ib-ver \ +lustre-modules-ver lustre-ldiskfs-ver @@ -382,17 +382,17 @@ lustre-modules-<ver> lustre-ldiskfs-<ver> Install the utilities/userspace packages. Use the rpm -ivh command to install the utilities packages. For example: - $ rpm -ivh lustre-<ver> + $ rpm -ivh lustre-ver Install the e2fsprogs package. Use the rpm -ivh command to install the e2fsprogs package. 
For example: - $ rpm -ivh e2fsprogs-<ver> + $ rpm -ivh e2fsprogs-ver If e2fsprogs is already installed on your Linux system, install the Lustre-specific e2fsprogs version by using rpm -Uvh to upgrade the existing e2fsprogs package. For example: - $ rpm -Uvh e2fsprogs-<ver> + $ rpm -Uvh e2fsprogs-ver The rpm command options --force or --nodeps should not be used to install or update the Lustre-specific e2fsprogs package. If errors are reported, file a bug (for instructions see the topic Reporting Bugs @@ -415,8 +415,8 @@ lustre-modules-<ver> lustre-ldiskfs-<ver> Install the module packages for clients. - Use the rpm -ivh command to install the lustre-client and lustre-client-modules-<ver> packages. For example: - $ rpm -ivh lustre-client-modules-<ver> kernel-ib-<ver> + Use the rpm -ivh command to install the lustre-client and lustre-client-modules-ver packages. For example: + $ rpm -ivh lustre-client-modules-ver kernel-ib-ver Install the utilities/userspace packages for clients. diff --git a/InstallingLustreFromSourceCode.xml b/InstallingLustreFromSourceCode.xml index 73b7b49..be1a2b5 100644 --- a/InstallingLustreFromSourceCode.xml +++ b/InstallingLustreFromSourceCode.xml @@ -158,7 +158,7 @@ $ quilt push -av Configure the patched kernel to run with Lustre. Run: - $ cd <path to kernel tree> + $ cd /path/to/kernel/tree $ cp /boot/config-`uname -r` .config $ make oldconfig || make menuconfig $ make include/asm @@ -168,10 +168,10 @@ $ make include/linux/utsrelease.h Run the Lustre configure script against the patched kernel and create the Lustre packages. - $ cd <path to lustre source tree> -$ ./configure --with-linux=<path to kernel tree> + $ cd /path/to/lustre/source/tree +$ ./configure --with-linux=/path/to/kernel/tree $ make rpms - This creates a set of .rpms in /usr/src/redhat/RPMS/<arch> with an appended date-stamp. The SuSE path is /usr/src/packages. + This creates a set of .rpms in /usr/src/redhat/RPMS/arch with an appended date-stamp. The SuSE path is /usr/src/packages. You do not need to run the Lustre configure script against an unpatched kernel. @@ -210,20 +210,20 @@ lustre-source-1.6.5.1-\2.6.18_53.xx.xx.el5_lustre.1.6.5.1.custom_20081021.i686.r Install the kernel, modules and ldiskfs packages. Navigate to the directory where the RPMs are stored, and use the rpm -ivh command to install the kernel, module and ldiskfs packages. - $ rpm -ivh kernel-lustre-smp-<ver> \ -kernel-ib-<ver> \ -lustre-modules-<ver> \ -lustre-ldiskfs-<ver> + $ rpm -ivh kernel-lustre-smp-ver \ +kernel-ib-ver \ +lustre-modules-ver \ +lustre-ldiskfs-ver Install the utilities/userspace packages. Use the rpm -ivh command to install the utilities packages. For example: - $ rpm -ivh lustre-<ver> + $ rpm -ivh lustre-ver Install the e2fsprogs package. Make sure the e2fsprogs package is unpacked, and use the rpm -i command to install it. For example: - $ rpm -i e2fsprogs-<ver> + $ rpm -i e2fsprogs-ver (Optional) If you want to add optional packages to your Lustre system, install them now. diff --git a/LNETSelfTest.xml b/LNETSelfTest.xml index 5aeea28..f26a44e 100644 --- a/LNETSelfTest.xml +++ b/LNETSelfTest.xml @@ -120,10 +120,10 @@ lst add_group writers 192.168.1.[2-254/2]@o2ib
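A minimal end-to-end self-test session, combining the session and group commands above with the batch and test commands described in the following sections, might look like this sketch (the group names and NIDs are illustrative only):
  export LST_SESSION=$$
  lst new_session rw_test
  lst add_group servers 192.168.1.[1-2]@tcp
  lst add_group readers 192.168.1.[10-20]@tcp
  lst add_batch bulk_read
  lst add_test --batch bulk_read --from readers --to servers brw read size=1M
  lst run bulk_read
  lst stat readers servers
  lst end_session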
Defining and Running the Tests - A test generates a network load between two groups of nodes, a source group identified using the --from parameter and a target group identified using the --to parameter. When a test is running, each node in the --from<group> simulates a client by sending requests to nodes in the --to<group>, which are simulating a set of servers, and then receives responses in return. This activity is designed to mimic Lustre RPC traffic. + A test generates a network load between two groups of nodes, a source group identified using the --from parameter and a target group identified using the --to parameter. When a test is running, each node in the --from group simulates a client by sending requests to nodes in the --to group, which are simulating a set of servers, and then receives responses in return. This activity is designed to mimic Lustre RPC traffic. A batch is a collection of tests that are started and stopped together and run in parallel. A test must always be run as part of a batch, even if it is just a single test. Users can only run or stop a test batch, not individual tests. Tests in a batch are non-destructive to the file system, and can be run in a normal Lustre environment (provided the performance impact is acceptable). - A simple batch might contain a single test, for example, to determine whether the network bandwidth presents an I/O bottleneck. In this example, the --to<group> could be comprised of Lustre OSSs and --from<group> the compute nodes. A second test could be added to perform pings from a login node to the MDS to see how checkpointing affects the ls -l process. + A simple batch might contain a single test, for example, to determine whether the network bandwidth presents an I/O bottleneck. In this example, the --to group could be comprised of Lustre OSSs and --from group the compute nodes. A second test could be added to perform pings from a login node to the MDS to see how checkpointing affects the ls -l process. Two types of tests are available: @@ -211,7 +211,7 @@ lst end_session - --timeout<seconds> + --timeout seconds Console timeout value of the session. The session ends automatically if it remains idle (i.e., no commands are issued) for this period. @@ -227,9 +227,7 @@ lst end_session - - <name> - + name A human-readable string to print when listing sessions or reporting session conflicts. @@ -241,12 +239,12 @@ lst end_session Example: $ lst new_session --force read_write - end_session + end_session Stops all operations and tests in the current session and clears the session's status. $ lst end_session - show_session + show_session Shows the session information. This command prints information about the current session. It does not require LST_SESSION to be defined in the process environment. $ lst show_session @@ -255,8 +253,8 @@ lst end_session Group Commands This section describes lst group commands. - add_group - <name> <NIDS> [<NIDs>...] + add_group + name NIDs [NIDs ...] Creates the group and adds a list of test nodes to the group. @@ -277,7 +275,7 @@ lst end_session - <name> + name @@ -287,7 +285,7 @@ lst end_session - <NIDs> + NIDs @@ -302,13 +300,10 @@ lst end_session $ lst add_group clients 192.168.1.[10-100]@tcp 192.168.[2,4].\ [10-20]@tcp - update_group - <name> - [--refresh] [--clean - <status> - ] [--remove - <NIDs> - ] + update_group + name + [--refresh] [--clean status] + [--remove NIDs] Updates the state of nodes in a group or adjusts a group's membership. 
This command is useful if some nodes have crashed and should be excluded from the group. @@ -339,7 +334,7 @@ $ lst add_group clients 192.168.1.[10-100]@tcp 192.168.[2,4].\ - --clean<status> + --clean status Removes nodes with a specified status from the group. Status may be: @@ -402,7 +397,7 @@ $ lst add_group clients 192.168.1.[10-100]@tcp 192.168.[2,4].\ - --remove<NIDs> + --remove NIDs Removes specified nodes from the group. @@ -418,9 +413,9 @@ $ lst update_group clients --clean invalid // \ invalid == busy || down || unknown $ lst update_group clients --remove \192.168.1.[10-20]@tcp - list_group [ - <name> - ] [--active] [--busy] [--down] [--unknown] [--all] + list_group [ + name + ] [--active] [--busy] [--down] [--unknown] [--all] Prints information about a group or lists all groups in the current session if no group is specified. @@ -441,7 +436,7 @@ $ lst update_group clients --remove \192.168.1.[10-20]@tcp - <name> + name @@ -515,12 +510,12 @@ $ lst list_group clients --busy 192.168.1.12@tcp Busy Total 1 node - del_group - <name> + del_group + name Removes a group from the session. If the group is referred to by any test, then the operation fails. If nodes in the group are referred to only by this group, then they are kicked out from the current session; otherwise, they are still in the current session. $ lst del_group clients - lstclient --sesid <NID> --group <name> [--server_mode] + lstclient --sesid NID --group name [--server_mode] Use lstclient to run the userland self-test client. The lstclient command should be executed after creating a session on the console. There are only two mandatory options for lstclient: @@ -539,7 +534,7 @@ Total 1 node - --sesid<NID> + --sesid NID The first console's NID. @@ -547,7 +542,7 @@ Total 1 node - --group<name> + --group name The test group to join. @@ -574,13 +569,14 @@ Client1 $ lstclient --sesid 192.168.1.52@tcp --group clients Batch and Test Commands This section describes lst batch and test commands. - add_batch NAME + add_batch name A default batch test set named batch is created when the session is started. You can specify a batch name by using add_batch: $ lst add_batch bulkperf Creates a batch test called bulkperf. - add_test --batch <batchname> [--loop<#>] [--concurrency<#>] [--distribute<#:#>] \ --from <group> --to <group> {brw|ping} <test options> +add_test --batch batchname [--loop loop_count] [--concurrency active_count] [--distribute source_count:sink_count] \ + --from group --to group brw|ping test_options Adds a test to a batch. The parameters are described below. @@ -601,7 +597,7 @@ Client1 $ lstclient --sesid 192.168.1.52@tcp --group clients - --batch<batchname> + --batch batchname Names a group of tests for later execution. @@ -609,7 +605,7 @@ Client1 $ lstclient --sesid 192.168.1.52@tcp --group clients - --loop<#> + --loop loop_count Number of times to run the test. @@ -617,7 +613,7 @@ Client1 $ lstclient --sesid 192.168.1.52@tcp --group clients - --concurrency<#> + --concurrency active_count The number of requests that are active at one time. @@ -625,7 +621,7 @@ Client1 $ lstclient --sesid 192.168.1.52@tcp --group clients - --distribute<#:#> + --distribute source_count:sink_count Determines the ratio of client nodes to server nodes for the specified test. This allows you to specify a wide range of topologies, including one-to-one and all-to-all. Distribution divides the source group into subsets, which are paired with equivalent subsets from the target group so only nodes in matching subsets communicate. 
@@ -633,7 +629,7 @@ Client1 $ lstclient --sesid 192.168.1.52@tcp --group clients - --from<group> + --from group The source group (test client). @@ -641,7 +637,7 @@ Client1 $ lstclient --sesid 192.168.1.52@tcp --group clients - --to<group> + --to group The target group (test server). @@ -677,10 +673,10 @@ Client1 $ lstclient --sesid 192.168.1.52@tcp --group clients - size=<#>| <#>K | <#>M + size=bytes[KM] - I/O size in bytes, KB or MB (i.e., size=1024, size=4K, size=1M). The default is 4K bytes. + I/O size in bytes, kilobytes, or Megabytes (i.e., size=1024, size=4K, size=1M). The default is 4 kilobytes. @@ -706,8 +702,8 @@ Server: (S1, S2, S3) --distribute 4:2 (C1,C2,C3,C4->S1,S2), (C5, C6->S3, S1) --distribute 6:3 (C1,C2,C3,C4,C5,C6->S1,S2,S3) The setting --distribute 1:1 is the default setting where each source node communicates with one target node. - When the setting --distribute 1:<n> (where - <n> + When the setting --distribute 1:n (where + n is the size of the target group) is used, each source node communicates with every node in the target group. Note that if there are more source nodes than target nodes, some source nodes may share the same target nodes. Also, if there are more target nodes than source nodes, some higher-ranked target nodes will be idle. Example showing a brw test: @@ -726,7 +722,7 @@ $ lst add_test --batch bulkperf --loop 100 --concurrency 4 \ - list_batch [<name>] [--test <index>] [--active] [--invalid] [--server | client] + list_batch [name] [--test index] [--active] [--invalid] [--server|client] Lists batches in the current session or lists client and server nodes in a batch or a test. @@ -747,7 +743,7 @@ $ lst add_test --batch bulkperf --loop 100 --concurrency 4 \ - --test<index> + --test index Lists tests in a batch. If no option is used, all tests in the batch are listed. If one of these options are used, only specified tests in the batch are listed: @@ -806,34 +802,26 @@ $ lst list_batch bulkperf --server --active 192.168.10.102@tcp Active 192.168.10.103@tcp Active - run - - <name> - + run + name Runs the batch. $ lst run bulkperf - - stop - <name> - + stop + name Stops the batch. $ lst stop bulkperf - query - <name> - [--test - <index> - ] [--timeout - <seconds> - ] [--loop - <#> - ] [--delay - <seconds> - ] [--all] + query + name + [--test index] + [--timeout seconds] + [--loop #] + [--delay seconds] + [--all] Queries the batch status. @@ -854,7 +842,7 @@ $ lst list_batch bulkperf --server --active - --test<index> + --test index Only queries the specified test. The test index starts from 1. @@ -862,7 +850,7 @@ $ lst list_batch bulkperf --server --active - --timeout<seconds> + --timeout seconds The timeout value to wait for RPC. The default is 5 seconds. @@ -870,7 +858,7 @@ $ lst list_batch bulkperf --server --active - --loop<#> + --loop loop_count The loop count of the query. @@ -878,7 +866,7 @@ $ lst list_batch bulkperf --server --active - --delay<seconds> + --delay seconds The interval of each query. The default is 5 seconds. @@ -920,17 +908,11 @@ Batch is idle Other Commands This section describes other lst commands. - - ping [-session] [--group - <name> - ] [--nodes - <NIDs> - ] [--batch - <name> - ] [--server] [--timeout - <seconds> - ] - + ping [-session] [--group name] + [--nodes NIDs] + [--batch name] + [--server] + [--timeout seconds] Sends a 'hello' query to the nodes. @@ -958,7 +940,7 @@ Batch is idle - --group<name> + --group name Pings all nodes in a specified group. 
@@ -966,7 +948,7 @@ Batch is idle - --nodes<NIDs> + --nodes NIDs Pings all specified nodes. @@ -974,7 +956,7 @@ Batch is idle - --batch<name> + --batchname Pings all client nodes in a batch. @@ -985,12 +967,12 @@ Batch is idle --server - Sends RPC to all server nodes instead of client nodes. This option is only used with --batch<name>. + Sends RPC to all server nodes instead of client nodes. This option is only used with --batch name. - --timeout<seconds> + --timeout seconds The RPC timeout value. @@ -1000,7 +982,7 @@ Batch is idle Example: - $ lst ping 192.168.10.[15-20]@tcp + # lst ping 192.168.10.[15-20]@tcp 192.168.1.15@tcp Active [session: liang id: 192.168.1.3@tcp] 192.168.1.16@tcp Active [session: liang id: 192.168.1.3@tcp] 192.168.1.17@tcp Active [session: liang id: 192.168.1.3@tcp] @@ -1008,21 +990,11 @@ Batch is idle 192.168.1.19@tcp Down [session: <NULL> id: LNET_NID_ANY] 192.168.1.20@tcp Down [session: <NULL> id: LNET_NID_ANY] - - stat [--bw] [--rate] [--read] [--write] [--max] [--min] [--avg] " " [--timeout - <seconds> - ] [--delay - <seconds> - ] - <group> - |< - NIDs> - [ - <group> - | - <NIDs> - ] - + stat [--bw] [--rate] [--read] [--write] [--max] [--min] [--avg] " " + [--timeout seconds] + [--delay seconds] + group|NIDs + [group|NIDs] The collection performance and RPC statistics of one or more nodes. @@ -1098,7 +1070,7 @@ Batch is idle - --timeout<seconds> + --timeout seconds The timeout of the statistics RPC. The default is 5 seconds. @@ -1106,7 +1078,7 @@ Batch is idle - --delay<seconds> + --delay seconds The interval of the statistics (in seconds). @@ -1135,7 +1107,7 @@ $ lst stat clients Only LNET performance statistics are available. By default, all statistics information is displayed. Users can specify additional information with these options. - show_error [--session] [<group>|<NIDs>]... + show_error [--session] [group|NIDs]... Lists the number of failed RPCs on test nodes. diff --git a/LustreDebugging.xml b/LustreDebugging.xml index 4b21340..761deea 100644 --- a/LustreDebugging.xml +++ b/LustreDebugging.xml @@ -42,7 +42,7 @@ Lustre Debugging Tools - This tool is used with the debug_kernel option to manually dump the Lustre debugging log or post-process debugging logs that are dumped automatically. For more information about the lctl tool, see and . - Lustre subsystem asserts - A panic-style assertion (LBUG) in the kernel causes Lustre to dump the debug log to the file /tmp/lustre-log.<timestamp> where it can be retrieved after a reboot. For more information, see . + Lustre subsystem asserts - A panic-style assertion (LBUG) in the kernel causes Lustre to dump the debug log to the file /tmp/lustre-log.timestamp where it can be retrieved after a reboot. For more information, see . @@ -105,7 +105,7 @@ Lustre Debugging Tools - leak_finder.pl + leak_finder.pl . This program provided with Lustre is useful for finding memory leaks in the code. 
@@ -471,13 +471,13 @@ Lustre Debugging Tools Obtain a list of all the types and subsystems: - lctl > debug_list {subs | types} + lctl > debug_list subsystems|types Filter the debug log: - lctl > filter {subsystem name | debug type} + lctl > filter subsystem_name|debug_type @@ -486,16 +486,16 @@ Lustre Debugging Tools Show debug messages belonging to certain subsystem or type: - lctl > show {subsystem name | debug type} + lctl > show subsystem_name|debug_type debug_kernel pulls the data from the kernel logs, filters it appropriately, and displays or saves it as per the specified options - lctl > debug_kernel [output filename] + lctl > debug_kernel [output filename] If the debugging is being done on User Mode Linux (UML), it might be useful to save the logs on the host machine so that they can be used at a later time. Filter a log on disk, if you already have a debug log saved to disk (likely from a crash): - lctl > debug_file {input filename} [output filename] + lctl > debug_file input_file [output_file] During the debug session, you can add markers or breaks to the log for any reason: lctl > mark [marker text] The marker text defaults to the current date and time in the debug log (similar to the example shown below): @@ -540,10 +540,10 @@ Debug log: 324 lines, 258 kept, 66 dropped.
<literal>lctl debug_daemon</literal> Commands To initiate the debug_daemon to start dumping the debug_buffer into a file, run as the root user: - lctl debug_daemon start {filename} [{megabytes}] + lctl debug_daemon start filename [megabytes] The debug log will be written to the specified filename from the kernel. The file will be limited to the optionally specified number of megabytes. The daemon wraps around and dumps data to the beginning of the file when the output file size is over the limit of the user-specified file size. To decode the dumped file to ASCII and sort the log entries by time, run: - lctl debug_file {filename} > {newfile} + lctl debug_file filename > newfile The output is internally sorted by the lctl command. To stop the debug_daemon operation and flush the file output, run: lctl debug_daemon stop @@ -551,7 +551,7 @@ Debug log: 324 lines, 258 kept, 66 dropped. This is an example using debug_daemon with the interactive mode of lctl to dump debug logs to a 40 MB file. lctl lctl > debug_daemon start /var/log/lustre.40.bin 40 - {run filesystem operations to debug} + run filesystem operations to debug lctl > debug_daemon stop lctl > debug_file /var/log/lustre.bin /var/log/lustre.log To start another daemon with an unlimited file size, run: @@ -561,7 +561,7 @@ Debug log: 324 lines, 258 kept, 66 dropped.
<indexterm><primary>debugging</primary><secondary>kernel debug log</secondary></indexterm>Controlling Information Written to the Kernel Debug Log - The lctl set_param subsystem_debug={subsystem_mask} and lctl set_param debug={debug_mask} are used to determine which information is written to the debug log. The subsystem_debug mask determines the information written to the log based on the functional area of the code (such as lnet, osc, or ldlm). The debug mask controls information based on the message type (such as info, error, trace, or malloc). + The lctl set_param subsystem_debug=subsystem_mask and lctl set_param debug=debug_mask are used to determine which information is written to the debug log. The subsystem_debug mask determines the information written to the log based on the functional area of the code (such as lnet, osc, or ldlm). The debug mask controls information based on the message type (such as info, error, trace, or malloc). To turn off Lustre debugging completely: lctl set_param debug=0 To turn on full Lustre debugging: @@ -578,11 +578,11 @@ Debug log: 324 lines, 258 kept, 66 dropped. <indexterm><primary>debugging</primary><secondary>using strace</secondary></indexterm>Troubleshooting with <literal>strace</literal> The strace utility provided with the Linux distribution enables system calls to be traced by intercepting all the system calls made by a process and recording the system call name, arguments, and return values. To invoke strace on a program, enter: - $ strace {program} {args} + $ strace program [arguments] Sometimes, a system call may fork child processes. In this situation, use the -f option of strace to trace the child processes: - $ strace -f {program} {args} + $ strace -f program [arguments] To redirect the strace output to a file, enter: - $ strace -o {filename} {program} {args} + $ strace -o filename program [arguments] Use the -ff option, along with -o, to save the trace output in filename.pid, where pid is the process ID of the process being traced. Use the -ttt option to timestamp all lines in the strace output, so they can be correlated to operations in the lustre kernel debug log.
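For example, to capture a timestamped trace of a single file create on a Lustre mount point, including any forked children (the paths are illustrative), one might run:
  $ strace -f -ttt -o /tmp/create.strace touch /mnt/lustre/testfile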
@@ -648,7 +648,7 @@ Filesystem volume name: myth-OST0004 Lustre has a specific debug type category for tracing lock traffic. Use: lctl> filter all_types lctl> show dlmtrace -lctl> debug_kernel [filename] +lctl> debug_kernel [filename]
@@ -829,7 +829,7 @@ lctl> debug_kernel [filename] - Allows insertion of failure points into the Lustre code. This is useful to generate regression tests that can hit a very specific sequence of events. This works in conjunction with "lctl set_param fail_loc={fail_loc}" to set a specific failure point for which a given OBD_FAIL_CHECK() will test. + Allows insertion of failure points into the Lustre code. This is useful to generate regression tests that can hit a very specific sequence of events. This works in conjunction with "lctl set_param fail_loc=fail_loc" to set a specific failure point for which a given OBD_FAIL_CHECK() will test. @@ -932,7 +932,7 @@ lctl> debug_kernel [filename] Requests in the history include "live" requests that are currently being handled. Each line in req_history looks like: - <seq>:<target NID>:<client ID>:<xid>:<length>:<phase> <svc specific> + sequence:target_NID:client_NID:cliet_xid:request_length:rpc_phase service_specific_data @@ -961,7 +961,7 @@ lctl> debug_kernel [filename] - target NID + target NID @@ -971,7 +971,7 @@ lctl> debug_kernel [filename] - client ID + client ID @@ -1046,7 +1046,7 @@ lctl> debug_kernel [filename] Run the leak finder on the newly-created log dump: - perl leak_finder.pl {ascii-logname} + perl leak_finder.pl ascii-logname The output is: diff --git a/LustreMaintenance.xml b/LustreMaintenance.xml index c92ac1f..8647673 100644 --- a/LustreMaintenance.xml +++ b/LustreMaintenance.xml @@ -52,9 +52,9 @@ maintenanceinactive OSTs Working with Inactive OSTs To mount a client or an MDT with one or more inactive OSTs, run commands similar to this: - client> mount -o exclude=testfs-OST0000 -t lustre \ + client# mount -o exclude=testfs-OST0000 -t lustre \ uml1:/testfs /mnt/testfs - client> cat /proc/fs/lustre/lov/testfs-clilov-*/target_obd + client# cat /proc/fs/lustre/lov/testfs-clilov-*/target_obd To activate an inactive OST on a live client or MDT, use the lctl activate command on the OSC device. For example: lctl --device 7 activate @@ -79,7 +79,7 @@ Finding Nodes in the Lustre File System lustre-OST0000 lustre-OST0001 To get the names of all OSTs, run this command on the MDS: - # cat /proc/fs/lustre/lov/<fsname>-mdtlov/target_obd + # cat /proc/fs/lustre/lov/fsname-mdtlov/target_obd This command must be run on the MDS. @@ -93,8 +93,8 @@ Finding Nodes in the Lustre File System <indexterm><primary>maintenance</primary><secondary>mounting a server</secondary></indexterm> Mounting a Server Without Lustre Service If you are using a combined MGS/MDT, but you only want to start the MGS and not the MDT, run this command: - mount -t lustre <MDT partition> -o nosvc <mount point> - The <MDT partition> variable is the combined MGS/MDT. + mount -t lustre /dev/mdt_partition -o nosvc /mount_point + The mdt_partition variable is the combined MGS/MDT block device. In this example, the combined MGS/MDT is testfs-MDT0000 and the mount point is /mnt/test/mdt. $ mount -t lustre -L testfs-MDT0000 -o nosvc /mnt/test/mdt
@@ -165,13 +165,13 @@ Regenerating Lustre Configuration Logs On the MDT, run: - <mdt node>$ tunefs.lustre --writeconf <device> + mdt# tunefs.lustre --writeconf /dev/mdt_device On each OST, run: - <ost node>$ tunefs.lustre --writeconf <device> + ost# tunefs.lustre --writeconf /dev/ost_device @@ -238,15 +238,15 @@ Changing a Server NID On the MDT, run: - <mdt node>$ tunefs.lustre --writeconf <device> + mdt# tunefs.lustre --writeconf /dev/mdt_device On each OST, run: - <ost node>$ tunefs.lustre --writeconf <device> + ost# tunefs.lustre --writeconf /dev/ost_device If the NID on the MGS was changed, communicate the new MGS location to each server. Run: - tunefs.lustre --erase-param --mgsnode=<new_nid(s)> --writeconf /dev/.. + tunefs.lustre --erase-param --mgsnode=new_nid(s) --writeconf /dev/device @@ -287,13 +287,13 @@ client$ lctl dl | grep mdc Add the new block device as a new MDT at the next available index. In this example, the next available index is 4. -mkfs.lustre --reformat --fsname=<filesystemname> --mdt --mgsnode=<mgsnode> --index 4 <blockdevice> +mds# mkfs.lustre --reformat --fsname=filesystem_name --mdt --mgsnode=mgsnode --index 4 /dev/mdt4_device Mount the MDTs. -mount –t lustre <blockdevice> /mnt/mdt4 +mds# mount –t lustre /dev/mdt4_blockdevice /mnt/mdt4
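Putting the two steps above together, adding an additional MDT at index 4 might look like the following (the file system name, MGS NID, device and mount point are assumptions for illustration):
  mds# mkfs.lustre --reformat --fsname=testfs --mdt --mgsnode=10.2.0.1@tcp0 --index=4 /dev/sdd
  mds# mkdir -p /mnt/mdt4
  mds# mount -t lustre /dev/sdd /mnt/mdt4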
@@ -305,9 +305,9 @@ Adding a New OST to a Lustre File System Add a new OST by passing on the following commands, run: - $ mkfs.lustre --fsname=spfs --mgsnode=mds16@tcp0 --ost --index=12 /dev/sda -$ mkdir -p /mnt/test/ost12 -$ mount -t lustre /dev/sda /mnt/test/ost12 + oss# mkfs.lustre --fsname=spfs --mgsnode=mds16@tcp0 --ost --index=12 /dev/sda +oss# mkdir -p /mnt/test/ost12 +oss# mount -t lustre /dev/sda /mnt/test/ost12 Migrate the data (possibly). @@ -315,9 +315,9 @@ $ mount -t lustre /dev/sda /mnt/test/ost12 New files being created will preferentially be placed on the empty OST. As old files are deleted, they will release space on the old OST. Files existing prior to the expansion can optionally be rebalanced with an in-place copy, which can be done with a simple script. The basic method is to copy existing files to a temporary file, then move the temp file over the old one. This should not be attempted with files which are currently being written to by users or applications. This operation redistributes the stripes over the entire set of OSTs. For example, to rebalance all files within /mnt/lustre/dir, enter: - lfs_migrate /mnt/lustre/file + client# lfs_migrate /mnt/lustre/file To migrate files within the /test filesystem on OST0004 that are larger than 4GB in size, enter: - lfs find /test -obd test-OST0004 -size +4G | lfs_migrate -y + client# lfs find /test -obd test-OST0004 -size +4G | lfs_migrate -y See for more details. @@ -370,7 +370,7 @@ client$ lfs getstripe -M /mnt/lustre/local_dir0 List all OSCs on the node, along with their device numbers. Run: - lctl dl|grep " osc " + lctl dl|grep osc This is sample lctl dl | grep 11 UP osc lustre-OST-0000-osc-cac94211 4ea5b30f-6a8e-55a0-7519-2f20318ebdb4 5 12 UP osc lustre-OST-0001-osc-cac94211 4ea5b30f-6a8e-55a0-7519-2f20318ebdb4 5 @@ -384,10 +384,10 @@ client$ lfs getstripe -M /mnt/lustre/local_dir0 Temporarily deactivate the OSC on the MDT. On the MDT, run: - $ mdt> lctl --device <devno> deactivate + mds# lctl --device lustre_devno deactivate For example, based on the command output in Step 1, to deactivate device 13 (the MDT’s OSC for OST-0000), the command would be: - $ mdt> lctl --device 13 deactivate + mds# lctl --device 13 deactivate This marks the OST as inactive on the MDS, so no new objects are assigned to the OST. This does not prevent use of existing objects for reads or writes. @@ -405,11 +405,11 @@ client$ lfs getstripe -M /mnt/lustre/local_dir0 If the OST is still online and available, find all files with objects on the deactivated OST, and copy them to other OSTs in the file system to: - [client]# lfs find --obd <OST UUID> <mount_point> | lfs_migrate -y + client# lfs find --obd ost_name /mount/point | lfs_migrate -y If the OST is no longer available, delete the files on that OST and restore them from backup: - [client]# lfs find --obd <OST UUID> -print0 <mount_point> | \ + client# lfs find --obd ost_uuid -print0 /mount/point | \ tee /tmp/files_to_restore | xargs -0 -n 1 unlink The list of files that need to be restored from backup is stored in /tmp/files_to_restore. Restoring these files is beyond the scope of this document. @@ -419,12 +419,12 @@ client$ lfs getstripe -M /mnt/lustre/local_dir0 Deactivate the OST. 
- To temporarily disable the deactivated OST, enter: [client]# lctl set_param osc.<fsname>-<OST name>-*.active=0If there is expected to be a replacement OST in some short time (a few days), the OST can temporarily be deactivated on the clients: + To temporarily disable the deactivated OST, enter: [client]# lctl set_param osc.fsname-OSTnumber-*.active=0If there is expected to be a replacement OST in some short time (a few days), the OST can temporarily be deactivated on the clients: This setting is only temporary and will be reset if the clients or MDS are rebooted. It needs to be run on all clients. - If there is not expected to be a replacement for this OST in the near future, permanently deactivate the OST on all clients and the MDS: [mgs]# lctl conf_param {OST name}.osc.active=0 + If there is not expected to be a replacement for this OST in the near future, permanently deactivate the OST on all clients and the MDS: [mgs]# lctl conf_param ost_name.osc.active=0 A removed OST still appears in the file system; do not create a new OST with the same name. @@ -439,20 +439,20 @@ Backing Up OST Configuration Files Mount the OST filesystem. - [oss]# mkdir -p /mnt/ost -[oss]# mount -t ldiskfs {ostdev} /mnt/ost + oss# mkdir -p /mnt/ost +[oss]# mount -t ldiskfs /dev/ost_device /mnt/ost Back up the OST configuration files. - [oss]# tar cvf {ostname}.tar -C /mnt/ost last_rcvd \ + oss# tar cvf ost_name.tar -C /mnt/ost last_rcvd \ CONFIGS/ O/0/LAST_ID Unmount the OST filesystem. - [oss]# umount /mnt/ost + oss# umount /mnt/ost @@ -471,20 +471,20 @@ Restoring OST Configuration Files Format the OST file system. - [oss]# mkfs.lustre --ost --index={old OST index} {other options} \ - {newdev} + oss# mkfs.lustre --ost --index=old_ost_index other_options \ + /dev/new_ost_dev Mount the OST filesystem. - [oss]# mkdir /mnt/ost -[oss]# mount -t ldiskfs {newdev} /mnt/ost + oss# mkdir /mnt/ost +oss# mount -t ldiskfs /dev/new_ost_dev /mnt/ost Restore the OST configuration files, if available. - [oss]# tar xvf {ostname}.tar -C /mnt/ost + oss# tar xvf ost_name.tar -C /mnt/ost Recreate the OST configuration files, if unavailable. @@ -493,14 +493,14 @@ Restoring OST Configuration Files The CONFIGS/mountdata file is created by mkfs.lustre at format time, but has flags set that request it to register itself with the MGS. It is possible to copy these flags from another working OST (which should be the same): -[oss1]# debugfs -c -R "dump CONFIGS/mountdata /tmp/ldd" {other_osdev} -[oss1]# scp /tmp/ldd oss:/tmp/ldd -[oss0]# dd if=/tmp/ldd of=/mnt/ost/CONFIGS/mountdata bs=4 count=1 seek=5 skip=5 +oss1# debugfs -c -R "dump CONFIGS/mountdata /tmp/ldd" /dev/other_osdev +oss1# scp /tmp/ldd oss0:/tmp/ldd +oss0# dd if=/tmp/ldd of=/mnt/ost/CONFIGS/mountdata bs=4 count=1 seek=5 skip=5 Unmount the OST filesystem. - [oss]# umount /mnt/ost + oss# umount /mnt/ost @@ -508,15 +508,15 @@ The CONFIGS/mountdata file is created by mkfs.lustre
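As a concrete sketch of the restore procedure above (the OST index, file system name, MGS NID, device and tarball name are assumptions; use the values recorded when the configuration was backed up), restoring an OST with index 12 onto a new disk might look like:
  oss# mkfs.lustre --ost --index=12 --fsname=testfs --mgsnode=10.2.0.1@tcp0 /dev/sdc
  oss# mkdir -p /mnt/ost
  oss# mount -t ldiskfs /dev/sdc /mnt/ost
  oss# tar xvf testfs-OST000c.tar -C /mnt/ost
  oss# umount /mnt/ost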
<indexterm><primary>maintenance</primary><secondary>reintroducing an OST</secondary></indexterm>
Returning a Deactivated OST to Service
-      If the OST was permanently deactivated, it needs to be reactivated in the MGS configuration. [mgs]# lctl conf_param {OST name}.osc.active=1 If the OST was temporarily deactivated, it needs to be reactivated on the MDS and clients. [mds]# lctl --device <devno> activate
-      [client]# lctl set_param osc.<fsname>-<OST name>-*.active=0
+      If the OST was permanently deactivated, it needs to be reactivated in the MGS configuration. mgs# lctl conf_param ost_name.osc.active=1 If the OST was temporarily deactivated, it needs to be reactivated on the MDS and clients. mds# lctl --device lustre_devno activate
+      client# lctl set_param osc.fsname-OSTnumber-*.active=1
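A quick way to confirm that the OST is back in use is to read the flag back on a client (an illustrative check; testfs is a hypothetical file system name). A value of 1 indicates the OST is active again:
client# lctl get_param osc.testfs-OST*.active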
<indexterm><primary>maintenance</primary><secondary>aborting recovery</secondary></indexterm> <indexterm><primary>backup</primary><secondary>aborting recovery</secondary></indexterm> Aborting Recovery - You can abort recovery with either the lctl utility or by mounting the target with the abort_recov option (mount -o abort_recov). When starting a target, run: $ mount -t lustre -L <MDT name> -o abort_recov <mount_point> + You can abort recovery with either the lctl utility or by mounting the target with the abort_recov option (mount -o abort_recov). When starting a target, run: mds# mount -t lustre -L mdt_name -o abort_recov /mount_point The recovery process is blocked until all OSTs are available. @@ -524,7 +524,7 @@ Aborting Recovery
<indexterm><primary>maintenance</primary><secondary>identifying OST host</secondary></indexterm> Determining Which Machine is Serving an OST - In the course of administering a Lustre file system, you may need to determine which machine is serving a specific OST. It is not as simple as identifying the machine’s IP address, as IP is only one of several networking protocols that Lustre uses and, as such, LNET does not use IP addresses as node identifiers, but NIDs instead. To identify the NID that is serving a specific OST, run one of the following commands on a client (you do not need to be a root user): client$ lctl get_param osc.${fsname}-${OSTname}*.ost_conn_uuidFor example: client$ lctl get_param osc.*-OST0000*.ost_conn_uuid + In the course of administering a Lustre file system, you may need to determine which machine is serving a specific OST. It is not as simple as identifying the machine’s IP address, as IP is only one of several networking protocols that Lustre uses and, as such, LNET does not use IP addresses as node identifiers, but NIDs instead. To identify the NID that is serving a specific OST, run one of the following commands on a client (you do not need to be a root user): client$ lctl get_param osc.fsname-OSTnumber*.ost_conn_uuidFor example: client$ lctl get_param osc.*-OST0000*.ost_conn_uuid osc.lustre-OST0000-osc-f1579000.ost_conn_uuid=192.168.20.1@tcp- OR - client$ lctl get_param osc.*.ost_conn_uuid osc.lustre-OST0000-osc-f1579000.ost_conn_uuid=192.168.20.1@tcp osc.lustre-OST0001-osc-f1579000.ost_conn_uuid=192.168.20.1@tcp @@ -536,9 +536,9 @@ osc.lustre-OST0004-osc-f1579000.ost_conn_uuid=192.168.20.1@tcp <indexterm><primary>maintenance</primary><secondary>changing failover node address</secondary></indexterm> Changing the Address of a Failover Node To change the address of a failover node (e.g, to use node X instead of node Y), run this command on the OSS/OST partition: - tunefs.lustre --erase-params --failnode=<NID> <device> + oss# tunefs.lustre --erase-params --failnode=NID /dev/ost_device or - tunefs.lustre --erase-params --servicenode=<NID> <device> + oss# tunefs.lustre --erase-params --servicenode=NID /dev/ost_device
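For example, assuming a new failover partner with NID 192.168.0.22@tcp and an OST on /dev/sdb (both values are illustrative only):
oss# tunefs.lustre --erase-params --failnode=192.168.0.22@tcp /dev/sdb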
@@ -550,36 +550,36 @@ Separate a combined MGS/MDT Stop the MDS. Unmount the MDT - umount -f <device> + umount -f /dev/mdt_device Create the MGS. - mkfs.lustre --mgs --device-size=<size> <mgs-device> + mds# mkfs.lustre --mgs --device-size=size /dev/mgs_device Copy the configuration data from MDT disk to the new MGS disk. - mount -t ldiskfs -o ro <mdt-device> <mdt-mount-point> - mount -t ldiskfs -o rw <mgs-device> <mgs-mount-point> - cp -r <mdt-mount-point>/CONFIGS/<filesystem-name>-* <mgs-mount-point>/CONFIGS/. - umount <mgs-mount-point> - umount <mdt-mount-point> + mds# mount -t ldiskfs -o ro /dev/mdt_device /mdt_mount_point + mds# mount -t ldiskfs -o rw /dev/mgs_device /mgs_mount_point + mds# cp -r /mdt_mount_point/CONFIGS/filesystem_name-* /mgs_mount_point/CONFIGS/. + mds# umount /mgs_mount_point + mds# umount /mdt_mount_point See for alternative method. Start the MGS. - mount -t lustre <mgs-device> <mgs-mount-point> + mgs# mount -t lustre /dev/mgs_device /mgs_mount_point Check to make sure it knows about all your filesystem cat /proc/fs/lustre/mgs/MGS/filesystems Remove the MGS option from the MDT, and set the new MGS nid. - tunefs.lustre --nomgs --mgsnode=<new-MGS-nid> <mdt-device> + mds# tunefs.lustre --nomgs --mgsnode=new_mgs_nid /dev/mdt-device Start the MDT. - mount -t lustre <mdt-device> <mdt-mount-point> + mds# mount -t lustre /dev/mdt_device /mdt_mount_point Check to make sure the MGS configuration look right - cat /proc/fs/lustre/mgs/MGS/live/<filesystem-name> + mds# cat /proc/fs/lustre/mgs/MGS/live/filesystem_name
diff --git a/LustreMonitoring.xml b/LustreMonitoring.xml index 54c03b0..18a7fde 100644 --- a/LustreMonitoring.xml +++ b/LustreMonitoring.xml @@ -190,7 +190,7 @@ Working with Changelogs Because changelog records take up space on the MDT, the system administration must register changelog users. The registrants specify which records they are "done with", and the system purges up to the greatest common record. To register a new changelog user, run: - lctl --device <mdt_device> changelog_register + lctl --device /dev/mdt_device changelog_register Changelog entries are not purged beyond a registered user's set point (see lfs changelog_clear).
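For example, on a file system named testfs (a hypothetical name), registering a user on the MDT device typically reports the new changelog user ID, which is then passed to lfs changelog and lfs changelog_clear:
mds# lctl --device testfs-MDT0000 changelog_register
testfs-MDT0000: Registered changelog userid 'cl1'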
@@ -199,7 +199,7 @@ Working with Changelogs lfs changelog To display the metadata changes on an MDT (the changelog records), run: - lfs changelog <MDT name> [startrec [endrec]] + lfs changelog fsname-MDTnumber [startrec [endrec]] It is optional whether to specify the start and end records. These are sample changelog records: 2 02MKDIR 4298396676 0x0 t=[0x200000405:0x15f9:0x0] p=[0x13:0x15e5a7a3:0x0]\ @@ -216,8 +216,8 @@ x0] chloe.jpg lfs changelog_clear To clear old changelog records for a specific user (records that the user no longer needs), run: - lfs changelog_clear <MDT name> <user ID> <endrec> - The changelog_clear command indicates that changelog records previous to <endrec> are no longer of interest to a particular user <user ID>, potentially allowing the MDT to free up disk space. An <endrec> value of 0 indicates the current last record. To run changelog_clear, the changelog user must be registered on the MDT node using lctl. + lfs changelog_clear mdt_name userid endrec + The changelog_clear command indicates that changelog records previous to endrec are no longer of interest to a particular user userid, potentially allowing the MDT to free up disk space. An endrec value of 0 indicates the current last record. To run changelog_clear, the changelog user must be registered on the MDT node using lctl. When all changelog users are done with records < X, the records are deleted.
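For example, to indicate that the hypothetical changelog user cl1 no longer needs any of the records currently held on testfs-MDT0000:
client# lfs changelog_clear testfs-MDT0000 cl1 0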
@@ -225,7 +225,7 @@ x0] chloe.jpg lctl changelog_deregister To deregister (unregister) a changelog user, run: - lctl --device <mdt_device> changelog_deregister <user ID> + lctl --device mdt_device changelog_deregister userid changelog_deregister cl1 effectively does a changelog_clear cl1 0 as it deregisters.
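For example, using the same hypothetical names as above:
mds# lctl --device testfs-MDT0000 changelog_deregister cl1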
diff --git a/LustreOperations.xml b/LustreOperations.xml
index bcf545d..0913a96 100644
--- a/LustreOperations.xml
+++ b/LustreOperations.xml
@@ -51,16 +51,16 @@ operationsmounting by label Mounting by Label
 The file system name is limited to 8 characters. We have encoded the file system and target information in the disk label, so you can mount by label. This allows system administrators to move disks around without worrying about issues such as SCSI disk reordering or getting the /dev/device wrong for a shared target. Soon, file system naming will be made as fail-safe as possible. Currently, Linux disk labels are limited to 16 characters. To identify the target within the file system, 8 characters are reserved, leaving 8 characters for the file system name:
-      <fsname>-MDT0000 or <fsname>-OST0a19
+      fsname-MDT0000 or fsname-OST0a19
 To mount by label, use this command:
-      $ mount -t lustre -L <file system label> <mount point>
+      mount -t lustre -L file_system_label /mount_point
 This is an example of mount-by-label:
-      $ mount -t lustre -L testfs-MDT0000 /mnt/mdt
+      mds# mount -t lustre -L testfs-MDT0000 /mnt/mdt
 
-      Mount-by-label should NOT be used in a multi-path environment.
+      Mount-by-label should NOT be used in a multi-path environment or when snapshots are being created of the device, since multiple block devices will have the same label.
 Although the file system name is internally limited to 8 characters, you can mount the clients at any mount point, so file system users are not subjected to short names. Here is an example:
-      mount -t lustre mds0@tcp0:/shortfs /mnt/<long-file_system-name>
+      client# mount -t lustre mds0@tcp0:/short /mnt/long_mountpoint_name
<indexterm><primary>operations</primary><secondary>starting</secondary></indexterm>Starting Lustre @@ -99,7 +99,7 @@ LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0
<indexterm><primary>operations</primary><secondary>unmounting</secondary></indexterm>Unmounting a Server - To stop a Lustre server, use the umount <mount point> command. + To stop a Lustre server, use the umount /mount point command. For example, to stop ost0 on mount point /mnt/test, run: $ umount /mnt/test Gracefully stopping a server with the umount command preserves the state of the connected clients. The next time the server is started, it waits for clients to reconnect, and then goes through the recovery procedure. @@ -120,15 +120,15 @@ LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0 By default, the Lustre file system uses failover mode for OSTs. To specify failout mode instead, use the --param="failover.mode=failout" option: - $ mkfs.lustre --fsname=<fsname> --mgsnode=<MGS node NID> --param="failover.mode=failout" --ost --index="OST index" <block device name> + oss# mkfs.lustre --fsname=fsname --mgsnode=mgs_NID --param=failover.mode=failout --ost --index=ost_index /dev/ost_block_device In this example, failout mode is specified for the OSTs on MGS mds0, file system testfs. - $ mkfs.lustre --fsname=testfs --mgsnode=mds0 --param="failover.mode=failout" --ost --index=3 /dev/sdb + oss# mkfs.lustre --fsname=testfs --mgsnode=mds0 --param=failover.mode=failout --ost --index=3 /dev/sdb Before running this command, unmount all OSTs that will be affected by the change in the failover/failout mode. After initial file system configuration, use the tunefs.lustre utility to change the failover/failout mode. For example, to set the failout mode, run: - $ tunefs.lustre --param failover.mode=failout <OST partition> + $ tunefs.lustre --param failover.mode=failout /dev/ost_device
@@ -151,19 +151,19 @@ LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0 Lustre supports multiple file systems provided the combination of NID:fsname is unique. Each file system must be allocated a unique name during creation with the --fsname parameter. Unique names for file systems are enforced if a single MGS is present. If multiple MGSs are present (for example if you have an MGS on every MDS) the administrator is responsible for ensuring file system names are unique. A single MGS and unique file system names provides a single point of administration and allows commands to be issued against the file system even if it is not mounted. Lustre supports multiple file systems on a single MGS. With a single MGS fsnames are guaranteed to be unique. Lustre also allows multiple MGSs to co-exist. For example, multiple MGSs will be necessary if multiple file systems on different Lustre software versions are to be concurrently available. With multiple MGSs additional care must be taken to ensure file system names are unique. Each file system should have a unique fsname among all systems that may interoperate in the future. By default, the mkfs.lustre command creates a file system named lustre. To specify a different file system name (limited to 8 characters) at format time, use the --fsname option: - mkfs.lustre --fsname=<file system name> + mkfs.lustre --fsname=file_system_name The MDT, OSTs and clients in the new file system must use the same filesystem name (prepended to the device name). For example, for a new file system named foo, the MDT and two OSTs would be named foo-MDT0000, foo-OST0000, and foo-OST0001. To mount a client on the file system, run: - mount -t lustre mgsnode:/<new fsname> <mountpoint> + client# mount -t lustre mgsnode:/new_fsname /mount_point For example, to mount a client on file system foo at mount point /mnt/foo, run: - mount -t lustre mgsnode:/foo /mnt/foo + client# mount -t lustre mgsnode:/foo /mnt/foo If a client(s) will be mounted on several file systems, add the following line to /etc/xattr.conf file to avoid problems when files are moved between the file systems: lustre.* skip - To ensure that a new MDT is added to an existing MGS create the MDT by specifying: --mdt --mgsnode=<MGS node NID>. + To ensure that a new MDT is added to an existing MGS create the MDT by specifying: --mdt --mgsnode=mgs_NID. A Lustre installation with two file systems (foo and bar) could look like this, where the MGS node is mgsnode@tcp0 and the mount points are /mnt/foo and /mnt/bar. mgsnode# mkfs.lustre --mgs /dev/sda @@ -174,14 +174,14 @@ mdtbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --mdt --index=0 /dev ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=0 /dev/sdc ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=1 /dev/sdd To mount a client on file system foo at mount point /mnt/foo, run: - mount -t lustre mgsnode@tcp0:/foo /mnt/foo + client# mount -t lustre mgsnode@tcp0:/foo /mnt/foo To mount a client on file system bar at mount point /mnt/bar, run: - mount -t lustre mgsnode@tcp0:/bar /mnt/bar + client# mount -t lustre mgsnode@tcp0:/bar /mnt/bar
<indexterm><primary>operations</primary><secondary>remote directory</secondary></indexterm>Creating a sub-directory on a given MDT
 Lustre 2.4 enables individual sub-directories to be serviced by unique MDTs. An administrator can allocate a sub-directory to a given MDT using the command:
-      lfs mkdir –i <mdtindex> <remote_dir>
+      client# lfs mkdir -i mdt_index /mount_point/remote_dir
 This command will allocate the sub-directory remote_dir onto the MDT with index mdt_index. For more information on adding additional MDTs and mdt_index see .
 An administrator can allocate remote sub-directories to separate MDTs. Creating remote sub-directories in parent directories not hosted on MDT0 is not recommended. This is because the failure of the parent MDT will leave the namespace below it inaccessible. For this reason, by default it is only possible to create remote sub-directories off MDT0. To relax this restriction and enable remote sub-directories off any MDT, an administrator must issue the command lctl set_param mdd.*.enable_remote_dir=1.
@@ -203,17 +203,17 @@ ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=1 /dev
Setting Tunable Parameters with <literal>mkfs.lustre</literal> When the file system is first formatted, parameters can simply be added as a --param option to the mkfs.lustre command. For example: - $ mkfs.lustre --mdt --param="sys.timeout=50" /dev/sda + mds# mkfs.lustre --mdt --param="sys.timeout=50" /dev/sda For more details about creating a file system,see . For more details about mkfs.lustre, see .
Setting Parameters with <literal>tunefs.lustre</literal> If a server (OSS or MDS) is stopped, parameters can be added to an existing filesystem using the --param option to the tunefs.lustre command. For example: - $ tunefs.lustre --param="failover.node=192.168.0.13@tcp0" /dev/sda - With tunefs.lustre, parameters are "additive" -- new parameters are specified in addition to old parameters, they do not replace them. To erase all old tunefs.lustre parameters and just use newly-specified parameters, run: - $ tunefs.lustre --erase-params --param=<new parameters> - The tunefs.lustre command can be used to set any parameter settable in a /proc/fs/lustre file and that has its own OBD device, so it can be specified as <obd|fsname>.<obdtype>.<proc_file_name>=<value>. For example: - $ tunefs.lustre --param mdt.identity_upcall=NONE /dev/sda1 + oss# tunefs.lustre --param=failover.node=192.168.0.13@tcp0 /dev/sda + With tunefs.lustre, parameters are additive -- new parameters are specified in addition to old parameters, they do not replace them. To erase all old tunefs.lustre parameters and just use newly-specified parameters, run: + mds# tunefs.lustre --erase-params --param=new_parameters + The tunefs.lustre command can be used to set any parameter settable in a /proc/fs/lustre file and that has its own OBD device, so it can be specified as obdname|fsname.obdtype.proc_file_name=value. For example: + mds# tunefs.lustre --param mdt.identity_upcall=NONE /dev/sda1 For more details about tunefs.lustre, see .
@@ -226,7 +226,7 @@ ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=1 /dev
Setting Temporary Parameters Use lctl set_param to set temporary parameters on the node where it is run. These parameters map to items in /proc/{fs,sys}/{lnet,lustre}. The lctl set_param command uses this syntax: - lctl set_param [-n] <obdtype>.<obdname>.<proc_file_name>=<value> + lctl set_param [-n] obdtype.obdname.proc_file_name=value For example: # lctl set_param osc.*.max_dirty_mb=1024 osc.myth-OST0000-osc.max_dirty_mb=32 @@ -238,9 +238,9 @@ osc.myth-OST0004-osc.max_dirty_mb=32
Setting Permanent Parameters Use the lctl conf_param command to set permanent parameters. In general, the lctl conf_param command can be used to specify any parameter settable in a /proc/fs/lustre file, with its own OBD device. The lctl conf_param command uses this syntax (same as the mkfs.lustre and tunefs.lustre commands): - <obd|fsname>.<obdtype>.<proc_file_name>=<value>) + obdname|fsname.obdtype.proc_file_name=value) Here are a few examples of lctl conf_param commands: - $ mgs> lctl conf_param testfs-MDT0000.sys.timeout=40 + mgs# lctl conf_param testfs-MDT0000.sys.timeout=40 $ lctl conf_param testfs-MDT0000.mdt.identity_upcall=NONE $ lctl conf_param testfs.llite.max_read_ahead_mb=16 $ lctl conf_param testfs-MDT0000.lov.stripesize=2M @@ -254,22 +254,22 @@ $ lctl conf_param testfs.sys.timeout=40
Listing Parameters To list Lustre or LNET parameters that are available to set, use the lctl list_param command. For example: - lctl list_param [-FR] <obdtype>.<obdname> + lctl list_param [-FR] obdtype.obdname The following arguments are available for the lctl list_param command. -F Add '/', '@' or '=' for directories, symlinks and writeable files, respectively -R Recursively lists all parameters under the specified path For example: - $ lctl list_param obdfilter.lustre-OST0000 + oss# lctl list_param obdfilter.lustre-OST0000
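To walk the entire parameter tree below that device, the listing can be made recursive (output omitted here):
oss# lctl list_param -R obdfilter.lustre-OST0000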
Reporting Current Parameter Values To report current Lustre parameter values, use the lctl get_param command with this syntax: - lctl get_param [-n] <obdtype>.<obdname>.<proc_file_name> + lctl get_param [-n] obdtype.obdname.proc_file_name This example reports data on RPC service times. - $ lctl get_param -n ost.*.ost_io.timeouts + oss# lctl get_param -n ost.*.ost_io.timeouts service : cur 1 worst 30 (at 1257150393, 85d23h58m54s ago) 1 1 1 1 This example reports the amount of space this client has reserved for writeback cache with each OST: - # lctl get_param osc.*.cur_grant_bytes + client# lctl get_param osc.*.cur_grant_bytes osc.myth-OST0000-osc-ffff8800376bdc00.cur_grant_bytes=2097152 osc.myth-OST0001-osc-ffff8800376bdc00.cur_grant_bytes=33890304 osc.myth-OST0002-osc-ffff8800376bdc00.cur_grant_bytes=35418112 @@ -284,16 +284,16 @@ osc.myth-OST0004-osc-ffff8800376bdc00.cur_grant_bytes=33808384 lctl list_nids This displays the server's NIDs (networks configured to work with Lustre). This example has a combined MGS/MDT failover pair on mds0 and mds1, and a OST failover pair on oss0 and oss1. There are corresponding Elan addresses on mds0 and mds1. - mds0> mkfs.lustre --fsname=testfs --mdt --mgs --failnode=mds1,2@elan /dev/sda1 -mds0> mount -t lustre /dev/sda1 /mnt/test/mdt -oss0> mkfs.lustre --fsname=testfs --failnode=oss1 --ost --index=0 \ + mds0# mkfs.lustre --fsname=testfs --mdt --mgs --failnode=mds1,2@elan /dev/sda1 +mds0# mount -t lustre /dev/sda1 /mnt/test/mdt +oss0# mkfs.lustre --fsname=testfs --failnode=oss1 --ost --index=0 \ --mgsnode=mds0,1@elan --mgsnode=mds1,2@elan /dev/sdb -oss0> mount -t lustre /dev/sdb /mnt/test/ost0 -client> mount -t lustre mds0,1@elan:mds1,2@elan:/testfs /mnt/testfs -mds0> umount /mnt/mdt -mds1> mount -t lustre /dev/sda1 /mnt/test/mdt -mds1> cat /proc/fs/lustre/mds/testfs-MDT0000/recovery_status - Where multiple NIDs are specified, comma-separation (for example, mds1,2@elan) means that the two NIDs refer to the same host, and that Lustre needs to choose the "best" one for communication. Colon-separation (for example, mds0:mds1) means that the two NIDs refer to two different hosts, and should be treated as failover locations (Lustre tries the first one, and if that fails, it tries the second one.) +oss0# mount -t lustre /dev/sdb /mnt/test/ost0 +client# mount -t lustre mds0,1@elan:mds1,2@elan:/testfs /mnt/testfs +mds0# umount /mnt/mdt +mds1# mount -t lustre /dev/sda1 /mnt/test/mdt +mds1# cat /proc/fs/lustre/mds/testfs-MDT0000/recovery_status + Where multiple NIDs are specified, comma-separation (for example, mds1,2@elan) means that the two NIDs refer to the same host, and that Lustre needs to choose the best one for communication. Colon-separation (for example, mds0:mds1) means that the two NIDs refer to two different hosts, and should be treated as failover locations (Lustre tries the first one, and if that fails, it tries the second one.) Two options exist to specify failover nodes. --failnode and --servicenode. --failnode specifies the NIDs of failover nodes. --servicenode specifies all service NIDs, including those of the primary node and of failover nodes. Option --servicenode makes the MDT or OST treat all its service nodes equally. The first service node to load the target device becomes the primary service node. Other node NIDs will become failover locations for the target device. 
If you have an MGS or MDT configured for failover, perform these steps: @@ -329,7 +329,7 @@ mds1> cat /proc/fs/lustre/mds/testfs-MDT0000/recovery_status If you have a separate MGS (that you do not want to reformat), then add the "--writeconf" flag to mkfs.lustre on the MDT, run: - $ mkfs.lustre --reformat --writeconf --fsname spfs --mgsnode=<MGS node NID> --mdt --index=0 /dev/{mdsdev} + $ mkfs.lustre --reformat --writeconf --fsname spfs --mgsnode=mgs_nid --mdt --index=0 /dev/mds_device diff --git a/LustreProc.xml b/LustreProc.xml index c53066c..2424b27 100644 --- a/LustreProc.xml +++ b/LustreProc.xml @@ -23,7 +23,7 @@ All known file systems - # cat /proc/fs/lustre/mgs/MGS/filesystems + mgs# cat /proc/fs/lustre/mgs/MGS/filesystems spfs lustre @@ -31,15 +31,15 @@ lustre The server names participating in a file system (for each file system that has at least one server running) - # cat /proc/fs/lustre/mgs/MGS/live/spfs + mgs# cat /proc/fs/lustre/mgs/MGS/live/spfs fsname: spfs flags: 0x0 gen: 7 spfs-MDT0000 spfs-OST0000 - All servers are named according to this convention: <fsname>-<MDT|OST><XXXX>. This can be shown for live servers under /proc/fs/lustre/devices: - # cat /proc/fs/lustre/devices + All servers are named according to this convention: fsname-MDT|OSTnumber. This can be shown for live servers under /proc/fs/lustre/devices: + mds# cat /proc/fs/lustre/devices 0 UP mgs MGS MGS 11 1 UP mgc MGC192.168.10.34@tcp 1f45bb57-d9be-2ddb-c0b0-5431a49226705 2 UP mdt MDS MDS_uuid 3 @@ -52,7 +52,7 @@ spfs-OST0000 9 UP osc lustre-OST0000-osc-ce63ca00 08ac6584-6c4a-3536-2c6d-b36cf9cbdaa05 10 UP osc lustre-OST0001-osc-ce63ca00 08ac6584-6c4a-3536-2c6d-b36cf9cbdaa05 Or from the device label at any time: - # e2label /dev/sda + mds# e2label /dev/sda lustre-MDT0000
@@ -60,7 +60,7 @@ lustre-MDT0000 Lustre uses two types of timeouts. - LND timeouts that ensure point-to-point communications complete in finite time in the presence of failures. These timeouts are logged with the S_LND flag set. They may not be printed as console messages, so you should check the Lustre log for D_NETERROR messages, or enable printing of D_NETERROR messages to the console (echo + neterror > /proc/sys/lnet/printk). + LND timeouts that ensure point-to-point communications complete in finite time in the presence of failures. These timeouts are logged with the S_LND flag set. They may not be printed as console messages, so you should check the Lustre log for D_NETERROR messages, or enable printing of D_NETERROR messages to the console (lctl set_param printk=+neterror). Congested routers can be a source of spurious LND timeouts. To avoid this, increase the number of LNET router buffers to reduce back-pressure and/or increase LND timeouts on all nodes on all connected networks. You should also consider increasing the total number of LNET router nodes in the system so that the aggregate router bandwidth matches the aggregate server bandwidth. @@ -174,7 +174,7 @@ lustre-MDT0000 Adaptive timeouts are enabled, by default. To disable adaptive timeouts, at run time, set at_max to 0. On the MGS, run: - $ lctl conf_param <fsname>.sys.at_max=0 + $ lctl conf_param fsname.sys.at_max=0 Changing adaptive timeouts status at runtime may cause transient timeout, reconnect, recovery, etc. @@ -416,9 +416,9 @@ nid refs peer max
<indexterm><primary>proc</primary><secondary>free space</secondary></indexterm>Free Space Distribution Free-space stripe weighting, as set, gives a priority of "0" to free space (versus trying to place the stripes "widely" -- nicely distributed across OSSs and OSTs to maximize network balancing). To adjust this priority (as a percentage), use the qos_prio_free proc tunable: - $ cat /proc/fs/lustre/lov/<fsname>-mdtlov/qos_prio_free + $ cat /proc/fs/lustre/lov/fsname-mdtlov/qos_prio_free Currently, the default is 90%. You can permanently set this value by running this command on the MGS: - $ lctl conf_param <fsname>-MDT0000.lov.qos_prio_free=90 + $ lctl conf_param fsname-MDT0000.lov.qos_prio_free=90 Setting the priority to 100% means that OSS distribution does not count in the weighting, but the stripe assignment is still done via weighting. If OST 2 has twice as much free space as OST 1, it is twice as likely to be used, but it is NOT guaranteed to be used. Also note that free-space stripe weighting does not activate until two OSTs are imbalanced by more than 20%. Until then, a faster round-robin stripe allocator is used. (The new round-robin order also maximizes network balancing.)
@@ -454,8 +454,9 @@ nid refs peer max
<indexterm><primary>proc</primary><secondary>I/O tunables</secondary></indexterm>Lustre I/O Tunables
 This section describes I/O tunables.
-      /proc/fs/lustre/llite/<fsname>-<uid>/max_cache_mb
-      # cat /proc/fs/lustre/llite/lustre-ce63ca00/max_cached_mb 128
+      llite.fsname-instance.max_cached_mb
+      client# lctl get_param llite.lustre-ce63ca00.max_cached_mb
+128
 This tunable is the maximum amount of inactive data cached by the client (default is 3/4 of RAM).
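The same limit can be changed at run time with lctl set_param; for example, to cap the client cache at 512 MB (an illustrative value):
client# lctl set_param llite.*.max_cached_mb=512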
<indexterm><primary>proc</primary><secondary>RPC tunables</secondary></indexterm>Client I/O RPC Stream Tunables @@ -468,21 +469,21 @@ $ ls /proc/fs/lustre/osc/OSC_uml0_ost1_MNT_localhost blocksizefilesfree max_dirty_mb ost_server_uuid stats ... and so on. RPC stream tunables are described below. - /proc/fs/lustre/osc/<object name>/max_dirty_mb + osc.osc_instance.max_dirty_mb This tunable controls how many MBs of dirty data can be written and queued up in the OSC. POSIX file writes that are cached contribute to this count. When the limit is reached, additional writes stall until previously-cached writes are written to the server. This may be changed by writing a single ASCII integer to the file. Only values between 0 and 512 are allowable. If 0 is given, no writes are cached. Performance suffers noticeably unless you use large writes (1 MB or more). - /proc/fs/lustre/osc/<object name>/cur_dirty_bytes + osc.osc_instance.cur_dirty_bytes This tunable is a read-only value that returns the current amount of bytes written and cached on this OSC. - /proc/fs/lustre/osc/<object name>/max_pages_per_rpc + osc.osc_instance.max_pages_per_rpc This tunable is the maximum number of pages that will undergo I/O in a single RPC to the OST. The minimum is a single page and the maximum for this setting is platform dependent (256 for i386/x86_64, possibly less for ia64/PPC with larger PAGE_SIZE), though generally amounts to a total of 1 MB in the RPC. - /proc/fs/lustre/osc/<object name>/max_rpcs_in_flight + osc.osc_instance.max_rpcs_in_flight This tunable is the maximum number of concurrent RPCs in flight from an OSC to its OST. If the OSC tries to initiate an RPC but finds that it already has the same number of RPCs outstanding, it will wait to issue further RPCs until some complete. The minimum setting is 1 and maximum setting is 256. If you are looking to improve small file I/O performance, increase the max_rpcs_in_flight value. To maximize performance, the value for max_dirty_mb is recommended to be 4 * max_pages_per_rpc * max_rpcs_in_flight. The - <object name> + osc_instance - varies depending on the specific Lustre configuration. For <object name> examples, refer to the sample command output. + is typically fsname-OSTost_index-osc-mountpoint_instance. The mountpoint_instance is a unique value per mountpoint to allow associating osc, mdc, lov, lmv, and llite parameters for the same mountpoint. For osc_instance examples, refer to the sample command output.
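As a sketch of the guideline above (the numbers are examples, not recommendations), raising max_rpcs_in_flight to 32 with 1 MB RPCs suggests a max_dirty_mb of 4 * 1 * 32 = 128:
client# lctl set_param osc.*.max_rpcs_in_flight=32
client# lctl set_param osc.*.max_dirty_mb=128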
@@ -645,7 +646,7 @@ R 8385 500 600 100 Difference from the previous range end to the current range start. For example, Smallest-Extent indicates that the writes in the range 100 to 1110 were sequential, with a minimum write of 10 and a maximum write of 500. This range was started with an offset of -150. That means this is the difference between the last entry's range-end and this entry's range-start for the same file. The rw_offset_stats file can be cleared by writing to it: - echo > /proc/fs/lustre/llite/lustre-f57dee00/rw_offset_stats + lctl set_param llite.*.rw_offset_stats=0 @@ -656,7 +657,7 @@ R 8385 500 600 100 <indexterm><primary>proc</primary><secondary>client stats</secondary></indexterm>Client stats The stats parameter maintains statistics of activity across the VFS interface of the Lustre file system. Only non-zero parameters are displayed in the file. This section of the manual covers the statistics that will accumulate during typical operation of a client. Client statistics are enabled by default. The statistics can be cleared by echoing an empty string into the stats file or with the command: lctl set_param llite.*.stats=0. Statistics for an individual file system can be displayed, for example: - # cat /proc/fs/lustre/llite/lustre-ffff88000449a800/stats + client# lctl get_param llite.*.stats snapshot_time 1308343279.169704 secs.usecs dirty_pages_hits 14819716 samples [regs] dirty_pages_misses 81473472 samples [regs] @@ -860,7 +861,7 @@ getxattr 61169 samples [regs] Client-Based I/O Extent Size Survey The rw_extent_stats histogram in the llite directory shows you the statistics for the sizes of the read-write I/O extents. This file does not maintain the per-process statistics. Example: - $ cat /proc/fs/lustre/llite/lustre-ee5af200/extents_stats + client# lctl get_param llite.testfs-*.extents_stats snapshot_time: 1213828728.348516 (secs.usecs) read | write extents calls % cum% | calls % cum% @@ -876,11 +877,11 @@ extents calls % cum% | calls % cum% 512K - 1024K : 0 0 0 | 0 0 86 1M - 2M : 0 0 0 | 11 13 100 The file can be cleared by issuing the following command: - $ echo > cat /proc/fs/lustre/llite/lustre-ee5af200/extents_stats + client# lctl set_param llite.testfs-*.extents_stats=0 Per-Process Client I/O Statistics The extents_stats_per_process file maintains the I/O extent size statistics on a per-process basis. So you can track the per-process statistics for the last MAX_PER_PROCESS_HIST processes. Example: - $ cat /proc/fs/lustre/llite/lustre-ee5af200/extents_stats_per_process + lctl get_param llite.testfs-*.extents_stats_per_process snapshot_time: 1213828762.204440 (secs.usecs) read | write extents calls % cum% | calls % cum% @@ -922,7 +923,7 @@ PID: 11429
<indexterm><primary>proc</primary><secondary>block I/O</secondary></indexterm>Watching the OST Block I/O Stream Similarly, there is a brw_stats histogram in the obdfilter directory which shows you the statistics for number of I/O requests sent to the disk, their size and whether they are contiguous on the disk or not. - cat /proc/fs/lustre/obdfilter/lustre-OST0000/brw_stats + oss# lctl get_param obdfilter.testfs-OST0000.brw_stats snapshot_time: 1174875636.764630 (secs:usecs) read write pages per brw brws % cum % | rpcs % cum % @@ -1028,9 +1029,9 @@ disk io size rpcs % cum % | rpcs % cum %
Tuning File Readahead File readahead is triggered when two or more sequential reads by an application fail to be satisfied by the Linux buffer cache. The size of the initial readahead is 1 MB. Additional readaheads grow linearly, and increment until the readahead cache on the client is full at 40 MB. - /proc/fs/lustre/llite/<fsname>-<uid>/max_read_ahead_mb + llite.fsname-instance.max_read_ahead_mb This tunable controls the maximum amount of data readahead on a file. Files are read ahead in RPC-sized chunks (1 MB or the size of read() call, if larger) after the second sequential read on a file descriptor. Random reads are done at the size of the read() call only (no readahead). Reads to non-contiguous regions of the file reset the readahead algorithm, and readahead is not triggered again until there are sequential reads again. To disable readahead, set this tunable to 0. The default value is 40 MB. - /proc/fs/lustre/llite/<fsname>-<uid>/max_read_ahead_whole_mb + llite.fsname-instance.max_read_ahead_whole_mb This tunable controls the maximum size of a file that is read in its entirety, regardless of the size of the read().
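Both readahead limits can be adjusted with lctl set_param; for example (illustrative values), to allow up to 64 MB of readahead per file and whole-file readahead only for files of 2 MB or less:
client# lctl set_param llite.*.max_read_ahead_mb=64
client# lctl set_param llite.*.max_read_ahead_whole_mb=2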
@@ -1039,17 +1040,11 @@ disk io size rpcs % cum % | rpcs % cum % /proc/fs/lustre/llite/*/statahead_max This proc interface controls whether directory statahead is enabled and the maximum statahead windows size (which means how many files can be pre-fetched by the statahead thread). By default, statahead is enabled and the value of statahead_max is 32. To disable statahead, run: - echo 0 > /proc/fs/lustre/llite/*/statahead_max - or lctl set_param llite.*.statahead_max=0 To set the maximum statahead windows size (n), run: - echo n > /proc/fs/lustre/llite/*/statahead_max - or lctl set_param llite.*.statahead_max=n The maximum value of n is 8192. The AGL can be controlled as follows: - echo n > /proc/fs/lustre/llite/*/statahead_agl - or lctl set_param llite.*.statahead_agl=n If "n" is 0, then the AGL is disabled, else the AGL is enabled. /proc/fs/lustre/llite/*/statahead_stats @@ -1140,7 +1135,7 @@ obdfilter.{OST_name}.readcache_max_filesize=-1 Asynchronous journal commit cannot work with O_DIRECT writes, a journal flush is still forced. - When asynchronous journal commit is enabled, client nodes keep data in the page cache (a page reference). Lustre clients monitor the last committed transaction number (transno) in messages sent from the OSS to the clients. When a client sees that the last committed transno reported by the OSS is >=bulk write transno, it releases the reference on the corresponding pages. To avoid page references being held for too long on clients after a bulk write, a 7 second ping request is scheduled (jbd commit time is 5 seconds) after the bulk write reply is received, so the OSS has an opportunity to report the last committed transno. + When asynchronous journal commit is enabled, client nodes keep data in the page cache (a page reference). Lustre clients monitor the last committed transaction number (transno) in messages sent from the OSS to the clients. When a client sees that the last committed transno reported by the OSS is at least the bulk write transno, it releases the reference on the corresponding pages. To avoid page references being held for too long on clients after a bulk write, a 7 second ping request is scheduled (jbd commit time is 5 seconds) after the bulk write reply is received, so the OSS has an opportunity to report the last committed transno. If the OSS crashes before the journal commit occurs, then the intermediate data is lost. However, new OSS recovery functionality (introduced in the asynchronous journal commit feature), causes clients to replay their write requests and compensate for the missing disk updates by restoring the state of the file system. To enable asynchronous journal commit, set the sync_journal parameter to zero (sync_journal=0): $ lctl set_param obdfilter.*.sync_journal=0 @@ -1355,77 +1350,7 @@ obdfilter.lol-OST0001.sync_on_lock_cancel=never - - stats - - - - Enables/disables the collection of statistics. Collected statistics can be found - in /proc/fs/ldiskfs2/<dev>/mb_history. - - - - - max_to_scan - - - Maximum number of free chunks that mballoc finds before a final decision to avoid livelock. - - - - - min_to_scan - - - Minimum number of free chunks that mballoc finds before a final decision. This is useful for a very small request, to resist fragmentation of big free chunks. - - - - - order2_req - - - For requests equal to 2^N (where N >= order2_req), a very fast search via buddy structures is used. - - - - - stream_req - - - Requests smaller or equal to this value are packed together to form large write I/Os. 
- - - - - - The following tunables, providing more control over allocation policy, will be available in the next version: - - - - - - - - Field - - - Description - - - - - - - stats - - - Enables/disables the collection of statistics. Collected statistics can be found in /proc/fs/ldiskfs2/<dev>/mb_history. - - - - - max_to_scan + mb_max_to_scan Maximum number of free chunks that mballoc finds before a final decision to avoid livelock. @@ -1433,15 +1358,15 @@ obdfilter.lol-OST0001.sync_on_lock_cancel=never - min_to_scan + mb_min_to_scan - Minimum number of free chunks that mballoc finds before a final decision. This is useful for a very small request, to resist fragmentation of big free chunks. + Minimum number of free chunks that mballoc searches before picking the best chunk for allocation. This is useful for a very small request, to resist fragmentation of big free chunks. - order2_req + mb_order2_req For requests equal to 2^N (where N >= order2_req), a very fast search via buddy structures is used. @@ -1449,7 +1374,7 @@ obdfilter.lol-OST0001.sync_on_lock_cancel=never - small_req + mb_small_req All requests are divided into 3 categories: @@ -1461,12 +1386,12 @@ obdfilter.lol-OST0001.sync_on_lock_cancel=never - large_req + mb_large_req - prealloc_table + mb_prealloc_table The amount of space to preallocate depends on the current file size. The idea is that for small files we do not need 1 MB preallocations and for large files, 1 MB preallocations are not large enough; it is better to preallocate 4 MB. @@ -1474,10 +1399,10 @@ obdfilter.lol-OST0001.sync_on_lock_cancel=never - group_prealloc + mb_group_prealloc - The amount of space preallocated for small requests to be grouped. + The amount of space (in kilobytes) preallocated for groups of small requests. @@ -1486,19 +1411,19 @@ obdfilter.lol-OST0001.sync_on_lock_cancel=never
<indexterm><primary>proc</primary><secondary>locking</secondary></indexterm>Locking - /proc/fs/lustre/ldlm/ldlm/namespaces/<OSC name|MDC name>/lru_size + ldlm.namespaces.osc_name|mdc_name.lru_size The lru_size parameter is used to control the number of client-side locks in an LRU queue. LRU size is dynamic, based on load. This optimizes the number of locks available to nodes that have different workloads (e.g., login/build nodes vs. compute nodes vs. backup nodes). - The total number of locks available is a function of the server's RAM. The default limit is 50 locks/1 MB of RAM. If there is too much memory pressure, then the LRU size is shrunk. The number of locks on the server is limited to {number of OST/MDT on node} * {number of clients} * {client lru_size}. + The total number of locks available is a function of the server's RAM. The default limit is 50 locks/1 MB of RAM. If there is too much memory pressure, then the LRU size is shrunk. The number of locks on the server is limited to targets_on_server * client_count * client_lru_size. To enable automatic LRU sizing, set the lru_size parameter to 0. In this case, the lru_size parameter shows the current number of locks being used on the export. LRU sizing is enabled by default starting with Lustre 1.6.5.1. - To specify a maximum number of locks, set the lru_size parameter to a value > 0 (former numbers are okay, 100 * CPU_NR). We recommend that you only increase the LRU size on a few login nodes where users access the file system interactively. + To specify a maximum number of locks, set the lru_size parameter to a value other than 0 (former numbers are okay, 100 * core_count). We recommend that you only increase the LRU size on a few login nodes where users access the file system interactively. To clear the LRU on a single client, and as a result flush client cache, without changing the lru_size value: - $ lctl set_param ldlm.namespaces.<osc_name|mdc_name>.lru_size=clear + $ lctl set_param ldlm.namespaces.osc_name|mdc_name.lru_size=clear If you shrink the LRU size below the number of existing unused locks, then the unused locks are canceled immediately. Use echo clear to cancel all locks without changing the value. Currently, the lru_size parameter can only be set temporarily with lctl set_param; it cannot be set permanently. @@ -1678,13 +1603,16 @@ lnet.debug = -ha # sysctl lnet.debug lnet.debug = neterror warning You can verify and change the debug level using the /proc interface in Lustre. To use the flags with /proc, run: - # cat /proc/sys/lnet/debug + # lctl get_param debug +debug= neterror warning -# echo "+ha" > /proc/sys/lnet/debug -# cat /proc/sys/lnet/debug +# lctl set_param debug=+ha +# lctl get_param debug +debug= neterror warning ha -# echo "-warning" > /proc/sys/lnet/debug -# cat /proc/sys/lnet/debug +# lctl set_param debug=-warning +# lctl get_param debug +debug= neterror ha /proc/sys/lnet/subsystem_debug This controls the debug logs for subsystems (see S_* definitions). diff --git a/LustreRecovery.xml b/LustreRecovery.xml index 6ff87c5..f1f9c23 100644 --- a/LustreRecovery.xml +++ b/LustreRecovery.xml @@ -94,7 +94,7 @@ To force an OST recovery, unmount the OST and then mount it again. If the OST was connected to clients before it failed, then a recovery process starts after the remount, enabling clients to reconnect to the OST and replay transactions in their queue. When the OST is in recovery mode, all new client connections are refused until the recovery finishes. 
The recovery is complete when either all previously-connected clients reconnect and their transactions are replayed or a client connection attempt times out. If a connection attempt times out, then all clients waiting to reconnect (and their transactions) are lost. If you know an OST will not recover a previously-connected client (if, for example, the client has crashed), you can manually abort the recovery using this command: - lctl --device <OST device number> abort_recovery + oss# lctl --device lustre_device_number abort_recovery To determine an OST's device number and device name, run the lctl dl command. Sample lctl dl command output is shown below: 7 UP obdfilter ddn_data-OST0009 ddn_data-OST0009_UUID 1159 In this example, 7 is the OST device number. The device name is ddn_data-OST0009. In most instances, the device name can be used in place of the device number. diff --git a/LustreTroubleshooting.xml b/LustreTroubleshooting.xml index b14f0aa..38b5f27 100644 --- a/LustreTroubleshooting.xml +++ b/LustreTroubleshooting.xml @@ -194,7 +194,7 @@ Lustre logs are dumped to /proc/sys/lnet/debug_path. Collect the first group of messages related to a problem, and any messages that precede "LBUG" or "assertion failure" errors. Messages that mention server nodes (OST or MDS) are specific to that server; you must collect similar messages from the relevant server console logs. Another Lustre debug log holds information for Lustre action for a short period of time which, in turn, depends on the processes on the node to use Lustre. Use the following command to extract debug logs on each of the nodes, run - $ lctl dk <filename> + $ lctl dk filename LBUG freezes the thread to allow capture of the panic stack. A system reboot is needed to clear the thread. @@ -207,7 +207,7 @@ You can also post a question to the lustre-discuss mailing list or search the lustre-discuss Archives for information about your issue. A Lustre diagnostics tool is available for downloading at: http://downloads.whamcloud.com/public/tools/ You can run this tool to capture diagnostics output to include in the reported bug. To run this tool, enter one of these commands: - # lustre-diagnostics -t <bugzilla bug #> + # lustre-diagnostics -t bug_number # lustre-diagnostics. Output is sent directly to the terminal. Use normal file redirection to send the output to a file, and then manually attach the file to the bug you are submitting. @@ -259,12 +259,12 @@ Deactivate the OST (on the OSS at the MDS). Run: - $ lctl --device <OST device name or number> deactivate + $ lctl --device lustre_device_number deactivate The OST device number or device name is generated by the lctl dl command. The deactivate command prevents clients from creating new objects on the specified OST, although you can still access the OST for reading. If the OST later becomes available it needs to be reactivated, run: - # lctl --device <OST device name or number> activate + # lctl --device lustre_device_number activate @@ -319,7 +319,7 @@ obdid 3438673 last_id 3478673" The file system must be stopped on all servers before performing this procedure. - For hex < -> decimal translations: + For hex-to-decimal translations: Use GDB: (gdb) p /x 15028 $2 = 0x3ab4 @@ -328,7 +328,7 @@ $2 = 0x3ab4 Determine a reasonable value for the LAST_ID file. Check on the MDS: - # mount -t ldiskfs /dev/<mdsdev> /mnt/mds + # mount -t ldiskfs /dev/mdt_device /mnt/mds # od -Ax -td8 /mnt/mds/lov_objid There is one entry for each OST, in OST index order. 
This is what the MDS thinks is the last in-use object. @@ -538,7 +538,7 @@ ptlrpc_main+0x42e/0x7c0 [ptlrpc]
Setting SCSI I/O Sizes
 Some SCSI drivers default to a maximum I/O size that is too small for good Lustre performance. We have fixed quite a few drivers, but you may still find that some drivers give unsatisfactory performance with Lustre. As the default value is hard-coded, you need to recompile the drivers to change their default. On the other hand, some drivers may have an incorrect default set.
-      If you suspect bad I/O performance and an analysis of Lustre statistics indicates that I/O is not 1 MB, check /sys/block/<device>/queue/max_sectors_kb. If the max_sectors_kb value is less than 1024, set it to at least 1024 to improve performance. If changing max_sectors_kb does not change the I/O size as reported by Lustre, you may want to examine the SCSI driver code.
+      If you suspect bad I/O performance and an analysis of Lustre statistics indicates that I/O is not 1 MB, check /sys/block/device/queue/max_sectors_kb. If the max_sectors_kb value is less than 1024, set it to at least 1024 to improve performance. If changing max_sectors_kb does not change the I/O size as reported by Lustre, you may want to examine the SCSI driver code.
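For example, assuming the OST disk is /dev/sdb (a hypothetical device; the setting does not persist across reboots unless applied from a boot script):
oss# echo 1024 > /sys/block/sdb/queue/max_sectors_kb
oss# cat /sys/block/sdb/queue/max_sectors_kb
1024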
diff --git a/LustreTuning.xml b/LustreTuning.xml
index 37fc0f6..d9eb216 100644
--- a/LustreTuning.xml
+++ b/LustreTuning.xml
@@ -164,7 +164,7 @@
 Portal round-robin defines the policy LNet applies to deliver events and messages to the upper layers. The upper layers are ptlrpc service or LNet selftest.
 If portal round-robin is disabled, LNet will deliver messages to CPTs based on a hash of the source NID. Hence, all messages from a specific peer will be handled by the same CPT. This can reduce data traffic between CPUs. However, for some workloads, this behavior may result in poorly balanced loads across the CPUs.
 If portal round-robin is enabled, LNet will round-robin incoming events across all CPTs. This may balance load better across the CPUs but can incur a cross-CPU overhead.
-      The current policy can be changed by an administrator with echo <VALUE> > /proc/sys/lnet/portal_rotor. There are four options for <VALUE>:
+      The current policy can be changed by an administrator with echo value > /proc/sys/lnet/portal_rotor. There are four options for value:
 
 OFF
diff --git a/ManagingFailover.xml b/ManagingFailover.xml
index 5f9a18e..4285f19 100644
--- a/ManagingFailover.xml
+++ b/ManagingFailover.xml
@@ -28,22 +28,22 @@
 With MMP enabled, mounting a clean file system takes at least 10 seconds. If the file system was not cleanly unmounted, then the file system mount may require additional time.
 
-        The MMP feature is only supported on Linux kernel versions >= 2.6.9.
+        The MMP feature is only supported on Linux kernel versions 2.6.9 and newer.
<indexterm><primary>failover</primary><secondary>multiple-mount protection</secondary></indexterm>Working with Multiple-Mount Protection On a new Lustre file system, MMP is automatically enabled by mkfs.lustre at format time if failover is being used and the kernel and e2fsprogs version support it. On an existing file system, a Lustre administrator can manually enable MMP when the file system is unmounted. Use the following commands to determine whether MMP is running in Lustre and to enable or disable the MMP feature. To determine if MMP is enabled, run: - dumpe2fs -h <device>|grep mmp + dumpe2fs -h /dev/block_device | grep mmp Here is a sample command: dumpe2fs -h /dev/sdc | grep mmp Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent mmp sparse_super large_file uninit_bg To manually disable MMP, run: - tune2fs -O ^mmp <device> + tune2fs -O ^mmp /dev/block_device To manually enable MMP, run: - tune2fs -O mmp <device> + tune2fs -O mmp /dev/block_device When MMP is enabled, if ldiskfs detects multiple mount attempts after the file system is mounted, it blocks these later mount attempts and reports the time when the MMP block was last updated, the node name, and the device name of the node where the file system is currently mounted.
diff --git a/ManagingFileSystemIO.xml b/ManagingFileSystemIO.xml index 7360e6f..0823f46 100644 --- a/ManagingFileSystemIO.xml +++ b/ManagingFileSystemIO.xml @@ -28,7 +28,7 @@
<indexterm><primary>I/O</primary><secondary>OST space usage</secondary></indexterm>Checking OST Space Usage The example below shows an unbalanced file system: - root@LustreClient01 ~]# lfs df -h + client# lfs df -h UUID bytes Used Available \ Use% Mounted on lustre-MDT0000_UUID 4.4G 214.5M 3.9G \ @@ -49,10 +49,9 @@ lustre-OST0005_UUID 2.0G 743.3M 1.1G \ filesystem summary: 11.8G 5.4G 5.8G \ 45% /mnt/lustre In this case, OST:2 is almost full and when an attempt is made to write additional information to the file system (even with uniform striping over all the OSTs), the write command fails as follows: - [root@LustreClient01 ~]# lfs setstripe /mnt/lustre 4M 0 -1 -[root@LustreClient01 ~]# dd if=/dev/zero of=/mnt/lustre/test_3 \ bs=10M cou\ -nt=100 -dd: writing `/mnt/lustre/test_3': No space left on device + client# lfs setstripe /mnt/lustre 4M 0 -1 +client# dd if=/dev/zero of=/mnt/lustre/test_3 bs=10M count=100 +dd: writing '/mnt/lustre/test_3': No space left on device 98+0 records in 97+0 records out 1017192448 bytes (1.0 GB) copied, 23.2411 seconds, 43.8 MB/s @@ -63,13 +62,13 @@ dd: writing `/mnt/lustre/test_3': No space left on device Log into the MDS server: - [root@LustreClient01 ~]# ssh root@192.168.0.10 + client# ssh root@192.168.0.10 root@192.168.0.10's password: Last login: Wed Nov 26 13:35:12 2008 from 192.168.0.6 Use the lctl dl command to show the status of all file system components: - [root@mds ~]# lctl dl + mds# lctl dl 0 UP mgs MGS MGS 9 1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-81655dd1e813 5 2 UP mdt MDS MDS_uuid 3 @@ -84,11 +83,11 @@ Last login: Wed Nov 26 13:35:12 2008 from 192.168.0.6 Use lctl deactivate to take the full OST offline: - [root@mds ~]# lctl --device 7 deactivate + mds# lctl --device 7 deactivate Display the status of the file system components: - [root@mds ~]# lctl dl + mds# lctl dl 0 UP mgs MGS MGS 9 1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-81655dd1e813 5 2 UP mdt MDS MDS_uuid 3 @@ -114,24 +113,24 @@ Last login: Wed Nov 26 13:35:12 2008 from 192.168.0.6 Identify the file(s) to be moved. In the example below, output from the getstripe command indicates that the file test_2 is located entirely on OST2: - [client]# lfs getstripe /mnt/lustre/test_2 + client# lfs getstripe /mnt/lustre/test_2 /mnt/lustre/test_2 obdidx objid objid group 2 8 0x8 0 To move single object(s), create a new copy and remove the original. Enter: - [client]# cp -a /mnt/lustre/test_2 /mnt/lustre/test_2.tmp -[client]# mv /mnt/lustre/test_2.tmp /mnt/lustre/test_2 + client# cp -a /mnt/lustre/test_2 /mnt/lustre/test_2.tmp +client# mv /mnt/lustre/test_2.tmp /mnt/lustre/test_2 To migrate large files from one or more OSTs, enter: - [client]# lfs find --ost {OST_UUID} -size +1G | lfs_migrate -y + client# lfs find --ost ost_name -size +1G | lfs_migrate -y Check the file system balance. The df output in the example below shows a more balanced system compared to the df output in the example in . - [client]# lfs df -h + client# lfs df -h UUID bytes Used Available Use% \ Mounted on lustre-MDT0000_UUID 4.4G 214.5M 3.9G 4% \ @@ -226,47 +225,47 @@ filesystem summary: 11.8G 7.3G 3.9G 61% \ Running the writeconf command on the MDS erases all pools information (as well as any other parameters set using lctl conf_param). We recommend that the pools definitions (and conf_param settings) be executed using a script, so they can be reproduced easily after a writeconf is performed. 
To create a new pool, run: - lctl pool_new <fsname>.<poolname> + mgs# lctl pool_new fsname.poolname The pool name is an ASCII string up to 16 characters. To add the named OST to a pool, run: - lctl pool_add <fsname>.<poolname> <ost_list> + mgs# lctl pool_add fsname.poolname ost_list Where: - <ost_list> is <fsname->OST<index_range>[_UUID] + ost_list is fsname-OSTindex_range - <index_range> is <ost_index_start>-<ost_index_end>[,<index_range>] or <ost_index_start>-<ost_index_end>/<step> + index_range is ost_index_start-ost_index_end[,index_range] or ost_index_start-ost_index_end/step - If the leading <fsname> and/or ending _UUID are missing, they are automatically added. + If the leading fsname and/or ending _UUID are missing, they are automatically added. For example, to add even-numbered OSTs to pool1 on file system lustre, run a single command (add) to add many OSTs to the pool at one time: lctl pool_add lustre.pool1 OST[0-10/2] Each time an OST is added to a pool, a new llog configuration record is created. For convenience, you can run a single command. To remove a named OST from a pool, run: - lctl pool_remove <fsname>.<poolname> <ost_list> + mgs# lctl pool_remove fsname.poolname ost_list To destroy a pool, run: - lctl pool_destroy <fsname>.<poolname> + mgs# lctl pool_destroy fsname.poolname All OSTs must be removed from a pool before it can be destroyed. To list pools in the named file system, run: - lctl pool_list <fsname> | <pathname> + mgs# lctl pool_list fsname|pathname To list OSTs in a named pool, run: - lctl pool_list <fsname>.<poolname> + lctl pool_list fsname.poolname
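A short end-to-end sketch, assuming a file system named testfs and a pool named pool1 (both names are hypothetical):
mgs# lctl pool_new testfs.pool1
mgs# lctl pool_add testfs.pool1 OST[0-3]
mgs# lctl pool_list testfs.pool1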
Using the lfs Command with OST Pools Several lfs commands can be run with OST pools. Use the lfs setstripe command to associate a directory with an OST pool. This causes all new regular files and directories in the directory to be created in the pool. The lfs command can be used to list pools in a file system and OSTs in a named pool. To associate a directory with a pool, so all new files and directories will be created in the pool, run: - lfs setstripe --pool|-p pool_name <filename|dirname> + client# lfs setstripe --pool|-p pool_name filename|dirname To set striping patterns, run: - lfs setstripe [--size|-s stripe_size] [--offset|-o start_ost] + client# lfs setstripe [--size|-s stripe_size] [--offset|-o start_ost] [--count|-c stripe_count] [--pool|-p pool_name] - <dir|filename> + dir|filename If you specify striping with an invalid pool name, because the pool does not exist or the pool name was mistyped, lfs setstripe returns an error. Run lfs pool_list to make sure the pool exists and the pool name is entered correctly. @@ -297,9 +296,9 @@ filesystem summary: 11.8G 7.3G 3.9G 61% \ Add a new OST by passing on the following commands, run: - $ mkfs.lustre --fsname=spfs --mgsnode=mds16@tcp0 --ost --index=12 /dev/sda -$ mkdir -p /mnt/test/ost12 -$ mount -t lustre /dev/sda /mnt/test/ost12 + oss# mkfs.lustre --fsname=spfs --mgsnode=mds16@tcp0 --ost --index=12 /dev/sda +oss# mkdir -p /mnt/test/ost12 +oss# mount -t lustre /dev/sda /mnt/test/ost12 Migrate the data (possibly). @@ -334,7 +333,7 @@ $ mount -t lustre /dev/sda /mnt/test/ost12
Making File System Objects Immutable
 An immutable file or directory is one that cannot be modified, renamed or removed. To set this flag, run:
 - chattr +i <file>
 + chattr +i file
 To remove the flag, run chattr -i file.
 
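 A short illustration, assuming a hypothetical file /mnt/lustre/important.dat:
 client# chattr +i /mnt/lustre/important.dat   # set the immutable flag
 client# lsattr /mnt/lustre/important.dat      # the 'i' attribute is now listed
 client# chattr -i /mnt/lustre/important.dat   # clear the flag again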
@@ -361,7 +360,7 @@ from 192.168.1.1@tcp inum 8991479/2386814769 object 1127239/0 extent [10240\ To check which checksum algorithm is being used by Lustre, run: $ lctl get_param osc.*.checksum_type To change the wire checksum algorithm used by Lustre, run: - $ lctl set_param osc.*.checksum_type=<algorithm name> + $ lctl set_param osc.*.checksum_type=algorithm The in-memory checksum always uses the adler32 algorithm, if available, and only falls back to crc32 if adler32 cannot be used. diff --git a/ManagingLNET.xml b/ManagingLNET.xml index 74e5eaf..a60e2b8 100644 --- a/ManagingLNET.xml +++ b/ManagingLNET.xml @@ -54,8 +54,8 @@ $ lctl network up
This command tells you the network(s) configured to work with Lustre.
 If the networks are not correctly set up, see the modules.conf "networks=" line and make sure the network layer modules are correctly installed and configured.
 To get the best remote NID, run:
 - $ lctl which_nid <NID list>
 - where <NID list> is the list of available NIDs.
 + $ lctl which_nid NIDs
 + where NIDs is the list of available NIDs.
 This command takes the "best" NID from a list of the NIDs of a remote host. The "best" NID is the one that the local node uses when trying to communicate with the remote node.
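 For example, given two candidate NIDs for a remote host (the addresses are illustrative, and which NID is reported depends on the local network configuration):
 $ lctl which_nid 192.168.0.22@tcp 10.10.0.22@o2ib
 10.10.0.22@o2ib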
Starting Clients @@ -73,7 +73,7 @@ $ lctl network up Attempting to remove Lustre modules prior to stopping the network may result in a crash or an LNET hang. if this occurs, the node must be rebooted (in most cases). Make sure that the Lustre network and Lustre are stopped prior to unloading the modules. Be extremely careful using rmmod -f. To unconfigure the LNET network, run: - modprobe -r <any lnd and the lnet modules> + modprobe -r lnd_and_lnet_modules To remove all Lustre modules, run: @@ -138,32 +138,27 @@ To remove all Lustre modules, run: Run the modprobe lnet command and create a combined MGS/MDT file system. - The following commands create the MGS/MDT file system and mount the servers (MGS/MDT and OSS). + The following commands create an MGS/MDT or OST file system and mount the targets on the servers. modprobe lnet -$ mkfs.lustre --fsname lustre --mgs --mdt <block device name> -$ mkdir -p <mount point> -$ mount -t lustre <block device> <mount point> -$ mount -t lustre <block device> <mount point> -$ mkfs.lustre --fsname lustre --mgs --mdt <block device name> -$ mkdir -p <mount point> -$ mount -t lustre <block device> <mount point> -$ mount -t lustre <block device> <mount point> +# mkfs.lustre --fsname lustre --mgs --mdt /dev/mdt_device +# mkdir -p /mount_point +# mount -t lustre /dev/mdt_device /mount_point For example: modprobe lnet -$ mkfs.lustre --fsname lustre --mdt --mgs /dev/sda -$ mkdir -p /mnt/test/mdt -$ mount -t lustre /dev/sda /mnt/test/mdt -$ mount -t lustre mgs@o2ib0:/lustre /mnt/mdt -$ mkfs.lustre --fsname lustre --mgsnode=mds@o2ib0 --ost --index=0 /dev/sda -$ mkdir -p /mnt/test/mdt -$ mount -t lustre /dev/sda /mnt/test/ost -$ mount -t lustre mgs@o2ib0:/lustre /mnt/ost0 +mds# mkfs.lustre --fsname lustre --mdt --mgs /dev/sda +mds# mkdir -p /mnt/test/mdt +mds# mount -t lustre /dev/sda /mnt/test/mdt +mds# mount -t lustre mgs@o2ib0:/lustre /mnt/mdt +oss# mkfs.lustre --fsname lustre --mgsnode=mds@o2ib0 --ost --index=0 /dev/sda +oss# mkdir -p /mnt/test/mdt +oss# mount -t lustre /dev/sda /mnt/test/ost +oss# mount -t lustre mgs@o2ib0:/lustre /mnt/ost0 Mount the clients. - mount -t lustre <MGS node>:/<fsname> <mount point> + client# mount -t lustre mgs_node:/fsname /mount_point This example shows an IB client being mounted. - mount -t lustre + client# mount -t lustre 192.168.10.101@o2ib0,192.168.10.102@o2ib1:/mds/client /mnt/lustre @@ -174,7 +169,7 @@ Clients 192.168.[2-127].* 192.168.[128-253].* You could create these configurations: - A cluster with more clients than servers. The fact that an individual client cannot get two rails of bandwidth is unimportant because the servers are the actual bottleneck. + A cluster with more clients than servers. The fact that an individual client cannot get two rails of bandwidth is unimportant because the servers are typically the actual bottleneck. ip2nets="o2ib0(ib0), o2ib1(ib1) 192.168.[0-1].* \ diff --git a/ManagingSecurity.xml b/ManagingSecurity.xml index 7b327cb..cfeb39a 100644 --- a/ManagingSecurity.xml +++ b/ManagingSecurity.xml @@ -79,7 +79,7 @@ other::---
<indexterm><primary>root squash</primary></indexterm>Using Root Squash
 - Lustre 1.6 introduced root squash functionality, a security feature which controls super user access rights to an Lustre file system. Before the root squash feature was added, Lustre users could run rm -rf * as root, and remove data which should not be deleted. Using the root squash feature prevents this outcome.
 + Root squash is a security feature which restricts super-user access rights to a Lustre file system. Without the root squash feature enabled, Lustre users on untrusted clients could access or modify files owned by root on the filesystem, including deleting them. Using the root squash feature restricts file access/modifications as the root user to only the specified clients. Note, however, that this does not prevent users on insecure clients from accessing files owned by other users.
 The root squash feature works by re-mapping the user ID (UID) and group ID (GID) of the root user to a UID and GID specified by the system administrator, via the Lustre configuration management server (MGS). The root squash feature also enables the Lustre administrator to specify a set of clients for which UID/GID re-mapping does not apply.
<indexterm><primary>root squash</primary><secondary>configuring</secondary></indexterm>Configuring Root Squash @@ -99,7 +99,7 @@ other::--- <indexterm><primary>root squash</primary><secondary>enabling</secondary></indexterm>Enabling and Tuning Root Squash The default value for nosquash_nids is NULL, which means that root squashing applies to all clients. Setting the root squash UID and GID to 0 turns root squash off. Root squash parameters can be set when the MDT is created (mkfs.lustre --mdt). For example: - mkfs.lustre --reformat --fsname=Lustre --mdt --mgs \ + mds# mkfs.lustre --reformat --fsname=testfs --mdt --mgs \ --param "mdt.root_squash=500:501" \ --param "mdt.nosquash_nids='0@elan1 192.168.1.[10,11]'" /dev/sda1 Root squash parameters can also be changed on an unmounted device with tunefs.lustre. For example: @@ -107,8 +107,8 @@ other::--- --param "mdt.nosquash_nids=192.168.0.13@tcp0" /dev/sda1 Root squash parameters can also be changed with the lctl conf_param command. For example: - lctl conf_param Lustre.mdt.root_squash="1000:101" -lctl conf_param Lustre.mdt.nosquash_nids="*@tcp" + mgs# lctl conf_param testfs.mdt.root_squash="1000:101" +mgs# lctl conf_param testfs.mdt.nosquash_nids="*@tcp" When using the lctl conf_param command, keep in mind: @@ -124,17 +124,17 @@ lctl conf_param Lustre.mdt.nosquash_nids="*@tcp" The nosquash_nids list can be cleared with: - lctl conf_param Lustre.mdt.nosquash_nids="NONE" + mgs# lctl conf_param testfs.mdt.nosquash_nids="NONE" - OR - - lctl conf_param Lustre.mdt.nosquash_nids="clear" + mgs# lctl conf_param testfs.mdt.nosquash_nids="clear" If the nosquash_nids value consists of several NID ranges (e.g. 0@elan, 1@elan1), the list of NID ranges must be quoted with single (') or double ('') quotation marks. List elements must be separated with a space. For example: - mkfs.lustre ... --param "mdt.nosquash_nids='0@elan1 1@elan2'" /dev/sda1 -lctl conf_param Lustre.mdt.nosquash_nids="24@elan 15@elan1" + mds# mkfs.lustre ... --param "mdt.nosquash_nids='0@elan1 1@elan2'" /dev/sda1 +lctl conf_param testfs.mdt.nosquash_nids="24@elan 15@elan1" These are examples of incorrect syntax: - mkfs.lustre ... --param "mdt.nosquash_nids=0@elan1 1@elan2" /dev/sda1 -lctl conf_param Lustre.mdt.nosquash_nids=24@elan 15@elan1 + mds# mkfs.lustre ... --param "mdt.nosquash_nids=0@elan1 1@elan2" /dev/sda1 +lctl conf_param testfs.mdt.nosquash_nids=24@elan 15@elan1 To check root squash parameters, use the lctl get_param command: - lctl get_param mdt.Lustre-MDT0000.root_squash + mds# lctl get_param mdt.testfs-MDT0000.root_squash lctl get_param mdt.*.nosquash_nids An empty nosquash_nids list is reported as NONE. @@ -151,7 +151,7 @@ lctl get_param mdt.*.nosquash_nids mkfs.lustre and tunefs.lustre do not perform parameter syntax checking. If the root squash parameters are incorrect, they are ignored on mount and the default values are used instead. - Root squash parameters are parsed with rigorous syntax checking. The root_squash parameter should be specified as <decnum>':'<decnum>. The nosquash_nids parameter should follow LNET NID range list syntax. + Root squash parameters are parsed with rigorous syntax checking. The root_squash parameter should be specified as <decnum>:<decnum>. The nosquash_nids parameter should follow LNET NID range list syntax. 
LNET NID range syntax: diff --git a/ManagingStripingFreeSpace.xml b/ManagingStripingFreeSpace.xml index 0fbe78d..7983091 100644 --- a/ManagingStripingFreeSpace.xml +++ b/ManagingStripingFreeSpace.xml @@ -90,7 +90,7 @@ Setting the File Layout/Striping Configuration (lfs setstripe) Use the lfs setstripe command to create new files with a specific file layout (stripe pattern) configuration. lfs setstripe [--size|-s stripe_size] [--count|-c stripe_count] -[--index|-i start_ost] [--pool|-p pool_name] <filename|dirname> +[--index|-i start_ost] [--pool|-p pool_name] filename|dirname stripe_size @@ -108,7 +108,7 @@ The start OST is the first OST to which files are written. The default value for start_ost is -1, which allows the MDS to choose the starting index. This setting is strongly recommended, as it allows space and load balancing to be done by the MDS as needed. Otherwise, the file starts on the specified OST index. The numbering of the OSTs starts at 0. If you pass a start_ost value of 0 and a stripe_count value of 1, all files are written to OST 0, until space is exhausted. This is probably not what you meant to do. If you only want to adjust the stripe count and keep the other parameters at their default settings, do not specify any of the other parameters: - lfs setstripe -c <stripe_count> <file> + client# lfs setstripe -c stripe_count filename pool_name @@ -194,8 +194,8 @@ bob
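 For instance, a hedged example that changes only the stripe count for a new directory (the directory name is hypothetical):
 client# lfs setstripe -c 4 /mnt/lustre/dir_4stripes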
Inspecting the File Tree To inspect an entire tree of files, use the lfs find command: - lfs find [--recursive | -r] <file or directory> ... - You can also use ls -l /proc/<pid>/fd/ to find open files using Lustre. For example: + lfs find [--recursive | -r] file|directory ... + You can also use ls -l /proc/pid/fd/ to find open files using Lustre. For example: $ lfs getstripe $(readlink /proc/$(pidof cat)/fd/1) Typical output is: /mnt/lustre/foo @@ -363,7 +363,7 @@ filesystem summary: 2211572 41924 \
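 For example, to recursively walk a directory tree and count its entries (the path is illustrative):
 client# lfs find -r /mnt/lustre/project | wc -l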
<indexterm><primary>space</primary><secondary>location weighting</secondary></indexterm>Adjusting the Weighting Between Free Space and Location
 The weighting priority can be adjusted in the proc file /proc/fs/lustre/lov/lustre-mdtlov/qos_prio_free. The default value is 90%. Use this command on the MGS to permanently change this weighting:
 - lctl conf_param <fsname>-MDT0000.lov.qos_prio_free=90
 + lctl conf_param fsname-MDT0000.lov.qos_prio_free=90
 Increasing this value puts more weighting on free space. When the free space priority is set to 100%, then location is no longer used in stripe-ordering calculations and weighting is based entirely on free space.
 Setting the priority to 100% means that OSS distribution does not count in the weighting, but the stripe assignment is still done via a weighting. For example, if OST2 has twice as much free space as OST1, then OST2 is twice as likely to be used, but it is not guaranteed to be used.
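 For example, to base allocation entirely on free space for a hypothetical file system named testfs:
 mgs# lctl conf_param testfs-MDT0000.lov.qos_prio_free=100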
<indexterm><primary>inodes</primary><secondary>MDS</secondary></indexterm><indexterm><primary>setup</primary><secondary>inodes</secondary></indexterm>Setting the Number of Inodes for the MDS The number of inodes on the MDT is determined at format time based on the total size of the file system to be created. The default MDT inode ratio is one inode for every 4096 bytes of file system space. To override the inode ratio, use the following option: - -i <bytes per inode> + -i bytes_per_inode For example, use the following option to create one inode per 2048 bytes of file system space. --mkfsoptions="-i 2048" To avoid mke2fs creating an unusable file system, do not specify the -i option with an inode ratio below one inode per 1024 bytes. Instead, specify an absolute number of inodes, using this option: - -N <number of inodes> + -N number_of_inodes For example, by default, a 2 TB MDT will have 512M inodes. The largest currently-supported file system size is 16 TB, which would hold 4B inodes, the maximum possible number of inodes in a ldiskfs file system. With an MDS inode ratio of 1024 bytes per inode, a 2 TB MDT would hold 2B inodes, and a 4 TB MDT would hold 4B inodes.
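 Putting this together, a hedged example of formatting a combined MGS/MDT with a 2048-byte inode ratio; the device and file system names are assumptions:
 mds# mkfs.lustre --fsname=testfs --mgs --mdt --mkfsoptions="-i 2048" /dev/sda1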
<indexterm><primary>inodes</primary><secondary>MDT</secondary></indexterm>Setting the Inode Size for the MDT Lustre uses "large" inodes on backing file systems to efficiently store Lustre metadata with each file. On the MDT, each inode is at least 512 bytes in size (by default), while on the OST each inode is 256 bytes in size. The backing ldiskfs file system also needs sufficient space for other metadata like the journal (up to 400 MB), bitmaps and directories and a few files that Lustre uses to maintain cluster consistency. - To specify a larger inode size, use the -I <inodesize> option. We recommend you do NOT specify a smaller-than-default inode size, as this can lead to serious performance problems; and you cannot change this parameter after formatting the file system. The inode ratio must always be larger than the inode size. + To specify a larger inode size, use the -I inode_size option. We recommend you do NOT specify a smaller-than-default inode size, as this can lead to serious performance problems; and you cannot change this parameter after formatting the file system. The inode ratio must always be larger than the inode size.
<indexterm><primary>inodes</primary><secondary>OST</secondary></indexterm>Setting the Number of Inodes for an OST
 When formatting OST file systems, it is normally advantageous to take local file system usage into account. Try to minimize the number of inodes on each OST, while keeping enough margin for potential variance in future usage. This helps reduce the format and file system check time, and makes more space available for data.
 The current default is to create one inode per 16 KB of space in the OST file system, but in many environments, this is far too many inodes for the average file size. As a good rule of thumb, the OSTs should have at least:
 - num_ost_inodes = 4 * <num_mds_inodes> * <default_stripe_count> / <number_osts>
 + num_ost_inodes = 4 * number_of_mds_inodes * default_stripe_count / number_of_osts
 
 Inode Ratio to be considered
 
@@ -173,7 +173,7 @@
 
 - < 10GB
 + under 10GB
 
 1 inode/16KB
 
@@ -206,7 +206,7 @@
 
 - > 8TB
 + over 8TB
 
 1 inode/1MB
 
@@ -220,9 +220,9 @@
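 As a worked example of the rule of thumb above, assume (purely for illustration) 2 billion MDT inodes, a default stripe count of 1, and 64 OSTs:
 num_ost_inodes = 4 * 2,000,000,000 * 1 / 64 = 125,000,000 inodes per OST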
  You can specify the number of inodes on the OST file systems using the following option to the --mkfs option: - -N <num_inodes> + -N num_inodes Alternately, if you know the average file size, then you can specify the OST inode count for the OST file systems using: - -i <average_file_size / (number_of_stripes * 4)> + -i average_file_size / (number_of_stripes * 4) For example, if the average file size is 16 MB and there are, by default 4 stripes per file, then --mkfsoptions='-i 1048576' would be appropriate. In addition to the number of inodes, file system check time on OSTs is affected by a number of other variables: size of the file system, number of allocated blocks, distribution of allocated blocks on the disk, disk speed, CPU speed, and amount of RAM on the server. Reasonable file system check times (without serious file system problems), are expected to take five and thirty minutes per TB. diff --git a/SystemConfigurationUtilities.xml b/SystemConfigurationUtilities.xml index 1aef062..2caeaa2 100644 --- a/SystemConfigurationUtilities.xml +++ b/SystemConfigurationUtilities.xml @@ -94,7 +94,7 @@ - -b inode buffer blocks + -b inode buffer blocks Sets the readahead inode blocks to get excellent performance when scanning the block device. @@ -103,7 +103,7 @@ - -o output file + -o output file If an output file is specified, modified pathnames are written to this file. Otherwise, modified parameters are written to stdout. @@ -111,7 +111,7 @@ - -t inode | pathname + -t inode| pathname Sets the e2scan type if type is inode. The e2scan utility prints modified inode numbers to stdout. By default, the type is set as pathname. @@ -120,7 +120,7 @@ - -u + -u Rebuilds the parent database from scratch. Otherwise, the current parent database is used. @@ -163,7 +163,7 @@ l_getidentity - mdtname + mdtname Metadata server target name @@ -171,7 +171,7 @@ l_getidentity - uid + uid User identifier @@ -193,8 +193,7 @@ lctl The lctl utility is used for root control and configuration. With lctl you can directly control Lustre via an ioctl interface, allowing various configuration, maintenance and debugging features to be accessed.
Synopsis - lctl -lctl --device <devno> <command [args]> + lctl [--device devno] command [args]
Description
 
@@ -202,37 +201,37 @@ lctl --device <devno> <command [args]>
 dl
 dk
 device
-network <up/down>
+network up|down
 list_nids
-ping nidhelp
+ping nid
+help
 quit
- For a complete list of available commands, type help at the lctl prompt. To get basic help on command meaning and syntax, type helpcommand. Command completion is activated with the TAB key, and command history is available via the up- and down-arrow keys.
+ For a complete list of available commands, type help at the lctl prompt. To get basic help on command meaning and syntax, type help command. Command completion is activated with the TAB key (depending on compile options), and command history is available via the up- and down-arrow keys.
 For non-interactive use, use the second invocation, which runs the command after connecting to the device.
Setting Parameters with lctl Lustre parameters are not always accessible using the procfs interface, as it is platform-specific. As a solution, lctl {get,set}_param has been introduced as a platform-independent interface to the Lustre tunables. Avoid direct references to /proc/{fs,sys}/{lustre,lnet}. For future portability, use lctl {get,set}_param . - When the file system is running, use the lctl set_param command to set temporary parameters (mapping to items in /proc/{fs,sys}/{lnet,lustre}). The lctl set_param command uses this syntax: - lctl set_param [-n] <obdtype>.<obdname>.<proc_file_name>=<value> + When the file system is running, use the lctl set_param command on the affected node(s) to temporarily set parameters (mapping to items in /proc/{fs,sys}/{lnet,lustre}). The lctl set_param command uses this syntax: + lctl set_param [-n] obdtype.obdname.property=value For example: - $ lctl set_param ldlm.namespaces.*osc*.lru_size=$((NR_CPU*100)) - Many permanent parameters can be set with lctl conf_param. In general, lctl conf_param can be used to specify any parameter settable in a /proc/fs/lustre file, with its own OBD device. The lctl conf_param command uses this syntax: - <obd|fsname>.<obdtype>.<proc_file_name>=<value>) + mds# lctl set_param mdt.testfs-MDT0000.identity_upcall=NONE + Many permanent parameters can be set with lctl conf_param. In general, lctl conf_param can be used to specify any OBD device parameter settable in a /proc/fs/lustre file. The lctl conf_param command must be run on the MGS node, and uses this syntax: + obd|fsname.obdtype.property=value) For example: - $ lctl conf_param testfs-MDT0000.mdt.identity_upcall=NONE + mgs# lctl conf_param testfs-MDT0000.mdt.identity_upcall=NONE $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - The lctl conf_param command permanently sets parameters in the file system configuration. + The lctl conf_param command permanently sets parameters in the file system configuration for all nodes of the specified type. - To get current Lustre parameter settings, use the lctl get_param command with this syntax: - lctl get_param [-n] <obdtype>.<obdname>.<proc_file_name> + To get current Lustre parameter settings, use the lctl get_param command on the desired node with the same parameter name as lctl set_param: + lctl get_param [-n] obdtype.obdname.parameter For example: - $ lctl get_param -n ost.*.ost_io.timeouts - To list Lustre parameters that are available to set, use the lctl list_param command, with this syntax: - lctl list_param [-n] <obdtype>.<obdname> - For example: - $ lctl list_param obdfilter.lustre-OST0000 - For more information on using lctl to set temporary and permanent parameters, see (Setting Parameters with lctl). + mds# lctl get_param mdt.testfs-MDT0000.identity_upcall + To list Lustre parameters that are available to set, use the lctl list_param command, with this syntax: + lctl list_param [-R] [-F] obdtype.obdname.* + For example, to list all of the parameters on the MDT: + oss# lctl list_param -RF mdt + For more information on using lctl to set temporary and permanent parameters, see . Network Configuration @@ -251,15 +250,15 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - network <up/down>|<tcp/elan/myrinet> + network up|down|tcp|elan|myrinet - Starts or stops LNET, or selects a network type for other lctl LNET commands. + Starts or stops LNET, or selects a network type for other lctl LNET commands. - list_nids + list_nids Prints all NIDs on the local node. LNET must be running. 
@@ -267,7 +266,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - which_nid <nidlist> + which_nid nidlist From a list of NIDs for a remote node, identifies the NID on which interface communication will occur. @@ -275,7 +274,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - ping <nid> + ping nid Checks LNET connectivity via an LNET ping. This uses the fabric appropriate to the specified NID. @@ -283,39 +282,39 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - interface_list + interface_list - Prints the network interface information for a given network type. + Prints the network interface information for a given network type. - peer_list + peer_list - Prints the known peers for a given network type. + Prints the known peers for a given network type. - conn_list + conn_list - Prints all the connected remote NIDs for a given network type. + Prints all the connected remote NIDs for a given network type. - active_tx + active_tx - This command prints active transmits. It is only used for the Elan network type. + This command prints active transmits. It is only used for the Elan network type. - route_list + route_list Prints the complete routing table. @@ -346,7 +345,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - device <devname> + device devname   @@ -357,13 +356,13 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - device_list + device_list   - Shows the local Lustre OBDs, a/k/a dl. + Shows the local Lustre OBDs, a/k/a dl. @@ -388,7 +387,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - list_param[-F|-R] <param_path ...> + list_param [-F|-R] parameter [parameter ...] Lists the Lustre or LNET parameter name. @@ -400,7 +399,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16   - -F + -F Adds '/', '@' or '=' for directories, symlinks and writeable files, respectively. @@ -411,15 +410,15 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16   - -R + -R - Recursively lists all parameters under the specified path. If param_path is unspecified, all parameters are shown. + Recursively lists all parameters under the specified path. If param_path is unspecified, all parameters are shown. - get_param[-n|-N|-F] <param_path ...> + get_param [-n|-N|-F] parameter [parameter ...] Gets the value of a Lustre or LNET parameter from the specified path. @@ -430,7 +429,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16   - -n + -n Prints only the parameter value and not the parameter name. @@ -441,7 +440,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16   - -N + -N Prints only matched parameter names and not the values; especially useful when using patterns. @@ -452,15 +451,15 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16   - -F + -F - When -N is specified, adds '/', '@' or '=' for directories, symlinks and writeable files, respectively. + When -N is specified, adds '/', '@' or '=' for directories, symlinks and writeable files, respectively. - set_param[-n]<param_path=value...> + set_param [-n] parameter=value Sets the value of a Lustre or LNET parameter from the specified path. @@ -471,7 +470,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16   - -n + -n Disables printing of the key name when printing values. @@ -479,12 +478,12 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - conf_param[-d] <device|fsname>.<parameter>=<value> + conf_param [-d] device|fsname parameter=value Sets a permanent configuration parameter for any device via the MGS. This command must be run on the MGS node. 
- All writeable parameters under lctl list_param (e.g. lctl list_param -F osc.*.* | grep =) can be permanently set using lctl conf_param, but the format is slightly different. For conf_param, the device is specified first, then the obdtype. Wildcards are not supported. Additionally, failover nodes may be added (or removed), and some system-wide parameters may be set as well (sys.at_max, sys.at_min, sys.at_extra, sys.at_early_margin, sys.at_history, sys.timeout, sys.ldlm_timeout). For system-wide parameters, <device> is ignored. - For more information on setting permanent parameters and lctl conf_param command examples, see (Setting Permanent Parameters). + All writeable parameters under lctl list_param (e.g. lctl list_param -F osc.*.* | grep =) can be permanently set using lctl conf_param, but the format is slightly different. For conf_param, the device is specified first, then the obdtype. Wildcards are not supported. Additionally, failover nodes may be added (or removed), and some system-wide parameters may be set as well (sys.at_max, sys.at_min, sys.at_extra, sys.at_early_margin, sys.at_history, sys.timeout, sys.ldlm_timeout). For system-wide parameters, device is ignored. + For more information on setting permanent parameters and lctl conf_param command examples, see (Setting Permanent Parameters). @@ -492,24 +491,24 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16   - -d <device|fsname>.<parameter> + -d device|fsname.parameter   - Deletes a parameter setting (use the default value at the next restart). A null value for <value> also deletes the parameter setting. + Deletes a parameter setting (use the default value at the next restart). A null value for value also deletes the parameter setting. - activate + activate - Re-activates an import after the deactivate operation. This setting is only effective until the next restart (see conf_param). + Re-activates an import after the deactivate operation. This setting is only effective until the next restart (see conf_param). - deactivate + deactivate Deactivates an import, in particular meaning do not assign new file stripes to an OSC. Running lctl deactivate on the MDS stops new objects from being allocated on the OST. Running lctl deactivate on Lustre clients causes them to return -EIO when accessing objects on the OST instead of waiting for recovery. @@ -517,7 +516,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - abort_recovery + abort_recovery Aborts the recovery process on a re-starting MDT or OST. @@ -548,15 +547,15 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - blockdev_attach<file name> <device node> + blockdev_attach filename /dev/lloop_device - Attaches a regular Lustre file to a block device. If the device node does not exist, lctl creates it. We recommend that you create the device node by lctl since the emulator uses a dynamical major number. + Attaches a regular Lustre file to a block device. If the device node does not exist, lctl creates it. We recommend that you create the device node by lctl since the emulator uses a dynamical major number. - blockdev_detach<device node> + blockdev_detach /dev/lloop_device Detaches the virtual block device. @@ -564,7 +563,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - blockdev_info<device node> + blockdev_info /dev/lloop_device Provides information about the Lustre file attached to the device node. 
@@ -591,15 +590,15 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - changelog_register + changelog_register - Registers a new changelog user for a particular device. Changelog entries are not purged beyond a registered user's set point (see lfs changelog_clear). + Registers a new changelog user for a particular device. Changelog entries are not purged beyond a registered user's set point (see lfs changelog_clear). - changelog_deregister<id> + changelog_deregister id Unregisters an existing changelog user. If the user's "clear" record number is the minimum for the device, changelog records are purged until the next minimum. @@ -626,7 +625,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - debug_daemon + debug_daemon Starts and stops the debug daemon, and controls the output filename and size. @@ -634,7 +633,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - debug_kernel[file] [raw] + debug_kernel [file] [raw] Dumps the kernel debug buffer to stdout or a file. @@ -642,7 +641,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - debug_file<input> [output] + debug_file input_file [output_file] Converts the kernel-dumped debug log from binary to plain text format. @@ -650,7 +649,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - clear + clear Clears the kernel debug buffer. @@ -658,7 +657,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - mark<text> + mark text Inserts marker text in the kernel debug buffer. @@ -666,7 +665,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - filter<subsystem id/debug mask> + filter subsystem_id|debug_mask Filters kernel debug messages by subsystem or mask. @@ -674,7 +673,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - show<subsystem id/debug mask> + show subsystem_id|debug_mask Shows specific types of messages. @@ -682,7 +681,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - debug_list<subs/types> + debug_list subsystems|types Lists all subsystem and debug types. @@ -690,7 +689,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - modules<path> + modules path Provides GDB-friendly module information. @@ -720,7 +719,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - --device + --device Device to be used for the operation (specified by name or number). See device_list. @@ -728,7 +727,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16 - --ignore_errors | ignore_errors + --ignore_errors | ignore_errors Ignores errors during script processing. @@ -740,7 +739,7 @@ $ lctl conf_param testfs.llite.max_read_ahead_mb=16
Examples - lctl + lctl $ lctl lctl > dl 0 UP mgc MGC192.168.0.20@tcp btbb24e3-7deb-2ffa-eab0-44dffe00f692 5 @@ -825,14 +824,14 @@ ll_recover_lost_found_objs Option -  Description + Description - -h + -h Prints a help message @@ -840,7 +839,7 @@ ll_recover_lost_found_objs - -v + -v Increases verbosity @@ -848,7 +847,7 @@ ll_recover_lost_found_objs - -d directory + -d directory Sets the lost and found directory path @@ -873,7 +872,7 @@ llobdstat
Description - The llobdstat utility displays a line of OST statistics for the given ost_name every interval seconds. It should be run directly on an OSS node. Type CTRL-C to stop statistics printing. + The llobdstat utility displays a line of OST statistics for the given ost_name every interval seconds. It should be run directly on an OSS node. Type CTRL-C to stop statistics printing.
Example @@ -895,7 +894,7 @@ Timestamp Read-delta ReadRate Write-delta WriteRate
Files - /proc/fs/lustre/obdfilter/<ostname>/stats + /proc/fs/lustre/obdfilter/ostname/stats
@@ -930,12 +929,12 @@ llstat The llstat utility displays Lustre statistics.
Synopsis - llstat [-c] [-g] [-i interval] stats_file - + llstat [-c] [-g] [-i interval] stats_file +
Description - The llstat utility displays statistics from any of the Lustre statistics files that share a common format and are updated at interval seconds. To stop statistics printing, use ctrl-c. + The llstat utility displays statistics from any of the Lustre statistics files that share a common format and are updated at interval seconds. To stop statistics printing, use ctrl-c.
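 For example, to sample the OST statistics every two seconds using the ost shorthand described in the options below (the interval is arbitrary):
 oss# llstat -i 2 ost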
Options @@ -949,14 +948,14 @@ llstat Option -  Description + Description - -c + -c Clears the statistics file. @@ -964,7 +963,7 @@ llstat - -i + -i Specifies the polling period (in seconds). @@ -972,7 +971,7 @@ llstat - -g + -g Specifies graphable output format. @@ -980,7 +979,7 @@ llstat - -h + -h Displays help information. @@ -988,10 +987,10 @@ llstat - stats_file + stats_file - Specifies either the full path to a statistics file or the shorthand reference, mds or ost + Specifies either the full path to a statistics file or the shorthand reference, mds or ost @@ -1026,7 +1025,7 @@ llverdev The llverdev verifies a block device is functioning properly over its full size.
Synopsis - llverdev [-c chunksize] [-f] [-h] [-o offset] [-l] [-p] [-r] [-t timestamp] [-v] [-w] device + llverdev [-c chunksize] [-f] [-h] [-o offset] [-l] [-p] [-r] [-t timestamp] [-v] [-w] device
Description @@ -1058,7 +1057,7 @@ llverdev - -c|--chunksize + -c|--chunksize I/O chunk size in bytes (default value is 1048576). @@ -1066,7 +1065,7 @@ llverdev - -f|--force + >-f|--force> Forces the test to run without a confirmation that the device will be overwritten and all data will be permanently destroyed. @@ -1074,7 +1073,7 @@ llverdev - -h|--help + -h|--help Displays a brief help message. @@ -1082,7 +1081,7 @@ llverdev - -ooffset + -o >offset Offset (in kilobytes) of the start of the test (default value is 0). @@ -1090,7 +1089,7 @@ llverdev - -l|--long + -l|--long Runs a full check, writing and then reading and verifying every block on the disk. @@ -1098,7 +1097,7 @@ llverdev - -p|--partial + -p|--partial Runs a partial check, only doing periodic checks across the device (1 GB steps). @@ -1106,15 +1105,15 @@ llverdev - -r|--read + -r|--read - Runs the test in read (verify) mode only, after having previously run the test in -w mode. + Runs the test in read (verify) mode only, after having previously run the test in -w mode. - -ttimestamp + -t timestamp Sets the test start time as printed at the start of a previously-interrupted test to ensure that validation data is the same across the entire filesystem (default value is the current time()). @@ -1122,7 +1121,7 @@ llverdev - -v|--verbose + -v|--verbose Runs the test in verbose mode, listing each read and write operation. @@ -1130,7 +1129,7 @@ llverdev - -w|--write + -w|--write Runs the test in write (test-pattern) mode (default runs both read and write). @@ -1187,7 +1186,7 @@ lshowmount - -e|--enumerate + -e|--enumerate Causes lshowmount to list each client mounted on a separate line instead of trying to compress the list of clients into a hostrange string. @@ -1195,7 +1194,7 @@ lshowmount - -h|--help + -h|--help Causes lshowmount to print out a usage message. @@ -1203,7 +1202,7 @@ lshowmount - -l|--lookup + -l|--lookup Causes lshowmount to try to look up the hostname for NIDs that look like IP addresses. @@ -1211,7 +1210,7 @@ lshowmount - -v|--verbose + -v|--verbose Causes lshowmount to output export information for each service instead of only displaying the aggregate information for all Lustre services on the server. @@ -1223,8 +1222,9 @@ lshowmount
Files - /proc/fs/lustre/mgs/<server>/exports/<uuid>/nid /proc/fs/lustre/mds/<server>/expo\ -rts/<uuid>/nid /proc/fs/lustre/obdfilter/<server>/exports/<uuid>/nid + /proc/fs/lustre/mgs/server/exports/uuid/nid +/proc/fs/lustre/mds/server/exports/uuid/nid +/proc/fs/lustre/obdfilter/server/exports/uuid/nid
@@ -1291,15 +1291,15 @@ lustre_rsync The lustre_rsync utility synchronizes (replicates) a Lustre file system to a target file system.
Synopsis - lustre_rsync --source|-s <src> --target|-t <tgt> - --mdt|-m <mdt> [--user|-u <user id>] - [--xattr|-x <yes|no>] [--verbose|-v] - [--statuslog|-l <log>] [--dry-run] [--abort-on-err] + lustre_rsync --source|-s src --target|-t tgt + --mdt|-m mdt [--user|-u userid] + [--xattr|-x yes|no] [--verbose|-v] + [--statuslog|-l log] [--dry-run] [--abort-on-err] -lustre_rsync --statuslog|-l <log> +lustre_rsync --statuslog|-l log -lustre_rsync --statuslog|-l <log> --source|-s <source> - --target|-t <tgt> --mdt|-m <mdt> +lustre_rsync --statuslog|-l log --source|-s source + --target|-t tgt --mdt|-m mdt
Description @@ -1336,7 +1336,7 @@ lustre_rsync --statuslog|-l <log> --source|-s <source> - --source=<src> + --source=src The path to the root of the Lustre file system (source) which will be synchronized. This is a mandatory option if a valid status log created during a previous synchronization operation (--statuslog) is not specified. @@ -1344,7 +1344,7 @@ lustre_rsync --statuslog|-l <log> --source|-s <source> - --target=<tgt> + --target=tgt The path to the root where the source file system will be synchronized (target). This is a mandatory option if the status log created during a previous synchronization operation (--statuslog) is not specified. This option can be repeated if multiple synchronization targets are desired. @@ -1352,7 +1352,7 @@ lustre_rsync --statuslog|-l <log> --source|-s <source> - --mdt=<mdt> + --mdt=mdt The metadata device to be synchronized. A changelog user must be registered for this device. This is a mandatory option if a valid status log created during a previous synchronization operation (--statuslog) is not specified. @@ -1360,7 +1360,7 @@ lustre_rsync --statuslog|-l <log> --source|-s <source> - --user=<user id> + --user=userid The changelog user ID for the specified MDT. To use lustre_rsync, the changelog user must be registered. For details, see the changelog_register parameter in the lctl man page. This is a mandatory option if a valid status log created during a previous synchronization operation (--statuslog) is not specified. @@ -1368,7 +1368,7 @@ lustre_rsync --statuslog|-l <log> --source|-s <source> - --statuslog=<log> + --statuslog=log A log file to which synchronization status is saved. When lustre_rsync starts, the state of a previous replication is read from here. If the status log from a previous synchronization operation is specified, otherwise mandatory options like --source, --target and --mdt options may be skipped. By specifying options like --source, --target and/or --mdt in addition to the --statuslog option, parameters in the status log can be overridden. Command line options take precedence over options in the status log. @@ -1376,7 +1376,7 @@ lustre_rsync --statuslog|-l <log> --source|-s <source> - --xattr<yes|no> + --xattryes|no Specifies whether extended attributes (xattrs) are synchronized or not. The default is to synchronize extended attributes. @@ -1385,7 +1385,7 @@ lustre_rsync --statuslog|-l <log> --source|-s <source> - --verbose + --verbose Produces a verbose output. @@ -1393,7 +1393,7 @@ lustre_rsync --statuslog|-l <log> --source|-s <source> - --dry-run + --dry-run Shows the output of lustre_rsync commands (copy, mkdir, etc.) on the target file system without actually executing them. @@ -1401,7 +1401,7 @@ lustre_rsync --statuslog|-l <log> --source|-s <source> - --abort-on-err + --abort-on-err Shows the output of lustre_rsync commands (copy, mkdir, etc.) on the target file system without actually executing them. @@ -1462,8 +1462,8 @@ mkfs.lustre The mkfs.lustre utility formats a disk for a Lustre service.
Synopsis - mkfs.lustre <target_type> [options] device - where <target_type> is one of the following: + mkfs.lustre target_type [options] device + where target_type is one of the following: @@ -1481,7 +1481,7 @@ mkfs.lustre - --ost + --ost Object Storage Target (OST) @@ -1489,7 +1489,7 @@ mkfs.lustre - --mdt + --mdt Metadata Storage Target (MDT) @@ -1497,7 +1497,7 @@ mkfs.lustre - --network=net,... + --network=net,... Network(s) to which to restrict this OST/MDT. This option can be repeated as necessary. @@ -1505,10 +1505,10 @@ mkfs.lustre - --mgs + --mgs - Configuration Management Service (MGS), one per site. This service can be combined with one --mdt service by specifying both types. + Configuration Management Service (MGS), one per site. This service can be combined with one --mdt service by specifying both types. @@ -1537,7 +1537,7 @@ mkfs.lustre - --backfstype=fstype + --backfstype=fstype Forces a particular format for the backing file system (such as ext3, ldiskfs). @@ -1545,7 +1545,7 @@ mkfs.lustre - --comment=comment + --comment=comment Sets a user comment about this disk, ignored by Lustre. @@ -1553,7 +1553,7 @@ mkfs.lustre - --device-size=KB + --device-size=#>KB Sets the device size for loop devices. @@ -1561,7 +1561,7 @@ mkfs.lustre - --dryrun + --dryrun Only prints what would be done; it does not affect the disk. @@ -1569,25 +1569,25 @@ mkfs.lustre - --failnode=nid,... + --failnode=nid,... Sets the NID(s) of a failover partner. This option can be repeated as needed. - CAUTION: Cannot be used with --servicenode. + This cannot be used with --servicenode. - --servicenode=nid,... + --servicenode=nid,... Sets the NID(s) of all service node, including failover partner as well as primary node service nids. This option can be repeated as needed. - CAUTION: Cannot be used with --failnode. + This cannot be used with --failnode. - --fsname=filesystem_name + --fsname=filesystem_name The Lustre file system of which this service/node will be a part. The default file system name is 'lustreâ€. @@ -1599,7 +1599,7 @@ mkfs.lustre - --index=index + --index=index Specifies the OST or MDT number. This should always be used when formatting OSTs, in order to ensure that there is a simple mapping between the OST index and the OSS node and device it is located on. @@ -1607,7 +1607,7 @@ mkfs.lustre - --mkfsoptions=opts + --mkfsoptions=opts Formats options for the backing file system. For example, ext3 options could be set here. @@ -1615,20 +1615,20 @@ mkfs.lustre - --mountfsoptions=opts + --mountfsoptions=opts Sets the mount options used when the backing file system is mounted. - CAUTION: Unlike earlier versions of mkfs.lustre, this version completely replaces the default mount options with those specified on the command line, and issues a warning on stderr if any default mount options are omitted. + Unlike earlier versions of mkfs.lustre, this version completely replaces the default mount options with those specified on the command line, and issues a warning on stderr if any default mount options are omitted. The defaults for ldiskfs are: - OST: errors=remount-ro; - MGS/MDT: errors=remount-ro,iopen_nopriv,user_xattr + OST: errors=remount-ro; + MGS/MDT: errors=remount-ro,iopen_nopriv,user_xattr Do not alter the default mount options unless you know what you are doing. - --network=net,... + --network=net,...   @@ -1637,7 +1637,7 @@ mkfs.lustre - --mgsnode=nid,... + --mgsnode=nid,... Sets the NIDs of the MGS node, required for all targets other than the MGS. 
@@ -1645,10 +1645,10 @@ mkfs.lustre - --paramkey=value + --param key=value - Sets the permanent parameter key to value value. This option can be repeated as necessary. Typical options might include: + Sets the permanent parameter key to value value. This option can be repeated as necessary. Typical options might include: @@ -1656,7 +1656,7 @@ mkfs.lustre   - --param sys.timeout=40 + --param sys.timeout=40> System obd timeout. @@ -1667,7 +1667,7 @@ mkfs.lustre   - --param lov.stripesize=2M + --param lov.stripesize=2M Default stripe size. @@ -1678,7 +1678,7 @@ mkfs.lustre   - --param lov.stripecount=2 + param lov.stripecount=2 Default stripe count. @@ -1689,7 +1689,7 @@ mkfs.lustre   - --param failover.mode=failout + --param failover.mode=failout Returns errors instead of waiting for recovery. @@ -1697,7 +1697,7 @@ mkfs.lustre - --quiet + --quiet Prints less information. @@ -1705,7 +1705,7 @@ mkfs.lustre - --reformat + --reformat Reformats an existing Lustre disk. @@ -1713,7 +1713,7 @@ mkfs.lustre - --stripe-count-hint=stripes + --stripe-count-hint=stripes Used to optimize the MDT's inode size. @@ -1721,7 +1721,7 @@ mkfs.lustre - --verbose + --verbose Prints more information. @@ -1733,13 +1733,13 @@ mkfs.lustre
Examples - Creates a combined MGS and MDT for file system testfs on, e.g., node cfs21: + Creates a combined MGS and MDT for file system testfs on, e.g., node cfs21: mkfs.lustre --fsname=testfs --mdt --mgs /dev/sda1 - Creates an OST for file system testfs on any node (using the above MGS): + Creates an OST for file system testfs on any node (using the above MGS): mkfs.lustre --fsname=testfs --mgsnode=cfs21@tcp0 --ost --index=0 /dev/sdb - Creates a standalone MGS on, e.g., node cfs22: + Creates a standalone MGS on, e.g., node cfs22: mkfs.lustre --mgs /dev/sda1 - Creates an MDT for file system myfs1 on any node (using the above MGS): + Creates an MDT for file system myfs1 on any node (using the above MGS): mkfs.lustre --fsname=myfs1 --mdt --mgsnode=cfs22@tcp0 /dev/sda2
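 A further hedged example creates a second OST for the same file system with a failover partner declared; the node names and device are assumptions:
 mkfs.lustre --fsname=testfs --mgsnode=cfs21@tcp0 --failnode=cfs23@tcp0 --ost --index=1 /dev/sdc1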
@@ -1787,19 +1787,19 @@ mount.lustre - <mgsspec>:/<fsname> + mgs_nid:/fsname   - Mounts the Lustre file system named fsname on the client by contacting the Management Service at mgsspec on the pathname given by directory. The format for mgsspec is defined below. A mounted client file system appears in fstab(5) and is usable, like any local file system, and provides a full POSIX-compliant interface. + Mounts the Lustre file system named fsname on the client by contacting the Management Service at mgsspec on the pathname given by directory. The format for mgsspec is defined below. A mounted client file system appears in fstab(5) and is usable, like any local file system, and provides a full POSIX-compliant interface. - <disk_device> + block_device - Starts the target service defined by the mkfs.lustre command on the physical disk disk_device. A mounted target service file system is only useful for df(1) operations and appears in fstab(5) to show the device is in use. + Starts the target service defined by the mkfs.lustre command on the physical disk block_device. A mounted target service file system is only useful for df(1) operations and appears in fstab(5) to show the device is in use. @@ -1825,7 +1825,7 @@ mount.lustre - <mgsspec>:=<mgsnode>[:<mgsnode>] + mgsspec:=mgsnode[:mgsnode]   @@ -1834,7 +1834,7 @@ mount.lustre - <mgsnode>:=<mgsnid>[,<mgsnid>] + mgsnode:=mgsnid[,mgsnid] Each node may be specified by a comma-separated list of NIDs. @@ -1861,7 +1861,7 @@ mount.lustre - flock + flock Enables full flock support, coherent across all client nodes. @@ -1869,7 +1869,7 @@ mount.lustre - localflock + localflock Enables local flock support, using only client-local flock (faster, for applications that require flock, but do not run on multiple nodes). @@ -1877,15 +1877,15 @@ mount.lustre - noflock + noflock - Disables flock support entirely. Applications calling flock get an error. It is up to the administrator to choose either localflock (fastest, low impact, not coherent between nodes) or flock (slower, performance impact for use, coherent between nodes). + Disables flock support entirely. Applications calling flock get an error. It is up to the administrator to choose either localflock (fastest, low impact, not coherent between nodes) or flock (slower, performance impact for use, coherent between nodes). - user_xattr + user_xattr Enables get/set of extended attributes by regular users. See the attr(5) manual page. @@ -1893,7 +1893,7 @@ mount.lustre - nouser_xattr + nouser_xattr Disables use of extended attributes by regular users. Root and system processes can still use extended attributes. @@ -1901,7 +1901,7 @@ mount.lustre - acl + acl Enables POSIX Access Control List support. See the acl(5) manual page. @@ -1909,7 +1909,7 @@ mount.lustre - noacl + noacl Disables Access Control List support. @@ -1936,7 +1936,7 @@ mount.lustre - nosvc + nosvc Starts the MGC (and MGS, if co-located) for a target service, not the actual service. @@ -1944,7 +1944,7 @@ mount.lustre - nomsgs + nomsgs Starts only the MDT (with a co-located MGS), without starting the MGS. @@ -1952,7 +1952,7 @@ mount.lustre - exclude=<ostlist> + exclude=ostlist Starts a client or MDT with a colon-separated list of known inactive OSTs. @@ -1960,7 +1960,7 @@ mount.lustre - nosvc + nosvc Only starts the MGC (and MGS, if co-located) for a target service, not the actual service. @@ -1968,7 +1968,7 @@ mount.lustre - nomsgs + nomsgs Starts a MDT with a co-located MGS, without starting the MGS. 
@@ -1976,7 +1976,7 @@ mount.lustre - exclude=ostlist + exclude=ostlist Starts a client or MDT with a (colon-separated) list of known inactive OSTs. @@ -1984,7 +1984,7 @@ mount.lustre - abort_recov + abort_recov Aborts client recovery and starts the target service immediately. @@ -1992,7 +1992,7 @@ mount.lustre - md_stripe_cache_size + md_stripe_cache_size Sets the stripe cache size for server-side disk with a striped RAID configuration. @@ -2000,18 +2000,18 @@ mount.lustre - recovery_time_soft=timeout + recovery_time_soft=timeout - Allows timeout seconds for clients to reconnect for recovery after a server crash. This timeout is incrementally extended if it is about to expire and the server is still handling new connections from recoverable clients. The default soft recovery timeout is 300 seconds (5 minutes). + Allows timeout seconds for clients to reconnect for recovery after a server crash. This timeout is incrementally extended if it is about to expire and the server is still handling new connections from recoverable clients. The default soft recovery timeout is 300 seconds (5 minutes). - recovery_time_hard=timeout + recovery_time_hard=timeout - The server is allowed to incrementally extend its timeout up to a hard maximum of timeout seconds. The default hard recovery timeout is set to 900 seconds (15 minutes). + The server is allowed to incrementally extend its timeout up to a hard maximum of timeout seconds. The default hard recovery timeout is set to 900 seconds (15 minutes). @@ -2079,7 +2079,7 @@ plot-llstat - results_filename + results_filename Output generated by plot-llstat @@ -2087,7 +2087,7 @@ plot-llstat - parameter_index + parameter_index   @@ -2113,11 +2113,11 @@ routerstat The routerstat utility prints Lustre router statistics.
Synopsis - routerstat [interval] + routerstat [interval]
Description - The routerstat utility watches LNET router statistics. If no interval is specified, then statistics are sampled and printed only one time. Otherwise, statistics are sampled and printed at the specified interval (in seconds). + The routerstat utility watches LNET router statistics. If no interval is specified, then statistics are sampled and printed only one time. Otherwise, statistics are sampled and printed at the specified interval (in seconds).
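 For example, to sample router statistics every five seconds on a router node (the interval is arbitrary):
 router# routerstat 5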
Options @@ -2139,7 +2139,7 @@ routerstat - M + M msgs_alloc(msgs_max) @@ -2147,7 +2147,7 @@ routerstat - E + E errors @@ -2155,7 +2155,7 @@ routerstat - S + S send_count/send_length @@ -2163,7 +2163,7 @@ routerstat - R + R recv_count/recv_length @@ -2171,7 +2171,7 @@ routerstat - F + F route_count/route_length @@ -2179,7 +2179,7 @@ routerstat - D + D drop_count/drop_length @@ -2201,7 +2201,7 @@ tunefs.lustre The tunefs.lustre utility modifies configuration information on a Lustre target disk.
Synopsis - tunefs.lustre [options] <device> + tunefs.lustre [options] /dev/device
Description @@ -2210,8 +2210,8 @@ tunefs.lustre Changes made here affect a file system only when the target is mounted the next time. With tunefs.lustre, parameters are "additive" -- new parameters are specified in addition to old parameters, they do not replace them. To erase all old tunefs.lustre parameters and just use newly-specified parameters, run: - $ tunefs.lustre --erase-params --param=<new parameters> - The tunefs.lustre command can be used to set any parameter settable in a /proc/fs/lustre file and that has its own OBD device, so it can be specified as <obd|fsname>.<obdtype>.<proc_file_name>=<value>. For example: + $ tunefs.lustre --erase-params --param=new_parameters + The tunefs.lustre command can be used to set any parameter settable in a /proc/fs/lustre file and that has its own OBD device, so it can be specified as {obd|fsname}.obdtype.proc_file_name=value. For example: $ tunefs.lustre --param mdt.identity_upcall=NONE /dev/sda1
@@ -2234,7 +2234,7 @@ tunefs.lustre - --comment=comment + --comment=comment Sets a user comment about this disk, ignored by Lustre. @@ -2242,7 +2242,7 @@ tunefs.lustre - --dryrun + --dryrun Only prints what would be done; does not affect the disk. @@ -2250,7 +2250,7 @@ tunefs.lustre - --erase-params + --erase-params Removes all previous parameter information. @@ -2258,25 +2258,25 @@ tunefs.lustre - --failnode=nid,... + --failnode=nid,... Sets the NID(s) of a failover partner. This option can be repeated as needed. - CAUTION: Cannot be used with --servicenode. + Cannot be used with --servicenode. - --servicenode=nid,... + --servicenode=nid,... Sets the NID(s) of all service node, including failover partner as well as local service nids. This option can be repeated as needed. - CAUTION: Cannot be used with --failnode. + : Cannot be used with --failnode. - --fsname=filesystem_name + --fsname=filesystem_name The Lustre file system of which this service will be a part. The default file system name is 'lustreâ€. @@ -2284,7 +2284,7 @@ tunefs.lustre - --index=index + --index=index Forces a particular OST or MDT index. @@ -2292,20 +2292,20 @@ tunefs.lustre - --mountfsoptions=opts + --mountfsoptions=opts Sets the mount options used when the backing file system is mounted. - CAUTION: Unlike earlier versions of tunefs.lustre, this version completely replaces the existing mount options with those specified on the command line, and issues a warning on stderr if any default mount options are omitted. + Unlike earlier versions of tunefs.lustre, this version completely replaces the existing mount options with those specified on the command line, and issues a warning on stderr if any default mount options are omitted. The defaults for ldiskfs are: - OST: errors=remount-ro,mballoc,extents; - MGS/MDT: errors=remount-ro,iopen_nopriv,user_xattr + OST: errors=remount-ro,mballoc,extents; + MGS/MDT: errors=remount-ro,iopen_nopriv,user_xattr Do not alter the default mount options unless you know what you are doing. - --network=net,... + --network=net,... Network(s) to which to restrict this OST/MDT. This option can be repeated as necessary. @@ -2313,7 +2313,7 @@ tunefs.lustre - --mgs + --mgs Adds a configuration management service to this target. @@ -2321,7 +2321,7 @@ tunefs.lustre - --msgnode=nid,... + --msgnode=nid,... Sets the NID(s) of the MGS node; required for all targets other than the MGS. @@ -2329,7 +2329,7 @@ tunefs.lustre - --nomgs + --nomgs Removes a configuration management service to this target. @@ -2337,7 +2337,7 @@ tunefs.lustre - --quiet + --quiet Prints less information. @@ -2345,7 +2345,7 @@ tunefs.lustre - --verbose + --verbose Prints more information. @@ -2353,7 +2353,7 @@ tunefs.lustre - --writeconf + --writeconf Erases all configuration logs for the file system to which this MDT belongs, and regenerates them. This is dangerous operation. All clients must be unmounted and servers for this file system should be stopped. All targets (OSTs/MDTs) must then be restarted to regenerate the logs. No clients should be started until all targets have restarted. @@ -2361,7 +2361,7 @@ tunefs.lustre The correct order of operations is: * Unmount all clients on the file system * Unmount the MDT and all OSTs on the file system - * Run tunefs.lustre --writeconf <device> on every server + * Run tunefs.lustre --writeconf device on every server * Mount the MDT and OSTs * Mount the clients @@ -2373,7 +2373,7 @@ tunefs.lustre
Examples Change the MGS's NID address. (This should be done on each target disk, since they should all contact the same MGS.) - tunefs.lustre --erase-param --mgsnode=<new_nid> --writeconf /dev/sda + tunefs.lustre --erase-param --mgsnode=new_nid --writeconf /dev/sda Add a failover NID location for this target. tunefs.lustre --param="failover.node=192.168.0.13@tcp0" /dev/sda
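 As a sketch of the --writeconf regeneration step described above, run on each server after all clients and targets have been unmounted (the device name is an assumption):
 tunefs.lustre --writeconf /dev/sda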
@@ -2403,26 +2403,26 @@ Additional System Configuration Utilities
    <indexterm><primary>utilities</primary><secondary>application profiling</secondary></indexterm>
    Application Profiling Utilities
    The following utilities are located in /usr/bin.
-    lustre_req_history.sh
+    lustre_req_history.sh
    The lustre_req_history.sh utility (run from a client) assembles as much Lustre RPC request history as possible from the local node and from the servers that were contacted, providing a better picture of the coordinated network activity.
-    llstat.sh
+    llstat.sh
    The llstat.sh utility handles a wider range of statistics files and has command-line switches to produce more graphable output.
-    plot-llstat.sh
+    plot-llstat.sh
    The plot-llstat.sh utility plots the output from llstat.sh using gnuplot.
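    As a rough sketch of how these scripts fit together (the -i interval switch and the OSS service statistics path shown here are assumptions about a typical installation, not taken from this manual):
    oss# llstat.sh -i 10 /proc/fs/lustre/ost/OSS/ost/stats
    Output collected this way can then be passed to plot-llstat.sh to produce gnuplot input.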
More /proc Statistics for Application Profiling
    The following utilities provide additional statistics.
-    vfs_ops_stats
+    vfs_ops_stats
    The client vfs_ops_stats utility tracks Linux VFS operation calls into Lustre for a single PID, PPID, GID or everything.
    /proc/fs/lustre/llite/*/vfs_ops_stats
    /proc/fs/lustre/llite/*/vfs_track_[pid|ppid|gid]
-    extents_stats
+    extents_stats
    The client extents_stats utility shows the size distribution of I/O calls from the client (cumulative and by process).
    /proc/fs/lustre/llite/*/extents_stats, extents_stats_per_process
-    offset_stats
+    offset_stats
    The client offset_stats utility shows the read/write seek activity of a client by offsets and ranges.
    /proc/fs/lustre/llite/*/offset_stats
    (A simple read example for these files appears at the end of this section.)
@@ -2523,7 +2523,7 @@ loadgen>
    The loadgen utility prints periodic status messages; message output can be controlled with the verbose command.
    To ensure that a file can be written to (a requirement of write cache), OSTs reserve ("grants") chunks of space for each newly-created file. A grant may cause an OST to report that it is out of space, even though there is plenty of space on the disk, because the space is "reserved" by other files. The loadgen utility estimates the number of simultaneous open files as the disk size divided by the grant size and reports that number when the write tests are first started.
-    Echo Server
+    Echo Server
    The loadgen utility can start an echo server. On another node, loadgen can specify the echo server as the device, thus creating a network-only test environment.
    loadgen> echosrv
    loadgen> dl
@@ -2557,7 +2557,7 @@ wait
quit
EOF
-    Feature Requests
+    Feature Requests
    The loadgen utility is intended to grow into a more comprehensive test tool; feature requests are encouraged.
    The current feature requests include:
@@ -2609,8 +2609,7 @@ llog_reader /tmp/tfs-client
    Although they are stored in the CONFIGS directory, mountdata files do not use the config log format and will confuse llog_reader.
-    See Also
-    
+    See Also
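    Returning to the client statistics files listed at the start of this section, they can be read directly; a minimal example, where the wildcard expands to the mounted llite instance(s):
    client# cat /proc/fs/lustre/llite/*/extents_stats
    client# cat /proc/fs/lustre/llite/*/offset_stats
    Writing to these files normally resets the counters, but the exact reset behavior should be confirmed for the Lustre release in use.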
<indexterm><primary>lr_reader</primary></indexterm>
@@ -2654,7 +2653,7 @@ sgpdd_survey
-    local mode
+    local mode
    In this mode, locks are coherent on one node (a single-node flock), but not across all clients. To enable it, use -o localflock. This is a client-mount option.
@@ -2665,12 +2664,12 @@ sgpdd_survey
-    consistent mode
+    consistent mode
    In this mode, locks are coherent across all clients. To enable it, use -o flock. This is a client-mount option.
-    CAUTION: This mode affects the performance of the file being flocked and may affect stability, depending on the Lustre version used. Consider using a newer Lustre version which is more stable. If the consistent mode is enabled and no applications are using flock, then it has no effect.
+    This mode affects the performance of the file being flocked and may affect stability, depending on the Lustre version used. Consider using a newer, more stable Lustre version. If the consistent mode is enabled and no applications are using flock, then it has no effect.
diff --git a/TroubleShootingRecovery.xml b/TroubleShootingRecovery.xml
index d86f6e8..38fc00e 100644
--- a/TroubleShootingRecovery.xml
+++ b/TroubleShootingRecovery.xml
@@ -232,14 +232,14 @@ lfsck: fixed 0 errors
 Manually Starting LFSCK
Synopsis - lctl lfsck_start <-M | --device MDT_device> \ - [-e | --error error_handle] \ + lctl lfsck_start -M | --device MDT_device \ + [-e | --error error_handle] \ [-h | --help] \ - [-m | --method iteration_method] \ - [-n | --dryrun switch] \ + [-m | --method iteration_method] \ + [-n | --dryrun switch] \ [-r | --reset] \ - [-s | --speed speed_limit] \ - [-t | --type lfsck_type[,lfsck_type...]] + [-s | --speed speed_limit] \ + [-t | --type lfsck_type[,lfsck_type...]]
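      For example, a scan might be started on a single MDT as shown below; the device name lustre-MDT0000 is illustrative only, and -r is the --reset option from the synopsis above:
      mds# lctl lfsck_start -M lustre-MDT0000 -r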
@@ -337,7 +337,7 @@ lfsck: fixed 0 errors Manually Stopping <literal>lfsck</literal>
Synopsis - lctl lfsck_stop <-M | --device MDT_device> \ + lctl lfsck_stop -M | --device MDT_device \ [-h | --help]
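      A matching example for stopping the scan on the same hypothetical MDT device:
      mds# lctl lfsck_stop -M lustre-MDT0000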
@@ -391,7 +391,7 @@ lfsck: fixed 0 errors LFSCK status via <literal>procfs</literal>
Synopsis
-           lctl get_param -n osd-ldisk.${FSNAME}-${MDT_device}.oi_scrub
+           lctl get_param -n osd-ldiskfs.FSNAME-MDT_device.oi_scrub
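      For example, with a file system named lustre and its first MDT (both names illustrative only), the current OI scrub status could be read on the MDS with:
      mds# lctl get_param -n osd-ldiskfs.lustre-MDT0000.oi_scrub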
diff --git a/UpgradingLustre.xml b/UpgradingLustre.xml index f9f40ba..2a03433 100644 --- a/UpgradingLustre.xml +++ b/UpgradingLustre.xml @@ -58,18 +58,18 @@ Install the kernel, modules and ldiskfs packages. For example: $ rpm -ivh -kernel-lustre-smp-<ver> \ -kernel-ib-<ver> \ -lustre-modules-<ver> \ -lustre-ldiskfs-<ver> +kernel-lustre-smp-ver \ +kernel-ib-ver \ +lustre-modules-ver \ +lustre-ldiskfs-ver Upgrade the utilities/userspace packages. For example: - $ rpm -Uvh lustre-<ver> + $ rpm -Uvh lustre-ver If a new e2fsprogs package is available, upgrade it. For example: - $ rpm -Uvh e2fsprogs-<ver> + $ rpm -Uvh e2fsprogs-ver Use e2fsprogs-1.41.90-wc3 or later, available at: http://downloads.whamcloud.com/public/e2fsprogs/latest/ @@ -106,17 +106,17 @@ lustre-ldiskfs-<ver> Mount the OSTs (be sure to mount all OSTs). On each OSS node, run: - mount -a -t lustre + oss# mount -a -t lustre This command assumes that all OSTs are listed in the /etc/fstab file. If the OSTs are not in the /etc/fstab file, they need to be mounted individually by running the mount command: - mount -t lustre <block device name> <mount point> + oss# mount -t lustre /dev/block_device /mount_point Mount the MDT. On the MDS node, run: - mount -a -t lustre + mds# mount -a -t lustre Mount the file system on the clients. On each client node, run: - mount -a -t lustre + client# mount -a -t lustre diff --git a/UserUtilities.xml b/UserUtilities.xml index cd2ba01..ea242ea 100644 --- a/UserUtilities.xml +++ b/UserUtilities.xml @@ -30,59 +30,59 @@
Synopsis
lfs
-lfs changelog [--follow] <mdtname> [startrec [endrec]]
-lfs changelog_clear <mdtname> <id> <endrec>
-lfs check <mds|osts|servers>
-lfs df [-i] [-h] [--pool]-p <fsname>[.<pool>] [path]
+lfs changelog [--follow] mdt_name [startrec [endrec]]
+lfs changelog_clear mdt_name id endrec
+lfs check mds|osts|servers
+lfs df [-i] [-h] [--pool|-p fsname[.pool]] [path]
lfs find [[!] --atime|-A [-+]N] [[!] --mtime|-M [-+]N]
-        [[!] --ctime|-C [-+]N] [--maxdepth|-D N] [--name|-n <pattern>]
-        [--print|-p] [--print0|-P] [[!] --obd|-O <uuid[s]>]
+        [[!] --ctime|-C [-+]N] [--maxdepth|-D N] [--name|-n pattern]
+        [--print|-p] [--print0|-P] [[!] --obd|-O ost_name[,ost_name...]]
         [[!] --size|-S [+-]N[kMGTPE]] [--type|-t {bcdflpsD}]
-        [[!] --gid|-g|--group|-G <gname>|<gid>]
-        [[!] --uid|-u|--user|-U <uname>|<uid>]
-        <dirname|filename>
+        [[!] --gid|-g|--group|-G gname|gid]
+        [[!] --uid|-u|--user|-U uname|uid]
+        dirname|filename
lfs getname [-h]|[path...]
-lfs getstripe [--obd|-O <uuid>] [--quiet|-q] [--verbose|-v]
+lfs getstripe [--obd|-O ost_name] [--quiet|-q] [--verbose|-v]
         [--count|-c] [--index|-i | --offset|-o]
         [--size|-s] [--pool|-p] [--directory|-d]
-        [--recursive|-r] [--raw|-R] [-M] <dirname|filename> ...
-lfs setstripe [--size|-s stripe_size] [--count|-c stripe_cnt]
-        [--index|-i|--offset|-o start_ost_index]
-        [--pool|-p <pool>]
-        <dirname|filename>
-lfs setstripe -d <dir>
+        [--recursive|-r] [--raw|-R] [-M] dirname|filename ...
+lfs setstripe [--size|-s stripe_size] [--count|-c stripe_count]
+        [--index|-i|--offset|-o start_ost_index]
+        [--pool|-p pool]
+        dirname|filename
+lfs setstripe -d dir
lfs osts [path]
-lfs poollist <filesystem>[.<pool>]|<pathname>
-lfs quota [-q] [-v] [-o obd_uuid|-I ost_idx|-i mdt_idx]
-        [-u <uname>|-u <uid>|-g <gname>|-g <gid>]
-        <filesystem>
-lfs quota -t <-u|-g> <filesystem>
-lfs quotacheck [-ug] <filesystem>
-lfs quotachown [-i] <filesystem>
-lfs quotainv [-ug] [-f] <filesystem>
-lfs quotaon [-ugf] <filesystem>
-lfs quotaoff [-ug] <filesystem>
-lfs setquota <-u|--user|-g|--group> <uname|uid|gname|gid>
-        [--block-softlimit <block-softlimit>]
-        [--block-hardlimit <block-hardlimit>]
-        [--inode-softlimit <inode-softlimit>]
-        [--inode-hardlimit <inode-hardlimit>]
-        <filesystem>
-lfs setquota <-u|--user|-g|--group> <uname|uid|gname|gid>
-        [-b <block-softlimit>] [-B <block-hardlimit>]
-        [-i <inode-softlimit>] [-I <inode-hardlimit>]
-        <filesystem>
-lfs setquota -t <-u|-g>
-        [--block-grace <block-grace>]
-        [--inode-grace <inode-grace>]
-        <filesystem>
-lfs setquota -t <-u|-g>
-        [-b <block-grace>] [-i <inode-grace>]
-        <filesystem>
+lfs poollist filesystem[.pool]|pathname
+lfs quota [-q] [-v] [-o obd_uuid|-I ost_idx|-i mdt_idx]
+        [-u username|uid|-g group|gid]
+        /mount_point
+lfs quota -t -u|-g /mount_point
+lfs quotacheck [-ug] /mount_point
+lfs quotachown [-i] /mount_point
+lfs quotainv [-ug] [-f] /mount_point
+lfs quotaon [-ugf] /mount_point
+lfs quotaoff [-ug] /mount_point
+lfs setquota {-u|--user|-g|--group} uname|uid|gname|gid
+        [--block-softlimit block_softlimit]
+        [--block-hardlimit block_hardlimit]
+        [--inode-softlimit inode_softlimit]
+        [--inode-hardlimit inode_hardlimit]
+        /mount_point
+lfs setquota -u|--user|-g|--group uname|uid|gname|gid
+        [-b block_softlimit] [-B block_hardlimit]
+        [-i inode_softlimit] [-I inode_hardlimit]
+        /mount_point
+lfs setquota -t -u|-g
+        [--block-grace block_grace]
+        [--inode-grace inode_grace]
+        /mount_point
+lfs setquota -t -u|-g
+        [-b block_grace] [-i inode_grace]
+        /mount_point
lfs help
- In the above example, the <filesystem> parameter 
refers to the mount point of the Lustre file system. The default mount point is /mnt/lustre
+ In the above example, the /mount_point parameter refers to the mount point of the Lustre file system.
 The old lfs quota output was very detailed and contained cluster-wide quota statistics (including cluster-wide limits for a user/group and cluster-wide usage for a user/group), as well as statistics for each MDS/OST. Now, lfs quota has been updated to provide only cluster-wide statistics, by default. To obtain the full report of cluster-wide limits, usage and statistics, use the -v option with lfs quota.
@@ -124,7 +124,7 @@ lfs help
 changelog_clear
- Indicates that changelog records previous to <endrec> are no longer of interest to a particular consumer <id>, potentially allowing the MDT to free up disk space. An <endrec> of 0 indicates the current last record. Changelog consumers must be registered on the MDT node using lctl.
+ Indicates that changelog records previous to endrec are no longer of interest to a particular consumer id, potentially allowing the MDT to free up disk space. An endrec of 0 indicates the current last record. Changelog consumers must be registered on the MDT node using lctl.
@@ -137,7 +137,7 @@ lfs help
- df [-i] [-h] [--pool|-p <fsname>[.<pool>] [path]
+ df [-i] [-h] [--pool|-p fsname[.pool]] [path]
 Report file system disk space usage or inode usage (with -i) of each MDT/OST or a subset of OSTs if a pool is specified with -p. By default, prints the usage of all mounted Lustre file systems. Otherwise, if path is specified, prints only the usage of that file system. If -h is given, the output is printed in human-readable format, using binary (base-2) suffixes for Mega-, Giga-, Tera-, Peta-, or Exabytes.
@@ -316,7 +316,7 @@ lfs help
- --obd <uuid>
+ --obd ost_name
 Lists files that have an object on a specific OST.
@@ -471,7 +471,7 @@ lfs help
- --pool <pool>
+ --pool pool
 Name of the pre-defined pool of OSTs (see ) that will be used for striping. The stripe_cnt, stripe_size and start_ost values are used as well. The start_ost value must be part of the pool or an error is returned.
@@ -495,7 +495,7 @@ lfs help
- quota [-q] [-v] [-o obd_uuid|-i mdt_idx|-I ost_idx] [-u|-g <uname>|<uid>|<gname>|<gid>] <filesystem>
+ quota [-q] [-v] [-o obd_uuid|-i mdt_idx|-I ost_idx] [-u|-g uname|uid|gname|gid] /mount_point
@@ -504,7 +504,7 @@ lfs help
- quota -t <-u|-g> <filesystem>
+ quota -t -u|-g /mount_point
 Displays block and inode grace times for user (-u) or group (-g) quotas.
@@ -520,7 +520,7 @@ lfs help
- quotacheck [-ugf] <filesystem>
+ quotacheck [-ugf] /mount_point
 Scans the specified file system for disk usage, and creates or updates quota files. Options specify quota for users (-u), groups (-g), and force (-f).
@@ -528,7 +528,7 @@ lfs help
- quotaon [-ugf] <filesystem>
+ quotaon [-ugf] /mount_point
 Turns on file system quotas. Options specify quota for users (-u), groups (-g), and force (-f).
@@ -536,7 +536,7 @@ lfs help
- quotaoff [-ugf] <filesystem>
+ quotaoff [-ugf] /mount_point
 Turns off file system quotas. Options specify quota for users (-u), groups (-g), and force (-f).
@@ -544,7 +544,7 @@ lfs help
- quotainv [-ug] [-f] <filesystem>
+ quotainv [-ug] [-f] /mount_point
 Clears quota files (administrative quota files if used without -f, operational quota files otherwise), removing all of their quota entries for users (-u) or groups (-g). After running quotainv, you must run quotacheck before using quotas.
@@ -555,7 +555,7 @@ lfs help
- setquota <-u|-g> <uname>|<uid>|<gname>|<gid> [--block-softlimit <block-softlimit>] [--block-hardlimit <block-hardlimit>] [--inode-softlimit <inode-softlimit>] [--inode-hardlimit <inode-hardlimit>] <filesystem>
+ setquota -u|-g uname|uid|gname|gid [--block-softlimit block_softlimit] [--block-hardlimit block_hardlimit] [--inode-softlimit inode_softlimit] [--inode-hardlimit inode_hardlimit] /mount_point
 Sets file system quotas for users or groups. Limits can be specified with --{block|inode}-{softlimit|hardlimit} or their short equivalents -b, -B, -i, -I. Users can set 1, 2, 3 or 4 limits.
@@ -565,7 +565,7 @@ lfs help
- setquota -t <-u|-g> [--block-grace <block-grace>] [--inode-grace <inode-grace>] <filesystem>
+ setquota -t -u|-g [--block-grace block_grace] [--inode-grace inode_grace] /mount_point
 Sets the file system quota grace times for users or groups. Grace time is specified in 'XXwXXdXXhXXmXXs' format or as an integer seconds value. See .
@@ -614,7 +614,7 @@ lfs help
 Lists inode usage per OST and MDT.
 $ lfs df -i
 List space or inode usage for a specific OST pool.
- $ lfs df --pool <filesystem>[.<pool>] | <pathname>
+ $ lfs df --pool filesystem[.pool] | pathname
 List quotas of user 'bob'.
 $ lfs quota -u bob /mnt/lustre
 Show grace times for user quotas on /mnt/lustre.
@@ -767,10 +767,10 @@ lfs help
 The e2fsck utility is run on each of the local MDS and OST device file systems and verifies that the underlying ldiskfs is consistent. After e2fsck is run, lfsck does distributed coherency checking for the Lustre file system. In most cases, e2fsck is sufficient to repair any file system issues and lfsck is not required.
Synopsis
- lfsck [-c|--create] [-d|--delete] [-f|--force] [-h|--help] [-l|--lostfound] [-n|--nofix] [-v|--verbose] --mdsdb mds_database_file --ostdb ost1_database_file [ost2_database_file...] <filesystem>
+ lfsck [-c|--create] [-d|--delete] [-f|--force] [-h|--help] [-l|--lostfound] [-n|--nofix] [-v|--verbose] --mdsdb mds_database_file --ostdb ost1_database_file [ost2_database_file...] /mount_point
- As shown, the <filesystem> parameter refers to the Lustre file system mount point. The default mount point is /mnt/lustre.
+ As shown, the /mount_point parameter refers to the Lustre file system mount point. The default mount point is /mnt/lustre.
 For lfsck, database filenames must be provided as absolute pathnames. Relative paths do not work; the databases cannot be properly opened.
@@ -848,16 +848,16 @@ lfs help
 mds_database_file
- MDS database file created by running e2fsck --mdsdb mds_database_file <device> on the MDS backing device. This is required.
+ MDT database file created by running e2fsck --mdsdb mds_database_file /dev/mdt_device on the MDT backing device. This is required.
- --ostdb ost1_database_file
- [ost2_database_file...]
+ --ostdb ost1_database_file
+ [ost2_database_file...]
- OST database files created by running e2fsck --ostdb ost_database_file <device> on each of the OST backing devices. These are required unless an OST is unavailable, in which case all objects thereon are considered missing.
+ OST database files created by running e2fsck --ostdb ost_database_file /dev/ost_device on each of the OST backing devices. These are required unless an OST is unavailable, in which case all objects thereon are considered missing.
diff --git a/index.xml b/index.xml
index ab6befa..2255a99 100644
--- a/index.xml
+++ b/index.xml
@@ -11,7 +11,7 @@
 2011
- 2012
+ 2013
 Intel Corporation. (Intel modifications to the original version of this Operations Manual.)
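Tying together the lfsck synopsis and database options above, a hedged sketch of the usual sequence follows. The database paths /tmp/mdsdb and /tmp/ostdb, the device names, and the mount point are illustrative only; the database files produced on the servers must be made available to the node that runs lfsck:
mds# e2fsck -n -v --mdsdb /tmp/mdsdb /dev/mdt_device
oss# e2fsck -n -v --ostdb /tmp/ostdb /dev/ost_device
client# lfsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb /mnt/lustre
Running with -n first reports problems without fixing them; the -c, -d and -l options described above change how lfsck repairs what it finds.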