From: Richard Henwood
Date: Wed, 18 May 2011 14:44:04 +0000 (-0500)
Subject: FIX: patched on additional missed content
X-Git-Tag: workingxslt~20
X-Git-Url: https://git.whamcloud.com/?a=commitdiff_plain;h=c6522346d25a51ec8b61d18e499385077d81e319;p=doc%2Fmanual.git

FIX: patched on additional missed content
---

diff --git a/LustreOperations.xml b/LustreOperations.xml
index b27ba69..21bd9bf 100644
--- a/LustreOperations.xml
+++ b/LustreOperations.xml
@@ -1,153 +1,56 @@
-
+ - Lustre Operations + Lustre Operations - - - - - - - - - Lustre 2.0 Operations Manual - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - C H A P T E R  13 - - - - - - - - - - - Lustre Operations - - - - + + Once you have the Lustre file system up and running, you can use the procedures in this section to perform these basic Lustre administration tasks: - Mounting by Label + - + - Starting Lustre + - + - Mounting a Server + - + - Unmounting a Server + - + - Specifying Failout/Failover Mode for OSTs + - + - Handling Degraded OST RAID Arrays + - + - Running Multiple Lustre File Systems - - - - - - Setting and Retrieving Lustre Parameters - - - - - - Specifying NIDs and Failover - - - - - - Erasing a File System - - - - - - Reclaiming Reserved Disk Space - - - - - - Replacing an Existing OST or MDS - - - - - - Identifying To Which Lustre File an OST Object Belongs + -
- <anchor xml:id="dbdoclet.50438194_pgfId-1298852" xreflabel=""/> -
- 13.1 <anchor xml:id="dbdoclet.50438194_42877" xreflabel=""/>Mounting by Label +
+ 13.1 Mounting by Label The file system name is limited to 8 characters. We have encoded the file system and target information in the disk label, so you can mount by label. This allows system administrators to move disks around without worrying about issues such as SCSI disk reordering or getting the /dev/device wrong for a shared target. Soon, file system naming will be made as fail-safe as possible. Currently, Linux disk labels are limited to 16 characters. To identify the target within the file system, 8 characters are reserved, leaving 8 characters for the file system name: <fsname>-MDT0000 or <fsname>-OST0a19 To mount by label, use this command: @@ -156,32 +59,14 @@ This is an example of mount-by-label: $ mount -t lustre -L testfs-MDT0000 /mnt/mdt - - - - - - - - - - - - - - - - Caution -Mount-by-label should NOT be used in a multi-path environment. - - - - + Mount-by-label should NOT be used in a multi-path environment. + Although the file system name is internally limited to 8 characters, you can mount the clients at any mount point, so file system users are not subjected to short names. Here is an example: mount -t lustre uml1@tcp0:/shortfs /mnt/<long-file_system-name>
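As a rough illustration of the label layout described in section 13.1 (8 characters for the file system name, 8 for the target), the sketch below assembles a label from its parts. The helper name lustre_label is hypothetical and is not part of Lustre; it only demonstrates the naming rule and assumes the target index is written as four lowercase hex digits, as in the OST0a19 example.

```shell
#!/bin/sh
# Sketch only: build a label of the form <fsname>-MDT0000 or <fsname>-OST0a19.
# lustre_label is a hypothetical helper, not a Lustre utility.
# $1 = file system name, $2 = target kind (MDT or OST), $3 = decimal index
lustre_label() {
    fsname=$1
    kind=$2
    index=$3
    # The file system name may use at most 8 of the 16 label characters.
    if [ "${#fsname}" -gt 8 ]; then
        echo "fsname '$fsname' is longer than 8 characters" >&2
        return 1
    fi
    # The target index is written as four lowercase hex digits.
    printf '%s-%s%04x\n' "$fsname" "$kind" "$index"
}

lustre_label testfs MDT 0      # prints testfs-MDT0000
lustre_label testfs OST 2585   # prints testfs-OST0a19
```

The resulting string is what would be passed to mount -L, as in the mount-by-label example above.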
-
- 13.2 <anchor xml:id="dbdoclet.50438194_24122" xreflabel=""/>Starting <anchor xml:id="dbdoclet.50438194_marker-1305696" xreflabel=""/>Lustre +
+ 13.2 Starting <anchor xml:id="dbdoclet.50438194_marker-1305696" xreflabel=""/>Lustre The startup order of Lustre components depends on whether you have a combined MGS/MDT or these components are separate. If you have a combined MGS/MDT, the recommended startup order is OSTs, then the MGS/MDT, and then clients. @@ -196,19 +81,10 @@ - - - - - - Note -If an OST is added to a Lustre file system with a combined MGS/MDT, then the startup order changes slightly; the MGS must be started first because the OST needs to write its configuration data to it. In this scenario, the startup order is MGS/MDT, then OSTs, then the clients. - - - - + If an OST is added to a Lustre file system with a combined MGS/MDT, then the startup order changes slightly; the MGS must be started first because the OST needs to write its configuration data to it. In this scenario, the startup order is MGS/MDT, then OSTs, then the clients.
-
- 13.3 <anchor xml:id="dbdoclet.50438194_84876" xreflabel=""/>Mounting a <anchor xml:id="dbdoclet.50438194_marker-1298863" xreflabel=""/>Server +
+ 13.3 Mounting a <anchor xml:id="dbdoclet.50438194_marker-1298863" xreflabel=""/>Server Starting a Lustre server is straightforward and only involves the mount command. Lustre servers can be added to /etc/fstab: mount -t lustre @@ -223,69 +99,26 @@ In general, it is wise to specify noauto and let your high-availability (HA) package manage when to mount the device. If you are not using failover, make sure that networking has been started before mounting a Lustre server. RedHat, SuSE, Debian (and perhaps others) use the _netdev flag to ensure that these disks are mounted after the network is up. We are mounting by disk label here--the label of a device can be read with e2label. The label of a newly-formatted Lustre server ends in FFFF, meaning that it has yet to be assigned. The assignment takes place when the server is first started, and the disk label is updated. - - - - + + Do not do this when the client and OSS are on the same node, as memory pressure between the client and OSS can lead to deadlocks. - - - - - - - - - - - Caution -Do not do this when the client and OSS are on the same node, as memory pressure between the client and OSS can lead to deadlocks. - - - - - - - - - - - - - - - - - - - - Caution - Mount-by-label should NOT be used in a multi-path environment. - - - - - + Mount-by-label should NOT be used in a multi-path environment. +
-
- 13.4 <anchor xml:id="dbdoclet.50438194_69255" xreflabel=""/>Unmounting a<anchor xml:id="dbdoclet.50438194_marker-1298879" xreflabel=""/> Server +
+ 13.4 Unmounting a<anchor xml:id="dbdoclet.50438194_marker-1298879" xreflabel=""/> Server To stop a Lustre server, use the umount <mount point> command. For example, to stop ost0 on mount point /mnt/test, run: $ umount /mnt/test Gracefully stopping a server with the umount command preserves the state of the connected clients. The next time the server is started, it waits for clients to reconnect, and then goes through the recovery procedure. If the force (-f) flag is used, then the server evicts all clients and stops WITHOUT recovery. Upon restart, the server does not wait for recovery. Any currently connected clients receive I/O errors until they reconnect. - - - - - - Note -If you are using loopback devices, use the -d flag. This flag cleans up loop devices and can always be safely specified. - - - - + + If you are using loopback devices, use the -d flag. This flag cleans up loop devices and can always be safely specified. +
-
- 13.5 <anchor xml:id="dbdoclet.50438194_57420" xreflabel=""/>Specifying Fail<anchor xml:id="dbdoclet.50438194_marker-1298926" xreflabel=""/>out/Failover Mode for OSTs +
+ 13.5 Specifying Fail<anchor xml:id="dbdoclet.50438194_marker-1298926" xreflabel=""/>out/Failover Mode for OSTs Lustre uses two modes, failout and failover, to handle an OST that has become unreachable because it fails, is taken off the network, is unmounted, etc. In failout mode, Lustre clients immediately receive errors (EIOs) after a timeout, instead of waiting for the OST to recover. @@ -308,39 +141,13 @@ $ mkfs.lustre --fsname=testfs --ost --mgsnode=uml1 --param="failover.mode=fa\ ilout" /dev/sdb - - - - - - - - - - - - - - - - Caution -Before running this command, unmount all OSTs that will be affected by the change in the failover/failout mode. - - - - - - - - - - Note -After initial file system configuration, use the tunefs.lustre utility to change the failover/failout mode. For example, to set the failout mode, run:$ tunefs.lustre --param failover.mode=failout <OST partition> - - - - + + Before running this command, unmount all OSTs that will be affected by the change in the failover/failout mode. + After initial file system configuration, use the tunefs.lustre utility to change the failover/failout mode. For example, to set the failout mode, run:$ tunefs.lustre --param failover.mode=failout <OST partition> +
-
- 13.6 <anchor xml:id="dbdoclet.50438194_54138" xreflabel=""/>Handling <anchor xml:id="dbdoclet.50438194_marker-1307136" xreflabel=""/>Degraded OST RAID Arrays +
+ 13.6 Handling <anchor xml:id="dbdoclet.50438194_marker-1307136" xreflabel=""/>Degraded OST RAID Arrays Lustre includes functionality that notifies Lustre if an external RAID array has degraded performance (resulting in reduced overall file system performance), either because a disk has failed and not been replaced, or because a disk was replaced and is undergoing a rebuild. To avoid a global performance slowdown due to a degraded OST, the MDS can avoid the OST for new object allocation if it is notified of the degraded state. A parameter for each OST, called degraded, specifies whether the OST is running in degraded mode or not. To mark the OST as degraded, use: @@ -354,57 +161,23 @@ ilout" /dev/sdb If the OST is remounted due to a reboot or other condition, the flag resets to 0. It is recommended that this be implemented by an automated script that monitors the status of individual RAID devices.
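One possible shape for the monitoring script recommended in section 13.6 is sketched below. It rests on assumptions that are not in the text: the array is Linux software RAID whose state can be read in /proc/mdstat format (an external hardware array would need its vendor's status tool instead), and the flag is set through the obdfilter.<OST name>.degraded parameter with lctl set_param. Both function names are hypothetical. The sketch only prints the command it would run.

```shell
#!/bin/sh
# Sketch only, under the assumptions stated above.

# Read /proc/mdstat-style text on stdin and print 1 if any array reports a
# missing member (an "_" inside a [UU_U]-style status field), otherwise 0.
degraded_flag_from_mdstat() {
    if grep -Eq '\[[U_]*_[U_]*\]'; then
        echo 1
    else
        echo 0
    fi
}

# Print (but do not run) the lctl command for one OST.
# $1 = OST name (for example testfs-OST0000), $2 = flag value (0 or 1)
degraded_command() {
    printf 'lctl set_param obdfilter.%s.degraded=%s\n' "$1" "$2"
}

# Possible use on an OSS, run periodically (for example from cron):
#   flag=$(degraded_flag_from_mdstat < /proc/mdstat)
#   degraded_command testfs-OST0000 "$flag"
```

Because the flag resets to 0 when the OST is remounted, a script like this would need to run again after every restart rather than only when the array state changes.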
-
- 13.7 <anchor xml:id="dbdoclet.50438194_88063" xreflabel=""/>Running Multiple<anchor xml:id="dbdoclet.50438194_marker-1298939" xreflabel=""/> Lustre File Systems +
+ 13.7 Running Multiple<anchor xml:id="dbdoclet.50438194_marker-1298939" xreflabel=""/> Lustre File Systems There may be situations in which you want to run multiple file systems. This is doable, as long as you follow specific naming conventions. By default, the mkfs.lustre command creates a file system named lustre. To specify a different file system name (limited to 8 characters), run this command: mkfs.lustre --fsname=<new file system name> - - - - - - Note -The MDT, OSTs and clients in the new file system must share the same name (prepended to the device name). For example, for a new file system named foo, the MDT and two OSTs would be named foo-MDT0000, foo-OST0000, and foo-OST0001. - - - - + The MDT, OSTs and clients in the new file system must share the same name (prepended to the device name). For example, for a new file system named foo, the MDT and two OSTs would be named foo-MDT0000, foo-OST0000, and foo-OST0001. + To mount a client on the file system, run: mount -t lustre mgsnode:/<new fsname> <mountpoint> For example, to mount a client on file system foo at mount point /mnt/lustre1, run: mount -t lustre mgsnode:/foo /mnt/lustre1 - - - - - - Note - If a client(s) will be mounted on several file systems, add the following line to /etc/xattr.conf file to avoid problems when files are moved between the file systems: lustre.* skip - - - - - - - - - - Note -The MGS is universal; there is only one MGS per Lustre installation, not per file system. - - - - - - - - - - Note -There is only one file system per MDT. Therefore, specify --mdt --mgs on one file system and --mdt --mgsnode=<MGS node NID> on the other file systems. - - - - + If a client(s) will be mounted on several file systems, add the following line to /etc/xattr.conf file to avoid problems when files are moved between the file systems: lustre.* skip + The MGS is universal; there is only one MGS per Lustre installation, not per file system. + There is only one file system per MDT. 
Therefore, specify --mdt --mgs on one file system and --mdt --mgsnode=<MGS node NID> on the other file systems. + A Lustre installation with two file systems (foo and bar) could look like this, where the MGS node is mgsnode@tcp0 and the mount points are /mnt/lustre1 and /mnt/lustre2. mgsnode# mkfs.lustre --mgs /mnt/lustre1 mdtfoonode# mkfs.lustre --fsname=foo --mdt --mgsnode=mgsnode@tcp0 /mnt/lust\ @@ -427,27 +200,18 @@ re2 mount -t lustre mgsnode@tcp0:/bar /mnt/lustre2
-
- 13.8 <anchor xml:id="dbdoclet.50438194_88980" xreflabel=""/>Setting <anchor xml:id="dbdoclet.50438194_marker-1302467" xreflabel=""/>and Retrieving Lustre Parameters +
+ 13.8 Setting <anchor xml:id="dbdoclet.50438194_marker-1302467" xreflabel=""/>and Retrieving Lustre Parameters Several options are available for setting parameters in Lustre: When creating a file system, use mkfs.lustre. See Setting Parameters with mkfs.lustre below. - - - When a server is stopped, use tunefs.lustre. See Setting Parameters with tunefs.lustre below. - - - When the file system is running, use lctl to set or retrieve Lustre parameters. See Setting Parameters with lctl and Reporting Current Parameter Values below. - - -
<anchor xml:id="dbdoclet.50438194_pgfId-1301648" xreflabel=""/>13.8.1 <anchor xml:id="dbdoclet.50438194_17237" xreflabel=""/>Setting Parameters with <anchor xml:id="dbdoclet.50438194_marker-1305722" xreflabel=""/>mkfs.lustre @@ -471,17 +235,8 @@ re2
<anchor xml:id="dbdoclet.50438194_pgfId-1301773" xreflabel=""/>13.8.3 <anchor xml:id="dbdoclet.50438194_51490" xreflabel=""/>Setting Parameters <anchor xml:id="dbdoclet.50438194_marker-1305718" xreflabel=""/>with lctl When the file system is running, the lctl command can be used to set parameters (temporary or permanent) and report current parameter values. Temporary parameters are active as long as the server or client is not shut down. Permanent parameters live through server and client reboots. - - - - - - Note -The lctl list_param command enables users to list all parameters that can be set. See Listing Parameters. - - - - - For more details about the lctl command, see the examples in the sections below and Chapter 36: System Configuration Utilities. + The lctl list_param command enables users to list all parameters that can be set. See . + For more details about the lctl command, see the examples in the sections below and .
<anchor xml:id="dbdoclet.50438194_pgfId-1307025" xreflabel=""/>13.8.3.1 Setting Temporary Parameters Use lctl set_param to set temporary parameters on the node where it is run. These parameters map to items in /proc/{fs,sys}/{lnet,lustre}. The lctl set_param command uses this syntax: @@ -510,26 +265,7 @@ re2 $ lctl conf_param testfs-OST0000.ost.client_cache_seconds=15 $ lctl conf_param testfs.sys.timeout=40 - - - - - - - - - - - - - - - - Caution -Parameters specified with the lctl conf_param command are set permanently in the file system’s configuration file on the MGS. - - - - + Parameters specified with the lctl conf_param command are set permanently in the file system’s configuration file on the MGS.
<anchor xml:id="dbdoclet.50438194_pgfId-1305661" xreflabel=""/>13.8.3.3 <anchor xml:id="dbdoclet.50438194_88217" xreflabel=""/>Listing Parameters @@ -562,8 +298,8 @@ re2
-
- 13.9 <anchor xml:id="dbdoclet.50438194_41817" xreflabel=""/><anchor xml:id="dbdoclet.50438194_42379" xreflabel=""/><anchor xml:id="dbdoclet.50438194_50129" xreflabel=""/>Specifying NIDs and Fail<anchor xml:id="dbdoclet.50438194_marker-1306313" xreflabel=""/>over +
+ 13.9 <anchor xml:id="dbdoclet.50438194_42379" xreflabel=""/><anchor xml:id="dbdoclet.50438194_50129" xreflabel=""/>Specifying NIDs and Fail<anchor xml:id="dbdoclet.50438194_marker-1306313" xreflabel=""/>over If a node has multiple network interfaces, it may have multiple NIDs. When a node is specified, all of its NIDs must be listed, delimited by commas (,) so other nodes can choose the NID that is appropriate for their network interfaces. When failover nodes are specified, they are delimited by a colon (:) or by repeating a keyword (--mgsnode= or --failnode=). To obtain all NIDs from a node (while LNET is running), run: lctl list_nids @@ -580,60 +316,43 @@ n --mgsnode=uml2,2@elan /dev/sdb uml2> cat /proc/fs/lustre/mds/testfs-MDT0000/recovery_status Where multiple NIDs are specified, comma-separation (for example, uml2,2@elan) means that the two NIDs refer to the same host, and that Lustre needs to choose the "best" one for communication. Colon-separation (for example, uml1:uml2) means that the two NIDs refer to two different hosts, and should be treated as failover locations (Lustre tries the first one, and if that fails, it tries the second one.) - - - - - - Note -If you have an MGS or MDT configured for failover, perform these steps: 1. On the OST, list the NIDs of all MGS nodes at mkfs time.OST# mkfs.lustre --fsname sunfs --ost --mgsnode=10.0.0.1 --mgsnode=10.0.0.2 /dev/{device} 2. On the client, mount the file system.client# mount -t lustre 10.0.0.1:10.0.0.2:/sunfs /cfs/client/ - - - - + If you have an MGS or MDT configured for failover, perform these steps: 1. On the OST, list the NIDs of all MGS nodes at mkfs time.OST# mkfs.lustre --fsname sunfs --ost --mgsnode=10.0.0.1 --mgsnode=10.0.0.2 /dev/{device} 2. On the client, mount the file system.client# mount -t lustre 10.0.0.1:10.0.0.2:/sunfs /cfs/client/
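The separator rules in section 13.9 (commas join NIDs of the same host, colons separate failover hosts) can be illustrated with a small parser. This is only a sketch: describe_nids is a hypothetical helper, not a Lustre command, and it expects just the NID list, without a trailing :/<fsname> part.

```shell
#!/bin/sh
# Sketch only: show how a NID specification breaks down into failover hosts
# (colon-separated) and the NIDs of each host (comma-separated).
# describe_nids is a hypothetical helper, not part of Lustre.
# $1 = NID specification, for example uml1:uml2,2@elan
describe_nids() {
    spec=$1
    host=1
    old_ifs=$IFS
    IFS=:
    for group in $spec; do
        # Within one host, commas separate that host's alternative NIDs.
        IFS=,
        for nid in $group; do
            printf 'host %d: %s\n' "$host" "$nid"
        done
        IFS=:
        host=$((host + 1))
    done
    IFS=$old_ifs
}

describe_nids uml1:uml2,2@elan
# prints:
#   host 1: uml1
#   host 2: uml2
#   host 2: 2@elan
```

In this example Lustre would try host 1 first and fall back to host 2, choosing between uml2 and 2@elan according to the networks available to the connecting node.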
-
- 13.10 <anchor xml:id="dbdoclet.50438194_70905" xreflabel=""/>Erasing a <anchor xml:id="dbdoclet.50438194_marker-1307237" xreflabel=""/>File System +
+ 13.10 Erasing a <anchor xml:id="dbdoclet.50438194_marker-1307237" xreflabel=""/>File System If you want to erase a file system, run this command on your targets: $ "mkfs.lustre --reformat" If you are using a separate MGS and want to keep other file systems defined on that MGS, then set the writeconf flag on the MDT for that file system. The writeconf flag causes the configuration logs to be erased; they are regenerated the next time the servers start. To set the writeconf flag on the MDT: - 1. Unmount all clients/servers using this file system, run: + + Unmount all clients/servers using this file system, run: $ umount /mnt/lustre - 2. Erase the file system and, presumably, replace it with another file system, run: + + Erase the file system and, presumably, replace it with another file system, run: $ mkfs.lustre --reformat --fsname spfs --mdt --mgs /dev/sda - 3. If you have a separate MGS (that you do not want to reformat), then add the "writeconf" flag to mkfs.lustre on the MDT, run: + + If you have a separate MGS (that you do not want to reformat), then add the "writeconf" flag to mkfs.lustre on the MDT, run: $ mkfs.lustre --reformat --writeconf --fsname spfs --mdt \ --mgs /dev/sda - - - - - - Note -If you have a combined MGS/MDT, reformatting the MDT reformats the MGS as well, causing all configuration information to be lost; you can start building your new file system. Nothing needs to be done with old disks that will not be part of the new file system, just do not mount them. - - - - + + If you have a combined MGS/MDT, reformatting the MDT reformats the MGS as well, causing all configuration information to be lost; you can start building your new file system. Nothing needs to be done with old disks that will not be part of the new file system, just do not mount them.
-
- 13.11 <anchor xml:id="dbdoclet.50438194_16954" xreflabel=""/>Reclaiming <anchor xml:id="dbdoclet.50438194_marker-1307251" xreflabel=""/>Reserved Disk Space +
+ 13.11 Reclaiming <anchor xml:id="dbdoclet.50438194_marker-1307251" xreflabel=""/>Reserved Disk Space All current Lustre installations run the ldiskfs file system internally on service nodes. By default, ldiskfs reserves 5% of the disk space for the root user. In order to reclaim this space, run the following command on your OSSs: tune2fs [-m reserved_blocks_percent] [device] You do not need to shut down Lustre before running this command or restart it afterwards.
-
- 13.12 <anchor xml:id="dbdoclet.50438194_69998" xreflabel=""/>Replacing an Existing <anchor xml:id="dbdoclet.50438194_marker-1307278" xreflabel=""/>OST or MDS +
+ 13.12 Replacing an Existing <anchor xml:id="dbdoclet.50438194_marker-1307278" xreflabel=""/>OST or MDS To copy the contents of an existing OST to a new OST (or an old MDS to a new MDS), use one of these methods: Connect the old OST disk and new OST disk to a single machine, mount both, and use rsync to copy all data between the OST file systems. - - - For example: mount -t ldiskfs /dev/old /mnt/ost_old @@ -644,27 +363,22 @@ n --mgsnode=uml2,2@elan /dev/sdb If you are unable to connect both sets of disk to the same computer, use rsync to copy over the network using rsh (or ssh with -e ssh): - - - rsync -aSvz /mnt/ost_old/ new_ost_node:/mnt/ost_new Use the same procedure for the MDS, with one additional step: - - - cd /mnt/mds_old; getfattr -R -e base64 -d . > /tmp/mdsea; \<copy all MDS file\ s as above>; cd /mnt/mds_new; setfattr \--restore=/tmp/mdsea
-
- 13.13 <anchor xml:id="dbdoclet.50438194_30872" xreflabel=""/>Identifying To Which Lustre File an OST Object Belongs +
+ 13.13 Identifying To Which Lustre File an OST Object Belongs Use this procedure to identify the file containing a given object on a given OST. - 1. On the OST (as root), run debugfs to display the file identifier (FID) of the file associated with the object. + + On the OST (as root), run debugfs to display the file identifier (FID) of the file associated with the object. For example, if the object is 34976 on /dev/lustre/ost_test2, the debug command is: # debugfs -c -R "stat /O/0/d$((34976 %32))/34976" /dev/lustre/ost_test2 @@ -690,8 +404,9 @@ ps (0-63):47968-48031 TOTAL: 64 - 2. Note the FID’s EA and apply it to the osd_inode_id mapping. - In this example, the FID’s EA is: + + Note the FID's EA and apply it to the osd_inode_id mapping. + In this example, the FID's EA is: e2001100000000002543c18700000000a0880000000000000000000000000000 struct osd_inode_id { __u64 oii_ino; /* inode number */ @@ -700,7 +415,8 @@ ps }; After swapping, you get an inode number of 0x001100e2 and generation of 0. - 3. On the MDT (as root), use debugfs to find the file associated with the inode. + + On the MDT (as root), use debugfs to find the file associated with the inode. # debugfs -c -R "ncheck 0x001100e2" /dev/lustre/mdt_test Here is the command output: @@ -710,78 +426,10 @@ s Inode Pathname 1114338 /ROOT/brian-laptop-guest/clients/client11/~dmtmp/PWRPNT/ZD16.BMP + The command lists the inode and pathname associated with the object. - - - - - - Note -Debugfs' ''ncheck'' is a brute-force search that may take a long time to complete. - - - - - - - - - - Note -To find the Lustre file from a disk LBA, follow the steps listed in the document at this URL: http://smartmontools.sourceforge.net/badblockhowto.html. Then, follow the steps above to resolve the Lustre filename. - - - - - - - - - - - - - - - - - - - Lustre 2.0 Operations Manual - 821-2076-10 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Copyright © 2011, Oracle and/or its affiliates. All rights reserved. -
+ The debugfs ncheck command performs a brute-force search that may take a long time to complete. + To find the Lustre file from a disk LBA, follow the steps listed in the document at this URL: http://smartmontools.sourceforge.net/badblockhowto.html. Then, follow the steps above to resolve the Lustre filename. +
-
+ diff --git a/ManagingStripingFreeSpace.xml b/ManagingStripingFreeSpace.xml index b234a84..71ea876 100644 --- a/ManagingStripingFreeSpace.xml +++ b/ManagingStripingFreeSpace.xml @@ -129,7 +129,7 @@ 4 4 0x4 0 5 2 0x2 0 - This is in contrast to the output in Setting the Stripe Size which shows only a single object for the file. + This is in contrast to the output in that shows only a single object for the file.
diff --git a/index.xml b/index.xml index dba85f6..d237096 100644 --- a/index.xml +++ b/index.xml @@ -31,6 +31,7 @@ +