X-Git-Url: https://git.whamcloud.com/?a=blobdiff_plain;f=LustreOperations.xml;h=51daff3308c3b0b2d34a3c6e9ddf7a9c88aa4b1b;hb=refs%2Fchanges%2F62%2F34062%2F3;hp=b27ba693d47cf5b900f32e65d3017f19c8c2064c;hpb=a78713b1b7b1584abdfb262557d4ab2ed6feaa02;p=doc%2Fmanual.git diff --git a/LustreOperations.xml b/LustreOperations.xml index b27ba69..51daff3 100644 --- a/LustreOperations.xml +++ b/LustreOperations.xml @@ -1,787 +1,1020 @@ - -
- - Lustre Operations - - - - - - - - - - Lustre 2.0 Operations Manual - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - C H A P T E R  13 - - - - - - - - - - - Lustre Operations - - - - - Once you have the Lustre file system up and running, you can use the procedures in this section to perform these basic Lustre administration tasks: - - Mounting by Label - - - - - - Starting Lustre - - - - - - Mounting a Server - - - - - - Unmounting a Server - - - - - - Specifying Failout/Failover Mode for OSTs - - - - - - Handling Degraded OST RAID Arrays - - - - - - Running Multiple Lustre File Systems - - - - - - Setting and Retrieving Lustre Parameters - - - - - - Specifying NIDs and Failover - - - - - - Erasing a File System - - - - - - Reclaiming Reserved Disk Space - - - - - - Replacing an Existing OST or MDS - - - - - - Identifying To Which Lustre File an OST Object Belongs - - - - - -
- <anchor xml:id="dbdoclet.50438194_pgfId-1298852" xreflabel=""/> -
- 13.1 <anchor xml:id="dbdoclet.50438194_42877" xreflabel=""/>Mounting by Label - The file system name is limited to 8 characters. We have encoded the file system and target information in the disk label, so you can mount by label. This allows system administrators to move disks around without worrying about issues such as SCSI disk reordering or getting the /dev/device wrong for a shared target. Soon, file system naming will be made as fail-safe as possible. Currently, Linux disk labels are limited to 16 characters. To identify the target within the file system, 8 characters are reserved, leaving 8 characters for the file system name: - <fsname>-MDT0000 or <fsname>-OST0a19 - To mount by label, use this command: - $ mount -t lustre -L <file system label> <mount point> - - This is an example of mount-by-label: - $ mount -t lustre -L testfs-MDT0000 /mnt/mdt - - - - - - - - - - - - - - - - - Caution -Mount-by-label should NOT be used in a multi-path environment. - - - - - Although the file system name is internally limited to 8 characters, you can mount the clients at any mount point, so file system users are not subjected to short names. Here is an example: - mount -t lustre uml1@tcp0:/shortfs /mnt/<long-file_system-name> + + + Lustre Operations + Once you have the Lustre file system up and running, you can use the + procedures in this section to perform these basic Lustre administration + tasks. +
+ + <indexterm> + <primary>operations</primary> + </indexterm> + <indexterm> + <primary>operations</primary> + <secondary>mounting by label</secondary> + </indexterm>Mounting by Label + The file system name is limited to 8 characters. We have encoded the + file system and target information in the disk label, so you can mount by + label. This allows system administrators to move disks around without + worrying about issues such as SCSI disk reordering or getting the + /dev/device wrong for a shared target. Soon, file system + naming will be made as fail-safe as possible. Currently, Linux disk labels + are limited to 16 characters. To identify the target within the file + system, 8 characters are reserved, leaving 8 characters for the file system + name: + +fsname-MDT0000 or +fsname-OST0a19 + + To mount by label, use this command: + +mount -t lustre -L +file_system_label +/mount_point + + This is an example of mount-by-label: + +mds# mount -t lustre -L testfs-MDT0000 /mnt/mdt + + + Mount-by-label should NOT be used in a multi-path environment or + when snapshots are being created of the device, since multiple block + devices will have the same label. + + Although the file system name is internally limited to 8 characters, + you can mount the clients at any mount point, so file system users are not + subjected to short names. Here is an example: + +client# mount -t lustre mds0@tcp0:/short +/dev/long_mountpoint_name +
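The label layout described above (a file system name of at most 8 characters, followed by an 8-character target suffix) can be sketched in shell. This is a minimal sketch: the helper name `lustre_label` is hypothetical and is not part of the Lustre tools, and it assumes the target index is written as four lowercase hexadecimal digits, as in the `fsname-OST0a19` example.

```shell
#!/bin/sh
# Build a Lustre target label such as testfs-OST0a19 from its parts.
# lustre_label is a hypothetical helper, not a Lustre command.
lustre_label() {
    fsname=$1   # file system name, at most 8 characters
    type=$2     # target type: MDT or OST
    index=$3    # target index, given in decimal

    # Reject names that would not fit in the 16-character disk label.
    if [ "${#fsname}" -gt 8 ]; then
        echo "fsname '$fsname' is longer than 8 characters" >&2
        return 1
    fi

    # Assumption: the index is written as four lowercase hex digits.
    printf '%s-%s%04x\n' "$fsname" "$type" "$index"
}

lustre_label testfs MDT 0
lustre_label testfs OST 2585
```

The resulting string is what would be passed to `mount -t lustre -L`.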
+
+ + <indexterm> + <primary>operations</primary> + <secondary>starting</secondary> + </indexterm>Starting Lustre + On the first start of a Lustre file system, the components must be + started in the following order: + + + Mount the MGT. + + If a combined MGT/MDT is present, Lustre will correctly mount + the MGT and MDT automatically. + + + + Mount the MDT. + + Mount all MDTs if multiple MDTs are + present. + + + + Mount the OST(s). + + + Mount the client(s). + + +
+
+ + <indexterm> + <primary>operations</primary> + <secondary>mounting</secondary> + </indexterm>Mounting a Server + Starting a Lustre server is straightforward and only involves the + mount command. Lustre servers can be added to + /etc/fstab: + +mount -t lustre + + The mount command generates output similar to this: + +/dev/sda1 on /mnt/test/mdt type lustre (rw) +/dev/sda2 on /mnt/test/ost0 type lustre (rw) +192.168.0.21@tcp:/testfs on /mnt/testfs type lustre (rw) + + In this example, the MDT, an OST (ost0) and file system (testfs) are + mounted. + +LABEL=testfs-MDT0000 /mnt/test/mdt lustre defaults,_netdev,noauto 0 0 +LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0 + + In general, it is wise to specify noauto and let your + high-availability (HA) package manage when to mount the device. If you are + not using failover, make sure that networking has been started before + mounting a Lustre server. If you are running Red Hat Enterprise Linux, SUSE + Linux Enterprise Server, Debian operating system (and perhaps others), use + the + _netdev flag to ensure that these disks are mounted after + the network is up. + We are mounting by disk label here. The label of a device can be read + with + e2label. The label of a newly-formatted Lustre server + may end in + FFFF if the + --index option is not specified to + mkfs.lustre, meaning that it has yet to be assigned. The + assignment takes place when the server is first started, and the disk label + is updated. It is recommended that the + --index option always be used, which will also ensure + that the label is set at format time. + + Do not do this when the client and OSS are on the same node, as + memory pressure between the client and OSS can lead to deadlocks. + + + Mount-by-label should NOT be used in a multi-path + environment. + +
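The `/etc/fstab` entries shown above follow a fixed pattern, so they can be generated rather than typed by hand. The sketch below only prints the lines; `fstab_entry` is a hypothetical helper name, and the options are copied from the example entries in this section.

```shell
#!/bin/sh
# Print an /etc/fstab line that mounts a Lustre target by label.
# fstab_entry is a hypothetical helper; it does not modify /etc/fstab.
fstab_entry() {
    label=$1        # disk label, e.g. testfs-MDT0000
    mountpoint=$2   # mount point, e.g. /mnt/test/mdt

    # noauto leaves mounting to the HA package; _netdev delays the
    # mount until the network is up.
    printf 'LABEL=%s %s lustre defaults,_netdev,noauto 0 0\n' \
        "$label" "$mountpoint"
}

fstab_entry testfs-MDT0000 /mnt/test/mdt
fstab_entry testfs-OST0000 /mnt/test/ost0
```

Review the output before appending it to `/etc/fstab`.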
+
+ + <indexterm> + <primary>operations</primary> + <secondary>shutdownLustre</secondary> + </indexterm>Stopping the Filesystem + A complete Lustre filesystem shutdown occurs by unmounting all + clients and servers in the order shown below. Please note that unmounting + a block device causes the Lustre software to be shut down on that node. + + Please note that the -a -t lustre in the + commands below is not the name of a filesystem; it specifies that + all entries in /etc/mtab of type lustre are to be + unmounted. + + Unmount the clients + On each client node, unmount the filesystem on that client + using the umount command: + umount -a -t lustre + The example below shows the unmount of the + testfs filesystem on a client node: + [root@client1 ~]# mount |grep testfs +XXX.XXX.0.11@tcp:/testfs on /mnt/testfs type lustre (rw,lazystatfs) + +[root@client1 ~]# umount -a -t lustre +[154523.177714] Lustre: Unmounted testfs-client + + Unmount the MDT and MGT + On the MGS and MDS node(s), use the umount + command: + umount -a -t lustre + The example below shows the unmount of the MDT and MGT for + the testfs filesystem on a combined MGS/MDS: + + [root@mds1 ~]# mount |grep lustre +/dev/sda on /mnt/mgt type lustre (ro) +/dev/sdb on /mnt/mdt type lustre (ro) + +[root@mds1 ~]# umount -a -t lustre +[155263.566230] Lustre: Failing over testfs-MDT0000 +[155263.775355] Lustre: server umount testfs-MDT0000 complete +[155269.843862] Lustre: server umount MGS complete + For a separate MGS and MDS, the same command is used, first on + the MDS and then on the MGS. 
+ + Unmount all the OSTs + On each OSS node, use the umount command: + + umount -a -t lustre + The example below shows the unmount of all OSTs for the + testfs filesystem on server + OSS1: + + [root@oss1 ~]# mount |grep lustre +/dev/sda on /mnt/ost0 type lustre (ro) +/dev/sdb on /mnt/ost1 type lustre (ro) +/dev/sdc on /mnt/ost2 type lustre (ro) + +[root@oss1 ~]# umount -a -t lustre +[155336.491445] Lustre: Failing over testfs-OST0002 +[155336.556752] Lustre: server umount testfs-OST0002 complete + + + For unmount command syntax for a single OST, MDT, or MGT target + please refer to +
+
+ + <indexterm> + <primary>operations</primary> + <secondary>unmounting</secondary> + </indexterm>Unmounting a Specific Target on a Server + To stop a Lustre OST, MDT, or MGT, use the + umount + /mount_point command. + The example below stops an OST, ost0, on mount + point /mnt/ost0 for the testfs + filesystem: + [root@oss1 ~]# umount /mnt/ost0 +[ 385.142264] Lustre: Failing over testfs-OST0000 +[ 385.210810] Lustre: server umount testfs-OST0000 complete + Gracefully stopping a server with the + umount command preserves the state of the connected + clients. The next time the server is started, it waits for clients to + reconnect, and then goes through the recovery procedure. + If the force ( + -f) flag is used, then the server evicts all clients and + stops WITHOUT recovery. Upon restart, the server does not wait for + recovery. Any currently connected clients receive I/O errors until they + reconnect. + + If you are using loopback devices, use the + -d flag. This flag cleans up loop devices and can + always be safely specified. + +
+
+ + <indexterm> + <primary>operations</primary> + <secondary>failover</secondary> + </indexterm>Specifying Failout/Failover Mode for OSTs + In a Lustre file system, an OST that has become unreachable because + it fails, is taken off the network, or is unmounted can be handled in one + of two ways: + + + In + failout mode, Lustre clients receive + errors (EIOs) as soon as a timeout expires, instead of waiting for the OST to + recover. + + + In + failover mode, Lustre clients wait for the OST to + recover. + + + By default, the Lustre file system uses + failover mode for OSTs. To specify + failout mode instead, use the + --param="failover.mode=failout" option as shown below + (entered on one line): + +oss# mkfs.lustre --fsname= +fsname --mgsnode= +mgs_NID --param=failover.mode=failout + --ost --index= +ost_index +/dev/ost_block_device + + In the example below, + failout mode is specified for the OSTs on the MGS + mds0 in the file system + testfs (entered on one line). + +oss# mkfs.lustre --fsname=testfs --mgsnode=mds0 --param=failover.mode=failout + --ost --index=3 /dev/sdb + + + Before running this command, unmount all OSTs that will be affected + by a change in + failover/ + failout mode. + + + After initial file system configuration, use the + tunefs.lustre utility to change the mode. For example, + to set the + failout mode, run: + + +$ tunefs.lustre --param failover.mode=failout +/dev/ost_device + + + +
+
+ + <indexterm> + <primary>operations</primary> + <secondary>degraded OST RAID</secondary> + </indexterm>Handling Degraded OST RAID Arrays + Lustre includes functionality that notifies Lustre if an external + RAID array has degraded performance (resulting in reduced overall file + system performance), either because a disk has failed and not been + replaced, or because a disk was replaced and is undergoing a rebuild. To + avoid a global performance slowdown due to a degraded OST, the MDS can + avoid the OST for new object allocation if it is notified of the degraded + state. + A parameter for each OST, called + degraded, specifies whether the OST is running in + degraded mode or not. + To mark the OST as degraded, use: + +lctl set_param obdfilter.{OST_name}.degraded=1 + + To mark that the OST is back in normal operation, use: + +lctl set_param obdfilter.{OST_name}.degraded=0 + + To determine if OSTs are currently in degraded mode, use: + +lctl get_param obdfilter.*.degraded + + If the OST is remounted due to a reboot or other condition, the flag + resets to + 0. + It is recommended that this be implemented by an automated script + that monitors the status of individual RAID devices, such as MD-RAID's + mdadm(8) command with the --monitor + option to mark an affected device degraded or restored. +
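The automated script recommended above could be driven by `mdadm --monitor --program`, which calls the given program with the event name followed by the md device. The sketch below is hypothetical and makes two assumptions: the mapping from an md device to its OST name is site-specific and is supplied by the caller, and only `SpareActive` is treated as a restore event, because `RebuildFinished` is also reported when a rebuild is aborted. The helper prints the `lctl` command instead of running it.

```shell
#!/bin/sh
# Translate an mdadm monitor event into the lctl command that marks
# an OST degraded or restored. degraded_cmd is a hypothetical helper;
# it prints the command rather than executing it.
degraded_cmd() {
    event=$1      # event name passed by mdadm --monitor --program
    ost_name=$2   # OST backed by the array (site-specific mapping)

    case "$event" in
        Fail|DegradedArray|RebuildStarted) value=1 ;;  # array degraded
        SpareActive)                       value=0 ;;  # rebuild succeeded
        *) return 0 ;;                                 # ignore other events
    esac

    echo "lctl set_param obdfilter.${ost_name}.degraded=${value}"
}

degraded_cmd Fail testfs-OST0000
degraded_cmd SpareActive testfs-OST0000
```

In a real handler the printed command would be executed on the OSS that serves the affected OST.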
+
+ + <indexterm> + <primary>operations</primary> + <secondary>multiple file systems</secondary> + </indexterm>Running Multiple Lustre File Systems + Lustre supports multiple file systems provided the combination of + NID:fsname is unique. Each file system must be allocated + a unique name during creation with the + --fsname parameter. Unique names for file systems are + enforced if a single MGS is present. If multiple MGSs are present (for + example if you have an MGS on every MDS) the administrator is responsible + for ensuring file system names are unique. A single MGS and unique file + system names provides a single point of administration and allows commands + to be issued against the file system even if it is not mounted. + Lustre supports multiple file systems on a single MGS. With a single + MGS fsnames are guaranteed to be unique. Lustre also allows multiple MGSs + to co-exist. For example, multiple MGSs will be necessary if multiple file + systems on different Lustre software versions are to be concurrently + available. With multiple MGSs additional care must be taken to ensure file + system names are unique. Each file system should have a unique fsname among + all systems that may interoperate in the future. + By default, the + mkfs.lustre command creates a file system named + lustre. To specify a different file system name (limited + to 8 characters) at format time, use the + --fsname option: + + +mkfs.lustre --fsname= +file_system_name + + + + The MDT, OSTs and clients in the new file system must use the same + file system name (prepended to the device name). For example, for a new + file system named + foo, the MDT and two OSTs would be named + foo-MDT0000, + foo-OST0000, and + foo-OST0001. 
+ + To mount a client on the file system, run: + +client# mount -t lustre +mgsnode: +/new_fsname +/mount_point + + For example, to mount a client on file system foo at mount point + /mnt/foo, run: + +client# mount -t lustre mgsnode:/foo /mnt/foo + + + If a client(s) will be mounted on several file systems, add the + following line to + /etc/xattr.conf file to avoid problems when files are + moved between the file systems: + lustre.* skip + + + To ensure that a new MDT is added to an existing MGS create the MDT + by specifying: + --mdt --mgsnode= + mgs_NID. + + A Lustre installation with two file systems ( + foo and + bar) could look like this, where the MGS node is + mgsnode@tcp0 and the mount points are + /mnt/foo and + /mnt/bar. + +mgsnode# mkfs.lustre --mgs /dev/sda +mdtfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --mdt --index=0 +/dev/sdb +ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=0 +/dev/sda +ossfoonode# mkfs.lustre --fsname=foo --mgsnode=mgsnode@tcp0 --ost --index=1 +/dev/sdb +mdtbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --mdt --index=0 +/dev/sda +ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=0 +/dev/sdc +ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=1 +/dev/sdd + + To mount a client on file system foo at mount point + /mnt/foo, run: + +client# mount -t lustre mgsnode@tcp0:/foo /mnt/foo + + To mount a client on file system bar at mount point + /mnt/bar, run: + +client# mount -t lustre mgsnode@tcp0:/bar /mnt/bar + +
+
+ + <indexterm> + <primary>operations</primary> + <secondary>remote directory</secondary> + </indexterm>Creating a sub-directory on a given MDT + Lustre 2.4 enables individual sub-directories to be serviced by + unique MDTs. An administrator can allocate a sub-directory to a given MDT + using the command: + +client# lfs mkdir -i +mdt_index +/mount_point/remote_dir + + This command will allocate the sub-directory + remote_dir onto the MDT of index + mdt_index. For more information on adding additional MDTs + and + mdt_index see + . + + An administrator can allocate remote sub-directories to separate + MDTs. Creating remote sub-directories in parent directories not hosted on + MDT0 is not recommended. This is because the failure of the parent MDT + will leave the namespace below it inaccessible. For this reason, by + default it is only possible to create remote sub-directories off MDT0. To + relax this restriction and enable remote sub-directories off any MDT, an + administrator must issue the following command on the MGS: + mgs# lctl conf_param fsname.mdt.enable_remote_dir=1 + For Lustre filesystem 'scratch', the command executed is: + mgs# lctl conf_param scratch.mdt.enable_remote_dir=1 + To verify the configuration setting execute the following command on any + MDS: + mds# lctl get_param mdt.*.enable_remote_dir + + With Lustre software version 2.8, a new + tunable is available to allow users with a specific group ID to create + and delete remote and striped directories. This tunable is + enable_remote_dir_gid. For example, setting this + parameter to the 'wheel' or 'admin' group ID allows users with that GID + to create and delete remote and striped directories. Setting this + parameter to -1 on MDT0 permanently allows any + non-root user to create and delete remote and striped directories. 
+ On the MGS execute the following command: + mgs# lctl conf_param fsname.mdt.enable_remote_dir_gid=-1 + For the Lustre filesystem 'scratch', the command expands to: + mgs# lctl conf_param scratch.mdt.enable_remote_dir_gid=-1 + The change can be verified by executing the following command on every MDS: + mds# lctl get_param mdt.*.enable_remote_dir_gid + +
+
+ + <indexterm> + <primary>operations</primary> + <secondary>striped directory</secondary> + </indexterm> + <indexterm> + <primary>operations</primary> + <secondary>mkdir</secondary> + </indexterm> + <indexterm> + <primary>operations</primary> + <secondary>setdirstripe</secondary> + </indexterm> + <indexterm> + <primary>striping</primary> + <secondary>metadata</secondary> + </indexterm>Creating a directory striped across multiple MDTs + The Lustre 2.8 DNE feature enables individual files in a given + directory to store their metadata on separate MDTs (a striped + directory) once additional MDTs have been added to the + filesystem, see . + As a result, metadata requests for + files in a striped directory are serviced by multiple MDTs and metadata + service load is distributed over all the MDTs that service a given + directory. By distributing metadata service load over multiple MDTs, + performance can be improved beyond the limit of single MDT + performance. Prior to the development of this feature, all files in a + directory had to record their metadata on a single MDT. + The command to stripe a directory over + mdt_count MDTs is: + + +client# lfs mkdir -c +mdt_count +/mount_point/new_directory + + The striped directory feature is most useful for distributing + single large directories (50k entries or more) across multiple MDTs, + since it incurs more overhead than non-striped directories. +
+
+ + <indexterm> + <primary>operations</primary> + <secondary>parameters</secondary> + </indexterm>Setting and Retrieving Lustre Parameters + Several options are available for setting parameters in + Lustre: + + + When creating a file system, use mkfs.lustre. See + below. + + + When a server is stopped, use tunefs.lustre. See + below. + + + When the file system is running, use lctl to set or retrieve + Lustre parameters. See + and + below. + + +
+ Setting Tunable Parameters with + <literal>mkfs.lustre</literal> + When the file system is first formatted, parameters can simply be + added as a + --param option to the + mkfs.lustre command. For example: + +mds# mkfs.lustre --mdt --param="sys.timeout=50" /dev/sda + + For more details about creating a file system, see + . For more details about + mkfs.lustre, see + .
-
- 13.2 <anchor xml:id="dbdoclet.50438194_24122" xreflabel=""/>Starting <anchor xml:id="dbdoclet.50438194_marker-1305696" xreflabel=""/>Lustre - The startup order of Lustre components depends on whether you have a combined MGS/MDT or these components are separate. - - If you have a combined MGS/MDT, the recommended startup order is OSTs, then the MGS/MDT, and then clients. - - - - - - If the MGS and MDT are separate, the recommended startup order is: MGS, then OSTs, then the MDT, and then clients. - - - - - - - - - - - Note -If an OST is added to a Lustre file system with a combined MGS/MDT, then the startup order changes slightly; the MGS must be started first because the OST needs to write its configuration data to it. In this scenario, the startup order is MGS/MDT, then OSTs, then the clients. - - - - -
-
- 13.3 <anchor xml:id="dbdoclet.50438194_84876" xreflabel=""/>Mounting a <anchor xml:id="dbdoclet.50438194_marker-1298863" xreflabel=""/>Server - Starting a Lustre server is straightforward and only involves the mount command. Lustre servers can be added to /etc/fstab: - mount -t lustre - - The mount command generates output similar to this: - /dev/sda1 on /mnt/test/mdt type lustre (rw) -/dev/sda2 on /mnt/test/ost0 type lustre (rw) -192.168.0.21@tcp:/testfs on /mnt/testfs type lustre (rw) - - In this example, the MDT, an OST (ost0) and file system (testfs) are mounted. - LABEL=testfs-MDT0000 /mnt/test/mdt lustre defaults,_netdev,noauto 0 0 -LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0 - - In general, it is wise to specify noauto and let your high-availability (HA) package manage when to mount the device. If you are not using failover, make sure that networking has been started before mounting a Lustre server. RedHat, SuSE, Debian (and perhaps others) use the _netdev flag to ensure that these disks are mounted after the network is up. - We are mounting by disk label here--the label of a device can be read with e2label. The label of a newly-formatted Lustre server ends in FFFF, meaning that it has yet to be assigned. The assignment takes place when the server is first started, and the disk label is updated. - - - - - - - - - - - - - - - - Caution -Do not do this when the client and OSS are on the same node, as memory pressure between the client and OSS can lead to deadlocks. - - - - - - - - - - - - - - - - - - - - Caution - Mount-by-label should NOT be used in a multi-path environment. - - - - - -
-
- 13.4 <anchor xml:id="dbdoclet.50438194_69255" xreflabel=""/>Unmounting a<anchor xml:id="dbdoclet.50438194_marker-1298879" xreflabel=""/> Server - To stop a Lustre server, use the umount <mount point> command. - For example, to stop ost0 on mount point /mnt/test, run: - $ umount /mnt/test - - Gracefully stopping a server with the umount command preserves the state of the connected clients. The next time the server is started, it waits for clients to reconnect, and then goes through the recovery procedure. - If the force (-f) flag is used, then the server evicts all clients and stops WITHOUT recovery. Upon restart, the server does not wait for recovery. Any currently connected clients receive I/O errors until they reconnect. - - - - - - Note -If you are using loopback devices, use the -d flag. This flag cleans up loop devices and can always be safely specified. - - - - -
-
- 13.5 <anchor xml:id="dbdoclet.50438194_57420" xreflabel=""/>Specifying Fail<anchor xml:id="dbdoclet.50438194_marker-1298926" xreflabel=""/>out/Failover Mode for OSTs - Lustre uses two modes, failout and failover, to handle an OST that has become unreachable because it fails, is taken off the network, is unmounted, etc. - - In failout mode, Lustre clients immediately receive errors (EIOs) after a timeout, instead of waiting for the OST to recover. - - - - - - In failover mode, Lustre clients wait for the OST to recover. - - - - - - By default, the Lustre file system uses failover mode for OSTs. To specify failout mode instead, run this command: - $ mkfs.lustre --fsname=<fsname> --ost --mgsnode=<MGS node NID> --param="failover\ -.mode=failout" <block device name> - - In this example, failout mode is specified for the OSTs on MGS uml1, file system testfs. - $ mkfs.lustre --fsname=testfs --ost --mgsnode=uml1 --param="failover.mode=fa\ -ilout" /dev/sdb - - - - - - - - - - - - - - - - - Caution -Before running this command, unmount all OSTs that will be affected by the change in the failover/failout mode. - - - - - - - - - - Note -After initial file system configuration, use the tunefs.lustre utility to change the failover/failout mode. For example, to set the failout mode, run:$ tunefs.lustre --param failover.mode=failout <OST partition> - - - - -
-
- 13.6 <anchor xml:id="dbdoclet.50438194_54138" xreflabel=""/>Handling <anchor xml:id="dbdoclet.50438194_marker-1307136" xreflabel=""/>Degraded OST RAID Arrays - Lustre includes functionality that notifies Lustre if an external RAID array has degraded performance (resulting in reduced overall file system performance), either because a disk has failed and not been replaced, or because a disk was replaced and is undergoing a rebuild. To avoid a global performance slowdown due to a degraded OST, the MDS can avoid the OST for new object allocation if it is notified of the degraded state. - A parameter for each OST, called degraded, specifies whether the OST is running in degraded mode or not. - To mark the OST as degraded, use: - lctl set_param obdfilter.{OST_name}.degraded=1 - To mark that the OST is back in normal operation, use: - lctl set_param obdfilter.{OST_name}.degraded=0 - - To determine if OSTs are currently in degraded mode, use: - lctl get_param obdfilter.*.degraded - - If the OST is remounted due to a reboot or other condition, the flag resets to 0. - It is recommended that this be implemented by an automated script that monitors the status of individual RAID devices. +
+ Setting Parameters with + <literal>tunefs.lustre</literal> + If a server (OSS or MDS) is stopped, parameters can be added to an + existing file system using the + --param option to the + tunefs.lustre command. For example: + +oss# tunefs.lustre --param=failover.node=192.168.0.13@tcp0 /dev/sda + + With + tunefs.lustre, parameters are + additive: new parameters are specified in addition + to old parameters; they do not replace them. To erase all old + tunefs.lustre parameters and just use newly-specified + parameters, run: + +mds# tunefs.lustre --erase-params --param= +new_parameters + + The tunefs.lustre command can be used to set any parameter that is settable + via lctl conf_param and that has its own OBD device, + so it can be specified as + + obdname|fsname. + obdtype. + proc_file_name= + value. For example: + +mds# tunefs.lustre --param mdt.identity_upcall=NONE /dev/sda1 + + For more details about + tunefs.lustre, see + .
-
- 13.7 <anchor xml:id="dbdoclet.50438194_88063" xreflabel=""/>Running Multiple<anchor xml:id="dbdoclet.50438194_marker-1298939" xreflabel=""/> Lustre File Systems - There may be situations in which you want to run multiple file systems. This is doable, as long as you follow specific naming conventions. - By default, the mkfs.lustre command creates a file system named lustre. To specify a different file system name (limited to 8 characters), run this command: - mkfs.lustre --fsname=<new file system name> - - - - - - Note -The MDT, OSTs and clients in the new file system must share the same name (prepended to the device name). For example, for a new file system named foo, the MDT and two OSTs would be named foo-MDT0000, foo-OST0000, and foo-OST0001. - - - - - To mount a client on the file system, run: - mount -t lustre mgsnode:/<new fsname> <mountpoint> - - For example, to mount a client on file system foo at mount point /mnt/lustre1, run: - mount -t lustre mgsnode:/foo /mnt/lustre1 - - - - - - - Note - If a client(s) will be mounted on several file systems, add the following line to /etc/xattr.conf file to avoid problems when files are moved between the file systems: lustre.* skip - - - - - - - - - - Note -The MGS is universal; there is only one MGS per Lustre installation, not per file system. - - - - - - - - - - Note -There is only one file system per MDT. Therefore, specify --mdt --mgs on one file system and --mdt --mgsnode=<MGS node NID> on the other file systems. - - - - - A Lustre installation with two file systems (foo and bar) could look like this, where the MGS node is mgsnode@tcp0 and the mount points are /mnt/lustre1 and /mnt/lustre2. 
- mgsnode# mkfs.lustre --mgs /mnt/lustre1 -mdtfoonode# mkfs.lustre --fsname=foo --mdt --mgsnode=mgsnode@tcp0 /mnt/lust\ -re1 -ossfoonode# mkfs.lustre --fsname=foo --ost --mgsnode=mgsnode@tcp0 /mnt/lust\ -re1 -ossfoonode# mkfs.lustre --fsname=foo --ost --mgsnode=mgsnode@tcp0 /mnt/lust\ -re2 -mdtbarnode# mkfs.lustre --fsname=bar --mdt --mgsnode=mgsnode@tcp0 /mnt/lust\ -re1 -ossbarnode# mkfs.lustre --fsname=bar --ost --mgsnode=mgsnode@tcp0 /mnt/lust\ -re1 -ossbarnode# mkfs.lustre --fsname=bar --ost --mgsnode=mgsnode@tcp0 /mnt/lust\ -re2 - - To mount a client on file system foo at mount point /mnt/lustre1, run: - mount -t lustre mgsnode@tcp0:/foo /mnt/lustre1 - - To mount a client on file system bar at mount point /mnt/lustre2, run: - mount -t lustre mgsnode@tcp0:/bar /mnt/lustre2 +
+ Setting Parameters with + <literal>lctl</literal> + When the file system is running, the + lctl command can be used to set parameters (temporary + or permanent) and report current parameter values. Temporary parameters + are active as long as the server or client is not shut down. Permanent + parameters live through server and client reboots. + + The lctl list_param command enables users to + list all parameters that can be set. See + . + + For more details about the + lctl command, see the examples in the sections below + and + . +
+ Setting Temporary Parameters + Use + lctl set_param to set temporary parameters on the + node where it is run. These parameters map to items in + /proc/{fs,sys}/{lnet,lustre}. The + lctl set_param command uses this syntax: + +lctl set_param [-n] [-P] +obdtype. +obdname. +proc_file_name= +value + + For example: + +# lctl set_param osc.*.max_dirty_mb=1024 +osc.myth-OST0000-osc.max_dirty_mb=32 +osc.myth-OST0001-osc.max_dirty_mb=32 +osc.myth-OST0002-osc.max_dirty_mb=32 +osc.myth-OST0003-osc.max_dirty_mb=32 +osc.myth-OST0004-osc.max_dirty_mb=32 -
-
- 13.8 <anchor xml:id="dbdoclet.50438194_88980" xreflabel=""/>Setting <anchor xml:id="dbdoclet.50438194_marker-1302467" xreflabel=""/>and Retrieving Lustre Parameters - Several options are available for setting parameters in Lustre: - - When creating a file system, use mkfs.lustre. See Setting Parameters with mkfs.lustre below. - - - - - - When a server is stopped, use tunefs.lustre. See Setting Parameters with tunefs.lustre below. - - - - - - When the file system is running, use lctl to set or retrieve Lustre parameters. See Setting Parameters with lctl and Reporting Current Parameter Values below. - - - - - -
- <anchor xml:id="dbdoclet.50438194_pgfId-1301648" xreflabel=""/>13.8.1 <anchor xml:id="dbdoclet.50438194_17237" xreflabel=""/>Setting Parameters with <anchor xml:id="dbdoclet.50438194_marker-1305722" xreflabel=""/>mkfs.lustre - When the file system is created, parameters can simply be added as a --param option to the mkfs.lustre command. For example: - $ mkfs.lustre --mdt --param="sys.timeout=50" /dev/sda - - For more details about creating a file system,see Chapter 10:Configuring Lustre. For more details about mkfs.lustre, see Chapter 36: System Configuration Utilities.
-
- <anchor xml:id="dbdoclet.50438194_pgfId-1301767" xreflabel=""/>13.8.2 <anchor xml:id="dbdoclet.50438194_55253" xreflabel=""/>Setting Parameters with <anchor xml:id="dbdoclet.50438194_marker-1305720" xreflabel=""/>tunefs.lustre - If a server (OSS or MDS) is stopped, parameters can be added using the --param option to the tunefs.lustre command. For example: - $ tunefs.lustre --param="failover.node=192.168.0.13@tcp0" /dev/sda - - With tunefs.lustre, parameters are "additive" -- new parameters are specified in addition to old parameters, they do not replace them. To erase all old tunefs.lustre parameters and just use newly-specified parameters, run: - $ tunefs.lustre --erase-params --param=<new parameters> - The tunefs.lustre command can be used to set any parameter settable in a /proc/fs/lustre file and that has its own OBD device, so it can be specified as <obd|fsname>.<obdtype>.<proc_file_name>=<value>. For example: - $ tunefs.lustre --param mdt.group_upcall=NONE /dev/sda1 - - For more details about tunefs.lustre, see Chapter 36: System Configuration Utilities. +
+ Setting Permanent Parameters + Use the lctl set_param -P or + lctl conf_param command to set permanent parameters. + In general, the + lctl conf_param command can be used to specify any + parameter that is settable in a + /proc/fs/lustre file and has its own OBD device. The + lctl conf_param command uses this syntax (the same as the + + mkfs.lustre and + tunefs.lustre commands): + +obdname|fsname. +obdtype. +proc_file_name= +value + + Here are a few examples of + lctl conf_param commands: + +mgs# lctl conf_param testfs-MDT0000.sys.timeout=40 +$ lctl conf_param testfs-MDT0000.mdt.identity_upcall=NONE +$ lctl conf_param testfs.llite.max_read_ahead_mb=16 +$ lctl conf_param testfs-MDT0000.lov.stripesize=2M +$ lctl conf_param testfs-OST0000.osc.max_dirty_mb=29.15 +$ lctl conf_param testfs-OST0000.ost.client_cache_seconds=15 +$ lctl conf_param testfs.sys.timeout=40 + + + Parameters specified with the + lctl conf_param command are set permanently in the + file system's configuration file on the MGS. + 
-
- <anchor xml:id="dbdoclet.50438194_pgfId-1301773" xreflabel=""/>13.8.3 <anchor xml:id="dbdoclet.50438194_51490" xreflabel=""/>Setting Parameters <anchor xml:id="dbdoclet.50438194_marker-1305718" xreflabel=""/>with lctl - When the file system is running, the lctl command can be used to set parameters (temporary or permanent) and report current parameter values. Temporary parameters are active as long as the server or client is not shut down. Permanent parameters live through server and client reboots. - - - - - - Note -The lctl list_param command enables users to list all parameters that can be set. See Listing Parameters. - - - - - For more details about the lctl command, see the examples in the sections below and Chapter 36: System Configuration Utilities. -
- <anchor xml:id="dbdoclet.50438194_pgfId-1307025" xreflabel=""/>13.8.3.1 Setting Temporary Parameters - Use lctl set_param to set temporary parameters on the node where it is run. These parameters map to items in /proc/{fs,sys}/{lnet,lustre}. The lctl set_param command uses this syntax: - lctl set_param [-n] <obdtype>.<obdname>.<proc_file_name>=<value> - - For example: - # lctl set_param osc.*.max_dirty_mb=1024 -osc.myth-OST0000-osc.max_dirty_mb=32 -osc.myth-OST0001-osc.max_dirty_mb=32 -osc.myth-OST0002-osc.max_dirty_mb=32 -osc.myth-OST0003-osc.max_dirty_mb=32 -osc.myth-OST0004-osc.max_dirty_mb=32 - -
-
- <anchor xml:id="dbdoclet.50438194_pgfId-1302347" xreflabel=""/>13.8.3.2 <anchor xml:id="dbdoclet.50438194_64195" xreflabel=""/>Setting Permanent Parameters - Use the lctl conf_param command to set permanent parameters. In general, the lctl conf_param command can be used to specify any parameter settable in a /proc/fs/lustre file, with its own OBD device. The lctl conf_param command uses this syntax (same as the mkfs.lustre and tunefs.lustre commands): - <obd|fsname>.<obdtype>.<proc_file_name>=<value>) - - Here are a few examples of lctl conf_param commands: - $ mgs> lctl conf_param testfs-MDT0000.sys.timeout=40 -$ lctl conf_param testfs-MDT0000.mdt.group_upcall=NONE -$ lctl conf_param testfs.llite.max_read_ahead_mb=16 -$ lctl conf_param testfs-MDT0000.lov.stripesize=2M -$ lctl conf_param testfs-OST0000.osc.max_dirty_mb=29.15 -$ lctl conf_param testfs-OST0000.ost.client_cache_seconds=15 -$ lctl conf_param testfs.sys.timeout=40 - - - - - - - - - - - - - - - - - Caution -Parameters specified with the lctlconf_param command are set permanently in the file system’s configuration file on the MGS. - - - - -
-
- <anchor xml:id="dbdoclet.50438194_pgfId-1305661" xreflabel=""/>13.8.3.3 <anchor xml:id="dbdoclet.50438194_88217" xreflabel=""/>Listing Parameters - To list Lustre or LNET parameters that are available to set, use the lctl list_param command. For example: - lctl list_param [-FR] <obdtype>.<obdname> - - The following arguments are available for the lctl list_param command. - -F Add '/', '@' or '=' for directories, symlinks and writeable files, respectively - -R Recursively lists all parameters under the specified path - For example: - $ lctl list_param obdfilter.lustre-OST0000 - -
-
- <anchor xml:id="dbdoclet.50438194_pgfId-1302343" xreflabel=""/>13.8.3.4 <anchor xml:id="dbdoclet.50438194_63247" xreflabel=""/>Reporting Current <anchor xml:id="dbdoclet.50438194_marker-1302474" xreflabel=""/>Parameter Values - To report current Lustre parameter values, use the lctl get_param command with this syntax: - lctl get_param [-n] <obdtype>.<obdname>.<proc_file_name> - - This example reports data on RPC service times. - $ lctl get_param -n ost.*.ost_io.timeouts -service : cur 1 worst 30 (at 1257150393, 85d23h58m54s ago) 1 1 1 1 - - This example reports the number of inodes available on each OST. - # lctl get_param osc.*.filesfree -osc.myth-OST0000-osc-ffff88006dd20000.filesfree=217623 -osc.myth-OST0001-osc-ffff88006dd20000.filesfree=5075042 -osc.myth-OST0002-osc-ffff88006dd20000.filesfree=3762034 -osc.myth-OST0003-osc-ffff88006dd20000.filesfree=91052 -osc.myth-OST0004-osc-ffff88006dd20000.filesfree=129651 -
+
+ Setting Permanent Parameters with lctl set_param -P + The lctl set_param -P command can also + set parameters permanently. This command must be issued on the MGS. + The given parameter is set on every host using + lctl upcall. Parameters map to items in + /proc/{fs,sys}/{lnet,lustre}. The + lctl set_param -P command uses this syntax: + +lctl set_param -P +obdtype. +obdname. +proc_file_name= +value + + For example: + +# lctl set_param -P osc.*.max_dirty_mb=1024 +osc.myth-OST0000-osc.max_dirty_mb=32 +osc.myth-OST0001-osc.max_dirty_mb=32 +osc.myth-OST0002-osc.max_dirty_mb=32 +osc.myth-OST0003-osc.max_dirty_mb=32 +osc.myth-OST0004-osc.max_dirty_mb=32 + + Use the + -d option (only with -P) to delete a permanent + parameter. Syntax: + +lctl set_param -P -d +obdtype. +obdname. +proc_file_name + + For example: + +# lctl set_param -P -d osc.*.max_dirty_mb + 
-
-
- 13.9 <anchor xml:id="dbdoclet.50438194_41817" xreflabel=""/><anchor xml:id="dbdoclet.50438194_42379" xreflabel=""/><anchor xml:id="dbdoclet.50438194_50129" xreflabel=""/>Specifying NIDs and Fail<anchor xml:id="dbdoclet.50438194_marker-1306313" xreflabel=""/>over - If a node has multiple network interfaces, it may have multiple NIDs. When a node is specified, all of its NIDs must be listed, delimited by commas (,) so other nodes can choose the NID that is appropriate for their network interfaces. When failover nodes are specified, they are delimited by a colon (:) or by repeating a keyword (--mgsnode= or --failnode=). To obtain all NIDs from a node (while LNET is running), run: - lctl list_nids - - This displays the server's NIDs (networks configured to work with Lustre). - This example has a combined MGS/MDT failover pair on uml1 and uml2, and a OST failover pair on uml3 and uml4. There are corresponding Elan addresses on uml1 and uml2. - uml1> mkfs.lustre --fsname=testfs --mdt --mgs --failnode=uml2,2@elan /dev/sda1 -uml1> mount -t lustre /dev/sda1 /mnt/test/mdt -uml3> mkfs.lustre --fsname=testfs --ost --failnode=uml4 --mgsnode=uml1,1@ela\ -n --mgsnode=uml2,2@elan /dev/sdb -uml3> mount -t lustre /dev/sdb /mnt/test/ost0 -client> mount -t lustre uml1,1@elan:uml2,2@elan:/testfs /mnt/testfs -uml1> umount /mnt/mdt -uml2> mount -t lustre /dev/sda1 /mnt/test/mdt -uml2> cat /proc/fs/lustre/mds/testfs-MDT0000/recovery_status - - Where multiple NIDs are specified, comma-separation (for example, uml2,2@elan) means that the two NIDs refer to the same host, and that Lustre needs to choose the "best" one for communication. Colon-separation (for example, uml1:uml2) means that the two NIDs refer to two different hosts, and should be treated as failover locations (Lustre tries the first one, and if that fails, it tries the second one.) - - - - - - Note -If you have an MGS or MDT configured for failover, perform these steps: 1. 
On the OST, list the NIDs of all MGS nodes at mkfs time.OST# mkfs.lustre --fsname sunfs --ost --mgsnode=10.0.0.1 --mgsnode=10.0.0.2 /dev/{device} 2. On the client, mount the file system.client# mount -t lustre 10.0.0.1:10.0.0.2:/sunfs /cfs/client/ - - - - -
-
- 13.10 <anchor xml:id="dbdoclet.50438194_70905" xreflabel=""/>Erasing a <anchor xml:id="dbdoclet.50438194_marker-1307237" xreflabel=""/>File System - If you want to erase a file system, run this command on your targets: - $ "mkfs.lustre -reformat" - - If you are using a separate MGS and want to keep other file systems defined on that MGS, then set the writeconf flag on the MDT for that file system. The writeconf flag causes the configuration logs to be erased; they are regenerated the next time the servers start. - To set the writeconf flag on the MDT: - 1. Unmount all clients/servers using this file system, run: - $ umount /mnt/lustre - - 2. Erase the file system and, presumably, replace it with another file system, run: - $ mkfs.lustre -reformat --fsname spfs --mdt --mgs /dev/sda - - 3. If you have a separate MGS (that you do not want to reformat), then add the "writeconf" flag to mkfs.lustre on the MDT, run: - $ mkfs.lustre --reformat --writeconf -fsname spfs --mdt \ --mgs /dev/sda - - - - - - - Note -If you have a combined MGS/MDT, reformatting the MDT reformats the MGS as well, causing all configuration information to be lost; you can start building your new file system. Nothing needs to be done with old disks that will not be part of the new file system, just do not mount them. - - - - -
-
- 13.11 <anchor xml:id="dbdoclet.50438194_16954" xreflabel=""/>Reclaiming <anchor xml:id="dbdoclet.50438194_marker-1307251" xreflabel=""/>Reserved Disk Space - All current Lustre installations run the ldiskfs file system internally on service nodes. By default, ldiskfs reserves 5% of the disk space for the root user. In order to reclaim this space, run the following command on your OSSs: - tune2fs [-m reserved_blocks_percent] [device] +
+ Listing Parameters + To list Lustre or LNet parameters that are available to set, use + the + lctl list_param command. For example: + +lctl list_param [-FR] +obdtype. +obdname + + The following arguments are available for the + lctl list_param command. + + -F Add ' + /', ' + @' or ' + =' for directories, symlinks and writeable files, + respectively + + -R Recursively lists all parameters under the + specified path + For example: + +oss# lctl list_param obdfilter.lustre-OST0000 - You do not need to shut down Lustre before running this command or restart it afterwards. -
-
- 13.12 <anchor xml:id="dbdoclet.50438194_69998" xreflabel=""/>Replacing an Existing <anchor xml:id="dbdoclet.50438194_marker-1307278" xreflabel=""/>OST or MDS - To copy the contents of an existing OST to a new OST (or an old MDS to a new MDS), use one of these methods: - - Connect the old OST disk and new OST disk to a single machine, mount both, and use rsync to copy all data between the OST file systems. - - - - - - For example: - mount -t ldiskfs /dev/old /mnt/ost_old -mount -t ldiskfs /dev/new /mnt/ost_new -rsync -aSv /mnt/ost_old/ /mnt/ost_new -# note trailing slash on ost_old/ - - - If you are unable to connect both sets of disk to the same computer, use rsync to copy over the network using rsh (or ssh with -e ssh): - - - - - - rsync -aSvz /mnt/ost_old/ new_ost_node:/mnt/ost_new - - - Use the same procedure for the MDS, with one additional step: - - - - - - cd /mnt/mds_old; getfattr -R -e base64 -d . > /tmp/mdsea; \<copy all MDS file\ -s as above>; cd /mnt/mds_new; setfattr \--restore=/tmp/mdsea +
+
+ Reporting Current Parameter Values + To report current Lustre parameter values, use the + lctl get_param command with this syntax: + +lctl get_param [-n] +obdtype. +obdname. +proc_file_name + + This example reports data on RPC service times. + +oss# lctl get_param -n ost.*.ost_io.timeouts +service : cur 1 worst 30 (at 1257150393, 85d23h58m54s ago) 1 1 1 1 + + This example reports the amount of space this client has reserved + for writeback cache with each OST: + +client# lctl get_param osc.*.cur_grant_bytes +osc.myth-OST0000-osc-ffff8800376bdc00.cur_grant_bytes=2097152 +osc.myth-OST0001-osc-ffff8800376bdc00.cur_grant_bytes=33890304 +osc.myth-OST0002-osc-ffff8800376bdc00.cur_grant_bytes=35418112 +osc.myth-OST0003-osc-ffff8800376bdc00.cur_grant_bytes=2097152 +osc.myth-OST0004-osc-ffff8800376bdc00.cur_grant_bytes=33808384 +
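The name=value lines that lctl get_param prints are easy to post-process. The following is a minimal sketch, not part of the Lustre tools: the helper name sum_param_values is hypothetical, and it reads from standard input so it can be tried without a Lustre client. It totals the values of all reported parameters, for example the grant held by this client across all OSTs.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Lustre): sum the values of
# "name=value" lines as printed by `lctl get_param`. Reads stdin.
sum_param_values() {
    # Split each line on "=" and add up the second field.
    awk -F= 'NF == 2 { total += $2 } END { printf "%d\n", total + 0 }'
}

# Demonstration using the sample output shown in the example above.
sum_param_values <<'EOF'
osc.myth-OST0000-osc-ffff8800376bdc00.cur_grant_bytes=2097152
osc.myth-OST0001-osc-ffff8800376bdc00.cur_grant_bytes=33890304
osc.myth-OST0002-osc-ffff8800376bdc00.cur_grant_bytes=35418112
osc.myth-OST0003-osc-ffff8800376bdc00.cur_grant_bytes=2097152
osc.myth-OST0004-osc-ffff8800376bdc00.cur_grant_bytes=33808384
EOF
```

On a client with the file system mounted, the same helper could presumably be fed directly: lctl get_param osc.*.cur_grant_bytes | sum_param_values.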
-
- 13.13 <anchor xml:id="dbdoclet.50438194_30872" xreflabel=""/>Identifying To Which Lustre File an OST Object Belongs - Use this procedure to identify the file containing a given object on a given OST. - 1. On the OST (as root), run debugfs to display the file identifier (FID) of the file associated with the object. - For example, if the object is 34976 on /dev/lustre/ost_test2, the debug command is: - # debugfs -c -R "stat /O/0/d$((34976 %32))/34976" /dev/lustre/ost_test2 - - The command output is: - debugfs 1.41.5.sun2 (23-Apr-2009) -/dev/lustre/ost_test2: catastrophic mode - not reading inode or group bitma\ -ps -Inode: 352365 Type: regular Mode: 0666 Flags: 0x80000 -Generation: 1574463214 Version: 0xea020000:00000000 -User: 500 Group: 500 Size: 260096 -File ACL: 0 Directory ACL: 0 -Links: 1 Blockcount: 512 -Fragment: Address: 0 Number: 0 Size: 0 -ctime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009 -atime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009 -mtime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009 -crtime: 0x4a216b3c:975870dc -- Sat May 30 13:22:04 2009 -Size of extra inode fields: 24 -Extended attributes stored in inode body: -fid = "e2 00 11 00 00 00 00 00 25 43 c1 87 00 00 00 00 a0 88 00 00 00 00 00 \ -00 00 00 00 00 00 00 00 00 " (32) -BLOCKS: -(0-63):47968-48031 -TOTAL: 64 - - 2. Note the FID’s EA and apply it to the osd_inode_id mapping. - In this example, the FID’s EA is: - e2001100000000002543c18700000000a0880000000000000000000000000000 -struct osd_inode_id { -__u64 oii_ino; /* inode number */ -__u32 oii_gen; /* inode generation */ -__u32 oii_pad; /* alignment padding */ -}; - - After swapping, you get an inode number of 0x001100e2 and generation of 0. - 3. On the MDT (as root), use debugfs to find the file associated with the inode. - # debugfs -c -R "ncheck 0x001100e2" /dev/lustre/mdt_test - - Here is the command output: - debugfs 1.41.5.sun2 (23-Apr-2009) -/dev/lustre/mdt_test: catastrophic mode - not reading inode or group bitmap\ +
+
+ + <indexterm> + <primary>operations</primary> + <secondary>failover</secondary> + </indexterm>Specifying NIDs and Failover + If a node has multiple network interfaces, it may have multiple NIDs, + which must all be identified so other nodes can choose the NID that is + appropriate for their network interfaces. Typically, NIDs are specified in + a list delimited by commas ( + ,). However, when failover nodes are specified, the NIDs + are delimited by a colon ( + :) or by repeating a keyword such as + --mgsnode= or + --servicenode=. + To display the NIDs of all servers in networks configured to work + with the Lustre file system, run (while LNet is running): + +lctl list_nids + + In the example below, + mds0 and + mds1 are configured as a combined MGS/MDT failover pair + and + oss0 and + oss1 are configured as an OST failover pair. The Ethernet + address for + mds0 is 192.168.10.1, and for + mds1 is 192.168.10.2. The Ethernet addresses for + oss0 and + oss1 are 192.168.10.20 and 192.168.10.21 + respectively. + +mds0# mkfs.lustre --fsname=testfs --mdt --mgs \ + --servicenode=192.168.10.2@tcp0 \ + --servicenode=192.168.10.1@tcp0 /dev/sda1 +mds0# mount -t lustre /dev/sda1 /mnt/test/mdt +oss0# mkfs.lustre --fsname=testfs --servicenode=192.168.10.20@tcp0 \ + --servicenode=192.168.10.21 --ost --index=0 \ + --mgsnode=192.168.10.1@tcp0 --mgsnode=192.168.10.2@tcp0 \ + /dev/sdb +oss0# mount -t lustre /dev/sdb /mnt/test/ost0 +client# mount -t lustre 192.168.10.1@tcp0:192.168.10.2@tcp0:/testfs \ + /mnt/testfs +mds0# umount /mnt/test/mdt +mds1# mount -t lustre /dev/sda1 /mnt/test/mdt +mds1# lctl get_param mdt.testfs-MDT0000.recovery_status + + Where multiple NIDs are specified separated by commas (for example, + 10.67.73.200@tcp,192.168.10.1@tcp), the two NIDs refer + to the same host, and the Lustre software chooses the + best one for communication. 
When a pair of NIDs is + separated by a colon (for example, + 10.67.73.200@tcp:10.67.73.201@tcp), the two NIDs refer + to two different hosts and are treated as a failover pair (the Lustre + software tries the first one and, if that fails, tries the second + one). + Two options to + mkfs.lustre can be used to specify failover nodes. + Introduced in Lustre software release 2.0, the + --servicenode option is used to specify all service NIDs, + including those for primary nodes and failover nodes. When the + --servicenode option is used, the first service node to + load the target device becomes the primary service node, while nodes + corresponding to the other specified NIDs become failover locations for the + target device. An older option, + --failnode, specifies just the NIDs of failover nodes. + For more information about the + --servicenode and + --failnode options, see + . + 
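The comma and colon rules can be made concrete with a small parser. The sketch below is illustrative only: describe_mount_source is a hypothetical helper, not a Lustre command, and it assumes a client mount source of the form host[:host...]:/fsname as used in the example above. It lists each failover host (colon-separated) together with the NIDs that belong to it (comma-separated).

```shell
#!/bin/sh
# Hypothetical helper: split a client mount source such as
# "nidA,nidB:nidC:/fsname" into failover hosts and per-host NIDs.
# The body runs in a subshell so the IFS change does not leak out.
describe_mount_source() (
    spec=$1
    fsname=${spec##*:/}   # text after the final ":/"
    hosts=${spec%:/*}     # everything before the final ":/"
    echo "fsname: $fsname"
    n=0
    IFS=:                 # colons separate failover hosts
    for host in $hosts; do
        n=$((n + 1))
        # commas separate the NIDs of one host
        echo "host $n NIDs: $(echo "$host" | tr ',' ' ')"
    done
)

describe_mount_source "192.168.10.1@tcp0:192.168.10.2@tcp0:/testfs"
```

Each "host" line corresponds to one location that the Lustre software tries in order, and the NIDs on a line are alternatives for reaching that same host.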
+
+ + <indexterm> + <primary>operations</primary> + <secondary>erasing a file system</secondary> + </indexterm>Erasing a File System + If you want to erase a file system and permanently delete all the + data in the file system, run this command on your targets: + +$ "mkfs.lustre --reformat" + + If you are using a separate MGS and want to keep other file systems + defined on that MGS, then set the + writeconf flag on the MDT for that file system. The + writeconf flag causes the configuration logs to be + erased; they are regenerated the next time the servers start. + To set the + writeconf flag on the MDT: + + + Unmount all clients and servers using this file system. Run: + +$ umount /mnt/lustre + + + + Permanently erase the file system and, presumably, replace it + with another file system. Run: + +$ mkfs.lustre --reformat --fsname spfs --mgs --mdt --index=0 /dev/ +{mdsdev} + + + + If you have a separate MGS (that you do not want to reformat), + then add the + --writeconf flag to + mkfs.lustre on the MDT. Run: + +$ mkfs.lustre --reformat --writeconf --fsname spfs --mgsnode= +mgs_nid --mdt --index=0 +/dev/mds_device + + + + + If you have a combined MGS/MDT, reformatting the MDT reformats the + MGS as well, causing all configuration information to be lost; you can + then start building your new file system. Nothing needs to be done with old + disks that will not be part of the new file system; just do not mount + them. + + 
+
+ + <indexterm> + <primary>operations</primary> + <secondary>reclaiming space</secondary> + </indexterm>Reclaiming Reserved Disk Space + All current Lustre installations run the ldiskfs file system + internally on service nodes. By default, ldiskfs reserves 5% of the disk + space to avoid file system fragmentation. In order to reclaim this space, + run the following command on your OSS for each OST in the file + system: + +tune2fs [-m reserved_blocks_percent] /dev/ +{ostdev} + + You do not need to shut down Lustre before running this command or + restart it afterwards. + + Reducing the space reservation can cause severe performance + degradation as the OST file system becomes more than 95% full, due to + difficulty in locating large areas of contiguous free space. This + performance degradation may persist even if the space usage drops below + 95% again. It is recommended NOT to reduce the reserved disk space below + 5%. + +
+
+ + <indexterm> + <primary>operations</primary> + <secondary>replacing an OST or MDS</secondary> + </indexterm>Replacing an Existing OST or MDT + To copy the contents of an existing OST to a new OST (or an old MDT + to a new MDT), follow the process for either OST/MDT backups in + or + . + For more information on removing an MDT, see + . + 
+
+ + <indexterm> + <primary>operations</primary> + <secondary>identifying OSTs</secondary> + </indexterm>Identifying To Which Lustre File an OST Object Belongs + Use this procedure to identify the file containing a given object on + a given OST. + + + On the OST (as root), run + debugfs to display the file identifier ( + FID) of the file associated with the object. + For example, if the object is + 34976 on + /dev/lustre/ost_test2, the debugfs command is: + +# debugfs -c -R "stat /O/0/d$((34976 % 32))/34976" /dev/lustre/ost_test2 + + The command output is: + +debugfs 1.42.3.wc3 (15-Aug-2012) +/dev/lustre/ost_test2: catastrophic mode - not reading inode or group bitmaps +Inode: 352365 Type: regular Mode: 0666 Flags: 0x80000 +Generation: 2393149953 Version: 0x0000002a:00005f81 +User: 1000 Group: 1000 Size: 260096 +File ACL: 0 Directory ACL: 0 +Links: 1 Blockcount: 512 +Fragment: Address: 0 Number: 0 Size: 0 +ctime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009 +atime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009 +mtime: 0x4a216b48:00000000 -- Sat May 30 13:22:16 2009 +crtime: 0x4a216b3c:975870dc -- Sat May 30 13:22:04 2009 +Size of extra inode fields: 24 +Extended attributes stored in inode body: + fid = "b9 da 24 00 00 00 00 00 6a fa 0d 3f 01 00 00 00 eb 5b 0b 00 00 00 00 00 +00 00 00 00 00 00 00 00 " (32) + fid: objid=34976 seq=0 parent=[0x24dab9:0x3f0dfa6a:0x0] stripe=1 +EXTENTS: +(0-64):4620544-4620607 + + + + For Lustre software release 2.x file systems, the parent FID will + be of the form [0x200000400:0x122:0x0] and can be resolved directly + using the + lfs fid2path [0x200000400:0x122:0x0] + /mnt/lustre command on any Lustre client, and the process is + complete. + + + In this example, the parent inode FID is an upgraded 1.x inode + (because the first part of the FID is below 0x200000400). The MDT + inode number is + 0x24dab9 and the generation is + 0x3f0dfa6a, so the pathname needs to be resolved + using + debugfs. 
+ + + On the MDS (as root), use + debugfs to find the file associated with the + inode: + +# debugfs -c -R "ncheck 0x24dab9" /dev/lustre/mdt_test + + Here is the command output: + +debugfs 1.42.3.wc2 (15-Aug-2012) +/dev/lustre/mdt_test: catastrophic mode - not reading inode or group bitmap\ s -Inode Pathname -1114338 /ROOT/brian-laptop-guest/clients/client11/~dmtmp/PWRPNT/ZD16.BMP - - The command lists the inode and pathname associated with the object. - - - - - - Note -Debugfs' ''ncheck'' is a brute-force search that may take a long time to complete. - - - - - - - - - - Note -To find the Lustre file from a disk LBA, follow the steps listed in the document at this URL: http://smartmontools.sourceforge.net/badblockhowto.html. Then, follow the steps above to resolve the Lustre filename. - - - - - - - - - - - - - - - - - - - Lustre 2.0 Operations Manual - 821-2076-10 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Copyright © 2011, Oracle and/or its affiliates. All rights reserved. -
+Inode Pathname +2415289 /ROOT/brian-laptop-guest/clients/client11/~dmtmp/PWRPNT/ZD16.BMP + + + + The command lists the inode and pathname associated with the + object. + + + The debugfs ncheck command is a brute-force search that may + take a long time to complete. + + + To find the Lustre file from a disk LBA, follow the steps listed in + the document at this URL: + + http://smartmontools.sourceforge.net/badblockhowto.html. Then, + follow the steps above to resolve the Lustre filename. + 
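Two pieces of arithmetic in this procedure can be sketched in shell. Both helper names are hypothetical and not part of any Lustre tool. The first builds the object path passed to debugfs stat from an object ID, following the /O/0/d$((objid % 32))/objid layout in the command above. The second applies the rule stated in the text: a parent FID whose first part is below 0x200000400 is treated as an upgraded 1.x inode that needs debugfs ncheck, and anything else can be resolved with lfs fid2path.

```shell
#!/bin/sh
# Hypothetical helper: path of an OST object inside the backing file
# system, as used in the debugfs "stat" command in this procedure.
ost_object_path() {
    objid=$1
    echo "/O/0/d$((objid % 32))/$objid"
}

# Hypothetical helper: choose the resolution method from the first
# part (sequence) of the parent FID, per the threshold in the text.
fid_resolution_method() {
    seq=$1
    if [ $(($seq)) -lt $((0x200000400)) ]; then
        echo "debugfs ncheck"
    else
        echo "lfs fid2path"
    fi
}

ost_object_path 34976
fid_resolution_method 0x24dab9
```

For the example object, the parent FID [0x24dab9:0x3f0dfa6a:0x0] falls below the threshold, which matches the text's choice of debugfs ncheck on the MDS.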
-
+