From: Richard Henwood Date: Tue, 17 May 2011 23:40:54 +0000 (-0500) Subject: FIX: patched on additional missed content X-Git-Tag: workingxslt~21 X-Git-Url: https://git.whamcloud.com/?a=commitdiff_plain;h=aecc76a5aff6b8d125d3a2314f140538d596abcb;p=doc%2Fmanual.git FIX: patched on additional missed content --- diff --git a/LustreMaintenance.xml b/LustreMaintenance.xml index dbb073b..ba8589b 100644 --- a/LustreMaintenance.xml +++ b/LustreMaintenance.xml @@ -1,302 +1,531 @@ - - Lustre Maintenance - - Once you have the Lustre file system up and running, you can use the procedures in this section to perform these basic Lustre maintenance tasks: - - - Working with Inactive OSTs - - - Finding Nodes in the Lustre File System - - - Mounting a Server Without Lustre Service - - - Regenerating Lustre Configuration Logs - - - Changing a Server NID - - - Adding a New OST to a Lustre File System - - - Removing and Restoring OSTs - - - Aborting Recovery - + + Lustre Maintenance + + Once you have the Lustre file system up and running, you can use the procedures in this section to perform these basic Lustre maintenance tasks: + + + Working with Inactive OSTs + + + Finding Nodes in the Lustre File System + + + Mounting a Server Without Lustre Service + + + Regenerating Lustre Configuration Logs + + + Changing a Server NID + + + Adding a New OST to a Lustre File System + + + Removing and Restoring OSTs + + + Aborting Recovery + + + Determining Which Machine is Serving an OST + + + Changing the Address of a Failover Node + + + + + +
+ 14.1 <anchor xml:id="dbdoclet.50438199_85142" xreflabel=""/>Working with <anchor xml:id="dbdoclet.50438199_marker-1298888" xreflabel=""/>Inactive OSTs + To mount a client or an MDT with one or more inactive OSTs, run commands similar to this: + client> mount -o exclude=testfs-OST0000 -t lustre uml1:/testfs\ /mnt/testfs + client> cat /proc/fs/lustre/lov/testfs-clilov-*/target_obd + + To activate an inactive OST on a live client or MDT, use the lctl activate command on the OSC device. For example: + lctl --device 7 activate + + + + + + A colon-separated list can also be specified. For example, exclude=testfs-OST0000:testfs-OST0001. + + +
+ 14.2 Finding <anchor xml:id="dbdoclet.50438199_marker-1298897" xreflabel=""/>Nodes in the Lustre File System + There may be situations in which you need to find all nodes in your Lustre file system or get the names of all OSTs. + To get a list of all Lustre nodes, run this command on the MGS: + # cat /proc/fs/lustre/mgs/MGS/live/* + + + + This command must be run on the MGS. + + + + In this example, file system lustre has three nodes, lustre-MDT0000, lustre-OST0000, and lustre-OST0001. + cfs21:/tmp# cat /proc/fs/lustre/mgs/MGS/live/* + fsname: lustre + flags: 0x0 gen: 26 + lustre-MDT0000 + lustre-OST0000 + lustre-OST0001 + + To get the names of all OSTs, run this command on the MDS: + # cat /proc/fs/lustre/lov/<fsname>-mdtlov/target_obd + + + + This command must be run on the MDS. + + + + In this example, there are two OSTs, lustre-OST0000 and lustre-OST0001, which are both active. + cfs21:/tmp# cat /proc/fs/lustre/lov/lustre-mdtlov/target_obd + 0: lustre-OST0000_UUID ACTIVE + 1: lustre-OST0001_UUID ACTIVE + 
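As a minimal sketch of how the target_obd listing in section 14.2 could be post-processed, the snippet below extracts the names of the OSTs reported as ACTIVE. The function name list_active_osts is hypothetical and is not part of Lustre; the sample input is copied from the example output, and the sketch assumes every line has the three-column "index: name_UUID state" layout shown there.

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre command): read target_obd-style lines
# on stdin and print the name of every OST whose state column is ACTIVE.
# Assumed input layout, taken from the example: "0: lustre-OST0000_UUID ACTIVE"
list_active_osts() {
    awk '$3 == "ACTIVE" { sub(/_UUID$/, "", $2); print $2 }'
}

# Sample input copied from the example. On a real MDS the input would
# instead come from:
#   cat /proc/fs/lustre/lov/<fsname>-mdtlov/target_obd
printf '%s\n' \
    '0: lustre-OST0000_UUID ACTIVE' \
    '1: lustre-OST0001_UUID ACTIVE' | list_active_osts
```

Filtering on the third column rather than matching the whole line keeps the helper from printing targets in any other state.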
+
+ 14.3 Mounting a Server Without <anchor xml:id="dbdoclet.50438199_marker-1298918" xreflabel=""/>Lustre Service + If you are using a combined MGS/MDT, but you only want to start the MGS and not the MDT, run this command: + mount -t lustre <MDT partition> -o nosvc <mount point> + + The <MDT partition> variable is the combined MGS/MDT. + In this example, the combined MGS/MDT is testfs-MDT0000 and the mount point is /mnt/test/mdt. + $ mount -t lustre -L testfs-MDT0000 -o nosvc /mnt/test/mdt + 
+
+ 14.4 Regenerating Lustre <anchor xml:id="dbdoclet.50438199_marker-1305736" xreflabel=""/>Configuration Logs + If the Lustre system’s configuration logs are in a state where the file system cannot be started, use the writeconf command to erase them. After the writeconf command is run and the servers restart, the configuration logs are re-generated and stored on the MGS (as in a new file system). + You should only use the writeconf command if: + + The configuration logs are in a state where the file system cannot start + + + A server NID is being changed + + + The writeconf command is destructive to some configuration items (i.e., OST pools information and items set via conf_param), and should be used with caution. To avoid problems: + + Shut down the file system before running the writeconf command + + + Run the writeconf command on all servers (MDT first, then OSTs) + + + Start the file system in this order: + + + + MGS (or the combined MGS/MDT) + + + MDT + + + OSTs + + + Lustre clients + + + + + + + + The OST pools feature enables a group of OSTs to be named for file striping purposes. If you use OST pools, be aware that running the writeconf command erases all pools information (as well as any other parameters set via lctl conf_param). We recommend that the pools definitions (and conf_param settings) be executed via a script, so they can be reproduced easily after a writeconf is performed. + + + To regenerate Lustre's system configuration logs: + + Shut down the file system in this order. + + Unmount the clients. + + Unmount the MDT. + + Unmount all OSTs. + + + Make sure the MDT and OST devices are available. + + Run the writeconf command on all servers. + Run writeconf on the MDT first, and then the OSTs. + + On the MDT, run: + <mdt node>$ tunefs.lustre --writeconf <device> + + + On each OST, run: + <ost node>$ tunefs.lustre --writeconf <device> + + + Restart the file system in this order. + + Mount the MGS (or the combined MGS/MDT). + + Mount the MDT. 
+ + Mount the OSTs. + + Mount the clients. + + After the writeconf command is run, the configuration logs are re-generated as servers restart. + +
+
+ 14.5 Changing a <anchor xml:id="dbdoclet.50438199_marker-1305737" xreflabel=""/>Server NID + If you need to change the NID on the MDT or an OST, run the writeconf command to erase Lustre configuration information (including server NIDs), and then re-generate the system configuration using updated server NIDs. + Change a server NID in these situations: + + New server hardware is added to the file system, and the MDS or an OSS is being moved to the new machine + + + A new network card is installed in the server + + + You want to reassign IP addresses + + + To change a server NID: + + Update the LNET configuration in the /etc/modprobe.conf file so the list of server NIDs (lctl list_nids) is correct. + The lctl list_nids command indicates which network(s) are configured to work with Lustre. + + Shut down the file system in this order. + + Unmount the clients. + + Unmount the MDT. + + Unmount all OSTs. + + + Run the writeconf command on all servers. + Run writeconf on the MDT first, and then the OSTs. + + On the MDT, run: + <mdt node>$ tunefs.lustre --writeconf <device> + + + On each OST, run: + <ost node>$ tunefs.lustre --writeconf <device> + + + If the NID on the MGS was changed, communicate the new MGS location to each server. Run: + tunefs.lustre --erase-params --mgsnode=<new_nid(s)> --writeconf /dev/.. + + + + Restart the file system in this order. + + Mount the MGS (or the combined MGS/MDT). + + Mount the MDT. + + Mount the OSTs. + + Mount the clients. + + + After the writeconf command is run, the configuration logs are re-generated as servers restart, and the updated server NIDs (as reported by lctl list_nids) are used. + 
+
+ 14.6 Adding a New <anchor xml:id="dbdoclet.50438199_marker-1306353" xreflabel=""/>OST to a Lustre File System + To add an OST to an existing Lustre file system: + + 1. Add a new OST by running the following commands: + $ mkfs.lustre --fsname=spfs --ost --mgsnode=mds16@tcp0 /dev/sda + $ mkdir -p /mnt/test/ost0 + $ mount -t lustre /dev/sda /mnt/test/ost0 + + + 2. Migrate the data (optional). + +The file system is quite unbalanced when new empty OSTs are added. New file creations are automatically balanced. If this is a scratch file system or files are pruned at a regular interval, then no further work may be needed. +New files being created will preferentially be placed on the empty OST. As old files are deleted, they will release space on the old OST. +Files existing prior to the expansion can optionally be rebalanced with an in-place copy, which can be done with a simple script. The basic method is to copy existing files to a temporary file, then move the temp file over the old one. This should not be attempted with files which are currently being written to by users or applications. This operation redistributes the stripes over the entire set of OSTs. +For example, to rebalance all files within /mnt/lustre/dir, enter: +lfs_migrate /mnt/lustre/dir + +To migrate files within the /test filesystem on OST0004 that are larger than 4GB in size, enter: +lfs find /test -obd test-OST0004 -size +4G | lfs_migrate -y + +See (lfs_migrate) for more details. + 
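The "copy to a temporary file, then move it over the old one" method described in section 14.6 can be sketched as below. This is an illustration of the idea only, not how lfs_migrate is implemented; the function name rebalance_copy is hypothetical, and the sketch assumes the file is not being written to while it runs, as the text requires.

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre command): rewrite one file in place by
# copying it to a temporary file in the same directory and renaming the
# copy over the original. On a Lustre client the copy is a newly created
# file, so its objects are allocated afresh.
# Assumption: nothing is writing to the file during the copy.
rebalance_copy() {
    file=$1
    # Create the temporary file next to the original so that the final
    # rename stays within one directory.
    tmp=$(mktemp "$file.XXXXXX") || return 1
    if cp -p "$file" "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage (path is an example only):
#   rebalance_copy /mnt/lustre/dir/somefile
```

On a production file system lfs_migrate remains the preferred tool; the sketch only shows the basic copy-and-rename step.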
+
+14.7 Removing and Restoring OSTs +OSTs can be removed from and restored to a Lustre file system. Currently in Lustre, removing an OST really means that the OST is 'deactivated' in the file system, not permanently removed. A removed OST still appears in the file system; do not create a new OST with the same name. +You may want to remove (deactivate) an OST and prevent new files from being written to it in several situations: + + Hard drive has failed and a RAID resync/rebuild is underway + - Determining Which Machine is Serving an OST - + OST is nearing its space capacity + + +
+<anchor xml:id="dbdoclet.50438199_pgfId-1298979" xreflabel=""/>14.7.1 Removing an OST from the File System +OSTs can be removed from a Lustre file system. Currently in Lustre, removing an OST actually means that the OST is 'deactivated' from the file system, not permanently removed. A removed OST still appears in the device listing; you should not normally create a new OST with the same name. +You may want to deactivate an OST and prevent new files from being written to it in several situations: + + OST is nearing its space capacity + - Changing the Address of a Failover Node - + Hard drive has failed and a RAID resync/rebuild is underway + - - + OST storage has failed permanently + -
- 14.1 <anchor xml:id="dbdoclet.50438199_85142" xreflabel=""/>Working with <anchor xml:id="dbdoclet.50438199_marker-1298888" xreflabel=""/>Inactive OSTs - To mount a client or an MDT with one or more inactive OSTs, run commands similar to this: - client> mount -o exclude=testfs-OST0000 -t lustre uml1:/testfs\ /mnt/testfs -client> cat /proc/fs/lustre/lov/testfs-clilov-*/target_obd +When removing an OST, remember that the MDT does not communicate directly with OSTs. Rather, each OST has a corresponding OSC which communicates with the MDT. It is necessary to determine the device number of the OSC that corresponds to the OST. Then, you use this device number to deactivate the OSC on the MDT. +To remove an OST from the file system: + + +For the OST to be removed, determine the device number of the corresponding OSC on the MDT. + + List all OSCs on the node, along with their device numbers. Run: +lctl dl|grep " osc " - To activate an inactive OST on a live client or MDT, use the lctl activate command on the OSC device. For example: - lctl --device 7 activate +This is sample lctl dl|grep output: + +11 UP osc lustre-OST-0000-osc-cac94211 4ea5b30f-6a8e-55a0-7519-2f20318ebdb4 5 +12 UP osc lustre-OST-0001-osc-cac94211 4ea5b30f-6a8e-55a0-7519-2f20318ebdb4 5 +13 IN osc lustre-OST-0000-osc lustre-MDT0000-mdtlov_UUID 5 +14 UP osc lustre-OST-0001-osc lustre-MDT0000-mdtlov_UUID 5 + +Determine the device number of the OSC that corresponds to the OST to be removed. + + +Temporarily deactivate the OSC on the MDT. On the MDT, run: - - -A colon-separated list can also be specified. For example, exclude=testfs-OST0000:testfs-OST0001. - - -
- 14.2 Finding <anchor xml:id="dbdoclet.50438199_marker-1298897" xreflabel=""/>Nodes in the Lustre File System - There may be situations in which you need to find all nodes in your Lustre file system or get the names of all OSTs. - To get a list of all Lustre nodes, run this command on the MGS: - # cat /proc/fs/lustre/mgs/MGS/live/* + +$ mdt> lctl --device <devno> deactivate - - -This command must be rund on the MGS. + +For example, based on the command output in Step 1, to deactivate device 13 (the MDT’s OSC for OST-0000), the command would be: - - - In this example, file system lustre has three nodes, lustre-MDT0000, lustre-OST0000, and lustre-OST0001. - cfs21:/tmp# cat /proc/fs/lustre/mgs/MGS/live/* -fsname: lustre -flags: 0x0 gen: 26 -lustre-MDT0000 -lustre-OST0000 -lustre-OST0001 - - To get the names of all OSTs, run this command on the MDS: - # cat /proc/fs/lustre/lov/<fsname>-mdtlov/target_obd + + +$ mdt> lctl --device 13 deactivate - - -This command must be rund on the MDS. + + +This marks the OST as inactive on the MDS, so no new objects are assigned to the OST. This does not prevent use of existing objects for reads or writes. + + +Do not deactivate the OST on the clients. Doing so causes errors (EIOs) and causes the copy out to fail. + +Do not use lctl conf_param to deactivate the OST. It permanently sets a parameter in the file system configuration. + + + +Discover all files that have objects residing on the deactivated OST. + + + +Depending on whether the deactivated OST is available or not, the data from that OST may be migrated to other OSTs, or may need to be restored from backup. - - In this example, there are two OSTs, lustre-OST0000 and lustre-OST0001, which are both active. 
- cfs21:/tmp# cat /proc/fs/lustre/lov/lustre-mdtlov/target_obd -0: lustre-OST0000_UUID ACTIVE -1: lustre-OST0001_UUID ACTIVE + + +If the OST is still online and available, find all files with objects on the deactivated OST, and copy them to other OSTs in the file system: + + + +[client]# lfs find --obd <OST UUID> <mount_point> | lfs_migrate -y -
-
- 14.3 Mounting a Server Without <anchor xml:id="dbdoclet.50438199_marker-1298918" xreflabel=""/>Lustre Service - If you are using a combined MGS/MDT, but you only want to start the MGS and not the MDT, run this command: - mount -t lustre <MDT partition> -o nosvc <mount point> + + +If the OST is no longer available, delete the files on that OST and restore them from backup: + + +[client]# lfs find --obd <OST UUID> -print0 <mount_point> | \ +tee /tmp/files_to_restore | xargs -0 -n 1 unlink - The <MDT partition> variable is the combined MGS/MDT. - In this example, the combined MGS/MDT is testfs-MDT0000 and the mount point is mnt/test/mdt. - $ mount -t lustre -L testfs-MDT0000 -o nosvc /mnt/test/mdt + +The list of files that need to be restored from backup is stored in /tmp/files_to_restore. Restoring these files is beyond the scope of this document. + + + +4. Deactivate the OST. + + +a. To temporarily disable the deactivated OST, enter: + + +[client]# lctl set_param osc.<fsname>-<OST name>-*.active=0 -
-
- 14.4 Regenerating Lustre <anchor xml:id="dbdoclet.50438199_marker-1305736" xreflabel=""/>Configuration Logs - If the Lustre system’s configuration logs are in a state where the file system cannot be started, use the writeconf command to erase them. After the writeconf command is run and the servers restart, the configuration logs are re-generated and stored on the MGS (as in a new file system). - You should only use the writeconf command if: - - The configuration logs are in a state where the file system cannot start - - - A server NID is being changed - - - The writeconf command is destructive to some configuration items (i.e., OST pools information and items set via conf_param), and should be used with caution. To avoid problems: - - Shut down the file system before running the writeconf command - - - Run the writeconf command on all servers (MDT first, then OSTs) - - - Start the file system in this order: - - - - MGS (or the combined MGS/MDT) - - - MDT - - - OSTs - - - Lustre clients - - - - - - -The OST pools feature enables a group of OSTs to be named for file striping purposes. If you use OST pools, be aware that running the writeconf command erases all pools information (as well as any other parameters set via lctl conf_param). We recommend that the pools definitions (and conf_param settings) be executed via a script, so they can be reproduced easily after a writeconf is performed. - - - To regenerate Lustre's system configuration logs: - - Shut down the file system in this order. - - Unmount the clients. - - Unmount the MDT. - - Unmount all OSTs. - - - Make sure the the MDT and OST devices are available. - - Run the writeconf command on all servers. - Run writeconf on the MDT first, and then the OSTs. 
- - On the MDT, run: - <mdt node>$ tunefs.lustre --writeconf <device> +If there is expected to be a replacement OST in some short time (a few days), the OST can temporarily be deactivated on the clients: +Note - This setting is only temporary and will be reset if the clients or MDS are rebooted. It needs to be run on all clients. + + +b. To permanently disable the deactivated OST, enter: + + +[mgs]# lctl conf_param {OST name}.osc.active=0 - - On each OST, run: - <ost node>$ tunefs.lustre --writeconf <device> + + +If there is not expected to be a replacement for this OST in the near future, permanently deactivate the OST on all clients and the MDS: +Note - A removed OST still appears in the file system; do not create a new OST with the same name. +14.7.2 Backing Up OST Configuration Files + +If the OST device is still accessible, then the Lustre configuration files on the OST should be backed up and saved for future use in order to avoid difficulties when a replacement OST is returned to service. These files rarely change, so they can and should be backed up while the OST is functional and accessible. If the deactivated OST is still available to mount (i.e. has not permanently failed or is unmountable due to severe corruption), an effort should be made to preserve these files. + + +1. Mount the OST filesystem. + + +[oss]# mkdir -p /mnt/ost +[oss]# mount -t ldiskfs {ostdev} /mnt/ost - - Restart the file system in this order. - - Mount the MGS (or the combined MGS/MDT). - - Mount the MDT. - - Mount the OSTs. - - Mount the clients. - - After the writeconf command is run, the configuration logs are re-generated as servers restart. - -
-
- 14.5 Changing a <anchor xml:id="dbdoclet.50438199_marker-1305737" xreflabel=""/>Server NID - If you need to change the NID on the MDT or an OST, run the writeconf command to erase Lustre configuration information (including server NIDs), and then re-generate the system configuration using updated server NIDs. - Change a server NID in these situations: - - New server hardware is added to the file system, and the MDS or an OSS is being moved to the new machine - - - New network card is installed in the server - - - You want to reassign IP addresses - - - To change a server NID: - - Update the LNET configuration in the /etc/modprobe.conf file so the list of server NIDs (lctl list_nids) is correct. - The lctl list_nids command indicates which network(s) are configured to work with Lustre. - - Shut down the file system in this order. - - Unmount the clients. - - Unmount the MDT. - - Unmount all OSTs. - - - Run the writeconf command on all servers. - Run writeconf on the MDT first, and then the OSTs. - - On the MDT, run: - <mdt node>$ tunefs.lustre --writeconf <device> + + +2. Back up the OST configuration files. + + +[oss]# tar cvf {ostname}.tar -C /mnt/ost last_rcvd \ +CONFIGS/ O/0/LAST_ID - - On each OST, run: - <ost node>$ tunefs.lustre --writeconf <device> + + +3. Unmount the OST filesystem. + + +[oss]# umount /mnt/ost - - If the NID on the MGS was changed, communicate the new MGS location to each server. Run: - tunefs.lustre --erase-param --mgsnode=<new_nid(s)> --writeconf /dev/.. + + +14.7.3 Restoring OST Configuration Files + +If the original OST is still available, it is best to follow the OST backup and restore procedure given in either Backing Up and Restoring an MDS or OST (Device Level), or Making a File-Level Backup of an OST File System and Restoring a File-Level Backup. 
+ +To replace an OST that was removed from service due to corruption or hardware failure, the file system needs to be formatted for Lustre, and the Lustre configuration should be restored, if available. + +If the OST configuration files were not backed up, due to the OST file system being completely inaccessible, it is still possible to replace the failed OST with a new one at the same OST index. + + +1. Format the OST file system. + + +[oss]# mkfs.lustre --ost --index {OST index} {other options} \ +{newdev} - - - Restart the file system in this order. - - Mount the MGS (or the combined MGS/MDT). - - Mount the MDT. - - Mount the OSTs. - - Mount the clients. - - - After the writeconf command is run, the configuration logs are re-generated as servers restart, and server NIDs in the updated list_nids file are used. -
-
- 14.6 Adding a New <anchor xml:id="dbdoclet.50438199_marker-1306353" xreflabel=""/>OST to a Lustre File System - To add an OST to existing Lustre file system: - - 1. Add a new OST by passing on the following commands, run: - $ mkfs.lustre --fsname=spfs --ost --mgsnode=mds16@tcp0 /dev/sda -$ mkdir -p /mnt/test/ost0 -$ mount -t lustre /dev/sda /mnt/test/ost0 + + +2. Mount the OST filesystem. + + +[oss]# mkdir /mnt/ost +[oss]# mount -t ldiskfs {newdev} /mnt/ost - - 2. Migrate the data (possibly). - - The file system is quite unbalanced when new empty OSTs are added. New file creations are automatically balanced. If this is a scratch file system or files are pruned at a regular interval, then no further work may be needed. - New files being created will preferentially be placed on the empty OST. As old files are deleted, they will release space on the old OST. - Files existing prior to the expansion can optionally be rebalanced with an in-place copy, which can be done with a simple script. The basic method is to copy existing files to a temporary file, then move the temp file over the old one. This should not be attempted with files which are currently being written to by users or applications. This operation redistributes the stripes over the entire set of OSTs. - For example, to rebalance all files within /mnt/lustre/dir, enter: - lfs_migrate /mnt/lustre/file + + +3. Restore the OST configuration files, if available. + + +[oss]# tar xvf {ostname}.tar -C /mnt/ost - To migrate files within the /test filesystem on OST0004 that are larger than 4GB in size, enter: - lfs find /test -obd test-OST0004 -size +4G | lfs_migrate -y + + +4. Recreate the OST configuration files, if unavailable. + +Follow the procedure in Fixing a Bad LAST_ID on an OST to recreate the LAST_ID file for this OST index. The last_rcvd file will be recreated when the OST is first mounted using the default parameters, which are normally correct for all file systems. 
+ +The CONFIGS/mountdata file is created by mkfs.lustre at format time, but has flags set that request it to register itself with the MGS. It is possible to copy these flags from another working OST (which should be the same): + + +[oss2]# debugfs -c -R "dump CONFIGS/mountdata /tmp/ldd" \ +{other_osdev} +[oss2]# scp /tmp/ldd oss:/tmp/ldd +[oss]# dd if=/tmp/ldd of=/mnt/ost/CONFIGS/mountdata bs=4 count=1 \ +seek=5 skip=5 -See (lfs_migrate) for more details. -
-
- 14.7 Removing<anchor xml:id="dbdoclet.50438199_marker-1298976" xreflabel=""/> and Restoring OSTs - OSTs can be removed from and restored to a Lustre file system. Currently in Lustre, removing an OST really means that the OST is 'deactivated' in the file system, not permanently removed. A removed OST still appears in the file system; do not create a new OST with the same name. - You may want to remove (deactivate) an OST and prevent new files from being written to it in several situations: - - Hard drive has failed and a RAID resync/rebuild is underway - - - OST is nearing its space capacity - - -
- <anchor xml:id="dbdoclet.50438199_pgfId-1298979" xreflabel=""/>14.7.1 Removing an OST from the File System - OSTs can be removed from a Lustre file system. Currently in Lustre, removing an OST actually means that the OST is 'deactivated' from the file system, not permanently removed. A removed OST still appears in the device listing; you should not normally create a new OST with the same name. - You may want to deactivate an OST and prevent new files from being written to it in several situations: - - OST is nearing its space capacity - - - Hard drive has failed and a RAID resync/rebuild is underway - - - OST storage has failed permanently - - - When removing an OST, remember that the MDT does not communicate directly with OSTs. Rather, each OST has a corresponding OSC which communicates with the MDT. It is necessary to determine the device number of the OSC that corresponds to the OST. Then, you use this device number to deactivate the OSC on the MDT. - To remove an OST from the file system: - - For the OST to be removed, determine the device number of the corresponding OSC on the MDT. + +5. Unmount the OST filesystem. + + +[oss]# umount /mnt/ost + - To list all OSCs on the node, along with their device numbers. Run: - lctldl|grep " osc " + +14.7.4 Returning a Deactivated OST to Service + +If the OST was permanently deactivated, it needs to be reactivated in the MGS configuration. + + +[mgs]# lctl conf_param {OST name}.osc.active=1 + + +If the OST was temporarily deactivated, it needs to be reactivated on the MDS and clients. + + +[mds]# lctl --device <devno> activate +[client]# lctl set_param osc.<fsname>-<OST name>-*.active=1 - This is sample lctldl|grep - -
-
-
- + +14.8 Aborting Recovery + +You can abort recovery either with the lctl utility or by mounting the target with the abort_recov option (mount -o abort_recov). When starting a target, run: + + +$ mount -t lustre -L <MDT name> -o abort_recov <mount_point> + + +Note - The recovery process is blocked until all OSTs are available. +14.9 Determining Which Machine is Serving an OST + +In the course of administering a Lustre file system, you may need to determine which machine is serving a specific OST. This is not as simple as identifying the machine’s IP address: IP is only one of several networking protocols that Lustre uses, so LNET identifies nodes by NID rather than by IP address. + +To identify the NID that is serving a specific OST, run one of the following commands on a client (you do not need to be a root user): + + +client$ lctl get_param osc.${fsname}-${OSTname}*.ost_conn_uuid + + +For example: + + +client$ lctl get_param osc.*-OST0000*.ost_conn_uuid +osc.lustre-OST0000-osc-f1579000.ost_conn_uuid=192.168.20.1@tcp + + +- OR - + + +client$ lctl get_param osc.*.ost_conn_uuid +osc.lustre-OST0000-osc-f1579000.ost_conn_uuid=192.168.20.1@tcp +osc.lustre-OST0001-osc-f1579000.ost_conn_uuid=192.168.20.1@tcp +osc.lustre-OST0002-osc-f1579000.ost_conn_uuid=192.168.20.1@tcp +osc.lustre-OST0003-osc-f1579000.ost_conn_uuid=192.168.20.1@tcp +osc.lustre-OST0004-osc-f1579000.ost_conn_uuid=192.168.20.1@tcp + + +14.10 Changing the Address of a Failover Node + +To change the address of a failover node (e.g., to use node X instead of node Y), run this command on the OSS/OST partition: + + +tunefs.lustre --erase-params --failnode=<NID> <device> + + + + + + + 
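For section 14.9, the ost_conn_uuid output can be reduced to one "OST NID" pair per line. The sketch below is a convenience wrapper under stated assumptions: the function name ost_to_nid is hypothetical (not a Lustre command), and the expression assumes parameter names of the form osc.<fsname>-OSTxxxx-osc-<id>.ost_conn_uuid, as in the sample output.

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre command): turn ost_conn_uuid lines read
# from stdin into "<fsname>-OSTxxxx <NID>" pairs.
# Assumed input layout, taken from the example output:
#   osc.lustre-OST0000-osc-f1579000.ost_conn_uuid=192.168.20.1@tcp
ost_to_nid() {
    sed -n 's/^osc\.\(.*-OST[0-9a-fA-F]*\)-osc-.*\.ost_conn_uuid=\(.*\)$/\1 \2/p'
}

# Sample input copied from the example. On a real client the input would
# instead come from:
#   lctl get_param osc.*.ost_conn_uuid
printf '%s\n' \
    'osc.lustre-OST0000-osc-f1579000.ost_conn_uuid=192.168.20.1@tcp' \
    'osc.lustre-OST0001-osc-f1579000.ost_conn_uuid=192.168.20.1@tcp' | ost_to_nid
```

Lines that do not match the assumed layout are dropped silently because sed runs with -n and prints only successful substitutions.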
+
+
+