From 0ea900fc42b8328b46326df2534df5f5f429bdb9 Mon Sep 17 00:00:00 2001 From: Andreas Dilger Date: Wed, 17 Aug 2011 15:33:05 -0600 Subject: [PATCH] LUDOC-11 Misc cleanups of examples and text Clean up some examples seen in the manual while working on LUDOC-14. Use "ost0" and "ost1" in examples, to match the actual OST indices used, instead of "ost1" and "ost2". Same for "oss0" and "oss1", etc. Use "lctl {get,set}_param" instead of direct /proc access in a few places (there are a lot more that should be fixed). Signed-off-by: Andreas Dilger Change-Id: Ibeb4e3318b6fbdb8a81283d5cfad14a69e27443b --- BackupAndRestore.xml | 86 +++++++++++++++++++++++------------------------- BenchmarkingTests.xml | 52 +++++++++++++---------------- ConfiguringLustre.xml | 80 +++++++++++++++++++++----------------------- ConfiguringQuotas.xml | 3 -- LustreMaintenance.xml | 21 +++++++----- LustreOperations.xml | 60 ++++++++++++++++----------------- ManagingFileSystemIO.xml | 27 +++++++-------- ManagingSecurity.xml | 2 +- 8 files changed, 158 insertions(+), 173 deletions(-) diff --git a/BackupAndRestore.xml b/BackupAndRestore.xml index dd69a81..e536e15 100644 --- a/BackupAndRestore.xml +++ b/BackupAndRestore.xml @@ -1,7 +1,7 @@ Backing Up and Restoring a File System - Lustre provides backups at the file system-level, device-level and file-level. This chapter describes how to backup and restore on Lustre, and includes the following sections: + Lustre provides backups at the filesystem-level, device-level and file-level. This chapter describes how to backup and restore on Lustre, and includes the following sections: @@ -29,7 +29,7 @@ rsyncbackup Backing up a File System - Backing up a complete file system gives you full control over the files to back up, and allows restoration of individual files as needed. File system-level backups are also the easiest to integrate into existing backup solutions. + Backing up a complete file system gives you full control over the files to back up, and allows restoration of individual files as needed. Filesystem-level backups are also the easiest to integrate into existing backup solutions. File system backups are performed from a Lustre client (or many clients working parallel in different directories) rather than on individual server nodes; this is no different than backing up any other file system. However, due to the large size of most Lustre file systems, it is not always possible to get a complete backup. We recommend that you back up subsets of a file system. This includes subdirectories of the entire file system, filesets for a single user, files incremented by date, and so on. @@ -156,11 +156,11 @@ <indexterm><primary>backup</primary><secondary>rsync</secondary><tertiary>examples</tertiary></indexterm><literal>lustre_rsync</literal> Examples Sample lustre_rsync commands are listed below. Register a changelog user for an MDT (e.g. lustre-MDT0000). - # lctl --device lustre-MDT0000 changelog_register lustre-MDT0000 Registered\ - changelog userid 'cl1' + # lctl --device lustre-MDT0000 changelog_register lustre-MDT0000 +Registered changelog userid 'cl1' Synchronize a Lustre file system (/mnt/lustre) to a target file system (/mnt/target). 
- $ lustre_rsync --source=/mnt/lustre --target=/mnt/target --mdt=lustre-MDT0000 \ ---user=cl1 --statuslog sync.log --verbose + $ lustre_rsync --source=/mnt/lustre --target=/mnt/target \ + --mdt=lustre-MDT0000 --user=cl1 --statuslog sync.log --verbose Lustre filesystem: lustre MDT device: lustre-MDT0000 Source: /mnt/lustre @@ -184,9 +184,8 @@ Errors: 0 lustre_rsync took 2 seconds Changelog records consumed: 42 To synchronize a Lustre file system (/mnt/lustre) to two target file systems (/mnt/target1 and /mnt/target2). - $ lustre_rsync --source=/mnt/lustre --target=/mnt/target1 --target=/mnt/tar\ -get2 \ - --mdt=lustre-MDT0000 --user=cl1 + $ lustre_rsync --source=/mnt/lustre --target=/mnt/target1 \ + --target=/mnt/target2 --mdt=lustre-MDT0000 --user=cl1 \ --statuslog sync.log @@ -203,8 +202,8 @@ get2 \ If hardware replacement is the reason for the backup or if a spare storage device is available, it is possible to do a raw copy of the MDT or OST from one block device to the other, as long as the new device is at least as large as the original device. To do this, run: dd if=/dev/{original} of=/dev/{new} bs=1M If hardware errors cause read problems on the original device, use the command below to allow as much data as possible to be read from the original device while skipping sections of the disk with errors: - dd if=/dev/{original} of=/dev/{new} bs=4k conv=sync,noerror count={original\ - size in 4kB blocks} + dd if=/dev/{original} of=/dev/{new} bs=4k conv=sync,noerror / + count={original size in 4kB blocks} Even in the face of hardware errors, the ldiskfs file system is very robust and it may be possible to recover the file system data after running e2fsck -f on the new device.
@@ -230,7 +229,7 @@ get2 \ Back up the extended attributes. [oss]# getfattr -R -d -m '.*' -e hex -P . > ea-$(date +%Y%m%d).bak - If the tar(1) command supports the --xattr option, the getfattr step may be unnecessary as long as it does a backup of the "trusted" attributes. However, completing this step is not harmful and can serve as an added safety measure. + If the tar(1) command supports the --xattr option, the getfattr step may be unnecessary as long as tar does a backup of the "trusted.*" attributes. However, completing this step is not harmful and can serve as an added safety measure. In most distributions, the getfattr command is part of the "attr" package. If the getfattr command returns errors like Operation not supported, then the kernel does not correctly support EAs. Stop and use a different backup method. @@ -319,20 +318,20 @@ trusted.fid= \ Create LVM devices for your MDT and OST targets. Make sure not to use the entire disk for the targets; save some room for the snapshots. The snapshots start out as 0 size, but grow as you make changes to the current file system. If you expect to change 20% of the file system between backups, the most recent snapshot will be 20% of the target size, the next older one will be 40%, etc. Here is an example: cfs21:~# pvcreate /dev/sda1 Physical volume "/dev/sda1" successfully created -cfs21:~# vgcreate volgroup /dev/sda1 - Volume group "volgroup" successfully created -cfs21:~# lvcreate -L200M -nMDT volgroup - Logical volume "MDT" created -cfs21:~# lvcreate -L200M -nOST0 volgroup +cfs21:~# vgcreate vgmain /dev/sda1 + Volume group "vgmain" successfully created +cfs21:~# lvcreate -L200G -nMDT0 vgmain + Logical volume "MDT0" created +cfs21:~# lvcreate -L200G -nOST0 vgmain Logical volume "OST0" created cfs21:~# lvscan - ACTIVE '/dev/volgroup/MDT' [200.00 MB] inherit - ACTIVE '/dev/volgroup/OST0' [200.00 MB] inherit + ACTIVE '/dev/vgmain/MDT0' [200.00 GB] inherit + ACTIVE '/dev/vgmain/OST0' [200.00 GB] inherit Format the LVM volumes as Lustre targets. In this example, the backup file system is called 'main' and designates the current, most up-to-date backup. - cfs21:~# mkfs.lustre --fsname=main --mdt --index=0 /dev/volgroup/MDT + cfs21:~# mkfs.lustre --fsname=main --mdt --index=0 /dev/vgmain/MDT0 No management node specified, adding MGS to this MDT. 
Permanent disk data: Target: main-MDT0000 @@ -344,15 +343,15 @@ cfs21:~# lvscan Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr Parameters: checking for existing Lustre data - device size = 200MB - formatting backing filesystem ldiskfs on /dev/volgroup/MDT - target name main-MDTffff + device size = 200GB + formatting backing filesystem ldiskfs on /dev/vgmain/MDT0 + target name main-MDT0000 4k blocks 0 options -i 4096 -I 512 -q -O dir_index -F mkfs_cmd = mkfs.ext2 -j -b 4096 -L main-MDTffff -i 4096 -I 512 -q - -O dir_index -F /dev/volgroup/MDT + -O dir_index -F /dev/vgmain/MDT0 Writing CONFIGS/mountdata -cfs21:~# mkfs.lustre --mgsnode=cfs21 --fsname=main --ost --index=0 /dev/volgroup/OST0 +cfs21:~# mkfs.lustre --mgsnode=cfs21 --fsname=main --ost --index=0 /dev/vgmain/OST0 Permanent disk data: Target: main-OST0000 Index: 0 @@ -363,17 +362,17 @@ cfs21:~# mkfs.lustre --mgsnode=cfs21 --fsname=main --ost --index=0 /dev/volgroup Persistent mount opts: errors=remount-ro,extents,mballoc Parameters: mgsnode=192.168.0.21@tcp checking for existing Lustre data - device size = 200MB - formatting backing filesystem ldiskfs on /dev/volgroup/OST0 + device size = 200GB + formatting backing filesystem ldiskfs on /dev/vgmain/OST0 target name main-OST0000 4k blocks 0 options -I 256 -q -O dir_index -F mkfs_cmd = mkfs.ext2 -j -b 4096 -L lustre-OSTffff -J size=400 -I 256 -i 262144 -O extents,uninit_bg,dir_nlink,huge_file,flex_bg -G 256 - -E resize=4290772992,lazy_journal_init, -F /dev/volgroup/OST0 + -E resize=4290772992,lazy_journal_init, -F /dev/vgmain/OST0 Writing CONFIGS/mountdata -cfs21:~# mount -t lustre /dev/volgroup/MDT /mnt/mdt -cfs21:~# mount -t lustre /dev/volgroup/OST0 /mnt/ost +cfs21:~# mount -t lustre /dev/vgmain/MDT0 /mnt/mdt +cfs21:~# mount -t lustre /dev/vgmain/OST0 /mnt/ost cfs21:~# mount -t lustre cfs21:/main /mnt/main @@ -394,12 +393,12 @@ fstab passwd You can create as many snapshots as you have room for in the volume group. If necessary, you can dynamically add disks to the volume group. The snapshots of the target MDT and OSTs should be taken at the same point in time. Make sure that the cronjob updating the backup file system is not running, since that is the only thing writing to the disks. Here is an example: cfs21:~# modprobe dm-snapshot -cfs21:~# lvcreate -L50M -s -n MDTb1 /dev/volgroup/MDT +cfs21:~# lvcreate -L50M -s -n MDT0.b1 /dev/vgmain/MDT0 Rounding up size to full physical extent 52.00 MB - Logical volume "MDTb1" created -cfs21:~# lvcreate -L50M -s -n OSTb1 /dev/volgroup/OST0 + Logical volume "MDT0.b1" created +cfs21:~# lvcreate -L50M -s -n OST0.b1 /dev/vgmain/OST0 Rounding up size to full physical extent 52.00 MB - Logical volume "OSTb1" created + Logical volume "OST.b1" created After the snapshots are taken, you can continue to back up new/changed files to "main". The snapshots will not contain the new files. cfs21:~# cp /etc/termcap /mnt/main cfs21:~# ls /mnt/main @@ -412,7 +411,7 @@ fstab passwd termcap Rename the LVM snapshot. Rename the file system snapshot from "main" to "back" so you can mount it without unmounting "main". This is recommended, but not required. Use the --reformat flag to tunefs.lustre to force the name change. 
For example: - cfs21:~# tunefs.lustre --reformat --fsname=back --writeconf /dev/volgroup/MDTb1 + cfs21:~# tunefs.lustre --reformat --fsname=back --writeconf /dev/vgmain/MDT0.b1 checking for existing Lustre data found Lustre data Reading CONFIGS/mountdata @@ -435,7 +434,7 @@ Permanent disk data: Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr Parameters: Writing CONFIGS/mountdata -cfs21:~# tunefs.lustre --reformat --fsname=back --writeconf /dev/volgroup/OSTb1 +cfs21:~# tunefs.lustre --reformat --fsname=back --writeconf /dev/vgmain/OST0.b1 checking for existing Lustre data found Lustre data Reading CONFIGS/mountdata @@ -459,24 +458,23 @@ Permanent disk data: Parameters: mgsnode=192.168.0.21@tcp Writing CONFIGS/mountdata When renaming an FS, we must also erase the last_rcvd file from the snapshots -cfs21:~# mount -t ldiskfs /dev/volgroup/MDTb1 /mnt/mdtback +cfs21:~# mount -t ldiskfs /dev/vgmain/MDT0.b1 /mnt/mdtback cfs21:~# rm /mnt/mdtback/last_rcvd cfs21:~# umount /mnt/mdtback -cfs21:~# mount -t ldiskfs /dev/volgroup/OSTb1 /mnt/ostback +cfs21:~# mount -t ldiskfs /dev/vgmain/OST0.b1 /mnt/ostback cfs21:~# rm /mnt/ostback/last_rcvd cfs21:~# umount /mnt/ostback Mount the file system from the LVM snapshot. For example: - cfs21:~# mount -t lustre /dev/volgroup/MDTb1 /mnt/mdtback + cfs21:~# mount -t lustre /dev/vgmain/MDT0.b1 /mnt/mdtback -cfs21:~# mount -t lustre /dev/volgroup/OSTb1 /mnt/ostback +cfs21:~# mount -t lustre /dev/vgmain/OST0.b1 /mnt/ostback cfs21:~# mount -t lustre cfs21:/back /mnt/back - Note the old directory contents, as of the snapshot time. - For example: + Note the old directory contents, as of the snapshot time. For example: cfs21:~/cfs/b1_5/lustre/utils# ls /mnt/back fstab passwds @@ -485,12 +483,12 @@ fstab passwds
<indexterm><primary>backup</primary><secondary>using LVM</secondary><tertiary>deleting</tertiary></indexterm>Deleting Old Snapshots To reclaim disk space, you can erase old snapshots as your backup policy dictates. Run: - lvremove /dev/volgroup/MDTb1 + lvremove /dev/vgmain/MDT0.b1
<indexterm><primary>backup</primary><secondary>using LVM</secondary><tertiary>resizing</tertiary></indexterm>Changing Snapshot Volume Size You can also extend or shrink snapshot volumes if you find your daily deltas are smaller or larger than expected. Run: - lvextend -L10G /dev/volgroup/MDTb1 + lvextend -L10G /dev/vgmain/MDT0.b1 Extending snapshots seems to be broken in older LVM. It is working in LVM v2.02.01. diff --git a/BenchmarkingTests.xml b/BenchmarkingTests.xml index 08db410..f05d8f3 100644 --- a/BenchmarkingTests.xml +++ b/BenchmarkingTests.xml @@ -143,8 +143,8 @@ The stdout and the .summary file will contain lines like this: - total_size 8388608K rsz 1024 thr 1 crg 1 180.45 MB/s 1 x 180.50 \=/ 180.50 \ -MB/s + total_size 8388608K rsz 1024 thr 1 crg 1 180.45 MB/s 1 x 180.50 \ + = 180.50 MB/s Each line corresponds to a run of the test. Each test run will have a different number of threads, record size, or number of regions. @@ -256,8 +256,8 @@ MB/s Run the obdfilter_survey script with the target=parameter. For example, to run a local test with up to two objects (nobjhi), up to two threads (thrhi), and 1024 Mb (size) transfer size: - $ nobjhi=2 thrhi=2 size=1024 targets='lustre-OST0001 \ -lustre-OST0002' sh obdfilter-survey + $ nobjhi=2 thrhi=2 size=1024 targets="lustre-OST0001 \ + lustre-OST0002" sh obdfilter-survey
@@ -281,8 +281,8 @@ lustre-OST0002' sh obdfilter-survey Run the obdfilter_survey script with the parameters case=network and targets=<hostname|ip_of_server>. For example: - $ nobjhi=2 thrhi=2 size=1024 targets='oss1 oss2' case=network sh obdfilte\ -sh odbfilter-survey + $ nobjhi=2 thrhi=2 size=1024 targets="oss0 oss1" \ + case=network sh odbfilter-survey On the server side, view the statistics at: @@ -327,9 +327,9 @@ sh odbfilter-survey On the OSS nodes to be tested, run the lctl dl command. The OSC device names are listed in the fourth column of the output. For example: $ lctl dl |grep obdfilter 3 UP osc lustre-OST0000-osc-ffff88007754bc00 \ - 54b91eab-0ea9-1516-b571-5e6df349592e 5 + 54b91eab-0ea9-1516-b571-5e6df349592e 5 4 UP osc lustre-OST0001-osc-ffff88007754bc00 \ - 54b91eab-0ea9-1516-b571-5e6df349592e 5 + 54b91eab-0ea9-1516-b571-5e6df349592e 5 ... @@ -341,9 +341,8 @@ sh odbfilter-survey Run the obdfilter_survey script with the target=parameter and case=netdisk. An example of a local test run with up to two objects (nobjhi), up to two threads (thrhi), and 1024 Mb (size) transfer size is shown below: $ nobjhi=2 thrhi=2 size=1024 \ - targets="lustre-OST0000-osc-ffff88007754bc00 \ - lustre-OST0001-osc-ffff88007754bc00" \ - sh obdfilter-survey + targets="lustre-OST0000-osc-ffff88007754bc00 \ + lustre-OST0001-osc-ffff88007754bc00" sh obdfilter-survey @@ -516,25 +515,20 @@ sh odbfilter-survey $ ./ost-survey.sh 10 /mnt/lustre Typical output is: - Average read Speed: 6.73 -Average write Speed: 5.41 -read - Worst OST indx 0 5.84 MB/s -write - Worst OST indx 0 3.77 MB/s -read - Best OST indx 1 7.38 MB/s -write - Best OST indx 1 6.31 MB/s + +Average read Speed: 6.73 +Average write Speed: 5.41 +read - Worst OST indx 0 5.84 MB/s +write - Worst OST indx 0 3.77 MB/s +read - Best OST indx 1 7.38 MB/s +write - Best OST indx 1 6.31 MB/s 3 OST devices found -Ost index 0 Read speed 5.84 Write speed \ - 3.77 -Ost index 0 Read time 0.17 Write time \ - 0.27 -Ost index 1 Read speed 7.38 Write speed \ - 6.31 -Ost index 1 Read time 0.14 Write time \ - 0.16 -Ost index 2 Read speed 6.98 Write speed \ - 6.16 -Ost index 2 Read time 0.14 Write time \ - 0.16 +Ost index 0 Read speed 5.84 Write speed 3.77 +Ost index 0 Read time 0.17 Write time 0.27 +Ost index 1 Read speed 7.38 Write speed 6.31 +Ost index 1 Read time 0.14 Write time 0.16 +Ost index 2 Read speed 6.98 Write speed 6.16 +Ost index 2 Read time 0.14 Write time 0.16
diff --git a/ConfiguringLustre.xml b/ConfiguringLustre.xml index 1d4c1cf..8865ff7 100644 --- a/ConfiguringLustre.xml +++ b/ConfiguringLustre.xml @@ -215,7 +215,7 @@ MGS/MDS node - mdt1 + mdt0 MDS in Lustre file system temp @@ -246,7 +246,7 @@ /mnt/mdt - Mount point for the mdt1 block device (/dev/sdb) on the MGS/MDS node + Mount point for the mdt0 block device (/dev/sdb) on the MGS/MDS node @@ -262,7 +262,7 @@ OSS node - oss1 + oss0 First OSS node in Lustre file system temp @@ -276,7 +276,7 @@ OST - ost1 + ost0 First OST in Lustre file system temp @@ -293,7 +293,7 @@ /dev/sdc - Block device for the first OSS node (oss1) + Block device for the first OSS node (oss0) @@ -304,10 +304,10 @@ mount point - /mnt/ost1 + /mnt/ost0 - Mount point for the ost1 block device (/dev/sdc) on the oss1 node + Mount point for the ost0 block device (/dev/sdc) on the oss1 node @@ -323,7 +323,7 @@ OSS node - oss2 + oss1 Second OSS node in Lustre file system temp @@ -337,7 +337,7 @@ OST - ost2 + ost1 Second OST in Lustre file system temp @@ -352,7 +352,7 @@ /dev/sdd - Block device for the second OSS node (oss2) + Block device for the second OSS node (oss1) @@ -363,10 +363,10 @@ mount point - /mnt/ost2 + /mnt/ost1 - Mount point for the ost2 block device (/dev/sdd) on the oss2 node + Mount point for the ost1 block device (/dev/sdd) on the oss1 node @@ -440,18 +440,18 @@ Writing CONFIGS/mountdata [root@mds /]# mount -t lustre /dev/sdb /mnt/mdt This command generates this output: Lustre: temp-MDT0000: new disk, initializing -Lustre: 3009:0:(lproc_mds.c:262:lprocfs_wr_group_upcall()) temp-MDT0000: gr\ -oup upcall set to /usr/sbin/l_getgroups +Lustre: 3009:0:(lproc_mds.c:262:lprocfs_wr_group_upcall()) temp-MDT0000: +group upcall set to /usr/sbin/l_getgroups Lustre: temp-MDT0000.mdt: set parameter group_upcall=/usr/sbin/l_getgroups Lustre: Server temp-MDT0000 on device /dev/sdb has started - Create and mount ost1. - In this example, the OSTs (ost1 and ost2) are being created on different OSSs (oss1 and oss2 respectively). + Create and mount ost0. + In this example, the OSTs (ost0 and ost1) are being created on different OSS nodes (oss0 and oss1 respectively). - Create ost1. On oss1 node, run: - [root@oss1 /]# mkfs.lustre --fsname=temp --mgsnode=10.2.0.1@tcp0 --ost --index=0 /dev/sdc + Create ost0. On oss0 node, run: + [root@oss0 /]# mkfs.lustre --fsname=temp --mgsnode=10.2.0.1@tcp0 --ost --index=0 /dev/sdc The command generates this output: Permanent disk data: Target: temp-OST0000 @@ -475,8 +475,8 @@ dir_index,uninit_groups -F /dev/sdc Writing CONFIGS/mountdata - Mount ost1 on the OSS on which it was created. On oss1 node, run: - root@oss1 /] mount -t lustre /dev/sdc /mnt/ost1 + Mount ost0 on the OSS on which it was created. On oss0 node, run: + root@oss0 /] mount -t lustre /dev/sdc /mnt/ost0 The command generates this output: LDISKFS-fs: file extents enabled LDISKFS-fs: mballoc enabled @@ -489,12 +489,12 @@ Lustre: MDS temp-MDT0000: temp-OST0000_UUID now active, resetting orphans - Create and mount ost2. + Create and mount ost1. - Create ost2. On oss2 node, run: - [root@oss2 /]# mkfs.lustre --fsname=temp --mgsnode=10.2.0.1@tcp0 \ ---ost --index=1 /dev/sdd + Create ost1. On oss1 node, run: + [root@oss1 /]# mkfs.lustre --fsname=temp --mgsnode=10.2.0.1@tcp0 \ + --ost --index=1 /dev/sdd The command generates this output: Permanent disk data: Target: temp-OST0001 @@ -518,8 +518,8 @@ dir_index,uninit_groups -F /dev/sdc Writing CONFIGS/mountdata - Mount ost2 on the OSS on which it was created. 
On oss2 node, run: - root@oss2 /] mount -t lustre /dev/sdd /mnt/ost2 + Mount ost1 on the OSS on which it was created. On oss1 node, run: + root@oss1 /] mount -t lustre /dev/sdd /mnt/ost1 The command generates this output: LDISKFS-fs: file extents enabled LDISKFS-fs: mballoc enabled @@ -544,27 +544,23 @@ Lustre: MDS temp-MDT0000: temp-OST0001_UUID now active, resetting orphans Run the lfs df -h command: [root@client1 /] lfs df -h The lfs df -h command lists space usage per OST and the MDT in human-readable format. This command generates output similar to this: - UUID bytes Used Available \ - Use% Mounted on -temp-MDT0000_UUID 8.0G 400.0M 7.6G 0% \ -/lustre[MDT:0] -temp-OST0000_UUID 800.0G 400.0M 799.6G 0% \ -/lustre[OST:0] -temp-OST0001_UUID 800.0G 400.0M 799.6G 0% \ -/lustre[OST:1] -filesystem summary: 1.6T 800.0M 1.6T \ -0% /lustre + +UUID bytes Used Available Use% Mounted on +temp-MDT0000_UUID 8.0G 400.0M 7.6G 0% /lustre[MDT:0] +temp-OST0000_UUID 800.0G 400.0M 799.6G 0% /lustre[OST:0] +temp-OST0001_UUID 800.0G 400.0M 799.6G 0% /lustre[OST:1] +filesystem summary: 1.6T 800.0M 1.6T 0% /lustre Run the lfs df -ih command. [root@client1 /] lfs df -ih The lfs df -ih command lists inode usage per OST and the MDT. This command generates output similar to this: - UUID Inodes IUsed IFree IUse% Mounted\ - on -temp-MDT0000_UUID 2.5M 32 2.5M 0% /lustre[MDT:0] -temp-OST0000_UUID 5.5M 54 5.5M 0% /lustre[OST:0] -temp-OST0001_UUID 5.5M 54 5.5M 0% /lustre[OST:1] -filesystem summary: 2.5M 32 2.5M 0% /lustre + +UUID Inodes IUsed IFree IUse% Mounted on +temp-MDT0000_UUID 2.5M 32 2.5M 0% /lustre[MDT:0] +temp-OST0000_UUID 5.5M 54 5.5M 0% /lustre[OST:0] +temp-OST0001_UUID 5.5M 54 5.5M 0% /lustre[OST:1] +filesystem summary: 2.5M 32 2.5M 0% /lustre Run the dd command: diff --git a/ConfiguringQuotas.xml b/ConfiguringQuotas.xml index fa124b7..be9fe46 100644 --- a/ConfiguringQuotas.xml +++ b/ConfiguringQuotas.xml @@ -81,9 +81,6 @@ ksocklnd 111812 1 The Lustre mount command no longer recognizes the usrquota and grpquota options. If they were previously specified, remove them from /etc/fstab. When quota is enabled, it is enabled for all file system clients (started automatically using quota_type or manually with lfs quotaon). - - Lustre with the Linux kernel 2.4 does not support quotas. - To enable quotas automatically when the file system is started, you must set the mdt.quota_type and ost.quota_type parameters, respectively, on the MDT and OSTs. The parameters can be set to the string u (user), g (group) or ug for both users and groups. You can enable quotas at mkfs time (mkfs.lustre --param mdt.quota_type=ug) or with tunefs.lustre. As an example: tunefs.lustre --param ost.quota_type=ug $ost_dev diff --git a/LustreMaintenance.xml b/LustreMaintenance.xml index 82d4889..6c75b6e 100644 --- a/LustreMaintenance.xml +++ b/LustreMaintenance.xml @@ -43,7 +43,8 @@ maintanceinactive OSTs Working with Inactive OSTs To mount a client or an MDT with one or more inactive OSTs, run commands similar to this: - client> mount -o exclude=testfs-OST0000 -t lustre uml1:/testfs\ /mnt/testfs + client> mount -o exclude=testfs-OST0000 -t lustre \ + uml1:/testfs /mnt/testfs client> cat /proc/fs/lustre/lov/testfs-clilov-*/target_obd To activate an inactive OST on a live client or MDT, use the lctl activate command on the OSC device. 
For example:
      lctl --device 7 activate
@@ -359,8 +360,10 @@ Removing and Restoring OSTs
        [client]# lfs find --obd <OST UUID> <mount_point> | lfs_migrate -y
      
-          If the OST is no longer available, delete the files on that OST and restore them from backup: [client]# lfs find --obd <OST UUID> -print0 <mount_point> | \
-          tee /tmp/files_to_restore | xargs -0 -n 1 unlinkThe list of files that need to be restored from backup is stored in /tmp/files_to_restore. Restoring these files is beyond the scope of this document.
+          If the OST is no longer available, delete the files on that OST and restore them from backup:
+          [client]# lfs find --obd <OST UUID> -print0 <mount_point> | \
+            tee /tmp/files_to_restore | xargs -0 -n 1 unlink
+          The list of files that need to be restored from backup is stored in /tmp/files_to_restore. Restoring these files is beyond the scope of this document.
      
@@ -398,7 +401,7 @@ Backing Up OST Configuration Files
        Back up the OST configuration files.
        [oss]# tar cvf {ostname}.tar -C /mnt/ost last_rcvd \
-CONFIGS/ O/0/LAST_ID
+      CONFIGS/ O/0/LAST_ID
      
@@ -423,8 +426,8 @@ Restoring OST Configuration Files
        Format the OST file system.
-        [oss]# mkfs.lustre --ost --index {OST index} {other options} \
-{newdev}
+        [oss]# mkfs.lustre --ost --index={old OST index} {other options} \
+          {newdev}
      
@@ -445,9 +448,9 @@ Restoring OST Configuration Files
    The CONFIGS/mountdata file is created by mkfs.lustre at format time, but has flags set that request it to register itself with the MGS. It is possible to copy these flags from another working OST (which should be the same):
-[oss2]# debugfs -c -R "dump CONFIGS/mountdata /tmp/ldd" {other_osdev}
-[oss2]# scp /tmp/ldd oss:/tmp/ldd
-[oss]# dd if=/tmp/ldd of=/mnt/ost/CONFIGS/mountdata bs=4 count=1 seek=5 skip=5
+[oss1]# debugfs -c -R "dump CONFIGS/mountdata /tmp/ldd" {other_osdev}
+[oss1]# scp /tmp/ldd oss0:/tmp/ldd
+[oss0]# dd if=/tmp/ldd of=/mnt/ost/CONFIGS/mountdata bs=4 count=1 seek=5 skip=5
 
diff --git a/LustreOperations.xml b/LustreOperations.xml
index 504a676..397a4e2 100644
--- a/LustreOperations.xml
+++ b/LustreOperations.xml
@@ -57,7 +57,7 @@ Mounting by Label
        Mount-by-label should NOT be used in a multi-path environment.
       Although the file system name is internally limited to 8 characters, you can mount the clients at any mount point, so file system users are not subjected to short names. Here is an example:
-      mount -t lustre uml1@tcp0:/shortfs /mnt/<long-file_system-name>
+      mount -t lustre mds0@tcp0:/shortfs /mnt/<long-file_system-name>
<indexterm><primary>operations</primary><secondary>starting</secondary></indexterm>Starting Lustre @@ -116,10 +116,10 @@ LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0 In failover mode, Lustre clients wait for the OST to recover. - By default, the Lustre file system uses failover mode for OSTs. To specify failout mode instead, run this command: + By default, the Lustre file system uses failover mode for OSTs. To specify failout mode instead, use the --param="failover.mode=failout" option: $ mkfs.lustre --fsname=<fsname> --mgsnode=<MGS node NID> --param="failover.mode=failout" --ost --index="OST index" <block device name> - In this example, failout mode is specified for the OSTs on MGS uml1, file system testfs. - $ mkfs.lustre --fsname=testfs --mgsnode=uml1 --param="failover.mode=failout" --ost --index=3 /dev/sdb + In this example, failout mode is specified for the OSTs on MGS mds0, file system testfs. + $ mkfs.lustre --fsname=testfs --mgsnode=mds0 --param="failover.mode=failout" --ost --index=3 /dev/sdb Before running this command, unmount all OSTs that will be affected by the change in the failover/failout mode. @@ -192,14 +192,14 @@ ossbarnode# mkfs.lustre --fsname=bar --mgsnode=mgsnode@tcp0 --ost --index=1 /dev
- Setting Parameters with <literal>mkfs.lustre</literal> - When the file system is created, parameters can simply be added as a --param option to the mkfs.lustre command. For example: + Setting Tunable Parameters with <literal>mkfs.lustre</literal> + When the file system is first formatted, parameters can simply be added as a --param option to the mkfs.lustre command. For example: $ mkfs.lustre --mdt --param="sys.timeout=50" /dev/sda For more details about creating a file system,see . For more details about mkfs.lustre, see .
Setting Parameters with <literal>tunefs.lustre</literal> - If a server (OSS or MDS) is stopped, parameters can be added using the --param option to the tunefs.lustre command. For example: + If a server (OSS or MDS) is stopped, parameters can be added to an existing filesystem using the --param option to the tunefs.lustre command. For example: $ tunefs.lustre --param="failover.node=192.168.0.13@tcp0" /dev/sda With tunefs.lustre, parameters are "additive" -- new parameters are specified in addition to old parameters, they do not replace them. To erase all old tunefs.lustre parameters and just use newly-specified parameters, run: $ tunefs.lustre --erase-params --param=<new parameters> @@ -259,13 +259,13 @@ $ lctl conf_param testfs.sys.timeout=40 This example reports data on RPC service times. $ lctl get_param -n ost.*.ost_io.timeouts service : cur 1 worst 30 (at 1257150393, 85d23h58m54s ago) 1 1 1 1 - This example reports the number of inodes available on each OST. - # lctl get_param osc.*.filesfree -osc.myth-OST0000-osc-ffff88006dd20000.filesfree=217623 -osc.myth-OST0001-osc-ffff88006dd20000.filesfree=5075042 -osc.myth-OST0002-osc-ffff88006dd20000.filesfree=3762034 -osc.myth-OST0003-osc-ffff88006dd20000.filesfree=91052 -osc.myth-OST0004-osc-ffff88006dd20000.filesfree=129651 + This example reports the amount of space this client has reserved for writeback cache with each OST: + # lctl get_param osc.*.cur_grant_bytes +osc.myth-OST0000-osc-ffff8800376bdc00.cur_grant_bytes=2097152 +osc.myth-OST0001-osc-ffff8800376bdc00.cur_grant_bytes=33890304 +osc.myth-OST0002-osc-ffff8800376bdc00.cur_grant_bytes=35418112 +osc.myth-OST0003-osc-ffff8800376bdc00.cur_grant_bytes=2097152 +osc.myth-OST0004-osc-ffff8800376bdc00.cur_grant_bytes=33808384
@@ -274,23 +274,23 @@ osc.myth-OST0004-osc-ffff88006dd20000.filesfree=129651
If a node has multiple network interfaces, it may have multiple NIDs. When a node is specified, all of its NIDs must be listed, delimited by commas (,) so other nodes can choose the NID that is appropriate for their network interfaces. When failover nodes are specified, they are delimited by a colon (:) or by repeating a keyword (--mgsnode= or --failnode=). To obtain all NIDs from a node (while LNET is running), run: lctl list_nids This displays the server's NIDs (networks configured to work with Lustre). - This example has a combined MGS/MDT failover pair on uml1 and uml2, and a OST failover pair on uml3 and uml4. There are corresponding Elan addresses on uml1 and uml2. - uml1> mkfs.lustre --fsname=testfs --mgs --mdt --index=0 --failnode=uml2,2@elan /dev/sda1 -uml1> mount -t lustre /dev/sda1 /mnt/test/mdt -uml3> mkfs.lustre --fsname=testfs --failnode=uml4 --mgsnode=uml1,1@elan \ ---mgsnode=uml2,2@elan --ost --index=0 /dev/sdb -uml3> mount -t lustre /dev/sdb /mnt/test/ost0 -client> mount -t lustre uml1,1@elan:uml2,2@elan:/testfs /mnt/testfs -uml1> umount /mnt/mdt -uml2> mount -t lustre /dev/sda1 /mnt/test/mdt -uml2> cat /proc/fs/lustre/mds/testfs-MDT0000/recovery_status - Where multiple NIDs are specified, comma-separation (for example, uml2,2@elan) means that the two NIDs refer to the same host, and that Lustre needs to choose the "best" one for communication. Colon-separation (for example, uml1:uml2) means that the two NIDs refer to two different hosts, and should be treated as failover locations (Lustre tries the first one, and if that fails, it tries the second one.) + This example has a combined MGS/MDT failover pair on mds0 and mds1, and a OST failover pair on oss0 and oss1. There are corresponding Elan addresses on mds0 and mds1. + mds0> mkfs.lustre --fsname=testfs --mdt --mgs --failnode=mds1,2@elan /dev/sda1 +mds0> mount -t lustre /dev/sda1 /mnt/test/mdt +oss0> mkfs.lustre --fsname=testfs --failnode=oss1 --ost --index=0 \ + --mgsnode=mds0,1@elan --mgsnode=mds1,2@elan /dev/sdb +oss0> mount -t lustre /dev/sdb /mnt/test/ost0 +client> mount -t lustre mds0,1@elan:mds1,2@elan:/testfs /mnt/testfs +mds0> umount /mnt/mdt +mds1> mount -t lustre /dev/sda1 /mnt/test/mdt +mds1> cat /proc/fs/lustre/mds/testfs-MDT0000/recovery_status + Where multiple NIDs are specified, comma-separation (for example, mds1,2@elan) means that the two NIDs refer to the same host, and that Lustre needs to choose the "best" one for communication. Colon-separation (for example, mds0:mds1) means that the two NIDs refer to two different hosts, and should be treated as failover locations (Lustre tries the first one, and if that fails, it tries the second one.) If you have an MGS or MDT configured for failover, perform these steps: - On the OST, list the NIDs of all MGS nodes at mkfs time. - OST# mkfs.lustre --fsname sunfs --mgsnode=10.0.0.1 \ + On the oss0 node, list the NIDs of all MGS nodes at mkfs time. + oss0# mkfs.lustre --fsname sunfs --mgsnode=10.0.0.1 \ --mgsnode=10.0.0.2 --ost --index=0 /dev/sdb @@ -303,7 +303,7 @@ uml2> cat /proc/fs/lustre/mds/testfs-MDT0000/recovery_status
<indexterm><primary>operations</primary><secondary>erasing a file system</secondary></indexterm>Erasing a File System If you want to erase a file system, run this command on your targets: - $ "mkfs.lustre -reformat" + $ "mkfs.lustre --reformat" If you are using a separate MGS and want to keep other file systems defined on that MGS, then set the writeconf flag on the MDT for that file system. The writeconf flag causes the configuration logs to be erased; they are regenerated the next time the servers start. To set the writeconf flag on the MDT: @@ -316,8 +316,8 @@ uml2> cat /proc/fs/lustre/mds/testfs-MDT0000/recovery_status $ mkfs.lustre --reformat --fsname spfs --mgs --mdt --index=0 /dev/sda - If you have a separate MGS (that you do not want to reformat), then add the "writeconf" flag to mkfs.lustre on the MDT, run: - $ mkfs.lustre --reformat --writeconf -fsname spfs --mgs --mdt --index=0 /dev/sda + If you have a separate MGS (that you do not want to reformat), then add the "--writeconf" flag to mkfs.lustre on the MDT, run: + $ mkfs.lustre --reformat --writeconf --fsname spfs --mgs --mdt /dev/sda diff --git a/ManagingFileSystemIO.xml b/ManagingFileSystemIO.xml index 55b01ec..665a5e0 100644 --- a/ManagingFileSystemIO.xml +++ b/ManagingFileSystemIO.xml @@ -350,31 +350,28 @@ from 192.168.1.1@tcp inum 8991479/2386814769 object 1127239/0 extent [10240\ 0-106495] If this happens, the client will re-read or re-write the affected data up to five times to get a good copy of the data over the network. If it is still not possible, then an I/O error is returned to the application. To enable both types of checksums (in-memory and wire), run: - echo 1 > /proc/fs/lustre/llite/<fsname>/checksum_pages + lctl set_param llite.*.checksum_pages=1 To disable both types of checksums (in-memory and wire), run: - echo 0 > /proc/fs/lustre/llite/<fsname>/checksum_pages + lctl set_param llite.*.checksum_pages=0 To check the status of a wire checksum, run: lctl get_param osc.*.checksums
      Changing Checksum Algorithms
-      By default, Lustre uses the adler32 checksum algorithm, because it is robust and has a lower impact on performance than crc32. The Lustre administrator can change the checksum algorithm via /proc, depending on what is supported in the kernel.
+      By default, Lustre uses the adler32 checksum algorithm, because it is robust and has a lower impact on performance than crc32. The Lustre administrator can change the checksum algorithm via lctl set_param, depending on what is supported in the kernel.
       To check which checksum algorithm is being used by Lustre, run:
-      $ cat /proc/fs/lustre/osc/<fsname>-OST<index>-osc-*/checksum_type
+      $ lctl get_param osc.*.checksum_type
       To change the wire checksum algorithm used by Lustre, run:
-      $ echo <algorithm name> /proc/fs/lustre/osc/<fsname>-OST<index>- \osc-*/checksum_\
-type
+      $ lctl set_param osc.*.checksum_type=<algorithm name>
      
        The in-memory checksum always uses the adler32 algorithm, if available, and only falls back to crc32 if adler32 cannot be used.
      
-      In the following example, the cat command is used to determine that Lustre is using the adler32 checksum algorithm. Then the echo command is used to change the checksum algorithm to crc32. A second cat command confirms that the crc32 checksum algorithm is now in use.
-      $ cat /proc/fs/lustre/osc/lustre-OST0000-osc- \ffff81012b2c48e0/checksum_ty\
-pe
-crc32 [adler]
-$ echo crc32 > /proc/fs/lustre/osc/lustre-OST0000-osc- \ffff81012b2c48e0/che\
-cksum_type
-$ cat /proc/fs/lustre/osc/lustre-OST0000-osc- \ffff81012b2c48e0/checksum_ty\
-pe
-[crc32] adler
+      In the following example, the lctl get_param command is used to determine that Lustre is using the adler32 checksum algorithm. Then the lctl set_param command is used to change the checksum algorithm to crc32. A second lctl get_param command confirms that the crc32 checksum algorithm is now in use.
+      $ lctl get_param osc.*.checksum_type
+osc.lustre-OST0000-osc-ffff81012b2c48e0.checksum_type=crc32 [adler]
+$ lctl set_param osc.*.checksum_type=crc32
+osc.lustre-OST0000-osc-ffff81012b2c48e0.checksum_type=crc32
+$ lctl get_param osc.*.checksum_type
+osc.lustre-OST0000-osc-ffff81012b2c48e0.checksum_type=[crc32] adler
diff --git a/ManagingSecurity.xml b/ManagingSecurity.xml index 44fcc5c..aa76198 100644 --- a/ManagingSecurity.xml +++ b/ManagingSecurity.xml @@ -148,7 +148,7 @@ lctl get_param mds.Lustre-MDT000*.nosquash_nids The lctl conf_param value overwrites the parameter's previous value. If the new value uses an incorrect syntax, then the system continues with the old parameters and the previously-correct value is lost on remount. That is, be careful doing root squash tuning.
- mkfs.lustre and tunefs.lustre do not perform syntax checking. If the root squash parameters are incorrect, they are ignored on mount and the default values are used instead. + mkfs.lustre and tunefs.lustre do not perform parameter syntax checking. If the root squash parameters are incorrect, they are ignored on mount and the default values are used instead. Root squash parameters are parsed with rigorous syntax checking. The root_squash parameter should be specified as <decnum>':'<decnum>. The nosquash_nids parameter should follow LNET NID range list syntax. -- 1.8.3.1