From 6f654cb3651940f0e07146a6e1ae8e54eb26b332 Mon Sep 17 00:00:00 2001
From: Linda Bebernes
Date: Tue, 27 Aug 2013 15:02:15 -0700
Subject: [PATCH] LUDOC-45 bugfixes: Changed iokit script names from *_survey to *-survey

Fixed iokit script names in Ch 10, Ch 24, and Ch 36.

Signed-off-by: Linda Bebernes
Change-Id: I41e77b06c18c8a141121b92e93f31214d85b1127
Reviewed-on: http://review.whamcloud.com/7474
Tested-by: Hudson
Reviewed-by: Minh Diep
Reviewed-by: Richard Henwood
---
 BenchmarkingTests.xml            | 165 ++++++++++++++++++++++++++++++---------
 ConfiguringLustre.xml            |  16 +++-
 SystemConfigurationUtilities.xml |  25 ++++--
 3 files changed, 155 insertions(+), 51 deletions(-)

diff --git a/BenchmarkingTests.xml b/BenchmarkingTests.xml
index 3502cba..fb300ec 100644
--- a/BenchmarkingTests.xml
+++ b/BenchmarkingTests.xml
@@ -36,13 +36,18 @@
 The I/O kit contains three tests, each of which tests a progressively higher layer in the Lustre stack:
- sgpdd_survey - Measure basic 'bare metal' performance of devices while bypassing the kernel block device layers, buffer cache, and file system.
+ sgpdd-survey - Measure basic 'bare metal' performance
+ of devices while bypassing the kernel block device layers, buffer cache, and file
+ system.
- obdfilter_survey - Measure the performance of one or more OSTs directly on the OSS node or alternately over the network from a Lustre client.
+ obdfilter-survey - Measure the performance of one or more OSTs
+ directly on the OSS node or alternately over the network from a Lustre client.
- ost_survey - Performs I/O against OSTs individually to allow performance comparisons to detect if an OST is performing suboptimally due to hardware issues.
+ ost-survey - Performs I/O against OSTs individually to allow
+ performance comparisons to detect if an OST is performing suboptimally due to hardware
+ issues.
 Typically with these tests, Lustre should deliver 85-90% of the raw device performance.
@@ -70,8 +75,15 @@
- <indexterm><primary>benchmarking</primary><secondary>raw hardware with sgpdd_survey</secondary></indexterm>Testing I/O Performance of Raw Hardware (<literal>sgpdd_survey</literal>)
- The sgpdd_survey tool is used to test bare metal I/O performance of the raw hardware, while bypassing as much of the kernel as possible. This survey may be used to characterize the performance of a SCSI device by simulating an OST serving multiple stripe files. The data gathered by this survey can help set expectations for the performance of a Lustre OST using this device.
+ <indexterm>
+ <primary>benchmarking</primary>
+ <secondary>raw hardware with sgpdd-survey</secondary>
+ </indexterm>Testing I/O Performance of Raw Hardware (<literal>sgpdd-survey</literal>)
+ The sgpdd-survey tool is used to test bare metal I/O performance of the
+ raw hardware, while bypassing as much of the kernel as possible. This survey may be used to
+ characterize the performance of a SCSI device by simulating an OST serving multiple stripe
+ files. The data gathered by this survey can help set expectations for the performance of a
+ Lustre OST using this device.
 The script uses sgp_dd to carry out raw sequential disk I/O. It runs with variable numbers of sgp_dd threads to show how performance varies with different request queue depths. The script spawns variable numbers of sgp_dd instances, each reading or writing a separate area of the disk to demonstrate performance variance within a number of concurrent stripe files.
 Several tips and insights for disk performance measurement are described below. Some of this information is specific to RAID arrays and/or the Linux RAID implementation.
@@ -86,7 +98,8 @@
- The sgpdd_survey script overwrites the device being tested, which results in the
+ The sgpdd-survey script overwrites the device being tested, which
+ results in the
 LOSS OF ALL DATA on that device. Exercise caution when selecting the device to be tested.
@@ -113,7 +126,9 @@
 Raw and SCSI devices cannot be mixed in the test specification.
- If you need to create raw devices to use the sgpdd_survey tool, note that raw device 0 cannot be used due to a bug in certain versions of the "raw" utility (including that shipped with RHEL4U4.)
+ If you need to create raw devices to use the sgpdd-survey tool, note
+ that raw device 0 cannot be used due to a bug in certain versions of the "raw"
+ utility (including that shipped with RHEL4U4.)
 <indexterm><primary>benchmarking</primary><secondary>tuning storage</secondary></indexterm>Tuning Linux Storage Devices
@@ -131,9 +146,15 @@
- Running sgpdd_survey
- The sgpdd_survey script must be customized for the particular device being tested and for the location where the script saves its working and result files (by specifying the ${rslt} variable). Customization variables are described at the beginning of the script.
- When the sgpdd_survey script runs, it creates a number of working files and a pair of result files. The names of all the files created start with the prefix defined in the variable ${rslt}. (The default value is /tmp.) The files include:
+ Running sgpdd-survey
+ The sgpdd-survey script must be customized for the particular device
+ being tested and for the location where the script saves its working and result files (by
+ specifying the ${rslt} variable). Customization variables are described
+ at the beginning of the script.
+ When the sgpdd-survey script runs, it creates a number of working
+ files and a pair of result files. The names of all the files created start with the prefix
+ defined in the variable ${rslt}. (The default value is
+ /tmp.) The files include:
 File containing standard output data (same as stdout)
@@ -180,13 +201,23 @@
- <indexterm><primary>benchmarking</primary><secondary>OST performance</secondary></indexterm>Testing OST Performance (<literal>obdfilter_survey</literal>)
- The obdfilter_survey script generates sequential I/O from varying numbers of threads and objects (files) to simulate the I/O patterns of a Lustre client.
- The obdfilter_survey script can be run directly on the OSS node to measure the OST storage performance without any intervening network, or it can be run remotely on a Lustre client to measure the OST performance including network overhead.
- The obdfilter_survey is used to characterize the performance of the following:
+ <indexterm>
+ <primary>benchmarking</primary>
+ <secondary>OST performance</secondary>
+ </indexterm>Testing OST Performance (<literal>obdfilter-survey</literal>)
+ The obdfilter-survey script generates sequential I/O from varying
+ numbers of threads and objects (files) to simulate the I/O patterns of a Lustre client.
+ The obdfilter-survey script can be run directly on the OSS node to
+ measure the OST storage performance without any intervening network, or it can be run remotely
+ on a Lustre client to measure the OST performance including network overhead.
+ The obdfilter-survey is used to characterize the performance of the
+ following:
- Local file system - In this mode, the obdfilter_survey script exercises one or more instances of the obdfilter directly. The script may run on one or more OSS nodes, for example, when the OSSs are all attached to the same multi-ported disk subsystem.
+ Local file system - In this mode, the
+ obdfilter-survey script exercises one or more instances of the
+ obdfilter directly. The script may run on one or more OSS nodes, for example, when the
+ OSSs are all attached to the same multi-ported disk subsystem.
 Run the script using the case=disk parameter to run the test against all the local OSTs. The script automatically detects all local OSTs and includes them in the survey.
 To run the test against only specific OSTs, run the script using the target=parameter to list the OSTs to be tested explicitly. If some OSTs are on remote nodes, specify their hostnames in addition to the OST name (for example, oss2:lustre-OST0004).
 All obdfilter instances are driven directly. The script automatically loads the obdecho module (if required) and creates one instance of echo_client for each obdfilter instance in order to generate I/O requests directly to the OST.
@@ -198,29 +229,53 @@
 For more details, see
- Remote file system over the network - In this mode the obdfilter_survey script generates I/O from a Lustre client to a remote OSS to write the data to the file system.
+ Remote file system over the network - In this mode
+ the obdfilter-survey script generates I/O from a Lustre client to a
+ remote OSS to write the data to the file system.
 To run the test against all the local OSCs, pass the parameter case=netdisk to the script. Alternately you can pass the target= parameter with one or more OSC devices (e.g., lustre-OST0000-osc-ffff88007754bc00) against which the tests are to be run.
 For more details, see .
- The obdfilter_survey script is potentially destructive and there is a small risk data may be lost. To reduce this risk, obdfilter_survey should not be run on devices that contain data that needs to be preserved. Thus, the best time to run obdfilter_survey is before the Lustre file system is put into production. The reason obdfilter_survey may be safe to run on a production file system is because it creates objects with object sequence 2. Normal file system objects are typically created with object sequence 0.
+ The obdfilter-survey script is potentially destructive and there is a
+ small risk data may be lost. To reduce this risk, obdfilter-survey should
+ not be run on devices that contain data that needs to be preserved. Thus, the best time to
+ run obdfilter-survey is before the Lustre file system is put into
+ production.
The reason obdfilter-survey may be safe to run on a
+ production file system is because it creates objects with object sequence 2. Normal file
+ system objects are typically created with object sequence 0.
- If the obdfilter_survey test is terminated before it completes, some small amount of space is leaked. you can either ignore it or reformat the file system.
+ If the obdfilter-survey test is terminated before it completes, some
+ small amount of space is leaked. you can either ignore it or reformat the file
+ system.
- The obdfilter_survey script is NOT scalable beyond tens of OSTs since it is only intended to measure the I/O performance of individual storage subsystems, not the scalability of the entire system.
+ The obdfilter-survey script is NOT scalable
+ beyond tens of OSTs since it is only intended to measure the I/O performance of individual
+ storage subsystems, not the scalability of the entire system.
- The obdfilter_survey script must be customized, depending on the components under test and where the script's working files should be kept. Customization variables are described at the beginning of the obdfilter_survey script. In particular, pay attention to the listed maximum values listed for each parameter in the script.
+ The obdfilter-survey script must be customized, depending on the
+ components under test and where the script's working files should be kept.
+ Customization variables are described at the beginning of the
+ obdfilter-survey script. In particular, pay attention to the listed
+ maximum values listed for each parameter in the script.
 <indexterm><primary>benchmarking</primary><secondary>local disk</secondary></indexterm>Testing Local Disk Performance
- The obdfilter_survey script can be run automatically or manually against a local disk. This script profiles the overall throughput of storage hardware, including the file system and RAID layers managing the storage, by sending workloads to the OSTs that vary in thread count, object count, and I/O size.
- When the obdfilter_survey script is run, it provides information about the performance abilities of the storage hardware and shows the saturation points.
- The plot-obdfilter script generates from the output of the obdfilter_survey a CSV file and parameters for importing into a spreadsheet or gnuplot to visualize the data.
- To run the obdfilter_survey script, create a standard Lustre configuration; no special setup is needed.
+ The obdfilter-survey script can be run automatically or manually
+ against a local disk. This script profiles the overall throughput of storage hardware,
+ including the file system and RAID layers managing the storage, by sending workloads to the
+ OSTs that vary in thread count, object count, and I/O size.
+ When the obdfilter-survey script is run, it provides information
+ about the performance abilities of the storage hardware and shows the saturation
+ points.
+ The plot-obdfilter script generates from the output of the
+ obdfilter-survey a CSV file and parameters for importing into a
+ spreadsheet or gnuplot to visualize the data.
+ To run the obdfilter-survey script, create a standard Lustre
+ configuration; no special setup is needed.
 To perform an automatic run:
@@ -232,7 +287,8 @@
 modprobe obdecho
- Run the obdfilter_survey script with the parameter case=disk.
+ Run the obdfilter-survey script with the parameter
+ case=disk.
 For example, to run a local test with up to two objects (nobjhi), up to two threads (thrhi), and 1024 MB transfer size (size):
 $ nobjhi=2 thrhi=2 size=1024 case=disk sh obdfilter-survey
@@ -262,7 +318,8 @@
 (for example, lustre-OST0001). You do not have to specify an MDS or LOV.
- Run the obdfilter_survey script with the target=parameter.
+ Run the obdfilter-survey script with the
+ target=parameter.
 For example, to run a local test with up to two objects (nobjhi), up to two threads (thrhi), and 1024 Mb (size) transfer size:
 $ nobjhi=2 thrhi=2 size=1024 targets="lustre-OST0001 \
 lustre-OST0002" sh obdfilter-survey
@@ -271,7 +328,8 @@
 <indexterm><primary>benchmarking</primary><secondary>network</secondary></indexterm>Testing Network Performance
- The obdfilter_survey script can only be run automatically against a network; no manual test is provided.
+ The obdfilter-survey script can only be run automatically against a
+ network; no manual test is provided.
 To run the network test, a specific Lustre setup is needed. Make sure that these configuration requirements have been met.
 To perform an automatic run:
@@ -288,7 +346,10 @@
 lctl dl
- Run the obdfilter_survey script with the parameters case=network and targets=hostname|ip_of_server. For example:
+ Run the obdfilter-survey script with the parameters
+ case=network and
+ targets=hostname|ip_of_server. For
+ example:
 $ nobjhi=2 thrhi=2 size=1024 targets="oss0 oss1" \
 case=network sh odbfilter-survey
@@ -302,7 +363,9 @@
 <indexterm><primary>benchmarking</primary><secondary>remote disk</secondary></indexterm>Testing Remote Disk Performance
- The obdfilter_survey script can be run automatically or manually against a network disk. To run the network disk test, start with a standard Lustre configuration. No special setup is needed.
+ The obdfilter-survey script can be run automatically or manually
+ against a network disk. To run the network disk test, start with a standard Lustre
+ configuration. No special setup is needed.
 To perform an automatic run:
@@ -314,7 +377,8 @@
 modprobe obdecho
- Run the obdfilter_survey script with the parameter case=netdisk. For example:
+ Run the obdfilter-survey script with the parameter
+ case=netdisk. For example:
 $ nobjhi=2 thrhi=2 size=1024 case=netdisk sh obdfilter-survey
@@ -345,7 +409,9 @@
 Use the target=parameter to list the OSCs separated by spaces. List the individual OSCs by name separated by spaces using the format fsname-OST_name-osc-instance (for example, lustre-OST0000-osc-ffff88007754bc00). You do not have to specify an MDS or LOV.
- Run the obdfilter_survey script with the target=osc and case=netdisk.
+ Run the obdfilter-survey script with the
+ target=osc and
+ case=netdisk.
 An example of a local test run with up to two objects (nobjhi), up to two threads (thrhi), and 1024 Mb (size) transfer size is shown below:
 $ nobjhi=2 thrhi=2 size=1024 \
 targets="lustre-OST0000-osc-ffff88007754bc00 \
@@ -356,7 +422,9 @@
 Output Files
- When the obdfilter_survey script runs, it creates a number of working files and a pair of result files. All files start with the prefix defined in the variable ${rslt}.
+ When the obdfilter-survey script runs, it creates a number of working
+ files and a pair of result files. All files start with the prefix defined in the variable
+ ${rslt}.
@@ -407,13 +475,20 @@
- The obdfilter_survey script iterates over the given number of threads and objects performing the specified tests and checks that all test processes have completed successfully.
+ The obdfilter-survey script iterates over the given number of threads
+ and objects performing the specified tests and checks that all test processes have completed
+ successfully.
- The obdfilter_survey script may not clean up properly if it is aborted or if it encounters an unrecoverable error. In this case, a manual cleanup may be required, possibly including killing any running instances of lctl (local or remote), removing echo_client instances created by the script and unloading obdecho.
+ The obdfilter-survey script may not clean up properly if it is
+ aborted or if it encounters an unrecoverable error. In this case, a manual cleanup may be
+ required, possibly including killing any running instances of lctl
+ (local or remote), removing echo_client instances created by the script
+ and unloading obdecho.
 Script Output
- The .summary file and stdout of the obdfilter_survey script contain lines like:
+ The .summary file and stdout of the
+ obdfilter-survey script contain lines like:
 ost 8 sz 67108864K rsz 1024 obj 8 thr 8 write 613.54 [ 64.00, 82.00]
 Where:
@@ -505,7 +580,11 @@
 Visualizing Results
- It is useful to import the obdfilter_survey script summary data (it is fixed width) into Excel (or any graphing package) and graph the bandwidth versus the number of threads for varying numbers of concurrent regions. This shows how the OSS performs for a given number of concurrently-accessed objects (files) with varying numbers of I/Os in flight.
+ It is useful to import the obdfilter-survey script summary data (it
+ is fixed width) into Excel (or any graphing package) and graph the bandwidth versus the
+ number of threads for varying numbers of concurrent regions. This shows how the OSS
+ performs for a given number of concurrently-accessed objects (files) with varying numbers
+ of I/Os in flight.
 It is also useful to monitor and record average disk I/O sizes during each test using the 'disk io size' histogram in the file /proc/fs/lustre/obdfilter/ (see for details). These numbers help identify problems in the system when full-sized I/Os are not submitted to the underlying disk. This may be caused by problems in the device driver or Linux block layer. */brw_stats
 The plot-obdfilter script included in the I/O toolkit is an example of processing output files to a .csv format and plotting a graph using gnuplot.
@@ -513,12 +592,20 @@
- <indexterm><primary>benchmarking</primary><secondary>OST I/O</secondary></indexterm>Testing OST I/O Performance (<literal>ost_survey</literal>)
- The ost_survey tool is a shell script that uses lfs setstripe to perform I/O against a single OST. The script writes a file (currently using dd) to each OST in the Lustre file system, and compares read and write speeds. The ost_survey tool is used to detect anomalies between otherwise identical disk subsystems.
+ <indexterm>
+ <primary>benchmarking</primary>
+ <secondary>OST I/O</secondary>
+ </indexterm>Testing OST I/O Performance (<literal>ost-survey</literal>)
+ The ost-survey tool is a shell script that uses lfs
+ setstripe to perform I/O against a single OST. The script writes a file (currently
+ using dd) to each OST in the Lustre file system, and compares read and
+ write speeds. The ost-survey tool is used to detect anomalies between
+ otherwise identical disk subsystems.
 We have frequently discovered wide performance variations across all LUNs in a cluster. This may be caused by faulty disks, RAID parity reconstruction during the test, or faulty network hardware.
- To run the ost_survey script, supply a file size (in KB) and the Lustre mount point. For example, run:
+ To run the ost-survey script, supply a file size (in KB) and the Lustre
+ mount point. For example, run:
 $ ./ost-survey.sh 10 /mnt/lustre
 Typical output is:
diff --git a/ConfiguringLustre.xml b/ConfiguringLustre.xml
index 417d410..438ae39 100644
--- a/ConfiguringLustre.xml
+++ b/ConfiguringLustre.xml
@@ -43,11 +43,16 @@
 For information about configuring LNET, see . For information about testing LNET, see .
- Run the benchmark script sgpdd_survey to determine baseline performance of your hardware. Benchmarking your hardware will simplify debugging performance issues that are unrelated to Lustre and ensure you are getting the best possible performance with your installation. For information about running sgpdd_survey, see .
+ Run the benchmark script sgpdd-survey to determine
+ baseline performance of your hardware. Benchmarking your hardware will
+ simplify debugging performance issues that are unrelated to Lustre and ensure you are
+ getting the best possible performance with your installation. For information about
+ running sgpdd-survey, see .
- The sgpdd_survey script overwrites the device being tested so it must be run before the OSTs are configured.
+ The sgpdd-survey script overwrites the device being tested so it must
+ be run before the OSTs are configured.
 To configure a simple Lustre file system, complete these steps:
@@ -111,10 +116,13 @@
 (Optional) Run benchmarking tools to validate the performance of hardware and software layers in the cluster. Available tools include:
- obdfilter_survey - Characterizes the storage performance of a Lustre file system. For details, see .
+ obdfilter-survey - Characterizes the storage performance of a
+ Lustre file system. For details, see .
- ost_survey - Performs I/O against OSTs to detect anomalies between otherwise identical disk subsystems. For details, see .
+ ost-survey - Performs I/O against OSTs to detect anomalies
+ between otherwise identical disk subsystems. For details, see .
diff --git a/SystemConfigurationUtilities.xml b/SystemConfigurationUtilities.xml
index 534b63f..4dd84ff 100644
--- a/SystemConfigurationUtilities.xml
+++ b/SystemConfigurationUtilities.xml
@@ -2618,24 +2618,33 @@ lr_reader
 The following utilities are part of the Lustre I/O kit. For more information, see .
- <indexterm><primary>sgpdd_survey</primary></indexterm>
-sgpdd_survey
- The sgpdd_survey utility tests 'bare metal' performance, bypassing as much of the kernel as possible. The sgpdd_survey tool does not require Lustre, but it does require the sgp_dd package.
+ <indexterm>
+ <primary>sgpdd-survey</primary>
+ </indexterm> sgpdd-survey
+ The sgpdd-survey utility tests 'bare metal' performance,
+ bypassing as much of the kernel as possible. The sgpdd-survey tool does
+ not require Lustre, but it does require the sgp_dd package.
- The sgpdd_survey utility erases all data on the device.
+ The sgpdd-survey utility erases all data on the device.
- <indexterm><primary>obdfilter_survey</primary></indexterm>obdfilter_survey
- The obdfilter_survey utility is a shell script that tests performance of isolated OSTS, the network via echo clients, and an end-to-end test.
+ <indexterm>
+ <primary>obdfilter-survey</primary>
+ </indexterm>obdfilter-survey
+ The obdfilter-survey utility is a shell script that tests
+ performance of isolated OSTS, the network via echo clients, and an end-to-end test.
 <indexterm><primary>ior-survey</primary></indexterm>ior-survey
 The ior-survey utility is a script used to run the IOR benchmark. Lustre includes IOR version 2.8.6.
- <indexterm><primary>ost_survey</primary></indexterm>ost_survey
- The ost_survey utility is an OST performance survey that tests client-to-disk performance of the individual OSTs in a Lustre file system.
+ <indexterm>
+ <primary>ost-survey</primary>
+ </indexterm>ost-survey
+ The ost-survey utility is an OST performance survey that tests
+ client-to-disk performance of the individual OSTs in a Lustre file system.
 <indexterm><primary>stats-collect</primary></indexterm>stats-collect
--
1.8.3.1