Overview
--------

This survey script does sequential I/O with varying numbers of threads and
objects (files) by using lctl::test_brw to drive the echo_client connected
to local or remote obdfilter instances, or remote obdecho instances.

It can be used to characterise the performance of the following Lustre
components.

1. The Stripe F/S.

   Here the script directly exercises one or more instances of obdfilter.
   They may be running on 1 or more nodes, e.g. when they are all attached
   to the same multi-ported disk subsystem.

   You need to tell the script all the names of the obdfilter instances.
   These should be up and running already.  If some are on different
   nodes, you need to specify their hostnames too (e.g. node1:ost1).

   All the obdfilter instances are driven directly.  The script
   automatically loads the obdecho module if required and creates one
   instance of echo_client for each obdfilter instance.

2. The Network.

   Here the script drives one or more instances of obdecho via instances
   of echo_client running on 1 or more nodes.

   You need to tell the script all the names of the echo_client instances.
   These should already be up and running.  If some are on different
   nodes, you need to specify their hostnames too (e.g. node1:ECHO_node1).

3. The Stripe F/S over the Network.

   Here the script drives one or more instances of obdfilter via instances
   of echo_client running on 1 or more nodes.

   As with (2), you need to tell the script all the names of the
   echo_client instances, which should already be up and running.

Note that the script is _NOT_ scalable to 100s of nodes since it is only
intended to measure individual servers, not the scalability of the system
as a whole.


Running
-------

The script must be customised according to the components under test and
where it should keep its working files.  Customisation variables are
described clearly at the start of the script.

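As a rough illustration of what the customisation can look like, the
fragment below sets up the case where obdfilter instances are driven
directly.  The variable names 'ost_names', 'client_names' and 'rslt' are
the ones this README refers to; the instance names, the path and the
'split_name' helper are placeholders invented for the example and are not
taken from the script itself.

```shell
#!/bin/bash
# Sketch only: 'ost_names', 'client_names' and 'rslt' are named in this
# README; every value assigned below is a placeholder.

# Drive two obdfilter instances directly, one local and one remote.
ost_names=(ost3 remote_node:ost4)
unset client_names            # left undefined when driving OSTs directly

# Prefix for all working and result files (placeholder path).
rslt=/tmp/obdfilter_survey

# Hypothetical helper, not part of the survey script: split an optional
# "host:" prefix from an instance name, defaulting to "localhost".
split_name() {
    case $1 in
        *:*) echo "${1%%:*} ${1#*:}" ;;
        *)   echo "localhost $1" ;;
    esac
}

# Show which host each configured instance would be addressed on.
for n in "${ost_names[@]}"; do
    split_name "$n"
done
```

To drive obdfilter or obdecho over the network instead, the roles would be
swapped: 'client_names' would hold the echo_client instance names and
'ost_names' would be left undefined.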
If you are driving obdfilter instances directly, set the shell array
variable 'ost_names' to the names of the obdfilter instances and leave
'client_names' undefined.

If you are driving obdfilter or obdecho instances over the network, you
must instantiate the echo_clients yourself using lmc/lconf.  Set the shell
array variable 'client_names' to the names of the echo_client instances and
leave 'ost_names' undefined.

You can optionally prefix any name in 'ost_names' or 'client_names' with
the hostname that it is running on (e.g. remote_node:ost4) if your
obdfilters or echo_clients are running on more than one node.  In this
case, you need to ensure...

(a) 'custom_remote_shell()' works on your cluster

(b) all pathnames you specify in the script are mounted on the node you
    start the survey from and all the remote nodes.

Use 'lctl device_list' to verify the obdfilter/echo_client instance names
e.g...

    [root@ns9 root]# lctl device_list
      0 UP confobd conf_ost3 OSD_ost3_ns9_UUID 1
      1 UP obdfilter ost3 ost3_UUID 1
      2 UP ost OSS OSS_UUID 1
      3 AT confobd conf_ost12 OSD_ost12_ns9_UUID 1
    [root@ns9 root]#

...here device 1 is an instance of obdfilter called 'ost3'.  To exercise it
directly, add 'ns9:ost3' to 'ost_names'.  If the script is only to be run
on node 'ns9' you could simply add 'ost3' to 'ost_names'.

When the script runs, it creates a number of working files and a pair of
result files.  All files start with the prefix given by ${rslt}.

    ${rslt}.summary        same as stdout
    ${rslt}.script_*       per-host test script files
    ${rslt}.detail_tmp*    per-ost result files
    ${rslt}.detail         collected result files for post-mortem

The script iterates over the given numbers of threads and objects
performing all the specified tests and checking that all test processes
completed successfully.

Note that the script does NOT clean up properly if it is aborted or if it
encounters an unrecoverable error.
In this case, manual cleanup may be required, possibly including killing
any running instances of 'lctl' (local or remote), removing echo_client
instances created by the script and unloading obdecho.


Script output
-------------

The summary file and stdout contain lines like...

    ost 8 sz 67108864K rsz 1024 obj 8 thr 8 write 613.54 [ 64.00, 82.00]

    ost 8            is the total number of OSTs under test.
    sz 67108864K     is the total amount of data read or written (in K).
    rsz 1024         is the record size (size of each echo_client I/O).
    obj 8            is the total number of objects over all OSTs.
    thr 8            is the total number of threads over all OSTs and
                     objects.
    write            is the test name.  If more tests have been specified
                     they all appear on the same line.
    613.54           is the aggregate bandwidth over all OSTs, measured by
                     dividing the total number of MB by the elapsed time.
    [ 64.00, 82.00]  are the minimum and maximum instantaneous bandwidths
                     seen on any individual OST.

Note that although the numbers of threads and objects are specified per-OST
in the customisation section of the script, results are reported aggregated
over all OSTs.


Visualising Results
-------------------

I've found it most useful to import the summary data (it's fixed width)
into Excel (or any graphing package) and graph bandwidth against the number
of threads for varying numbers of concurrent regions.  This shows how the
OSS performs for a given number of concurrently accessed objects (i.e.
files) with varying numbers of I/Os in flight.

It is also extremely useful to record average disk I/O sizes during each
test.  These numbers help find pathologies in the file system block
allocator and the block device elevator.
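
If a graphing package prefers comma-separated input, the summary lines can
be flattened first.  The function below is a minimal sketch, assuming every
summary line has exactly the layout of the sample line shown under "Script
output" (ten leading fields, then one "name bandwidth [min, max]" group per
test).  The name 'summary_to_csv' and the CSV column names are invented for
this example, and lines for tests that did not complete are not handled.

```shell
#!/bin/sh
# summary_to_csv: hypothetical helper, not part of the survey script.
# Reads summary lines from the named files (or stdin) and prints one CSV
# row per test result.
summary_to_csv() {
    awk '
    BEGIN { print "ost,obj,thr,test,bw,min,max" }
    $1 == "ost" {
        # Turn brackets and commas into spaces so that each test becomes
        # four plain fields: name, bandwidth, min, max.
        gsub(/[][,]/, " ")
        for (i = 11; i + 3 <= NF; i += 4)
            printf "%s,%s,%s,%s,%s,%s,%s\n",
                   $2, $8, $10, $i, $(i + 1), $(i + 2), $(i + 3)
    }' "$@"
}

# Example using the sample line from this README.
echo "ost 8 sz 67108864K rsz 1024 obj 8 thr 8 write 613.54 [ 64.00, 82.00]" |
    summary_to_csv
```

For the sample line this prints the header followed by
"8,8,8,write,613.54,64.00,82.00".  With real data the function would
typically be pointed at the ${rslt}.summary file rather than at stdin.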