Overview
--------

This survey script does sequential I/O with varying numbers of threads and
objects (files) by using lctl::test_brw to drive the echo_client connected
to local or remote obdfilter instances, or remote obdecho instances.

It can be used to characterise the performance of the following Lustre
components:

1. The Stripe F/S.

   Here the script directly exercises one or more instances of obdfilter.
   They may be running on 1 or more nodes, e.g. when they are all attached
   to the same multi-ported disk subsystem.

   You need to tell the script all the names of the obdfilter instances.
   These should already be up and running.  If some are on different
   nodes, you need to specify their hostnames too (e.g. node1:ost1).

   All the obdfilter instances are driven directly.  The script
   automatically loads the obdecho module if required and creates one
   instance of echo_client for each obdfilter instance.

2. The Network.

   Here the script drives one or more instances of obdecho via instances of
   echo_client running on 1 or more nodes.

   You need to tell the script all the names of the echo_client instances.
   These should already be up and running.  If some are on different nodes,
   you need to specify their hostnames too (e.g. node1:ECHO_node1).
   
3. The Stripe F/S over the Network.

   Here the script drives one or more instances of obdfilter via instances
   of echo_client running on 1 or more nodes.

   As with (2), you need to tell the script all the names of the
   echo_client instances, which should already be up and running.

Note that the script is _NOT_ scalable to 100s of nodes since it is only
intended to measure individual servers, not the scalability of the system
as a whole.

   
Running
-------

The script must be customised according to the components under test and
where it should keep its working files.  Customisation variables are
described clearly at the start of the script.

To run against a local disk:
---------------------------

- Create a Lustre configuration shell script and XML using your normal
  methods:
    - You do not need to specify an MDS or LOV.
    - List all OSTs that you wish to test.

- On all OSS machines:

  # lconf --reformat <XML file>

  Remember, write tests are destructive!  This test should be run prior
  to startup of your actual Lustre filesystem.  If that is the case, you
  will not need to reformat to restart Lustre.  However, if the test is
  terminated before completion, you may have to remove objects from the
  disk.

- Determine the obdfilter instance names on all the OSS machines.  They
  are in column 4 of 'lctl dl'.  For example:

# pdsh -w oss[01-02] lctl dl |grep obdfilter |sort
oss01:   0 UP obdfilter oss01-sdb oss01-sdb_UUID 3
oss01:   2 UP obdfilter oss01-sdd oss01-sdd_UUID 3
oss02:   0 UP obdfilter oss02-sdi oss02-sdi_UUID 3
...

Here the obdfilter instance names are oss01-sdb, oss01-sdd, oss02-sdi.

Since you are driving obdfilter instances directly, set the shell array
variable 'ost_names' to the names of the obdfilter instances and leave
'client_names' undefined.

Example, passing the names in the 'ost_names_str' environment variable:

ost_names_str='oss01:oss01-sdb oss01:oss01-sdd oss02:oss02-sdi' \
   ./obdfilter-survey
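
The list of names can also be assembled from the 'lctl dl' listing rather
than typed by hand.  The following is a minimal sketch, assuming the
pdsh-prefixed format shown above (hostname and colon, then the 'lctl dl'
columns).  The sample listing is embedded so that the snippet runs on its
own; 'sample_listing' is a hypothetical stand-in for the real pdsh command.

```shell
# Sample 'pdsh ... lctl dl' output copied from the listing above.
# In practice, replace this function with the real pdsh command.
sample_listing() {
cat <<'EOF'
oss01:   0 UP obdfilter oss01-sdb oss01-sdb_UUID 3
oss01:   2 UP obdfilter oss01-sdd oss01-sdd_UUID 3
oss02:   0 UP obdfilter oss02-sdi oss02-sdi_UUID 3
EOF
}

# With the pdsh "host:" prefix, the fields are:
#   $1 = "host:", $2 = device number, $3 = state, $4 = type, $5 = name.
# Emit "host:name" for every obdfilter device that is UP.
ost_names_str=$(sample_listing |
    awk '$3 == "UP" && $4 == "obdfilter" { printf "%s%s%s", sep, $1, $5; sep = " " }')

echo "$ost_names_str"
```

The resulting string can then be passed to the script in 'ost_names_str' as
in the example above.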

To run against a network:
------------------------

If you are driving obdfilter or obdecho instances over the network, you
must instantiate the echo_clients yourself using lmc/lconf.  Set the shell
array variable 'client_names' to the names of the echo_client instances and
leave 'ost_names' undefined.

You can optionally prefix any name in 'ost_names' or 'client_names' with
the hostname that it is running on (e.g. remote_node:ost4) if your
obdfilters or echo_clients are running on more than one node.  In this
case, you need to ensure...

(a) 'custom_remote_shell()' works on your cluster.
(b) all pathnames you specify in the script are mounted on the node you
    start the survey from and all the remote nodes.
(c) obdfilter-survey is installed on the clients, in the same location
    as on the master node.

- First, bring up obdecho instances on the servers and echo_client instances
  on the clients:
    - Run the included echo.sh on a node that has Lustre installed.
    - Shell variables:
        - SERVERS: set this to a list of server hostnames, or `hostname` of
          the current node will be used.  This may be the wrong interface,
          so check it.  NOTE: echo.sh could probably be smarter about
          this...
        - NETS: set this if you are using a network type other than tcp.
    - Example: SERVERS=oss01-eth2 sh echo.sh

- On the servers, start the obdecho server and verify that it is up:

# lconf --node (hostname) /(path)/echo.xml
# lctl dl
  0 UP obdecho ost_oss01.local ost_oss01.local_UUID 3
  1 UP ost OSS OSS_UUID 3

- On the clients, start the other side of the echo connection:

# lconf --node client /(path)/echo.xml
# lctl dl
  0 UP osc OSC_xfer01.local_ost_oss01.local_ECHO_client 6bc9b_ECHO_client_2a8a2cb3dd 5
  1 UP echo_client ECHO_client 6bc9b_ECHO_client_2a8a2cb3dd 3

- Verify connectivity from a client:

# lctl ping SERVER_NID

- Run the script on the master node, specifying the client names in the
  'client_names_str' environment variable.

Example:
# client_names_str='xfer01:ECHO_client xfer02:ECHO_client
xfer03:ECHO_client xfer04:ECHO_client xfer05:ECHO_client
xfer06:ECHO_client xfer07:ECHO_client xfer08:ECHO_client
xfer09:ECHO_client xfer10:ECHO_client xfer11:ECHO_client
xfer12:ECHO_client' ./obdfilter-survey
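
For a long, regular list of clients, the string can be generated instead of
typed.  This is only a sketch, assuming the xfer01..xfer12 hostnames and the
ECHO_client instance name used in the example above.

```shell
# Build "xferNN:ECHO_client" for NN = 01..12, separated by spaces.
client_names_str=''
i=1
while [ "$i" -le 12 ]; do
    name=$(printf 'xfer%02d:ECHO_client' "$i")
    # Add a separating space only when the string is already non-empty.
    client_names_str="${client_names_str:+$client_names_str }$name"
    i=$((i + 1))
done

echo "$client_names_str"
```

The variable can then be exported or placed on the command line in front of
./obdfilter-survey, as in the example above.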


- When done, clean up the echo_client/obdecho instances:
       - on clients: lconf --cleanup --node client /(path)/echo.xml
       - on server(s): lconf --cleanup --node (hostname) /(path)/echo.xml

- When aborting, kill any vmstat processes left running on the clients:

pdsh -w (clients) killall vmstat

Use 'lctl device_list' to verify the obdfilter/echo_client instance names
e.g...

When the script runs, it creates a number of working files and a pair of
result files.  All files start with the prefix given by ${rslt}.

${rslt}.summary           same as stdout
${rslt}.script_*          per-host test script files
${rslt}.detail_tmp*       per-ost result files
${rslt}.detail            collected result files for post-mortem

The script iterates over the given numbers of threads and objects
performing all the specified tests and checking that all test processes
completed successfully.
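
The shape of that iteration can be pictured with the sketch below.  It is
illustrative only: the variable names and values are hypothetical (the real
ones are in the customisation section of the script), and the real script
may order or skip combinations differently.

```shell
# Hypothetical settings, for illustration only.
thread_counts='1 2 4'
object_counts='1 2'
tests='write read'

runs=0
for nobj in $object_counts; do
    for nthr in $thread_counts; do
        for t in $tests; do
            # The real script would drive the echo_clients here and
            # check that every test process completed successfully.
            echo "obj $nobj thr $nthr test $t"
            runs=$((runs + 1))
        done
    done
done
echo "total runs: $runs"
```

With these example values, the loop body runs 2 x 3 x 2 = 12 times, which
gives a feel for how quickly the survey grows as more values are added.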

Note that the script does NOT clean up properly if it is aborted or if it
encounters an unrecoverable error.  In this case, manual cleanup may be
required, possibly including killing any running instances of 'lctl' (local
or remote), removing echo_client instances created by the script and
unloading obdecho.
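
Part of that manual cleanup is removing leftover working files.  The sketch
below is one possible helper, assuming that the .script_* and .detail_tmp*
files are the working files and that .summary and .detail are the result
files worth keeping.  'cleanup_working_files' and the demo prefix are
hypothetical names; the demo runs in a temporary directory so it touches
nothing real.

```shell
# Remove leftover working files for a given result prefix, keeping the
# .summary and .detail result files.
cleanup_working_files() {
    prefix=$1
    rm -f "$prefix".script_* "$prefix".detail_tmp*
}

# Demonstration in a scratch directory.
demo_dir=$(mktemp -d)
rslt="$demo_dir/survey"          # hypothetical result prefix
touch "$rslt.summary" "$rslt.detail" \
      "$rslt.script_oss01" "$rslt.detail_tmp1"

cleanup_working_files "$rslt"

# List what is left, one name per space-separated word.
remaining=$(ls "$demo_dir" | sort | tr '\n' ' ')
echo "$remaining"
rm -rf "$demo_dir"
```

Killing stray 'lctl' processes, removing echo_client instances and unloading
obdecho still have to be done separately.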


Script output
-------------

The summary file and stdout contain lines like...

ost 8 sz 67108864K rsz 1024 obj    8 thr    8 write  613.54 [ 64.00, 82.00] 

ost 8          is the total number of OSTs under test.
sz 67108864K   is the total amount of data read or written (in K).
rsz 1024       is the record size (size of each echo_client I/O).
obj    8       is the total number of objects over all OSTs.
thr    8       is the total number of threads over all OSTs and objects.
write          is the test name.  If more tests have been specified they
               all appear on the same line.
613.54         is the aggregate bandwidth over all OSTs measured by
               dividing the total number of MB by the elapsed time.
[64.00, 82.00] are the minimum and maximum instantaneous bandwidths seen on
               any individual OST.

Note that although the numbers of threads and objects are specified per-OST
in the customisation section of the script, results are reported aggregated
over all OSTs.
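
To recover the per-OST settings from a summary line, divide the aggregated
object and thread counts by the OST count.  The sketch below does this for
the sample line above and prints the fields as comma-separated values.  It
assumes the layout shown, with a single test per line; lines carrying
several tests would need more fields handled.

```shell
line='ost 8 sz 67108864K rsz 1024 obj    8 thr    8 write  613.54 [ 64.00, 82.00] '

# Strip the brackets and comma, then pick fields by position:
#   $2 = OSTs, $8 = objects, $10 = threads, $11 = test name,
#   $12 = aggregate bandwidth, $13 and $14 = per-OST min and max.
# Objects and threads are divided by the OST count to give the
# per-OST values.
csv=$(printf '%s\n' "$line" | sed 's/[][,]//g' |
    awk '{ printf "%s,%d,%d,%s,%s,%s,%s\n", $2, $8 / $2, $10 / $2, $11, $12, $13, $14 }')

echo "$csv"
```

For the sample line, 8 objects and 8 threads over 8 OSTs correspond to 1
object and 1 thread per OST.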


Visualising Results
-------------------

I've found it most useful to import the summary data (it's fixed width)
into Excel (or any graphing package) and graph bandwidth v. # threads for
varying numbers of concurrent regions.  This shows how the OSS performs for
a given number of concurrently accessed objects (i.e. files) with varying
numbers of I/Os in flight.  

It is also extremely useful to record average disk I/O sizes during each
test.  These numbers help find pathologies in the file system block
allocator and the block device elevator.

The included obparse.pl script is an example of converting the output files
to .csv format.