lustre-iokit/obdfilter-survey/README.obdfilter-survey

Overview
--------

This survey script does sequential I/O with varying numbers of threads and
objects (files) by using lctl::test_brw to drive the echo_client connected
to local or remote obdfilter instances, or remote obdecho instances.

It can be used to characterise the performance of the following Lustre
components.

1. The Stripe F/S.

   Here the script directly exercises one or more instances of obdfilter.
   They may be running on 1 or more nodes, e.g. when they are all attached
   to the same multi-ported disk subsystem.

   You need to tell the script all the names of the obdfilter instances.
   These should be up and running already.  If some are on different
   nodes, you need to specify their hostnames too (e.g. node1:ost1).

   All the obdfilter instances are driven directly.  The script
   automatically loads the obdecho module if required and creates one
   instance of echo_client for each obdfilter instance.

2. The Network.

   Here the script drives one or more instances of obdecho via instances of
   echo_client running on 1 or more nodes.

   You need to tell the script all the names of the echo_client instances.
   These should already be up and running.  If some are on different nodes,
   you need to specify their hostnames too (e.g. node1:ECHO_node1).

3. The Stripe F/S over the Network.

   Here the script drives one or more instances of obdfilter via instances
   of echo_client running on 1 or more nodes.

   As with (2), you need to tell the script all the names of the
   echo_client instances, which should already be up and running.

Note that the script is _NOT_ scalable to 100s of nodes since it is only
intended to measure individual servers, not the scalability of the system
as a whole.


Running
-------

The script must be customised according to the components under test and
where it should keep its working files.  Customisation variables are
described clearly at the start of the script.

To run against a local disk:
---------------------------

- Create a Lustre configuration shell script and XML using your normal
  methods
        - You do not need to specify an MDS or LOV
        - List all OSTs that you wish to test

- On all OSS machines:
  # lconf --reformat <XML file>
  Remember, write tests are destructive!  This test should be run prior to
  startup of your actual Lustre filesystem.  If that is the case, you will
  not need to reformat to restart Lustre.  However, if the test is
  terminated before completion, you may have to remove objects from the
  disk.

- Determine the obdfilter instance names on all the OSS nodes, column 4
  of 'lctl dl'.  For example:

# pdsh -w oss[01-02] lctl dl |grep obdfilter |sort
oss01:   0 UP obdfilter oss01-sdb oss01-sdb_UUID 3
oss01:   2 UP obdfilter oss01-sdd oss01-sdd_UUID 3
oss02:   0 UP obdfilter oss02-sdi oss02-sdi_UUID 3
...

Here the obdfilter instance names are oss01-sdb, oss01-sdd, oss02-sdi.

Since you are driving obdfilter instances directly, set the shell array
variable 'ost_names' to the names of the obdfilter instances and leave
'client_names' undefined.
Example (here the names are passed in the 'ost_names_str' environment
variable):

ost_names_str='oss01:oss01-sdb oss01:oss01-sdd oss02:oss02-sdi' \
   ./obdfilter-survey
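The host:name list in the example above can also be derived from the
'lctl dl' listing instead of being typed by hand.  The following is only a
sketch: build_ost_names is a hypothetical helper (it is not part of
obdfilter-survey), and it assumes the pdsh-prefixed output format shown
above, where the instance name is the fifth whitespace-separated field.

```shell
#!/bin/bash
# Sketch only: build_ost_names is a hypothetical helper, not part of the iokit.
build_ost_names() {
    # pdsh prefixes every line with "host:", so the fields are
    #   $1=host:  $2=index  $3=state  $4=type  $5=instance name
    awk '$4 == "obdfilter" { printf "%s%s%s", sep, $1, $5; sep = " " }
         END { print "" }'
}

# Sample input copied from the listing above.  On a real cluster the input
# would come from:  pdsh -w oss[01-02] lctl dl | sort
sample='oss01:   0 UP obdfilter oss01-sdb oss01-sdb_UUID 3
oss01:   2 UP obdfilter oss01-sdd oss01-sdd_UUID 3
oss02:   0 UP obdfilter oss02-sdi oss02-sdi_UUID 3'

ost_names_str=$(printf '%s\n' "$sample" | build_ost_names)
echo "$ost_names_str"
```

The resulting string has the same host:name form as the hand-written
example and could be passed to the script in the same way.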

To run against a network:
------------------------

If you are driving obdfilter or obdecho instances over the network, you
must instantiate the echo_clients yourself using lmc/lconf.  Set the shell
array variable 'client_names' to the names of the echo_client instances and
leave 'ost_names' undefined.

You can optionally prefix any name in 'ost_names' or 'client_names' with
the hostname that it is running on (e.g. remote_node:ost4) if your
obdfilters or echo_clients are running on more than one node.  In this
case, you need to ensure...

(a) 'custom_remote_shell()' works on your cluster
(b) all pathnames you specify in the script are mounted on the node you
    start the survey from and all the remote nodes.
(c) obdfilter-survey is installed on the clients, in the same
    location as on the master node.
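To make the optional hostname prefix concrete, here is a small sketch of
how a host:name entry can be split with plain shell parameter expansion.
split_name is a hypothetical helper for illustration, and treating an
un-prefixed name as "local node" (empty host) is an assumption; the survey
script itself may handle this differently.

```shell
#!/bin/bash
# Sketch only: split_name is hypothetical, not a function from the script.
split_name() {
    case $1 in
        *:*) host=${1%%:*} name=${1#*:} ;;   # "remote_node:ost4"
        *)   host= name=$1 ;;                # assumption: no prefix = local
    esac
}

for entry in remote_node:ost4 ost1; do
    split_name "$entry"
    echo "host='${host}' name='${name}'"
done
```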

- First, bring up obdecho instances on the servers and echo_client instances
  on the clients:
   - Run the included create-echoclient on a node that has Lustre installed.
     Shell variables:
       - SERVERS: Set this to a list of server hostnames; otherwise the
         `hostname` of the current node will be used.  This may be the
         wrong interface, so check it.  NOTE: create-echoclient could
         probably be smarter about this...
       - NETS: Set this if you are using a network type other than tcp.
   - Example: SERVERS=oss01-eth2 sh create-echoclient

- On the servers, start the obdecho server and verify that it is up:

# lconf --node (hostname) /(path)/echo.xml
# lctl dl
  0 UP obdecho ost_oss01.local ost_oss01.local_UUID 3
  1 UP ost OSS OSS_UUID 3

- On the clients, start the other side of the echo connection:

# lconf --node client /(path)/echo.xml
# lctl dl
  0 UP osc OSC_xfer01.local_ost_oss01.local_ECHO_client 6bc9b_ECHO_client_2a8a2cb3dd 5
  1 UP echo_client ECHO_client 6bc9b_ECHO_client_2a8a2cb3dd 3

- Verify connectivity from a client:
   - lctl ping SERVER_NID

- Run the script on the master node, specifying the client names in an
  environment variable.

Example:
# client_names_str='xfer01:ECHO_client xfer02:ECHO_client
xfer03:ECHO_client xfer04:ECHO_client xfer05:ECHO_client
xfer06:ECHO_client xfer07:ECHO_client xfer08:ECHO_client
xfer09:ECHO_client xfer10:ECHO_client xfer11:ECHO_client
xfer12:ECHO_client' ./obdfilter-survey
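When the clients are numbered regularly, as in the example above, the list
can be generated rather than typed.  This is a sketch under the assumption
that the hosts are named xfer01 to xfer12 and that every echo_client
instance is called ECHO_client, exactly as in the example; adjust both for
your cluster.

```shell
#!/bin/bash
# Sketch only: host names and instance name are taken from the example above.
client_names_str=
i=1
while [ "$i" -le 12 ]; do
    entry=$(printf 'xfer%02d:ECHO_client' "$i")
    # Append with a separating space, except before the first entry.
    client_names_str="${client_names_str:+$client_names_str }$entry"
    i=$((i + 1))
done
echo "$client_names_str"

# To start the survey with this list (not executed here):
#   client_names_str="$client_names_str" ./obdfilter-survey
```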


- When done, clean up the echo_client/obdecho instances:
       - on clients: lconf --cleanup --node client /(path)/echo.xml
       - on server(s): lconf --cleanup --node (hostname) /(path)/echo.xml

- When aborting, kill vmstat on the clients:

pdsh -w (clients) killall vmstat

Use 'lctl device_list' to verify the obdfilter/echo_client instance names
e.g...

When the script runs, it creates a number of working files and a pair of
result files.  All files start with the prefix given by ${rslt}.

${rslt}.summary           same as stdout
${rslt}.script_*          per-host test script files
${rslt}.detail_tmp*       per-ost result files
${rslt}.detail            collected result files for post-mortem

The script iterates over the given numbers of threads and objects
performing all the specified tests and checking that all test processes
completed successfully.

Note that the script does NOT clean up properly if it is aborted or if it
encounters an unrecoverable error.  In this case, manual cleanup may be
required, possibly including killing any running instances of 'lctl' (local
or remote), removing echo_client instances created by the script and
unloading obdecho.


Script output
-------------

The summary file and stdout contain lines like...

ost 8 sz 67108864K rsz 1024 obj    8 thr    8 write  613.54 [ 64.00, 82.00]

ost 8          is the total number of OSTs under test.
sz 67108864K   is the total amount of data read or written (in K).
rsz 1024       is the record size (size of each echo_client I/O).
obj    8       is the total number of objects over all OSTs.
thr    8       is the total number of threads over all OSTs and objects.
write          is the test name.  If more tests have been specified they
               all appear on the same line.
613.54         is the aggregate bandwidth over all OSTs measured by
               dividing the total number of MB by the elapsed time.
[64.00, 82.00] are the minimum and maximum instantaneous bandwidths seen on
               any individual OST.

Note that although the numbers of threads and objects are specified per-OST
in the customisation section of the script, results are reported aggregated
over all OSTs.
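As an illustration of the field layout described above, the following
sketch turns summary lines into comma-separated values with awk.
summary_to_csv is a hypothetical helper, not part of the iokit.  It assumes
that every test on a line appears as "name bandwidth [min, max]", which
matches the sample line but has not been checked against failed or
partial runs.

```shell
#!/bin/bash
# Sketch only: summary_to_csv is hypothetical, not part of the iokit.
summary_to_csv() {
    awk '
    $1 == "ost" {
        gsub(/\[|\]|,/, " ")    # drop the brackets and comma around min/max
        # Fields are now: ost N sz SIZE rsz R obj O thr T, followed by
        # repeating groups of: test bandwidth min max
        for (i = 11; i + 3 <= NF; i += 4)
            printf "%s,%s,%s,%s,%s,%s,%s\n",
                   $2, $8, $10, $i, $(i + 1), $(i + 2), $(i + 3)
    }'
}

line='ost 8 sz 67108864K rsz 1024 obj    8 thr    8 write  613.54 [ 64.00, 82.00]'
echo "ost,obj,thr,test,bandwidth,min,max"
printf '%s\n' "$line" | summary_to_csv
```

As a cross-check on the sample line, and assuming 1 MB = 1024 K,
67108864K is 65536 MB, which at 613.54 MB per second corresponds to an
elapsed time of roughly 107 seconds.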


Visualising Results
-------------------

I've found it most useful to import the summary data (it's fixed width)
into Excel (or any graphing package) and graph bandwidth v. # threads for
varying numbers of concurrent regions.  This shows how the OSS performs for
a given number of concurrently accessed objects (i.e. files) with varying
numbers of I/Os in flight.

It is also extremely useful to record average disk I/O sizes during each
test.  These numbers help find pathologies in the file system block
allocator and the block device elevator.

The included obparse.pl script is an example of converting the output files
to .csv format.