lustre-iokit/obdfilter-survey/README

Overview
--------

This survey script does sequential I/O with varying numbers of threads and
objects (files) by using lctl::test_brw to drive the echo_client connected
to local or remote obdfilter instances, or remote obdecho instances.

It can be used to characterise the performance of the following Lustre
components.

1. The Stripe F/S.

   Here the script directly exercises one or more instances of obdfilter.
   They may be running on one or more nodes, e.g. when they are all attached
   to the same multi-ported disk subsystem.

   You need to tell the script all the names of the obdfilter instances.
   These should be up and running already.  If some are on different
   nodes, you need to specify their hostnames too (e.g. node1:ost1).

   All the obdfilter instances are driven directly.  The script
   automatically loads the obdecho module if required and creates one
   instance of echo_client for each obdfilter instance.

2. The Network.

   Here the script drives one or more instances of obdecho via instances of
   echo_client running on one or more nodes.

   You need to tell the script all the names of the echo_client instances.
   These should already be up and running.  If some are on different nodes,
   you need to specify their hostnames too (e.g. node1:ECHO_node1).

3. The Stripe F/S over the Network.

   Here the script drives one or more instances of obdfilter via instances
   of echo_client running on one or more nodes.

   As with (2), you need to tell the script all the names of the
   echo_client instances, which should already be up and running.

Note that the script is _NOT_ scalable to 100s of nodes since it is only
intended to measure individual servers, not the scalability of the system
as a whole.


Running
-------

The script must be customised according to the components under test and
where it should keep its working files.  Customisation variables are
described clearly at the start of the script.

If you are driving obdfilter instances directly, set the shell array
variable 'ost_names' to the names of the obdfilter instances and leave
'client_names' undefined.

If you are driving obdfilter or obdecho instances over the network, you
must instantiate the echo_clients yourself using lmc/lconf.  Set the shell
array variable 'client_names' to the names of the echo_client instances and
leave 'ost_names' undefined.

You can optionally prefix any name in 'ost_names' or 'client_names' with
the hostname that it is running on (e.g. remote_node:ost4) if your
obdfilters or echo_clients are running on more than one node.  In this
case, you need to ensure...

(a) 'custom_remote_shell()' works on your cluster
(b) all pathnames you specify in the script are mounted on the node you
    start the survey from and all the remote nodes.
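
As a rough illustration of the two naming forms described above, the sketch
below sets 'ost_names' as a bash array and splits each entry into host and
instance name.  The instance 'ost4', the helper 'split_name' and the
"localhost" default are assumptions made for this example only; they are not
taken from the survey script.

```shell
#!/bin/bash
# Driving obdfilter instances directly: set ost_names and leave
# client_names undefined.  'ost4' is a hypothetical instance name.
ost_names=(ns9:ost3 ost4)

# Driving instances over the network instead: set client_names and leave
# ost_names undefined.
# client_names=(node1:ECHO_node1)

# split_name is a hypothetical helper, not part of the survey script.
# It prints "<host> <name>"; entries without a "host:" prefix are
# reported as running on "localhost".
split_name() {
    local entry=$1
    case $entry in
        *:*) echo "${entry%%:*} ${entry#*:}" ;;
        *)   echo "localhost $entry" ;;
    esac
}

for entry in "${ost_names[@]}"; do
    split_name "$entry"
done
```

With the values above, the loop prints one "host name" pair per entry, which
is a quick way to check which names would need 'custom_remote_shell()'.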

Use 'lctl device_list' to verify the obdfilter/echo_client instance names
e.g...

[root@ns9 root]# lctl device_list
  0 UP confobd conf_ost3 OSD_ost3_ns9_UUID 1
  1 UP obdfilter ost3 ost3_UUID 1
  2 UP ost OSS OSS_UUID 1
  3 AT confobd conf_ost12 OSD_ost12_ns9_UUID 1
[root@ns9 root]#

...here device 1 is an instance of obdfilter called 'ost3'.  To exercise it
directly, add 'ns9:ost3' to 'ost_names'.  If the script is only to be run
on node 'ns9' you could simply add 'ost3' to 'ost_names'.
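
If there are many devices, the names can be picked out of the listing
mechanically.  The following is a minimal sketch that assumes the column
layout shown above (index, state, type, name, UUID, refcount); the helper
'list_obdfilters' is hypothetical and the sample text is copied from the
listing above so that the example runs without lctl.

```shell
# list_obdfilters is a hypothetical helper: it reads device_list output
# on stdin and prints the names of obdfilter devices in the UP state.
list_obdfilters() {
    awk '$2 == "UP" && $3 == "obdfilter" { print $4 }'
}

# Sample output copied from the listing above.
sample='  0 UP confobd conf_ost3 OSD_ost3_ns9_UUID 1
  1 UP obdfilter ost3 ost3_UUID 1
  2 UP ost OSS OSS_UUID 1
  3 AT confobd conf_ost12 OSD_ost12_ns9_UUID 1'

printf '%s\n' "$sample" | list_obdfilters
```

On a live node the same filter would be fed from the command itself, i.e.
'lctl device_list | list_obdfilters'.  Changing "obdfilter" to
"echo_client" should list echo_client instances, assuming they are reported
under that type name.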

When the script runs, it creates a number of working files and a pair of
result files.  All files start with the prefix given by ${rslt}.

${rslt}.summary           same as stdout
${rslt}.script_*          per-host test script files
${rslt}.detail_tmp*       per-ost result files
${rslt}.detail            collected result files for post-mortem

The script iterates over the given numbers of threads and objects
performing all the specified tests and checking that all test processes
completed successfully.

Note that the script does NOT clean up properly if it is aborted or if it
encounters an unrecoverable error.  In this case, manual cleanup may be
required, possibly including killing any running instances of 'lctl' (local
or remote), removing echo_client instances created by the script and
unloading obdecho.


Script output
-------------

The summary file and stdout contain lines like...

ost 8 sz 67108864K rsz 1024 obj    8 thr    8 write  613.54 [ 64.00, 82.00]

ost 8          is the total number of OSTs under test.
sz 67108864K   is the total amount of data read or written (in K).
rsz 1024       is the record size (size of each echo_client I/O).
obj    8       is the total number of objects over all OSTs
thr    8       is the total number of threads over all OSTs and objects
write          is the test name.  If more tests have been specified they
               all appear on the same line.
613.54         is the aggregate bandwidth over all OSTs measured by
               dividing the total number of MB by the elapsed time.
[64.00, 82.00] are the minimum and maximum instantaneous bandwidths seen on
               any individual OST.

Note that although the numbers of threads and objects are specified per-OST
in the customisation section of the script, results are reported aggregated
over all OSTs.
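
To relate an aggregated line back to the per-OST settings, the arithmetic
can be sketched as follows.  The helper 'summarise' is hypothetical; it
assumes a single test per line and that 1 MB is 1024 K.  The elapsed time
is inferred from the size and the bandwidth, not reported by the script.

```shell
# Summary line copied from the example above.
line='ost 8 sz 67108864K rsz 1024 obj    8 thr    8 write  613.54 [ 64.00, 82.00]'

# summarise is a hypothetical helper, not part of the survey script.
# Brackets and commas are blanked first so that min and max become
# plain whitespace-separated fields.
summarise() {
    sed 's/[][,]/ /g' | awk '{
        osts = $2; kb = $4 + 0; obj = $8; thr = $10; bw = $12
        printf "obj/ost=%d thr/ost=%d mean_MB/s_per_ost=%.2f elapsed_s=%.1f min=%s max=%s\n",
               obj / osts, thr / osts, bw / osts, kb / 1024 / bw, $13, $14
    }'
}

printf '%s\n' "$line" | summarise
```

For the example line this gives 1 object and 1 thread per OST, a mean of
about 76.69 MB/s per OST (between the reported minimum and maximum), and an
implied elapsed time of roughly 107 seconds for the 65536 MB transferred.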


Visualising Results
-------------------

I've found it most useful to import the summary data (it's fixed width)
into Excel (or any graphing package) and graph bandwidth v. # threads for
varying numbers of concurrent regions.  This shows how the OSS performs for
a given number of concurrently accessed objects (i.e. files) with varying
numbers of I/Os in flight.
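
If the graphing package prefers delimited input over fixed-width columns,
the summary can be converted first.  This is only a sketch: 'to_csv' and the
column names are hypothetical, and it assumes one test per summary line in
the format shown under "Script output".

```shell
# to_csv is a hypothetical helper, not part of the survey script.
# It reads summary lines on stdin and writes a header plus one CSV row
# per line that starts with "ost".
to_csv() {
    echo 'osts,size_k,rsz,objects,threads,test,mb_per_s,min,max'
    sed 's/[][,]/ /g' | awk '$1 == "ost" {
        sub(/K$/, "", $4)
        printf "%s,%s,%s,%s,%s,%s,%s,%s,%s\n",
               $2, $4, $6, $8, $10, $11, $12, $13, $14
    }'
}

# Demonstrated on the example line from "Script output"; in practice the
# input would be the summary file, e.g.  to_csv < "${rslt}.summary"
echo 'ost 8 sz 67108864K rsz 1024 obj    8 thr    8 write  613.54 [ 64.00, 82.00]' | to_csv
```

Plotting 'mb_per_s' against 'threads' with one series per value of
'objects' gives the graph described above.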

It is also extremely useful to record average disk I/O sizes during each
test.  These numbers help find pathologies in the file system block
allocator and the block device elevator.
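
One way to obtain such averages on Linux, sketched here as an assumption
and not as something the survey script does, is to snapshot /proc/diskstats
before and after a test and divide the sectors written by the writes
completed.  The helper 'avg_write_kb' and the device name 'sda' are
hypothetical.

```shell
# avg_write_kb DEVICE BEFORE_FILE AFTER_FILE
# Hypothetical helper: prints the mean write request size in KiB between
# two /proc/diskstats snapshots.  Fields used: $3 device name, $8 writes
# completed, $10 sectors written (a sector is 512 bytes, so KiB =
# sectors / 2).
avg_write_kb() {
    awk -v dev="$1" '
        $3 == dev && FNR == NR { w0 = $8; s0 = $10; next }
        $3 == dev              { w1 = $8; s1 = $10 }
        END {
            if (w1 > w0) printf "%.1f\n", (s1 - s0) / 2 / (w1 - w0)
            else print "no writes"
        }' "$2" "$3"
}

# Usage sketch:
#   cat /proc/diskstats > before
#   ... run one test ...
#   cat /proc/diskstats > after
#   avg_write_kb sda before after
```

Reads can be handled the same way using fields 4 (reads completed) and 6
(sectors read).  An average well below the record size suggests that I/Os
are being split or fragmented somewhere between obdfilter and the disk.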