lustre-iokit/sgpdd-survey/README

WARNING: Running sgp_dd will ERASE the contents of the disk devices.
         Do NOT run this on any OST that holds data you care about, or
         that you do not expect to reformat afterward.

Requirements
------------

. For SCSI disks:
    sg3_utils (for sgp_dd)
    SCSI device
. Or, for non-SCSI disks:
    raw device support
    sg3_utils (for sgp_dd)


Overview
--------

This survey may be used to characterise the performance of a SCSI device.
It simulates an OST serving multiple stripe files.  The data gathered by it
can help set expectations for the performance of a Lustre OST exporting the
device.

The script uses sgp_dd to do raw sequential disk I/O.  It runs with
variable numbers of sgp_dd threads to show how performance varies with
different request queue depths.

The script spawns variable numbers of sgp_dd instances, each reading or
writing a separate area of the disk to show how performance varies with the
number of concurrent stripe files.
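
As a rough illustration of "a separate area of the disk" per instance, the
sketch below splits a test size into equal, non-overlapping regions.  The
function name and the equal, contiguous layout are assumptions made for this
example; the script itself may place its regions differently.

```python
def region_offsets(total_kb, regions):
    """Split total_kb into `regions` equal, non-overlapping areas.

    Returns a list of (start_kb, size_kb) tuples.  The contiguous layout
    is an assumption for illustration, not taken from the script.
    """
    if regions < 1:
        raise ValueError("regions must be at least 1")
    size_kb = total_kb // regions
    return [(i * size_kb, size_kb) for i in range(regions)]


# Four concurrent regions over an 8388608K test size.
for start_kb, size_kb in region_offsets(8388608, 4):
    print(f"start {start_kb:>8}K  size {size_kb}K")
```

Each sgp_dd instance would then work only inside its own (start, size) area,
so instances never touch the same blocks.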

The device(s) used must meet one of two tests:
SCSI device:
        Must appear in the output of 'sg_map'
Raw device:
        Must appear in the output of 'raw -qa'
You may not mix raw and SCSI devices in the test specification.
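
The two tests and the no-mixing rule can be sketched as a small check.  This
is not how the script does it; device_kind is a hypothetical helper, and the
sample 'sg_map' and 'raw -qa' output below is an assumed format that may
differ on your system.

```python
def device_kind(devices, sg_map_output, raw_output):
    """Return "scsi" or "raw" for a device list, or raise ValueError.

    sg_map_output and raw_output are the captured text of 'sg_map' and
    'raw -qa'.  Matching whole whitespace-separated tokens is an
    assumption about the format of that output.
    """
    if not devices:
        raise ValueError("no devices given")
    sg_names = set(sg_map_output.split())
    # 'raw -qa' is assumed to print the device name followed by a colon.
    raw_names = {token.rstrip(":") for token in raw_output.split()}
    kinds = set()
    for dev in devices:
        if dev in sg_names:
            kinds.add("scsi")
        elif dev in raw_names:
            kinds.add("raw")
        else:
            raise ValueError(f"{dev}: not listed by sg_map or raw -qa")
    if len(kinds) > 1:
        raise ValueError("raw and SCSI devices may not be mixed")
    return kinds.pop()


# Assumed sample output, for illustration only.
SG_MAP = "/dev/sg0  /dev/sda\n/dev/sg1  /dev/sdb\n"
RAW_QA = "/dev/raw/raw1:  bound to major 8, minor 32\n"

print(device_kind(["/dev/sda", "/dev/sdb"], SG_MAP, RAW_QA))
```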


Running
-------

The script must be customised according to the particular device under test
and where it should keep its working files.   Customisation variables are
described clearly at the start of the script.

When the script runs, it creates a number of working files and a pair of
result files.  All files start with the prefix given by ${rslt}.

${rslt}_<date/time>.summary    same as stdout
${rslt}_<date/time>_*          tmp files
${rslt}_<date/time>.detail     collected tmp files for post-mortem

The summary file and stdout contain lines like...

total_size  8388608K rsz 1024 thr     1 crg   1  180.45 MB/s   1 x  180.50 =  180.50 MB/s

Here rsz is the record size, thr the number of threads and crg the number of
concurrent regions.  The number immediately before the first MB/s is the
bandwidth computed from the total data moved and the elapsed time.  The
other numbers are a check on the bandwidths reported by the individual
sgp_dd instances.
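
For anyone post-processing the summary by hand, here is a minimal sketch that
pulls the fields out of a line like the one above.  The regular expression
and the field names are assumptions based on that single sample line, and
reading "1 x 180.50 = 180.50" as an instance count times a per-instance rate
is my interpretation.  Lines that do not match (for example those reporting
"ENOMEM" or "failed") return None.

```python
import re

# Pattern derived from the one sample line in this README (an assumption).
SUMMARY_RE = re.compile(
    r"total_size\s+(?P<total_kb>\d+)K\s+"
    r"rsz\s+(?P<rsz>\d+)\s+"
    r"thr\s+(?P<thr>\d+)\s+"
    r"crg\s+(?P<crg>\d+)\s+"
    r"(?P<measured>[\d.]+) MB/s\s+"
    r"(?P<count>\d+) x\s+(?P<per_instance>[\d.]+) =\s+(?P<summed>[\d.]+) MB/s"
)


def parse_summary_line(line):
    """Return a dict of fields, or None if the line has no bandwidth data."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    row = {k: int(m[k]) for k in ("total_kb", "rsz", "thr", "crg", "count")}
    row.update({k: float(m[k]) for k in ("measured", "per_instance", "summed")})
    return row


def check_is_consistent(row, tolerance=0.01):
    """True if count x per-instance rate matches the reported product."""
    return abs(row["count"] * row["per_instance"] - row["summed"]) <= tolerance


LINE = ("total_size  8388608K rsz 1024 thr     1 crg   1  180.45 MB/s"
        "   1 x  180.50 =  180.50 MB/s")
row = parse_summary_line(LINE)
print(row["thr"], row["crg"], row["measured"], check_is_consistent(row))
```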

If there are so many threads that sgp_dd is unlikely to be able to allocate
I/O buffers, "ENOMEM" is printed.

If not all the sgp_dd instances successfully reported a bandwidth number,
"failed" is printed.


Visualising Results
-------------------

I've found it most useful to import the summary data (it's fixed width)
into Excel (or any graphing package) and graph bandwidth vs. number of
threads for varying numbers of concurrent regions.  This shows how the
device performs with varying queue depth.  If the series (varying numbers
of concurrent regions) all seem to land on top of each other, it shows the
device is fazed by seeks at the given record size.
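
If you would rather prepare the graph data yourself, the grouping described
above can be sketched as follows: one series per number of concurrent
regions, each a list of (threads, bandwidth) points.  The function names are
hypothetical and the sample rows are made-up numbers, not measurements.

```python
import csv
import io
from collections import defaultdict


def series_by_regions(rows):
    """Group (thr, crg, mb_per_s) rows into {crg: [(thr, mb_per_s), ...]}."""
    series = defaultdict(list)
    for thr, crg, bw in rows:
        series[crg].append((thr, bw))
    return {crg: sorted(points) for crg, points in sorted(series.items())}


def to_csv(series):
    """One row per thread count, one column per concurrent-region count."""
    threads = sorted({thr for pts in series.values() for thr, _ in pts})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["thr"] + [f"crg={crg}" for crg in series])
    for thr in threads:
        # Leave the cell empty where a series has no point for this thr.
        writer.writerow([thr] + [dict(pts).get(thr, "") for pts in series.values()])
    return out.getvalue()


# Made-up (thr, crg, MB/s) values, for illustration only.
SAMPLE = [(1, 1, 100.0), (2, 1, 150.0), (2, 2, 140.0), (4, 1, 160.0), (4, 2, 155.0)]
print(to_csv(series_by_regions(SAMPLE)), end="")
```

Plotting each crg column against the thr column gives one line per number of
concurrent regions.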


The included script, parse.pl, processes output files and creates .csv
files for spreadsheet import.