lustre-iokit/ior-survey/README.ior-survey

   1 Introduction :
   2
   3   The ior_survey script can be used to test the performance of Lustre
   4 file systems. It uses IOR (Interleaved Or Random), a benchmark program for
   5 testing the performance of parallel file systems using various interfaces and
   6 access patterns.  IOR uses MPI for process synchronization.
   7
   8 General Description:
   9
  10   ior_mpiio is a parallel file system test developed by the SIOP (Scalable
  11 I/O Project) at LLNL. This parallel program performs parallel writes and
  12 reads to/from a file using MPI-IO and reports the throughput rates.
  13
  14   MPI is used for process synchronization.  Under the control of compile-time
  15 defined constants (and, to a lesser extent, environment variables), I/O is done
  16 via MPI-IO. The data are written and read using independent parallel transfers
  17 of equal-sized blocks of contiguous bytes that cover the file with no gaps and
  18 that do not overlap each other. The test consists of creating a new file,
  19 writing it with data, then reading the data back.
  20
  21   The data written are C integers. If the program runs successfully to
  22 completion, it returns 0. If a problem is detected with any I/O routine, the
  23 program exits with a value of IO_ERR.
  24
  25   If a non-I/O problem is detected, the program exits with a value of
  26 INTERNAL_ERR (this can be caused by a bug in the test program, or a problem in
  27 MPI, or by inconsistencies in the environment variable settings).
  28
  29 Requirements :
  30         To run the ior_survey script, the following items are required.
  31
  32 1: IOR
  33
  34   The IOR test can be obtained from
  35   ftp://ftp.llnl.gov/pub/siop/ior/
  36
  37 2: pdsh
  38         The tarball can be obtained from
  39    http://sourceforge.net/project/showfiles.php?group_id=33530&package_id=183641
  40
  41 3: pdsh-rcmd-ssh module
  42         The rpm for this can be found at
  43    http://sourceforge.net/project/showfiles.php?group_id=33530&package_id=183641
  44
  45 4: lam/mpi
  46         The tarball can be obtained from
  47    http://www.lam-mpi.org/7.1/download.php
  48
  49 5: Run the script as a non-root user who has super-user (sudo)
  50    privileges.
  51
  52 6: The user must be able to log in without a password on all the nodes on
  53    which the test is going to be run.
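
Requirement 6 can be checked before a run. The sketch below is only an
illustration and is not part of ior_survey: the function name
check_passwordless_login and the SSH override variable are hypothetical, and
it assumes the OpenSSH client, whose BatchMode option makes ssh fail instead
of prompting for a password.

```shell
# SSH can be overridden (e.g. for testing); by default the real ssh client is used.
SSH=${SSH:-ssh}

# Print every node that refuses a password-less login.
# Returns non-zero if at least one node fails.
check_passwordless_login() {
    failed=0
    for node in "$@"; do
        # BatchMode=yes: fail rather than prompt for a password.
        if ! "$SSH" -o BatchMode=yes -o ConnectTimeout=5 "$node" true </dev/null; then
            echo "password-less login failed: $node"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, "check_passwordless_login rhea1 rhea2" prints nothing and
succeeds when both nodes accept the login.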
  54
  55
  56
  57 To make an entry into the sudoers file :
  58
  59 1: Become super user (root)
  60
  61 2: type visudo
  62
  63 3: make an entry as
  64         username   ALL=(ALL) NOPASSWD: ALL   (where username is the name of the user)
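
Whether the sudoers entry took effect can be verified as the test user. This
is a minimal sketch, not something the script does itself: the function name
has_nopasswd_sudo and the SUDO override variable are hypothetical. It relies
on sudo's -n (non-interactive) option, which makes sudo fail instead of
asking for a password.

```shell
# SUDO can be overridden (e.g. for testing); by default the real sudo is used.
SUDO=${SUDO:-sudo}

# Succeeds only if sudo can run a command without prompting for a password.
has_nopasswd_sudo() {
    # -n: never prompt; exit with an error if a password would be needed.
    "$SUDO" -n true 2>/dev/null
}
```

Usage: has_nopasswd_sudo || echo "sudo still asks for a password"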
  65
  66
  67 Building IOR :
  68
  69   Type 'gmake mpiio' from the IOR/ directory.  In
  70   IOR/src/C, the file Makefile.config currently has settings for AIX, Linux,
  71   OSF1 (TRU64), and IRIX64 to model on.  Note that MPI must be present for
  72   building/running IOR, and that MPI I/O must be available for MPI I/O, HDF5,
  73   and Parallel netCDF builds.  As well, HDF5 and Parallel netCDF libraries are
  74   necessary for those builds.  All IOR builds include the POSIX interface.
  75
  76   Copy the IOR binary from IOR/src/C/ to /usr/local/sbin/ using
  77
  78         sudo cp IOR/src/C/IOR /usr/local/sbin/
  79
  80
  81
  82 Installing pdsh and pdsh-rcmd-ssh module :
  83
  84 1: Download the pdsh tarball
  85
  86 2: Untar it using tar -xzvf (if tar.gz) or tar -xjvf (if tar.bz2)
  87
  88 3: Go to the pdsh directory and type ./bootstrap
  89
  90 4: Configure it using the following command
  91
  92         ./configure --with-ssh
  93
  94 5: Build it using "make"
  95
  96 6: Install it using "sudo make install"
  97
  98 7: Download the pdsh-rcmd-ssh rpm
  99
 100 8: Install the rpm using "rpm -ivh pdsh-rcmd-ssh*"
 101
 102
 103 Installing lam/mpi :
 104
 105 1: Download the lam tarball
 106
 107 2: Untar it using tar -xzvf (if tar.gz) or tar -xjvf (if tar.bz2)
 108
 109 3: Go to the lam directory and type ./configure
 110
 111 4: Build it using "make"
 112
 113 5: Install it using "sudo make install"
 114
 115         lam, IOR, and pdsh should be installed on all the nodes on which the
 116         test is going to be run.
 117
 118 Note: Please make sure that the same version of lam is installed on all
 119 the nodes on which the test is going to be run.
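
One way to confirm that the versions match is to collect one version string
per node (for example with pdsh, which prefixes each output line with
"host: ") and check that they are all identical. The sketch below covers only
the comparison step. The function name versions_agree is hypothetical, and
the command used to print the lam version on each node is deliberately left
out.

```shell
# Read "host: version" lines on stdin.
# Succeeds only if every host reports the same version string.
versions_agree() {
    # Drop the "host: " prefix, then count the distinct strings that remain.
    distinct=$(sed 's/^[^:]*: *//' | sort -u | wc -l | tr -d ' ')
    [ "$distinct" -eq 1 ]
}
```

With sample input such as "rhea1: 7.1.4" and "rhea2: 7.1.4" (illustrative
values) the function succeeds; if any node reports a different string it
fails.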
 120
 121
 122
 123 Running the ior_survey script :
 124
 125 1: Lustre should be mounted at /mnt/lustre. Do
 126         "touch /mnt/lustre/ior_survey_testfile"
 127
 128 2: On the node from which the script is going to be executed, make a
 129    hostfile that lists the IP addresses of all the nodes.
 130
 131 3: Start lam using "lamboot -v -d hostfile". This will start lamd on all the
 132    nodes.
 133
 134 4: Run the ior_survey script using "./ior_survey"
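
Before step 1 it can help to confirm that Lustre really is mounted at
/mnt/lustre. The following is a sketch under stated assumptions: a Linux
client whose mount table is /proc/mounts and lists the file system type as
"lustre". The function name is_lustre_mounted is hypothetical.

```shell
# Succeeds if the mount point is listed as a lustre file system.
# $1: mount point (default /mnt/lustre)
# $2: mount table to read (default /proc/mounts)
is_lustre_mounted() {
    # Field 2 of the mount table is the mount point, field 3 the fs type.
    awk -v mp="${1:-/mnt/lustre}" \
        '$2 == mp && $3 == "lustre" { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}
```

Usage: is_lustre_mounted || echo "/mnt/lustre is not a lustre mount"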
 135
 136 Note:
 137      The node names of the clients should be of the form rhea1, rhea2, rhea3,
 138    and so on. The name of the cluster (the first part of the node name) should
 139    be set in the cluster name field of the ior_survey script.
 140    e.g.  cluster=rhea //name of the cluster
 141
 142      The client node numbers, i.e. the numeral at the end of each node name,
 143    should be set in the client field.
 144    e.g. client=(1)   //to run the test on one node only (node1).
 145         client=(1-2) //to run the test on two nodes (node1, node2).
 145         client=(1-2) //to run test on two nodes node1, node2.
 146
 147         Please note that the hostfile should contain the IP addresses of only
 148    those nodes on which the lustre filesystem is mounted, i.e. the client
 149    nodes.
 150
 151         The details of the test can be found on the node from where the
 152    test was run as /tmp/ior_survey_run_date@start_time_nodename.detail
 153
 154         The output of the IOR looks like
 155
 156 host1: access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   iter
 157 host1: ------    ---------  ---------- ---------  --------   --------   --------   ----
 158 host1: write     1.58       2097152    1024.00    0.000873   1299.37    0.000132   0
 159 host1:
 160 host1: Max Write: 1.58 MiB/sec (1.65 MB/sec)
 161
 162         where,
 163                 host1 : node on which the test is run
 164                 access: the test which is run (write, rewrite, read, reread)
 165                 bw    : bandwidth
 166                 block : total size to be written
 167                 xfer  : size of each transfer, here 1 MiB
 168                 open  : time taken for open
 169                 close : time taken for close
 170                 wr/rd : time taken for write/read
 171                 iter  : iteration number
 172                 Max Write : maximum write bandwidth obtained
 173
 174 Note : MB is defined as 1,000,000 bytes and MiB is 1,048,576 bytes.
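
The two figures on the Max Write line can be reproduced from the sample row
above. The sketch assumes a single task and that the bandwidth is roughly the
block size divided by the wr/rd time. The open and close times are ignored
because they are negligible here. The function names are hypothetical.

```shell
# Bandwidth in MiB/s from a block size in KiB and an elapsed time in seconds.
bw_mib_per_s() {
    awk -v kib="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", kib / 1024 / secs }'
}

# The same transfer expressed in MB/s (1 MB = 1,000,000 bytes).
bw_mb_per_s() {
    awk -v kib="$1" -v secs="$2" \
        'BEGIN { printf "%.2f\n", kib * 1024 / 1000000 / secs }'
}
```

"bw_mib_per_s 2097152 1299.37" prints 1.58 and "bw_mb_per_s 2097152 1299.37"
prints 1.65, matching the sample output.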
 175
 176         The summary of the test can be found on the node from where the
 177    test was run as /tmp/ior_survey_run_date@start_time_nodename.summary
 178    It contains the tests run and the status of those tests.
 179
 180
 181 Instructions for graphing IOR results
 182
 183         The plot-ior.pl script will plot the results from the .detail file
 184    generated by ior_survey. It will create a data file for writes as
 185    /tmp/ior_survey_run_date@start_time_nodename.detail.dat1 and for reads
 186    as /tmp/ior_survey_run_date@start_time_nodename.detail.dat2 and gnuplot
 187    file as /tmp/ior_survey_run_date@start_time_nodename.detail.scr.
 188
 189         $ perl parse-ior.pl /tmp/ior_survey_run_date@start_time_nodename.detail
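
For a quick look at the results without plotting, the "Max Write" lines can
be pulled out of the IOR output directly. This sketch assumes the .detail
file contains IOR output lines in the form shown earlier. The function name
max_write_mib is hypothetical.

```shell
# Print "host bandwidth" (in MiB/sec) for every "Max Write" line on stdin.
max_write_mib() {
    awk '$2 == "Max" && $3 == "Write:" {
        sub(/:$/, "", $1)   # strip the trailing colon from the host prefix
        print $1, $4
    }'
}
```

Usage:
max_write_mib < /tmp/ior_survey_run_date@start_time_nodename.detail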
 190