From 381850eb0c756cabf11e3dc2dbc3d4d9fc9ab122 Mon Sep 17 00:00:00 2001
From: nathan
Date: Thu, 25 Jan 2007 19:56:22 +0000
Subject: [PATCH] b=10960 r=nathan add a real readme

---
 lustre-iokit/ior-survey/README.ior-survey | 190 +++++++++++++++++++++++++++++-
 1 file changed, 188 insertions(+), 2 deletions(-)

diff --git a/lustre-iokit/ior-survey/README.ior-survey b/lustre-iokit/ior-survey/README.ior-survey
index bc886e8..8d734da 100644
--- a/lustre-iokit/ior-survey/README.ior-survey
+++ b/lustre-iokit/ior-survey/README.ior-survey
@@ -1,4 +1,190 @@
+Introduction :
+ The ior_survey script can be used to test the performance of the Lustre
+file system. It uses IOR (Interleaved Or Random), a benchmark program for
+testing the performance of parallel file systems using various interfaces
+and access patterns. IOR uses MPI for process synchronization.
+
+General Description :
+
+ ior_mpiio is a parallel file system test developed by the SIOP (Scalable
+I/O Project) at LLNL. This parallel program performs parallel writes and
+reads to/from a file using MPI-IO and reports the throughput rates.
+
+ MPI is used for process synchronization. Under the control of compile-time
+defined constants (and, to a lesser extent, environment variables), I/O is done
+via MPI-IO. The data are written and read using independent parallel transfers
+of equal-sized blocks of contiguous bytes that cover the file with no gaps and
+that do not overlap each other. The test consists of creating a new file,
+writing it with data, then reading the data back.
+
+ The data written are C integers. If the program runs successfully to
+completion, it returns 0. If a problem is detected with any I/O routine, the
+program exits with a value of IO_ERR.
+
+ If a non-I/O problem is detected, the program exits with a value of
+INTERNAL_ERR (this can be caused by a bug in the test program, a problem in
+MPI, or inconsistencies in the environment variable settings).
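The non-overlapping, gap-free block layout described above can be illustrated with a small shell sketch. It assumes the simplest case of a single segment, where MPI rank r writes the contiguous byte range from r * block to (r + 1) * block - 1 of the shared file. IOR's real offsets also depend on its segment and transfer settings. The function name block_layout is hypothetical and is not part of IOR or ior_survey.

```shell
#!/bin/sh
# Hypothetical illustration (not IOR code): block_layout NPROCS BLOCK
# Prints the byte range each rank would own if NPROCS processes each
# write one contiguous block of BLOCK bytes (single-segment assumption),
# followed by the resulting file size.
block_layout() {
    nprocs=$1
    block=$2
    rank=0
    while [ "$rank" -lt "$nprocs" ]; do
        start=$((rank * block))          # first byte owned by this rank
        end=$((start + block - 1))       # last byte owned by this rank
        echo "rank $rank: bytes $start-$end"
        rank=$((rank + 1))
    done
    echo "file size: $((nprocs * block)) bytes"
}
```

For example, "block_layout 4 1048576" prints the ranges 0-1048575, 1048576-2097151, 2097152-3145727 and 3145728-4194303, followed by a file size of 4194304 bytes. Each range starts right after the previous one ends, so the blocks cover the file exactly once.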
+
+Requirements :
+ To run the ior_survey script, the following items are required.
+
+1: IOR
+
+ IOR can be obtained from
+ ftp://ftp.llnl.gov/pub/siop/ior/
+
+2: pdsh
+ The tarball can be obtained from
+ http://sourceforge.net/project/showfiles.php?group_id=33530&package_id=183641
+
+3: pdsh-rcmd-ssh module
+ The rpm can be obtained from
+ http://sourceforge.net/project/showfiles.php?group_id=33530&package_id=183641
+
+4: LAM/MPI
+ The tarball can be obtained from
+ http://www.lam-mpi.org/7.1/download.php
+
+5: The script must be run by a non-root user who has super-user (sudo)
+ privileges.
+
+6: The user must be able to log in without a password (e.g. using ssh keys)
+ on all the nodes on which the test is going to be run.
+
+
+
+To make an entry in the sudoers file :
+
+1: Become the super user (root).
+
+2: Type "visudo".
+
+3: Add an entry of the form
+ username ALL=(ALL) NOPASSWD: ALL
+ where username is the name of the user who will run the script.
+
+Building IOR :
+
+ Type 'gmake mpiio' from the IOR/ directory. In
+ IOR/src/C, the file Makefile.config currently has settings for AIX, Linux,
+ OSF1 (TRU64), and IRIX64 to model on. Note that MPI must be present for
+ building/running IOR, and that MPI I/O must be available for MPI I/O, HDF5,
+ and Parallel netCDF builds. The HDF5 and Parallel netCDF libraries are also
+ required for those builds. All IOR builds include the POSIX interface.
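The requirements above can be checked with a rough shell sketch like the following before you run the survey. The function names are hypothetical. The sketch uses only standard tools:

- "command -v" looks for programs on PATH.
- "sudo -n" fails instead of prompting when a password would be needed, so it tests the NOPASSWD entry.
- ssh's BatchMode option tests password-less login.

The program names passed to it (IOR, pdsh, lamboot) are the ones this README installs.

```shell
#!/bin/sh
# Hypothetical pre-flight checks for the requirements listed above.

# Print every named command that is not on PATH; succeed only if all exist.
check_commands() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            status=1
        fi
    done
    return "$status"
}

# Succeed only if sudo runs without asking for a password
# (i.e. the NOPASSWD sudoers entry is in effect for this user).
check_sudo() {
    sudo -n true >/dev/null 2>&1
}

# Succeed only if ssh can log in to host "$1" without a password prompt.
check_ssh() {
    ssh -o BatchMode=yes "$1" true >/dev/null 2>&1
}
```

Typical use on each node is "check_commands IOR pdsh lamboot && check_sudo", plus "check_ssh rhea1" (and so on) from the node that will start the test. Note that /usr/local/sbin, where the IOR binary is copied, is not always on a non-root user's PATH.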
+
+ Copy the IOR binary from IOR/src/C/ to /usr/local/sbin/ using
+
+ sudo cp IOR/src/C/IOR /usr/local/sbin/
+
+
+
+Installing pdsh and the pdsh-rcmd-ssh module :
+
+1: Download the pdsh tarball.
+
+2: Untar it using "tar -xzvf" (if .tar.gz) or "tar -xjvf" (if .tar.bz2).
+
+3: Go to the pdsh directory and type "./bootstrap".
+
+4: Configure it using the following command:
+
+ ./configure --with-ssh
+
+5: Build it using "make".
+
+6: Install it using "sudo make install".
+
+7: Download the pdsh-rcmd-ssh rpm.
+
+8: Install the rpm using "sudo rpm -ivh pdsh-rcmd-ssh*".
+
+
+Installing LAM/MPI :
+
+1: Download the LAM tarball.
+
+2: Untar it using "tar -xzvf" (if .tar.gz) or "tar -xjvf" (if .tar.bz2).
+
+3: Go to the lam directory and type "./configure".
+
+4: Build it using "make".
+
+5: Install it using "sudo make install".
+
+ LAM/MPI, IOR and pdsh should be installed on all the nodes on which the
+ test is going to be run.
+
+Note: Please make sure that the same version of LAM/MPI is installed on all
+the nodes on which the test is going to be run.
+
+
+
+Running the ior_survey script :
+
+1: Lustre should be mounted at /mnt/lustre. Create the test file with
+ "touch /mnt/lustre/ior_survey_testfile"
+
+2: On the node from which the script is going to be executed, create a
+ hostfile containing the IP addresses of all the nodes.
+
+3: Start LAM using "lamboot -v -d hostfile". This starts lamd on all the
+ nodes.
+
+4: Run the ior_survey script using "./ior_survey".
+
+Note:
+ The client node names must be of the form rhea1, rhea2, rhea3, and so on.
+ The cluster name (the first, alphabetic part of the node name) should be
+ set in the cluster field of the ior_survey script.
+ e.g. cluster=rhea        # name of the cluster
+
+ The client node numbers (the numeric last part of each node name) should
+ be set in the client field.
+ e.g. client=(1)          # run the test on one node only, rhea1
+      client=(1-2)        # run the test on two nodes, rhea1 and rhea2
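The cluster/client naming convention above can be illustrated with a hypothetical shell function, expand_clients. It is not taken from ior_survey, and the script's own parsing may differ. It turns a cluster name and a client spec such as "1" or "1-2" into the corresponding node names.

```shell
#!/bin/sh
# Hypothetical helper: expand_clients CLUSTER SPEC
# SPEC is a single node number ("1") or an inclusive range ("1-2").
# Prints the node names on one line, separated by spaces.
expand_clients() {
    cluster=$1
    spec=$2
    case $spec in
        *-*) first=${spec%-*}; last=${spec#*-} ;;   # split "1-2" into 1 and 2
        *)   first=$spec; last=$spec ;;             # single node number
    esac
    sep=""
    n=$first
    while [ "$n" -le "$last" ]; do
        printf '%s%s%s' "$sep" "$cluster" "$n"
        sep=" "
        n=$((n + 1))
    done
    printf '\n'
}
```

For example, "expand_clients rhea 1-2" prints "rhea1 rhea2". Assuming the names resolve on the head node, a hostfile of IP addresses could then be built with getent:

for node in $(expand_clients rhea 1-2); do getent hosts "$node" | awk '{ print $1 }'; done > hostfile

pdsh's own host-range syntax, "pdsh -w 'rhea[1-2]' 'mount -t lustre'", can confirm that Lustre is mounted on those clients.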
+
+ Please note that the hostfile should contain the IP addresses of only
+ those nodes on which the Lustre file system is mounted, i.e. the
+ clients.
+
+ Detailed results are saved on the node from which the test was run, in
+ /tmp/ior_survey_run_date@start_time_nodename.detail
+
+ The IOR output looks like this:
+
+host1: access  bw(MiB/s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  iter
+host1: ------  ---------  ----------  ---------  --------  --------  --------  ----
+host1: write   1.58       2097152     1024.00    0.000873  1299.37   0.000132  0
+host1:
+host1: Max Write: 1.58 MiB/sec (1.65 MB/sec)
+
+ where,
+ host1     : node on which the test is run
+ access    : the test which is run (write, rewrite, read, reread)
+ bw        : bandwidth (MiB/s)
+ block     : amount of data written/read by each process (KiB)
+ xfer      : size of each transfer (KiB); here 1024 KiB (1 MiB)
+ open      : time taken to open the file (s)
+ wr/rd     : time taken for the write or read (s)
+ close     : time taken to close the file (s)
+ iter      : iteration number
+ Max Write : maximum write bandwidth obtained
+
+Note : MB is defined as 1,000,000 bytes and MiB is 1,048,576 bytes.
+
+ A summary of the test is saved on the node from which the test was run, in
+ /tmp/ior_survey_run_date@start_time_nodename.summary
+ It lists the tests that were run and the status of each.
+
+
+Instructions for graphing IOR results :
+
+ The plot-ior.pl script will plot the results from the .detail file
+ generated by ior_survey. It creates a data file for writes,
+ /tmp/ior_survey_run_date@start_time_nodename.detail.dat1, a data file for
+ reads, /tmp/ior_survey_run_date@start_time_nodename.detail.dat2, and a
+ gnuplot script, /tmp/ior_survey_run_date@start_time_nodename.detail.scr.
+
+ $ perl parse-ior.pl /tmp/ior_survey_run_date@start_time_nodename.detail
-The IOR test should be obtained at
-ftp://ftp.llnl.gov/pub/siop/ior/
-- 
1.8.3.1
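You can check how the columns of the sample IOR output relate to each other, and how MiB/s converts to MB/s, with a short awk-based shell sketch. It is a hypothetical cross-check and is not part of ior_survey or IOR. It makes two assumptions:

- A single process writes one block, so block(KiB) is the total amount of data moved.
- The open and close times are ignored, since they are negligible in this example.

```shell
#!/bin/sh
# Hypothetical cross-check of one IOR result line read from stdin
# (fields: host access bw block xfer open wr/rd close iter).
# Recomputes the bandwidth from block(KiB) and wr/rd(s) and prints it
# in both MiB/s (1,048,576 bytes) and MB/s (1,000,000 bytes).
check_bw() {
    awk '{
        bytes = $4 * 1024          # block(KiB) -> bytes
        rate  = bytes / $7         # divided by wr/rd(s) -> bytes per second
        printf "%.2f MiB/sec (%.2f MB/sec)\n", rate / 1048576, rate / 1000000
    }'
}
```

Feeding it the sample write line, echo "host1: write 1.58 2097152 1024.00 0.000873 1299.37 0.000132 0" | check_bw, prints "1.58 MiB/sec (1.65 MB/sec)". These are the same figures as the Max Write line: 2,147,483,648 bytes in 1299.37 s is about 1,652,711 bytes/s. Converting the rounded 1.58 MiB/s instead would give 1.66 MB/s, which is why the sketch starts from the raw columns.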