A set of scripts to collect and analyze RPC traces in a Lustre file system.
These scripts turn the Lustre debug daemon on before launching a job and
off after it completes. They also allow users to parse and visualize
the logged RPC traces.
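
The on/off sequence around a job can be sketched roughly as follows. This
is only an illustration: the helper names trace_start_cmds and
trace_stop_cmds and the example paths are hypothetical, and the functions
print the lctl commands rather than running them. The lctl invocations
themselves mirror the ones used in launcher.sh.

```shell
#!/bin/bash
# Sketch only: prints the lctl commands instead of running them.
# trace_start_cmds and trace_stop_cmds are hypothetical helper names.

# Commands issued on an OSS node before the job starts.
trace_start_cmds() {
    local raw_log=$1
    echo "lctl set_param debug=rpctrace"
    echo "lctl clear"
    echo "lctl debug_daemon start ${raw_log} 1024"
}

# Commands issued on an OSS node after the job finishes.
trace_stop_cmds() {
    local raw_log=$1 text_log=$2
    echo "lctl debug_daemon stop"
    echo "lctl debug_file ${raw_log} ${text_log}"
}

# Example paths are placeholders, not paths the scripts guarantee.
trace_start_cmds /lioprof_loc/job-0/rpc/rpc.log
echo "# <job runs here>"
trace_stop_cmds /lioprof_loc/job-0/rpc/rpc.log /shared/job-0/rpc/rpc-oss1
```

Printing the commands keeps the sketch safe to run on a machine without
Lustre; launcher.sh sends the equivalent commands to the OSS nodes via pdsh.
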
Test-Parameters: trivial
Change-Id: I6ee598ff6e49b2a7406354172b10bd295e801cb0
Signed-off-by: CongXu <cong.xu@intel.com>
Reviewed-on: https://review.whamcloud.com/25395
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Omkar Kulkarni <omkar.kulkarni@intel.com>
--- /dev/null
+# This file is provided under a dual BSD/GPLv2 license. When using or
+# redistributing this file, you may do so under either license.
+#
+# GPL LICENSE SUMMARY
+#
+# Copyright(c) 2016 Intel Corporation.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of version 2 of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# Contact Information:
+# Cong Xu, cong.xu@intel.com
+#
+# BSD LICENSE
+#
+# Copyright(c) 2016 Intel Corporation.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of Intel Corporation nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+
+
+1. Introduction to launcher.sh
+ The LIOProf logging service script records detailed I/O tracing information
+on Lustre OSS nodes. It requires superuser privileges.
+
+[Input]
+(1) Input usage information
+ Usage: launcher.sh [-a] [-d] [-l] [-h] [-m] [-n] [-o] [-u]
+ -a command to launch application
+ -d shared nfs directory to store LIOProf logs
+ -l lowest Lustre OSS node [Hostname]
+ -h highest Lustre OSS node [Hostname]
+ -m lowest Lustre Client [Hostname]
+ -n highest Lustre Client [Hostname]
+ -o use Obdfilter-survey to measure Lustre bandwidth
+ -u user name
+
+(2) Input example
+ a. Launch application
+ # su
+ # launcher.sh -l wolf-33 -h wolf-36 -m wolf-38 -n wolf-41 -u USER_NAME \
+ # -d /home/USER_NAME/lioprof_home -a "mpirun -np 4 hostname"
+ b. Launch Obdfilter-survey to measure Lustre bandwidth
+ # su
+ # launcher.sh -l wolf-33 -h wolf-36 -u USER_NAME \
+ # -d /home/USER_NAME/lioprof_home -o
+
+[Output]
+(1) Output location
+ All output is written to the directory specified by the '-d' argument. In the
+above example, logs are stored under /home/USER_NAME/lioprof_home, in a
+job-<timestamp> subdirectory.
+
+(2) Application output information
+ a. job-output: Output of the job.
+ b. brw: Disk I/O sizes
+ c. iostat: Disk bandwidth and CPU utilization
+ d. rpc: Lustre RPC tracing information
+
+(3) Obdfilter-survey output information
+ a. obdfilter: Lustre OST bandwidth
+ *Note: obdfilter-survey runs in the background and takes some time to
+finish the measurement. Check the obdfilter output files until they show
+"done!".
+
+2. Introduction to parser.sh
+ The LIOProf RPC parser parses the rpc logs collected from OSS nodes. It can
+be run as a normal (non-admin) user.
+
+[Input]
+(1) Input usage information
+ Usage: parser.sh [-i] [-x] [-y] [-t]
+ -i path to LIOProf rpc tracing logs
+ -x lowest Lustre Client [IB IP Address]
+ -y highest Lustre Client [IB IP Address]
+ -t Type of Operation Code (OPC) (OST_READ 3, OST_WRITE 4)
+
+(2) Input example
+ # parser.sh -i /home/USER_NAME/lioprof_home/job-1468618740/rpc \
+ # -x 192.168.1.38 -y 192.168.1.41 -t 3
+
+[Output]
+(1) Output location
+ Output is stored next to the input rpc log directory, in a directory with
+the same name plus an '-out' suffix. In the above example, output is located
+in /home/USER_NAME/lioprof_home/job-1468618740/rpc-out
+
+(2) Output information
+ One file per OSS log and client, listing the number of I/O requests
+handled in each second.
--- /dev/null
+#!/bin/bash
+#
+# This file is provided under a dual BSD/GPLv2 license. When using or
+# redistributing this file, you may do so under either license.
+#
+# GPL LICENSE SUMMARY
+#
+# Copyright(c) 2016 Intel Corporation.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of version 2 of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# Contact Information:
+# Cong Xu, cong.xu@intel.com
+#
+# BSD LICENSE
+#
+# Copyright(c) 2016 Intel Corporation.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of Intel Corporation nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+
+
+function usage() {
+ cat << EOF
+Usage: $0 [-a] [-d] [-l] [-h] [-m] [-n] [-o] [-u]
+ -a command to launch application
+ -d shared nfs directory to store LIOProf logs
+ -l lowest Lustre OSS node [Hostname]
+ -h highest Lustre OSS node [Hostname]
+ -m lowest Lustre Client [Hostname]
+ -n highest Lustre Client [Hostname]
+ -o use Obdfilter-survey to measure Lustre bandwidth
+ -u user name
+EOF
+ exit 1
+}
+
+
+while getopts ":a:d:l:h:m:n:ou:" arg; do
+ case "${arg}" in
+ a)
+ a=${OPTARG};;
+ d)
+ d=${OPTARG};;
+ l)
+ l=${OPTARG};;
+ h)
+ h=${OPTARG};;
+ m)
+ m=${OPTARG};;
+ n)
+ n=${OPTARG};;
+ o)
+ o="Obdfilter-survey";;
+ u)
+ u=${OPTARG};;
+ *)
+ usage;;
+ esac
+done
+shift $((OPTIND-1))
+
+if [ -n "${o}" ]; then
+ # Launch OBDfilter-survey to measure Lustre bandwidth
+ if [ -n "${a}" ] || [ -z "${d}" ] || [ -z "${l}" ] || [ -z "${h}" ] \
+ || [ -z "${u}" ]; then
+ usage
+ fi
+else
+ # Launch application
+ if [ -z "${a}" ] || [ -z "${d}" ] || [ -z "${l}" ] || [ -z "${h}" ] \
+ || [ -z "${m}" ] || [ -z "${n}" ] || [ -z "${u}" ]; then
+ usage
+ fi
+fi
+
+
+# Cluster Name
+cluster_name=$(cut -d- -f1 <<<"${l}")
+
+# Lustre OSS Nodes
+OSS_MIN=$(cut -d- -f2 <<<"${l}")
+OSS_MAX=$(cut -d- -f2 <<<"${h}")
+
+# Lustre Clients
+CLIENT_MIN=$(cut -d- -f2 <<<"${m}")
+CLIENT_MAX=$(cut -d- -f2 <<<"${n}")
+
+# Input user name
+USER_NAME=${u}
+
+# Commands information
+mpi_cmd=mpirun
+pdsh_cmd=/usr/bin/pdsh
+
+# Job ID (Based on job time)
+job_id=job-$(date +%s)
+echo "Launch ${job_id}"
+
+
+if [ -n "${o}" ]; then
+ # OBDfilter-survey (Obtain maximum available bandwidth of Lustre)
+ echo "Running OBDfilter-survey in the background"
+
+ HOMEOBDFILTER=${d}/${job_id}/obdfilter
+ sudo -u ${USER_NAME} mkdir -p $HOMEOBDFILTER
+ sudo -u ${USER_NAME} chmod 777 -R ${d}/${job_id}
+ ${pdsh_cmd} -R ssh -w $cluster_name-[$OSS_MIN-$OSS_MAX] " \
+ size=65536 nobjlo=1 nobjhi=2 thrlo=32 thrhi=64 \
+ obdfilter-survey > ${HOMEOBDFILTER}/\`hostname -s\` & \
+ "
+ exit 0
+fi
+
+
+# rpc and brw logs directories
+LOCALRPC=/lioprof_loc/${job_id}/rpc
+LOCALBRW=/lioprof_loc/${job_id}/brw
+LOCALIOSTAT=/lioprof_loc/${job_id}/iostat
+
+HOMERPC=${d}/${job_id}/rpc
+HOMEBRW=${d}/${job_id}/brw
+HOMEIOSTAT=${d}/${job_id}/iostat
+
+# Create logs directories
+${pdsh_cmd} -R ssh -w $cluster_name-[$OSS_MIN-$OSS_MAX] " \
+ mkdir -p ${LOCALRPC} ${LOCALBRW} ${LOCALIOSTAT}; \
+ "
+
+# Create the shared log directories and open up their permissions
+sudo -u ${USER_NAME} mkdir -p ${HOMERPC} ${HOMEBRW} ${HOMEIOSTAT}
+sudo -u ${USER_NAME} chmod 777 -R ${d}/${job_id}
+
+# Enable RPC Tracing
+${pdsh_cmd} -R ssh -w $cluster_name-[$OSS_MIN-$OSS_MAX] \
+ "lctl set_param debug=rpctrace"
+
+# Evaluate Performance
+
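+# Drop page caches on every node from the lowest OSS to the highest client
+# (this range assumes OSS nodes are numbered below the clients)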
+${pdsh_cmd} -R ssh -w $cluster_name-[$OSS_MIN-$CLIENT_MAX] " \
+ echo 3 > /proc/sys/vm/drop_caches; echo 0 > /proc/sys/vm/drop_caches;
+"
+
+# Start RPC log service and brw_stats
+${pdsh_cmd} -R ssh -w $cluster_name-[$OSS_MIN-$OSS_MAX] " \
+ echo > /proc/fs/lustre/obdfilter/*/brw_stats; \
+ lctl clear; lctl debug_daemon start ${LOCALRPC}/rpc.log 1024; \
+ "
+
+# Start iostat
+${pdsh_cmd} -R ssh -w $cluster_name-[$OSS_MIN-$OSS_MAX] " \
+ iostat 1 > ${LOCALIOSTAT}/iostat.log&
+ "
+sleep 2
+
+######################## Launch Application ########################
+${a} > ${d}/${job_id}/job-output
+sleep 2
+####################################################################
+
+# Collect Lustre RPC and brw_stats logs
+${pdsh_cmd} -R ssh -w $cluster_name-[$CLIENT_MIN-$CLIENT_MAX] " \
+ lctl set_param ldlm.namespaces.*.lru_size=clear
+ "
+sleep 5
+${pdsh_cmd} -R ssh -w $cluster_name-[$OSS_MIN-$OSS_MAX] " \
+ lctl debug_daemon stop; \
+ cat /proc/fs/lustre/obdfilter/*/brw_stats > \
+ ${HOMEBRW}/brw-\`hostname -s\`; \
+ lctl debug_file ${LOCALRPC}/rpc.log ${HOMERPC}/rpc-\`hostname -s\`; \
+"
+
+# Stop iostat and collect data
+${pdsh_cmd} -R ssh -w $cluster_name-[$OSS_MIN-$OSS_MAX] " \
+ pkill iostat; cp -r ${LOCALIOSTAT}/iostat.log \
+ ${HOMEIOSTAT}/iostat-\`hostname -s\` \
+"
+
+# Change log file mode
+sleep 1
+sudo -u root chmod 755 -R ${HOMERPC}/* ${HOMEBRW}/* ${HOMEIOSTAT}/*
+
+###################################################################
+####   Warning! Take extra care with rm commands run as root   ####
+###################################################################
+# Clear local history logs
+sleep 2
+LOCAL_LIOPROF=/lioprof_loc
+${pdsh_cmd} -R ssh -w $cluster_name-[$OSS_MIN-$OSS_MAX] " \
+ pkill iostat; \
+ rm -rf ${LOCAL_LIOPROF}; \
+"
--- /dev/null
+#!/bin/bash
+#
+# This file is provided under a dual BSD/GPLv2 license. When using or
+# redistributing this file, you may do so under either license.
+#
+# GPL LICENSE SUMMARY
+#
+# Copyright(c) 2016 Intel Corporation.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of version 2 of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# Contact Information:
+# Cong Xu, cong.xu@intel.com
+#
+# BSD LICENSE
+#
+# Copyright(c) 2016 Intel Corporation.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of Intel Corporation nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+
+
+function usage() {
+ cat << EOF
+Usage: $0 [-i] [-x] [-y] [-t]
+ -i path to LIOProf rpc tracing logs
+ -x lowest Lustre Client [IB IP Address]
+ -y highest Lustre Client [IB IP Address]
+ -t Type of Operation Code (OPC) (OST_READ 3, OST_WRITE 4)
+EOF
+ exit 1
+}
+
+
+while getopts ":i:x:y:t:" o; do
+ case "${o}" in
+ i)
+ i=${OPTARG};;
+ x)
+ x=${OPTARG};;
+ y)
+ y=${OPTARG};;
+ t)
+ t=${OPTARG};;
+ *)
+ usage;;
+ esac
+done
+shift $((OPTIND-1))
+
+if [ -z "${i}" ] || [ -z "${x}" ] || [ -z "${y}" ] || [ -z "${t}" ]; then
+ usage
+fi
+
+
+# Cluster Name
+cluster_name=$(cut -d- -f1 <<<"${x}")
+
+# Lustre Clients
+CLIENT_PRE=$(cut -d. -f 1-3 <<<"${x}")
+CLIENT_MIN=$(cut -d. -f4 <<<"${x}")
+CLIENT_MAX=$(cut -d. -f4 <<<"${y}")
+
+# Type of Operation Code (OPC) (Defined in lustre/include/lustre/lustre_idl.h)
+OPC_TYPE=${t}
+
+# Input directory
+IN_PUT=${i}
+
+# Output directory
+OUT_PUT=${i}-out
+rm -rf $OUT_PUT
+mkdir -p $OUT_PUT
+
+for f in ${IN_PUT}/*
+do
+ echo "Processing ${f}"
+ for ((c = $CLIENT_MIN; c <= $CLIENT_MAX; c = c + 1))
+ do
+ ip=${CLIENT_PRE}.$c
+ CUR_OST=$(echo "${f}" | rev | cut -d'/' -f1 | rev)
+
+ cat ${f} | grep "Handling RPC pname" | grep "ll_ost_io" | \
+ grep o2ib:${OPC_TYPE} | grep ${ip} | \
+ awk 'BEGIN{FS=":"}{print $4}' | sort -n | \
+ awk 'BEGIN {count = 0; line = 0; FS="."} {
+ if (NR == 1) {curval = $1};
+ if($1 <= curval) {
+ count = count + 1;
+ } else {
+ print line "\t" count;
+ curval = curval + 1;
+ line = line + 1;
+ count = 1;
+ while(curval < $1) {
+ print line "\t" 0;
+ curval = curval + 1;
+ line = line + 1;
+ }
+ }
+ } END {
+ print line "\t" count;
+ }' \
+ > ${OUT_PUT}/$CUR_OST-Client-$c &
+ done
+done
+
+# Wait for completion
+wait
+
+
+# If the output files have different line counts, pad the shorter files with
+# zero rows so that all files have the same number of lines.
+
+# Get max line
+MAX_LINE=-1
+for f in ${OUT_PUT}/*
+do
+ LINE=$(wc -l < ${f})
+ if [ "$MAX_LINE" -lt "$LINE" ]
+ then
+ MAX_LINE=${LINE}
+ fi
+done
+
+# Append zeros
+for f in ${OUT_PUT}/*
+do
+ LINE=$(wc -l < ${f})
+ for ((i = $LINE; i < $MAX_LINE; i = i + 1))
+ do
+ # Append in the foreground: nothing waits for background jobs here
+ printf "0\t0\n" >> "${f}"
+ done
+done