
How to collect application performance characteristics in a cluster environment using Intel VTune


Intel VTune is a great performance analysis tool. I am currently experiencing some performance issues and would like to use this tool to analyze them.


Environment to run the program:

An x86 cluster. I can only log in to the management node and submit jobs from there to the compute nodes.

This is the script I use to submit the job. Its main steps are:

  • Request computing resources
  • Generate the hostfile
  • Execute the program via mpirun
#DSUB -n template-26c
#DSUB -A huakemeiranshaoshiyanshi
#DSUB -T 1000h0m0s
#DSUB -N 1
#DSUB -R cpu=26
#DSUB -o out.%J
#DSUB -e err.%J
#DSUB --job_type cosched


source /home/huakemeiranshaoshiyanshi/zliu/weizy/

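echo ----- print env vars -----
if [ "${CCSCHEDULER_ALLOC_FILE}" != "" ]; then
    echo "   "
    echo ------ cat ${CCSCHEDULER_ALLOC_FILE}
    cat ${CCSCHEDULER_ALLOC_FILE}
fi

export HOSTFILE=/tmp/hostfile.$$
rm -rf $HOSTFILE

# Write one "host:slots" line per allocated node and sum the slots.
# Summing field 2 is an assumption: the accumulation line is missing from the post.
ntask=`cat ${CCSCHEDULER_ALLOC_FILE} | awk -v fff="$HOSTFILE" '{
    split($0, a, " ")
    if (length(a[1]) >0 && length(a[3]) >0) {
        print a[1]":"a[2] >> fff
        total_task += a[2]
    }
}END{print total_task}'`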

echo "hostfile $HOSTFILE generated:"
echo "-----------------------"
cat $HOSTFILE
echo "-----------------------"
echo "Total tasks is $ntask"
echo "mpirun -hostfile $HOSTFILE -n $ntask <your application>"

{ time -p `which mpirun` --hostfile $HOSTFILE -np $cores -env UCX_NET_DEVICES=mlx5_0:1 -env UCX_IB_GID_INDEX=3  -launcher ssh -launcher-exec /opt/batch/agent/tools/dstart $app -parallel > template-26c-1.log; }


Compilers and MPI:

Because the software I run is very old, I build it with the older GCC 4.8.5 compiler. The MPI implementation is MPICH 3.4b1.

(base) [zliu@cli01 ~]$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl= --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)

(base) [zliu@cli01 ~]$ mpirun --version
HYDRA build details:
    Version:                                 3.4b1
    Release Date:                            Mon Oct  5 21:47:25 CDT 2020
    CC:                              gcc -std=gnu99 -std=gnu99
    Configure options:                       '--disable-option-checking' '--prefix=/share/app/mpich/mpichapp' '--with-device=ch4:ucx' '--with-ucx=/share/app/mpich/ucx' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc -std=gnu99 -std=gnu99' 'CFLAGS= -O2' 'LDFLAGS= -L/share/app/mpich/ucx/lib' 'LIBS=-lucp -lucp ' 'CPPFLAGS= -I/share/app/mpich/ucx/include -DNETMOD_INLINE=__netmod_inline_ucx__ -I/share/app/mpich/mpich-3.4b1/src/mpl/include -I/share/app/mpich/mpich-3.4b1/src/mpl/include -I/share/app/mpich/mpich-3.4b1/modules/yaksa/src/frontend/include -I/share/app/mpich/mpich-3.4b1/modules/yaksa/src/frontend/include -I/share/app/mpich/mpich-3.4b1/modules/json-c -I/share/app/mpich/mpich-3.4b1/modules/json-c -D_REENTRANT -I/share/app/mpich/mpich-3.4b1/src/mpi/romio/include' 'MPLLIBNAME=mpl'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Demux engines available:                 poll select


Now that I have installed Intel VTune, I would like to use it to analyze my performance problems.

My questions are:

  • To collect performance characteristics with Intel VTune, do I need to recompile the program with ICC and Intel MPI?
  • How should I modify the job submission script so that VTune can collect data? (I cannot run mpirun directly; I can only submit jobs.)
  • To view the results graphically, do I just need to copy the collected result files to my local machine? I also have Intel VTune installed on my local Windows machine.
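
For the second question, one possible shape of the change is sketched below: the application is wrapped by the `vtune` command-line collector inside the existing `mpirun` line, so each rank is profiled from within the submitted job. This is a minimal sketch under assumptions, not a verified recipe for this cluster. It assumes `vtune` is on the PATH of the compute nodes, the result directory (a hypothetical name here) is on a filesystem visible from the management node, and the existing `-env` and `-launcher` options stay in place. The helper `build_vtune_mpirun` is hypothetical and only prints the command line so it can be checked before use. `-trace-mpi` is the VTune option intended for non-Intel MPI launchers; how result directories are named per node with MPICH should be checked on the system.

```shell
#!/bin/sh
# Sketch only: prints the command line instead of running it, so it can be
# inspected before it replaces the mpirun line in the job script.
# build_vtune_mpirun is a hypothetical helper, not part of VTune or MPICH.

build_vtune_mpirun() {
    hostfile=$1     # hostfile generated earlier in the job script
    nranks=$2       # number of MPI ranks to start
    result_dir=$3   # VTune result directory (hypothetical, shared filesystem)
    shift 3         # remaining arguments: the application and its options
    echo "mpirun --hostfile $hostfile -np $nranks" \
         "vtune -collect hotspots -trace-mpi -result-dir $result_dir --" "$@"
}

# Example with placeholder values; ./my_app stands in for the real binary.
build_vtune_mpirun /tmp/hostfile.demo 26 ./vtune_r001 ./my_app -parallel
```

In the job script, the printed line would take the place of the plain `mpirun ... $app -parallel` call. The resulting directory can then be copied to another machine and opened in the VTune GUI there, assuming the two VTune versions are compatible.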

In the end, I hope to get a result that shows the characteristics of MPI, I/O, and compute.


