Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

How to collect software features in a cluster environment using Intel VTune

Xiaoqiang
New Contributor I

Intel VTune is a great performance analysis tool. I am currently experiencing some performance issues and would like to use this tool to analyze them.

 

Environment to run the program:

An x86 cluster. I can only log into the management node; from there I submit jobs to the compute nodes.

This is the script I use to submit the job; the main steps are:

  • Request computing resources
  • Generate the hostfile
  • Execute the program via mpirun
#!/bin/bash
#DSUB -n template-26c
#DSUB -A huakemeiranshaoshiyanshi
#DSUB -T 1000h0m0s
#DSUB -N 1
#DSUB -R cpu=26
#DSUB -o out.%J
#DSUB -e err.%J
#DSUB --job_type cosched

cores='26'
app='FPVFoam_transNO_hybrid'


source /home/huakemeiranshaoshiyanshi/zliu/weizy/evn.sh

echo ----- print env vars -----
if [ "${CCSCHEDULER_ALLOC_FILE}" != "" ]; then
    echo "   "
    ls -la ${CCSCHEDULER_ALLOC_FILE}
    echo ------ cat ${CCSCHEDULER_ALLOC_FILE}
    cat ${CCSCHEDULER_ALLOC_FILE}
fi

export HOSTFILE=/tmp/hostfile.$$
rm -f "$HOSTFILE"
touch "$HOSTFILE"

# Build the hostfile from the scheduler's allocation file and sum the task count
ntask=$(awk -v fff="$HOSTFILE" '
{
    split($0, a, " ")
    if (length(a[1]) > 0 && length(a[3]) > 0) {
        print a[1] ":" a[2] >> fff
        total_task += a[3]
    }
}
END { print total_task }' "${CCSCHEDULER_ALLOC_FILE}")

echo "hostfile $HOSTFILE generated:"
echo "-----------------------"
cat $HOSTFILE
echo "-----------------------"
echo "Total tasks is $ntask"
echo "mpirun -hostfile $HOSTFILE -n $ntask <your application>"

{ time -p $(which mpirun) --hostfile "$HOSTFILE" -np "$cores" \
    -env UCX_NET_DEVICES=mlx5_0:1 -env UCX_IB_GID_INDEX=3 \
    -launcher ssh -launcher-exec /opt/batch/agent/tools/dstart \
    "$app" -parallel > template-26c-1.log ; }

ret=$?
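For reference, the kind of change I imagine making for question 2 is to wrap the application with the VTune collector on the mpirun line. Below is only a sketch of that idea: the result-directory name `vtune_results` is my own placeholder, and I have not yet verified these exact flags against my MPICH/Hydra setup.

```shell
# Same mpirun invocation as in my script, but each rank is launched under
# the VTune command-line collector. "vtune_results" is a placeholder name;
# -trace-mpi tells VTune the ranks come from a non-Intel MPI (MPICH here).
$(which mpirun) --hostfile "$HOSTFILE" -np "$cores" \
    -env UCX_NET_DEVICES=mlx5_0:1 -env UCX_IB_GID_INDEX=3 \
    -launcher ssh -launcher-exec /opt/batch/agent/tools/dstart \
    vtune -collect hotspots -trace-mpi -result-dir vtune_results \
    -- "$app" -parallel > template-26c-1.log
```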

Compilers and MPI:

Because the software I use is quite old, I also build it with the older gcc 4.8.5 compiler. The MPI implementation I use is MPICH 3.4.

(base) [zliu@cli01 ~]$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)


(base) [zliu@cli01 ~]$ mpirun --version
HYDRA build details:
    Version:                                 3.4b1
    Release Date:                            Mon Oct  5 21:47:25 CDT 2020
    CC:                              gcc -std=gnu99 -std=gnu99
    Configure options:                       '--disable-option-checking' '--prefix=/share/app/mpich/mpichapp' '--with-device=ch4:ucx' '--with-ucx=/share/app/mpich/ucx' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc -std=gnu99 -std=gnu99' 'CFLAGS= -O2' 'LDFLAGS= -L/share/app/mpich/ucx/lib' 'LIBS=-lucp -lucp ' 'CPPFLAGS= -I/share/app/mpich/ucx/include -DNETMOD_INLINE=__netmod_inline_ucx__ -I/share/app/mpich/mpich-3.4b1/src/mpl/include -I/share/app/mpich/mpich-3.4b1/src/mpl/include -I/share/app/mpich/mpich-3.4b1/modules/yaksa/src/frontend/include -I/share/app/mpich/mpich-3.4b1/modules/yaksa/src/frontend/include -I/share/app/mpich/mpich-3.4b1/modules/json-c -I/share/app/mpich/mpich-3.4b1/modules/json-c -D_REENTRANT -I/share/app/mpich/mpich-3.4b1/src/mpi/romio/include' 'MPLLIBNAME=mpl'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Demux engines available:                 poll select

  

Now that I have installed Intel VTune, I would like to use it to analyze my performance problems.

My questions are:

  • To collect software characteristics with Intel VTune, do I need to recompile the program with ICC and Intel MPI?
  • How can I modify my submission script so that VTune can collect data? (I cannot run mpirun directly; I can only submit jobs through the scheduler.)
  • To view the results graphically, is it enough to copy the collected result files to my local machine? I have also installed Intel VTune on my local Windows workstation.
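For the last question, what I plan to try is simply archiving the result directory on the cluster and copying it over. This is only a sketch; the result-directory name `r000hs` and the host name are placeholders, not something I have run yet.

```shell
# On the cluster login node: archive the VTune result directory
# ("r000hs" is a hypothetical default-style result name)
tar czf r000hs.tgz r000hs

# From my Windows workstation (host name is a placeholder):
#   scp cluster-login:~/r000hs.tgz .
# then unpack it and open the directory in the local VTune GUI,
# or first print a text summary on the cluster:
#   vtune -report summary -result-dir r000hs
```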

In the end, I hope to get a result like this:

(attached screenshot: Xiaoqiang_0-1641377043452.png)

I hope to see the characteristics of MPI, I/O, compute, and so on.

 

Thanks!
