<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to collect software features in a cluster environment using Intel VTune in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/How-to-collect-software-features-in-a-cluster-environment-using/m-p/1348662#M7997</link>
    <description>&lt;P&gt;Intel VTune is a great performance analysis tool. I am currently experiencing some performance issues and would like to use this tool to analyze them.&lt;/P&gt;</description>
    <pubDate>Wed, 05 Jan 2022 10:07:51 GMT</pubDate>
    <dc:creator>Xiaoqiang</dc:creator>
    <dc:date>2022-01-05T10:07:51Z</dc:date>
    <item>
      <title>How to collect software features in a cluster environment using Intel VTune</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/How-to-collect-software-features-in-a-cluster-environment-using/m-p/1348662#M7997</link>
      <description>&lt;P&gt;Intel VTune is a great performance analysis tool. I am currently experiencing some performance issues and would like to use this tool to analyze them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Runtime environment:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The program runs on an x86 cluster. I can only log in to the management node and submit jobs from there to the compute nodes.&lt;/P&gt;
&lt;P&gt;This is the script I use to submit the job. Its main steps are:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Request computing resources&lt;/LI&gt;
&lt;LI&gt;Generate the &lt;EM&gt;hostfile&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;Execute the program via mpirun&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="bash"&gt;#!/bin/bash
#DSUB -n template-26c
#DSUB -A huakemeiranshaoshiyanshi
#DSUB -T 1000h0m0s
#DSUB -N 1
#DSUB -R cpu=26
#DSUB -o out.%J
#DSUB -e err.%J
#DSUB --job_type cosched

cores='26'
app='FPVFoam_transNO_hybrid'


source /home/huakemeiranshaoshiyanshi/zliu/weizy/evn.sh

echo ----- print env vars -----
if [ "${CCSCHEDULER_ALLOC_FILE}" != "" ]; then
    echo "   "
    ls -la ${CCSCHEDULER_ALLOC_FILE}
    echo ------ cat ${CCSCHEDULER_ALLOC_FILE}
    cat ${CCSCHEDULER_ALLOC_FILE}
fi

export HOSTFILE=/tmp/hostfile.$$
rm -rf $HOSTFILE
touch $HOSTFILE

ntask=`cat ${CCSCHEDULER_ALLOC_FILE} | awk -v fff="$HOSTFILE" '{}
{
    split($0, a, " ")
    if (length(a[1]) &amp;gt;0 &amp;amp;&amp;amp; length(a[3]) &amp;gt;0) {
        print a[1]":"a[2] &amp;gt;&amp;gt; fff
        total_task+=a[3]
    }
}END{print total_task}'`

echo "hostfile $HOSTFILE generated:"
echo "-----------------------"
cat $HOSTFILE
echo "-----------------------"
echo "Total tasks is $ntask"
echo "mpirun -hostfile $HOSTFILE -n $ntask &amp;lt;your application&amp;gt;"

{ time -p `which mpirun` --hostfile $HOSTFILE -np $cores -env UCX_NET_DEVICES=mlx5_0:1 -env UCX_IB_GID_INDEX=3  -launcher ssh -launcher-exec /opt/batch/agent/tools/dstart $app -parallel &amp;gt; template-26c-1.log; }

ret=$?&lt;/LI-CODE&gt;
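&lt;P&gt;To make the hostfile step easier to follow, here is a minimal standalone sketch of it. It assumes the allocation file has three whitespace-separated columns (host name, slots, task count), which is only my reading of the awk code above and not a documented format. The node names and counts are made up for illustration, and the condition is simplified to a field-count check.&lt;/P&gt;

```shell
#!/bin/bash
# Hypothetical allocation file; the column layout (host, slots, tasks) is an assumption.
alloc=$(mktemp)
hostfile=$(mktemp)
printf 'node01 26 26\nnode02 26 26\n' > "$alloc"

# Roughly the same logic as the awk step in the job script:
# append "host:slots" to the hostfile and sum the third column.
ntask=$(awk -v out="$hostfile" '
    NF >= 3 { print $1 ":" $2 >> out; total += $3 }
    END     { print total + 0 }
' "$alloc")

# Read the generated hostfile back before cleaning up.
hosts=$(cat "$hostfile")

echo "Total tasks is $ntask"
echo "$hosts"

rm -f "$alloc" "$hostfile"
```

&lt;P&gt;With the sample input above, this prints a total of 52 tasks followed by the lines node01:26 and node02:26.&lt;/P&gt;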
&lt;P&gt;&lt;STRONG&gt;Compilers and MPI:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Because the software I use is very old, I build it with the older GCC 4.8.5 compiler. The MPI implementation is MPICH 3.4b1.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;(base) [zliu@cli01 ~]$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)


(base) [zliu@cli01 ~]$ mpirun --version
HYDRA build details:
    Version:                                 3.4b1
    Release Date:                            Mon Oct  5 21:47:25 CDT 2020
    CC:                              gcc -std=gnu99 -std=gnu99
    Configure options:                       '--disable-option-checking' '--prefix=/share/app/mpich/mpichapp' '--with-device=ch4:ucx' '--with-ucx=/share/app/mpich/ucx' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc -std=gnu99 -std=gnu99' 'CFLAGS= -O2' 'LDFLAGS= -L/share/app/mpich/ucx/lib' 'LIBS=-lucp -lucp ' 'CPPFLAGS= -I/share/app/mpich/ucx/include -DNETMOD_INLINE=__netmod_inline_ucx__ -I/share/app/mpich/mpich-3.4b1/src/mpl/include -I/share/app/mpich/mpich-3.4b1/src/mpl/include -I/share/app/mpich/mpich-3.4b1/modules/yaksa/src/frontend/include -I/share/app/mpich/mpich-3.4b1/modules/yaksa/src/frontend/include -I/share/app/mpich/mpich-3.4b1/modules/json-c -I/share/app/mpich/mpich-3.4b1/modules/json-c -D_REENTRANT -I/share/app/mpich/mpich-3.4b1/src/mpi/romio/include' 'MPLLIBNAME=mpl'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Demux engines available:                 poll select&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now that I have installed Intel VTune, I would like to use it to analyze my performance problems.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;My questions are:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;To collect software features with Intel VTune, do I need to recompile the program with ICC and Intel MPI?&lt;/LI&gt;
&lt;LI&gt;How should I modify the job submission script so that VTune can collect data? I cannot run mpirun directly; I can only submit jobs.&lt;/LI&gt;
&lt;LI&gt;To view the results graphically, do I just need to copy the collected result files to my local machine? I have also installed Intel VTune on my local Windows machine.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;In the end, I hope to get a result like this:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Xiaoqiang_0-1641377043452.png" style="width: 400px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/25338i511D0537B61FB627/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="Xiaoqiang_0-1641377043452.png" alt="Xiaoqiang_0-1641377043452.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;I would like to see the MPI, I/O, and compute characteristics of the application.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thanks!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 05 Jan 2022 10:07:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/How-to-collect-software-features-in-a-cluster-environment-using/m-p/1348662#M7997</guid>
      <dc:creator>Xiaoqiang</dc:creator>
      <dc:date>2022-01-05T10:07:51Z</dc:date>
    </item>
  </channel>
</rss>

