Analyzers
Support for Analyzers (Intel VTune™ Profiler, Intel Advisor, Intel Inspector)

Measure performance MPI + openMP

i9
Beginner

I have a program written with MPI and OpenMP, and I want to measure the performance of its MPI calls.

I tried profiling it with the VTune hotspots analysis:

export OMP_NUM_THREADS=6

cat >vtune.conf <<EOF
0-34     ./app
35       amplxe-cl -collect hotspots -no-follow-child  -trace-mpi -r result -- ./app
EOF

srun -N 6 -n 36 --multi-prog vtune.conf

In the profiling results, the top hotspot functions are:

opal_timer_base_get_usec_sys_timer

func@xxxx

_pthread

These results are not what I intended. I want to measure the time spent in the MPI calls. What should I change to get the right hotspots? Should I use a different tool?
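The vtune.conf file in the question is an srun --multi-prog configuration: each line gives a rank or rank range followed by the command those ranks run, so only rank 35 runs under amplxe-cl. A minimal sketch of a shell helper that writes such a file for any task count and any single target rank; make_multiprog and print_range are hypothetical names, not Slurm tools.

```shell
#!/bin/sh
# Hypothetical helper: print an srun --multi-prog config in which every rank
# runs APP and only rank TARGET runs it under PROFILER.
# Usage: make_multiprog NTASKS TARGET "PROFILER COMMAND" APP

# print_range FIRST LAST CMD -> one config line; a single rank is written as "N"
print_range() {
  if [ "$1" -eq "$2" ]; then
    echo "$1 $3"
  else
    echo "$1-$2 $3"
  fi
}

make_multiprog() {
  ntasks=$1
  target=$2
  prof=$3
  app=$4
  # ranks before the profiled one run the application directly
  if [ "$target" -gt 0 ]; then
    print_range 0 $((target - 1)) "$app"
  fi
  # the profiled rank: profiler command, then "--", then the application
  echo "$target $prof -- $app"
  # ranks after the profiled one run the application directly
  if [ "$target" -lt $((ntasks - 1)) ]; then
    print_range $((target + 1)) $((ntasks - 1)) "$app"
  fi
}
```

For the layout in the question, make_multiprog 36 35 "amplxe-cl -collect hotspots -no-follow-child -trace-mpi -r result" ./app > vtune.conf reproduces the two lines shown above, and the file is then used with srun -N 6 -n 36 --multi-prog vtune.conf as before.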

AlekhyaV_Intel
Moderator

Hi


Thank you for posting in Intel Forums. You could make the following corrections to your commands:


export OMP_NUM_THREADS=12

mpirun -n 16 -ppn 4 -l vtune -collect hotspots -k sampling-mode=hw -trace-mpi -result-dir <result directory path> -- ./app


If an MPI application is launched on multiple nodes, VTune Profiler creates one result directory per compute node in the current directory; each directory holds the data for all the ranks that ran on that node.

Please refer to the link below for how to use both tools, Intel Advisor and Intel VTune Profiler, with MPI to collect performance data at the node and core level:

https://software.intel.com/content/www/us/en/develop/articles/using-intel-advisor-and-vtune-amplifie...

A separate tool, the Intel® Trace Analyzer and Collector, records the details of the communication patterns and communication costs of an MPI application. The information provided by VTune™ Amplifier and Intel® Advisor focuses on core and node performance, and complements the MPI communication details provided by the Intel® Trace Analyzer and Collector.

Please check this and let us know if this works.


Regards,

Alekhya


i9
Beginner

Thank you very much for your reply.

My VTune version is `Intel(R) VTune(TM) Amplifier 2018 Update 4 (build 574913) Command Line Tool`, so I have to use amplxe-cl instead of vtune.

Let me give you more details about this experiment. Six MPI processes are launched on each node, and on the last node one MPI process performs parallel I/O. I mainly want to look at the performance of this parallel I/O process. Is it necessary to profile all the nodes?

Best regards,

AlekhyaV_Intel
Moderator

Hi,


A VTune profiling run can cover all the MPI processes on a node, or a single MPI process on each node. You can use selective profiling to reduce the size of the results collected by VTune Profiler.


The command below profiles a single MPI process of an already built application. In this example, 64 processes are launched with 16 processes per node. The -gtool option of the mpiexec.hydra and mpirun commands launches tools such as Intel® VTune Profiler, Intel Advisor, and the GNU Debugger (GDB) only for the specified processes, so you can profile just the process you need.


mpirun -n 64 -ppn 16 -gtool "vtune -collect <analysis-type> -r <result-dir>:5" ./a.out

The rank set after the colon can also be a range or a comma-separated list, for example :0-15 for all 16 ranks on the first node, or :5,7 for ranks 5 and 7.
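With -ppn, consecutive ranks are packed onto each node, so the node that runs a given rank, and the rank range to pass to -gtool for a whole node, follow from integer division. A minimal sketch, assuming that block placement (custom host files, rank reordering, or other launchers may place ranks differently); node_of_rank and ranks_on_node are hypothetical helper names.

```shell
#!/bin/sh
# Assumes block placement: ranks 0..PPN-1 on node 0, PPN..2*PPN-1 on node 1, ...

# node_of_rank RANK PPN -> zero-based index of the node running RANK
node_of_rank() {
  echo $(( $1 / $2 ))
}

# ranks_on_node NODE PPN -> that node's ranks, written as a -gtool rank range
ranks_on_node() {
  first=$(( $1 * $2 ))
  last=$(( first + $2 - 1 ))
  echo "${first}-${last}"
}
```

For the layout in this thread, node_of_rank 35 6 prints 5, i.e. the I/O rank runs on the sixth and last node, and ranks_on_node 3 16 prints 48-63 for the fourth node of the 64-rank example.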


Please try this and let us know if this works.


Regards,

Alekhya


AlekhyaV_Intel
Moderator

Hi,


Is your issue resolved? Could you give us an update?


Regards,

Alekhya


AlekhyaV_Intel
Moderator

Hi


We assume that your issue is resolved. If you need any further assistance, please post a new question as this thread will no longer be monitored.


Regards,

Alekhya


Dmitry_P_Intel1
Employee

Hello,

If you need exact timing for MPI functions, it makes sense to use tools based on MPI instrumentation, such as APS (Application Performance Snapshot, <VTune_install_dir>/bin64/aps) or ITAC (Intel Trace Analyzer and Collector). VTune's sampling approach usually does not work well for this.

Thanks & Regards, Dmitry
