Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

Measure performance MPI + openMP

i9
Beginner
1,988 Views

I have a program written with MPI and OpenMP, and I want to measure the performance of the MPI interfaces.

I tried the VTune hotspots analysis to profile it:

 

export OMP_NUM_THREAD=6

cat >vtune.conf <<EOF
0-34     ./app
35       amplxe-cl -collect hotspots -no-follow-child  -trace-mpi -r result -- ./app
EOF

srun -N 6 -n 36 --multi-prog vtune.conf

 

In the profiling results, the hotspot functions are:

opal_timer_base_get_usec_sys_timer

func@xxxx

_pthread

 

Those results are not what I intended. I want to see results for the MPI interfaces. What should I tweak to get the right hotspots? Should I use a different tool?

6 Replies
AlekhyaV_Intel
Moderator
1,957 Views

Hi


Thank you for posting in Intel Forums. You could make the following corrections to your commands:


export OMP_NUM_THREADS=12

mpirun -n 16 -ppn 4 -l vtune -collect hotspots -k sampling-mode=hw -trace-mpi -result-dir <result directory path> -- ./app


If an MPI application is launched on multiple nodes, VTune Profiler creates one result directory per compute node in the current directory. Each directory holds the data for all the ranks that ran on that node.

Please refer to the link below to learn how to use both tools, Intel Advisor and Intel VTune Profiler, with MPI to collect performance data at the node and core level:

https://software.intel.com/content/www/us/en/develop/articles/using-intel-advisor-and-vtune-amplifier-with-mpi.html

A separate tool exists to record the details of the communication patterns and communication costs of an MPI application, the Intel® Trace Analyzer and Collector. The information provided by VTune™ Amplifier and Intel® Advisor is focused on the core and node performance, and complements the specific MPI communication details provided by the Intel® Trace Analyzer and Collector.

Please check this and let us know if this works.


Regards,

Alekhya


i9
Beginner
1,954 Views

Thank you very much for your reply.

 

The version of VTune is `Intel(R) VTune(TM) Amplifier 2018 Update 4 (build 574913) Command Line Tool`, so I would use amplxe-cl instead.

 

Let me give more details about this experiment. Six MPI processes are launched on each node, and on the last node one MPI process performs parallel I/O. I want to focus on the performance of this parallel I/O process. Is it necessary to measure all nodes?

 

Best regards

AlekhyaV_Intel
Moderator
1,925 Views

Hi,


A profiling run in VTune can cover all the MPI processes on a node or a single MPI process on each node. You could use selective profiling to reduce the size of the results collected by VTune Profiler.
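For the Slurm launch shown in the original question, selective profiling could be sketched roughly as below. This is only an illustration, not an official recipe: the helper names write_multiprog_conf and rank_range are made up for this example, and the amplxe-cl options are simply the ones from the original post. The script writes a --multi-prog config in which one chosen rank runs under the profiler and every other rank runs the plain application.

```shell
#!/bin/sh
# Print "lo" for a single rank or "lo-hi" for a span of ranks.
rank_range() {
    if [ "$1" -eq "$2" ]; then echo "$1"; else echo "$1-$2"; fi
}

# Hypothetical helper: emit a Slurm --multi-prog config where only
# <profiled_rank> runs under amplxe-cl (options copied from the question).
# Usage: write_multiprog_conf <total_ranks> <profiled_rank> <app> <result_dir>
write_multiprog_conf() {
    total=$1
    rank=$2
    app=$3
    result=$4
    last=$((total - 1))
    # Ranks before the profiled one run the plain application.
    [ "$rank" -gt 0 ] && echo "$(rank_range 0 $((rank - 1))) $app"
    # The profiled rank runs under the VTune command-line collector.
    echo "$rank amplxe-cl -collect hotspots -no-follow-child -trace-mpi -r $result -- $app"
    # Ranks after the profiled one run the plain application.
    [ "$rank" -lt "$last" ] && echo "$(rank_range $((rank + 1)) "$last") $app"
    return 0
}

export OMP_NUM_THREADS=6   # note the trailing S in the variable name
write_multiprog_conf 36 35 ./app result > vtune.conf
cat vtune.conf
# Launch on a Slurm cluster with:
#   srun -N 6 -n 36 --multi-prog vtune.conf
```

With 36 ranks and rank 35 selected, the generated file matches the config in the question, so only the parallel I/O rank is measured.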


Below is how to profile selected MPI processes after building the application. The example launches 64 processes with 16 processes per node. The "-gtool" option of the mpiexec.hydra and mpirun commands lets you launch tools such as Intel® VTune Profiler, Intel Advisor, and the GNU Debugger (GDB) for specified processes only, so you can profile just the process you need.


mpirun -n 64 -ppn 16 -gtool "vtune -collect <analysis-type> -r <result-dir> :0-15" ./a.out
mpirun -n 64 -ppn 16 -gtool "vtune -collect <analysis-type> -r <result-dir> :5" ./a.out
mpirun -n 64 -ppn 16 -gtool "vtune -collect <analysis-type> -r <result-dir> :7" ./a.out
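Applied to the layout in the original question (36 ranks, 6 per node, parallel I/O on rank 35), the -gtool form might look like the sketch below. It assumes the job is launched with Intel MPI's mpirun, since -gtool is an Intel MPI option, and it uses amplxe-cl because that is the command name in VTune Amplifier 2018. The function gtool_cmd is a hypothetical helper that only prints the command line so it can be inspected before running.

```shell
#!/bin/sh
# Hypothetical helper: print an mpirun command line that attaches the
# profiler to a chosen rank set through -gtool. Nothing is executed here.
# Usage: gtool_cmd <total_ranks> <ranks_per_node> <rank_set> <app>
gtool_cmd() {
    total=$1
    ppn=$2
    ranks=$3
    app=$4
    # The rank set after the colon selects which processes get profiled.
    printf 'mpirun -n %s -ppn %s -gtool "amplxe-cl -collect hotspots -r result :%s" %s\n' \
        "$total" "$ppn" "$ranks" "$app"
}

# Profile only rank 35, the parallel I/O process in the question.
gtool_cmd 36 6 35 ./app
```

The printed line can be pasted into a job script once the rank set looks right.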


Please try this and let us know if this works.


Regards,

Alekhya


AlekhyaV_Intel
Moderator
1,880 Views

Hi,


Is your issue resolved? Could you give us an update?


Regards,

Alekhya


AlekhyaV_Intel
Moderator
1,860 Views

Hi


We assume that your issue is resolved. If you need any further assistance, please post a new question as this thread will no longer be monitored.


Regards,

Alekhya


Dmitry_P_Intel1
Employee
1,854 Views

Hello, 

 

If you need exact timing for MPI functions, it makes sense to use tools based on MPI instrumentation, such as APS (<VTune_install_dir>/bin64/aps) or ITAC. VTune's sampling approach usually does not work well for this.
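A minimal APS run might look like the sketch below. It makes several assumptions: aps is on the PATH (for example after sourcing the VTune environment script), the MPI library is Intel MPI (for other MPI implementations, check the APS documentation), and the installed APS version accepts the --result-dir and --report options, which is worth confirming with aps --help. The run wrapper and the aps_result directory name are made up for this example. By default the wrapper only prints the commands.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical): print each command unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

RESULT_DIR=aps_result   # illustrative result directory name

# 1. Launch every rank under APS so that MPI calls are instrumented.
run mpirun -n 36 aps --result-dir="$RESULT_DIR" ./app

# 2. Produce the summary report, which includes the time spent in MPI.
run aps --report="$RESULT_DIR"
```

Set DRY_RUN=0 to execute the two steps on a system where mpirun and aps are available.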

 

Thanks & Regards, Dmitry
