I have a program written with MPI and OpenMP, and I want to measure the performance of its MPI interfaces.
I tried VTune's hotspots analysis to profile it:
export OMP_NUM_THREAD=6
cat >vtune.conf <<EOF
0-34 ./app
35 amplxe-cl -collect hotspots -no-follow-child -trace-mpi -r result -- ./app
EOF
srun -N 6 -n 36 --multi-prog vtune.conf
In the profiling results, the hotspot functions are:
opal_timer_base_get_usec_sys_timer
_pthread
These results are not what I intended. I want to see results for the MPI interfaces. What should I tweak to get the right hotspots? Should I use a different tool?
Hi
Thank you for posting in Intel Forums. You could make the following corrections to your commands:
export OMP_NUM_THREADS=12
mpirun -n 16 -ppn 4 -l vtune -collect hotspots -k sampling-mode=hw -trace-mpi -result-dir <result directory path> -- ./app
If an MPI application is launched on multiple nodes, VTune Profiler creates one result directory per compute node in the current directory. Each directory holds the data for all ranks running on that node.
Please refer to the link below for how to use Intel Advisor and Intel VTune Profiler with MPI to collect performance data at the node and core level.
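As a rough sketch of how the per-node results might be inspected afterwards, the function below walks the result directories and prints one report command per node. It assumes the directories are named `<result-dir>.<hostname>`, which is the pattern the VTune documentation describes for MPI runs; the prefix `result` and the helper name `print_report_cmds` are illustrative only. The commands are printed rather than executed so they can be reviewed first. With VTune Amplifier 2018 the binary would be `amplxe-cl` rather than `vtune`.

```shell
#!/bin/sh
# Print a hotspots report command for every per-node result directory.
# Assumption: directories are named <prefix>.<hostname>.
print_report_cmds() {
    prefix=$1
    for dir in "$prefix".*; do
        # Skip the unexpanded glob when nothing matches.
        [ -d "$dir" ] || continue
        echo "vtune -report hotspots -r $dir"
    done
}

print_report_cmds "${1:-result}"
```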
A separate tool exists to record the details of the communication patterns and communication costs of an MPI application, the Intel® Trace Analyzer and Collector. The information provided by VTune™ Amplifier and Intel® Advisor is focused on the core and node performance, and complements the specific MPI communication details provided by the Intel® Trace Analyzer and Collector.
Please check this and let us know if this works.
Regards,
Alekhya
Thank you very much for your reply.
The version of VTune is `Intel(R) VTune(TM) Amplifier 2018 Update 4 (build 574913) Command Line Tool`, so I would use amplxe-cl instead.
Let me give more details about this experiment. Six MPI processes are launched on each node, and on the last node one MPI process is dedicated to parallel I/O. I want to focus on the performance of this parallel I/O process. Is it necessary to measure all nodes?
Best
Hi,
A VTune profiling run can cover all MPI processes on a node or a single MPI process on each node. You can use selective profiling to reduce the size of the results collected by VTune Profiler.
The command below profiles a single MPI process after the application has been built. In this example there are 64 processes, 16 per node. With the "-gtool" option you can launch tools such as Intel® VTune Profiler, Intel Advisor, and GNU Debugger (GDB) for specified ranks through the mpiexec.hydra and mpirun commands, so that only the required process is profiled. The rank set after the colon can be a single rank (5 here) or a range such as 0-15.
mpirun -n 64 -ppn 16 -gtool "vtune -collect <analysis-type> -r <result-dir>:5" ./a.out
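Note that `-gtool` is an option of Intel MPI's `mpirun` and `mpiexec.hydra`. For a launch through `srun --multi-prog`, as in the original question, a similar selective setup might be sketched by generating the config file so that only one rank runs under the collector. The function names `rank_range` and `make_multiprog_conf` are hypothetical, and the rank numbers and `amplxe-cl` command line are simply taken from the question, not a verified recipe.

```shell
#!/bin/sh
# Format a rank range for the config file: "3" for a single rank, "0-34" otherwise.
rank_range() {
    if [ "$1" -eq "$2" ]; then echo "$1"; else echo "$1-$2"; fi
}

# Print a Slurm --multi-prog config in which exactly one rank runs under the profiler.
# usage: make_multiprog_conf <total_ranks> <profiled_rank> <app> <profiler command...>
make_multiprog_conf() {
    total=$1; target=$2; app=$3
    shift 3
    last=$((total - 1))
    # Ranks below the target run the plain application.
    if [ "$target" -gt 0 ]; then
        echo "$(rank_range 0 $((target - 1))) $app"
    fi
    # The target rank is wrapped by the profiler command line.
    echo "$target $* -- $app"
    # Ranks above the target run the plain application.
    if [ "$target" -lt "$last" ]; then
        echo "$(rank_range $((target + 1)) "$last") $app"
    fi
}

# 36 ranks, profile only rank 35 (the parallel I/O process in the question).
make_multiprog_conf 36 35 ./app amplxe-cl -collect hotspots -no-follow-child -trace-mpi -r result
```

Redirecting the output into `vtune.conf` and launching with `srun -N 6 -n 36 --multi-prog vtune.conf` would reproduce the setup from the question while making it easy to move the profiler to a different rank.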
Please try this and let us know if this works.
Regards,
Alekhya
Hi,
Is your issue resolved? Could you give us an update?
Regards,
Alekhya
Hi
We assume that your issue is resolved. If you need any further assistance, please post a new question as this thread will no longer be monitored.
Regards,
Alekhya
Hello,
If you need exact timing for MPI functions, it makes sense to use tools based on MPI instrumentation, such as APS (<VTune_install_dir>/bin64/aps) or ITAC. VTune's sampling approach usually does not work well for this.
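A minimal sketch of what an APS run could look like is shown below. It only prints the collection and report commands so they can be reviewed before running. The helper name `aps_cmds`, the rank count, and the result directory name `aps_result` are illustrative assumptions, and whether APS can trace MPI calls with a given MPI implementation and launcher should be checked against the APS documentation for the installed version.

```shell
#!/bin/sh
# Print (do not run) an APS collection command and the matching report command.
# usage: aps_cmds <ranks> <result_dir> <app>
aps_cmds() {
    ranks=$1; result_dir=$2; app=$3
    # Collection: each rank is launched under aps, which writes to result_dir.
    echo "mpirun -n $ranks aps --result-dir=$result_dir $app"
    # Report: summarizes the collected data, including time spent in MPI.
    echo "aps --report=$result_dir"
}

aps_cmds 36 aps_result ./app
```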
Thanks & Regards, Dmitry
