Novice

VTune to profile and trace MPI code?

Can we use VTune to trace execution and profile MPI code? Any ideas how to go about accomplishing this?

Thanks,
-Michael
Black Belt

If you are content to collect events on a shared-memory machine, you can set up a script that does the entire job, including environment settings, and run that script under VTune. Alternatively, you can make mpirun the application that VTune launches directly, and put the rest of the command line in the corresponding entry of the VTune setup.
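A minimal sketch of such a wrapper script follows. It is written in Python for illustration only (a shell script would do the same job), and it assumes an `mpirun` on the PATH that accepts the common `-np` option. The helper names `build_launch` and `run` and the application name `./my_mpi_app` are hypothetical. Under VTune, you would make the Python interpreter the application and pass this script as its argument.

```python
import os
import subprocess


def build_launch(nprocs, app, app_args=(), extra_env=None, base_env=None):
    """Build the mpirun argument list and environment without running anything."""
    # Copy the current (or an explicitly supplied) environment so the
    # settings below never modify os.environ itself.
    env = dict(os.environ if base_env is None else base_env)
    # Environment settings for the MPI job, e.g. spin-wait tuning.
    env.update(extra_env or {})
    argv = ["mpirun", "-np", str(nprocs), app, *app_args]
    return argv, env


def run(nprocs, app, app_args=(), extra_env=None):
    """Launch the whole MPI job and wait for it, returning mpirun's exit code."""
    argv, env = build_launch(nprocs, app, app_args, extra_env)
    # Blocking until mpirun exits keeps the wrapper alive for the whole job.
    return subprocess.run(argv, env=env).returncode
```

For example, the script's entry point might call `run(4, "./my_mpi_app", extra_env={"I_MPI_SPIN_COUNT": "1000"})`. Some Intel MPI versions have used `I_MPI_SPIN_COUNT` for spin-wait tuning. Treat both the variable name and the value as assumptions, and check your MPI's documentation for its equivalent.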
If you wish to collect events across a cluster, the usual recommendation is to run an SEP job per node under mpirun, generating a .tb5 file for each node. If you are able to make an X window or VNC connection to the cluster, you can use PTU to generate the command string and to analyze the results. PTU and SEP are available for download and discussion on their section of the WhatIf forum; they use your VTune license.
As with any VTune event sampling, build the application with debug symbols (assuming you are analyzing it rather than the MPI library itself), and choose it as the module of interest. If you are using an open-source MPI, you can build its libraries with symbols so that they can be included in your analysis.
You may wish to form a strategy for handling the MPI waits, such as setting the spin-wait time-out to a high value so that the CPU stays busy during waits. The waits will vary greatly among ranks, even if the work is well balanced.
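Once you have per-rank wait times, a small summary makes the spread easy to see. You might collect those times by accumulating `MPI_Wtime()` around each `MPI_Wait`, or export them from a trace tool. The helper below is a hypothetical sketch and is not part of VTune or any MPI. Its input format, one total in seconds per rank indexed by rank number, is an assumption.

```python
from statistics import mean


def summarize_waits(wait_seconds):
    """Summarize per-rank MPI wait times.

    wait_seconds[i] is the total time (in seconds) that rank i spent waiting.
    """
    if not wait_seconds:
        raise ValueError("need at least one rank")
    avg = mean(wait_seconds)
    longest = max(wait_seconds)
    return {
        "min": min(wait_seconds),
        "max": longest,
        "mean": avg,
        # Worst rank relative to the average; 1.0 means perfectly even waits
        # (also returned when every rank waited zero seconds).
        "max_over_mean": longest / avg if avg > 0 else 1.0,
        # First rank that has the longest wait.
        "rank_max": wait_seconds.index(longest),
    }
```

Read the ratio with the caveat above in mind: a large spread in waits does not by itself show that the work is unbalanced.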
Novice

Hi Tim,

Thanks for the reply. I will look into SEP and PTU on the WhatIf site. So is the Intel Trace Analyzer the less hassle way to trace MPI code?

I was looking into VTune for MPI so that I could also capture performance on the various ranks using the hardware performance counters.

Regards,
Michael

Black Belt

Trace Analyzer serves different purposes than VTune/PTU/SEP. In some overlapping areas, such as measuring time spent in MPI_Wait, it should be "less hassle."