I am trying to profile an MPI application running on 6 nodes with 120 total MPI ranks.
I am interested in either a) metrics on a single node or b) average metrics across all nodes.
Is there an easy way to do a run for option b)?
Can anyone think of ways my data could be skewed if I only collect metrics on a single rank?
This question is more likely to get the attention of experts if posted on the companion HPC clusters forum.
I don't consider it easy, but you may need to run the VTune command line, collecting a separate .tb5 file for each rank. This would let you analyze performance issues that aren't replicated in all ranks. It would presumably be easier to use Intel Trace Analyzer to determine whether such issues exist.
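To illustrate why a single rank can be misleading, here is a minimal Python sketch. It assumes you have already extracted one number per rank (for example, seconds spent in a hot region) from the per-rank results by whatever means; the values, the dictionary layout, and the function names `summarize` and `relative_error` are all hypothetical and not part of VTune or Trace Analyzer.

```python
from statistics import mean


def summarize(per_rank):
    """Return mean, min, max and load imbalance (max / mean) of a per-rank metric."""
    values = list(per_rank.values())
    avg = mean(values)
    return {
        "mean": avg,
        "min": min(values),
        "max": max(values),
        "imbalance": max(values) / avg,
    }


def relative_error(per_rank, rank):
    """How far one rank's value is from the all-rank mean, as a fraction of the mean."""
    avg = mean(per_rank.values())
    return (per_rank[rank] - avg) / avg


# Hypothetical data: 120 ranks, with rank 0 doing extra work (e.g. I/O or reductions).
per_rank = {r: 10.0 for r in range(120)}
per_rank[0] = 16.0

stats = summarize(per_rank)
print(stats)
print("rank 0 vs mean:", relative_error(per_rank, 0))
print("rank 1 vs mean:", relative_error(per_rank, 1))
```

With made-up data like this, profiling only rank 0 would overstate the average cost considerably, while profiling only rank 1 would hide the imbalance entirely. That is the kind of skew to watch for when sampling a single rank.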
Useful results can usually be obtained by running a single copy of VTune while the job runs on a single node.
Trace Analyzer is a descendant of the original Vampir, which is available as open source; another open source alternative is Jumpshot.