I am using VTune Amplifier XE to profile WRF MPI and hybrid performance. I use the following command to produce the outputs:
ibrun -n 16 tacc_affinity amplxe-cl -collect hotspots -result-dir r001hs wrf.exe
This gives me 16 result directories. Is there any way to combine all the output files so I can compare which subroutines/modules are the most expensive? Or is there any way to produce just one output instead of many?
I highly appreciate your help.
The offered -gtool option, which can significantly help with selective-rank and node-wide profiling, is available only for Intel MPI starting from 5.0.3, and it assumes that the Intel MPI job launcher is used.
As I see, you use ibrun on TACC, and it is hardly possible to switch to the Intel MPI job launcher there. Also, because ibrun does not support the ":" syntax for specifying different applications for different ranks, I cannot offer the driver-based collectors that can profile nodes in system-wide mode.
One option is to use Intel MPS, which is part of Intel ITAC: https://software.intel.com/en-us/articles/getting-started-with-the-mpi-performance-snapshot. It can show MPI and OpenMP imbalance in min/max/average form, and based on that you can choose the result directory of interest for a particular MPI rank.
In general, the question is valid. I will experiment with the TACC cluster a bit and see whether we can do something more graceful.
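To illustrate what "min/max/average form" means when picking a rank, here is a minimal Python sketch. It is not MPS output and not part of VTune; the function name summarize_ranks and the per-rank times are hypothetical, and it assumes you have already obtained one time value per MPI rank by some means.

```python
from statistics import mean


def summarize_ranks(times_by_rank):
    """Return min, max, average, and the slowest rank for per-rank times.

    times_by_rank: dict mapping MPI rank number -> time in seconds
    (hypothetical input; how you obtain these values is up to you).
    """
    if not times_by_rank:
        raise ValueError("no per-rank times given")
    values = list(times_by_rank.values())
    # The rank with the largest time is the first candidate to inspect.
    slowest = max(times_by_rank, key=times_by_rank.get)
    return {
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
        "slowest_rank": slowest,
    }


# Made-up numbers for four ranks, for illustration only.
example = {0: 41.0, 1: 39.5, 2: 58.0, 3: 40.5}
summary = summarize_ranks(example)
print(summary)
```

A large gap between max and avg suggests imbalance, and the slowest rank would then be the one whose result directory is worth opening first.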
Thanks & Regards, Dmitry