I'm looking for help profiling an MPI program with the Intel Fortran compiler.
Currently I'm profiling an MPI Fortran program by compiling the code with the "-profile-functions -profile-loops=all" options in the Intel compiler and then running the generated executable on multiple compute nodes. After execution I get one "loop_prof_funcs_xxx.dump" file and one "loop_prof_loops_xxx.dump" file, which show the execution costs aggregated at the job level. This is helpful, but it still lacks process-level detail. Is there any way to profile an MPI program and generate profiling data (e.g. .dump files) for each individual process/rank?
Any guidance or advice is appreciated!