I'd like to profile my MPI application with VTune.
In order to see the inter-node behavior, I need to use the '-gtool' option to aggregate the profiling results into one file.
1) When I run the application without profiling, the following command works perfectly:
2) The following command also does the job (running multiple MPI processes on one machine). I can see their aggregated results.
3) However, when I add the '-machinefile (~)' option to specify the nodes to run on, the profiling results are not combined. Each result file shows only the profiling results for the MPI processes that ran on that node.
Am I doing something wrong here? I have searched a lot and read a lot of documentation, but I cannot figure out what's going on here...