Performance teams – if you do optimization for a number of groups, you can use the application and MPI performance snapshots to prioritize your workload. Run a quick analysis on each application and see which can benefit most from optimization. Even better, ask the application owner to run a snapshot and send you the results.
There are no tricky options to figure out: just prefix your application invocation with the aps script (or batch file) and it generates an HTML file with easy-to-understand metrics.
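If you launch your runs from a script, the "prefix your invocation" step can be sketched as below. This is only an illustration, not part of the product: it assumes the launcher is reachable on PATH under the name aps, the helper names with_snapshot and run_snapshot are made up for this example, and no launcher options are passed.

```python
import shlex
import subprocess
from typing import List, Sequence

# Assumption: the snapshot launcher is on PATH and is called "aps".
# On Windows the batch file may need a different name or a full path.
APS_LAUNCHER = "aps"


def with_snapshot(app_cmd: Sequence[str], launcher: str = APS_LAUNCHER) -> List[str]:
    """Return the application command prefixed with the snapshot launcher."""
    if not app_cmd:
        raise ValueError("application command must not be empty")
    # The application's own arguments are passed through unchanged.
    return [launcher, *app_cmd]


def run_snapshot(app_cmd: Sequence[str], launcher: str = APS_LAUNCHER) -> int:
    """Run the application under the launcher and return its exit code."""
    cmd = with_snapshot(app_cmd, launcher)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

For example, run_snapshot(["./my_app", "--size", "1024"]) would execute aps ./my_app --size 1024, where ./my_app and its arguments stand in for your own application.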
It's available today in the beta package of VTune Amplifier XE, and we are eager for your feedback. It is not a replacement for VTune Amplifier but, rather, a quick start to measuring application performance.
Will your app benefit from better vectorization & threading?
Get a quick snapshot of:
Collect data on a Windows* or Linux* system. View results in a web browser.
A few things worth noting about Application Performance Snapshot:
The version in the just-announced VTune Amplifier 2017 Beta Update 1 shows not only the "effective" CPU utilization by user code but also a breakdown by MPI and threading runtimes.
Because the FPU utilization metrics are built on PMU events, this aspect is supported on 3rd Generation Intel Core processors, 5th Generation Intel processors, and 6th Generation Intel processors.
The analysis shows statistics aggregated over the whole workload (that's why the report should show up immediately after collection is done). VTune Amplifier XE has a predefined analysis type, "HPC Performance Characterization", that shows statistics for the same aspects but also allows a detailed breakdown by regions, functions, and loops, and a view over time, with the full power of the VTune UI to group and filter data.
Thanks & Regards, Dmitry