I've just started using VTune Amplifier to profile my code, and I have a very basic question. How do I find the functions that take the most wall-clock time overall, and the most wall-clock time per invocation? I have a Fortran application with OpenMP parallel regions and serial regions. I want to find out whether any of the serial regions can be sped up by parallelizing them. To do that, I first need to identify the time-consuming serial portions.
Thanks in advance,
If you run a Basic Hotspots or Advanced Hotspots analysis on your OpenMP application (I assume you are using the Intel compiler), you will get OpenMP analysis information. This includes a serial vs. parallel time metric and lets you drill into the serial hotspots. See https://software.intel.com/en-us/node/543979 for details. I'm happy to help if you have any questions.
Thanks & Regards, Dmitry
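Once you have per-function timings, the two rankings asked about in the question are simple arithmetic. Total time ranks functions by overall cost. Time per invocation is total time divided by call count. The Python sketch below assumes you already have, for each function, its total time, its call count, and whether it runs outside OpenMP parallel regions. Sampling-based hotspot data does not necessarily include exact call counts, so the count may have to come from another source. `FunctionTiming` and `rank_serial_hotspots` are hypothetical names, and the sample numbers are made up for illustration. They are not VTune output.

```python
from dataclasses import dataclass


@dataclass
class FunctionTiming:
    """Per-function timing for one profiling run (hypothetical record layout)."""
    name: str
    total_seconds: float  # time attributed to the function by the profiler
    calls: int            # number of invocations, if your data source provides it
    serial: bool          # True if the function runs outside OpenMP parallel regions

    @property
    def seconds_per_call(self) -> float:
        # Guard against functions with no recorded calls.
        return self.total_seconds / self.calls if self.calls else 0.0


def rank_serial_hotspots(timings, by_per_call=False):
    """Return only the serial functions, sorted from most to least expensive.

    by_per_call=False ranks by total time; True ranks by time per invocation.
    """
    serial_only = [t for t in timings if t.serial]
    if by_per_call:
        key = lambda t: t.seconds_per_call
    else:
        key = lambda t: t.total_seconds
    return sorted(serial_only, key=key, reverse=True)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not real profiler output.
    sample = [
        FunctionTiming("read_input", 4.0, 1, True),
        FunctionTiming("update_halo", 6.0, 3000, True),
        FunctionTiming("solve_block", 30.0, 500, False),
    ]
    print("Serial hotspots by total time:")
    for t in rank_serial_hotspots(sample):
        print(f"  {t.name}: {t.total_seconds:.3f} s total")
    print("Serial hotspots by time per call:")
    for t in rank_serial_hotspots(sample, by_per_call=True):
        print(f"  {t.name}: {t.seconds_per_call:.6f} s per call")
```

The two orderings can disagree. A routine called thousands of times may dominate total time while each call is cheap. A one-off routine may be the most expensive per call. Which ranking matters depends on whether you plan to parallelize inside the routine or across its invocations.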
Oh, and I would recommend using the latest VTune Amplifier XE 2016 Gold, which has just been released. It has the full feature set for OpenMP analysis and significantly improved UI responsiveness, which is very important when analyzing on multi-core and many-core systems.
Once you have clicked on the link to "upgrade" your license, just click on the "Intel Software Products" link in the nav bar on the right. Then, scroll down and locate VTune Amplifier XE for Windows in the Parallel Studio XE product list.
Alternatively, use the Intel® Parallel Studio XE "online" installer, then customize the installation and install only VTune Amplifier XE.