I just installed the latest release of the Fortran Studio (Compiler, VTune, etc.). I'm running in VS 2010.
Code I've been working on for a few months now requires 10 minutes for a VTune hotspot analysis. If I DO NOTHING ELSE but point VS to the previous version of the compiler, recompile, and rerun the VTune analysis, it completes in ~1.5 minutes. In the old case, the recorded CPU times for the various components sum to what VTune lists as "Total CPU Time", about 70 seconds; the greater elapsed time was presumably VTune's overhead and such. The new compiler produces code that is somewhat slower (!), but the sum of the component times is still about what VTune lists as the total CPU time. However, the elapsed time (as reported by VTune and by my watch) is 630 seconds. Any ideas? Is this a known, new incompatibility? What could the new compiler be doing (or not doing) that alters the operation of VTune so drastically?
If the new ifort has switched your default from /MT to /MD, you may have difficulty comparing the builds under VTune. Could you check that point, or force both builds to use the same link options? I suspect /MT may be more satisfactory for VTune analysis, but the compiler may have changed its default to be compatible with C++.
I think Steve posted a notice about this, but I'm having difficulty searching for it.
The problem remains. /MT didn't change the behavior (I'm already using /libs:static; is that the same?). What is strange is that VTune doesn't report where the additional 8.5 minutes is being spent. No other programs are running (other than VS 2010). The OS is 32-bit Win 7.
Unhappily, I need to revert to the previous release of the compiler; I cannot afford to have evaluation cycle times increased by a factor of 10. In any event, thanks for your help.
Hi David, try checking with Xperf. Where it applies to your case, it is a good idea to cross-check with more than one profiling tool.
Also, hardware counters are not restricted to a specific thread's virtual address space, so they increment on every occurrence of the monitored event, including occurrences caused by context switching. It is VTune's responsibility to track this and attribute each sample to the correct thread and instruction pointer (IP).
To clarify: "Total CPU Time" is the sum of CPU time across all cores; it is not the real elapsed time (parallel execution of the code is not taken into account).
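To make the distinction concrete, here is a small arithmetic sketch in Python (chosen only for illustration; the function name cpu_vs_elapsed and its fields are hypothetical, not VTune output). Total CPU time is the sum over cores, so dividing it by the elapsed time gives the average number of busy cores. A value well above 1 means parallel execution; a value well below 1 means the process spent most of the wall-clock time not running on any core.

```python
from typing import Dict, Sequence


def cpu_vs_elapsed(per_core_cpu_s: Sequence[float], elapsed_s: float) -> Dict[str, float]:
    """Relate a 'Total CPU Time' figure (sum over cores) to wall-clock elapsed time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    # Total CPU time is simply the sum of the CPU time charged on each core.
    total_cpu = float(sum(per_core_cpu_s))
    return {
        "total_cpu_s": total_cpu,
        # Average number of cores busy over the run; > 1 only with parallel execution.
        "avg_busy_cores": total_cpu / elapsed_s,
        # Wall-clock time with no CPU time charged to the process. Only meaningful
        # for an essentially single-threaded run (clamped to 0 otherwise).
        "unaccounted_s": max(elapsed_s - total_cpu, 0.0),
    }


if __name__ == "__main__":
    # Figures quoted earlier in the thread: ~70 s total CPU time, 630 s elapsed.
    print(cpu_vs_elapsed([70.0], 630.0))
    # A hypothetical 4-core parallel run: total CPU time exceeds elapsed time.
    print(cpu_vs_elapsed([40.0, 40.0, 40.0, 40.0], 45.0))
```

With the numbers quoted in the first post, and assuming the run is essentially single-threaded, cpu_vs_elapsed([70.0], 630.0) gives a total of 70 s, about 0.11 busy cores on average, and 560 s of wall-clock time with no CPU time recorded against the process. That is consistent with the time going somewhere the hotspot view does not attribute to the program.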
What elapsed time did you see in the VTune report? What was the elapsed time when you ran without VTune?
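One way to answer both questions outside VTune is to time the same workload with a wall clock and a CPU-time clock side by side. The sketch below uses Python's time.perf_counter (wall clock) and time.process_time (CPU time of the current process, all threads); busy and idle are hypothetical stand-ins for real work and for waiting. In Fortran the analogous standard intrinsics are SYSTEM_CLOCK for wall-clock time and CPU_TIME for processor time.

```python
import time
from typing import Callable, Tuple


def measure(workload: Callable[[], None]) -> Tuple[float, float]:
    """Run workload once; return (elapsed wall-clock seconds, CPU seconds of this process)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of the current process
    workload()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu


def busy(n: int = 2_000_000) -> None:
    # Hypothetical CPU-bound stand-in for the real computation.
    total = 0
    for i in range(n):
        total += i * i


def idle() -> None:
    # Hypothetical blocked phase: elapsed time grows, CPU time barely moves.
    time.sleep(0.5)


if __name__ == "__main__":
    for name, fn in (("busy", busy), ("idle", idle)):
        elapsed, cpu = measure(fn)
        print(f"{name}: elapsed {elapsed:.2f} s, CPU {cpu:.2f} s")
```

If the program shows elapsed time close to its CPU time when run on its own, but a large gap under the profiler, the extra time is more likely spent in collection or in waiting than in the compiled code itself.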
Would it be possible to zip the result directory, attach it, and point out where in the report your concerns are?