The VTune and Advisor analyses run extremely fast compared to my non-parallelized raw C++ code in VS2013.
Do VTune and Advisor use their own compiler switch settings? Do the Intel analyses actually run the code or simulate it?
The code is an electromagnetic simulation that only generates matrices and has nothing to do with graphics. I have an AMD 8-core CPU with two Radeon 6990 boards (4 GPUs total, because each board has 2 GPUs). I haven't parallelized anything in the code yet. I need to understand the fundamentals of what I'm seeing first.
I'm using VS2013 with Intel Composer XE SP1.
>>> Do the Intel analyses actually run the code or simulate it?>>>
VTune actually relies on cooperation between kernel-mode modules (producers) and user-mode modules (consumers). Your code is probably launched from inside the VTune UI, and the kernel-mode modules gather CPU statistics by reading and writing MSRs (model-specific registers). One specific module, vtss.sys, is probably used to walk thread stacks (kernel and user mode) and resolve function calls.
I've used Basic Hotspots, Advanced Hotspots, and any of the Advisor tools. The code runs and cycles through expected screen outputs in any of these analyses in under a second. If I run the code by itself, it'll take several minutes.
There's something fundamental that I'm missing about its use. Iliy's post is insightful, though the program does appear to run as expected, just faster.
Yes, the program runs faster within VTune and it generates the expected outputs (data files and screen output).
I can run the program in VTune and get great performance. Run on its own, the executable is about 15x slower than when I run it in VTune. Additionally, running it from the program debugger (F5 from within VS2013) is about 100x slower.
I'd like to identify how I can match the performance of the VTune-run code from the executable, and get the release version run from Visual Studio somewhere close.
VTune Amplifier never "simulates" your code. However, depending on your analysis type, it may inject some code. Under what analysis type(s) are you seeing this behavior?
AFAIK some kind of instrumentation is injected into the profiled process's address space, but I do not know how it could contribute to increased code performance. Maybe it depends on the specific analysis type, as @MrAnderson hints.
>>>Additionally running it from the program debugger (F5 from within VS2013) is about 100x slower.>>>
In this case the VS debugger can perform additional checks on memory buffers and stack guard pages. It can also execute breakpoints and attempt to handle various exceptions.
Thanks again Iliy. I did just that by putting a timer in the code and running it through different processes.
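For anyone repeating this check, here is a minimal sketch of the kind of in-code timer I mean. It is not my actual simulation code: `run_workload`, `time_seconds` and `report_run_time` are hypothetical names made up for illustration, and the loop is only a stand-in for the real computation.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Hypothetical stand-in for the simulation's main computation.
double run_workload(int n) {
    double sum = 0.0;
    for (int i = 1; i <= n; ++i) {
        sum += 1.0 / i;
    }
    return sum;
}

// Returns the wall-clock seconds taken by one call to fn().
template <typename Fn>
double time_seconds(Fn fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Times the stand-in workload and prints the result, so the same number
// can be compared across launch methods.
double report_run_time(int n) {
    assert(n > 0);
    volatile double sink = 0.0;  // keeps the optimizer from discarding the work
    double secs = time_seconds([&] { sink = run_workload(n); });
    std::printf("workload(%d) took %.6f s\n", n, secs);
    return secs;
}
```

Calling `report_run_time` from `main` and launching the same binary from the prompt, from VS (F5 and Ctrl+F5), and from VTune gives directly comparable numbers, because the measurement is taken inside the process rather than judged by eye.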
Human error and inexperience prevail in this case. The Intel VTune and Intel Survey analyses all had virtually identical run times to the executable when run from the prompt. My failure was in two parts: #1, running an older release version from the prompt, which had a 15x longer run time (configuration control); and #2, not having my release configuration set up properly in VS to run at a speed representative of the executable.
Sorry to have wasted your time. I'm sure there are more satisfying problems to solve on the forum. You did give me some insight into the Intel tools, which was, in part, what I was seeking.
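One way to catch this kind of configuration mix-up is to have the program report how and when it was built. The sketch below assumes the default Visual Studio project settings, where Debug builds define `_DEBUG` and Release builds define `NDEBUG`; a customized project may define neither. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Reports the build flavor from the macros that default VS project
// templates define (an assumption; custom projects may differ).
std::string build_configuration() {
#if defined(_DEBUG)
    return "debug (_DEBUG defined)";
#elif defined(NDEBUG)
    return "release-like (NDEBUG defined)";
#else
    return "unknown (neither _DEBUG nor NDEBUG defined)";
#endif
}

// Compile date and time of this translation unit, useful for spotting
// a stale executable.
std::string build_stamp() {
    return std::string(__DATE__) + " " + __TIME__;
}

// Prints both pieces of information at program start-up.
void print_build_info() {
    int written = std::printf("configuration: %s\ncompiled: %s\n",
                              build_configuration().c_str(),
                              build_stamp().c_str());
    assert(written > 0);  // active only when NDEBUG is not defined
    (void)written;        // avoids an unused-variable warning in release builds
}
```

Printing this at start-up would have shown right away that the executable I ran from the prompt was older than the one VTune was launching.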