I compiled the code with both VC and ICC and benchmarked the results. I find that ICC and VC perform EXACTLY the same for IJL.
I have tried O2, O3 and full optimization.
The difference is about ±1%.
I have also tried matrix multiplication, and VC and ICC give the same performance.
So I just want to ask: under what conditions should I use ICC for acceleration?
Test platform:
Intel dual-core E6500 (2.93 GHz) with DDR2
Windows XP SP3
Visual Studio 2005 (VC 8)
Intel C++ 10.1
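For reference, a matrix-multiply timing test of the kind described above might look like the following minimal sketch. It is not the code that was actually benchmarked: the function names `matmul` and `time_matmul`, the fill values, and the use of `std::clock` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <vector>

// Naive n x n matrix multiply, row-major: c = a * b.
// The i-k-j loop order keeps the innermost accesses unit-stride,
// which is the friendliest layout for an auto-vectorizer.
void matmul(const std::vector<float>& a, const std::vector<float>& b,
            std::vector<float>& c, std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n);
    c.assign(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}

// Hypothetical helper: time `reps` multiplies, in seconds as
// reported by std::clock().
double time_matmul(std::size_t n, int reps)
{
    std::vector<float> a(n * n, 1.0f), b(n * n, 2.0f), c;
    const std::clock_t start = std::clock();
    for (int r = 0; r < reps; ++r)
        matmul(a, b, c, n);
    const std::clock_t stop = std::clock();
    // Read one result through a volatile so the optimizer cannot
    // discard the multiplication as dead code.
    volatile float sink = c[0];
    (void)sink;
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

Building the same file with each compiler at the same optimization level and comparing `time_matmul` for a matrix that does not fit in cache gives a like-for-like number.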
Back when I was using VC++ 6, I was able to switch to the Intel compiler, set some flags and get a 25%-30% gain without any additional work. Since VS2005, I have been seeing results similar to yours: the Microsoft compiler has just gotten better. Additionally, I now find it more difficult to get a speed gain by reworking the code with intrinsics. Again, the compilers have gotten better and are doing that on their own.
At this point I find that, no matter what, I need to put in real additional effort to eke out further performance gains, but the Intel compiler can be very helpful in that area: I now tend to use the Intel-specific pragmas to help guide the compiler toward more informed decisions. I find this easier than figuring out the intrinsics myself, and it helps keep the code portable. Intel also offers profile-guided optimization, but I have not tried using that.
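As an illustration of the pragma approach, here is a minimal sketch using `#pragma ivdep`, one of the Intel compiler's loop hints. The function `saxpy` and the no-overlap guarantee are assumptions for the example, not code from any project mentioned here.

```cpp
#include <cstddef>

// Scaled add over two arrays: y[i] += alpha * x[i].
// ASSUMPTION: the caller guarantees that x and y do not overlap.
void saxpy(float* y, const float* x, float alpha, std::size_t n)
{
#if defined(__INTEL_COMPILER)
    // Intel-specific hint: ignore assumed (unproven) dependences between
    // iterations, so possible aliasing does not block vectorization.
    #pragma ivdep
#endif
    for (std::size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```

Because the hint sits behind `__INTEL_COMPILER`, other compilers see an ordinary loop, which is what keeps this style portable compared with hand-written intrinsics.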
In my personal opinion, the most notable features of the Intel compiler are the advanced vectorizer and inter-procedural optimization (IPO). If these optimization features can solve a major problem in the code, this can result in significant performance improvements.
It is true that the VS compiler has improved.
But for matrix-like, calculation-heavy, loop-heavy code, the Intel C++ Compiler should still do better.
Maybe add some more specific optimization options, such as /QxSSE3 or /QxSSE4.1, or /arch:SSE3 or /arch:SSE4.1, and try with and without /Qparallel (which parallelizes the outer loop while the inner loop is vectorized).
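The loop shape those options are aimed at can be sketched as follows: an outer loop whose iterations are independent and an inner loop that walks memory with unit stride. The function `matvec` and the file name in the comment are hypothetical, and whether a given compiler version actually parallelizes or vectorizes this loop has to be checked in its optimization report.

```cpp
#include <cstddef>
#include <vector>

// Matrix-vector product y = A * x for a row-major rows x cols matrix.
// Hypothetical build line using the flags suggested above (the file name
// is an assumption): icl /O3 /QxSSE3 /Qparallel matvec.cpp
std::vector<float> matvec(const std::vector<float>& A,
                          const std::vector<float>& x,
                          std::size_t rows, std::size_t cols)
{
    std::vector<float> y(rows, 0.0f);
    // Outer loop: each iteration writes only y[r], so the iterations are
    // independent and an auto-parallelizer may split them across cores.
    for (std::size_t r = 0; r < rows; ++r) {
        float sum = 0.0f;
        // Inner loop: unit-stride reduction, a candidate for SIMD.
        for (std::size_t c = 0; c < cols; ++c)
            sum += A[r * cols + c] * x[c];
        y[r] = sum;
    }
    return y;
}
```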
I downloaded the JPEG Viewer IPP sample and built it with both ICC and VC. Note that the JPEG Viewer IPP sample can be built with OpenMP support.
I captured a screenshot from World of Warcraft (near the well in front of the south bank in Dalaran, because a lot of players gather there, so the screen is full of varied content), then encoded this image 100 times.
As you can see, with OpenMP turned off there is no difference between ICC and VC, but with OpenMP turned on, ICC is clearly better.
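To show what toggling OpenMP changes at the source level, here is a minimal sketch of one per-pixel step from JPEG encoding (RGB to luma) with the row loop marked for OpenMP. This is not code from the JPEG Viewer sample; the function `rgb_to_luma` and the integer weights are assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

// Convert interleaved 8-bit RGB to luma (Y). With OpenMP enabled
// (/Qopenmp for ICC, /openmp for VC) the rows are shared among threads;
// without it, _OPENMP is undefined and the loop runs serially.
std::vector<unsigned char> rgb_to_luma(const std::vector<unsigned char>& rgb,
                                       int width, int height)
{
    std::vector<unsigned char> luma(static_cast<std::size_t>(width) * height);
#ifdef _OPENMP
    #pragma omp parallel for
#endif
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            const std::size_t p = static_cast<std::size_t>(row) * width + col;
            const int r = rgb[3 * p];
            const int g = rgb[3 * p + 1];
            const int b = rgb[3 * p + 2];
            // Integer approximation of 0.299 R + 0.587 G + 0.114 B;
            // the weights 77 + 150 + 29 sum to 256.
            luma[p] = static_cast<unsigned char>((77 * r + 150 * g + 29 * b) >> 8);
        }
    }
    return luma;
}
```

Each row writes a disjoint slice of `luma`, so no synchronization is needed between threads.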
My test platform:
VC 8 (VS 2005)
Intel dual-core E6500 (2.93 GHz, 2 MB L2)
Windows XP SP3
Below is my test image (1440×900).
You can use Parallel Amplifier or VTune to find out if there are more hotspots that are still in serial code.