Hello, we are currently developing a rendering engine using the IPP ray tracing primitives and the Intel C++ compiler v11.
All our development machines have Intel processors (a Core i7 920 and some Core 2 Duos of various versions).
Normally the Intel compiler gives around a 10% speed boost compared to the MS VS compiler on our development machines.
Last week I decided to look at the performance of the program on an AMD processor, and to my surprise, the Intel compiler is more than 50% slower than the MS VS compiler on a Phenom X4.
Is that behaviour expected? A small performance degradation would be OK (due to processor-specific instruction reordering etc.), but what takes 60 seconds to complete with the MS compiler takes around 130 seconds with the Intel compiler. That's very strange. (I'm not using any vectorization, parallelization or any other processor-specific flags, just /O2 general optimization.)
IPP, on the other hand, seems to run fine and at top speed on AMD processors.
I went back to the MSVC compiler for shipping builds for that reason, and gave up the 10% speedup on Intel processors. (Losing 50% is much worse than gaining 10%.)
This behavior is not expected and may be a bug. Can you tell me what options you're using (both for MS and ICL)?
Ideally, if you can provide a test case (you can post it either in a private thread or through premier.intel.com), we can figure out what's going on and try to address it. There are some differences, as you point out (e.g. we don't produce 3DNow or SSE4a instructions), but I wouldn't think that would account for a 50% degradation compared to MS.
Thanks for the reply.
After asking on the forum, I tried to measure some other performance-critical areas of the software, and they seem to be OK (same executable). I'm not using any SSE- or 3DNow-specific instructions (neither in the code nor in the compiler flags), just generic speed-optimized x86...
Isolating the code and providing a test case would be kinda hard as you may guess, because the engine itself is very complex, but I'll try.
Could you please try compiling your code on the AMD machine itself and then report back your performance results? It also depends on whether your code is compute-intensive or memory-intensive. The possibilities are endless.
Not sure if you're able to get a testcase for this. But if you could post the compile options here, it might help us to provide you with some suggestions to try.
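If isolating a full test case is impractical, timing just the suspect loop in a small standalone file may already be enough to reproduce the gap. Below is a minimal sketch, not taken from your engine: `hot_loop` is a hypothetical stand-in for whatever routine dominates the render time, and `best_seconds` is an illustrative helper name. It assumes a toolchain with C++11 `<chrono>` and lambdas; on an older compiler the timer could be swapped for `QueryPerformanceCounter`.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the hot routine; replace the body with code
// copied from the slow part of the renderer.
double hot_loop(const std::vector<double>& data) {
    double sum = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i)
        sum += data[i] * data[i];
    return sum;
}

// Calls fn() `repeats` times and returns the shortest wall-clock time in
// seconds. Taking the minimum reduces noise from other processes.
template <typename Fn>
double best_seconds(Fn fn, int repeats) {
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
        fn();
        std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
        double s = std::chrono::duration<double>(t1 - t0).count();
        if (best < 0.0 || s < best)
            best = s;
    }
    return best;
}
```

A typical use would be to fill a large `std::vector<double>`, call `best_seconds([&] { sink = hot_loop(data); }, 10)` with `sink` declared `volatile double` so the optimizer cannot drop the call, and print the result. Building that one file with both `cl /O2` and `icl /O2` and running both executables on the Intel and the AMD machine would give four numbers to compare.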
Sorry for the late answer. We are on a tight schedule, and I couldn't investigate this any further. (Just switched back to VC compiler for a quick & easy solution).
I just wanted to know if it's a known behaviour or not. I fear that I can't experiment with various options and profiling on the AMD machine due to lack of time. VC compiler provides "enough" performance on both platforms for now...
Here are my compiler options:
/c /O2 /Og /Ob2 /Oi /Ot /Oy /Qipo /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /D "COMPILING_IDERENDER" /D "_VC80_UPGRADE=0x0700" /D "_AFXDLL" /D "_MBCS" /GF /EHsc /MD /GS- /fp:fast /Zc:wchar_t- /Yu"StdAfx.h" /Fp"out\Release/iderender.pch" /Fo"out\Release/" /W0 /nologo /Qopenmp /Qparallel /MP
And linker options:
glu32.lib opengl32.lib winmm.lib shlwapi.lib vfw32.lib lib\ltwvc_n.lib htmlhelp.lib /OUT:"out\Release/ideCADRender.exe" /INCREMENTAL:NO /nologo /MANIFEST
/MANIFESTFILE:"out\Release\ideCADRender.exe.intermediate.manifest" /MANIFESTUAC:NO /NODEFAULTLIB:"libc"
/TLBID:1 /MAP:"out\Release/ideRender.map" /MAPINFO:EXPORTS /SUBSYSTEM:WINDOWS
/LARGEADDRESSAWARE /OPT:REF /OPT:ICF /DYNAMICBASE:NO
I don't see anything too strange in there. I'm curious, do you know whether you see any benefit from /Qparallel? It might be worth building with and without it to see if it makes a difference. When you get a chance, if you could provide a test case, or analyze as Tim suggested, we'd love to get more info so we can address the issue.