Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Strange code generation

Unfortunately, I am not able to provide source code because it is a whole project, and simple examples do not reproduce the issue.

The program is like:

[bash]void Fun()
{
    float *x, *a, *b;
    ...
    for(i = 0; i < n; ++i)
        x[i] = a[i] + b[i];
}

void main()
{
    Fun();
}[/bash]

The code generated for the loop in Fun() is great:

[bash]00401663 movaps xmm0,xmmword ptr [esi+eax*4]
00401667 movaps xmm1,xmmword ptr [esi+eax*4+10h]
0040166C addps xmm0,xmmword ptr [ecx+eax*4]
00401670 addps xmm1,xmmword ptr [ecx+eax*4+10h]
00401675 movaps xmmword ptr [edi+eax*4],xmm0
00401679 movaps xmmword ptr [edi+eax*4+10h],xmm1
0040167E add eax,8
00401681 cmp eax,dword ptr [ebp-78h]
00401684 jb TestPerf1+2E3h (401663h)[/bash]

Now, if we add another function in main() before calling Fun():

[bash]void main()
{
    Prepare();
    Fun();
}[/bash]

where Prepare() is very complex and Fun() is NOT changed, then the code for the same loop in Fun() is now:

[bash]00412BD0 mov edi,dword ptr [ebp-40h]
00412BD3 movaps xmm0,xmmword ptr [edi+eax*4]
00412BD7 movaps xmm1,xmmword ptr [edi+eax*4+10h]
00412BDC mov edi,dword ptr [ebp-34h]
00412BDF addps xmm0,xmmword ptr [edi+eax*4]
00412BE3 addps xmm1,xmmword ptr [edi+eax*4+10h]
00412BE8 mov edi,dword ptr [ebp-4Ch]
00412BEB movaps xmmword ptr [edi+eax*4],xmm0
00412BEF mov edi,dword ptr [ebp-4Ch]
00412BF2 movaps xmmword ptr [edi+eax*4+10h],xmm1
00412BF7 add eax,8
00412BFA cmp eax,edx
00412BFC jb TestPerf1+2D0h (412BD0h)[/bash]

This code takes almost twice as long to execute as the previous version.

Any idea why the compiler generates such bad code in this case? Is it an unsuccessful attempt at global optimization? I tried removing /Og and /Qipo, with no effect.

Visual Studio 2008
Compiling with Intel C++ 11.1.038 [IA-32]... (Intel C++ Environment)

/c /O3 /Og /Qipo /I "C:\QFL\external" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /GF /EHsc /MT /GS- /Gy /fp:fast /Fo"Release/" /W4 /nologo /Wp64 /Zi /Qfp-speculationsafe /QxSSE4.1

/OUT:"C:\BUILD\prj\vc2008\Release\vc2008.exe" /INCREMENTAL:NO /nologo /MANIFEST /MANIFESTFILE:"Release\vc2008.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /TLBID:1 /DEBUG /PDB:"C:\BUILD\prj\vc2008\Release\vc2008.pdb" /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /OPT:NOWIN98 /DYNAMICBASE /NXCOMPAT /IMPLIB:"C:\BUILD\prj\vc2008\Release\vc2008.lib" /MACHINE:X86

Thank you.

5 Replies
Well, there are a lot of possibilities, but it's hard to say without seeing a little more detail. Are the parameters to Fun affected by the call to Prepare? Could either Fun or Prepare be inlined (this can still happen without -ipo if their definitions are in the same file)? Are any of these parameters actually global vars?

You could try isolating Fun to a separate file to be sure, then the position of the call should be irrelevant to the code generated for the function.

Black Belt
I didn't see the /Qansi-alias option in your list. Without that option, the compiler assumes you may violate the standard's rules on type-based aliasing. For a function that takes pointers to more than one object of similar data type, optimization can then occur only through successful inter-procedural optimization or through use of the restrict qualifier. The latter requires the /Qstd=c99 or /Qrestrict options, which have corresponding Visual Studio "language" settings.
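
For illustration, a minimal sketch of the restrict approach, assuming a hypothetical helper `add_arrays` that takes the three arrays as parameters (the original Fun() is not shown in full, so this signature is an assumption). It uses the `__restrict` spelling, a vendor extension accepted by MSVC, GCC, Clang and the Intel compiler in C++; the plain `restrict` keyword is the one that needs the options named above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not taken from the original project.
// __restrict is a promise from the programmer that x, a and b never
// overlap, so a store to x[i] cannot change any a[j] or b[j].
void add_arrays(float* __restrict x,
                const float* __restrict a,
                const float* __restrict b,
                std::size_t n)
{
    assert(x != nullptr && a != nullptr && b != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        x[i] = a[i] + b[i];
}
```

The promise is not checked: calling `add_arrays` with overlapping arrays is undefined behavior, so this only fits code that, like the program described here, really has no aliasing.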
Black Belt
It looks to me as if something is different in Fun() between float *x, *a, *b; and the for loop.
The symptom is that the pointers are not held in registers (new copies are fetched on each indirection).

float *x, *a, *b;
volatile float *x, *a, *b;

Is the code in the ellipsis (your ...) different?

Another oddity is:

mov edi,dword ptr [ebp-4Ch]

is executed twice (between uses), which indicates either that optimization is off or that the pointers are now volatile.

Jim Dempsey
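
A side note on the volatile hypothesis, as a small sketch with hypothetical function names: in `volatile float *x` it is the pointed-to floats that are volatile, while a pointer that must itself be re-read on every use is declared `float * volatile x`. The reload pattern in the listing would correspond to the second form.

```cpp
#include <cassert>
#include <cstddef>

// Pointer to volatile data: every element read must really happen,
// but the pointer value p may be kept in a register.
float sum_volatile_data(volatile float* p, std::size_t n)
{
    assert(p != nullptr);
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        s += p[i];
    return s;
}

// Volatile pointer: the pointer object p itself must be re-read from
// memory each time it is used, similar to the reloads of edi above.
float sum_volatile_pointer(const float* data, std::size_t n)
{
    assert(data != nullptr);
    const float* volatile p = data;
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        s += p[i];
    return s;
}
```

Both functions return the same sum; they differ only in which memory accesses the compiler is allowed to optimize away.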
I think I resolved the issue.

Jim, nothing is changed in Fun(), it is one and the same in both cases.

Originally Fun() and Prepare() were defined in one source file. Each function uses local variables, no shared globals. I separated them into two files and marked each as "noinline". No effect: calling Prepare() always "destroyed" the loop in Fun(), regardless of the /Og and /Qipo options.
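
For readers who want to repeat the "noinline" experiment, a minimal sketch follows. The `NOINLINE` macro and the function `fun_kernel` with its parameter list are hypothetical, chosen for illustration; the original Fun() takes no parameters and its body is not shown in full.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical portability macro: __declspec(noinline) for compilers that
// define _MSC_VER (MSVC, and the Intel compiler on Windows), otherwise the
// GCC-style attribute.
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE __attribute__((noinline))
#endif

// Marked noinline so the loop is compiled as a standalone function body
// rather than being merged into its caller.
NOINLINE void fun_kernel(float* x, const float* a, const float* b,
                         std::size_t n)
{
    assert(x != nullptr && a != nullptr && b != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        x[i] = a[i] + b[i];
}
```

Placing such a function in its own source file, as suggested earlier in the thread, removes same-file inlining from the picture as well.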

Then, following tim18's suggestion, I simply compiled with /Qansi-alias, and to my surprise the issue was resolved (the program really has no aliasing anywhere). But I still don't understand the logic of the compiler...

NB: both functions perform heavy floating-point math with the "float" type. If I change the type in Prepare() to "double" but keep "float" in Fun(), then the issue does not appear and the pointers stay in registers...

Thank you very much for the suggestions :)

Black Belt
Even if the compiler were told there were alias issues, the following code misses an optimization opportunity:

[bash]00412BE8 mov edi,dword ptr [ebp-4Ch]
00412BEB movaps xmmword ptr [edi+eax*4],xmm0
00412BEF mov edi,dword ptr [ebp-4Ch] <******* edi reloaded
00412BF2 movaps xmmword ptr [edi+eax*4+10h],xmm1[/bash]

I suppose one could argue that the array store (movaps xmmword ptr [edi+eax*4],xmm0) could potentially have overwritten the pointer to the data, which would necessitate re-fetching the pointer. But then you would also have to assume that storing the first of the four packed floats in xmm0 could have overwritten the pointer, which would require storing each float individually with a reload of the pointer between stores.

Your code illustrates an inefficiency in the optimization.
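
The "store could overwrite the pointer" concern can be sketched in source form. This is a general illustration under assumed names (`Buffers`, `add_through_struct`, `add_hoisted` are hypothetical), not a reconstruction of the poster's code. When the pointers live in memory and the compiler may not rely on type-based aliasing rules, it has to allow for a float store changing a pointer and re-read it; copying the pointers into locals whose address is never taken makes it obvious that no array store can modify them.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical holder for the three array pointers.
struct Buffers {
    float*       x;
    const float* a;
    const float* b;
};

// The pointers are read through buf on every iteration as written.
// Without type-based alias analysis the compiler may have to assume the
// store to buf.x[i] could change buf.x, buf.a or buf.b.
void add_through_struct(Buffers& buf, std::size_t n)
{
    assert(buf.x != nullptr && buf.a != nullptr && buf.b != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        buf.x[i] = buf.a[i] + buf.b[i];
}

// Same computation with the pointers hoisted into locals before the loop.
// The locals' addresses never escape, so no store through x can alias them.
void add_hoisted(const Buffers& buf, std::size_t n)
{
    float*       x = buf.x;
    const float* a = buf.a;
    const float* b = buf.b;
    assert(x != nullptr && a != nullptr && b != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        x[i] = a[i] + b[i];
}
```

Both functions compute the same result; whether the generated code differs depends on the compiler and its aliasing options.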