Intel® C++ Compiler

Possible compiler bug: pessimization due to failed aliasing optimization

dark_shikari
Simple test case:

#include <stdint.h>

static __attribute__((noinline)) void copy_column8( uint8_t *dst, uint8_t *src )
{
    for( int i = -4; i < 4; i++ )
        dst[i*32] = src[i*32];
}

Compiled on x86_64 Linux for Core 2 with ICC.

ICC attempts to do an alias optimization on this code, figuring that if the pointers don't alias, it can generate a faster code branch for that case. It generates the following assembly:

4111b0: lea rdx,[rsi-0x80]
4111b4: lea rax,[rdi-0x80]
4111b8: cmp rax,rdx
4111bb: jbe 4111cc
4111bd: mov rcx,rax
4111c0: sub rcx,rdx
4111c3: cmp rcx,0x100
4111ca: ja 4111dd
4111cc: cmp rdx,rax
4111cf: jbe 41121d
4111d1: sub rdx,rax
4111d4: cmp rdx,0x100
4111db: jbe 41121d
4111dd: movzx eax,[rsi-0x80]
4111e1: mov [rdi-0x80],al
4111e4: movzx edx,[rsi-0x60]
4111e8: mov [rdi-0x60],dl
4111eb: movzx ecx,[rsi-0x40]
4111ef: mov [rdi-0x40],cl
4111f2: movzx r8d,[rsi-0x20]
4111f7: mov [rdi-0x20],r8b
4111fb: movzx r9d,[rsi]
4111ff: mov [rdi],r9b
411202: movzx r10d,[rsi+0x20]
411207: mov [rdi+0x20],r10b
41120b: movzx r11d,[rsi+0x40]
411210: mov [rdi+0x40],r11b
411214: movzx esi,[rsi+0x60]
411218: mov [rdi+0x60],sil
41121c: ret
41121d: movzx eax,[rsi-0x80]
411221: mov [rdi-0x80],al
411224: movzx edx,[rsi-0x60]
411228: mov [rdi-0x60],dl
41122b: movzx ecx,[rsi-0x40]
41122f: mov [rdi-0x40],cl
411232: movzx r8d,[rsi-0x20]
411237: mov [rdi-0x20],r8b
41123b: movzx r9d,[rsi]
41123f: mov [rdi],r9b
411242: movzx r10d,[rsi+0x20]
411247: mov [rdi+0x20],r10b
41124b: movzx r11d,[rsi+0x40]
411250: mov [rdi+0x40],r11b
411254: movzx esi,[rsi+0x60]
411258: mov [rdi+0x60],sil
41125c: ret

It checks whether the pointers alias (don't you just love C's rules about uint8_t?) and branches accordingly, but the generated code in both branches is identical--oops. The compiler should have abandoned this attempt at optimization as soon as it realized both branches of the function were the same. The result is larger and slower code than the naive equivalent produced by GCC:

41b0d0: movzx eax,[rsi-0x80]
41b0d4: mov [rdi-0x80],al
41b0d7: movzx r11d,[rsi-0x60]
41b0dc: mov [rdi-0x60],r11b
41b0e0: movzx r10d,[rsi-0x40]
41b0e5: mov [rdi-0x40],r10b
41b0e9: movzx r9d,[rsi-0x20]
41b0ee: mov [rdi-0x20],r9b
41b0f2: movzx r8d,[rsi]
41b0f6: mov [rdi],r8b
41b0f9: movzx ecx,[rsi+0x20]
41b0fd: mov [rdi+0x20],cl
41b100: movzx edx,[rsi+0x40]
41b104: mov [rdi+0x40],dl
41b107: movzx eax,[rsi+0x60]
41b10b: mov [rdi+0x60],al
41b10e: ret

I'm going to assume the compiler was simply never programmed to check to see if the aliasing-branching optimization actually did anything before deciding to use it. Either way though, it's clearly unintended behavior.
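
For anyone who doesn't want to trace the branches by hand, here is my reading of the prologue at 4111b0 through 4111db written out as C. This is only a sketch: the helper name columns_far_apart is made up for illustration, and the logic is inferred from the disassembly above, not taken from anything ICC documents. It returns 1 when the code would jump to the copy at 4111dd and 0 when it would take the copy at 41121d.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the original test case.
   Mirrors the compare-and-branch sequence in the ICC output. */
static int columns_far_apart( const uint8_t *dst, const uint8_t *src )
{
    uintptr_t d = (uintptr_t)dst - 0x80;  /* lea rax,[rdi-0x80] */
    uintptr_t s = (uintptr_t)src - 0x80;  /* lea rdx,[rsi-0x80] */

    /* dst above src by more than 0x100 bytes: ja 4111dd */
    if( d > s && d - s > 0x100 )
        return 1;
    /* src above dst by more than 0x100 bytes: falls through to 4111dd */
    if( s > d && s - d > 0x100 )
        return 1;
    /* otherwise the two columns may overlap: jbe 41121d */
    return 0;
}
```

In other words, the check appears to be |dst - src| > 0x100, where 0x100 is 8 iterations times the stride of 32. Both outcomes then run the same eight load/store pairs.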
TimP
I'm guessing you didn't set equivalent options for gcc (-fstrict-aliasing is default) and icc (-ansi-alias is not the default). If your source code complies with the C or C++ standard, you would normally set -ansi-alias. You could set it in the icc.cfg and icpc.cfg files. I suppose that default compatibility with Microsoft compilers is considered more important here than compatibility with gcc.
As to the generation of multiple identical, or virtually identical, code branches in other situations, I consider it a significant issue, but I've been told it's not worth the effort to correct it.
dark_shikari
No, -O3 is used with GCC, which implies -fstrict-aliasing. It's just that all char* pointers are assumed to be able to alias each other under ANSI C, so the compiler has to make that assumption.
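
To make the aliasing point concrete, here is a small sketch showing that the result of the test-case loop really does depend on overlap, so the compiler cannot reorder the loads and stores on its own. It assumes uint8_t is a typedef for unsigned char, as it is on the usual x86_64 Linux toolchains; the helper overlapping_copy_smears and the buffer layout are made up for the demonstration.

```c
#include <stdint.h>

static void copy_column8( uint8_t *dst, uint8_t *src )
{
    for( int i = -4; i < 4; i++ )
        dst[i*32] = src[i*32];
}

/* Hypothetical demo: copy a column onto itself shifted down by one
   row (32 bytes). Each store overwrites the byte the next iteration
   loads, so the first row's value is smeared down the whole column. */
static int overlapping_copy_smears( void )
{
    uint8_t buf[288];
    for( int i = 0; i < 288; i++ )
        buf[i] = (uint8_t)( i / 32 + 1 );  /* row 0 holds 1, row 1 holds 2, ... */

    copy_column8( buf + 128 + 32, buf + 128 );

    for( int row = 1; row <= 8; row++ )
        if( buf[row*32] != 1 )
            return 0;
    return 1;  /* every destination byte ended up with row 0's value */
}
```

A version that loaded all eight bytes first and stored them afterwards would give a different answer for this input, which is why the in-order code has to be kept when the pointers might overlap.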
TimP
Sorry, you're right. The restrict keyword should be used to allow this optimization when pointers of like type point to non-overlapping regions. Either -std=c99 or -restrict would be required.
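
A minimal sketch of what that would look like for the test case, assuming the caller can guarantee that the two columns never overlap. The name copy_column8_restrict is only for illustration, and whether ICC then drops the runtime check is something to confirm from the generated assembly rather than take from this sketch.

```c
#include <stdint.h>

/* restrict is a promise from the caller that dst and src do not
   overlap; calling this with overlapping columns is undefined. */
static void copy_column8_restrict( uint8_t *restrict dst,
                                   const uint8_t *restrict src )
{
    for( int i = -4; i < 4; i++ )
        dst[i*32] = src[i*32];
}
```

As with the original, both pointers should point into the middle of a column, with at least 128 valid bytes before them and 97 after.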