We have identified what we consider to be a compiler bug. Verified with the latest version of the compiler (2013 SP1) just now.
A simple code sample to illustrate the issue can be found at this address:
https://gist.github.com/lemire/6642148
and I also include it below.
It suffices to compile and run the code as directed. A print-out will then show the result of the computation vs. the expected result. The function that triggers the issue has only 7 (simple) lines. It appears that the keyword "restrict" is needed for the bug to appear.
We checked the assembly generated by the compiler and it makes no sense to us.
Note that the same code sample was tested successfully with several compilers including gcc.
Context: we identified this bug while working on a fast integer compression library (https://github.com/lemire/FastPFor). After we got the library to pass all tests with clang, gcc, and VS2012, the Intel compiler gave us grief. The code sample is the simplest case we could come up with that triggers the bug. To get around the bug, we simply unrolled the loop manually. This is obviously not very desirable in general.
[cpp]
// compile with:
// icc -std=c99 -O2 iccbug.c -o iccbug
// then run iccbug
// Tested in Linux Ubuntu 12.10 (Intel Core i7)
#include <stdint.h>
#include <stdio.h>
// expect: out[0] = in[0] + in[1] + in[2] + in[3];
// out[1] = in[4] + in[5] + in[6] + in[7];
__attribute__((noinline))
void broken_with_O2(int * restrict in, int * out) {
    for(int outer = 0; outer < 2; outer++) {
        *out = *in++;
        for (int inner = 1; inner < 4; inner++) {
            *out += *in++;
        }
        ++out;
    }
}
int main() {
    int in[8] = {1,1,1,1,1,1,1,1};
    int out[2];
    broken_with_O2(&in[0],&out[0]);
    printf(" got = %d %d\n",out[0], out[1]);
    printf(" expected = %d %d\n", 4,4);
}
[/cpp]
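For readers wondering what the manual-unrolling workaround mentioned above might look like, here is a minimal sketch. The function name `unrolled_workaround` is made up for illustration and is not taken from the FastPFor code; the actual workaround used in the library may differ.

```c
// Hypothetical workaround: the inner loop of broken_with_O2 written out by hand.
// Computes out[0] = in[0..3] summed and out[1] = in[4..7] summed.
void unrolled_workaround(int * restrict in, int * out) {
    for (int outer = 0; outer < 2; outer++) {
        // The three inner-loop iterations are spelled out explicitly.
        out[outer] = in[0] + in[1] + in[2] + in[3];
        in += 4;
    }
}
```

Called with the same `in` and `out` arrays as in the test program, this should give the expected sums on any conforming compiler.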
This is simply to let you know that support for the restrict keyword needs to be enabled on the command line. I'm not sure that using the c99 option alone is enough (at least, I always use it).
Adding the "-restrict" flag changes the output but it is still buggy.
The spelling __restrict is accepted without the -restrict compile option.
I would prefer to see the sum built in a local variable which is copied to *out after the loop is complete. For diagnosis purposes, if I get a chance, I'll compare such a version, including comparing the opt-report. Maybe the compiler has attempted to unroll the loop completely.
The new compiler release is much more aggressive with optimizations based on restrict pointers, so I agree that it appears a bug has been exposed.
"I would prefer to see the sum built in a local variable which is copied to *out after the loop is complete."
There might be several ways to implement this, but I just ran a quick check and the bug remains if I write a local array, and then copy the local array to the out pointer.
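The code for that quick check was not posted. One possible reading of "write a local array, and then copy the local array to the out pointer" is sketched below; the name `local_array_variant` and the exact shape of the loop are assumptions, not the code that was actually tested.

```c
// Hypothetical reconstruction of the local-array variant described above.
// Sums are accumulated in a local array and copied to out only at the end.
void local_array_variant(int * restrict in, int * out) {
    int local[2];
    for (int outer = 0; outer < 2; outer++) {
        local[outer] = *in++;
        for (int inner = 1; inner < 4; inner++) {
            local[outer] += *in++;
        }
    }
    // Copy the finished sums to the output pointer.
    out[0] = local[0];
    out[1] = local[1];
}
```

A conforming compiler should produce the expected sums here; the report above is that icc still produced wrong output for a variant along these lines.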
TimP: I think you are correct that the bug depends on the direct usage of "*out" within the loop. For example, this works:
[cpp]
void works_with_O2(uint32_t * restrict in, uint32_t * restrict out) {
    for(uint32_t outer = 0; outer < 2; outer++) {
        uint32_t temp = *in++;
        for (int inner = 1; inner < 4; inner++) {
            temp += *in++;
        }
        *out++ = temp;
    }
}
[/cpp]
We had this as part of the original test code, but took it out to make it simpler. Additionally, the bug only occurs if there is both an inner and an outer loop.
Your guess about unrolling being part of the problem is likely correct, but the assembly produced is nonsense rather than just suffering from an off-by-one type of error:
[cpp]
/*
0000000000000000 <broken_with_O2>:
0: 8b 07 mov eax,DWORD PTR [rdi]
2: 8b 57 10 mov edx,DWORD PTR [rdi+0x10]
5: 03 c2 add eax,edx
7: 03 c2 add eax,edx
9: 8b 4f 20 mov ecx,DWORD PTR [rdi+0x20]
c: 03 c2 add eax,edx
e: 03 d1 add edx,ecx
10: 03 d1 add edx,ecx
12: 03 ca add ecx,edx
14: 89 06 mov DWORD PTR [rsi],eax
16: 89 4e 04 mov DWORD PTR [rsi+0x4],ecx
19: c3 ret
1a: 66 0f 1f 44 00 00 nop WORD PTR [rax+rax*1+0x0]
*/
[/cpp]
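To make the disassembly easier to follow, here is an instruction-by-instruction transcription into C, with the registers as local variables. The function name `what_the_asm_computes` is made up for illustration. The offsets 0x10 and 0x20 are byte offsets, so with 4-byte ints they correspond to in[4] and in[8].

```c
// C transcription of the disassembly of broken_with_O2 shown above.
// It reads in[0], in[4] and in[8], so the caller must supply at least 9 ints.
void what_the_asm_computes(const int *in, int *out) {
    int eax = in[0];   // mov eax,DWORD PTR [rdi]
    int edx = in[4];   // mov edx,DWORD PTR [rdi+0x10]
    eax += edx;        // add eax,edx
    eax += edx;        // add eax,edx
    int ecx = in[8];   // mov ecx,DWORD PTR [rdi+0x20]
    eax += edx;        // add eax,edx
    edx += ecx;        // add edx,ecx
    edx += ecx;        // add edx,ecx
    ecx += edx;        // add ecx,edx
    out[0] = eax;      // mov DWORD PTR [rsi],eax     -> in[0] + 3*in[4]
    out[1] = ecx;      // mov DWORD PTR [rsi+0x4],ecx -> in[4] + 3*in[8]
}
```

Read this way, the generated code computes in[0] + 3*in[4] and in[4] + 3*in[8] instead of the two four-element sums. It never touches in[1] through in[3] or in[5] through in[7], and with the 8-element array in the test program the load of in[8] is past the end of the array.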
At least when compiling with icc -std=c99, I find the same bug occurs both with and without the "-restrict" option added to the command line. The bug does disappear, though, if the restrict/__restrict/__restrict__ qualifier is removed from the test code.
Sergey: Could you offer a definitive statement on whether '-restrict' affects the interpretation of 'restrict' when used with "-std=c99"? I've yet to find anything definitive. The keyword is part of the C99 standard, so I'd consider it a bug if -std=c99 does not imply '-restrict'.