*****************************
Bug Description
*****************************
The program below should always print "0". That is indeed the case when compiling without optimizations or only with size optimizations (/O1). However, with speed optimizations enabled (/O2 or /O3) the program prints "1". The compiler attempts to auto-vectorize the inner loop using SSE2 instructions, but the generated code incorrectly sets the variable "flag" to 1.
This does not appear to be related to precision since none of the conditions that can set the variable "flag" to a nonzero value are even remotely close to being true.
Additional observations:
- the bug does not appear when setting the architecture to SSE or older, only with SSE2 and up.
- the bug does not appear if the variable "flag" is an int instead of a short. This may be related to the fact that in the former case the compiler does not generate the "packssdw" instruction.
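The second observation suggests one possible way to sidestep the problem: accumulate the flag in an int inside the loop and narrow it to a short only once, after the loop. The sketch below assumes a simplified loop with the same shape as the one in the sample program; the function name `any_zero_neighbor` is hypothetical and not part of the original code, and whether this avoids the miscompilation on a given 12.1 build has to be verified locally.

```c
#include <assert.h>

#define DIM 8

/* Hypothetical helper (not from the original post): scans the buffer with
 * the same row-neighbor comparison as the sample program, but keeps the
 * flag in an int while inside the loop nest. */
short any_zero_neighbor(float buf[DIM][DIM], int dim)
{
    int i, j;
    int flag = 0;   /* int instead of short, per the observation above */

    assert(dim >= 1 && dim <= DIM);
    for (i = 1; i < dim; i++)
    {
        for (j = 0; j < dim; j++)
        {
            if ((buf[i][j] == 0) || (buf[i-1][j] == 0))
            {
                flag = 1;
            }
        }
    }
    /* Narrow to short once, outside the loop nest. */
    return (short)flag;
}
```

The caller still receives a short, so the rest of the program does not need to change.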
*****************************
Configuration
*****************************
Compiler Version
"Intel(R) C++ Compiler XE for applications running on IA-32, Version 12.1.3.300 Build 20120130"
Operating System:
Windows 7 32-bit
CPU:
This was found on an Intel Core i7-870 CPU at 2.93 GHz. It was also reproduced on an Intel Core i7-2700 at 3.40 GHz.
How To Reproduce:
To reproduce the bug:
>> icl /arch:SSE2 /O2 auto_vectorizer_bug.c
>> auto_vectorizer_bug.exe
Result: 1
To get the correct result:
>> icl /Od auto_vectorizer_bug.c
>> auto_vectorizer_bug.exe
Result: 0
*****************************
Sample Program (also attached)
*****************************
[cpp]
// auto_vectorizer_bug.c : Demonstrates what appears to be a bug in the Intel compiler's auto-vectorization for SSE2
// When compiled with -Od, the program prints "0", which is the correct result.
// When compiled with -O3 or -O2 the program prints "1".
#include <stdio.h>
#include <assert.h>
#define DIM 8
float g_buffer[DIM][DIM];
void init_buffer(float p_buf[DIM][DIM])
{
    int i, j;
    /* initialize all elements to 0.5 */
    for (i = 0; i < DIM; i++)
    {
        for (j = 0; j < DIM; j++)
        {
            p_buf[i][j] = 0.5;
        }
    }
}
int main(int argc, char* argv[])
{
    int i, j;
    short flag;
    float x1, x2, x3, x4, x5, x6;
    int dim;
    flag = 0;
    /* initialize all array entries to 0.5 */
    init_buffer(g_buffer);
    /* make it appear as if the array dimensions are not known
     * at compile time */
    assert(argc==1);
    dim = argc*DIM;
    for (i = 1; i < dim; i++)
    {
        for (j = 0; j < dim; j++)
        {
            /* compare each element with the one in the previous row */
            x1 = g_buffer[i][j];
            x2 = g_buffer[i-1][j];
            /* this condition should never be true */
            if ((x1 == 0) || (x2 == 0))
            {
                flag = 1;
            }
            else
            {
                x3 = x1 * x1;
                x4 = x1 * x3;
                x5 = x2 * x2;
                /* this condition should never be true */
                if ((x4 * 0.1) > x5)
                {
                    flag = 1;
                }
                else
                {
                    x6 = x2 * x5;
                    /* this condition should never be true */
                    if (x3 < (x6 * 0.1))
                    {
                        flag = 1;
                    }
                }
            }
        }
    }
    /* The result should always be 0, but with optimizations enabled we get the result 1. */
    printf("Result: %d\n", flag);
    return 0;
}
[/cpp]
7 Replies
Hi Roy,
>>...This was found on an Intel Core i7-870 CPU at 2.93 GHz. It was also reproduced on a Intel Core i7-2700 at 3.40 GHz.
That looks odd and I'll do a verification on a computer with Intel Core i7-3840QM and Windows 7 64-bit OS. Thanks for the test-case.
Best regards,
Sergey
PS: So far my main concern is that you use Windows 7 32-bit and I use Windows 7 64-bit.
I can't reproduce the problem with either the 64-bit 12.1.7.371 or the 13.0.1.119 compiler. I don't have the 32-bit one installed.
The only vectorization I see is in init_buffer(). There's not much room for variation there.
There is no /arch:SSE distinct from SSE2 in recent compilers. Is it treated as /arch:IA32?
The inner loop in the main() function is getting vectorized. The compiler's optimization report explicitly states this (let me know if you'd like me to attach the report), and it is in that loop that the problem occurs.
Hi,
I was able to reproduce this error with all the 12.1 icc compilers, but was unable to reproduce it with the 13.0 compiler, so the issue appears to be fixed in 13.0. You can download it from registrationcenter.intel.com.
Regards,
Sukruth h V
Hi,
I compiled your sample with Parallel Studio XE 2013 (compiler 13.0.1 20121010) under openSUSE 12.2 Linux.
No problems: I used -O1, -O2 and -O3, and the result was always '0'.
Greetings,
Franz
>>...I was able to reproduce this error in all the 12.1 icc compilers, But was unable to reproduce it in 13.0 compiler...
That is good news, and thanks for the update.
Good news indeed. My company doesn't yet offer the 13.0 compiler, but I'll ask for it.
For others who may be using the 12.1 compiler and experiencing this issue, a simple workaround you can use is to add a "#pragma novector" right before the problematic loop.
Thank you all for your help!
Roy
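
For readers applying the workaround mentioned above, here is a minimal sketch of where the pragma could go. It assumes the inner loop of the sample program moved into a function; the name `check_buffer` is hypothetical and not part of the original code. The `__INTEL_COMPILER` guard is an addition so that other compilers do not warn about an unknown pragma.

```c
#include <assert.h>

#define DIM 8

/* Hypothetical function name; the loop body mirrors the sample program. */
short check_buffer(float buf[DIM][DIM], int dim)
{
    int i, j;
    short flag = 0;
    float x1, x2, x3, x4, x5, x6;

    assert(dim >= 1 && dim <= DIM);
    for (i = 1; i < dim; i++)
    {
        /* Ask the Intel compiler not to vectorize the loop that follows.
         * The guard keeps the pragma away from other compilers. */
#if defined(__INTEL_COMPILER)
#pragma novector
#endif
        for (j = 0; j < dim; j++)
        {
            x1 = buf[i][j];
            x2 = buf[i-1][j];
            if ((x1 == 0) || (x2 == 0))
            {
                flag = 1;
            }
            else
            {
                x3 = x1 * x1;
                x4 = x1 * x3;
                x5 = x2 * x2;
                if ((x4 * 0.1) > x5)
                {
                    flag = 1;
                }
                else
                {
                    x6 = x2 * x5;
                    if (x3 < (x6 * 0.1))
                    {
                        flag = 1;
                    }
                }
            }
        }
    }
    return flag;
}
```

The pragma applies only to the loop immediately after it, so the outer loop and the rest of the program keep their normal optimization settings.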
