Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

ICC 11 (and possibly 10) vectorizer bug (but ICC 9.x works fine)

gordan
Beginner
This sort of bugginess has been the bane of my code since ICC 10, and it seems to have found its way into ICC 11 as well. ICC 9, however, handles this sort of thing just fine, especially when it's told to just do it.

Here's a code example:

$ cat test.cxx
#include <math.h>

int main ()
{
    unsigned int x;
    float xx;
    float Frequency = 0.1f;
    unsigned int LocalDataC = 100;
    float LocalDataV[100];
    float Amplitude = 5.0f;
    float OffsetX = 2.0f;
    float OffsetY = 1.0f;

    #pragma ivdep
    for (x = 0, xx = 0.0f; x < LocalDataC; x++)
        LocalDataV[x] -= Amplitude * sinf (Frequency * xx++ + OffsetX) + OffsetY;
}

# with ICC 11.1.064
$ icpc -msse3 -xP -O3 -ansi-alias -fargument-alias -fp-model fast=2 -rcd -align -Zp16 -ipo -fomit-frame-pointer -funroll-loops -fpic -w1 -vec-report3 -c test.cxx
$ icpc -msse3 -xP -O3 -ansi-alias -fargument-alias -fp-model fast=2 -rcd -align -Zp16 -ipo -fomit-frame-pointer -funroll-loops -fpic -w1 -vec-report3 -static test.o -o test
ipo: remark #11001: performing single-file optimizations
ipo: remark #11005: generating object file /tmp/ipo_icpczjFuVx.o
test.cxx(15): (col. 2) remark: loop was not vectorized: existence of vector dependence.
test.cxx(16): (col. 50) remark: vector dependence: assumed ANTI dependence between xx line 16 and xx line 16.
test.cxx(16): (col. 50) remark: vector dependence: assumed FLOW dependence between xx line 16 and xx line 16.
test.cxx(16): (col. 50) remark: vector dependence: assumed FLOW dependence between xx line 16 and xx line 16.
test.cxx(16): (col. 57) remark: vector dependence: assumed ANTI dependence between xx line 16 and xx line 16.

This is despite the #pragma ivdep. At the very least, the compiler should do as it's told.

# with ICC 9.1.053, OTOH, even without #pragma ivdep

$ icpc -msse3 -xP -O3 -ansi-alias -no-alias-args -fp-model fast=2 -rcd -align -Zp16 -ipo -fomit-frame-pointer -funroll-loops -fpic -w1 -vec-report3 -c test.cxx
$ icpc -msse3 -xP -O3 -ansi-alias -fargument-alias -fp-model fast=2 -rcd -align -Zp16 -ipo -fomit-frame-pointer -funroll-loops -fpic -w1 -vec-report3 -static test.o -o test
IPO: performing single-file optimizations
IPO: generating object file /tmp/ipo_icpcXonqxM.o
test.cxx(14) : (col. 2) remark: LOOP WAS VECTORIZED.

I pointed things like this out back when ICC 10 was first released. It's more than a little disappointing that two years and two major versions later, ICC 9's vectorizer is still unmatched and unsurpassed. This is particularly disappointing for those of us who put in the effort to write code specifically in a way that should be vectorizable.
6 Replies
Dale_S_Intel
Employee

Quoting - gordan
I pointed things like this out back when ICC 10 was first released. It's more than a little disappointing that two years and two major versions later, ICC 9's vectorizer is still unmatched and unsurpassed. This is particularly disappointing for those of us who put in the effort to write code specifically in a way that should be vectorizable.

I understand your frustration. I have submitted this as an issue. Do you happen to know if this was submitted previously?

Meanwhile, I notice that if you change the type of the iteration variable from 'unsigned int' to 'int', it vectorizes OK. I'm not sure why, but that seems to fix this case at least. It also works if you take the post-increment out of the one statement in the loop and do it in a separate statement. I can't think of any good reason that should work, but if either of these workarounds is feasible, it may help. I'll let you know if I learn any more about it.

Thanks!
Dale
gordan
Beginner

Quoting - Dale_S_Intel
I understand your frustration. I have submitted this as an issue. Do you happen to know if this was submitted previously?

Meanwhile, I notice that if you change the type of the iteration variable from 'unsigned int' to 'int', it vectorizes OK. I'm not sure why, but that seems to fix this case at least. It also works if you take the post-increment out of the one statement in the loop and do it in a separate statement. I can't think of any good reason that should work, but if either of these workarounds is feasible, it may help. I'll let you know if I learn any more about it.

Thanks!
Dale


Thanks for submitting the bug report. I would have done it myself, but I have never been able to successfully log into the Intel Premier Support site to get to the bug submission form (been trying for years!). I've mentioned this sort of thing here on the forum in the past, but I don't know if anybody ever filed a bug report for it - I suspect not.

I can confirm that changing the iterator x from unsigned int to int gets it to vectorize. Thanks for that hint. It is, however, entirely unclear why this would affect the loop vectorization, and even more unclear why the vector dependence being reported by the vectorizer is related to the xx variable.

However, your other suggestion, about breaking xx++ to a separate statement, doesn't work for me. Changing the loop to:

for (x = 0, xx = 0.0f; x < LocalDataC; x++)
{
    LocalDataV[x] -= Amplitude * sinf (Frequency * xx + OffsetX) + OffsetY;
    xx++;
}

still doesn't vectorize for me (with or without #pragma ivdep). So, the issue seems to be specifically related to the iterator being unsigned.

I hope this gets fixed soon. I'll try changing a few of the other critical loops in my code in this way and see if it helps to get them to vectorize again.
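
The signed-counter workaround confirmed above can be sketched as follows. This is only an illustration: fill_wave and its parameter names are hypothetical and not part of the original test case, and whether any particular ICC release vectorizes this loop is an assumption based on the reports in this thread, not something verified here.

```cpp
#include <math.h>

// Hypothetical helper (name and signature are assumptions for this sketch).
// Same computation as the test case, but the loop counter is a signed int.
void fill_wave(float *data, int count, float amplitude, float frequency,
               float offset_x, float offset_y)
{
    float xx = 0.0f;
    for (int x = 0; x < count; x++)   // 'int' instead of 'unsigned int'
        data[x] -= amplitude * sinf(frequency * xx++ + offset_x) + offset_y;
}
```

Calling fill_wave(LocalDataV, 100, Amplitude, Frequency, OffsetX, OffsetY) on an initialised array performs the same updates as the original loop; only the type of the counter differs.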
Dale_S_Intel
Employee
Quoting - gordan
. . .
However, your other suggestion, about breaking xx++ to a separate statement, doesn't work for me. Changing the loop to:

for (x = 0, xx = 0.0f; x < LocalDataC; x++)
{
    LocalDataV[x] -= Amplitude * sinf (Frequency * xx + OffsetX) + OffsetY;
    xx++;
}

still doesn't vectorize for me (with or without #pragma ivdep). So, the issue seems to be specifically related to the iterator being unsigned.


Curse those curly braces, I've been spending too much time daydreaming in Python :-). Moving the post-increment actually doesn't change anything; I had a typo on that one. Anyway, I'll let you know what I find about the unsigned change.

Thanks!

Dale

gordan
Beginner

Quoting - Dale_S_Intel
Curse those curly braces, I've been spending too much time daydreaming in Python :-). Moving the post-increment actually doesn't change anything; I had a typo on that one. Anyway, I'll let you know what I find about the unsigned change.

Thanks!

Dale


Thanks for the cross-check. :-)

The unsigned issue is potentially a problem in terms of code correctness. It might not make a difference purely in terms of the iterator, but there are a lot of cases where the iterator is checked against a limit that is by design an unsigned int (e.g. an array index). A comparison between different types (even if they differ only in signedness) is at best bad practice, and would fail coding standards such as MISRA. All of this can, of course, be worked around with explicit casts at initialisation and in the comparison, but that adds overhead and makes things slower. And since it worked correctly in all 9.x versions, it's arguably a regression.

Looking forward to hearing when/if the fix might be available.
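
One way the explicit-cast workaround mentioned above might look is sketched below, assuming the limit stays unsigned in the interface and is converted once before the loop. fill_wave_unsigned is a hypothetical name, the range check is an added assumption, and nothing here is confirmed to change what a given ICC version reports.

```cpp
#include <limits.h>
#include <math.h>

// Hypothetical helper (an assumption for this sketch): the element count
// stays unsigned in the interface and is converted to int exactly once.
bool fill_wave_unsigned(float *data, unsigned int count, float amplitude,
                        float frequency, float offset_x, float offset_y)
{
    if (count > static_cast<unsigned int>(INT_MAX))
        return false;                        // count is not representable as int

    const int n = static_cast<int>(count);   // single explicit conversion
    float xx = 0.0f;
    for (int x = 0; x < n; x++)              // signed-vs-signed comparison
        data[x] -= amplitude * sinf(frequency * xx++ + offset_x) + offset_y;
    return true;
}
```

The comparison inside the loop is then between two signed values, so no mixed-signedness comparison remains; the cost is the extra range check and conversion before the loop.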
Dale_S_Intel
Employee
Quoting - gordan

Thanks for the cross-check. :-)

The unsigned issue is potentially a problem in terms of code correctness. It might not make a difference purely in terms of the iterator, but there are a lot of cases where the iterator is checked against a limit that is by design an unsigned int (e.g. an array index). A comparison between different types (even if they differ only in signedness) is at best bad practice, and would fail coding standards such as MISRA. All of this can, of course, be worked around with explicit casts at initialisation and in the comparison, but that adds overhead and makes things slower. And since it worked correctly in all 9.x versions, it's arguably a regression.

Looking forward to hearing when/if the fix might be available.

You're right, it is a regression and coding standards are good things to have. Our developers have identified a fix and it will be in a future version. I'll post here when the fix is released.

Thanks!

Dale
gordan
Beginner

Quoting - Dale_S_Intel
You're right, it is a regression and coding standards are good things to have. Our developers have identified a fix and it will be in a future version. I'll post here when the fix is released.

Thanks!

Dale

Awesome, thanks. :-)