Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Possible optimization problem

techcowilf
Beginner
315 Views
I've got a very simple piece of code that gives very different answers depending on the -O level I use in icc. The code is this:

#include <stdio.h>
#include <math.h>

int main(int argc, char **argv)
{
int i;
float val=0.f;

for(i = 0; i < 100000000; i++)
val += 1.f+sqrtf((float)i/100.f);

printf("val %f\n", val);
return 0;
}

The difference occurs between icc -O1 and icc -O2; -O2 is wrong. Have I missed something in this example that is obvious to someone else? If not, then there is a problem with the compiler. As an aside, gcc handles this OK. I have the same wrong-answer problem if I replace sqrtf with lrintf (which is actually what I'm interested in using).

I'm using the compiler which came with composerxe-2011.1.107.

icc -v gives Version 12.0.0.

Thanks,
4 Replies
techcowilf
Beginner
This may actually be some kind of overflow problem which is handled differently by the various optimization levels. A reformulation of the example gives more consistent results with only small differences:

#include <stdio.h>
#include <string.h>
#include <math.h>

#define NUM 100000

int main(int argc, char **argv)
{
int i, j;
float val[NUM];

memset(val, 0, NUM*sizeof(float));

for(j = 0; j < 10000; j++)
for(i = 0; i < NUM; i++)
val[i] += sqrtf((float)i/1000.f);


for(i = 0; i < NUM; i++)
printf("val[%d] = %f\n", i, val[i]);
return 0;
}
TimP
Honored Contributor III
In the first case, you are asking for a sum reduction, and inviting the compiler to attempt to produce the result at compile time. -O1 would appear to instruct the compiler not to attempt auto-vectorization, which, if it were successful, might produce slightly more accurate results, which you might consider "wrong."
In the second case, you are inviting the compiler to figure out that your outer loop does the same thing on the final pass, so it can be reduced to a single inner loop, but you avoid questions of accuracy depending on the implementation (as long as you use SSE code throughout).
You might be better off investigating one issue at a time: whether the issue of interest is the compiler short-cutting dead code, the accuracy of float data type operations, or something else.
Brandon_H_Intel
Employee
I would definitely recommend using -fp-model precise for this case.
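For reference, the compile line would look something like the following (assuming the test program is saved as test.c; the flag spellings are those documented for this compiler generation):

```shell
# Keep -O2 optimization, but disable value-changing floating-point
# transformations such as vectorized sum reductions:
icc -O2 -fp-model precise test.c -o test

# -fp-model source behaves the same way for this purpose:
icc -O2 -fp-model source test.c -o test
```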
TimP
Honored Contributor III
-fp-model precise (or source) would instruct the compiler not to vectorize the sum reduction, as you found with -O1. This would avoid any increase in accuracy associated with batching the sums, rather than summing in sequential order.