I've got a very simple piece of code that gives very different answers depending on the -O level I use in icc. The code is this:
#include <stdio.h>
#include <math.h>

int main(int argc, char **argv)
{
    int i;
    float val = 0.f;
    for (i = 0; i < 100000000; i++)
        val += 1.f + sqrtf((float)i / 100.f);
    printf("val %f\n", val);
    return 0;
}
The difference occurs between icc -O1 and icc -O2; -O2 is wrong. Have I missed something in this example that is obvious to someone else? If not, then there is a problem with the compiler. As an aside, gcc handles this OK. I have the same wrong-answer problem if I replace sqrtf with lrintf (which is actually what I'm interested in using).
I'm using the compiler which came with composerxe-2011.1.107.
icc -v gives Version 12.0.0.
Thanks,
4 Replies
This may actually be some kind of overflow problem which is handled differently by the various optimization levels. A reformulation of the example gives more consistent results with only small differences:
#include <stdio.h>
#include <string.h>
#include <math.h>

#define NUM 100000

int main(int argc, char **argv)
{
    int i, j;
    float val[NUM];
    memset(val, 0, NUM * sizeof(float));
    for (j = 0; j < 10000; j++)
        for (i = 0; i < NUM; i++)
            val[i] += sqrtf((float)i / 1000.f);
    for (i = 0; i < NUM; i++)
        printf("val[%d] = %f\n", i, val[i]);
    return 0;
}
In the first case, you are asking for a sum reduction, and inviting the compiler to attempt to produce the result at compile time. -O1 would appear to instruct the compiler not to attempt auto-vectorization, which, if it were successful, might produce slightly more accurate results, which you might consider "wrong."
In the second case, you are inviting the compiler to figure out that your outer loop does the same thing on the final pass, so it can be reduced to a single inner loop, but you avoid questions of accuracy depending on the implementation (as long as you use SSE code throughout).
You might be better off to investigate one issue at a time, whether the issues of interest might be the compiler short-cutting dead code, the accuracy of float data type operations, or something else.
I would definitely recommend using -fp-model precise for this case.
-fp-model precise (or source) would instruct the compiler not to vectorize the sum reduction, as you found with -O1. This would avoid any increase in accuracy associated with batching the sums, rather than summing in sequential order.
