Hi there!
In my code there's an if statement that fetches an item from an array like this:
--------------------------------
if (j > i) then
   k = i + (j-1)*j/2
else
   k = j + (i-1)*i/2
end if
a = sqrt(3.)*(2.*rnd_cache(k) - 1.)
--------------------------------
I have ensured that k is always within the bounds of the array, so this cannot be an out-of-bounds problem. In fact, this part of the code works fine if I compile the entire code without optimization. However, if I turn on the -fast flag I get different results in the end. That's why I tried to check the values of a. The elements of rnd_cache are real numbers between 0 and 1, so I added the following lines immediately after the code above:
--------------------------------
if (a < -sqrt(3.) .or. a > sqrt(3.)) then
   write(*,*) 'error'
   call exit()
end if
--------------------------------
And now the code suddenly works fine with the -fast flag, even though these additional lines should be entirely redundant! Hence I suspect that something else is going wrong, and I don't want to rely on compiled code with such strange behavior. Do you have any idea what could be going wrong here?
Thanks a lot!
mgaro
1 Solution
>>if I turn on the -fast flag I get different results in the end.
Are these different results within the margin of error for the calculation?
In other words, in the unoptimized build the compiler advances the loop control variables (I assume j and i) in the order as written. In the optimized version, the compiler may observe the if (j>i) and reorder the code such that all (j>i) iterations are performed together, followed by all .NOT.(j>i) iterations. Then, if a = sqrt(3.)*(2.*rnd_cache(k)-1.) is accumulated somewhere in your code, you would expect a round-off error accumulation different from that of the unoptimized version.
Jim Dempsey
11 Replies
This sounds like an optimization bug. Could you attach a complete version of your program or a smaller program that reproduces the problem?
Thanks,
Annalee
I was just working on a somewhat similar situation, where icc 12.x became unreliable when optimizing if(). In my experiments with both C++ and Fortran, min() and max() work better with both icc and gcc:
k = min(i,j) + (max(i,j) -1) * max(i,j) / 2
but back in earlier icpc versions I submitted a bug report on min() which has since been fixed; a similar bug later came up in g++ 4.7 and has also been fixed, so it now gets correct, fully optimized results.
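For what it's worth, here is a small self-contained sketch (loop bounds chosen arbitrarily) confirming that the min()/max() form produces the same index as the original if/else:
--------------------------------
program check_index
  implicit none
  integer :: i, j, k_if, k_mm
  do i = 1, 100
     do j = 1, 100
        ! original branching form
        if (j > i) then
           k_if = i + (j-1)*j/2
        else
           k_if = j + (i-1)*i/2
        end if
        ! branch-free min/max form
        k_mm = min(i,j) + (max(i,j) - 1)*max(i,j)/2
        if (k_if /= k_mm) write(*,*) 'mismatch at', i, j
     end do
  end do
  write(*,*) 'done'
end program check_index
--------------------------------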
Thanks for the quick replies! I'm afraid the entire program is far too big to post here. I tried replacing the if() with the min()/max() functions, but I get the same behavior again: without optimization it works; with the -fast flag it doesn't. For the time being, I am resorting to the unoptimized program.
mgaro,
We would like to be able to fix this bug. Would it be possible to send us your code through premier support?
Thanks,
Annalee
If you wish, I can send the code to you. Before that, however, I'd like to make sure that no other bug in my code is responsible for the odd behavior. I heard from colleagues that such a strong numerical dependence on optimization may hint at a yet-unspotted bug somewhere else, one which just by chance does not show up in the unoptimized program but may appear there as well for different parameter sets. I will let you know in time. Best, mgaro
>>if I turn on the -fast flag I get different results in the end.
Are these different results within the margin of error for the calculation?
In other words, in the unoptimized build the compiler advances the loop control variables (I assume j and i) in the order as written. In the optimized version, the compiler may observe the if (j>i) and reorder the code such that all (j>i) iterations are performed together, followed by all .NOT.(j>i) iterations. Then, if a = sqrt(3.)*(2.*rnd_cache(k)-1.) is accumulated somewhere in your code, you would expect a round-off error accumulation different from that of the unoptimized version.
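As a minimal illustration of how accumulation order alone can shift a single-precision result (the values below are arbitrary, chosen only to span a wide range of magnitudes):
--------------------------------
program sum_order
  implicit none
  integer, parameter :: n = 100000
  integer :: i
  real :: x(n), s_fwd, s_rev

  ! fill with values of widely varying magnitude
  do i = 1, n
     x(i) = 1.0/real(i)
  end do

  ! accumulate large-to-small
  s_fwd = 0.0
  do i = 1, n
     s_fwd = s_fwd + x(i)
  end do

  ! accumulate small-to-large
  s_rev = 0.0
  do i = n, 1, -1
     s_rev = s_rev + x(i)
  end do

  ! the two sums differ in the low-order bits
  write(*,*) s_fwd, s_rev, s_fwd - s_rev
end program sum_order
--------------------------------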
Jim Dempsey
Hi again. After extensively browsing the parameter space with and without optimization, it seems most likely that the wrong results are caused by a bug in the program which I am still trying to find. It is very strange, though, that the unoptimized version seems more robust against this bug: the (supposedly) wrong results (which are clearly not within the error margins) occur more often in the optimized version. I wonder, then, how the program can yield sensible results at all in the unoptimized version.
Thanks for all your help! I will tell you what was going wrong once I find it.
Optimization, particularly interprocedural optimization, often exposes additional opportunities for out-of-bounds stores or uninitialized data to break the program.
>>Are these different results within the margin of error for the calculation?
It does indeed seem to be something along these lines. The round-off errors accumulate, and this rather tiny accumulation is amplified by a threshold query somewhere else in the program. I found that the deviating result is actually a possible outcome of the theory I'm simulating, even though it did not seem sensible to me at all and only occurs for certain parameter sets. So I consider this 'problem' solved. I guess I should read a bit more about optimization. Many thanks to you all!
>>The round-off errors accumulate and this rather tiny accumulation is amplified by a threshold query somewhere else in the program
If accuracy is more important than performance, then sort the values to be added in ascending order and accumulate from smaller to larger. This strategy works well when the magnitudes of the numbers vary widely. In lieu of a sort, you could use two (or more) accumulators: one for numbers less than a threshold and the other(s) for numbers above the threshold(s). Then, after the loop, accumulate the accumulators.
There are other techniques you can employ when your numbers have a large number of significant digits (bits) of precision and your algorithm is diddling with the least significant digits (bits).
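A sketch of the two-accumulator idea (the threshold 1.0e-3 and the sample values are arbitrary, for illustration only):
--------------------------------
program two_bins
  implicit none
  integer, parameter :: n = 100000
  integer :: i
  real :: x(n), s_small, s_large

  do i = 1, n
     x(i) = 1.0/real(i)          ! magnitudes spanning several decades
  end do

  ! sum small and large magnitudes separately
  s_small = 0.0
  s_large = 0.0
  do i = 1, n
     if (abs(x(i)) < 1.0e-3) then
        s_small = s_small + x(i)
     else
        s_large = s_large + x(i)
     end if
  end do

  ! combine the accumulators post loop
  write(*,*) s_small + s_large
end program two_bins
--------------------------------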
Jim Dempsey
For example, Kahan's algorithm for extra-precision addition may work as well as sorting by ascending magnitude. You will need -assume protect_parens or -fp-model source for these methods to work. Of course, you would have tried those options already when observing symptoms like the ones described.
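As a sketch (the array contents are arbitrary illustration values), Kahan's compensated summation looks like this in Fortran; note that the compensation step c = (t - s) - y is algebraically zero and will be optimized away unless parentheses are honored, hence the need for -assume protect_parens or -fp-model source:
--------------------------------
program kahan_sum
  implicit none
  integer, parameter :: n = 100000
  integer :: i
  real :: x(n), s, c, y, t

  do i = 1, n
     x(i) = 1.0/real(i)
  end do

  s = 0.0
  c = 0.0
  do i = 1, n
     y = x(i) - c        ! apply the running compensation to the input
     t = s + y           ! big + small: the low-order bits of y may be lost
     c = (t - s) - y     ! recover what was lost (algebraically zero)
     s = t
  end do
  write(*,*) s
end program kahan_sum
--------------------------------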