Hi there!
In my code there's an if statement that fetches an item from an array like this:
--------------------------------
if (j > i) then
   k = i + (j-1)*j/2
else
   k = j + (i-1)*i/2
end if
a = sqrt(3.)*(2.*rnd_cache(k) - 1.)
--------------------------------
I have ensured that k is always within the bounds of the array, so this cannot be an out-of-bounds problem. In fact, this part of the code works fine if I compile the entire code without optimization. However, if I turn on the -fast flag I get different results in the end. That's why I tried to check the values of a. The elements of rnd_cache are real numbers between 0 and 1, so I added the following lines immediately after the code above:
--------------------------------
if (a < -sqrt(3.) .or. a > sqrt(3.)) then
   write(*,*) 'error'
   call exit()
end if
--------------------------------
And now the code suddenly works fine with the -fast flag, even though these additional lines should be entirely redundant! Hence I suspect that something else is going wrong, and I don't want to rely on compiled code with such strange behavior. Do you have any idea what could be going wrong here?
Thanks a lot!
mgaro
1 Solution
>>if I turn on the -fast flag I get different results in the end.
Are these different results within the margin of error for the calculation?
In other words, in the unoptimized build the compiler advances the loop control variables (I assume j and i) in the order as written. In the optimized version, the compiler may observe the if (j>i) and reorder the code such that all (j>i) iterations are performed together, followed by all .NOT.(j>i) iterations. Then, if a = sqrt(3.)*(2.*rnd_cache(k)-1.) is accumulated somewhere in your code, you would expect a round-off error accumulation different from that of the unoptimized version.
Jim Dempsey
11 Replies
This sounds like an optimization bug. Could you attach a complete version of your program or a smaller program that reproduces the problem?
Thanks,
Annalee
I was just working on a somewhat similar situation, where icc 12.x became unreliable when optimizing if(). In my experiments with both C++ and Fortran, min() and max() work better with both icc and gcc:
k = min(i,j) + (max(i,j) -1) * max(i,j) / 2
but back in earlier icpc versions I submitted a bug report on min() which has since been fixed; a similar bug later came up in g++ 4.7 and has also been fixed, so it now gets correct, fully optimized results.
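For what it's worth, here is a small self-contained sketch (loop bounds chosen arbitrarily) confirming that the min()/max() form produces the same index as the original if/else:
--------------------------------
program check_index
  implicit none
  integer :: i, j, k_if, k_mm
  do i = 1, 100
     do j = 1, 100
        ! original branching form
        if (j > i) then
           k_if = i + (j-1)*j/2
        else
           k_if = j + (i-1)*i/2
        end if
        ! branch-free min/max form
        k_mm = min(i,j) + (max(i,j) - 1)*max(i,j)/2
        if (k_if /= k_mm) write(*,*) 'mismatch at', i, j
     end do
  end do
  write(*,*) 'done'
end program check_index
--------------------------------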
Thanks for the quick replies! I'm afraid the entire program is far too big to post here. I tried replacing the if() with the min()/max() functions, but I get the same behavior again: without optimization it works; with the -fast flag it doesn't. For the time being, I am resorting to the unoptimized program.
mgaro,
We would like to be able to fix this bug. Would it be possible to send us your code through premier support?
Thanks,
Annalee
If you wish, I can send the code to you. Before that, however, I'd like to make sure that no other bug in my code is responsible for the odd behavior. I heard from colleagues that such a strong numerical dependence on optimization may hint at a yet-unspotted bug somewhere else, one which just by chance does not show up in the unoptimized program but may appear there as well for different parameter sets. I will let you know in time. Best, mgaro
>>if I turn on the -fast flag I get different results in the end.
Are these different results within the margin of error for the calculation?
In other words, in the unoptimized build the compiler advances the loop control variables (I assume j and i) in the order as written. In the optimized version, the compiler may observe the if (j>i) and reorder the code such that all (j>i) iterations are performed together, followed by all .NOT.(j>i) iterations. Then, if a = sqrt(3.)*(2.*rnd_cache(k)-1.) is accumulated somewhere in your code, you would expect a round-off error accumulation different from that of the unoptimized version.
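As a minimal illustration of how accumulation order alone can shift a single-precision result (the values below are arbitrary, chosen only to span a wide range of magnitudes):
--------------------------------
program sum_order
  implicit none
  integer, parameter :: n = 100000
  integer :: i
  real :: x(n), s_fwd, s_rev

  ! fill with values of widely varying magnitude
  do i = 1, n
     x(i) = 1.0/real(i)
  end do

  ! accumulate large-to-small
  s_fwd = 0.0
  do i = 1, n
     s_fwd = s_fwd + x(i)
  end do

  ! accumulate small-to-large
  s_rev = 0.0
  do i = n, 1, -1
     s_rev = s_rev + x(i)
  end do

  ! the two sums differ in the low-order bits
  write(*,*) s_fwd, s_rev, s_fwd - s_rev
end program sum_order
--------------------------------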
Jim Dempsey
Hi again. After extensively browsing the parameter space with and without optimization, it seems most likely that the wrong results are caused by a bug in the program which I am still trying to find. It is very strange, though, that the unoptimized version seems more robust against this bug: the (supposedly) wrong results (which are clearly not within the error margins) occur more often in the optimized version. I wonder, then, how the program can yield sensible results at all in the unoptimized version.
Thanks for all your help! I will tell you what was going wrong once I find it.
Optimization, particularly interprocedural optimization, often exposes additional opportunities for out-of-bounds stores or uninitialized data to break the program.
>>Are these different results within the margin of error for the calculation?
It does indeed seem to be something along these lines. The round-off errors accumulate, and this rather tiny accumulation is amplified by a threshold query somewhere else in the program. I found that the deviating result is actually a possible outcome of the theory I'm simulating, even though it did not seem sensible to me at all and only occurs for certain parameter sets. So I consider this 'problem' solved. I guess I should read a bit more about optimization. Many thanks to you all!
>>The round-off errors accumulate and this rather tiny accumulation is amplified by a threshold query somewhere else in the program
If accuracy is more important than performance, then sort the values to be added in ascending order and accumulate from smaller to larger. This strategy works well when the magnitudes of the numbers vary widely. In lieu of a sort, you could use two (or more) accumulators: one for numbers less than a threshold and the other(s) for numbers above the threshold(s). Then, after the loop, accumulate the accumulators.
There are other techniques you can employ when your numbers have a large number of significant digits (bits) of precision and your algorithm is diddling with the least significant digits (bits).
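A sketch of the two-accumulator idea (the threshold 1.0e-3 and the sample values are arbitrary, for illustration only):
--------------------------------
program two_bins
  implicit none
  integer, parameter :: n = 100000
  integer :: i
  real :: x(n), s_small, s_large

  do i = 1, n
     x(i) = 1.0/real(i)          ! magnitudes spanning several decades
  end do

  ! sum small and large magnitudes separately
  s_small = 0.0
  s_large = 0.0
  do i = 1, n
     if (abs(x(i)) < 1.0e-3) then
        s_small = s_small + x(i)
     else
        s_large = s_large + x(i)
     end if
  end do

  ! combine the accumulators post loop
  write(*,*) s_small + s_large
end program two_bins
--------------------------------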
Jim Dempsey
For example, Kahan's algorithm for extra-precision addition may work as well as sorting by ascending magnitude. You will need -assume protect_parens or -fp-model source for these methods to work. Of course, you would have tried those options already when observing symptoms like the ones described.
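As a sketch (the array contents are arbitrary illustration values), Kahan's compensated summation looks like this in Fortran; note that the compensation step c = (t - s) - y is algebraically zero and will be optimized away unless parentheses are honored, hence the need for -assume protect_parens or -fp-model source:
--------------------------------
program kahan_sum
  implicit none
  integer, parameter :: n = 100000
  integer :: i
  real :: x(n), s, c, y, t

  do i = 1, n
     x(i) = 1.0/real(i)
  end do

  s = 0.0
  c = 0.0
  do i = 1, n
     y = x(i) - c        ! apply the running compensation to the input
     t = s + y           ! big + small: the low-order bits of y may be lost
     c = (t - s) - y     ! recover what was lost (algebraically zero)
     s = t
  end do
  write(*,*) s
end program kahan_sum
--------------------------------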