Intel® Fortran Compiler

Optimization issue / fetching values from arrays

mgaro
Beginner
Hi there!
In my code there's an if-statement which fetches an item from an array like this,
--------------------------------
if (j>i) then
  k=i+(j-1)*j/2
else
  k=j+(i-1)*i/2
end if
a=sqrt(3.)*(2.*rnd_cache(k)-1.)
--------------------------------
I have ensured that k is always within the bounds of the array, so this cannot pose any problem. In fact, this part of the code works fine if I compile the entire code without optimization. However, if I turn on the -fast flag, I get different results in the end. That's why I tried to check the values of a. The elements of rnd_cache are real numbers between 0 and 1, so a must lie between -sqrt(3.) and sqrt(3.). I therefore added the following lines immediately after the code above:
--------------------------------
if (a<-sqrt(3.).or.a>sqrt(3.)) then
  write(*,*) 'error'
  call exit()
end if
--------------------------------
And now the code suddenly works fine with the -fast flag, even though these additional lines should be entirely redundant! Hence I suspect that something else is going wrong, and I don't want to rely on compiled code with such strange behavior. Do you have any idea what could be going wrong here?
Thanks a lot!
mgaro
1 Solution
jimdempseyatthecove
Honored Contributor III
>>if i turn on the -fast flag I get different results in the end.

Are these different results within the margin of error for the calculation?

In other words: in the unoptimized version, the compiler advances the loop control variables (I assume j and i) in the order written. In the optimized version, the compiler may observe the if (j>i) and reorder the code so that all iterations with (j>i) are performed together, followed by all iterations with .NOT.(j>i). If a=sqrt(3.)*(2.*rnd_cache(k)-1.) is then accumulated in your code, you would expect the accumulated round-off error to differ from that of the unoptimized version.

Jim Dempsey
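
The effect described above can be reproduced in isolation. The following is a minimal sketch, not taken from the original program: the array x and its contents are made up for illustration. It adds the same single-precision values in two different orders; because floating-point addition is not associative, the two totals typically differ in their last digits.

```fortran
program sum_order
  implicit none
  integer, parameter :: n = 100000   ! hypothetical problem size
  real :: x(n), s_fwd, s_rev
  integer :: i

  ! Hypothetical data: terms whose magnitudes vary widely.
  do i = 1, n
    x(i) = 1.0 / real(i)**2
  end do

  ! Accumulate from the first element to the last.
  s_fwd = 0.0
  do i = 1, n
    s_fwd = s_fwd + x(i)
  end do

  ! Accumulate the same values from the last element to the first.
  s_rev = 0.0
  do i = n, 1, -1
    s_rev = s_rev + x(i)
  end do

  write(*,*) 'forward  sum:', s_fwd
  write(*,*) 'backward sum:', s_rev
  write(*,*) 'difference  :', s_fwd - s_rev
end program sum_order
```

Comparing the output of an unoptimized and an optimized build of such a reduced case is one way to judge whether a discrepancy is plausibly a reordering effect rather than a compiler bug.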

11 Replies
Anonymous66
Valued Contributor I

This sounds like an optimization bug. Could you attach a complete version of your program or a smaller program that reproduces the problem?

Thanks,
Annalee

TimP
Honored Contributor III
I was just working on a somewhat similar situation, where icc 12.x became unreliable when optimizing if(). In my experiments with C++ and Fortran, max() and min() work better with both icc and gcc:
k = min(i,j) + (max(i,j) -1) * max(i,j) / 2

Back in earlier icpc versions I submitted a bug report on min(), which has since been fixed. A similar bug then came up in g++ 4.7 and has also been fixed, so it now gives correct results at full optimization.
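
To confirm that the min()/max() form is a drop-in replacement for the if-block from the original post, a small exhaustive check can help. This is a sketch only: the dimension n and the variable names k_if, k_mm and mismatches are hypothetical. Both forms compute the same index into packed symmetric storage, so the mismatch count should be 0.

```fortran
program packed_index_check
  implicit none
  integer, parameter :: n = 50   ! hypothetical matrix dimension
  integer :: i, j, k_if, k_mm, mismatches

  mismatches = 0
  do j = 1, n
    do i = 1, n
      ! Branching form from the original post
      if (j > i) then
        k_if = i + (j-1)*j/2
      else
        k_if = j + (i-1)*i/2
      end if
      ! Branch-free form using min() and max()
      k_mm = min(i,j) + (max(i,j)-1)*max(i,j)/2
      if (k_if /= k_mm) mismatches = mismatches + 1
    end do
  end do

  write(*,*) 'mismatches   :', mismatches
  ! The largest index occurs at i = j = n and equals n*(n+1)/2.
  write(*,*) 'largest index:', n + (n-1)*n/2
end program packed_index_check
```

The largest index also gives the minimum size the packed array needs for an n-by-n symmetric matrix.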
mgaro
Beginner
Thanks for the quick replies! I'm afraid the entire program is far too big to post here. I tried replacing the if() with the min() and max() functions, but I get the same behavior again: without optimization it works, with the -fast flag it doesn't. For the time being, I am resorting to the unoptimized program.
Anonymous66
Valued Contributor I
mgaro,

We would like to be able to fix this bug. Would it be possible to send us your code through premier support?

Thanks,
Annalee
mgaro
Beginner
If you wish, I can send the code to you. First, however, I'd like to make sure that no other bug in my code is responsible for the odd behavior. I heard from colleagues that such a strong numerical dependence on optimization may hint at an as yet unspotted bug somewhere else, which just by chance does not show up in the unoptimized program but may appear there as well for different parameter sets. I will let you know in time. Best, mgaro
mgaro
Beginner
Hi again. After extensively browsing the parameter space with and without optimization, I think the wrong results are most likely caused by a bug in the program, which I am still trying to find. It is very weird, though, that the unoptimized version seems to be more robust against this bug: the (supposedly) wrong results, which are clearly not within the error margins, occur more often in the optimized version. I wonder, then, how the program can yield sensible results in the unoptimized version.
Thanks for all your help! I will tell you what went wrong once I have found it.
TimP
Honored Contributor III
Optimization, particularly interprocedural optimization, often exposes additional opportunities for out-of-bounds stores or uninitialized data to break the program.
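
One way to rule out an out-of-bounds access while hunting for such a bug is an explicit check in front of the array reference. The sketch below assumes rnd_cache is a rank-1 real array holding a packed symmetric matrix, as the original post suggests; the program name, the size n and the random fill are hypothetical. Compilers can also insert such checks automatically (for ifort, the -check bounds option).

```fortran
program bounds_guard
  implicit none
  integer, parameter :: n = 4          ! hypothetical matrix dimension
  real :: rnd_cache(n*(n+1)/2)         ! packed storage, assumed layout
  real :: a
  integer :: i, j, k

  call random_number(rnd_cache)        ! fill with values in [0,1)

  do j = 1, n
    do i = 1, n
      k = min(i,j) + (max(i,j)-1)*max(i,j)/2
      ! Stop with a diagnostic instead of silently reading past the array.
      if (k < lbound(rnd_cache,1) .or. k > ubound(rnd_cache,1)) then
        write(*,*) 'index out of bounds: i, j, k =', i, j, k
        stop 1
      end if
      a = sqrt(3.)*(2.*rnd_cache(k) - 1.)
    end do
  end do

  write(*,*) 'all indices within bounds; last a =', a
end program bounds_guard
```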
mgaro
Beginner
>>Are these different results within the margin of error for the calculation?

It does indeed seem to be something along these lines. The round off errors accumulate and this rather tiny accumulation is amplified by a threshold query somewhere else in the program. I found that this deviating result is actually a possible outcome of the theory I'm simulating, even though it did not seem sensible to me at all, and it only occurs for certain parameter sets. So I consider this 'problem' solved. I guess I should read a bit more about optimization. Many thanks to you all!

jimdempseyatthecove
Honored Contributor III
>>The round off errors accumulate and this rather tiny accumulation is amplified by a threshold query somewhere else in the program

If accuracy is more important than performance, sort the values to be added in ascending order of magnitude, then accumulate from smallest to largest. This strategy works well when the magnitudes of the numbers vary widely. Instead of sorting, you could use two (or more) accumulators: one for numbers below a threshold and the other(s) for numbers above the threshold(s). Then, after the loop, add the accumulators together.

There are other techniques you can employ when your numbers have a large number of significant digits (bits) of precision and your algorithm is diddling with the least significant digits (bits).

Jim Dempsey
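
The two-accumulator variant described above could look roughly like this. It is a sketch under stated assumptions: the data in x, the cut-off value threshold and all variable names are made up for illustration, and a suitable threshold depends on the actual data.

```fortran
program split_accumulate
  implicit none
  integer, parameter :: n = 100000        ! hypothetical problem size
  real, parameter :: threshold = 1.0e-4   ! hypothetical cut-off
  real :: x(n), s_naive, s_small, s_large, s_split
  integer :: i

  ! Hypothetical data: terms whose magnitudes vary widely.
  do i = 1, n
    x(i) = 1.0 / real(i)**2
  end do

  s_naive = 0.0
  s_small = 0.0
  s_large = 0.0
  do i = 1, n
    s_naive = s_naive + x(i)              ! single accumulator, for comparison
    if (abs(x(i)) < threshold) then
      s_small = s_small + x(i)            ! small terms are added to each other
    else
      s_large = s_large + x(i)            ! large terms are kept separate
    end if
  end do

  ! Combine the accumulators after the loop.
  s_split = s_small + s_large

  write(*,*) 'single accumulator:', s_naive
  write(*,*) 'two accumulators  :', s_split
end program split_accumulate
```

Keeping the small terms in their own accumulator lets them build up to a magnitude that is no longer rounded away when they are finally added to the large partial sum.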
TimP
Honored Contributor III
For example, Kahan's algorithm for extra-precision addition may work as well as sorting in ascending order of magnitude. You will need -assume protect_parens or -fp-model source for these methods to work. Of course, you would already have tried those options when observing the symptoms described.
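
A minimal sketch of Kahan (compensated) summation follows; the data in x and the variable names are hypothetical. The correction term c is algebraically zero, so an optimizer that is allowed to reassociate floating-point expressions may simplify it away, which is why the parentheses and the options mentioned above matter.

```fortran
program kahan_demo
  implicit none
  integer, parameter :: n = 100000   ! hypothetical problem size
  real :: x(n), s, c, y, t
  integer :: i

  ! Hypothetical data: terms whose magnitudes vary widely.
  do i = 1, n
    x(i) = 1.0 / real(i)**2
  end do

  s = 0.0
  c = 0.0              ! running compensation for lost low-order bits
  do i = 1, n
    y = x(i) - c       ! apply the correction from the previous step
    t = s + y          ! low-order bits of y may be lost in this addition
    c = (t - s) - y    ! recover the part of y that was lost
    s = t
  end do

  write(*,*) 'Kahan sum:', s
end program kahan_demo
```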