Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Array operations fail when Intel Processor Extensions are enabled

onkelhotte
New Contributor II
890 Views

Hi there,
I have a problem with mathematical operations performed on a whole array when I enable the Intel Processor Extensions (XP SP2, IVF 10.0.027 with VS2003).

In my program I change array values not within a do loop but in one line:

real*4 array(100)
!... reading 100 values from file and save them in array
array=array/1000.

When I don't use the Intel Processor Extensions as an optimization, everything works fine. But if I do, the value differs in the real*8 "section". The debugger won't work with the optimizations enabled, so I wrote the data to a file.

Code:
array(1)=6500.00
write(19,'(a,f27.16)') 'before: ',array(1)
array=array/1000.
write(19,'(a,f27.16)') 'after: ',array(1)

Textfile:
before: 6500.0000000000000000
after: 6.5000004768371582

I recognized that there was a difference when I compared another value with array(1); they should be equal, but they weren't, although they seemed to be (the real*4 value of array(1) is 6.5000000, as is the real*4 value I compared it with).
When I disable the optimizations, the textfile shows
after: 6.5000000000000000

My question is: is this some sort of compiler bug, or are operations like array = array / 1000. bad programming style? (I hope not, I use them a lot.)

Thanks in advance,
Markus

9 Replies
TimP
Honored Contributor III
If, using a 32-bit compiler, you switched from backward-compatible x87 code to SSE code (e.g. by adding an option such as /QxW), you removed the automatic promotion of intermediate calculations to double precision. If your application depends on double precision, you should write it explicitly, e.g.
write(19,'(a,f27.16)') 'before: ',array(1)
write(19,'(a,f27.16)') 'after: ',array(1)*0.001d0
Inequality of similar-looking expressions is quite likely when you compare an expression that underwent implicit promotion to double with one that did not, and there are many such situations in x87 code.

Steven_L_Intel1
Employee
This is not a bug.

1. You are displaying a REAL(4) value to many more digits than is appropriate for its precision. You get seven correct decimal digits, which is what I'd expect from REAL(4).
2. When you enable the processor extensions, the array operation is vectorized and is done with SSE instructions. These operate in declared precision (REAL(4) here) rather than double or extended precision that you would get for arithmetic operations without the SSE instructions. While this extended precision can give you "better" results, it is also inconsistent, depending on when the compiler chooses to round an intermediate result to declared precision.

There's nothing wrong with the array operation you did. What is wrong is assuming that you will get more precision out of single-precision than is warranted. Perhaps you want to use double-precision instead.
jimdempseyatthecove
Honored Contributor III

>> This is not a bug

Perhaps, perhaps not.

6500 is expressed exactly internally in floating point.
1000 is expressed exactly internally in floating point.
1/1000 is an infinitely repeating (approximate) binary fraction.

If the computation performed 6500/1000, the result should be exact.
If the computation performed 6500 * (1/1000), the result will be approximate.

The optimized code might be using multiplication by the inverse of 1000 as opposed to division by 1000.

This is not a bug provided that the rules of expressions are known.

Jim Dempsey

Steven_L_Intel1
Employee
True - the compiler will generally do division as multiplication by an inverse unless told not to. You could use /fp:precise or /Qprec-div- to disable this optimization.
onkelhotte
New Contributor II

Thanks for your replies.

My description was not good, and I investigated my problem a little more yesterday and this morning.

The problem is that a comparison is true which should be false. I read certain data from a file. After that I convert them to European SI units; in my case I have to divide the real*4 array endOfZone (endOfZone=endOfZone/1000.).

Before the calculation starts, I check whether the user has changed the data in a way he should not. One comparison is

if (overallLength.le.endOfZone(4)) then
! Error Message

overallLength is real*4. endOfZone(4) has to be less than or equal to overallLength (as the if clause says). In my case overallLength is 11.50000 and endOfZone(4) is 11.50000, so the if clause should be false.

But the if clause is true when I
1) use Intel Processor Extensions (/QaxK) AND
2) divide endOfZone as a whole array (endOfZone=endOfZone/1000.)

The if clause is false when I
1) don't use /QaxK (performance loss 80%) OR
2) use /fp:precise (performance loss 90%!!!) OR
3) use /Qprec-div- (no performance loss) OR
4) divide endOfZone in a do loop, endOfZone(i)=endOfZone(i)/1000.

This is still strange to me. Why is the if clause true when I divide the whole array? I did not mention in my prior post that I use the 32-bit compiler.

Markus

Steven_L_Intel1
Employee
Going by the previous posts, the array element you're comparing against is slightly larger than 11.5 and therefore the comparison is true. Print out the value to more digits or print the hex value (using Z8.8) to see. As we have seen, optimizations can change results slightly.
onkelhotte
New Contributor II

Here is my code:

write(19,'(f19.12)') endOfZone(4)
endOfZone=endOfZone/1000.
write(19,'(f19.12)') endOfZone(4)
write(19,'(z8.8)') endOfZone(4)

data in fort.19 (it's the same with /Qprec-div- turned on or off)
11500.000000000000
11.500000953674
41380001

But with /Qprec-div- turned on, my if clause is false, as it should be.

So this is not a compiler bug, but something you have to get used to when you use compiler optimizations?

Markus

TimP
Honored Contributor III
As you have demonstrated, division of an array by a scalar is greatly accelerated by inverting once, to change the array operation to multiplication. Thus, Intel and gnu compilers, and others, make such an optimization available. With promotion to double precision, you would still get a result faster than with repeated division. Several early machines with support for vector operations supported only this method, so it is a traditional transformation, going back at least 30 years. Only IA64, to my knowledge, supports invert and multiply with IEEE accurate results.
In this case, it seems trivial to write x*0.001 or x*0.001d0, maybe even anint(x*0.001) if that is the intent. Certain major applications have the inversions written into source wherever they have been tested successfully for performance and accuracy, so that options /Qprec-div /Qprec-sqrt should be set.
As I mentioned a few days ago, /fp:precise includes the effect of the switches /assume:protect_parens /Qprec-div /Qprec-sqrt /Qftz- as well as disabling some additional optimizations which have minor numerical effects.
jimdempseyatthecove
Honored Contributor III

The observation of results not appearing decimally correct also occurs without optimizations. This example happens to show a discrepancy between decimal math and binary math with XMM (SSE) floating point that does not appear with x87 floating point. You will find many cases where both are inconsistent.

The inconsistencies are due to truncation/rounding of infinitely repeating fractional values. Of particular interest: the fraction 1./10. requires an infinite number of bits in binary to hold the correct value.

When testing for limits you should consider including an epsilon that is at least one significant bit (e.g. the position of the 1 in 41380001).

Jim Dempsey
