Re: sum of large real array: ia32 vs em64t

David2 · 04-06-2007

So I have what to me is a mystery:

If I compile the attached code to take the average of a large real*4 array on an ia32 machine, I get an exact answer, but if I compile it on an em64t machine there is considerable roundoff error in the summation!

I can understand the cause of the problem, but why is it different on the two machines?

What is the best way to do this kind of operation? I could put the sum in a double, but what if I am summing doubles? Then that is not really a solution!

David

Steven_L_Intel1 · 04-06-2007

Hint - what happens on IA-32 when you compile with -xW? (Assuming Pentium 4 or later.)

The default on IA-32 is to use the X87 floating point registers and instructions, where single precision calculations are carried out in double precision. On Intel 64, the SSE2 instructions are (typically) used and these will use the declared precision.
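As a rough illustration of the difference described above, the Python sketch below emulates a single-precision accumulator by rounding to 32 bits after every addition, and compares it with an accumulator kept in double precision. It is not the attached Fortran program: the array length (100,000) and element value (10.0/3) are taken from the description later in this thread, the helper names are made up for the example, and double precision stands in here for the wider x87 register.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mean_f32_accumulator(vals):
    """Mean with the running sum rounded to single precision after each add
    (a model of SSE arithmetic on a real*4 accumulator)."""
    total = 0.0
    for v in vals:
        total = to_f32(total + v)
    return to_f32(total / len(vals))


def mean_f64_accumulator(vals):
    """Mean with the running sum kept in double precision and rounded to
    single precision only at the end (assumption: this approximates what
    happens when the sum lives in an x87 register)."""
    total = 0.0
    for v in vals:
        total += v
    return to_f32(total / len(vals))


N = 100_000                    # array length mentioned later in the thread
element = to_f32(10.0 / 3.0)   # each array element, stored as real*4
vals = [element] * N

print("stored element      :", repr(element))
print("mean, f32 accumulator:", repr(mean_f32_accumulator(vals)))
print("mean, f64 accumulator:", repr(mean_f64_accumulator(vals)))
```

In this model the wide accumulator returns exactly the stored element value, while the single-precision accumulator does not, which matches the behaviour reported for the two machines.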

David2 · 04-06-2007

Thanks Steve,

Is there also an option to make the em64t ifort compiler use the X87 floating point registers and instructions?

David

Steven_L_Intel1 · 04-06-2007

Wouldn't it be better just to use double precision (real(8)) in the source rather than relying on the compiler doing double precision behind your back? For an application such as this, I'd recommend that.

What switches are you using on EM64T now?

David2 · 04-06-2007

I agree, it is much better to be explicit about what precision is used. That was the cause of the confusion in the first place.

What are switches?
If you are referring to the OS, it is CentOS, kernel 2.6.9-42.0.10.ELsmp

Thanks again for your help!

David

jimdempseyatthecove · 04-06-2007

David,

10.0/3 is a repeating binary floating point value which is rounded to the precision used (real*4 here).

In your example, each element in the array "vals" held an imprecise value of 10.0/3.
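To make the size of that representation error concrete, here is a small Python sketch (the helper name to_f32 is hypothetical) that rounds 10.0/3 to single precision and reports, using exact rational arithmetic, what is actually stored and how far it is from 10/3.

```python
import struct
from fractions import Fraction


def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single value."""
    return struct.unpack("f", struct.pack("f", x))[0]


stored = to_f32(10.0 / 3.0)

# Fraction(float) converts the binary value exactly, with no further rounding.
exact_stored = Fraction(stored)
error = Fraction(10, 3) - exact_stored

print("stored value as a fraction:", exact_stored)  # 13981013/4194304
print("10/3 minus stored value   :", error)         # 1/12582912
print("error as a decimal        :", float(error))
```

The denominator 4194304 is 2**22, so the stored value is a 24-bit integer scaled by a power of two, and it falls short of 10/3 by about 8e-8.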

The fact that print x produced the exact value was more by chance than by design.

True, the summation loop likely kept X in an FPU register, which may have helped you accidentally produce the correct result. (The FPU uses 80 bits internally.)

Declaring x as real*8 will produce a closer result, and the optimized program will likely be faster too. (SSE2 computes real*8 values in 64 bits.)

Note that as you accumulate vals(i), you are accumulating inexact values, and these values also fill out the mantissa of the real*4 number. Therefore, each time X doubles in value, the exact sum of the inexact values requires one more bit of mantissa. The 100,000 entries in vals will require approximately 17 additional bits of mantissa to hold the exact sum of the inexact values. This should fit in the extra bits of the real*8.
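The bit count above can be checked with integer arithmetic. The Python sketch below is only an illustration under the thread's stated numbers (100,000 elements, each equal to 10.0/3 rounded to real*4); the variable names are invented for the example. It writes the stored element as an integer numerator over a power of two, so every partial sum is k times that numerator and its bit length shows how much mantissa is needed.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single value."""
    return struct.unpack("f", struct.pack("f", x))[0]


N = 100_000
F32_SIGNIFICAND_BITS = 24   # real*4, including the implicit leading bit
F64_SIGNIFICAND_BITS = 53   # real*8, including the implicit leading bit

element = to_f32(10.0 / 3.0)
# element == numerator / denominator exactly, with denominator a power of two
numerator, denominator = element.as_integer_ratio()

# Estimate: adding N equal values grows the sum by about log2(N) bits.
extra_bits_estimate = math.ceil(math.log2(N))

# Direct check: the largest partial sum has numerator N * numerator.
bits_for_exact_sum = (N * numerator).bit_length()

print("bits in one element      :", numerator.bit_length())
print("estimated extra bits     :", extra_bits_estimate)
print("bits for largest exact sum:", bits_for_exact_sum)
print("fits in real*4 mantissa? :", bits_for_exact_sum <= F32_SIGNIFICAND_BITS)
print("fits in real*8 mantissa? :", bits_for_exact_sum <= F64_SIGNIFICAND_BITS)
```

With these inputs the exact sum needs more bits than real*4 provides but fewer than real*8 provides, so a real*8 accumulator never has to round while summing this particular array.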

Jim Dempsey

Steven_L_Intel1 · 04-07-2007

By switches I meant compiler options, such as -xP and -O3. In other words, what is the ifort command line you are using?

Even so, as Jim points out, this program needs double precision to get a meaningful result.

jimdempseyatthecove · 04-07-2007

Both results are meaningful... it just depends on what you mean.

The accuracy of the results depends on the nature of the numbers being manipulated and on the methods and order of manipulation. Depending on the circumstances, both real(4) and real(8) will give exact results. In some situations real(8) will give an exact result when real(4) fails. And there are other situations where both real(4) and real(8) produce inexact results, but in such cases real(8) almost always gives a closer result.
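The three situations can be reproduced with a short Python sketch. This is an illustration only: single precision is emulated by rounding after every addition, the arrays are constant-valued so the exact sum is simply value times count, and the function names and the test values 0.5 and 0.1 are choices made for this example rather than anything from the original program.

```python
import struct
from fractions import Fraction


def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def running_sum(value: float, count: int, single: bool) -> float:
    """Add `value` to an accumulator `count` times; if `single` is True the
    accumulator is rounded to single precision after every addition."""
    total = 0.0
    for _ in range(count):
        total = total + value
        if single:
            total = to_f32(total)
    return total


def abs_error(value: float, count: int, single: bool) -> Fraction:
    """Exact distance between the computed sum and value * count."""
    exact = Fraction(value) * count
    return abs(Fraction(running_sum(value, count, single)) - exact)


N = 100_000

# Case 1: 0.5 is a power of two, so both precisions sum it exactly.
err_half_r4 = abs_error(0.5, N, single=True)
err_half_r8 = abs_error(0.5, N, single=False)

# Case 2: real*4 elements of 10.0/3; only the real*8 accumulator is exact.
third = to_f32(10.0 / 3.0)
err_third_r4 = abs_error(third, N, single=True)
err_third_r8 = abs_error(third, N, single=False)

# Case 3: 0.1 stored and summed in real(4) versus stored and summed in
# real(8); both are inexact, measured against their own stored values.
err_tenth_r4 = abs_error(to_f32(0.1), N, single=True)
err_tenth_r8 = abs_error(0.1, N, single=False)

for label, e4, e8 in [("0.5 ", err_half_r4, err_half_r8),
                      ("10/3", err_third_r4, err_third_r8),
                      ("0.1 ", err_tenth_r4, err_tenth_r8)]:
    print(label, "real(4) error:", float(e4), " real(8) error:", float(e8))
```

Each error is measured against the exact sum of the values as stored, so it reflects roundoff in the summation only, not the error made when the value was first stored.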

Jim