Hey guys,
In general, given a Fortran application, should differences between the 32 bit and 64 bit results be expected?
If all calculations are single precision, it seems that differences can arise even in a simple program. For instance, the following program might stand in for a fairly complex scientific calculation; it shows that significant differences can occur when using single-precision arithmetic.
Is that right? Why should there be any difference at all?
Thanks!
!-------------------------------------------------
! with real*4 x,y,z,sum
! using (2.0*i*pi/num)
! 32 bit sum = -0.0030936133116484
! 64 bit sum = -0.0020905509591103
!
! using (2.d0*i*pi/num)
! 32 bit sum = -0.0028321230784059
! 64 bit sum = -0.0028321230784059
!
! with real*8 x,y,z,sum
! using (2.0*i*pi/num)
! 32 bit sum = 0.0000000000347261
! 64 bit sum = 0.0000000000254766
!
! using (2.d0*i*pi/num)
! 32 bit sum = 0.0000000000010576
! 64 bit sum = 0.0000000000010576
!-------------------------------------------------
program simpleSum
  real, parameter :: pi = 3.14159265358979
  real*8 :: x, y, z, sum
  integer :: i, j, k, num
  !
  sum = 0.0
  num = 1000
  do i = 1, num
    x = cos(2.0*i*pi/num)
    do j = 1, num
      y = sin(2.0*j*pi/num)
      do k = 1, num
        z = cos(2.0*k*pi/num)
        sum = sum + x*y*z
      enddo
    enddo
  enddo
  !
  write(*,'(f25.16)') sum
  !
  pause
  !
end program simpleSum
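For reference, the effect is easy to reproduce outside Fortran. Below is a minimal Python sketch (an illustration, not the program above) that rounds the angle to IEEE single precision step by step, mimicking the real*4 evaluation of 2.0*i*pi/num, and compares a sum of sines against the all-double evaluation that mimics 2.d0*i*pi/num:

```python
import math
import struct

def f32(x):
    """Round a double to the nearest IEEE single, like storing into REAL*4."""
    return struct.unpack('f', struct.pack('f', x))[0]

pi32 = f32(math.pi)  # single-precision pi, like the default-kind REAL parameter
num = 1000

# Angle evaluated operation by operation in single precision (2.0*i*pi/num):
s_single = sum(math.sin(f32(f32(f32(2.0 * i) * pi32) / num))
               for i in range(1, num + 1))
# Angle evaluated entirely in double precision (2.d0*i*pi/num):
s_double = sum(math.sin(2.0 * i * math.pi / num)
               for i in range(1, num + 1))

print(s_single, s_double)  # both near zero, but they differ
```

Mathematically both sums are exactly zero; the single-precision angles carry roughly 1e-7 relative errors into sin, so the rounding noise in the first sum is several orders of magnitude larger than in the second.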
In the example you provided, the arguments to COS and SIN are calculated to real(4) precision. The difference between 32-bit and 64-bit could be in how the real(8) register values are retained for the x*y*z product; I am assuming the register calculations are done in 64-bit.
You could force pi and 2.0 to be real(8), which would remove most of the differences.
program simpleSum
  real*8, parameter :: pi = 3.14159265358979_8
  real*8, parameter :: two = 2
  real*8 :: x, y, z, sum
  integer :: i, j, k, num
  !
  sum = 0.0
  num = 1000
  do i = 1, num
    x = cos(two*i*pi/num)
    do j = 1, num
      y = sin(two*j*pi/num)
      do k = 1, num
        z = cos(two*k*pi/num)
        sum = sum + x*y*z
      enddo
    enddo
  enddo
  !
  write(*,'(f25.16)') sum
  !
end program simpleSum
The correct answer is zero. You could try the attached example with 32 and 64 bit, which gives examples of different precision and the resulting round-off error. I did not include real*4 sum as an option.
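Why the exact answer is zero: x, y and z each depend on only one loop index, so the triple sum factors into a product of three independent sums, and each sum over a full period is exactly zero. A quick Python check of the factorized form (an illustration, not the attached example):

```python
import math

num = 1000
angles = [2.0 * k * math.pi / num for k in range(1, num + 1)]

# The triple loop computes (sum of cos) * (sum of sin) * (sum of cos),
# since x, y and z do not depend on the other loop indices.
sum_cos = sum(math.cos(a) for a in angles)
sum_sin = sum(math.sin(a) for a in angles)
product = sum_cos * sum_sin * sum_cos

# Over a full period each factor is exactly zero; only rounding survives.
print(sum_cos, sum_sin, product)
```

The factorized form accumulates far less rounding error than the billion-term triple loop, which is why the program's printed result is so much larger than the product computed here.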
In 32-bit mode, depending on compiler options, the compiler may choose either the FPU (x87) instruction set or the SSE/AVX instruction set. With the FPU instructions, numbers fetched from memory as REAL*4 or REAL*8 (and REAL*10) are internally promoted to REAL*10. The x*y*z product uses the promoted numbers, as does the two*k*pi/num expression; storing back into memory demotes the result to the declared precision. With SSE/AVX no such promotion is performed, so there will be precision differences.
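The double-rounding effect described above can be imitated one precision level down in Python (a sketch, using single/double in place of double/extended): round the intermediate product to the storage precision, or keep it in the wider format and round only once on the final store.

```python
import random
import struct

def f32(x):
    """Round a double to the nearest IEEE single (the storage precision)."""
    return struct.unpack('f', struct.pack('f', x))[0]

random.seed(0)
trials = 10000
diffs = 0
for _ in range(trials):
    x, y, z = (f32(random.random()) for _ in range(3))
    wide   = f32(x * y * z)        # products kept wide, one final rounding (x87-like)
    narrow = f32(f32(x * y) * z)   # intermediate rounded to storage width (SSE-like)
    if wide != narrow:
        diffs += 1

print(diffs, "of", trials, "triples give a different stored result")
```

Most triples agree, but a noticeable fraction differ in the last stored bit, which is exactly the kind of discrepancy that an accumulator then amplifies.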
Jim Dempsey
The default for many years has been to use SSE and not x87 instructions for floating point. John does have a good point in that your constant for PI was limited to single precision. My guess is that the math libraries might give slightly different results for 32 and 64 bit instruction sets, but it would take deeper analysis to see where the difference lay.
The short answer is, yes, but differences can be minimized. As Steve says, math functions can give different results. There is no IEEE standard for the results of math functions, other than square root. Results can sometimes differ between different processor types, different optimization levels or different compiler versions, not just between different architectures.
Compiling with default optimization, e.g., ifort ssum.f90, the loop at line 12 will be vectorized. By default, the vectorized math functions in libsvml are slightly less accurate than the default scalar functions in libm (by about a couple of bits out of 53). This isn't much, but this code involves large cancellations, which can greatly magnify small differences, as well as computing sines and cosines of large angles, which incur additional rounding when they are reduced to the principal range. These different implementations of math functions can lead to significant variations in results, even on the same architecture, whether or not PI is declared as double precision. For example, on Intel 64, compare
ifort ssum.f90
ssum
0.0000000000254766
With
ifort /fp:precise ssum.f90 (or /Qfast-transcendentals-)
ssum
0.0000000000353784
or
ifort /Qimf-arch-consistency:true ssum.f90 (uses math functions you would get on an older processor)
ssum
0.0000000000283784
There is no guarantee of getting the same results on different architectures; however, the accepted way to make results as consistent as possible is to build with /fp:source (for Fortran, this is equivalent to /fp:precise). See the article at http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/. For your program, just /Qfast-transcendentals- may be sufficient. (For me, it gave the same results on Intel64 and IA-32, but that will depend on the particular processors that you use.) The switch /Qimf-arch-consistency:true may be used in addition to get consistent results between different processor types on the same architecture, but not between architectures (because the 32 bit compiler and libraries need to support legacy processors, whereas the 64 bit compiler and libraries do not).
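The cancellation effect mentioned above is worth seeing in isolation. This Python sketch perturbs each term of a sum whose true value is zero by one unit in the last place, a smaller change than the couple-of-bits difference quoted for the vectorized library, and compares the results (math.nextafter requires Python 3.9+):

```python
import math

N = 1000
terms = [math.sin(2.0 * k * math.pi / N) for k in range(1, N + 1)]
# Bump every term by one unit in the last place (~1e-16 relative error),
# standing in for a math library that differs in the final bit.
bumped = [math.nextafter(t, math.inf) for t in terms]

sa, sb = sum(terms), sum(bumped)
# The inputs agree to ~16 significant digits, but because the exact sum is
# zero, the surviving results can disagree in their leading digits.
print(sa, sb)
```

Both results are tiny in absolute terms, which is why /fp options that only change the last bit or two of each math-function result can still visibly change the printed sum.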
You might want to consider the alternative version attached, which changes the precision of SIN and COS for the _16 case.
I also changed the approach for real*16, as the run times increased dramatically. Calculation of SIN and COS to 28 sig figures is much slower.
It is interesting that the round-off error remains similar to the earlier calculation, where sin and cos were calculated as real*8 while the accumulator was real*16, as opposed to this new approach where sin, cos and the accumulator are all calculated to real*16.
For this example, I ran for both 64-bit and 32-bit compilers. The error differed only for the case of real*4 calculation of sin and cos, while the *8 and *16 cases were identical. I used default compilation so expect that SSE instructions were used in all cases.
John
Intel(R) Visual Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.5.344 Build 20120612
 4.280957512580678E-011                      _4   4  3.141593                            8.32300000000000
 3.966462661791904E-011                      _8   8  3.14159265358979                   11.5850000000000
-4.762877414616551103087790451409393E-0030  _16  16  3.14159265358979323846264338327950 53.5720000000000
Intel(R) Visual Fortran Compiler XE for applications running on IA-32, Version 12.1.5.344 Build 20120612
 3.319366564382906E-011                      _4   4  3.141593                            9.18100000000000
 3.966462661791904E-011                      _8   8  3.14159265358979                   13.5070000000000
-4.762877414616551103087790451409393E-0030  _16  16  3.14159265358979323846264338327950 100.590000000000
REAL(16) is not supported by the hardware; i.e., SSE, AVX and AVX2 have no REAL(16) format. Instead, software routines are called to perform the operations (which accounts for the dramatic increase in run times).
N.B. IA-32 with the FPU (x87) does support REAL(10), but that is not the same as REAL(16).
Jim Dempsey
ifort doesn't support real(10), although gfortran does, and icc has long double (there is no run-time support for long double in the Microsoft dependencies of ICL).
real(10) (with compilers which support it) also doesn't use SSE, AVX, etc., and has no vectorization. Scalar math functions might in effect use real(10) internally, if that proves the most efficient way to achieve increased accuracy.
From my experience over the last couple of years, I'm suspicious of any application which beats on circular functions with arguments outside +-2*pi. Those rubbishy pseudo-random number generators which pick up the garbage bits from transcendentals of large arguments should be replaced by documented random generators, or at least by the standard Fortran random_number intrinsic. For parallel random number generation, there are MKL implementations.
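The large-argument issue is easy to demonstrate: 2*pi is not exactly representable, so an argument written as x + 2*pi*k is already wrong by roughly k times the representation error before sin ever sees it. A Python sketch:

```python
import math

x = 0.5
# x + 2*pi*k is mathematically the same angle as x, but 2*pi is stored with
# a relative error of ~1e-16, and that error is scaled by k (plus further
# rounding in the multiply and add) before the library's argument
# reduction, which is exact for the value it is actually given, ever runs.
diffs = {k: math.sin(x + 2.0 * math.pi * k) - math.sin(x)
         for k in (1, 10**6, 10**12)}
for k, d in diffs.items():
    print(k, d)
```

The discrepancy grows roughly in proportion to k: harmless for a few periods, but catastrophic by the time the argument is in the trillions.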
Looking at the generated assembler code, the 32-bit build generates a call to "cosf" while the 64-bit build calls "cos" with the 32-bit real*4 argument promoted.
Change the code to use cos(dble(x)).
This forces both the 32-bit and 64-bit versions to call the same "cos", which gives consistent rounding and therefore consistent results.
We can debate which results are "right" later, but our old HP calculator got a slightly different answer too.
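The same effect can be sketched in Python; cosf below is a hypothetical stand-in for a single-precision cosine routine, not a real library call. Promoting the argument to double before the call, as cos(dble(x)) does in Fortran, makes both code paths hit the same double-precision routine.

```python
import math
import struct

def f32(x):
    """Round a double to the nearest IEEE single."""
    return struct.unpack('f', struct.pack('f', x))[0]

def cosf(x):
    # Hypothetical stand-in for a single-precision cos: single in, single out.
    return f32(math.cos(f32(x)))

x = f32(1.2345678)            # a REAL*4 value
via_cosf = cosf(x)            # what the 32-bit build effectively called
via_cos = math.cos(float(x))  # the 64-bit path: promote, call double cos

# In this sketch the two agree only to single precision; rounding the
# double result back to single recovers the single-precision answer.
print(via_cosf, via_cos)
```

Once both builds call the double-precision routine on the same promoted argument, the rounding sequence is identical and the stored results match bit for bit.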