Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Multiple machines, multiple outputs

aurora
Beginner
697 Views

Hi,

I'm getting different results from floating-point operations in my algorithms depending on the machine they run on. Same architecture (x64), same binaries, but a different processor. It also happens with /debug:full, /Qfp-speculation:off, /fp:strict...

NOT using BLAS, LAPACK, any external library or random data. Only floating point instructions.

Is this normal? Is it possible to produce reproducible results?

Thanks in advance!

Intel C++ Compiler 12.1 update 258

0 Kudos
15 Replies
TimP
Honored Contributor III

One would suspect programming faults, such as uninitialized data or array bounds violations.  Use of options such as /fp:source /Qimf-arch-consistency:true /arch:SSE3 would avoid differences among various CPU brands, once you have resolved such faults.

SergeyKostrov
Valued Contributor II
>>...I'm having different results in floating point operations in my algorithms depending on the machine that is
>>being executed...
>>...
>>Is this normal?

Yes, and it would be nice if you provided an example of your results.

>>...Is it possible to produce reproducible results?

Yes - if all software and hardware systems are the same.
No - if some software or some hardware systems are not the same ( this is your case ).

What are your relative and absolute errors?

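For anyone who wants to put numbers on the difference between two machines, here is a minimal sketch of how absolute and relative errors could be computed and compared against tolerances. The function names and the two tolerance parameters are made up for illustration; they are not part of the compiler or of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Absolute error between a reference value and a computed value.
double absolute_error(double reference, double computed) {
    return std::fabs(reference - computed);
}

// Relative error, scaled by the magnitude of the reference.
// Falls back to the absolute error when the reference is exactly zero.
double relative_error(double reference, double computed) {
    const double scale = std::fabs(reference);
    return scale == 0.0 ? absolute_error(reference, computed)
                        : absolute_error(reference, computed) / scale;
}

// True when two results agree within either tolerance.
// abs_tol and rel_tol are hypothetical knobs chosen by the caller.
bool results_agree(double a, double b, double abs_tol, double rel_tol) {
    assert(abs_tol >= 0.0 && rel_tol >= 0.0);
    const double diff = std::fabs(a - b);
    return diff <= abs_tol ||
           diff <= rel_tol * std::max(std::fabs(a), std::fabs(b));
}
```

Dumping a few intermediate values per iteration on both machines and feeding them to `results_agree` would show at which iteration the two runs start to drift apart.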
SergeyKostrov
Valued Contributor II
>>...One would suspect programming faults, such as uninitialized data or array bounds violations...

The user doesn't have these cases ( from my point of view... ) because he gets some results. Anyway, let's see if he will provide some results for review.

PS: I really miss the Preview Post feature of the old ISN website...

aurora
Beginner

It is the result of an iterative method that executes a few instructions hundreds of millions of times, e.g.:

------

t=difx1*d*difx[i+m];
dd=(c[i+1]-d)/(t-c[i+1]);
d=c[i+1]*dd;
c=t*dd;

-------

All are double operations. On each iteration I compare the error with a threshold (of order 10^-3) and decide whether to continue or not. So it is not easy to show results (too much output data).

In that case, could this just be an unstable algorithm? How can I do it better? By using more precision?

SergeyKostrov
Valued Contributor II
>>...Using more precision?

Yes, and an extended-precision floating-point data type ( long double / 80-bit precision ) needs to be used. Also, regarding applications of long double, take a look at these threads:

Forum topic: Mathimf and Windows
Web-link: software.intel.com/en-us/forums/topic/357759

Forum topic: Support of Extended or Quad IEEE FP formats
Web-link: software.intel.com/en-us/forums/topic/358472

Forum topic: Using 'long double' in Parallel Studio?
Web-link: software.intel.com/en-us/forums/topic/266290

Forum topic: Why function printf does not support long double?
Web-link: software.intel.com/en-us/forums/topic/372720

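Before relying on long double, it may be worth checking how many significand bits the build actually provides. If I remember the documentation correctly, on Windows the Intel compiler treats long double as a 64-bit type unless /Qlong-double is specified, so switching the type alone might change nothing. The sketch below uses only std::numeric_limits; the helper names are invented for this example.

```cpp
#include <cassert>
#include <limits>

// Number of significand bits the compiler actually gives each type.
// 53 means IEEE double; 64 means the x87 80-bit extended format.
constexpr int double_bits() {
    return std::numeric_limits<double>::digits;
}
constexpr int long_double_bits() {
    return std::numeric_limits<long double>::digits;
}

// True only if long double really carries more precision than double
// with the current compiler and options.
constexpr bool long_double_is_wider() {
    return long_double_bits() > double_bits();
}

// Hypothetical guard: call once at start-up so a debug build stops early
// when long double silently maps to plain double.
inline void require_extended_precision() {
    assert(long_double_is_wider());
}
```

Printing `long_double_bits()` on each target machine and build configuration is a cheap way to confirm that the extra precision is really there.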
SergeyKostrov
Valued Contributor II
Also, take a look at:

Forum topic: Mixing of Floating-Point Types ( MFPT ) when performing calculations. Does it improve accuracy?
Web-link: software.intel.com/en-us/forums/topic/361134

Results of a real test are provided in the thread.

jimdempseyatthecove
Honored Contributor III

Computations using irrational numbers mostly produce irrational results. Double-precision floating point uses 52 bits for the mantissa with an implied leading 1 (53 bits of precision). Some fractional numbers cannot be represented exactly using a finite number of bits (some whole numbers cannot either). A good example of this is the binary value of the decimal 0.1, which is 0.000110011001100... This is an infinitely repeating fraction. In DP FP this value is left-shifted by 4 bits and the binary exponent is decreased by 4 (to account for the shift), giving 1.1001100110011... x 2^-4. The leading "1." is then removed because, except for 0 and denormalized numbers, it is always 1. This leaves a binary fraction of:

[cpp]

1001100110011001100110011001100110011001100110011001100110011001100... (infinitely repeating)
0000000001111111111222222222233333333334444444444555 (10's bit counter)
1234567890123456789012345678901234567890123456789012 (1's bit counter)
1001100110011001100110011001100110011001100110011010 (rounded to 52 bits)
00000000000000000000000000000000000000000000000000000110011001100... (+error)
[/cpp]
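The rounded 52-bit pattern above can be checked directly by pulling the stored fields out of a double. This is only a sketch: it assumes a 64-bit IEEE 754 double, and the helper names are made up for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a double as its raw 64-bit IEEE 754 pattern.
std::uint64_t raw_bits(double x) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch expects a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// Low 52 bits: the stored fraction (the implied leading 1 is not stored).
std::uint64_t fraction_bits(double x) {
    return raw_bits(x) & ((std::uint64_t{1} << 52) - 1);
}

// Bits 52..62 hold the biased exponent; subtracting 1023 gives the
// power of two. Only meaningful for normal (non-zero, non-denormal) values.
int unbiased_exponent(double x) {
    assert(std::isnormal(x));
    return static_cast<int>((raw_bits(x) >> 52) & 0x7FF) - 1023;
}
```

For 0.1, `fraction_bits` returns 0x999999999999A, which is the "rounded to 52 bits" row written in hexadecimal: twelve groups of 1001 followed by 1010.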

Depending on how and how often you manipulate these numbers, the error grows. Using longer floating-point formats postpones the error from creeping into the results beyond acceptable levels, but will not prevent it from happening. Programmers can work around these errors if need be, but in many cases the error is within an acceptable range and can be ignored.
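A small sketch of that growth: adding 0.1 repeatedly drifts away from the exact product, and the drift gets larger with the number of additions. It assumes plain IEEE double evaluation (for example SSE2 code on x64); the function names are invented for the example.

```cpp
#include <cassert>
#include <cmath>

// Add `term` to itself `count` times with plain double arithmetic.
double repeated_sum(double term, int count) {
    assert(count >= 0);
    double sum = 0.0;
    for (int i = 0; i < count; ++i) {
        sum += term;  // each addition rounds to the nearest double
    }
    return sum;
}

// Distance between the accumulated sum and the mathematically exact value.
double accumulated_error(double term, int count, double exact) {
    return std::fabs(repeated_sum(term, count) - exact);
}
```

Comparing `accumulated_error(0.1, 10, 1.0)` with `accumulated_error(0.1, 1000000, 100000.0)` shows the error growing by many orders of magnitude, while a term such as 0.5, which is exactly representable, accumulates no error at all.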

Even before the days of binary computers, numerical computations had to take error in values into consideration. SIN, COS, LN tables were published to a finite number of places.

Jim Dempsey

Bernard
Valued Contributor I

As Jim said, the use of irrational numbers, and also of real numbers which are not exactly representable in binary encoding, can lead to an accumulation of errors that affects the accuracy of the final result.

SergeyKostrov
Valued Contributor II
>>... How can I do it better? Using more precision?

Aurora, please let me know if you need more practical help rather than theoretical.

aurora
Beginner

Hi,

We are testing our algorithms with the long double type, looking at both accuracy and run time. A priori, we think the problem is solved. Would you say that operations with long double are much heavier in time?

I understand your explanations, but I still don't see why the results change depending on the machine, e.g. Intel Xeon and Core Duo with the same binaries.

Thanks

SergeyKostrov
Valued Contributor II
>>...Would you say that operations with long double are much heavier in time?

Yes.

Bernard
Valued Contributor I

 >>>but I still dont see why results changes depending on the machine. i.e. Intel Xeon and Core Duo with same binaries.>>>

Maybe this is due to differing microcode and/or hardware implementations of the rounding algorithms. As Tim said, there may also be programming errors, and there is some possibility of hardware errors, which could manifest themselves as a loss of accuracy.
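One more mechanism worth keeping in mind, offered as an assumption rather than a diagnosis of this particular program: floating-point addition is not associative, so if a library or compiler-generated code path picked at run time for a given CPU evaluates the same expression in a different order, the last bits can differ. Options such as the /Qimf-arch-consistency:true that Tim mentioned are aimed at this kind of difference. The sketch below only demonstrates the arithmetic effect; the function names are made up.

```cpp
#include <cassert>
#include <cmath>

// Sum three values left to right: (a + b) + c.
double sum_left_to_right(double a, double b, double c) {
    return (a + b) + c;
}

// Sum the same values with the other grouping: a + (b + c).
double sum_right_to_left(double a, double b, double c) {
    return a + (b + c);
}

// True when the two groupings give different doubles for these inputs.
bool grouping_matters(double a, double b, double c) {
    // NaN or infinity would make the comparison meaningless.
    assert(std::isfinite(a) && std::isfinite(b) && std::isfinite(c));
    return sum_left_to_right(a, b, c) != sum_right_to_left(a, b, c);
}
```

With a = 1e20, b = -1e20 and c = 1.0 the first grouping keeps the 1.0 while the second loses it entirely, because 1.0 is far below the spacing of doubles near 1e20. In an iteration that is compared against a threshold, a much smaller discrepancy of this kind can be enough to change the number of iterations.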

SergeyKostrov
Valued Contributor II
>>... I still dont see why results changes depending on the machine. i.e. Intel Xeon and Core Duo with same binaries...

Please try to look at the CRT libraries, since older versions could be considered "obsolete" ( this is applicable to any platform ). In practice, I never had identical results when the same test case was compiled with, for example, Visual C++ v6.0 and Visual Studio 2005, and then executed.

Mark_S_Intel1
Employee
Bernard
Valued Contributor I

@mark-sabahi

It was a very interesting article. Thanks for the link.
