Mobile and Desktop Processors
Intel® Core™ processors, Intel Atom® processors, tools, and utilities

A problem related to floating-point arithmetic on Intel's FPUs? Or, something else?

SergeyKostrov
Valued Contributor II

Hello everybody,

I detected a problem that seems related to floating-point arithmetic on Intel's FPUs ( Floating Point Units ). Or, it could be related to something else and I simply missed it.

In general, I'm very surprised by my results and I wonder if Intel's engineers have any idea what could possibly be wrong.

Best regards,

Sergey

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

Let's say two square 8x8 matrices need to be multiplied. Here are examples of these test matrices:

// Matrix A - 8x8 - 'float' type:

 101.0  201.0  301.0  401.0  501.0  601.0  701.0  801.0
 901.0 1001.0 1101.0 1201.0 1301.0 1401.0 1501.0 1601.0
1701.0 1801.0 1901.0 2001.0 2101.0 2201.0 2301.0 2401.0
2501.0 2601.0 2701.0 2801.0 2901.0 3001.0 3101.0 3201.0
3301.0 3401.0 3501.0 3601.0 3701.0 3801.0 3901.0 4001.0
4101.0 4201.0 4301.0 4401.0 4501.0 4601.0 4701.0 4801.0
4901.0 5001.0 5101.0 5201.0 5301.0 5401.0 5501.0 5601.0
5701.0 5801.0 5901.0 6001.0 6101.0 6201.0 6301.0 6401.0

// Matrix B - 8x8 - 'float' type:

 101.0  201.0  301.0  401.0  501.0  601.0  701.0  801.0
 901.0 1001.0 1101.0 1201.0 1301.0 1401.0 1501.0 1601.0
1701.0 1801.0 1901.0 2001.0 2101.0 2201.0 2301.0 2401.0
2501.0 2601.0 2701.0 2801.0 2901.0 3001.0 3101.0 3201.0
3301.0 3401.0 3501.0 3601.0 3701.0 3801.0 3901.0 4001.0
4101.0 4201.0 4301.0 4401.0 4501.0 4601.0 4701.0 4801.0
4901.0 5001.0 5101.0 5201.0 5301.0 5401.0 5501.0 5601.0
5701.0 5801.0 5901.0 6001.0 6101.0 6201.0 6301.0 6401.0
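Both test matrices follow the pattern value = 101 + 100 * (8*i + j), so they can be generated rather than typed in. A minimal sketch (the function name is illustrative, not from the original post); every entry is at most 6401, so all of them are exactly representable as 'float':

```c
/* Fill the two 8x8 test matrices. Both use the same arithmetic
   progression 101, 201, 301, ... laid out row by row. */
void init_test_matrices(float fA[8][8], float fB[8][8])
{
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
            fA[i][j] = fB[i][j] = 101.0f + 100.0f * (float)(i * 8 + j);
}
```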

In two Test-Cases the classic 3-loop matrix multiplication algorithm is used, with rolled and unrolled loops:

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

 

// Test-Case 1 - Rolled Loops - 1-in-1
for( i = 0; i < 8; i++ )
{
   for( j = 0; j < 8; j++ )
   {
      fC[i][j] = 0.0f;
      for( k = 0; k < 8; k++ )
      {
         fC[i][j] += ( fA[i][k] * fB[k][j] );
      }
   }
}

// Matrix C = Matrix A * Matrix B - 8x8 - 'float' type

 13826808.0  14187608.0  14548408.0  14909208.0  15270008.0  15630808.0  15991608.0  16352408.0
 32393208.0  33394008.0  34394808.0  35395608.0  36396408.0  37397208.0  38398008.0  39398808.0
>50959604.0< 52600404.0  54241204.0  55882004.0  57522804.0  59163604.0  60804404.0  62445204.0
 69526008.0  71806808.0  74087608.0  76368408.0  78649208.0  80930008.0  83210808.0  85491608.0
 88092408.0  91013208.0  93934008.0  96854808.0  99775608.0 102696408.0 105617208.0 108538008.0
106658808.0 110219608.0 113780408.0 117341208.0 120902008.0 124462808.0 128023608.0 131584408.0
125225208.0 129426008.0 133626808.0 137827616.0 142028400.0 146229216.0 150430000.0 154630816.0
143791600.0 148632416.0 153473200.0 158314016.0 163154800.0 167995616.0 172836416.0 177677200.0

I marked an incorrect value as '>50959604.0<'. For some reason it is NOT '50959608.0'. There are also
a couple of more incorrect values, because in all cases the last two digits of any value in the Matrix C
must be '08'. They can't be '...00' or '...16'.

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

// Test-Case 2 - Unrolled Loops - 4-in-1
for( i = 0; i < 8; i++ )
{
   for( j = 0; j < 8; j++ )
   {
      fC[i][j] = 0.0f;
      for( k = 0; k < 8; k += 4 )
      {
         fC[i][j] += ( fA[i][k  ] * fB[k  ][j] ) +
                     ( fA[i][k+1] * fB[k+1][j] ) +
                     ( fA[i][k+2] * fB[k+2][j] ) +
                     ( fA[i][k+3] * fB[k+3][j] );
      }
   }
}

// Matrix C = Matrix A * Matrix B - 8x8 - 'float' type

 13826808.0  14187608.0  14548408.0  14909208.0  15270008.0  15630808.0  15991608.0  16352408.0
 32393208.0  33394008.0  34394808.0  35395608.0  36396408.0  37397208.0  38398008.0  39398808.0
 50959608.0  52600408.0  54241208.0  55882008.0  57522808.0  59163608.0  60804408.0  62445208.0
 69526008.0  71806808.0  74087608.0  76368408.0  78649208.0  80930008.0  83210808.0  85491608.0
 88092408.0  91013208.0  93934008.0  96854808.0  99775608.0 102696408.0 105617208.0 108538008.0
106658808.0 110219608.0 113780408.0 117341208.0 120902008.0 124462808.0 128023608.0 131584408.0
125225208.0 129426008.0 133626808.0 137827616.0 142028416.0 146229216.0 1...
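The wrong value at row 2, column 0 can be reproduced in isolation. Every individual product is an integer below 2^24 and is therefore exact in 'float', but the running sum crosses 2^24 partway through, and from that point each partial sum is rounded to single precision; the rounding errors accumulate to 50959604 instead of the exact 50959608. A sketch (function names are illustrative; assumes IEEE 754 round-to-nearest, with 'volatile' forcing a single-precision store after every addition):

```c
/* Row 2 of A and column 0 of B from the 8x8 test matrices. */
static const float fA2[8] = { 1701.0f, 1801.0f, 1901.0f, 2001.0f,
                              2101.0f, 2201.0f, 2301.0f, 2401.0f };
static const float fB0[8] = {  101.0f,  901.0f, 1701.0f, 2501.0f,
                              3301.0f, 4101.0f, 4901.0f, 5701.0f };

/* Accumulate in float: each partial sum above 2^24 gets rounded.
   Returns 50959604.0f, reproducing the forum result. */
float dot_float(void)
{
    volatile float sum = 0.0f;   /* volatile: force a float store per step */
    for (int k = 0; k < 8; k++)
        sum += fA2[k] * fB0[k];  /* each product is exact; the sums are not */
    return sum;
}

/* Accumulate in double: every partial sum here is exact.
   Returns the exact 50959608.0. */
double dot_double(void)
{
    double sum = 0.0;
    for (int k = 0; k < 8; k++)
        sum += (double)fA2[k] * (double)fB0[k];
    return sum;
}
```

The first rounded step is 16968005, which is odd and above 2^24 where float spacing is 2; round-to-nearest-even turns it into 16968004, and later steps lose a little more.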
idata
Employee

Sergey,

You used the float data type, which is single precision and offers slightly more than six significant decimal digits. You need double precision for the number of significant digits used in your example. In double precision, you will see correct results.
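That suggestion can be sketched as a double-accumulator multiply. Keeping the result matrix in 'double' also matters here: some of the exact answers (e.g. 177677208) are themselves not representable as 'float', so casting the final sums back to 'float' would re-introduce rounding. The function name is illustrative, not from the thread:

```c
/* Multiply two 8x8 float matrices, accumulating every dot product in
   double precision. For these inputs each product and each partial sum
   is exact in double, so dC holds the exact integer results. */
void matmul_8x8_double_acc(const float fA[8][8], const float fB[8][8],
                           double dC[8][8])
{
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++) {
            double sum = 0.0;
            for (int k = 0; k < 8; k++)
                sum += (double)fA[i][k] * (double)fB[k][j];
            dC[i][j] = sum;  /* keep double: 177677208 has no exact float */
        }
}
```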

Robert Miller

SergeyKostrov
Valued Contributor II

If I change to 32-bit 'int', for example, I also get the correct 'C' Matrix:

 13826808  14187608  14548408  14909208  15270008  15630808  15991608  16352408
 32393208  33394008  34394808  35395608  36396408  37397208  38398008  39398808
 50959608  52600408  54241208  55882008  57522808  59163608  60804408  62445208
 69526008  71806808  74087608  76368408  78649208  80930008  83210808  85491608
 88092408  91013208  93934008  96854808  99775608 102696408 105617208 108538008
106658808 110219608 113780408 117341208 120902008 124462808 128023608 131584408
125225208 129426008 133626808 137827608 142028408 146229208 150430008 154630808
143791608 148632408 153473208 158314008 163154808 167995608 172836408 177677208
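The 'int' variant is exact for these inputs because every product and every partial sum fits in 31 bits; the largest entry is 177,677,208, well below 2^31 - 1. A sketch with an illustrative function name:

```c
/* 8x8 integer matrix multiply. For these test matrices all products
   (at most 6401 * 6401 = 40972801) and all partial sums (at most
   177677208) stay below 2^31 - 1, so no overflow and no rounding. */
void matmul_8x8_int(const int iA[8][8], const int iB[8][8], int iC[8][8])
{
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++) {
            iC[i][j] = 0;
            for (int k = 0; k < 8; k++)
                iC[i][j] += iA[i][k] * iB[k][j];
        }
}
```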

This is not about changing the data type from 'float' to 'double'. I need to understand why precision is lost for different values in the 'C' Matrix in the 'Rolled' and 'Unrolled' cases.

In my Test-Cases I use 8x8 matrices, that is, very small matrices. In real life a customer could use matrices with sizes up to 8192x8192. If declared as 'double', an application will use significantly more memory. The problem then escalates because Strassen's algorithm for matrix multiplication needs to be used in order to speed up calculations, and it creates another impact on memory.

What I can see is that with the 'float' type I reached the limit of precision defined in the IEEE 754 standard.

I also looked at the assembly code generated by Visual C++, and I confirm that the FPU's register ST0 gets the value '50959608.0' ( 1st Test-Case ). After that, the variable fC[2][0] gets '50959604.0'.

Thank you for your response!

idata
Employee

Your inputs are all integers, and floats can represent them exactly because they are less than or equal to 2^24-1 = 16777215. Your results are all integers, but they are too big to be represented exactly (in general) in floats. That's the first problem.

Your intermediate computations are being done in the FPU in double (or extended) precision (unless you changed the processor precision mode to 24-bit). As doubles, your results are exact integers. But intermediate computations can be "flushed" back to your float variables, at which point they will be rounded. How long your values are kept in the FPU before being flushed back depends on what instructions the compiler generates; certainly your different algorithms will generate different code -- and potentially different results. That's the second problem.
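The 2^24 boundary described above can be checked directly: round-tripping an integer through 'float' tells you whether that integer has its own single-precision encoding. A small sketch (the function name is mine, not from the thread):

```c
/* Nonzero when the integer n survives a round trip through float.
   Every integer up to 2^24 = 16777216 does; above that, adjacent
   floats are 2 apart, so odd integers like 16777217 do not. */
int is_exact_in_float(long n)
{
    return (long)(float)n == n;  /* the cast forces single-precision rounding */
}
```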

(Please read my article http://www.exploringbinary.com/when-floats-dont-behave-like-floats/ for more information.)

SergeyKostrov
Valued Contributor II

Thank you to everybody who responded to my initial post.

I've just completed the implementation of two prototypes for a Precision Loss Detection ( PLD ) subsystem. Neither prototype uses exceptions ( C++, SEH, etc. ).

For example, for the 'float' data type, detection works when a result is greater than '16777216.0':

4096.0 * 4096.0 = 16777216.0 - No Precision Loss - No Detection

4097.0 * 4097.0 = 16785409.0 - Yes Precision Loss - Yes Detected

The first prototype is extremely simple: it "monitors" the results of all calculations, additions or multiplications. It is absolutely portable, but it affects performance.
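The first prototype isn't shown in the thread, but a per-multiplication check along those lines can be sketched by comparing the single-precision result against a double-precision reference, which is exact for products of floats since 24+24 significand bits fit in a double's 53. The function name is hypothetical:

```c
/* Nonzero when a * b, computed in float, differs from the exact product.
   (double)a * (double)b is exact because each float significand has
   24 bits and 24 + 24 <= 53. */
int fmul_loses_precision(float a, float b)
{
    volatile float prod = a * b;           /* force single-precision rounding */
    double exact = (double)a * (double)b;  /* exact reference */
    return (double)prod != exact;
}
```

The same trick does not carry over directly to addition: the double sum of two floats is not always exact (the exponents can differ by more than 29), so an addition check needs a different reference, e.g. a compensated-summation scheme.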

The second prototype uses the CRT functions 'signal(...)', 'setjmp(...)' and 'longjmp(...)'. I coded it as 'SSL'. But some modern C++ compilers failed to fully support it, so it can't be used on the project. Here is some information:

MS Visual C++    : SSL based PLD didn't work
Borland C++ v5.x : SSL based PLD didn't work
MinGW v3.4.2     : SSL based PLD didn't work
Turbo C++ 3.x    : SSL based PLD worked absolutely fine!

PS: Unfortunately, due to project constraints C++ or SEH exceptions can't be used at all.
