Concerns on using AVX double floating point instructions for integer data

cagribal · ‎12-13-2012

Hi all,

As you might know, AVX does not provide 256-bit instructions for integer types; those are planned to arrive with AVX2. I have code written with AVX intrinsics, basically the _mm256_*_pd() variants that operate on double-precision floating-point values (the ones I use are min, max, shuffle, blend, load, loadu, etc.). However, my data is actually integers, which I load by casting integer pointers to double pointers, i.e. __m256d reg = _mm256_loadu_pd((double*)intPtr). Functionality-wise the code seems to do what I expect, i.e. it sorts the data. However, as I haven't tested it with all sorts of different data, I'm concerned about whether the output will always be correct. What corner cases should I be concerned with? Would the comparisons always be correct, or will there be some integer values where the AVX floating-point comparison would not work?

Thanks for comments and suggestions

Jeffrey_A_Intel · ‎12-13-2012

From IEEE Std 754-2008, section 5.11:

Four mutually exclusive relations are possible: less than, equal, greater than, and unordered. The last case arises when at least one operand is NaN. Every NaN shall compare unordered with everything, including itself.

Thus, comparisons involving integers whose bit pattern matches that of a floating-point NaN would be problematic.
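
To make the problematic patterns concrete, here is a minimal sketch of a check for them. It assumes IEEE 754 binary64 doubles (1 sign bit, 11 exponent bits, 52 fraction bits); the helper name `bits_are_nan` is made up for illustration and is not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes a 64-bit double");

/* Hypothetical helper: true if the 64-bit pattern of v, read as a binary64
   double, encodes a NaN (exponent field all ones, fraction field non-zero). */
static bool bits_are_nan(int64_t v)
{
    const uint64_t bits = (uint64_t)v;
    const uint64_t exponent = (bits >> 52) & 0x7FFu;
    const uint64_t fraction = bits & 0x000FFFFFFFFFFFFFull;
    return exponent == 0x7FFu && fraction != 0;
}
```

Under that assumption, the affected signed values are those above 0x7FF0000000000000 and the negative values from -1 down to -(2^52 - 1), so even a plain -1 (all bits set) is a NaN pattern once it is loaded as a double.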

SergeyKostrov · ‎12-13-2012

Of course you can do an int-to-double cast in order to use AVX, however...

>>...Would the comparisons always be correct, or will there be some integer values where the AVX floating-point comparison
>>would not work?

I would be very careful, because your processing will depend on the limitations of the IEEE 754 standard and, as recommended in many sources, a comparison with an epsilon could be added ( expect a performance impact ). If your tests are deterministic ( no random data ), the accuracy of the processing, I mean based on integers and then based on doubles, could be verified once both outputs are saved. There are single- and double-precision binary format viewers on the web, and you could look at / verify how some integer values look after conversion to the double type.

SergeyKostrov · ‎12-13-2012

>>...Thus, comparisons involving integers whose bit pattern matches that of a floating-point NaN would be problematic...

That looks interesting. Could you give us at least one example where some integer value is converted to a double-precision NaN value?

SergeyKostrov · ‎12-13-2012

>>...Thus, comparisons involving integers whose bit pattern matches that of a floating-point NaN would be problematic...

I'm very surprised when Intel engineers make some statements without any real verification(s) ( sometimes very simple ), like:

[ Test-case ]
...
int iIsNan = 0;
double dValue = -1.0;
double dValueLn = 0.0L;
unsigned __int64 iValue = 0U;

printf( "dValue = %f\n", dValue );
printf( "dValueLn = %f\n", dValueLn );
printf( "iValue = %I64d\n", iValue );

dValueLn = CrtLog( dValue );
printf( "dValueLn = %f\n", dValueLn );

iValue = ( __int64 )dValueLn;
printf( "iValue = %I64d\n", iValue );

iIsNan = _isnan( dValueLn );
if( iIsNan == 0 )
    printf( "dValueLn is Not NaN\n" );
else
    printf( "dValueLn is NaN\n" );

dValue = ( double )iValue;
printf( "dValue = %f\n", dValue );

iValue = 9223372036854775800i64;
dValue = 0.0L;
printf( "iValue = %I64d\n", iValue );
printf( "dValue = %f\n", dValue );

dValue = ( double )iValue;
printf( "dValue = %f\n", dValue );

iIsNan = _isnan( dValue );
if( iIsNan == 0 )
    printf( "dValue is Not NaN\n" );
else
    printf( "dValue is NaN\n" );
...

[ Output ]
dValue = -1.000000
dValueLn = 0.000000
iValue = 0
dValueLn = -1.#IND00
iValue = -9223372036854775808
dValueLn is NaN
dValue = 9223372036854775800.000000
iValue = 9223372036854775800
dValue = 0.000000
dValue = 9223372036854775800.000000
dValue is Not NaN

Please let me know if you find any problems with the test-case.

Best regards,
Sergey

{ UPDATED } Fixed: printf( "iValue = %f\n", iValue ); to printf( "iValue = %I64d\n", iValue );

Patrick_F_Intel1 · ‎12-13-2012

Hello cagribal,

I assume that when you say 'integers' you mean 4-byte signed variables, so 32 bits including one sign bit. The double-precision IEEE significand is 53 bits (52 stored plus one implicit), plus one sign bit. If the question is whether every 32-bit integer value can be converted to double and then converted back to the original integer, the answer is yes. If you are just doing compares (that is, not changing the value of your converted 32-bit ints) in your AVX code, you will not get NaNs, and you will get the compare results you expect (there will be no unordered results).

Pat
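
The 32-bit case can be sketched as follows. This is only an illustration that converts by value (not through a pointer cast) and assumes a binary floating-point double with at least 32 significand bits; the helper names `int32_round_trips` and `int32_less_via_double` are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(FLT_RADIX == 2 && DBL_MANT_DIG >= 32,
              "sketch assumes double can hold every int32_t exactly");

/* Hypothetical helper: value conversion to double and back is lossless. */
static bool int32_round_trips(int32_t v)
{
    const double d = (double)v;   /* exact: at most 32 significant bits */
    return (int32_t)d == v;
}

/* Hypothetical helper: comparing the converted values orders them
   exactly like the original integers. */
static bool int32_less_via_double(int32_t a, int32_t b)
{
    return (double)a < (double)b;
}
```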

SergeyKostrov · ‎12-13-2012

Hi everybody,

There are cases ( I detected 3 so far ) when 64-bit integer ( boundary signed & unsigned ) and double-precision values do not match. Please take a look at cases 2.x:

[ Output ]
Test-Case 1
    dValue = -1.000000
    dValueLn = 0.000000
    iValue = 0
    dValueLn = -1.#IND00
    iValue = -9223372036854775808
    dValueLn is NaN
    dValue = 9223372036854775800.000000
Verifications for Boundary values ( signed and unsigned ) of 64-bit range:
Test-Case 2.1
    iValueS = 9223372036854775807
    dValue = 0.000000
    dValue = 9223372036854775800.000000
    dValue is Not NaN
Test-Case 2.2
    iValueS = -9223372036854775808
    dValue = 0.000000
    dValue = -9223372036854775800.000000
    dValue is Not NaN
Test-Case 2.3
    iValueU = 9223372036854775807
    dValue = 0.000000
    dValue = 9223372036854775800.000000
    dValue is Not NaN
Test-Case 2.4
    iValueU = 0
    dValue = 0.000000
    dValue = 0.000000
    dValue is Not NaN

I'll post source codes of my quick test later after additional verification.

Patrick_F_Intel1 · ‎12-13-2012

64bit integers (if the span of non-zero bits in the 64bit integer is more than 53 bits) cannot be represented without a loss of precision. That is, converting a 64bit integer to double and back to 64bit may or may not give you back the original 64bit integer, depending on how many bits are used in the original 64bit integer. But 32bit integers will be okay.
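
A minimal sketch of that round-trip check, assuming IEEE 754 binary64 and the default round-to-nearest mode. The helper name `int64_round_trips` is hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(DBL_MANT_DIG == 53, "sketch assumes IEEE 754 binary64");

/* Hypothetical helper: true if v survives int64 -> double -> int64 unchanged. */
static bool int64_round_trips(int64_t v)
{
    const double d = (double)v;
    /* Values close to INT64_MAX round up to 2^63, which no longer fits in
       int64_t; converting that back would be undefined behaviour, so it is
       rejected before the cast. */
    if (d >= 9223372036854775808.0)
        return false;
    return (int64_t)d == v;
}
```

With this, (1LL << 53) round-trips, while (1LL << 53) + 1 and (1LL << 55) + 1 do not, because the low bit falls outside the 53-bit significand.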

SergeyKostrov · ‎12-13-2012

>>...64bit integers (if the span of non-zero bits in the 64bit integer is more than 53 bits) cannot be represented without a loss of precision...

Exactly, and this is how it looks:

>>...
>>Test-Case 2.1
>>iValueS = 9223372036854775807
>>...
>>dValue = 9223372036854775800.000000
>>...

Thanks Patrick for the comment!

SergeyKostrov · ‎12-13-2012

>>...What corner cases should I be concerned with?

See Patrick's post for the case with 32-bit integers. There are 2 generic cases when 64-bit integer ( boundary signed & unsigned ) and double-precision values do not match ( a 64-bit value is converted to a 53-bit DP significand, as Patrick mentioned in his post ). You need to verify some range of boundary integer values ( next to the min and max values ).

>>...Would the comparisons always be correct, or will there be some integer values where the AVX floating-point comparison
>>would not work?

Yes, if the precision of the source integer value is not lost during the conversion. Does it make sense?

cagribal · ‎12-13-2012

Hi all, thanks for your replies.

@Patrick: Actually, by integers I meant 64-bit signed integers. So, as I understand it, some 64-bit integers might have the bit pattern of a NaN and might produce an incorrect result. Here are the small test cases that I'm using:

[cpp]
double NaN;
*(uint64_t *)(&NaN) = 0x7FF0000000000001;

// Test.1) Prints "NEQ : nan", as NaN != NaN
if(NaN == NaN)
    printf("EQ : %.20f\n", NaN);
else
    printf("NEQ : %.20f\n", NaN);

double x = 87.0;

// Test.2) Prints "Unordered", as a comparison with a NaN is always unordered
if(NaN < x)
    printf("LT\n");
else if(NaN > x)
    printf("GT\n");
else if(NaN == x)
    printf("EQ\n");
else
    printf("Unordered\n");

// Test.3) Comparisons with AVX; basically min(10, NaN) returns NaN (?)
int64_t arr1[4] = {10, 20, 30, 40};
int64_t arr2[4] = {50, 20, 40, 10};
*(double *)(&arr2[0]) = NaN;

__m256d a = _mm256_loadu_pd((double *) arr1);
__m256d b = _mm256_loadu_pd((double *) arr2);

printf("A = "); p256i(a);   // A = AVXVector: {10 ; 20 ; 30 ; 40}
printf("B = "); p256i(b);   // B = AVXVector: {9218868437227405313 ; 20 ; 40 ; 10}

__m256d ret = _mm256_min_pd (a, b);
printf("MIN = "); p256i(ret);   // MIN = AVXVector: {9218868437227405313 ; 20 ; 30 ; 10}
[/cpp]
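
The result of Test.3 can be reproduced without AVX by modelling a single lane. The sketch below assumes the documented MINPD rule that the second operand is returned whenever the first is not strictly less than it, which includes the case where either operand is NaN. The helper name `min_lane` is hypothetical, and memcpy stands in for the pointer cast.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(int64_t), "sketch assumes a 64-bit double");

/* Hypothetical scalar model of one lane of _mm256_min_pd(a, b) applied to
   reinterpreted 64-bit integers. The comparison is done on the double view,
   the selection on the original bit patterns. */
static int64_t min_lane(int64_t a, int64_t b)
{
    double x, y;
    memcpy(&x, &a, sizeof x);   /* reinterpret the bits, no value conversion */
    memcpy(&y, &b, sizeof y);
    return (x < y) ? a : b;     /* any comparison with a NaN is false, so b wins */
}
```

Under this model the result depends on operand order: min_lane(10, 0x7FF0000000000001) yields the NaN pattern, as in the MIN output above, whereas min_lane(0x7FF0000000000001, 10) yields 10.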

Patrick_F_Intel1 · ‎12-13-2012

Hello Cagribal,

Yes, one can certainly generate double-precision NaNs from 64-bit bit patterns. And one can generate 64-bit ints which won't convert to doubles without loss of precision, such as bigint = (1LL << 55) + 1. Back in my old PhD days, there were whole sections dedicated to what can and can't be represented or converted and back. You will need to check that your 64-bit integer ranges do not exceed the 53-bit mantissa of the double-precision value.

Pat

SergeyKostrov · ‎12-13-2012

Hi everybody,

>>...I'll post source codes of my quick test later after additional verification...

Here it is:

...
int iIsNaN = 0;

// Test-Case 1
printf( "Test-Case 1\n" );

double dValue = -1.0;
double dValueLn = 0.0L;
unsigned __int64 iValue = 0U;

printf( "\tdValue = %f\n", dValue );
printf( "\tdValueLn = %f\n", dValueLn );
printf( "\tiValue = %I64d\n", iValue );

dValueLn = CrtLog( dValue );
printf( "\tdValueLn = %f\n", dValueLn );

iValue = ( unsigned __int64 )dValueLn;
printf( "\tiValue = %I64d\n", iValue );

iIsNaN = _isnan( dValueLn );
if( iIsNaN == 0 )
    printf( "\tdValueLn is Not NaN\n" );
else
    printf( "\tdValueLn is NaN\n" );

dValue = ( double )iValue;
printf( "\tdValue = %f\n", dValue );

printf( "Verifications for Boundary values ( Signed and UnSigned ) of 64-bit range:\n" );

__int64 iValueS = 0LL;
unsigned __int64 iValueU = 0ULL;

// Test-Case 2.1
printf( "Test-Case 2.1\n" );

iValueS = ( 9223372036854775807LL );
dValue = 0.0L;
printf( "\tiValueS = %I64d\n", iValueS );
printf( "\tdValue = %f\n", dValue );

dValue = ( double )iValueS;
printf( "\tdValue = %f\n", dValue );

iIsNaN = _isnan( dValue );
if( iIsNaN == 0 )
    printf( "\tdValue is Not NaN\n" );
else
    printf( "\tdValue is NaN\n" );

// Test-Case 2.2
printf( "Test-Case 2.2\n" );

iValueS = ( -9223372036854775807LL - 1 );
dValue = 0.0L;
printf( "\tiValueS = %I64d\n", iValueS );
printf( "\tdValue = %f\n", dValue );

dValue = ( double )iValueS;
printf( "\tdValue = %f\n", dValue );

iIsNaN = _isnan( dValue );
if( iIsNaN == 0 )
    printf( "\tdValue is Not NaN\n" );
else
    printf( "\tdValue is NaN\n" );

// Test-Case 2.3
printf( "Test-Case 2.3\n" );

iValueU = ( 9223372036854775807ULL );
dValue = 0.0L;
printf( "\tiValueU = %I64d\n", iValueU );
printf( "\tdValue = %f\n", dValue );

dValue = ( double )iValueU;
printf( "\tdValue = %f\n", dValue );

iIsNaN = _isnan( dValue );
if( iIsNaN == 0 )
    printf( "\tdValue is Not NaN\n" );
else
    printf( "\tdValue is NaN\n" );

// Test-Case 2.4
printf( "Test-Case 2.4\n" );

iValueU = ( 0ULL );
dValue = 0.0L;
printf( "\tiValueU = %I64d\n", iValueU );
printf( "\tdValue = %f\n", dValue );

dValue = ( double )iValueU;
printf( "\tdValue = %f\n", dValue );

iIsNaN = _isnan( dValue );
if( iIsNaN == 0 )
    printf( "\tdValue is Not NaN\n" );
else
    printf( "\tdValue is NaN\n" );
...

SergeyKostrov · ‎12-13-2012

>>...it is possible that some 64-bit integer might have bit pattern of NaN and might result in an incorrect result...

I'll do a couple of tests and I'll be back. Thanks guys for that really nice discussion!

Bernard · ‎12-13-2012

>>>Exactly, and this is how it looks:
>>...
>>Test-Case 2.1
>>iValueS = 9223372036854775807
>>...
>>dValue = 9223372036854775800.000000
>>...

Please bear in mind that the exact implementation of printf() (I mean the formatting performed by this function) should also be taken into account when primitive types are converted from one type to another. The best example of such a conversion, albeit not applicable to your case, is the reduction of the 80-bit long double type to 64 bits, which is performed by the MSVCRT printf() function.

Jeffrey_A_Intel · ‎12-13-2012

>>I'm very surprised when Intel engineers make some statements without any real verification(s)...

Perhaps you missed this part of the original post:

>>However my data is actually integers, which I load by casting integer pointers to double pointers...

If one of those "doubles" now points to 64 bits which have the long int value 92211202370041090560 (= 0x7ff8000000000000), it will be interpreted as a (quiet) NaN, and it will compare as "unordered" with any other value.

Jeffrey_A_Intel · ‎12-13-2012

Make that 9221120237041090560 and not 92211202370041090560.
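
NaN patterns are not the only thing to watch for when integers are reinterpreted this way. The sketch below is an illustration rather than something taken from the posts above: it assumes IEEE 754 binary64 with denormals enabled (no flush-to-zero or denormals-are-zero mode), and the helper name `less_when_reinterpreted` is hypothetical. Because doubles are sign-magnitude while int64_t is two's complement, negative integers do not keep their integer order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(int64_t), "sketch assumes a 64-bit double");

/* Hypothetical helper: the result of a floating-point "less than" when both
   integers are reinterpreted bit-for-bit as doubles (memcpy stands in for
   the pointer cast used with _mm256_loadu_pd). */
static bool less_when_reinterpreted(int64_t a, int64_t b)
{
    double x, y;
    memcpy(&x, &a, sizeof x);
    memcpy(&y, &b, sizeof y);
    return x < y;
}
```

Three consequences under those assumptions: INT64_MIN has the bit pattern of -0.0 and therefore compares equal to 0; INT64_MIN + 1 and INT64_MIN + 2 compare in the reverse of their integer order; and -1 is a NaN pattern, so it is neither less than nor greater than 0. Non-negative integers below 0x7FF0000000000000 keep their order.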

SergeyKostrov · ‎12-13-2012

>>...in mind that exact implementation of printf()(I mean here some kind of formatting performed by this function) should be also taken into account...

It affects only how the value is displayed, not how it is stored.

SergeyKostrov · ‎12-13-2012

>>...I'll do a couple of tests and I'll be back...

Here is a small Test-Case 1.2:

...
// Test-Case 1.2
printf( "Test-Case 1.2\n" );

unsigned __int64 iNaNIntValue = 0ULL;
// iNaNIntValue = 0x1020304050607080;

dValueLn = 0;
iNaNIntValue = 18444492273895866368ULL;    // 0xfff8000000000000 = NaN-raw-value ( binary representation )
dValueLn = ( double )iNaNIntValue;

iIsNaN = _isnan( dValueLn );
if( iIsNaN == 0 )
    printf( "\tdValueLn is Not NaN\n" );
else
    printf( "\tdValueLn is NaN\n" );
...

When debugging, this is how the variables look in the Visual Studio 'Memory' window:

[ 'double' with NaN value ]
... 00 00 00 00 00 00 f8 ff ...

[ '__int64' after assignment from 'double' with NaN value ]
... 00 00 00 00 00 00 00 80 ...

So, it looks like a developer should watch out for the value 0xfff8000000000000, or 18444492273895866368. That is not the case, though; let me continue.

Next, if a developer converts it back to 'double', the result is 0x43efff0000000000, or 4895411695440101376, and that is done by the C++ compiler (!). It looks like magic, but actually there are no uncertainties here: only 53 bits (!) are copied into the mantissa, and the part of the 64-bit integer that is "responsible" for the NaN code is not re-created in the 'double'.

So, it is not possible to create a NaN value in a double-precision variable from a 64-bit integer variable by doing a simple cast, like:

...
dValueLn = 0;
iNaNIntValue = 18444492273895866368ULL;
dValueLn = ( double )iNaNIntValue;
...

unless a developer copies these 8 bytes directly with the 'memcpy' CRT function.

SergeyKostrov · ‎12-13-2012

>>...
>>unless a developer copies these 8 bytes with a 'memcpy' CRT function directly.

Something like this:

...
// Test-Case 1.3
printf( "Test-Case 1.3\n" );

void *pdValueLn = &dValueLn;
void *piNaNIntValue = &iNaNIntValue;

memcpy( ( void * )pdValueLn, ( const void * )piNaNIntValue, 8 );

iIsNaN = _isnan( dValueLn );
if( iIsNaN == 0 )
    printf( "\tdValueLn is Not NaN\n" );
else
    printf( "\tdValueLn is NaN\n" );
...

[ Output ]
...
Test-Case 1.3
    dValueLn is NaN
...

Once again, it is not possible to create a NaN value in a double-precision variable from a 64-bit integer variable by doing a simple cast.
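
The difference between the two operations can be put side by side in portable C. This is a sketch assuming IEEE 754 binary64; the helper names are hypothetical, and `d != d` is used as the NaN test since a NaN compares unordered even with itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes a 64-bit double");

/* Value conversion: the integer's numeric value is rounded to the nearest
   double. Returns the bit pattern of the resulting double. */
static uint64_t bits_after_value_cast(uint64_t v)
{
    const double d = (double)v;
    uint64_t out;
    memcpy(&out, &d, sizeof out);
    return out;
}

/* Value conversion never produces a NaN from an integer. */
static bool nan_after_value_cast(uint64_t v)
{
    const double d = (double)v;
    return d != d;
}

/* Bit copy: the same 8 bytes are reinterpreted, which is also what a load
   through a pointer cast to double* does. */
static bool nan_after_bit_copy(uint64_t v)
{
    double d;
    memcpy(&d, &v, sizeof d);
    return d != d;
}
```

For 0xfff8000000000000 the value cast gives the bit pattern 0x43efff0000000000 and no NaN, while the bit copy gives a NaN. Loading through a pointer cast, as in the original post, behaves like the bit copy, not like the value cast.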

Bernard · ‎12-13-2012

>>>It affects only how the value is displayed not as how it is stored.>>>

Yes, but the stored value is encoded by the compiler and/or the hardware, so the compiler vendor can implement it differently. Look at the case of Intel's primitive long double type and its truncation to the 64-bit double-precision type.