Concerns on using AVX double floating point instructions for integer data

cagribal · ‎12-13-2012

Hi all,

As you might know, AVX does not provide 256-bit integer arithmetic or comparison instructions; those are planned to arrive with AVX2. I have code written with AVX intrinsics that uses the _mm256_*_pd() variants, which operate on double-precision floating-point values (the instructions I use are min, max, shuffle, blend, load, loadu, etc.). However, my data is actually integers, which I load by casting integer pointers to double pointers, i.e. __m256d reg = _mm256_loadu_pd((double*)intPtr). Functionally the code seems to do what I expect, i.e. it sorts the data. However, as I haven't tested it with all sorts of different data, I'm concerned about whether the output will always be correct. What corner cases should I be concerned with? Will the comparisons always be correct, or are there integer values for which the AVX floating-point comparison would not work?

Thanks for comments and suggestions

SergeyKostrov · ‎12-13-2012

>>...Yes, but the stored value is encoded by the compiler and/or hardware so the compiler's vendor can implement it differently...

No, not when it comes to conversion from int to double in accordance with the IEEE 754 standard, unless some vendor violates that standard.

Bernard · ‎12-13-2012

Only when the IEEE 754 standard is concerned. Moreover, you must also take into account the unpredictable possibility of hardware clock inaccuracies and/or data (memory) bus timing errors, which could pollute the results with random values. I know that I'm too rigorous here :), but such hardware-related errors are quite possible.

Patrick_F_Intel1 · ‎12-13-2012

If you are getting errors like random memory values or hw units clock inaccuracies (not sure what that means), then the hardware has bigger problems than can be addressed here.

Jeffrey_A_Intel · ‎12-13-2012

>>...Once again, this is not possible to create a NaN value in a double precision variable from a 64-bit integer variable by doing a simple cast.

Of course. The original question however involved casting pointers, not data values: _mm256_loadu_pd((double*)intPtr).

Bernard · ‎12-13-2012

>>>If you are getting errors like random memory values or hw units clock inaccuracies (not sure what that means), then the hardware has bigger problems than can be addressed here.>>>

Yes, in the past I experienced such behaviour with a faulty CPU.

>>>hw units clock inaccuracies (not sure what that means)>>>

I mean minuscule shifts in the phase of the clock frequency.

>>>then the hardware has bigger problems than can be addressed here.>>>

I know that pretty well. My intention was to emphasize the fact that sometimes a wrong result when converting between the primitive types could stem from a hardware error.

SergeyKostrov · ‎12-13-2012

[ From Jeff ]
>>...Of course. The original question however involved casting pointers, not data values...

Jeff, sorry for repeating this statement made by cagribal:

>>...my data is actually integers, which I load by casting integer pointers to double pointers...

After the data is loaded, cagribal does some processing, and his concern is related to, I would say, "unsafe" comparisons, or the correctness of comparisons of double-precision data values, not pointers.

Best regards, Sergey

Patrick_F_Intel1 · ‎12-13-2012

There are 2 cases:

1) Casting an int64 to a double. This always works and never generates a NaN, but you can lose precision.
2) Casting an int64 pointer to a double pointer (which is basically a memcpy(&double_var, &int64_var, 8); ). This also always 'works' but can generate a NaN. Basically you are not converting an int64 to a double, you are just copying bits.

I say 'not converting an int64 to double' because, unless your int64 bit pattern just happens to also be the correct 64-bit double encoding, you are not going to get the correct double encoding for your int64 number. Does that make sense?

Pat
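
The two cases above can be sketched in portable C roughly as follows. This is only an illustration, not code from the thread: the function names are made up for the example, and it assumes a platform where double is the 64-bit IEEE 754 format.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Case 1: value conversion, which is what a C cast of the value does.
   The result is never a NaN, but large values may be rounded. */
double int64_to_double_by_cast(int64_t v)
{
    return (double)v;
}

/* Case 2: bit copy, which is what loading through a casted pointer
   amounts to. The 64 bits are reinterpreted, not converted. */
double int64_to_double_by_copy(int64_t v)
{
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

/* Returns 1 if the bit pattern of v, read as a double, is a NaN
   (exponent bits 62-52 all ones and a non-zero fraction). */
int bit_copy_is_nan(int64_t v)
{
    return isnan(int64_to_double_by_copy(v)) != 0;
}
```

For example, the integer -1 has all 64 bits set, so its bit copy is a NaN, while (double)-1 is simply -1.0.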

cagribal · ‎12-14-2012

Hi all,

Thanks for the comments. Patrick has clearly summarized all the cases. However, the questions I still have are:

a) Why do AVX _mm256_min_pd() and _mm256_max_pd() return NaN for comparisons with a NaN? (Please see Test 3 in the code snippet I posted above.)

b) My understanding is that if the integers do not contain all 1's in the exponent field, i.e. bits 62-52, then all double comparisons over the raw bits (treated as double by copying or so) will always be correct. The implication is that by restricting my integers to use at most 62 bits (i.e. by leaving the MSB exponent bit always 0), I can ensure that comparisons will always be correct. Any comments on this?

Thanks,
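
One way to probe point b) is a small scalar check like the sketch below. The helper names are hypothetical, and it assumes non-negative integers and the default floating-point environment (no DAZ/FTZ); with denormals-are-zero enabled, small integers whose bit patterns are subnormal doubles may all compare as zero.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 64 bits of v as a double (no value conversion). */
static double bits_as_double(int64_t v)
{
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

/* Hypothetical range check: non-negative with bit 62 clear,
   i.e. 0 <= v < 2^62, so the exponent field can never be all ones. */
int in_62bit_range(int64_t v)
{
    return v >= 0 && v < (INT64_C(1) << 62);
}

/* Returns 1 when comparing the raw bits as doubles orders a and b
   the same way as comparing them as integers. */
int order_agrees(int64_t a, int64_t b)
{
    double da = bits_as_double(a);
    double db = bits_as_double(b);
    return (a < b) == (da < db) && (a == b) == (da == db);
}
```

Negative integers fall outside this range for a reason: small negative values such as -1 have the sign bit and every exponent bit set, so their bit patterns are NaNs and order_agrees(-1, 0) reports a mismatch.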

TimP · ‎12-14-2012

a) IEEE754 defines a comparison against NaN to return NaN. These are floating point operations.

b) I suppose you must ensure the DAZ bit is set correctly in order to use comparisons on values with zero exponent bits.

Jeffrey_A_Intel · ‎12-14-2012

According to the instruction set reference manual (see the description of the VMINPD instruction which is what the documentation says the _mm256_min_pd intrinsic generates):

If a value in the second operand is an SNaN, that SNaN is forwarded unchanged to the destination (that is, a QNaN version of the SNaN is not returned). If only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result.

In your case, one of the elements of the 2nd operand is a NaN, so that NaN is forwarded to the destination operand.

As to your second point: depending on your floating-point environment, subnormals (exponent == 0, significand != 0; i.e., non-zero integers with "small" absolute value) might cause exceptions to be raised. I don't know what would happen if you have "flush-to-zero" enabled and you compare two vectors of small, non-zero integers. I'm sure the behavior is defined; I just don't know what it is.

Patrick_F_Intel1 · ‎12-14-2012

Hello cagribal,

Adding a little more:

For a), I assume that it is part of the IEEE floating-point (754?) standard to return a NaN if you are comparing NaNs.

For b), it depends on what you mean by 'correct'.

1) If you are just casting int64 to double and you test that the int64 value isn't > 52 bits, then the value will be correct.
2) If you are copying (instead of casting), then your result will probably be wrong even if you are only using bits 0-52.

Here is an example of 'copying an int64 to double' not working... not working in the sense that the number in the double does not equal the number in the int64. Using msvc:

C:\tst>type fltpt.c
[cpp]
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    double x, y;
    long long int myll;
    unsigned long long bits; /* holds a double's raw encoding for printing */

    myll = (0x3LL << 40) + 1;
    printf("myll = 0x%llx, %lld\n", myll, myll);

    /* Value conversion: x holds the same number as myll. */
    x = (double)myll;
    memcpy(&bits, &x, sizeof(bits));
    printf("dbl x val by casting= %f, in hex= 0x%llx\n", x, bits);

    /* Bit copy: y receives myll's bit pattern, not its value. */
    memcpy(&y, &myll, sizeof(y));
    //y = *(double *)(long long int *)&myll; // this line is same as memcpy above
    memcpy(&bits, &y, sizeof(bits));
    printf("dbl y val by copying= %f, in hex= 0x%llx\n", y, bits);
    return 0;
}
[/cpp]
[plain]
C:\tst>fltpt.exe
myll = 0x30000000001, 3298534883329
dbl x val by casting= 3298534883329.000000, in hex= 0x4288000000000800
dbl y val by copying= 0.000000, in hex= 0x30000000001
[/plain]

You can see a description of what happens during the 'int64->dbl' casting at http://www.cs.binghamton.edu/~reckert/220/floatpt.htm

Pat

SergeyKostrov · ‎12-14-2012

cagribal,

What is the chance that your number(s) will be greater than 2^53? Please ask yourself. Then, if you're not counting the number of atoms in the Universe ( ~10^80 ), a definition of two ranges, safe and not-safe, for your numbers should bring clarity to your uncertainties.

Let's say you have a data set. Define safe and not-safe ranges. Pre-scan the data set and verify that all numbers are in the safe range, and only after that do the rest of the processing. If some numbers are not safe, then create a vector of the not-safe numbers and save the indexes of these numbers for additional analysis. If you don't need to do the additional analysis, then simply clamp all unsafe numbers to the max or min value of the safe range.

This is what I would do, and I use that solution in a real implementation of a Pigeonhole Sorting algorithm to sort only positive integer numbers. I would move ahead with a practical implementation of the needed processing and, as I already mentioned, I would define safe and not-safe ranges first of all.

Also, if your software is mission critical ( healthcare, finance, defense, aerospace, etc ), then the problem has to be treated seriously, with as many verifications as possible by different software developers. If your software is not mission critical ( R&D, thesis, do-it-because-have-nothing-else-to-do, etc ), some number of simple verifications will provide everything you need.

Best regards, Sergey
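
The pre-scan described above might look something like the following sketch. The function names and the choice of passing the safe range as lo/hi parameters are assumptions made for illustration; what counts as "safe" depends on how the data is used afterwards.

```c
#include <stddef.h>
#include <stdint.h>

/* Scans data[0..n) and records the index of every value outside the
   inclusive safe range [lo, hi] in unsafe_idx, which must have room for
   n entries. Returns the number of unsafe values found. */
size_t find_unsafe(const int64_t *data, size_t n,
                   int64_t lo, int64_t hi, size_t *unsafe_idx)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (data[i] < lo || data[i] > hi)
            unsafe_idx[count++] = i;
    }
    return count;
}

/* Alternative when no further analysis is needed: clamp every value
   into the inclusive safe range [lo, hi] in place. */
void clamp_to_range(int64_t *data, size_t n, int64_t lo, int64_t hi)
{
    for (size_t i = 0; i < n; ++i) {
        if (data[i] < lo)
            data[i] = lo;
        else if (data[i] > hi)
            data[i] = hi;
    }
}
```

A caller would run find_unsafe once before the sort and only proceed to the vectorized path when it returns 0, or after clamping.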

Bernard · ‎12-14-2012

@Sergey Great post.

Patrick_F_Intel1 · ‎12-15-2012

Not to beat a horse to death but...

[plain]
What is the chance that your number(s) will be greater than 2^53? Please ask yourself. Then, if you're not counting the number of atoms in the Universe ( ~10^80 ), a definition of two ranges, safe and not-safe, for your numbers should bring clarity to your uncertainties.
[/plain]

In a double-precision number you have about 15 digits of precision. The US GDP is $15 trillion (14 digits). In Indian rupees, the number exceeds the precision of a double. So it is actually not too hard to exceed the number of significant digits in a double... depending on the area in which one is working.

The rest of the advice is pretty good. I was assuming that cagribal was just loading the INTs into AVX for sorting (so no modification of the data... pure-read access). If this is true then he can do simple range checking when he gets ready to sort the data.

Pat
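
To make the precision limit concrete: a double has a 53-bit significand, so 2^53 + 1 is the first positive integer that a value cast cannot represent. A minimal check, with a made-up helper name and assuming IEEE 754 doubles, could be:

```c
#include <stdint.h>

/* Returns 1 if v survives the round trip int64 -> double -> int64
   unchanged, i.e. the value cast loses no precision. */
int cast_is_exact(int64_t v)
{
    double d = (double)v;
    /* Values near INT64_MAX round up to 2^63, which does not fit back
       into an int64_t, so treat that case as inexact without converting. */
    if (d >= 9223372036854775808.0)
        return 0;
    return (int64_t)d == v;
}
```

Here cast_is_exact(9007199254740992) (2^53) holds, while cast_is_exact(9007199254740993) does not, because the cast rounds 2^53 + 1 back down to 2^53.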

SergeyKostrov · ‎12-15-2012

Patrick,

I did a search in the Intel(R) AVX compiler intrinsics header immintrin.h, and I wonder if another intrinsic function could be used instead of:

...
__m256d reg = _mm256_loadu_pd( ( double * )intPtr );
...

Since there is a union __m256i, I would expect an intrinsic function that does a similar operation, like:

...
__m256i reg = _mm256_load?_???64?( ( __int64 * )intPtr );
...

Could _mm256_set1_epi64x ( or some similar intrinsic function ) do the same without all the issues & problems related to the __int64-to-double cast?

Patrick_F_Intel1 · ‎12-16-2012

Hey Sergey, I'm not quite sure I understand... the __int64-to-double cast is working as expected (as far as I can tell). Other than having int64 AVX instructions, what would you like the new intrinsic to do? Pat

SergeyKostrov · ‎12-16-2012

>>... what would you like the new intrinsic to do?

Exactly the same operation, that is, to load 4 __int64 values into the reg variable of type __m256i. Instead of

__m256d reg = _mm256_loadu_pd( ( double * )intPtr );

to use this

__m256i reg = _mm256_load?_???64?( ( __int64 * )intPtr );

Could you take a look at the declarations of the __m256d and __m256i C unions in the immintrin.h header file?
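
For reference, plain AVX does include 256-bit integer loads: _mm256_loadu_si256 (and the aligned _mm256_load_si256) fill a __m256i, and _mm256_castsi256_pd reinterprets that register as __m256d without converting anything. The sketch below is one possible way to combine them; the wrapper name is hypothetical, and the non-AVX branch is only a fallback added so the example builds without AVX enabled. Note that this only tidies up the pointer cast: the _pd min/max instructions would still treat the bits as doubles, so the NaN and subnormal concerns discussed earlier remain.

```c
#include <stdint.h>
#include <string.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Loads four 64-bit integers and exposes their raw bits as four doubles.
   No value conversion takes place on either path. */
void load4_bits_as_pd(const int64_t *src, double out[4])
{
#if defined(__AVX__)
    __m256i vi = _mm256_loadu_si256((const __m256i *)src); /* unaligned integer load */
    __m256d vd = _mm256_castsi256_pd(vi);                  /* reinterpret, no instruction needed */
    _mm256_storeu_pd(out, vd);
#else
    /* Portable fallback (assumption for builds without AVX): plain bit copy. */
    memcpy(out, src, 4 * sizeof(double));
#endif
}
```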