Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Concerns about using AVX double-precision floating-point instructions for integer data

cagribal
Beginner

Hi all,

As you might know, AVX does not provide 256-bit integer instructions; those are planned to arrive with AVX2. I have code written with AVX instructions that basically uses the _mm256_*_pd() variants, which operate on double-precision floating-point values (the instructions I use are min, max, shuffle, blend, load, loadu, etc.). However, my data is actually integers, which I load by casting integer pointers to double pointers, i.e. __m256d reg = _mm256_loadu_pd((double*)intPtr), etc. Functionality-wise, the code seems to do what I expect, i.e. it sorts the data. However, as I haven't tested with all sorts of different data, I'm concerned whether the output will always be correct. What corner cases should I be concerned about? Will the comparisons always be correct, or are there integer values for which the AVX floating-point comparison would not work?

Thanks for any comments and suggestions.

37 Replies
SergeyKostrov
Valued Contributor II
>>...Yes, but the stored value is encoded by the compiler and/or hardware, so the compiler's vendor can implement it differently... No, not when it comes to conversion from int to double, which follows the IEEE 754 standard, unless some vendor violates that standard.
Bernard
Valued Contributor I
Only as far as the IEEE 754 standard is concerned. Moreover, you must also take into account the unpredictable possibility of hardware clock inaccuracies and/or data (memory) bus timing errors, which could pollute the results with random values. I know that I'm being too rigorous here :), but such hardware-related errors are quite possible.
Patrick_F_Intel1
Employee
If you are getting errors like random memory values or hw units clock inaccuracies (not sure what that means), then the hardware has bigger problems than can be addressed here.
Jeffrey_A_Intel
Employee
>>Once again, it is not possible to create a NaN value in a double-precision variable from a 64-bit integer variable by doing a simple cast.
Of course. The original question, however, involved casting pointers, not data values: _mm256_loadu_pd((double*)intPtr).
Bernard
Valued Contributor I
>>>If you are getting errors like random memory values or hw units clock inaccuracies (not sure what that means), then the hardware has bigger problems than can be addressed here.>>> Yes, in the past I experienced such behaviour with a faulty CPU. >>>hw units clock inaccuracies (not sure what that means)>>> I mean minuscule shifts in the phase of the clock signal. >>>then the hardware has bigger problems than can be addressed here.>>> I know that pretty well. My intention was to emphasize that a wrong result when converting between primitive types could sometimes stem from a hardware error.
SergeyKostrov
Valued Contributor II
[ From Jeff ] >>...Of course. The original question however involved casting pointers, not data values... Jeff, sorry for repeating the statement made by cagribal: >>...my data is actually integers, which I load by casting integer pointers to double pointers... After the data is loaded, cagribal does some processing, and his concern is about, I would say, "unsafe" comparisons, i.e. the correctness of comparisons of double-precision data values, not pointers. Best regards, Sergey
Patrick_F_Intel1
Employee
There are 2 cases: 1) Casting an int64 to a double. This always works and never generates a NaN, but you can lose precision. 2) Casting an int64 pointer to a double pointer (which is basically memcpy(&double_var, &int64_var, 8);). This also always 'works' but can generate a NaN. Basically, you are not converting an int64 to a double; you are just copying bits. I say 'not converting an int64 to a double' because, unless your int64 bit pattern just happens to also be the correct 64-bit double encoding of the same number, you are not going to get the correct double encoding for your int64 number. Does that make sense? Pat
cagribal
Beginner
Hi all, Thanks for the comments. Patrick has clearly summarized all the cases. However, I still have these questions: a) Why do the AVX _mm256_min_pd() and _mm256_max_pd() intrinsics return NaN for comparisons with a NaN? (Please see Test 3 in the code snippet I posted above.) b) My understanding is that if the integers do not contain all 1's in the exponent field, i.e. bits 62-52, then all double comparisons over the raw bits (treated as double by copying or similar) will always be correct. The implication is that by restricting my integers to at most 62 bits (i.e., by always leaving the MSB of the exponent, bit 62, at 0), I can ensure that the comparisons will always be correct. Any comments on this? Thanks,
TimP
Honored Contributor III
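a) IEEE 754 defines any comparison against a NaN as unordered, and min/max operations built on such comparisons can pass a NaN through to the result; these are floating-point operations. b) I suppose you must ensure the DAZ bit is set correctly to use comparisons on values whose exponent bits are all zero.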
Jeffrey_A_Intel
Employee
According to the instruction set reference manual (see the description of the VMINPD instruction, which is what the documentation says the _mm256_min_pd intrinsic generates):
If a value in the second operand is an SNaN, that SNaN is forwarded unchanged to the destination (that is, a QNaN version of the SNaN is not returned). If only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result.
In your case, one of the elements of the 2nd operand is a NaN, so that NaN is forwarded to the destination operand. As to your second point: depending on your floating-point environment, subnormals (exponent == 0, significand != 0; i.e., non-zero integers with "small" absolute value) might cause exceptions to be raised. I don't know what would happen if you have "flush-to-zero" enabled and you compare two vectors of small, non-zero integers. I'm sure the behavior is defined; I just don't know what it is.
Patrick_F_Intel1
Employee
Hello cagribal, Adding a little more: for a), I assume that returning a NaN when you compare against NaNs is part of the IEEE floating-point (754?) standard. For b), it depends on what you mean by 'correct'. 1) If you are just casting int64 to double and you check that the int64 value doesn't need more than 52 bits, then the value will be correct. 2) If you are copying (instead of casting), then your result will probably be wrong even if you are only using bits 0-52. Here is an example of 'copying an int64 to a double' not working... not working in the sense that the number in the double does not equal the number in the int64. Using msvc:
C:\tst>type fltpt.c
[cpp]
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    double x, y;
    long long int myll;
    unsigned long long bits; /* raw bit pattern of a double, for printing in hex */

    myll = (0x3LL << 40) + 1;
    printf("myll = 0x%llx, %lld\n", (unsigned long long)myll, myll);

    x = (double)myll;                /* value conversion */
    memcpy(&bits, &x, sizeof(bits));
    printf("dbl x val by casting= %f, in hex= 0x%llx\n", x, bits);

    memcpy(&y, &myll, sizeof(y));    /* bit copy, no conversion */
    /* y = *(double *)&myll; has the same effect here, but breaks strict-aliasing rules */
    memcpy(&bits, &y, sizeof(bits));
    printf("dbl y val by copying= %f, in hex= 0x%llx\n", y, bits);
    return 0;
}
[/cpp]
[plain]
C:\tst>fltpt.exe
myll = 0x30000000001, 3298534883329
dbl x val by casting= 3298534883329.000000, in hex= 0x4288000000000800
dbl y val by copying= 0.000000, in hex= 0x30000000001
[/plain]
You can see a description of what happens during the 'int64->dbl' casting at http://www.cs.binghamton.edu/~reckert/220/floatpt.htm Pat
SergeyKostrov
Valued Contributor II
cagribal, What is the chance that your number(s) will be greater than 2^53? Please ask yourself. Then, unless you're counting the number of atoms in the Universe (~10^80), defining two ranges for your numbers, a safe one and an unsafe one, should clear up your uncertainties. Let's say you have a data set. Define the safe and unsafe ranges. Pre-scan the data set and verify that all numbers are in the safe range, and only after that do the rest of the processing. If some numbers are unsafe, then create a vector of the unsafe numbers and save all their indexes for additional analysis. If you don't need the additional analysis, then simply clamp all unsafe numbers to the max or min value of the safe range. This is what I would do, and I use that solution in a real implementation of a Pigeonhole Sorting algorithm to sort only positive integer numbers. I would move ahead with a practical implementation of the needed processing and, as I already mentioned, I would define the safe and unsafe ranges first of all. Also, if your software is mission critical (healthcare, finance, defense, aerospace, etc.), then the problem has to be treated seriously, with as many verifications as possible by different software developers. If your software is not mission critical (R&D, thesis, do-it-because-have-nothing-else-to-do, etc.), some number of simple verifications will provide everything you need. Best regards, Sergey
Bernard
Valued Contributor I
@Sergey Great post.
Patrick_F_Intel1
Employee
Not to beat a dead horse, but... [plain] What is a chance that your number(s) will be greater than 2^53? Please ask yourself. Then, if you're not counting number of atoms in the Universe ( ~10^80 ) than a definition of two ranges, that is a safe and not-safe, for my numbers should bring clarity to your uncertanties. [/plain] In a double-precision number you have about 15 digits of precision. The US GDP is $15 trillion (14 digits). In Indian rupees, the number exceeds the precision of a double. So it is actually not too hard to exceed the number of significant digits in a double... depending on the area in which one is working. The rest of the advice is pretty good. I was assuming that cagribal was just loading the INTs into AVX for sorting (so no modification of the data... read-only access). If this is true, then he can do simple range checking when he gets ready to sort the data. Pat
SergeyKostrov
Valued Contributor II
Patrick, I did a search in the Intel(R) AVX compiler intrinsics header immintrin.h, and I wonder whether another intrinsic function could be used instead of: ... __m256d reg = _mm256_loadu_pd( ( double * )intPtr ); ... Since there is a union __m256i, I would expect an intrinsic function that does a similar operation, like: ... __m256i reg = _mm256_load?_???64?( ( __int64 * )intPtr ); ... Could _mm256_set1_epi64x (or some similar intrinsic function) do the same without all the issues and problems related to the __int64-to-double cast?
Patrick_F_Intel1
Employee
Hey Sergey, I'm not quite sure I understand... the __int64-to-double cast is working as expected (as far as I can tell). Other than having int64 AVX instructions, what would you like the new intrinsic to do? Pat
SergeyKostrov
Valued Contributor II
>>... what would you like the new intrinsic to do? Exactly the same operation, that is, to load four __int64 values into the reg variable, but as type __m256i. Instead of __m256d reg = _mm256_loadu_pd( ( double * )intPtr ); I would use __m256i reg = _mm256_load?_???64?( ( __int64 * )intPtr ); Could you take a look at the declarations of the __m256d and __m256i C unions in the immintrin.h header file?