Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Concerns on using AVX double floating point instructions for integer data

cagribal
Beginner
8,711 Views

Hi all,

As you might know, AVX does not provide instructions for integer types; those are planned to arrive with AVX2. I have code written using AVX instructions, which basically uses the _mm256_*_pd() variants that operate on double-precision floating-point values (the instructions I use are min, max, shuffle, blend, load, loadu, etc.). However, my data is actually integers, which I load by casting integer pointers to double pointers, i.e. __m256d reg = _mm256_loadu_pd((double*)intPtr) etc. Functionality-wise the code seems to do what I expect, i.e. it sorts the data. However, as I haven't tested with all sorts of different data, I'm concerned whether the output will always be correct. What corner cases should I be concerned with? Will the comparisons always be correct, or are there integer values for which the AVX floating-point comparison would not work?

Thanks for comments and suggestions

0 Kudos
37 Replies
SergeyKostrov
Valued Contributor II
3,235 Views
>>...Yes, but the stored value is encoded by the compiler and/or hardware, so the compiler's vendor can implement it differently...
No, not when it comes to a conversion from int to double in accordance with the IEEE 754 standard, unless some vendor violates that standard.
Bernard
Valued Contributor I
3,234 Views
Only where the IEEE 754 standard is concerned. Moreover, you must also take into account the unpredictable possibility of hardware clock inaccuracies and/or data (memory) bus timing errors, which could pollute the results with random values. I know that I'm too rigorous here :), but such hardware-related errors are quite possible.
Patrick_F_Intel1
Employee
3,234 Views
If you are getting errors like random memory values or hw units clock inaccuracies (not sure what that means), then the hardware has bigger problems than can be addressed here.
Jeffrey_A_Intel
Employee
3,235 Views
Once again, this is not possible to create a NaN value in a double precision variable from a 64-bit integer variable by doing a simple cast.
Of course. The original question however involved casting pointers, not data values: _mm256_loadu_pd((double*)intPtr).
Bernard
Valued Contributor I
3,235 Views
>>>If you are getting errors like random memory values or hw units clock inaccuracies (not sure what that means), then the hardware has bigger problems than can be addressed here.>>>
Yes, in the past I experienced such behaviour with a faulty CPU.
>>>hw units clock inaccuracies (not sure what that means)>>>
I mean minuscule shifts in the phase of the clock signal.
>>>then the hardware has bigger problems than can be addressed here.>>>
I know that pretty well. My intention was to emphasize that a wrong result when converting between primitive types can sometimes stem from a hardware error.
SergeyKostrov
Valued Contributor II
3,235 Views
[ From Jeff ]
>>...Of course. The original question however involved casting pointers, not data values...
Jeff, sorry for repeating this statement made by cagribal:
>>...my data is actually integers, which I load by casting integer pointers to double pointers...
After the data is loaded, cagribal does some processing, and his concern is related to, I would say, "unsafe" comparisons, or the correctness of comparisons of double-precision data values, not pointers.
Best regards, Sergey
Patrick_F_Intel1
Employee
3,235 Views
There are 2 cases:
1) Casting an int64 to a double. This always works and never generates a NaN, but you can lose precision.
2) Casting an int64 pointer to a double pointer (which is basically a memcpy(&double_var, &int64_var, 8);). This also always 'works' but can generate a NaN. Basically you are not converting an int64 to a double, you are just copying bits. I say 'not converting an int64 to double' because, unless your int64 bit pattern just happens to also be the correct 64-bit double encoding, you are not going to get the correct double encoding for your int64 number.
Does that make sense?
Pat
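To make the two cases concrete, here is a minimal sketch in plain C (no AVX needed). The helper names convert_value, copy_bits and copy_makes_nan are made up for illustration, and the sketch assumes that double is the 64-bit IEEE 754 format. The integer -1 has all 64 bits set, so its bit copy has an all-ones exponent and a non-zero fraction, which is a NaN encoding, while its value conversion is just -1.0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Case 1: value conversion. The result is the number closest to v,
   and it is never a NaN. */
static double convert_value(int64_t v)
{
    return (double)v;
}

/* Case 2: bit copy. This is the effect of loading through a pointer
   that was cast from int64_t* to double*. */
static double copy_bits(int64_t v)
{
    double d;
    assert(sizeof d == sizeof v);   /* assumption: both are 8 bytes */
    memcpy(&d, &v, sizeof d);
    return d;
}

/* 1 if the bit pattern of v, read as a double, is a NaN.
   A NaN is the only value that compares unequal to itself. */
static int copy_makes_nan(int64_t v)
{
    double d = copy_bits(v);
    return d != d;
}
```

For example, copy_makes_nan(-1) is true, whereas convert_value(-1) is -1.0 and copy_bits(42) is a tiny subnormal rather than 42.0.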
cagribal
Beginner
3,235 Views
Hi all, Thanks for the comments. Patrick has clearly summarized all the cases. However, the questions I still have are:
a) Why do AVX _mm256_min_pd() and _mm256_max_pd() return NaN for comparisons with a NaN? (Please see Test 3 in the code snippet I posted above.)
b) My understanding is that if the integers do not contain all 1's in the exponent field, i.e. bits 62-52, then all double comparisons over the raw bits (treated as double by copying or so) will always be correct. The implication is that by restricting my integers to use at most 62 bits (i.e. by leaving the MSB exponent bit always 0), I can ensure that comparisons will always be correct. Any comments on this?
Thanks,
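Point b) can be probed with a small scalar check. This is only a sketch with made-up helper names (as_double_bits, less_than_agrees); it assumes 64-bit IEEE 754 doubles and the default floating-point environment, i.e. no DAZ/FTZ, so subnormals compare by their real value. It also exposes one more corner case: bit 63 is the sign bit of a double, and doubles are sign-magnitude while int64 is two's complement, so negative integers either become NaNs or come out in reversed order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 64 bits of v as a double (no value conversion). */
static double as_double_bits(int64_t v)
{
    double d;
    assert(sizeof d == sizeof v);   /* assumption: both are 8 bytes */
    memcpy(&d, &v, sizeof d);
    return d;
}

/* 1 if comparing the raw bits as doubles gives the same answer
   as the integer comparison a < b, 0 otherwise. */
static int less_than_agrees(int64_t a, int64_t b)
{
    return (a < b) == (as_double_bits(a) < as_double_bits(b));
}
```

With non-negative values below 2^62 the two comparisons agree, e.g. less_than_agrees(1, 2) and less_than_agrees(5, (1LL << 62) - 1). They disagree for less_than_agrees(-2, -1), where both patterns are NaNs, and for less_than_agrees(INT64_MIN, INT64_MIN + 1), where the patterns are -0.0 and a negative subnormal.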
TimP
Honored Contributor III
3,235 Views
a) IEEE754 defines a comparison against NaN to return NaN. These are floating-point operations. b) I suppose you must ensure the correct setting of the DAZ bit to use comparisons with zero exponent bits.
Jeffrey_A_Intel
Employee
3,235 Views
According to the instruction set reference manual (see the description of the VMINPD instruction which is what the documentation says the _mm256_min_pd intrinsic generates):
If a value in the second operand is an SNaN, that SNaN is forwarded unchanged to the destination (that is, a QNaN version of the SNaN is not returned). If only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result.
In your case, one of the elements of the 2nd operand is a NaN, so that NaN is forwarded to the destination operand. As to your second point: depending on your floating-point environment, subnormals (exponent == 0, significand != 0; i.e., non-zero integers with "small" absolute value) might cause exceptions to be raised. I don't know what would happen if you have "flush-to-zero" enabled and you compare two vectors of small, non-zero integers. I'm sure the behavior is defined; I just don't know what it is.
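The rule quoted from the manual can be modelled per lane in scalar C. This is a sketch of the documented behaviour, not the intrinsic itself: minpd_lane and min_on_raw_bits are hypothetical names, and the model assumes 64-bit IEEE 754 doubles. Because any ordered comparison with a NaN is false, the expression below falls through to the second operand whenever either input is a NaN, which is why the result depends on the operand order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of one VMINPD lane: returns src1 only if src1 < src2.
   If either input is a NaN the comparison is false, so src2 is returned. */
static double minpd_lane(double src1, double src2)
{
    return (src1 < src2) ? src1 : src2;
}

/* Hypothetical helper: apply the model to raw int64 bit patterns,
   the way the original code feeds integers to _mm256_min_pd. */
static int64_t min_on_raw_bits(int64_t a, int64_t b)
{
    double da, db, dr;
    int64_t r;
    assert(sizeof da == sizeof a);  /* assumption: both are 8 bytes */
    memcpy(&da, &a, sizeof da);
    memcpy(&db, &b, sizeof db);
    dr = minpd_lane(da, db);
    memcpy(&r, &dr, sizeof r);
    return r;
}
```

With the all-ones pattern -1 (a NaN encoding), min_on_raw_bits(5, -1) returns -1 because the NaN sits in the second operand, while min_on_raw_bits(-1, 5) returns 5.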
Patrick_F_Intel1
Employee
3,235 Views
Hello cagribal,
Adding a little more.
For a), I assume that it is part of the IEEE floating-point (754?) standard to return a NaN if you are comparing NaNs.
For b), it depends on what you mean by 'correct'.
1) If you are just casting int64 to double and you test that the int64 value isn't > 52 bits, then the value will be correct.
2) If you are copying (instead of casting), then your result will probably be wrong even if you are only using bits 0-52.
Here is an example of 'copying an int64 to double' not working... not working in the sense that the number in the double does not equal the number in the int64. Using msvc:
C:\tst>type fltpt.c
[cpp]
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    double x, y;
    long long int myll;
    unsigned long long bits; /* raw encoding of a double, for printing in hex */

    myll = (0x3LL << 40) + 1;
    printf("myll = 0x%llx, %lld\n", (unsigned long long)myll, myll);

    /* Value conversion: x holds the same number as myll. */
    x = (double)myll;
    memcpy(&bits, &x, sizeof(bits));
    printf("dbl x val by casting= %f, in hex= 0x%llx\n", x, bits);

    /* Bit copy: y reinterprets myll's bits as a double encoding. */
    memcpy(&y, &myll, sizeof(y));
    //y = *(double *)(long long int *)&myll; // this line is same as memcpy above
    memcpy(&bits, &y, sizeof(bits));
    printf("dbl y val by copying= %f, in hex= 0x%llx\n", y, bits);
    return 0;
}
[/cpp]
[plain]
C:\tst>fltpt.exe
myll = 0x30000000001, 3298534883329
dbl x val by casting= 3298534883329.000000, in hex= 0x4288000000000800
dbl y val by copying= 0.000000, in hex= 0x30000000001
[/plain]
You can see a description of what happens during the 'int64->dbl' casting at http://www.cs.binghamton.edu/~reckert/220/floatpt.htm
Pat
SergeyKostrov
Valued Contributor II
3,235 Views
cagribal, What is the chance that your number(s) will be greater than 2^53? Please ask yourself. Then, if you're not counting the number of atoms in the Universe ( ~10^80 ), a definition of two ranges, safe and not-safe, for your numbers should bring clarity to your uncertainties.
Let's say you have a data set. Define safe and not-safe ranges. Pre-scan the data set and verify that all numbers are in the safe range, and only after that do the rest of the processing. If some numbers are not safe, then create a vector of not-safe numbers and save the indexes of these numbers for additional analysis. If you don't need to do the additional analysis, then simply truncate all unsafe numbers to the max or min values of the safe range. This is what I would do, and I use that solution in a real implementation of a Pigeonhole Sorting algorithm to sort only positive integer numbers.
I would move ahead with a practical implementation of the needed processing and, as I already mentioned, I would define safe and not-safe ranges first of all. Also, if your software is mission critical ( healthcare, finance, defense, aerospace, etc ), then the problem has to be treated seriously, with as many verifications as possible by different software developers. If your software is not mission critical ( R&D, thesis, do-it-because-have-nothing-else-to-do, etc ), some number of simple verifications will provide everything you need.
Best regards, Sergey
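The pre-scan described above could look roughly like the following. It is only a sketch: the function names prescan_unsafe and clamp_to_safe are made up, and the bounds lo and hi are whatever the caller decides is safe (for instance 0 and 2^62 - 1 if the restriction from cagribal's point b) is adopted).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the index of every value outside [lo, hi] in unsafe_idx
   (which must have room for n entries) and return how many were found. */
static size_t prescan_unsafe(const int64_t *data, size_t n,
                             int64_t lo, int64_t hi, size_t *unsafe_idx)
{
    size_t count = 0;
    assert(lo <= hi);
    for (size_t i = 0; i < n; ++i) {
        if (data[i] < lo || data[i] > hi) {
            unsafe_idx[count++] = i;
        }
    }
    return count;
}

/* Alternative when no further analysis is needed:
   truncate every value to the nearest bound of the safe range. */
static void clamp_to_safe(int64_t *data, size_t n, int64_t lo, int64_t hi)
{
    assert(lo <= hi);
    for (size_t i = 0; i < n; ++i) {
        if (data[i] < lo) {
            data[i] = lo;
        } else if (data[i] > hi) {
            data[i] = hi;
        }
    }
}
```

A return value of 0 from prescan_unsafe means the whole data set can go straight to the vectorized processing; otherwise the recorded indexes point at the values that need separate handling or clamping.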
Bernard
Valued Contributor I
3,235 Views
@Sergey Great post.
Patrick_F_Intel1
Employee
3,235 Views
Not to beat a horse to death but...
[plain]
What is the chance that your number(s) will be greater than 2^53? Please ask yourself. Then, if you're not counting the number of atoms in the Universe ( ~10^80 ), a definition of two ranges, safe and not-safe, for your numbers should bring clarity to your uncertainties.
[/plain]
In a double-precision number you have about 15 digits of precision. The US GDP is $15 trillion (14 digits). In Indian rupees, the number exceeds the precision of a double. So it is actually not too hard to exceed the number of significant digits in a double... depending on the area in which one is working. The rest of the advice is pretty good. I was assuming that cagribal was just loading the INTs into AVX for sorting (so no modification of the data... pure read access). If this is true, then he can do simple range checking when he gets ready to sort the data.
Pat
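For the value-conversion case, the exact limit is easy to probe. A minimal sketch, assuming 64-bit IEEE 754 doubles with the default round-to-nearest mode; converts_exactly is a made-up helper name. A double has a 53-bit significand, so every integer up to 2^53 converts exactly, and 2^53 + 1 is the first positive one that does not.

```c
#include <stdint.h>

/* 1 if v survives the round trip int64 -> double -> int64 unchanged. */
static int converts_exactly(int64_t v)
{
    double d = (double)v;
    /* Guard: values near INT64_MAX round up to 2^63, which does not fit
       in an int64, so converting back would be undefined behaviour. */
    if (d >= 9223372036854775808.0 || d < -9223372036854775808.0) {
        return 0;
    }
    return (int64_t)d == v;
}
```

Here converts_exactly(1LL << 53) is true and converts_exactly((1LL << 53) + 1) is false, while the value from Pat's earlier example, 3298534883329, converts exactly.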
SergeyKostrov
Valued Contributor II
3,235 Views
Patrick, I did a search in the Intel(R) AVX compiler intrinsics header immintrin.h and I wonder if another intrinsic function could be used instead of:
...
__m256d reg = _mm256_loadu_pd( ( double * )intPtr );
...
Since there is a union __m256i, I would expect an intrinsic function that does a similar operation, like:
...
__m256i reg = _mm256_load?_???64?( ( __int64 * )intPtr );
...
Could _mm256_set1_epi64x ( or some similar intrinsic function ) do the same without all the issues & problems related to the __int64-to-double cast?
Patrick_F_Intel1
Employee
3,235 Views
Hey Sergey, I'm not quite sure I understand... the __int64-to-double cast is working as expected (as far as I can tell). Other than having int64 AVX instructions, what would you like the new intrinsic to do? Pat
SergeyKostrov
Valued Contributor II
3,235 Views
>>... what would you like the new intrinsic to do?
Exactly the same operation, that is, to load 4 __int64 values into the reg variable of type __m256i. Instead of
__m256d reg = _mm256_loadu_pd( ( double * )intPtr );
to use this:
__m256i reg = _mm256_load?_???64?( ( __int64 * )intPtr );
Could you take a look at the declarations of the __m256d and __m256i C unions in the immintrin.h header file?