Hello. When I run this code:
```cpp
#include <cmath> // for NAN (C++11 and up)
#include <iostream>
#include <xmmintrin.h>

int main(int argc, char ** argv)
{
    float nan_value = NAN;
    __m128 const a = _mm_load_ss(&nan_value);
    __m128 const b = _mm_setzero_ps();

    std::cout << "gt : " << (nan_value > 0) << std::endl;
    std::cout << "lt : " << (nan_value < 0) << std::endl;
    std::cout << "ge : " << (nan_value >= 0) << std::endl;
    std::cout << "le : " << (nan_value <= 0) << std::endl;
    std::cout << "eq : " << (nan_value == 0) << std::endl;
    std::cout << "ne : " << (nan_value != 0) << std::endl << std::endl << std::endl;

    std::cout << "ugt : " << _mm_ucomigt_ss(a,b) << std::endl;
    std::cout << "ult : " << _mm_ucomilt_ss(a,b) << std::endl;
    std::cout << "uge : " << _mm_ucomige_ss(a,b) << std::endl;
    std::cout << "ule : " << _mm_ucomile_ss(a,b) << std::endl;
    std::cout << "ueq : " << _mm_ucomieq_ss(a,b) << std::endl;
    std::cout << "une : " << _mm_ucomineq_ss(a,b) << std::endl << std::endl << std::endl;

    return 0;
}
```
I get this output:
This is weird, because according to the documentation, comigt/lt/ge/le/eq/neq should return 1 when one or both operands are NaN. Can someone shed some light on this?
- Tags:
- Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Streaming SIMD Extensions
- Parallel Computing
This behavior is correct. It stems from two things: which EFLAGS get set by UCOMISS, and which EFLAGS get tested by CMOVxx.
The _mm_ucomixx_ss(a, b) intrinsic converts to the following assembly instructions:
```
movaps   xmm0, xmmword ptr
ucomiss  xmm0, dword ptr
mov      eax, 0
mov      ecx, 1
cmovxx   eax, ecx
```
where 'CMOVxx' would become, for instance, 'CMOVA' (meaning "move if above") for _mm_ucomigt_ss(). So the UCOMISS instruction sets EFLAGS based on a and b, and then the CMOVxx instruction either leaves eax (the return value) at 0 if the condition wasn't satisfied, or changes it to 1 if it was.
The documentation for UCOMISS gives the following behavior:
```
RESULT ← UnorderedCompare(SRC1[31:0] <> SRC2[31:0]) {
(* Set EFLAGS *)
CASE (RESULT) OF
    UNORDERED:    ZF,PF,CF ← 111;
    GREATER_THAN: ZF,PF,CF ← 000;
    LESS_THAN:    ZF,PF,CF ← 001;
    EQUAL:        ZF,PF,CF ← 100;
ESAC;
OF, AF, SF ← 0; }
```
If either operand is NaN, the result is unordered, which sets PF (note that PF is cleared in all non-NaN cases). But ZF and CF are also set in the unordered case. If you look at the documentation for CMOVxx, you'll see, for instance, that CMOVA moves if CF = 0 and ZF = 0. That condition is false when the result of UCOMISS is unordered, so you get 0 in your first example (gt). Your second example (lt), however, translates to CMOVB ("move if below"), which moves if CF = 1. So in the unordered case the move executes, and the return value is 1. The behavior of the other examples can be explained the same way.
In short:
- Unordered inputs to UCOMISS set PF, but they also set ZF and CF.
- The CMOVxx conditions used for equal, greater than, etc. don't test PF, so any condition that is satisfied when ZF or CF (or both) is 1 will return 1.
Thanks, that makes almost everything clear. But why does the documentation say one thing while the intrinsic actually does something completely different?
Naer J. wrote:
Thanks, that makes all things pretty much clear except the fact that the documentation tells one thing while in reality intrinsic does completely different thing?
Which intrinsic documentation are you referring to? MS compiler, Intel compiler, other?
I'm talking about the MS compiler: here (scroll down to the bottom) it explicitly states "If a or b is a NaN, 1 is returned".
I actually don't have any problems with comigt and comige, because I can replace them with comilt and comile and swap the a and b operands. That gives me the well-defined behavior I'm looking for. The real problem is comineq: I cannot force comineq to return 1 when a or b is NaN without accessing the PF flag. Because of that, I have to work around it by introducing an additional cmpord intrinsic. And if I'm using cmpord anyway, why not use cmpgt in the first place, which gives 0000001 results?
I spoke with my compiler friends. They asked that you submit a bug report so they can track and investigate the issue: https://connect.microsoft.com/VisualStudio
The MSDN documentation says that 1 is returned for all NaN cases. This is wrong, and there's no way the intrinsics could do that as written. Interestingly enough, the Intel documentation for the same intrinsics says nothing about NaN at all!
Perhaps you could tell us a bit about what you're trying to do and we could help you figure out the best way forward, regardless of documentation errors.
Perhaps you could tell us a bit about what you're trying to do and we could help you figure out the best way forward, regardless of documentation errors.
I was just playing around with intrinsics, nothing interesting in particular.
As a side note, GCC 4.8.1 produces the same results, while GCC 4.9.2 and Clang 3.5.1 give 111110.