Solved: SSE ucomiss/comiss strange behavior

Naer_J_ · 02-04-2015

Hello. When I run this code:

#include <cmath> // for NAN c++11 and up
#include <iostream>
#include <xmmintrin.h>


int main(int argc, char ** argv) {

	float nan_value = NAN;

	__m128 const a = _mm_load_ss(&nan_value);
	__m128 const b = _mm_setzero_ps();

	std::cout << "gt : " << (nan_value >  0) << std::endl;
	std::cout << "lt : " << (nan_value <  0) << std::endl;
	std::cout << "ge : " << (nan_value >= 0) << std::endl;
	std::cout << "le : " << (nan_value <= 0) << std::endl;
	std::cout << "eq : " << (nan_value == 0) << std::endl;
	std::cout << "ne : " << (nan_value != 0) << std::endl << std::endl << std::endl;

	std::cout << "ugt : " << _mm_ucomigt_ss(a,b) << std::endl;
	std::cout << "ult : " << _mm_ucomilt_ss(a,b) << std::endl;
	std::cout << "uge : " << _mm_ucomige_ss(a,b) << std::endl;
	std::cout << "ule : " << _mm_ucomile_ss(a,b) << std::endl;
	std::cout << "ueq : " << _mm_ucomieq_ss(a,b) << std::endl;
	std::cout << "une : " << _mm_ucomineq_ss(a,b) << std::endl << std::endl << std::endl;

	return 0;
}

I get this output :

This is strange because, according to the documentation, comigt/lt/ge/le/eq/neq should return 1 when one or both operands are NaN. Can someone shed some light on this?

acctpurge_a_1 · 02-05-2015

This behavior is correct. It stems from two things: which EFLAGS get set by UCOMISS, and which EFLAGS get tested by CMOVxx.

The _mm_ucomixx_ss(a, b) intrinsic converts to the following assembly instructions:

movaps xmm0, xmmword ptr 
ucomiss xmm0, dword ptr 
mov eax,0
mov ecx, 1
cmovxx eax, ecx

where 'CMOVxx' would become, for instance, 'CMOVA' (meaning "move if above") for _mm_ucomigt_ss(). So the UCOMISS instruction sets EFLAGS based on a and b, and then the CMOVxx instruction either leaves eax (the return value) at 0 if the condition wasn't satisfied, or changes it to 1 if it was.

The documentation for UCOMISS gives the following behavior:

RESULT ← UnorderedCompare(SRC1[31:0] <> SRC2[31:0]) {
(* Set EFLAGS *)
CASE (RESULT) OF
    UNORDERED:
                              ZF,PF,CF ← 111;
    GREATER_THAN:
                              ZF,PF,CF ← 000;
    LESS_THAN:
                              ZF,PF,CF ← 001;
    EQUAL:
                              ZF,PF,CF ← 100;
ESAC;

If either operand is NaN, the result is unordered, which sets PF (note that PF is cleared for all non-NaN cases). But the ZF and CF flags also get set in the unordered case. Now if you look at the documentation for CMOVxx, you'll see for instance that CMOVA moves if CF = 0 and ZF = 0. This isn't true if the result of UCOMISS is unordered, so you get 0 in your first example (gt). But your second example (lt) translates to CMOVB ("move if below"), which moves if CF = 1. So in the unordered case, the move executes, and the return value is 1. You can explain the behavior of the other examples similarly.

In short:

Unordered inputs to UCOMISS set PF, but also set ZF and CF.
CMOVxx doesn't care about PF if you are doing things like equal, greater than, etc., so conditions which require ZF or CF (or both) to be 1 will return 1.
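
The flag logic above can be modeled in plain C++ without any intrinsics. This is only a rough sketch, not what the compiler emits: the `Flags` struct and the helper names are made up for illustration, and the mapping of ge/le/eq/neq onto CMOVAE/CMOVBE/CMOVE/CMOVNE is an assumption (only CMOVA and CMOVB are named in the explanation above).

```cpp
#include <cmath>

// ZF, PF and CF as written by UCOMISS, following the pseudocode quoted above.
struct Flags {
    bool zf;
    bool pf;
    bool cf;
};

// Hypothetical model of the UCOMISS result for two scalar floats.
inline Flags ucomiss_flags(float a, float b) {
    if (std::isnan(a) || std::isnan(b)) return {true, true, true};  // UNORDERED
    if (a > b) return {false, false, false};                        // GREATER_THAN
    if (a < b) return {false, false, true};                         // LESS_THAN
    return {true, false, false};                                    // EQUAL
}

// Conditions tested by the conditional moves. None of them reads PF.
inline bool cond_a(Flags f)  { return !f.cf && !f.zf; }  // CMOVA:  CF = 0 and ZF = 0
inline bool cond_b(Flags f)  { return f.cf; }            // CMOVB:  CF = 1
inline bool cond_ae(Flags f) { return !f.cf; }           // CMOVAE: CF = 0 (assumed for ge)
inline bool cond_be(Flags f) { return f.cf || f.zf; }    // CMOVBE: CF = 1 or ZF = 1 (assumed for le)
inline bool cond_e(Flags f)  { return f.zf; }            // CMOVE:  ZF = 1 (assumed for eq)
inline bool cond_ne(Flags f) { return !f.zf; }           // CMOVNE: ZF = 0 (assumed for neq)
```

Under that assumed mapping, a NaN operand sets all three flags, so the conditions that need CF or ZF to be 1 (lt, le, eq) come out true, while those that need them to be 0 (gt, ge, ne) come out false.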

Naer_J_ · 02-05-2015

Thanks, that makes pretty much everything clear, except for the fact that the documentation says one thing while in reality the intrinsic does a completely different thing?

bronxzv · 02-05-2015

Naer J. wrote:

Thanks, that makes pretty much everything clear, except for the fact that the documentation says one thing while in reality the intrinsic does a completely different thing?

Which intrinsic documentation are you referring to? MS compiler, Intel compiler, other?

Naer_J_ · 02-05-2015

I'm talking about the MS compiler: here (scroll down to the bottom) it is explicitly stated "If a or b is a NaN, 1 is returned".

I actually don't have any problems with comigt and comige, because I can replace them with comilt and comile and swap the a and b operands. That would give me the well-defined behavior I am seeking. The real problem is the behavior of comineq: I cannot force comineq to return 1 when a or b is NaN without accessing the PF flag. Because of that, I have to work around it by introducing an additional cmpord intrinsic. And if I use the cmpord intrinsic anyway, then why not use cmpgt in the first place, which gives 0000001 results.

MarkC_Intel · 02-05-2015

I spoke with my compiler friends. The involved parties requested that you submit a bug so that they can track and investigate the issue. https://connect.microsoft.com/VisualStudio

acctpurge_a_1 · 02-05-2015

Naer J. wrote:

I'm talking about the MS compiler: here (scroll down to the bottom) it is explicitly stated "If a or b is a NaN, 1 is returned".

I actually don't have any problems with comigt and comige, because I can replace them with comilt and comile and swap the a and b operands. That would give me the well-defined behavior I am seeking. The real problem is the behavior of comineq: I cannot force comineq to return 1 when a or b is NaN without accessing the PF flag. Because of that, I have to work around it by introducing an additional cmpord intrinsic. And if I use the cmpord intrinsic anyway, then why not use cmpgt in the first place, which gives 0000001 results.

The MSDN documentation says that 1 is returned for all NaN cases. This is wrong, and there's no way the intrinsics could do that as written. Interestingly enough, the Intel documentation for the same intrinsics says nothing about NaN at all!

Perhaps you could tell us a bit about what you're trying to do and we could help you figure out the best way forward, regardless of documentation errors.

Naer_J_ · 02-05-2015

Perhaps you could tell us a bit about what you're trying to do and we could help you figure out the best way forward, regardless of documentation errors.

I was just playing around with intrinsics, nothing interesting in particular.

As a side note, GCC 4.8.1 produces the same results, while GCC 4.9.2 and Clang 3.5.1 give 111110.