Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Looking for a conditional intrinsic SSE command

amoshkov
Beginner
924 Views
I'm looking for an SSE intrinsic that is equivalent to an if statement.
The following example:
if( a < b )
{
    a = b;
}
else
{
    a = c;
}


is conveniently optimized with:
mask = _mm_cmplt_pd (a, b);
a = _mm_blend_pd (c, b, mask);

In this case:
  1. The comparison is mapped to the first intrinsic operation
  2. The conditional assignment is mapped to the second intrinsic operation
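For reference, here is a minimal compilable sketch of that mapping. One assumption to flag: `_mm_blend_pd` takes a compile-time integer mask, so the blend in the post is taken here to mean the variable blend `_mm_blendv_pd` (SSE4.1), which accepts a `__m128d` mask. The helper names `select_pd` and `if_else_pd` are hypothetical, and the SSE2 fallback is only there so the sketch builds when SSE4.1 is not enabled.

```c
#include <emmintrin.h>   /* SSE2: _mm_cmplt_pd, _mm_and_pd, _mm_andnot_pd, _mm_or_pd */
#ifdef __SSE4_1__
#include <smmintrin.h>   /* SSE4.1: _mm_blendv_pd */
#endif

/* Hypothetical helper: per lane, returns if_true where mask is all-ones,
   otherwise if_false. */
static inline __m128d select_pd(__m128d mask, __m128d if_true, __m128d if_false)
{
#ifdef __SSE4_1__
    /* blendv picks the second operand in lanes whose mask sign bit is set */
    return _mm_blendv_pd(if_false, if_true, mask);
#else
    /* SSE2 equivalent for full-lane masks: (mask & t) | (~mask & f) */
    return _mm_or_pd(_mm_and_pd(mask, if_true), _mm_andnot_pd(mask, if_false));
#endif
}

/* Per lane: a = (a < b) ? b : c, for two doubles at once. */
static inline __m128d if_else_pd(__m128d a, __m128d b, __m128d c)
{
    __m128d mask = _mm_cmplt_pd(a, b);   /* step 1: the comparison */
    return select_pd(mask, b, c);        /* step 2: the conditional assignment */
}
```

Both branches of the scalar if/else are evaluated for every lane; the mask only decides which result is kept.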

However, in the next example (without an else):
if( a < b )
{
    a = b;
}

If the condition isn't met, the regular C code skips the assignment.
But with the following optimized code:

mask = _mm_cmplt_pd (a, b);
a = _mm_blend_pd (a, b, mask);


the assignment still happens.

My question:
Is there a way to skip the assignment when it is not needed (i.e. when the condition isn't met)?

Thanks

7 Replies
Thomas_W_Intel
Employee

Hello,

You can only avoid the blend instruction if the condition is false for both comparisons that _mm_cmplt_pd does. Furthermore, you will need (at least) 1 cycle for a conditional jump, and _mm_blend_pd needs only 1 cycle. This already looks optimal to me.
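To make the trade-off concrete, here is a rough sketch of both variants. It assumes the blend in question is the variable blend `_mm_blendv_pd` (SSE4.1), since that is the form that takes a `__m128d` mask, and it uses `_mm_movemask_pd` to test whether the comparison was false in both lanes. The function names are hypothetical, and which variant is faster would have to be measured.

```c
#include <emmintrin.h>   /* SSE2: _mm_cmplt_pd, _mm_movemask_pd, bitwise ops */
#ifdef __SSE4_1__
#include <smmintrin.h>   /* SSE4.1: _mm_blendv_pd */
#endif

/* Hypothetical helper: takes b in lanes where mask is all-ones, else a. */
static inline __m128d blend_by_mask(__m128d a, __m128d b, __m128d mask)
{
#ifdef __SSE4_1__
    return _mm_blendv_pd(a, b, mask);
#else
    return _mm_or_pd(_mm_and_pd(mask, b), _mm_andnot_pd(mask, a));
#endif
}

/* Branch-free version: the blend always executes. */
static inline __m128d raise_to_b(__m128d a, __m128d b)
{
    __m128d mask = _mm_cmplt_pd(a, b);       /* all-ones in lanes where a < b */
    return blend_by_mask(a, b, mask);
}

/* Branching version: skips the blend only when a < b is false in both lanes.
   The test itself costs a movemask, a compare and a conditional jump. */
static inline __m128d raise_to_b_skip(__m128d a, __m128d b)
{
    __m128d mask = _mm_cmplt_pd(a, b);
    if (_mm_movemask_pd(mask) == 0)          /* one sign bit per lane */
        return a;
    return blend_by_mask(a, b, mask);
}
```

Both functions return the same values for any input; the second one only trades the blend for a branch that may be mispredicted.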

Kind regards

Thomas

TimP
Honored Contributor III
The blend with unconditional store is probably optimum, for cases where branch prediction isn't effective. So, the answer to that question is dependent on context, and may require actual testing.
amoshkov
Beginner
MAD willhal:

You can only avoid the blend instruction if the condition is false for both comparisons that _mm_cmplt_pd does.



Is the avoidance done automatically if the condition is false for both comparisons?
amoshkov
Beginner
tim18:
The blend with unconditional store is probably optimum, for cases where branch prediction isn't effective.


Can you explain what you mean by "blend with unconditional store"? Are you referring to a case where the condition is true for one of the mask elements? Also, what do you mean by "is probably optimum"?

amoshkov
Beginner
MAD willhal:

Furthermore, you will need (at least) 1 cycle for a conditional jump, and _mm_blend_pd needs only 1 cycle.



Is there a reference that specifies the number of cycles that each of the _mm operations requires? (Or do they all require one cycle?)
TimP
Honored Contributor III

a = _mm_blend_pd (a, b, mask);

stores a, regardless of whether the value changes. This avoids any dependence on branch prediction, but could increase latency, compared to compilation with a predictable branch and frequent skipping of the store.
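As an illustration of the unconditional store, here is a sketch of the same update applied to arrays in memory. The function name `raise_array` and the array layout are assumptions for the example, and the blend is again taken to be `_mm_blendv_pd` (SSE4.1), with an SSE2 fallback so it builds without extra flags.

```c
#include <stddef.h>
#include <emmintrin.h>   /* SSE2 */
#ifdef __SSE4_1__
#include <smmintrin.h>   /* SSE4.1: _mm_blendv_pd */
#endif

/* For each i: if (a[i] < b[i]) a[i] = b[i];
   Every pair of a[] is written back, whether or not it changed. */
static void raise_array(double *a, const double *b, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d va   = _mm_loadu_pd(a + i);
        __m128d vb   = _mm_loadu_pd(b + i);
        __m128d mask = _mm_cmplt_pd(va, vb);   /* all-ones where a[i] < b[i] */
#ifdef __SSE4_1__
        va = _mm_blendv_pd(va, vb, mask);
#else
        va = _mm_or_pd(_mm_and_pd(mask, vb), _mm_andnot_pd(mask, va));
#endif
        _mm_storeu_pd(a + i, va);              /* unconditional store */
    }
    for (; i < n; ++i)                         /* scalar tail for odd n */
        if (a[i] < b[i])
            a[i] = b[i];
}
```

The loop body has no data-dependent branch, so its cost does not depend on how predictable the comparison results are. A branching variant would skip the store when the mask is all zero, which could pay off only if that case is frequent and well predicted.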

Thomas_W_Intel
Employee

Is the avoidance done automatically if the condition is false for both comparisons?

No, it is not avoided. However, avoiding it does not make sense. The blend instruction needs only a single clock cycle, regardless of how many values are copied (0, 1, or 2). You can hardly do better than that.

If you introduced some additional code to skip the blend, it would need at least one instruction for the jump. Plus, the branch prediction might be wrong, which would result in additional wasted cycles.

With an out-of-order engine, it is very difficult to predict whether a piece of code is optimal, but in this case it is very likely :)

The latency and throughput of instructions can be found in Appendix C of the "Intel® 64 and IA-32 Architectures Optimization Reference Manual" (http://www.intel.com/design/processor/manuals/248966.pdf)

Kind regards

Thomas
