Smart_Lubobya
Beginner

Use of if-else statements in SSE2 intrinsics

if (y < 0)
    m[j+8] = (short)(-((((-y) * n[_q]) + _f) >> _a));
else
    m[j+8] = (short)(((y * n[_q]) + _f) >> _a);

int x = b + (d << 1);

r = c[j+4];

Om_S_Intel
Employee

What is the question here?
Smart_Lubobya
Beginner

My question is how do I write the code above with SSE2 intrinsics, and, more generally, how do I approach if/else statements in SSE2 intrinsic code?
Thomas_W_Intel
Employee

You have certainly come across the fact that most SSE2 instructions always operate on all elements in the register. This poses a problem when you need to implement alternative code paths depending on a per-element condition. The trick is to use "masks" to implement alternative code paths, i.e. the code is executed for all elements, but only affects some of them. In your example, the if/else negates y when it is negative, computes on the magnitude, and negates the result back, so the key piece is taking the absolute value of an integer. This can be implemented like this (untested code):

// a holds four 32-bit signed integers
__m128i cmp_result = _mm_cmpgt_epi32(_mm_set1_epi32(0), a);    // all bits set where a < 0
__m128i b = _mm_xor_si128(a, cmp_result);                      // invert bits of all negative numbers
__m128i mask1 = _mm_and_si128(_mm_set1_epi32(1), cmp_result);  // register with 1 if neg, 0 otherwise
__m128i result = _mm_add_epi32(b, mask1);                      // add 1 to the numbers that were negative

Unless I made a mistake, the code inverts all bits of the negative numbers and adds 1, which is the two's-complement negation. The positive numbers are untouched. This avoids the if-else statement, i.e. the conditional branch in your code. However, the "xor", "and", and "add" instructions are always executed, even if all numbers are positive. If this is regularly the case for typical input to your algorithm, it might be worth testing first whether all comparison results are zero, e.g. with _mm_test_all_zeros. Note that _mm_test_all_zeros requires SSE4.1; with plain SSE2, checking _mm_movemask_epi8(cmp_result) == 0 does the same job. (As always, you only know for sure which implementation is fastest by trying it out.) For performance reasons, you would also hoist the constants _mm_set1_epi32(0) and _mm_set1_epi32(1) out of the hot loop, but the compiler might already do this for you automatically.
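The xor/and/add sequence works because negation itself can be written with masks. For a general if/else whose two branches compute unrelated values, a common SSE2 pattern is to compute both branches for every lane and then pick one per lane with and/andnot/or. A minimal sketch, using a per-lane |x - y| written as an if/else as the example (select_si128 and absdiff_epi32 are illustrative names, not library functions):

```c
#include <emmintrin.h>

/* Per-lane select: returns a where mask is all ones, b where mask is zero.
   This is the SSE2 stand-in for "if (cond) r = a; else r = b;". */
static __m128i select_si128(__m128i mask, __m128i a, __m128i b)
{
    /* (mask & a) | (~mask & b) */
    return _mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b));
}

/* Hypothetical example: |x - y| for four int32 lanes, expressed as
       if (x > y) r = x - y; else r = y - x;
   Both branches are computed for every lane; the mask picks one per lane. */
static __m128i absdiff_epi32(__m128i x, __m128i y)
{
    __m128i gt       = _mm_cmpgt_epi32(x, y);  /* all ones where x > y */
    __m128i if_true  = _mm_sub_epi32(x, y);    /* "if" branch:   x - y */
    __m128i if_false = _mm_sub_epi32(y, x);    /* "else" branch: y - x */
    return select_si128(gt, if_true, if_false);
}
```

On SSE4.1 and later, _mm_blendv_epi8 performs the same selection in a single instruction; with only SSE2, the and/andnot/or triple is the usual way. The cost is that both branches are always evaluated, so this pays off when the branches are cheap compared to a mispredicted jump.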

P.S.: For questions about SSE2 instructions, the "AVX and CPU instructions" forum is often a better place than the compiler forum.
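Applied to the quantization line in the question, the masking idea could look roughly like the minimal SSE2 sketch below. It keeps the sign-magnitude structure of the original (take |y|, compute, restore the sign), because an arithmetic right shift of a negative number rounds toward minus infinity and would not reproduce the scalar results. The function name quant8_sse2 is hypothetical, and the sketch assumes: y holds 8 int16 values, n[_q] is one nonnegative multiplier that fits in int16 and is shared by all lanes, |y| * n + _f fits in int32, no lane is -32768, and every result fits in int16 (_mm_packs_epi32 saturates where the scalar (short) cast would wrap).

```c
#include <emmintrin.h>

/* Hypothetical helper: branch-free SSE2 version of
       if (y < 0) m = -((((-y) * n) + f) >> a);
       else       m = ((y * n) + f) >> a;
   for 8 int16 values of y at once, under the assumptions stated above. */
static __m128i quant8_sse2(__m128i y, short n, int f, int a)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i neg  = _mm_cmplt_epi16(y, zero);                  /* 0xFFFF where y < 0 */
    __m128i absy = _mm_sub_epi16(_mm_xor_si128(y, neg), neg); /* ~y + 1 where negative */

    /* 16x16 -> 32-bit products: low and high halves, then interleave. */
    __m128i nv = _mm_set1_epi16(n);
    __m128i lo = _mm_mullo_epi16(absy, nv);
    __m128i hi = _mm_mulhi_epi16(absy, nv);
    __m128i p0 = _mm_unpacklo_epi16(lo, hi);                  /* lanes 0..3 as int32 */
    __m128i p1 = _mm_unpackhi_epi16(lo, hi);                  /* lanes 4..7 as int32 */

    /* (product + f) >> a; the shift count is a runtime value, so it goes in a register. */
    __m128i fv = _mm_set1_epi32(f);
    __m128i sh = _mm_cvtsi32_si128(a);
    p0 = _mm_sra_epi32(_mm_add_epi32(p0, fv), sh);
    p1 = _mm_sra_epi32(_mm_add_epi32(p1, fv), sh);

    /* Narrow back to int16 and negate the lanes where y was negative. */
    __m128i q = _mm_packs_epi32(p0, p1);
    return _mm_sub_epi16(_mm_xor_si128(q, neg), neg);
}
```

If the loop in the question processes consecutive values of j (an assumption; the surrounding loop is not shown), eight inputs could be loaded with _mm_loadu_si128 and the results stored with _mm_storeu_si128((__m128i *)&m[j + 8], quant8_sse2(y, n[_q], _f, _a)).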
