Topic in Intel® Integrated Performance Primitives

IPP and selection

Sebastien_C_1 — Wed, 09 Mar 2016 08:38:49 GMT

Hello,

I am looking for an IPP function that is equivalent to the following code:

void selection(const float* v_sel, const float* v_a, const float* v_b,
               float* v_out, int size)
{
    int i;
    /* v_sel is a boolean vector: 0.0 selects v_a, anything else selects v_b */

    for (i = 0; i < size; i++)
    {
        if (*v_sel == 0.0f)
        {
            *v_out = *v_a;
        }
        else
        {
            *v_out = *v_b;
        }

        v_sel++;
        v_a++;
        v_b++;
        v_out++;
    }
}

Any ideas?

Thanks a lot

Jonghak_K_Intel — Wed, 09 Mar 2016 08:53:58 GMT

Hi Sebastien,

To help us understand your function better, could you tell us more about it? What are you trying to achieve, when do you want to use it, and what is its purpose?

Sebastien_C_1 — Wed, 09 Mar 2016 12:54:00 GMT

Often this kind of function is used after thresholding.

Thresholding returns a vector of 0s and 1s. This vector is then used to choose each output value from vector A (where the value is 0) or from vector B (where the value is 1).

Sometimes I use it with a single value A and a single value B instead of vectors: if the value after thresholding is 0, the output value is A; otherwise it is B.
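A minimal plain-C sketch of the two uses described above, assuming the thresholding step marks elements strictly greater than a level with 1.0f and everything else with 0.0f. The names threshold_mask, select_vec and select_scalar are hypothetical, not IPP functions; the sketch only pins down the intended semantics.

```c
#include <stddef.h>

/* Hypothetical thresholding step (assumption: "greater than level" -> 1.0f).
   Produces the "vector of 0 and 1" described above. */
void threshold_mask(const float *in, float level, float *mask, size_t n)
{
    for (size_t i = 0; i < n; i++)
        mask[i] = (in[i] > level) ? 1.0f : 0.0f;
}

/* Vector form: take a[i] where mask[i] == 0, otherwise b[i]. */
void select_vec(const float *mask, const float *a, const float *b,
                float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (mask[i] == 0.0f) ? a[i] : b[i];
}

/* Scalar form: the same selection with a single value A and a single value B. */
void select_scalar(const float *mask, float a, float b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (mask[i] == 0.0f) ? a : b;
}
```

For example, thresholding {0.2, 0.8, 0.5, 0.9} at 0.5 gives the mask {0, 1, 0, 1}, and select_scalar with A = -1 and B = 1 then yields {-1, 1, -1, 1}.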

McCalpinJohn — Wed, 09 Mar 2016 21:51:24 GMT

I have had good luck with the compiler vectorizing simple loops that do these sorts of merge operations. For example, this loop compiles into very good AVX code with the Intel 15 and Intel 16 compilers:

            for (i=0; i<N; i++) {
                if (v_in[i] > scalar1*compare[i]) {
                    v_out[i] = scalar2*compare[i];
                } else {
                    v_out[i] = v_in[i];
                }
            }

The generated code is fully vectorized and has no obvious wasted effort. It loads 256 bits from each of the input arrays, multiplies the "compare" vector by "scalar1", and uses a VCMPGTPS instruction for the comparison. It then scales the "compare[]" array by "scalar2" and keeps the result in another register. The result of the VCMPGTPS is used as a mask with a VANDNPS instruction to select, element by element, either the value from "v_in" or the scaled value of "compare", and the merged result is written with a single 256-bit store. I don't see anything in the generated code that looks sub-optimal.
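To make the mask-and-merge idea concrete, here is a scalar C model of what the vector code does for each element, assuming 32-bit IEEE-754 floats. It is an illustration, not the compiler's actual output: the comparison produces an all-ones or all-zeros bit mask, and the output is assembled from the raw bits with AND, AND-NOT and OR, the bit-select pattern that instructions such as VANDNPS are used to build on a whole 256-bit register at once. merge_by_mask is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret float bits as uint32_t and back without breaking aliasing rules. */
static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Scalar model of compare + mask merge:
   mask = all ones where v_in[i] > scalar1*compare[i] (like VCMPGTPS), else all zeros;
   out  = (mask & scaled) | (~mask & v_in)   using AND / AND-NOT / OR on the bits. */
void merge_by_mask(const float *v_in, const float *compare, float scalar1,
                   float scalar2, float *v_out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t mask   = (v_in[i] > scalar1 * compare[i]) ? 0xFFFFFFFFu : 0u;
        uint32_t scaled = f2u(scalar2 * compare[i]);
        uint32_t keep   = f2u(v_in[i]);
        v_out[i] = u2f((mask & scaled) | (~mask & keep));
    }
}
```

Because every mask lane is either all ones or all zeros, the merge never mixes bits from the two candidates, so no branch is needed.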

The vectorization falls apart if the loops get much more complicated and also falls apart if the compiler is not sure that the pointers don't alias.
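One common way to remove the aliasing doubt is the C99 restrict qualifier. The sketch below (merge_restrict is a hypothetical name) writes the loop above with restrict-qualified pointers and a conditional expression. Whether a given compiler actually vectorizes it still depends on the compiler version and flags, so the compiler's vectorization report is the thing to check.

```c
#include <stddef.h>

/* restrict promises that v_out does not overlap v_in or compare, so the compiler
   does not have to prove non-overlap (or test for it at run time) before
   vectorizing. If the arrays do overlap, the behavior is undefined. */
void merge_restrict(const float *restrict v_in, const float *restrict compare,
                    float scalar1, float scalar2,
                    float *restrict v_out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        v_out[i] = (v_in[i] > scalar1 * compare[i]) ? scalar2 * compare[i]
                                                     : v_in[i];
}
```

The conditional expression keeps the loop body a plain per-element select, which is the shape compilers most readily map to a compare plus mask merge.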

The compiler will generate multiple versions of the code to handle different alignments and vector lengths -- the routines as I compiled them had no restrictions on alignment and the performance was only very weakly dependent on alignment on a Haswell system.

Jonghak_K_Intel — Thu, 10 Mar 2016 07:48:16 GMT

Hello,

I don't see such a selection function in the IPP reference guide, but John's example above looks very efficient.

Alternatively, you could build the selection from existing IPP primitives, as sketched below.

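#include <ipp.h>

/* v_out[i] = v_sel[i]*v_b[i] + (1 - v_sel[i])*v_a[i]
   Equivalent to the selection loop only if every v_sel[i] is exactly 0 or 1. */
IppStatus func(const Ipp32f* v_sel, const Ipp32f* v_a, const Ipp32f* v_b,
               Ipp32f* v_out, int size)
{
    IppStatus st;
    Ipp32f* v_not = ippsMalloc_32f(size);   /* scratch buffer for the flipped selection */
    if (v_not == NULL) return ippStsMemAllocErr;

    st = ippsMul_32f(v_sel, v_b, v_out, size);            /* v_out = sel * b         */
    if (st == ippStsNoErr)
        st = ippsSubCRev_32f(v_sel, 1.0f, v_not, size);   /* v_not = 1 - sel (flip)  */
    if (st == ippStsNoErr)
        st = ippsMul_32f_I(v_a, v_not, size);             /* v_not = (1 - sel) * a   */
    if (st == ippStsNoErr)
        st = ippsAdd_32f_I(v_not, v_out, size);           /* v_out += (1 - sel) * a  */

    ippsFree(v_not);
    return st;
}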
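The multiply-based composition above amounts to the identity out = sel*b + (1 - sel)*a, which reproduces the if/else selection only when every sel value is exactly 0 or 1 and the inputs are finite (0 * infinity or 0 * NaN gives NaN). A plain-C reference of that identity (select_arith is a hypothetical name) can be handy for checking an IPP-based version against the original loop. Writing the flip as 1 - sel rather than a bitwise XOR matters for float data: XOR-ing the bit pattern of 1.0f (0x3F800000) with 1 gives 0x3F800001, a value just above 1.0f, not 0.0f.

```c
#include <stddef.h>

/* out[i] = sel[i]*b[i] + (1 - sel[i])*a[i]
   Matches the if/else selection only when each sel[i] is exactly 0.0f or 1.0f
   and a[i], b[i] are finite. */
void select_arith(const float *sel, const float *a, const float *b,
                  float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float not_sel = 1.0f - sel[i];   /* arithmetic flip: 0 -> 1, 1 -> 0 */
        out[i] = sel[i] * b[i] + not_sel * a[i];
    }
}
```

With sel = {0, 1, 1, 0}, a = {1, 2, 3, 4} and b = {5, 6, 7, 8}, the result is {1, 6, 7, 4}, the same as the branching loop.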

Thank you.