<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hello, in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-and-selection/m-p/1104330#M25251</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;I don't see a dedicated selection function in the IPP reference guide, but John's example above looks very efficient.&lt;/P&gt;

&lt;P&gt;Alternatively, you could take a look at the pseudocode below, which builds the selection from element-wise arithmetic: v_out = v_sel * v_b + (1 - v_sel) * v_a.&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;func ( v_sel, v_a, v_b, v_out, size ){

    float v_not[size]; // scratch vector that will hold the flipped selector

    ippsMul_&amp;lt;mod&amp;gt;(v_sel, v_b, v_out, size); // v_out = v_sel * v_b

    ippsSet_&amp;lt;mod&amp;gt;(1, v_not, size); // sets every element of v_not to 1

    ippsSub_&amp;lt;mod&amp;gt;_I(v_sel, v_not, size); // v_not = 1 - v_sel, which flips the boolean vector for float data

    ippsAddProduct_&amp;lt;mod&amp;gt;(v_not, v_a, v_out, size); // v_out += v_not * v_a, so the first product is kept

}
&lt;/PRE&gt;

&lt;P&gt;Thank you.&lt;/P&gt;</description>
    <pubDate>Thu, 10 Mar 2016 07:48:16 GMT</pubDate>
    <dc:creator>Jonghak_K_Intel</dc:creator>
    <dc:date>2016-03-10T07:48:16Z</dc:date>
    <item>
      <title>IPP and selection</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-and-selection/m-p/1104326#M25247</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;I am looking for an IPP function that is equivalent to the following code:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;void selection(const float* v_sel, const float* v_a, const float* v_b, float* v_out, int size)
{
    int i;
    /* v_sel is a boolean vector: 0 selects v_a, anything else selects v_b */

    for (i = 0; i &amp;lt; size; i++)
    {
        if (*v_sel == 0.)
        {
            *v_out = *v_a;
        }
        else
        {
            *v_out = *v_b;
        }

        v_sel++;
        v_a++;
        v_b++;
        v_out++;
    }
}&lt;/PRE&gt;

&lt;P&gt;Any ideas?&lt;/P&gt;

&lt;P&gt;Thanks a lot&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 09 Mar 2016 08:38:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-and-selection/m-p/1104326#M25247</guid>
      <dc:creator>Sebastien_C_1</dc:creator>
      <dc:date>2016-03-09T08:38:49Z</dc:date>
    </item>
    <item>
      <title>Hi Sebastien,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-and-selection/m-p/1104327#M25248</link>
      <description>&lt;P&gt;Hi Sebastien,&lt;/P&gt;

&lt;P&gt;To help us understand your function better, could you tell us more about it?&lt;/P&gt;

&lt;P&gt;What do you want to achieve, when do you want to use it, and what is its purpose?&lt;/P&gt;</description>
      <pubDate>Wed, 09 Mar 2016 08:53:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-and-selection/m-p/1104327#M25248</guid>
      <dc:creator>Jonghak_K_Intel</dc:creator>
      <dc:date>2016-03-09T08:53:58Z</dc:date>
    </item>
    <item>
      <title>Often this kind of function</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-and-selection/m-p/1104328#M25249</link>
      <description>&lt;P&gt;This kind of function is often used after thresholding.&lt;/P&gt;

&lt;P&gt;Thresholding returns a vector of 0s and 1s. That vector is then used to choose, element by element, the value from vector A (if the value is 0) or from vector B (if the value is 1).&lt;/P&gt;
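
&lt;P&gt;A minimal plain C sketch of the two steps described above, assuming float data and a strict "greater than" threshold. The helper names threshold_mask and select_by_mask and the level parameter are hypothetical, chosen only for illustration; they are not IPP functions.&lt;/P&gt;

```c
/* Hypothetical helper: writes 1.0f where x[i] exceeds level, else 0.0f. */
void threshold_mask(const float *x, float level, float *mask, int n)
{
    for (int i = 0; i != n; i++) {
        mask[i] = (x[i] > level) ? 1.0f : 0.0f;
    }
}

/* Hypothetical helper: picks a[i] where mask[i] is 0 and b[i] otherwise. */
void select_by_mask(const float *mask, const float *a, const float *b,
                    float *out, int n)
{
    for (int i = 0; i != n; i++) {
        out[i] = (mask[i] == 0.0f) ? a[i] : b[i];
    }
}
```

&lt;P&gt;Calling threshold_mask first and passing its output to select_by_mask reproduces the threshold-then-select pattern. Both loops could be fused into one if the intermediate mask is not needed elsewhere.&lt;/P&gt;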

&lt;P&gt;Sometimes I use it with a single value A and a single value B instead of vectors: if the value after thresholding is 0, the output vector element is A, otherwise it is B.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Mar 2016 12:54:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-and-selection/m-p/1104328#M25249</guid>
      <dc:creator>Sebastien_C_1</dc:creator>
      <dc:date>2016-03-09T12:54:00Z</dc:date>
    </item>
    <item>
      <title>I have had good luck with the</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-and-selection/m-p/1104329#M25250</link>
      <description>&lt;P&gt;I have had good luck with the compiler vectorizing simple loops that do these sorts of merge operations.&amp;nbsp;&amp;nbsp; For example this loop compiles into very good AVX code with the Intel 15 or Intel 16 compilers.&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;            for (i=0; i&amp;lt;N; i++) {
                if (v_in[i] &amp;gt; scalar1*compare[i]) {
                    v_out[i] = scalar2*compare[i];
                } else {
                    v_out[i] = v_in[i];
                }
            }
&lt;/PRE&gt;

&lt;P&gt;The generated code is fully vectorized and does not have any obvious wasted effort. It loads 256 bits of each of the vectors, multiplies the "compare" vector by the "scalar1" value, and uses a VCMPGTPS instruction for the compare. It then scales the "compare[]" array by "scalar2" and saves the value in another register. The result of the VCMPGTPS is used as a mask with a VANDNPS instruction to select either the element from "v_in" or the scaled value of "compare" for the output, and the code then does a 256-bit store of the merged result. I don't see anything in the generated code that looks sub-optimal.&lt;/P&gt;

&lt;P&gt;The vectorization falls apart if the loops get much more complicated and also falls apart if the compiler is not sure that the pointers don't alias.&lt;/P&gt;
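
&lt;P&gt;One common way to tell the compiler that the pointers don't alias is the C99 restrict qualifier. The sketch below wraps the loop from this post in a function; the name clamp_to_scaled and the parameter order are hypothetical, and whether a given compiler actually vectorizes it is something to confirm in its optimization report.&lt;/P&gt;

```c
/* Hypothetical wrapper around the loop shown earlier in this post.
   The restrict qualifiers (C99) promise the compiler that the three
   arrays do not overlap; calling it with overlapping arrays is undefined. */
void clamp_to_scaled(const float *restrict v_in,
                     const float *restrict compare,
                     float *restrict v_out,
                     float scalar1, float scalar2, int n)
{
    for (int i = 0; i != n; i++) {
        if (v_in[i] > scalar1 * compare[i]) {
            v_out[i] = scalar2 * compare[i];  /* above the limit: replace */
        } else {
            v_out[i] = v_in[i];               /* otherwise: copy through */
        }
    }
}
```

&lt;P&gt;The promise made by restrict is the caller's responsibility, so this form is only appropriate when the input and output buffers are known to be distinct.&lt;/P&gt;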

&lt;P&gt;The compiler will generate multiple versions of the code to handle different alignments and vector lengths -- the routines as I compiled them had no restrictions on alignment and the performance was only very weakly dependent on alignment on a Haswell system.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Mar 2016 21:51:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-and-selection/m-p/1104329#M25250</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2016-03-09T21:51:24Z</dc:date>
    </item>
    <item>
      <title>Hello,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-and-selection/m-p/1104330#M25251</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;I don't see a dedicated selection function in the IPP reference guide, but John's example above looks very efficient.&lt;/P&gt;

&lt;P&gt;Alternatively, you could take a look at the pseudocode below, which builds the selection from element-wise arithmetic: v_out = v_sel * v_b + (1 - v_sel) * v_a.&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;func ( v_sel, v_a, v_b, v_out, size ){

    float v_not[size]; // scratch vector that will hold the flipped selector

    ippsMul_&amp;lt;mod&amp;gt;(v_sel, v_b, v_out, size); // v_out = v_sel * v_b

    ippsSet_&amp;lt;mod&amp;gt;(1, v_not, size); // sets every element of v_not to 1

    ippsSub_&amp;lt;mod&amp;gt;_I(v_sel, v_not, size); // v_not = 1 - v_sel, which flips the boolean vector for float data

    ippsAddProduct_&amp;lt;mod&amp;gt;(v_not, v_a, v_out, size); // v_out += v_not * v_a, so the first product is kept

}
&lt;/PRE&gt;
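
&lt;P&gt;The arithmetic the pseudocode above is aiming for, v_out = v_sel * v_b + (1 - v_sel) * v_a, can be checked without IPP. The following is a plain C sketch under the assumption that every element of v_sel is exactly 0 or 1; the function name blend_select is hypothetical and it makes no IPP calls.&lt;/P&gt;

```c
/* Plain C stand-in for the multiply, flip, multiply-add steps.
   Assumes every v_sel[i] is exactly 0.0f or 1.0f. */
void blend_select(const float *v_sel, const float *v_a, const float *v_b,
                  float *v_out, int size)
{
    for (int i = 0; i != size; i++) {
        float keep_b = v_sel[i];         /* 1 selects v_b */
        float keep_a = 1.0f - v_sel[i];  /* flipped selector, 1 selects v_a */
        v_out[i] = keep_b * v_b[i] + keep_a * v_a[i];
    }
}
```

&lt;P&gt;Unlike a true branch or blend, this form multiplies the unselected operand by zero, so an infinity or NaN in the unselected input still ends up as NaN in the output. If the data can contain such values, the compare-and-select loop from John's post is the safer choice.&lt;/P&gt;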

&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Mar 2016 07:48:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-and-selection/m-p/1104330#M25251</guid>
      <dc:creator>Jonghak_K_Intel</dc:creator>
      <dc:date>2016-03-10T07:48:16Z</dc:date>
    </item>
  </channel>
</rss>

