<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic ippsAtan2 timing with 0 operands in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsAtan2-timing-with-0-operands/m-p/798000#M2913</link>
    <description>Hi Tim,&lt;BR /&gt;&lt;BR /&gt;what version of IPP do you use? Does that effect take place on all variants of atan2 function (ippsAtan2_32f_A11, ippsAtan2_32f_A21 and ippsAtan2_32f_A24)?&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt; Vladimir</description>
    <pubDate>Thu, 28 Oct 2010 20:41:52 GMT</pubDate>
    <dc:creator>Vladimir_Dudnik</dc:creator>
    <dc:date>2010-10-28T20:41:52Z</dc:date>
    <item>
      <title>ippsAtan2 timing with 0 operands</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsAtan2-timing-with-0-operands/m-p/797999#M2912</link>
      <description>When either of the operands to ippsAtan2_32f is 0, the operation takes many times longer than it does when both operands are non-zero. On my AMD 64X2, it takes 3x as long (65 cycles per element, versus 22 cycles). On a Xeon X5680, it takes TEN TIMES as long (207 cycles versus 21).&lt;BR /&gt;&lt;BR /&gt;I find this very odd, since the result in either case is a constant (0 when X=0, pi/2 when Y=0). The atan2 function in Microsoft's C run-time library takes half the time when an operand is 0.&lt;BR /&gt;&lt;BR /&gt;I'm going to try scanning through the vectors to special-case zero elements, but I'm dubious that is a net win. Anyone have any suggestions?&lt;BR /&gt;--&lt;BR /&gt;Tim Roberts, timr@probo.com&lt;BR /&gt;Providenza &amp;amp; Boekelheide, Inc.</description>
      <pubDate>Thu, 28 Oct 2010 20:29:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsAtan2-timing-with-0-operands/m-p/797999#M2912</guid>
      <dc:creator>Tim_Roberts</dc:creator>
      <dc:date>2010-10-28T20:29:07Z</dc:date>
    </item>
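    <!--
    Whether the pre-scan Tim describes is a net win can be estimated with a simple linear cost model before writing it. The C sketch below is hypothetical: the function names are made up for illustration, and it assumes a cost of c_nonzero cycles per ordinary pair and c_zero per pair with a zero operand (the A11 figures in the post above are 21 and 207 cycles on the Xeon X5680). Filtering is assumed to add a scan cost c_scan per element and to handle each zero pair separately at c_special. c_scan and c_special are placeholders to be measured; they are not values from the thread.

    ```c
    #include <stddef.h>

    /* Fraction of pairs in which at least one operand is zero (including -0.0f). */
    double zero_operand_fraction(const float *y, const float *x, size_t n)
    {
        if (n == 0)
            return 0.0;
        size_t zeros = 0;
        for (size_t i = 0; i < n; i++)
            if (y[i] == 0.0f || x[i] == 0.0f)
                zeros++;
        return (double)zeros / (double)n;
    }

    /* Assumed linear cost model, in cycles per element, for zero fraction f:
       unfiltered = (1 - f) * c_nonzero + f * c_zero
       filtered   = c_scan + (1 - f) * c_nonzero + f * c_special */
    double unfiltered_cost(double f, double c_nonzero, double c_zero)
    {
        return (1.0 - f) * c_nonzero + f * c_zero;
    }

    double filtered_cost(double f, double c_nonzero, double c_scan, double c_special)
    {
        return c_scan + (1.0 - f) * c_nonzero + f * c_special;
    }

    /* Non-zero when the model predicts the pre-scan pays for itself,
       which reduces to f * (c_zero - c_special) > c_scan. */
    int prescan_pays_off(double f, double c_nonzero, double c_zero,
                         double c_scan, double c_special)
    {
        return filtered_cost(f, c_nonzero, c_scan, c_special)
             < unfiltered_cost(f, c_nonzero, c_zero);
    }
    ```

    With the Xeon A11 numbers and, purely for illustration, c_scan = 5 and c_special = 30 cycles, the break-even zero fraction is 5 / (207 - 30), about 0.028. Under those assumed costs the pre-scan would only help when roughly 3% or more of the pairs contain a zero.
    -->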
    <item>
      <title>ippsAtan2 timing with 0 operands</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsAtan2-timing-with-0-operands/m-p/798000#M2913</link>
      <description>Hi Tim,&lt;BR /&gt;&lt;BR /&gt;what version of IPP do you use? Does that effect take place on all variants of atan2 function (ippsAtan2_32f_A11, ippsAtan2_32f_A21 and ippsAtan2_32f_A24)?&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt; Vladimir</description>
      <pubDate>Thu, 28 Oct 2010 20:41:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsAtan2-timing-with-0-operands/m-p/798000#M2913</guid>
      <dc:creator>Vladimir_Dudnik</dc:creator>
      <dc:date>2010-10-28T20:41:52Z</dc:date>
    </item>
    <item>
      <title>ippsAtan2 timing with 0 operands</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsAtan2-timing-with-0-operands/m-p/798001#M2914</link>
      <description>I'm using IPP 6.1.&lt;BR /&gt;&lt;BR /&gt;Good question regarding the precision. I was using the A11 variant, but I just checked the others. The A24 variant also has a penalty when the parameters are 0, but the penalty is smaller.&lt;BR /&gt;&lt;BR /&gt;The A21 variant behaves differently. I don't see a penalty when an operand is exactly 0, but when both of the values are small (but non-zero), the 3x penalty is there.&lt;BR /&gt;&lt;BR /&gt;If this were an iterative algorithm, I might expect some combinations to take longer to converge, but I thought this was a straight-line polynomial. Hence my surprise. Could this be triggering overflow or underflow?&lt;BR /&gt;&lt;BR /&gt;Tim Roberts</description>
      <pubDate>Thu, 28 Oct 2010 22:14:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsAtan2-timing-with-0-operands/m-p/798001#M2914</guid>
      <dc:creator>Tim_Roberts</dc:creator>
      <dc:date>2010-10-28T22:14:02Z</dc:date>
    </item>
    <item>
      <title>ippsAtan2 timing with 0 operands</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsAtan2-timing-with-0-operands/m-p/798002#M2915</link>
      <description>Tim,&lt;BR /&gt;&lt;BR /&gt;which libraries are you using, IA32 or Intel64? Did you use the emerged libs? If so, did you call the ippInit function in your code?&lt;BR /&gt;&lt;BR /&gt;Andrey&lt;BR /&gt;</description>
      <pubDate>Fri, 29 Oct 2010 09:39:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsAtan2-timing-with-0-operands/m-p/798002#M2915</guid>
      <dc:creator>Andrey_G_Intel2</dc:creator>
      <dc:date>2010-10-29T09:39:45Z</dc:date>
    </item>
    <item>
      <title>ippsAtan2 timing with 0 operands</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsAtan2-timing-with-0-operands/m-p/798003#M2916</link>
      <description>Tim,&lt;BR /&gt;The algorithm for atan2 has a special code path for handling zeros. Different combinations of zero and non-zero arguments yield different special-case results, and they are all handled outside the main-path algorithm. This is specific to the vector functions: we use SIMD instructions to gain maximum performance, but that means we have to apply the same algorithm to all inputs. That algorithm is branch-free by design (to avoid misprediction penalties), and we strive to make it applicable to the widest possible range of arguments. Still, making the algorithm uniform for very different cases has performance implications, so we chose to take a branch-misprediction hit for corner cases (e.g. zeros) rather than slow down all values in a uniform algorithm.&lt;BR /&gt;&lt;BR /&gt;If you have a lot of zeros in your vector, you may consider a couple of options: a) filter them out before calling a vector function, or b) call a scalar function in a loop, e.g. atan2f from math.h (or mathimf.h if you are using the Intel Compiler).&lt;BR /&gt;&lt;BR /&gt;Nikita</description>
      <pubDate>Fri, 29 Oct 2010 14:06:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsAtan2-timing-with-0-operands/m-p/798003#M2916</guid>
      <dc:creator>Nikita_A_Intel</dc:creator>
      <dc:date>2010-10-29T14:06:05Z</dc:date>
    </item>
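    <!--
    Nikita's option a), combined with option b) for the zero pairs, can be sketched in C as follows. This is a minimal, hypothetical illustration, not IPP code: atan2_filter_zeros, batch_atan2_fn and reference_batch_atan2 are names made up for this example. The batch routine here is a plain atan2f loop standing in for the vector call so the sketch builds without IPP. In real code the callback would wrap ippsAtan2_32f_A11 (or A21/A24); check the IPP manual for that function's exact argument order before wiring it in.

    ```c
    #include <math.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical batch routine type: dst[i] = atan2(y[i], x[i]) for i in [0, n).
       A real program would wrap the IPP vector call here. */
    typedef void (*batch_atan2_fn)(const float *y, const float *x, float *dst, size_t n);

    /* Stand-in for the vector call so the sketch runs without IPP. */
    static void reference_batch_atan2(const float *y, const float *x, float *dst, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = atan2f(y[i], x[i]);
    }

    /* Compact the pairs whose operands are both non-zero, run the batch routine
       on those only, and scatter the results back. Pairs with a zero operand
       (including -0.0f) go through scalar atan2f, which returns the correctly
       signed special-case value (0, pi/2 or pi).
       Returns 0 on success, -1 if a scratch buffer cannot be allocated. */
    int atan2_filter_zeros(const float *y, const float *x, float *dst, size_t n,
                           batch_atan2_fn batch)
    {
        if (n == 0)
            return 0;

        float *cy = malloc(n * sizeof *cy);    /* compacted y operands */
        float *cx = malloc(n * sizeof *cx);    /* compacted x operands */
        float *cr = malloc(n * sizeof *cr);    /* results for compacted pairs */
        size_t *idx = malloc(n * sizeof *idx); /* original position of each pair */
        int rc = -1;

        if (cy && cx && cr && idx) {
            size_t m = 0;
            for (size_t i = 0; i < n; i++) {
                if (y[i] != 0.0f && x[i] != 0.0f) {
                    cy[m] = y[i];
                    cx[m] = x[i];
                    idx[m] = i;
                    m++;
                } else {
                    dst[i] = atan2f(y[i], x[i]); /* zero operand: scalar path */
                }
            }
            if (m > 0)
                batch(cy, cx, cr, m);
            for (size_t k = 0; k < m; k++)
                dst[idx[k]] = cr[k];
            rc = 0;
        }

        free(cy);
        free(cx);
        free(cr);
        free(idx);
        return rc;
    }
    ```

    Pairs with a zero operand never reach the vector routine, so its branch-free main path only sees the inputs it is designed for. The price is an extra pass over the data plus scratch buffers, which is why it is worth measuring on the real data mix before adopting it.
    -->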
  </channel>
</rss>

