<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic complex number class for optimum performance in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113260#M73124</link>
    <description>&lt;P&gt;I'm trying to take full advantage of the AVX-512 SIMD instructions on my Xeon Phi based computer. I am not sure how the compiler handles floating-point complex numbers when optimizing.&lt;/P&gt;

&lt;P&gt;Is it recommended to use the std::complex&amp;lt;float&amp;gt; class in the innermost loop of a high-performance application? In this inner loop, I'm multiplying two complex numbers and accumulating the product into a third.&lt;/P&gt;

&lt;P&gt;Or is there a better class, say the MKL_complex class, that would run faster?&lt;/P&gt;</description>
    <pubDate>Tue, 07 Mar 2017 18:42:40 GMT</pubDate>
    <dc:creator>Gerald_H_</dc:creator>
    <dc:date>2017-03-07T18:42:40Z</dc:date>
    <item>
      <title>complex number class for optimum performance</title>
      <link>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113260#M73124</link>
      <description>&lt;P&gt;I'm trying to take full advantage of the AVX-512 SIMD instructions on my Xeon Phi based computer. I am not sure how the compiler handles floating-point complex numbers when optimizing.&lt;/P&gt;

&lt;P&gt;Is it recommended to use the std::complex&amp;lt;float&amp;gt; class in the innermost loop of a high-performance application? In this inner loop, I'm multiplying two complex numbers and accumulating the product into a third.&lt;/P&gt;

&lt;P&gt;Or is there a better class, say the MKL_complex class, that would run faster?&lt;/P&gt;</description>
      <pubDate>Tue, 07 Mar 2017 18:42:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113260#M73124</guid>
      <dc:creator>Gerald_H_</dc:creator>
      <dc:date>2017-03-07T18:42:40Z</dc:date>
    </item>
    <item>
      <title>I'll answer my own question</title>
      <link>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113261#M73125</link>
      <description>&lt;P&gt;I'll answer my own question as best as I can.&lt;/P&gt;

&lt;P&gt;What I was looking for is a library of routines, cleverly written and possibly using Intel intrinsics, that would let me perform vector operations such as multiply on pairs of arrays of std::complex&amp;lt;float&amp;gt;. I was surprised that I could not find one! I never imagined that a computational platform like Xeon Phi would not support complex numbers at a deep level. I did not find a community-written library of this kind, either.&lt;/P&gt;

&lt;P&gt;The closest thing I found was &lt;A href="http://www.agner.org/optimize/#vectorclass"&gt;Agner Fog's vector class library&lt;/A&gt;, which provides tools from which one could build a vectorized complex number library. I also found two very illuminating posts, by &lt;A href="https://www.codeproject.com/articles/874396/crunching-numbers-with-avx-and-avx"&gt;Matt Scarpino&lt;/A&gt; and &lt;A href="http://stackoverflow.com/questions/39509746/how-to-square-two-complex-doubles-with-256-bit-avx-vectors/39521257#39521257"&gt;Peter Cordes&lt;/A&gt;, describing how to vectorize complex multiplication. They outline two methods that rely on Intel intrinsic functions. Even if you don't want to program with intrinsics, I encourage you to study those examples, because they give good hints on why it is hard to vectorize complex multiply and how to overcome that difficulty.&lt;/P&gt;
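
&lt;P&gt;A minimal sketch, in plain C rather than intrinsics, of why interleaved complex multiplication is awkward to vectorize. It shows one common way to arrange the arithmetic and is not taken from either linked post; the function name cmul_interleaved and its signature are assumptions made for illustration. Each product needs a lane swap followed by a subtract in one lane and an add in the other, which a SIMD version has to express with shuffles.&lt;/P&gt;

```c
/* Hypothetical helper: out[k] = a[k] * b[k] for n interleaved complex floats.
   Layout is re0, im0, re1, im1, ... so each array holds 2*n floats.
   The three steps mirror what a SIMD version would do on whole registers. */
void cmul_interleaved(const float *a, const float *b, float *out, unsigned n)
{
    for (unsigned k = 0; k != n; ++k) {
        float ar = a[2 * k], ai = a[2 * k + 1];
        float br = b[2 * k], bi = b[2 * k + 1];

        /* Step 1: multiply both lanes of a by the real part of b. */
        float t0 = ar * br;
        float t1 = ai * br;

        /* Step 2: swap the lanes of a, then multiply by the imaginary part of b. */
        float u0 = ai * bi;
        float u1 = ar * bi;

        /* Step 3: subtract in the even (real) lane, add in the odd (imaginary) lane. */
        out[2 * k]     = t0 - u0;
        out[2 * k + 1] = t1 + u1;
    }
}
```

&lt;P&gt;Called with a = {1, 2, 3, 4}, b = {5, 6, 7, 8} and n = 2, this writes {-7, 16, -11, 52} to out, that is (1+2i)(5+6i) = -7+16i and (3+4i)(7+8i) = -11+52i.&lt;/P&gt;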

&lt;P&gt;For now, I am hoping that after studying the intrinsics solutions I can write some regular C++ code that gives the compiler enough hints to autovectorize my computation.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Mar 2017 20:00:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113261#M73125</guid>
      <dc:creator>Gerald_H_</dc:creator>
      <dc:date>2017-03-15T20:00:19Z</dc:date>
    </item>
    <item>
      <title>Also there is nothing magical</title>
      <link>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113262#M73126</link>
      <description>&lt;P&gt;Also, there is nothing magical about the MKL complex number class. The complex class in the C++ standard library is the best one available.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Mar 2017 20:05:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113262#M73126</guid>
      <dc:creator>Gerald_H_</dc:creator>
      <dc:date>2017-03-15T20:05:57Z</dc:date>
    </item>
    <item>
      <title>I don't use C++ classes, but</title>
      <link>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113263#M73127</link>
      <description>&lt;P&gt;I don't use C++ classes, but in plain C code I found that the compiler's performance for complex numbers in the standard interleaved format was often quite disappointing.&amp;nbsp; For almost all of my signal-processor codes, it was faster to "de-interleave" the input data into separate real and imaginary arrays, perform the complex arithmetic "manually", and then interleave the separate output arrays into the standard interleaved format.&amp;nbsp; (Obviously, if you don't have to put the data back in interleaved format between steps, you can save even more time.)&lt;/P&gt;</description>
      <pubDate>Thu, 16 Mar 2017 19:30:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113263#M73127</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2017-03-16T19:30:38Z</dc:date>
    </item>
    <item>
      <title>With the sse3/4 support for</title>
      <link>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113264#M73128</link>
      <description>&lt;P&gt;With the SSE3/SSE4 support for complex arithmetic, there is no need to split the data. With AVX, it became an open question whether complex multiply could gain anything. The 512-bit formats may well benefit from the split.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Mar 2017 22:59:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113264#M73128</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2017-03-16T22:59:59Z</dc:date>
    </item>
    <item>
      <title>1. Intel C++ compiler option</title>
      <link>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113265#M73129</link>
      <description>&lt;STRONG&gt;1&lt;/STRONG&gt;. Intel C++ compiler option to consider:

-[no-]complex-limited-range
 enable/disable(DEFAULT) the use of the basic algebraic expansions of
 some complex arithmetic operations. This can allow for some
 performance improvement in programs which use a lot of complex
 arithmetic at the loss of some exponent range.

&lt;STRONG&gt;2&lt;/STRONG&gt;. Take a look at the &lt;STRONG&gt;LIBIMF&lt;/STRONG&gt; library (Intel's &lt;STRONG&gt;complex.h&lt;/STRONG&gt; and &lt;STRONG&gt;mathimf.h&lt;/STRONG&gt; headers).

&lt;STRONG&gt;3&lt;/STRONG&gt;. The &lt;STRONG&gt;SSE3&lt;/STRONG&gt; ISA didn't introduce any intrinsic functions related to complex numbers. Take a look at the &lt;STRONG&gt;pmmintrin.h&lt;/STRONG&gt; header file: &lt;STRONG&gt;SSE3&lt;/STRONG&gt; introduced just &lt;STRONG&gt;13&lt;/STRONG&gt; new intrinsic functions, &lt;STRONG&gt;3&lt;/STRONG&gt; defines and &lt;STRONG&gt;2&lt;/STRONG&gt; run-time macros.</description>
      <pubDate>Fri, 17 Mar 2017 18:48:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113265#M73129</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2017-03-17T18:48:00Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...Or is there a better</title>
      <link>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113266#M73130</link>
      <description>&amp;gt;&amp;gt;...Or is there a better class, say the MKL_complex class, that would run faster? 

Take a look at the Intel &lt;STRONG&gt;IPP&lt;/STRONG&gt; library and compare the performance of &lt;STRONG&gt;IPP&lt;/STRONG&gt;'s complex number functions against &lt;STRONG&gt;MKL&lt;/STRONG&gt;'s complex number functions.</description>
      <pubDate>Fri, 17 Mar 2017 18:53:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113266#M73130</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2017-03-17T18:53:41Z</dc:date>
    </item>
    <item>
      <title>Thanks for several good</title>
      <link>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113267#M73131</link>
      <description>&lt;P&gt;Thanks for several good comments.&lt;/P&gt;

&lt;P&gt;When I look at the intrinsic functions for AVX-512, I see that they're perfectly happy working with a stride of 2. Indeed, this is what Matt's algorithm does, albeit with 256-bit AVX. So it shouldn't be very difficult -- let me rephrase that -- it should be possible to write fast complex functions without having to de-interleave.&lt;/P&gt;
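
&lt;P&gt;As a rough illustration of the stride-2 idea, here is a plain C sketch that reads the interleaved data directly, with no de-interleave pass. It assumes the inner loop is a sum of products; the name cmac_interleaved, its signature, and that reading of the kernel are assumptions. Whether a compiler vectorizes it depends on the target and on the floating-point options in use, because vectorizing a sum reorders the additions.&lt;/P&gt;

```c
/* Hypothetical kernel: sum over k of a[k] * b[k] for n interleaved complex floats.
   Even indices hold real parts and odd indices hold imaginary parts (stride 2).
   result[0] receives the real part of the sum, result[1] the imaginary part. */
void cmac_interleaved(const float *a, const float *b, unsigned n, float *result)
{
    /* Separate scalar accumulators keep the loop body free of stores. */
    float acc_re = 0.0f;
    float acc_im = 0.0f;

    for (unsigned k = 0; k != n; ++k) {
        float ar = a[2 * k], ai = a[2 * k + 1];
        float br = b[2 * k], bi = b[2 * k + 1];

        acc_re += ar * br - ai * bi;   /* real part of a[k] * b[k] */
        acc_im += ar * bi + ai * br;   /* imaginary part of a[k] * b[k] */
    }

    result[0] = acc_re;
    result[1] = acc_im;
}
```

&lt;P&gt;With a = {1, 2, 3, 4}, b = {5, 6, 7, 8} and n = 2, result ends up as {-18, 68}, the sum of -7+16i and -11+52i.&lt;/P&gt;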

&lt;P&gt;I think it would be helpful to the physics community if Intel were to provide vector support for complex numbers. It certainly would make my code more readable.&lt;/P&gt;

&lt;P&gt;Meanwhile, I guess I have to de-interleave temporarily. It is kind of a hassle because both the input and the output of my routine are expected to be interleaved complex.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Mar 2017 19:16:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/complex-number-class-for-optimum-performance/m-p/1113267#M73131</guid>
      <dc:creator>Gerald_H_</dc:creator>
      <dc:date>2017-03-17T19:16:36Z</dc:date>
    </item>
  </channel>
</rss>

