<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: 10.1 + VML vzabs seems slow in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/10-1-VML-vzabs-seems-slow/m-p/891955#M10482</link>
    <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/404361"&gt;Ilya Burylov (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;P&gt;&lt;A href="http://software.intel.com/en-us/profile/371334/"&gt;vasci_intel&lt;/A&gt;, we would like to reproduce this result. Could you please provide some details?&lt;BR /&gt;What kind of loop did you use for the comparison?&lt;BR /&gt;1) for(;;) { sqrt(Im(z)*Im(z) + Re(z)*Re(z)); }&lt;BR /&gt;2) for(;;) { cabs(z); }&lt;BR /&gt;3) Something else&lt;BR /&gt;Was this loop vectorized by the compiler?&lt;BR /&gt;Which compiler version was used?&lt;BR /&gt;And what system do you use (IA32/Intel 64, CPU)?&lt;BR /&gt;We will try to look deeper into this case.&lt;/P&gt;
&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
I am using the Intel 10.1.025 C++ Windows x86_64 compiler on a Pentium D (Intel 64). My code is actually a fairly complete C++ matrix library using a custom "Complex" class (not std::complex). The core routine is simply sqrt(re*re + im*im). It appears the Intel compiler is doing a very good job of inlining the various C++ calls.&lt;BR /&gt;&lt;BR /&gt;Given the above, sample code is non-trivial to generate, but I will take the time to do so and submit it here.</description>
    <pubDate>Tue, 10 Nov 2009 22:30:46 GMT</pubDate>
    <dc:creator>AndrewC</dc:creator>
    <dc:date>2009-11-10T22:30:46Z</dc:date>
    <item>
      <title>10.1 + VML vzabs seems slow</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/10-1-VML-vzabs-seems-slow/m-p/891952#M10479</link>
      <description>I am using vzabs (MKL 10.1) to take absolute values, abs=sqrt(r*r + i*i), of vectors (length 10000) of double complex data. Surprisingly, a single call to vzabs is slower by a factor of 1.5 than a simple C++ 'for' loop implementation.&lt;BR /&gt;Switching to 10.2 shows that vzabs takes advantage of threading, but only just matches the single-threaded C++ implementation in wall clock time.</description>
      <pubDate>Fri, 06 Nov 2009 20:06:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/10-1-VML-vzabs-seems-slow/m-p/891952#M10479</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2009-11-06T20:06:32Z</dc:date>
    </item>
    <item>
      <title>Re: 10.1 + VML vzabs seems slow</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/10-1-VML-vzabs-seems-slow/m-p/891953#M10480</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
If you are writing it out in the form you quote, with no protection against over/underflow, your own vectorized code should give full performance, and your test loop may be long enough to show a gain with threaded parallelization.&lt;BR /&gt;</description>
      <pubDate>Sat, 07 Nov 2009 01:02:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/10-1-VML-vzabs-seems-slow/m-p/891953#M10480</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-11-07T01:02:48Z</dc:date>
    </item>
    <item>
      <title>Re: 10.1 + VML vzabs seems slow</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/10-1-VML-vzabs-seems-slow/m-p/891954#M10481</link>
      <description>&lt;P&gt;&lt;A href="http://software.intel.com/en-us/profile/371334/"&gt;vasci_intel&lt;/A&gt;, we would like to reproduce this result. Could you please provide some details?&lt;BR /&gt;What kind of loop did you use for the comparison?&lt;BR /&gt;1) for(;;) { sqrt(Im(z)*Im(z) + Re(z)*Re(z)); }&lt;BR /&gt;2) for(;;) { cabs(z); }&lt;BR /&gt;3) Something else&lt;BR /&gt;Was this loop vectorized by the compiler?&lt;BR /&gt;Which compiler version was used?&lt;BR /&gt;And what system do you use (IA32/Intel 64, CPU)?&lt;BR /&gt;We will try to look deeper into this case.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Nov 2009 15:58:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/10-1-VML-vzabs-seems-slow/m-p/891954#M10481</guid>
      <dc:creator>Ilya_B_Intel</dc:creator>
      <dc:date>2009-11-09T15:58:37Z</dc:date>
    </item>
    <item>
      <title>Re: 10.1 + VML vzabs seems slow</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/10-1-VML-vzabs-seems-slow/m-p/891955#M10482</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/404361"&gt;Ilya Burylov (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;P&gt;&lt;A href="http://software.intel.com/en-us/profile/371334/"&gt;vasci_intel&lt;/A&gt;, we would like to reproduce this result. Could you please provide some details?&lt;BR /&gt;What kind of loop did you use for the comparison?&lt;BR /&gt;1) for(;;) { sqrt(Im(z)*Im(z) + Re(z)*Re(z)); }&lt;BR /&gt;2) for(;;) { cabs(z); }&lt;BR /&gt;3) Something else&lt;BR /&gt;Was this loop vectorized by the compiler?&lt;BR /&gt;Which compiler version was used?&lt;BR /&gt;And what system do you use (IA32/Intel 64, CPU)?&lt;BR /&gt;We will try to look deeper into this case.&lt;/P&gt;
&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
I am using the Intel 10.1.025 C++ Windows x86_64 compiler on a Pentium D (Intel 64). My code is actually a fairly complete C++ matrix library using a custom "Complex" class (not std::complex). The core routine is simply sqrt(re*re + im*im). It appears the Intel compiler is doing a very good job of inlining the various C++ calls.&lt;BR /&gt;&lt;BR /&gt;Given the above, sample code is non-trivial to generate, but I will take the time to do so and submit it here.</description>
      <pubDate>Tue, 10 Nov 2009 22:30:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/10-1-VML-vzabs-seems-slow/m-p/891955#M10482</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2009-11-10T22:30:46Z</dc:date>
    </item>
    <item>
      <title>Re: 10.1 + VML vzabs seems slow</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/10-1-VML-vzabs-seems-slow/m-p/891956#M10483</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/371334"&gt;vasci_intel&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
I am using the Intel 10.1.025 C++ Windows x86_64 compiler on a Pentium D (Intel 64). My code is actually a fairly complete C++ matrix library using a custom "Complex" class (not std::complex). The core routine is simply sqrt(re*re + im*im). It appears the Intel compiler is doing a very good job of inlining the various C++ calls.&lt;BR /&gt;&lt;BR /&gt;Given the above, sample code is non-trivial to generate, but I will take the time to do so and submit it here.&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;I have done some more investigation on this...&lt;BR /&gt;The environment is actually:&lt;BR /&gt;Intel C++ 10.1.025, ia32, MKL 10.2 Update 3, Windows XP, Pentium D 3.00 GHz. Timing is done for size=50000, with enough loop repetitions to get meaningful numbers.&lt;BR /&gt;&lt;BR /&gt;typedef std::complex&amp;lt;double&amp;gt; dcomplex;&lt;BR /&gt; dcomplex * a = new dcomplex[size];&lt;BR /&gt; double * b = new double[size];&lt;BR /&gt;&lt;BR /&gt; for(int i=0; i&amp;lt;size; i++) {&lt;BR /&gt; a[i]=dcomplex(4.0,3.0);&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt;Test 1, using std::complex:&lt;BR /&gt; for(int i=0; i&amp;lt;size; i++) {&lt;BR /&gt; b[i]=abs(a[i]);&lt;BR /&gt; }&lt;BR /&gt;CPU Time: 21.57 s (wallclock = 21.81)&lt;BR /&gt;&lt;BR /&gt;Test 2, using an inline naive abs() function (more similar to my custom C++ Complex library):&lt;BR /&gt; for(int i=0; i&amp;lt;size; i++) {&lt;BR /&gt; const dcomplex &amp;amp;aa=a[i];&lt;BR /&gt; double r=aa.real();&lt;BR /&gt; double im=aa.imag();&lt;BR /&gt; b[i]=sqrt(r*r + im*im);&lt;BR /&gt; }&lt;BR /&gt;CPU Time: 0.6875 (wallclock = 0.687)&lt;BR /&gt;&lt;BR /&gt;Test 3, using VML:&lt;BR /&gt; vzabs( &amp;amp;size, (MKL_Complex16 *)a, b );&lt;BR /&gt;CPU Time: 1.13 (wallclock = 0.566)&lt;BR /&gt;&lt;BR /&gt;Until I looked at the code for abs() on std::complex, I had not realized that there is, in general, a more numerically stable way to compute abs() than the naive implementation. The std::complex abs() does this, and is obviously painfully slow. If vzabs is doing something similar, then vzabs is clearly very fast by comparison (and it is threaded efficiently).</description>
      <pubDate>Tue, 10 Nov 2009 23:28:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/10-1-VML-vzabs-seems-slow/m-p/891956#M10483</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2009-11-10T23:28:03Z</dc:date>
    </item>
    <item>
      <title>Re: 10.1 + VML vzabs seems slow</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/10-1-VML-vzabs-seems-slow/m-p/891957#M10484</link>
      <description>&lt;P&gt;vasci_intel,&lt;/P&gt;
&lt;P&gt;Thank you for your answers.&lt;/P&gt;
&lt;P&gt;Yes, vzAbs implements a numerically enhanced algorithm that gives accurate answers for all valid arguments, while the naive implementation sqrt(r*r + im*im) shows total accuracy loss for about half of the representable floating-point numbers.&lt;/P&gt;
&lt;P&gt;Let us consider A=max(abs(r),abs(im)). If A&amp;lt;2^-538 (roughly), then sqrt(r*r+im*im) will give zero as a result, but the correct result should not be smaller than A. If A&amp;gt;2^+512 (roughly), then sqrt(r*r+im*im) will give infinity, while the correct result should not be larger than A*sqrt(2). Together, these two ranges cover about half of the valid FP numbers.&lt;/P&gt;
&lt;P&gt;Still, we appreciate your input, and if the naive implementation is sufficient for your needs, we can consider more optimization effort in the VML relaxed accuracy modes for vzAbs. Do you know if vzAbs is a hotspot in your application?&lt;/P&gt;</description>
      <pubDate>Wed, 11 Nov 2009 14:22:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/10-1-VML-vzabs-seems-slow/m-p/891957#M10484</guid>
      <dc:creator>Ilya_B_Intel</dc:creator>
      <dc:date>2009-11-11T14:22:00Z</dc:date>
    </item>
  </channel>
</rss>

