<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic First test shows bad performance - what's wrong? in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/First-test-show-bad-performace-what-s-wrong/m-p/913121#M12314</link>
    <description>I just installed MKL 9.1.027 and wanted to try it out with C++ and Visual Studio 2005.&lt;BR /&gt;&lt;BR /&gt;First I made this simple wrapper namespace:&lt;BR /&gt;
&lt;PRE&gt;namespace mkl
{
    struct vec3f
    {
        vec3f(const float x, const float y, const float z)
        {
            e[0] = x;
            e[1] = y;
            e[2] = z;
        }
        float e[4];
    };

    inline void sqrt(const vec3f &amp;amp;in, vec3f &amp;amp;out)
    {
        vsSqrt(3, in.e, out.e);
    }
}&lt;/PRE&gt;
Then I also wrote this to compare with:&lt;BR /&gt;
&lt;PRE&gt;void oldSqrt(mkl::vec3f &amp;amp;in, mkl::vec3f &amp;amp;out)
{
    out.e[0] = sqrtf(in.e[0]);
    out.e[1] = sqrtf(in.e[1]);
    out.e[2] = sqrtf(in.e[2]);
}&lt;/PRE&gt;
And this is the testing code (I changed the function call and timed the different runs):&lt;BR /&gt;
&lt;PRE&gt;mkl::vec3f v(9.0f, 0.0f, 100.0f);
mkl::vec3f v2(0.0f, 0.0f, 0.0f);

for (unsigned int i = 0; i &amp;lt; 10000000; ++i)
    mkl::sqrt(v, v2);&lt;/PRE&gt;
Now, the run time for the standard sqrtf was 0.0003 seconds, but for the MKL version I had to wait 1.4 seconds! Why is this? These are my additional dependencies:&lt;BR /&gt;mkl_c_dll.lib&lt;BR /&gt;mkl_ia32.lib&lt;BR /&gt;libguide40.lib&lt;BR /&gt;</description>
    <pubDate>Thu, 13 Mar 2008 00:33:01 GMT</pubDate>
    <dc:creator>akerlund</dc:creator>
    <dc:date>2008-03-13T00:33:01Z</dc:date>
    <item>
      <title>First test shows bad performance - what's wrong?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/First-test-show-bad-performace-what-s-wrong/m-p/913121#M12314</link>
      <description>I just installed MKL 9.1.027 and wanted to try it out with C++ and Visual Studio 2005.&lt;BR /&gt;&lt;BR /&gt;First I made this simple wrapper namespace:&lt;BR /&gt;
&lt;PRE&gt;namespace mkl
{
    struct vec3f
    {
        vec3f(const float x, const float y, const float z)
        {
            e[0] = x;
            e[1] = y;
            e[2] = z;
        }
        float e[4];
    };

    inline void sqrt(const vec3f &amp;amp;in, vec3f &amp;amp;out)
    {
        vsSqrt(3, in.e, out.e);
    }
}&lt;/PRE&gt;
Then I also wrote this to compare with:&lt;BR /&gt;
&lt;PRE&gt;void oldSqrt(mkl::vec3f &amp;amp;in, mkl::vec3f &amp;amp;out)
{
    out.e[0] = sqrtf(in.e[0]);
    out.e[1] = sqrtf(in.e[1]);
    out.e[2] = sqrtf(in.e[2]);
}&lt;/PRE&gt;
And this is the testing code (I changed the function call and timed the different runs):&lt;BR /&gt;
&lt;PRE&gt;mkl::vec3f v(9.0f, 0.0f, 100.0f);
mkl::vec3f v2(0.0f, 0.0f, 0.0f);

for (unsigned int i = 0; i &amp;lt; 10000000; ++i)
    mkl::sqrt(v, v2);&lt;/PRE&gt;
Now, the run time for the standard sqrtf was 0.0003 seconds, but for the MKL version I had to wait 1.4 seconds! Why is this? These are my additional dependencies:&lt;BR /&gt;mkl_c_dll.lib&lt;BR /&gt;mkl_ia32.lib&lt;BR /&gt;libguide40.lib&lt;BR /&gt;</description>
      <pubDate>Thu, 13 Mar 2008 00:33:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/First-test-show-bad-performace-what-s-wrong/m-p/913121#M12314</guid>
      <dc:creator>akerlund</dc:creator>
      <dc:date>2008-03-13T00:33:01Z</dc:date>
    </item>
    <item>
      <title>Re: First test shows bad performance - what's wrong?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/First-test-show-bad-performace-what-s-wrong/m-p/913122#M12315</link>
      <description>&lt;P&gt;akerlund,&lt;/P&gt;
&lt;P&gt;you are trying to calculate vsSqrt on a very short vector. In most cases you will not see a performance gain from VML on such small vectors. Try bigger vectors, with 100 elements or more.&lt;/P&gt;
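<![CDATA[
A minimal sketch of the "bigger vectors" advice, on the assumption that each vector-library call carries a fixed cost that only pays off over many elements. Here batch_sqrt is a hypothetical stand-in that takes its arguments in the same order as the vsSqrt(3, in.e, out.e) call in the first post, and Vec3Array and sqrt_all are illustrative names, not part of MKL. The idea is to store many 3-component vectors back to back and process all of them with one call.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in with the same argument order as vsSqrt(n, a, y).
// With MKL linked, this loop would be replaced by one vsSqrt call.
void batch_sqrt(int n, const float* a, float* y) {
    for (int i = 0; i < n; ++i) {
        y[i] = std::sqrt(a[i]);
    }
}

// Many 3-component vectors stored back to back in one flat buffer:
// component k of vector i lives at data[3 * i + k].
struct Vec3Array {
    std::vector<float> data;

    explicit Vec3Array(std::size_t count) : data(3 * count, 0.0f) {}

    std::size_t size() const { return data.size() / 3; }
    float& at(std::size_t i, std::size_t k) { return data[3 * i + k]; }
    float at(std::size_t i, std::size_t k) const { return data[3 * i + k]; }
};

// One call covers every component of every vector, so any fixed per-call
// cost is paid once instead of once per 3-element vector.
void sqrt_all(const Vec3Array& in, Vec3Array& out) {
    assert(in.data.size() == out.data.size());
    if (in.data.empty()) {
        return;
    }
    batch_sqrt(static_cast<int>(in.data.size()), in.data.data(), out.data.data());
}
```

With this layout, 100 vectors give a single call over 300 floats, which is in the range suggested above. Whether that is enough to beat plain sqrtf still has to be measured on the target machine.
]]>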
&lt;P&gt;Andrey&lt;/P&gt;</description>
      <pubDate>Fri, 14 Mar 2008 06:17:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/First-test-show-bad-performace-what-s-wrong/m-p/913122#M12315</guid>
      <dc:creator>Andrey_G_Intel2</dc:creator>
      <dc:date>2008-03-14T06:17:38Z</dc:date>
    </item>
    <item>
      <title>Re: First test shows bad performance - what's wrong?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/First-test-show-bad-performace-what-s-wrong/m-p/913123#M12316</link>
      <description>I only see vsSqrt run faster once the array reaches somewhere between 100k and 500k floats. Is this right? Here is the new code I am testing with:&lt;BR /&gt;
&lt;PRE&gt;int howMany;
cin &amp;gt;&amp;gt; howMany;
float *numbersIn = new float[howMany];
float *numbersOut = new float[howMany];
for (int i = 0; i &amp;lt; howMany; ++i)
    numbersIn[i] = (1.0f / RAND_MAX) * rand();

Timer tm;

//vsSqrt(howMany, numbersIn, numbersOut);
for (int i = 0; i &amp;lt; howMany; ++i)
    numbersOut[i] = sqrtf(numbersIn[i]);

float a = 0.0f;
for (int i = 0; i &amp;lt; howMany; ++i)
    a += numbersOut[i];

tm.Now();
printf("Time: %f | a: %f\n", 1000.0f * tm.TimeElapsed(), a);&lt;/PRE&gt;</description>
      <pubDate>Mon, 24 Mar 2008 15:02:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/First-test-show-bad-performace-what-s-wrong/m-p/913123#M12316</guid>
      <dc:creator>akerlund</dc:creator>
      <dc:date>2008-03-24T15:02:28Z</dc:date>
    </item>
    <item>
      <title>Re: First test shows bad performance - what's wrong?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/First-test-show-bad-performace-what-s-wrong/m-p/913124#M12317</link>
      <description>Your un-optimized sum reduction will take a significant part of the time, as well as likely producing insufficient accuracy, for such long vectors. Certainly, it would take rather long vectors before VML sqrt() could compete with optimized source code. If you are interested in performance on such code, you should consider SSE parallel intrinsics, or a vectorizing compiler.&lt;BR /&gt;</description>
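<![CDATA[
As a rough illustration of the SSE route mentioned in this reply, here is a sketch that takes four square roots per instruction with _mm_sqrt_ps and accumulates the sum in double to limit rounding error on long arrays. The function names sse_sqrt and sum_as_double are made up for this example, and the code assumes an x86 or x86-64 target with SSE enabled. It is not a claim about how VML or any compiler implements this.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)

// Square roots of n floats: four at a time with SSE, scalar loop for the tail.
void sse_sqrt(std::size_t n, const float* in, float* out) {
    assert(n == 0 || (in != nullptr && out != nullptr));
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        const __m128 x = _mm_loadu_ps(in + i);   // unaligned load of 4 floats
        _mm_storeu_ps(out + i, _mm_sqrt_ps(x));  // 4 square roots at once
    }
    for (; i < n; ++i) {
        out[i] = std::sqrt(in[i]);               // remaining 0 to 3 elements
    }
}

// Sum the values in a double accumulator so that rounding error grows
// more slowly than with a float accumulator on long arrays.
double sum_as_double(std::size_t n, const float* values) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;
}
```

The unaligned load and store keep the sketch safe for arrays from new float[...]. Aligned variants may be faster but need 16-byte aligned buffers.
]]>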
      <pubDate>Mon, 24 Mar 2008 16:21:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/First-test-show-bad-performace-what-s-wrong/m-p/913124#M12317</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2008-03-24T16:21:47Z</dc:date>
    </item>
    <item>
      <title>Re: First test shows bad performance - what's wrong?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/First-test-show-bad-performace-what-s-wrong/m-p/913125#M12318</link>
      <description>&lt;P&gt;Regarding your first test case: I am not sure how MKL handles floating-point exceptions, but that may well be the cause of the slowdown. Try picking numbers that avoid denormals when the square root is calculated repeatedly over many iterations.&lt;/P&gt;</description>
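<![CDATA[
One way to check the denormal suggestion is to classify the values going into and coming out of the square root. The sketch below uses std::fpclassify from the standard library. The helper name has_subnormal is made up for this example, and the check says nothing about how MKL itself treats such values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Returns true if any of the n values is subnormal (denormal).
bool has_subnormal(std::size_t n, const float* values) {
    assert(n == 0 || values != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fpclassify(values[i]) == FP_SUBNORMAL) {
            return true;
        }
    }
    return false;
}
```

Running it over both the input and the output array of a test shows whether denormals are involved at all before looking for other causes.
]]>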
      <pubDate>Tue, 25 Mar 2008 08:03:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/First-test-show-bad-performace-what-s-wrong/m-p/913125#M12318</guid>
      <dc:creator>levicki</dc:creator>
      <dc:date>2008-03-25T08:03:19Z</dc:date>
    </item>
  </channel>
</rss>

