<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Sequential VML performance in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780877#M1465</link>
    <description>Hello kjpus,&lt;BR /&gt;Could you give a bit more detail: which version of MKL do you use, is your OS Windows, is your application 32-bit or 64-bit, and what processor do you have (Core i7 2960XM or another one)?&lt;BR /&gt;Thanks,&lt;BR /&gt;Eugeny.&lt;BR /&gt;</description>
    <pubDate>Tue, 10 Apr 2012 11:09:05 GMT</pubDate>
    <dc:creator>Eugeny_G_Intel</dc:creator>
    <dc:date>2012-04-10T11:09:05Z</dc:date>
    <item>
      <title>Sequential VML performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780873#M1461</link>
      <description>I've been evaluating MKL for a few days. I am really surprised to find that sequential VML performance is VERY bad. When I multiply two complex vectors, VML takes ~5 times longer than VS2010-generated code to finish on a Dell Precision laptop (Core i7 CPU). I wonder what I am doing wrong to get that kind of performance penalty. I've attached the small sample code I used. &lt;BR /&gt;&lt;BR /&gt;TIA.</description>
      <pubDate>Fri, 30 Mar 2012 13:46:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780873#M1461</guid>
      <dc:creator>kjpus</dc:creator>
      <dc:date>2012-03-30T13:46:24Z</dc:date>
    </item>
    <item>
      <title>Sequential VML performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780874#M1462</link>
      <description>VC++ would not be able to optimize away your outer timed loop for the VML call, as it might do for the in-line code.&lt;BR /&gt;VML doesn't do anything magic which you couldn't accomplish with OpenMP and a vectorizing compiler.&lt;BR /&gt;</description>
      <pubDate>Sat, 31 Mar 2012 12:42:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780874#M1462</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2012-03-31T12:42:06Z</dc:date>
    </item>
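    <!-- Editor's note on the reply above: a minimal sketch, in standard C++ only, of a
         timed loop whose output is consumed, which reduces the chance that an optimizing
         compiler discards the in-line multiplication as dead code. The function name
         time_inline_multiply and the checksum parameter are hypothetical and do not come
         from the thread; the attached sample code is not reproduced here.

```cpp
#include <cassert>
#include <chrono>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Hypothetical helper (not from the thread): runs `reps` passes of an
// element-wise complex multiply and returns the elapsed time in seconds.
// One output element per pass is added to `checksum`, so the results of
// the timed loop are observable by the caller.
double time_inline_multiply(const std::vector<Complex>& a,
                            const std::vector<Complex>& b,
                            std::vector<Complex>& out,
                            int reps,
                            Complex& checksum) {
    assert(!a.empty() && a.size() == b.size() && a.size() == out.size());
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        for (std::size_t k = 0; k < a.size(); ++k) {
            out[k] = a[k] * b[k];
        }
        checksum += out[static_cast<std::size_t>(r) % out.size()];
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

         The caller should print or otherwise use the checksum after the call. A compiler
         may still simplify a loop over unchanging inputs, so treat this as a mitigation
         and inspect the generated code when the numbers look too good.
    -->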
    <item>
      <title>Sequential VML performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780875#M1463</link>
      <description>&lt;P&gt;Hello. You are comparing a naive implementation of complex double-precision multiplication with the VML HA (High Accuracy) implementation, which is slow but accurate. VML also provides a fast naive implementation, as the EP (Enhanced Performance) version.&lt;BR /&gt;&lt;BR /&gt;To use the VML EP version of complex double multiplication instead of the HA version, you can change the default VML mode to EP by calling vmlsetmode(VML_EP) before the call to vzmul, or just replace&lt;BR /&gt; vzmul(&amp;amp;size,(MKL_Complex16*)buf1,(MKL_Complex16*)buf2,(MKL_Complex16*)buf4);&lt;BR /&gt;by&lt;BR /&gt; vmzmul(&amp;amp;size,(MKL_Complex16*)buf1,(MKL_Complex16*)buf2,(MKL_Complex16*)buf4, VML_EP);&lt;/P&gt;</description>
      <pubDate>Mon, 02 Apr 2012 12:45:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780875#M1463</guid>
      <dc:creator>Eugeny_G_Intel</dc:creator>
      <dc:date>2012-04-02T12:45:40Z</dc:date>
    </item>
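    <!-- Editor's note on the reply above: for reference, the "naive" complex
         multiplication is the textbook formula (a + bi)(c + di) = (ac - bd) + (ad + bc)i
         evaluated directly in double precision. The sketch below only illustrates that
         formula; naive_mul is a hypothetical name, and this is not MKL's implementation
         of either the EP or the HA mode, which the thread does not describe.

```cpp
#include <complex>

using Complex = std::complex<double>;

// Hypothetical illustration (not MKL code): textbook complex product
// (a + bi)(c + di) = (ac - bd) + (ad + bc)i, with no extra handling of
// overflow, cancellation, or NaN/infinity special cases.
inline Complex naive_mul(const Complex& x, const Complex& y) {
    const double a = x.real(), b = x.imag();
    const double c = y.real(), d = y.imag();
    return Complex(a * c - b * d, a * d + b * c);
}
```

         For example, naive_mul(Complex(1, 2), Complex(3, 4)) gives (-5, 10). The two
         subtractions and additions are where rounding and cancellation can cost accuracy,
         which is the general trade-off behind offering separate accuracy modes.
    -->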
    <item>
      <title>Sequential VML performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780876#M1464</link>
      <description>Thanks, Eugeny.&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Changing to EP mode does bring the VML performance close to the compiler generated code, though still slightly slower.&lt;/DIV&gt;</description>
      <pubDate>Mon, 09 Apr 2012 04:26:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780876#M1464</guid>
      <dc:creator>kjpus</dc:creator>
      <dc:date>2012-04-09T04:26:25Z</dc:date>
    </item>
    <item>
      <title>Sequential VML performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780877#M1465</link>
      <description>Hello kjpus,&lt;BR /&gt;Could you give a bit more detail: which version of MKL do you use, is your OS Windows, is your application 32-bit or 64-bit, and what processor do you have (Core i7 2960XM or another one)?&lt;BR /&gt;Thanks,&lt;BR /&gt;Eugeny.&lt;BR /&gt;</description>
      <pubDate>Tue, 10 Apr 2012 11:09:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780877#M1465</guid>
      <dc:creator>Eugeny_G_Intel</dc:creator>
      <dc:date>2012-04-10T11:09:05Z</dc:date>
    </item>
    <item>
      <title>Sequential VML performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780878#M1466</link>
      <description>Hello kjpus,&lt;BR /&gt;I was able to reproduce the issue. It will be fixed in a new MKL release.&lt;BR /&gt;Thanks for finding it,&lt;BR /&gt;Eugeny.&lt;BR /&gt;</description>
      <pubDate>Tue, 10 Apr 2012 11:39:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780878#M1466</guid>
      <dc:creator>Eugeny_G_Intel</dc:creator>
      <dc:date>2012-04-10T11:39:01Z</dc:date>
    </item>
    <item>
      <title>Hello Eugeny,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780879#M1467</link>
      <description>&lt;P&gt;Hello Eugeny,&lt;/P&gt;

&lt;P&gt;I have run into the same problem when multiplying two complex vectors with the vmzMul function.&lt;/P&gt;

&lt;P&gt;Is the problem fixed in MKL version 11.3.2?&lt;/P&gt;

&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Sat, 08 Oct 2016 00:55:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780879#M1467</guid>
      <dc:creator>hao_y_</dc:creator>
      <dc:date>2016-10-08T00:55:15Z</dc:date>
    </item>
    <item>
      <title>Hi Hao, </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780880#M1468</link>
      <description>&lt;P&gt;Hi Hao,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;The original issue was fixed in an early version, around 11.0.x, so it should be fixed in MKL 11.3.2 too.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Are you running the same test with the &lt;SPAN style="font-size: 12px;"&gt;VML EP version&lt;/SPAN&gt; on a machine with MKL 11.3.2? Could you please let us know the OS and the processor you are testing on?&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Best&lt;/P&gt;

&lt;P&gt;Ying&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 08 Oct 2016 02:59:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780880#M1468</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2016-10-08T02:59:01Z</dc:date>
    </item>
    <item>
      <title>Thanks, Ying.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780881#M1469</link>
      <description>&lt;P&gt;Thanks, Ying.&lt;/P&gt;

&lt;P&gt;The OS&amp;nbsp;is Linux.&lt;/P&gt;

&lt;P&gt;I use the first snippet below to multiply two 16*8 matrices element by element, and the second snippet to test whether the vmzMul function could run faster than the first one.&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;Complex **ppIn1, **ppIn2, **ppOut;
....
double seconds_s1 = dsecnd();
for(int i=0; i&amp;lt;16; i++)
{
    for(int j=0; j&amp;lt;8; j++)
    {
        ppOut[i][j] = ppIn1[i][j] * ppIn2[i][j];
    }
}
double seconds_e1 = dsecnd() - seconds_s1;
cout &amp;lt;&amp;lt; seconds_e1 &amp;lt;&amp;lt; endl;&lt;/PRE&gt;

&lt;PRE class="brush:cpp;"&gt;MKL_Complex16 *pIn1, *pIn2, *pOut;
len = 16*8;
...
double seconds_s2 = dsecnd();
vmzMul(len, pIn1, pIn2, pOut, VML_EP);
double seconds_e2 = dsecnd() - seconds_s2;
cout &amp;lt;&amp;lt; seconds_e2 &amp;lt;&amp;lt; endl;&lt;/PRE&gt;

&lt;P&gt;The result is seconds_e1 = 8.19564e-07 and seconds_e2 = 6.13928e-06. I wonder what I am doing wrong to get this kind of result.&lt;BR /&gt;
	Thank you!&lt;/P&gt;</description>
      <pubDate>Sat, 08 Oct 2016 06:53:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780881#M1469</guid>
      <dc:creator>hao_y_</dc:creator>
      <dc:date>2016-10-08T06:53:24Z</dc:date>
    </item>
    <item>
      <title>Hi Hao, </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780882#M1470</link>
      <description>&lt;P&gt;Hi Hao,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Have you tried the latest version, for example, MKL 2017?&amp;nbsp;&lt;/P&gt;

&lt;P&gt;I did a quick test. The results show that MKL is far faster than the direct loop.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Intel(R) Math Kernel Library Version 2017.0.0 Beta Update 1 Build 20160513 for&lt;BR /&gt;
	Intel(R) 64 architecture applications&lt;BR /&gt;
	direct &amp;nbsp;: 0.157639&lt;BR /&gt;
	mkl vmzMul : 0.00693191&lt;BR /&gt;
	Press any key to continue . . .&lt;/P&gt;

&lt;P&gt;As&amp;nbsp;&lt;SPAN style="font-size: 1em;"&gt;the test matrix size seems small, I added dummy loop iterations around the main computation (LOOP = 10000 in the code below), just to make it run longer. &lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Here is my test code. Would you please try it and let us know the result?&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;#include "stdafx.h"

// TODO: reference any additional headers you need in STDAFX.H
// and not in this file
#include &amp;lt;iostream&amp;gt;
#include &amp;lt;random&amp;gt;
#include &amp;lt;ctime&amp;gt;
#include &amp;lt;new&amp;gt;
#include &amp;lt;tuple&amp;gt;
#include &amp;lt;complex&amp;gt;

#include &amp;lt;mkl.h&amp;gt;
#define LOOP 10000
typedef std::complex&amp;lt;double&amp;gt; Complex;

const MKL_INT Arows = 16, Acols = 8;
using namespace std;

void Comon_vml(){

    Complex ppIn1[Arows][Acols], ppIn2[Arows][Acols], ppOut[Arows][Acols];

    for (int i = 0; i &amp;lt; Arows; i++){
        for (int j = 0; j &amp;lt; Acols; j++){
            ppIn1[i][j] = Complex(i+1,j+1);
            ppIn2[i][j] = Complex(i+1,j+1);
        }
    }

    double seconds_s1 = dsecnd();
    for (int iter = 0; iter &amp;lt; LOOP; iter++){
        for (int i = 0; i &amp;lt; Arows; i++)
        {
            for (int j = 0; j &amp;lt; Acols; j++)
            {
                ppOut[i][j] = ppIn1[i][j] * ppIn2[i][j];
            }
        }
    }
    double seconds_e1 = dsecnd() - seconds_s1;
    cout &amp;lt;&amp;lt; "direct  : " &amp;lt;&amp;lt; seconds_e1 &amp;lt;&amp;lt; endl;
/*
    std::cout &amp;lt;&amp;lt; "From direct" &amp;lt;&amp;lt; std::endl;
    for (int i = 0; i &amp;lt; Arows; i++){
        for (int j = 0; j &amp;lt; Acols; j++){
            cout &amp;lt;&amp;lt; "[" &amp;lt;&amp;lt; i &amp;lt;&amp;lt; ", " &amp;lt;&amp;lt; j &amp;lt;&amp;lt; "]" &amp;lt;&amp;lt; ppOut[i][j] &amp;lt;&amp;lt; "\t";
        }
        std::cout &amp;lt;&amp;lt; std::endl;
    }
*/
}

void mkl_vml(){
/*MKL_Complex16 * pIn1, *pIn2, *pOut;

pIn1 = new MKL_Complex16[len]();
pIn2 = new MKL_Complex16[len]();
pOut = new MKL_Complex16[len]();
*/

    MKL_Complex16 ppIn1[Arows][Acols], ppIn2[Arows][Acols], ppOut[Arows][Acols];

    for (int i = 0; i &amp;lt; Arows; i++){
        for (int j = 0; j &amp;lt; Acols; j++){
            ppIn1[i][j].real = i+1;  ppIn1[i][j].imag = j+1;
            ppIn2[i][j].real = i+1;  ppIn2[i][j].imag = j+1;
        }
    }

    MKL_INT len = Arows*Acols;
    double seconds_s2 = dsecnd();
    for (int i = 0; i &amp;lt; LOOP; i++)
        vmzMul(len, &amp;amp;ppIn1[0][0], &amp;amp;ppIn2[0][0], &amp;amp;ppOut[0][0], VML_EP);
    double seconds_e2 = dsecnd() - seconds_s2;
    cout &amp;lt;&amp;lt; "mkl vmzMul : " &amp;lt;&amp;lt; seconds_e2 &amp;lt;&amp;lt; endl;
/*
    std::cout &amp;lt;&amp;lt; "From vmzMul" &amp;lt;&amp;lt; std::endl;
    for (int i = 0; i &amp;lt; Arows; i++){
        for (int j = 0; j &amp;lt; Acols; j++){
            cout &amp;lt;&amp;lt; "[" &amp;lt;&amp;lt; i &amp;lt;&amp;lt; ", " &amp;lt;&amp;lt; j &amp;lt;&amp;lt; "]" &amp;lt;&amp;lt; "(" &amp;lt;&amp;lt; ppOut[i][j].real &amp;lt;&amp;lt; "," &amp;lt;&amp;lt; ppOut[i][j].imag &amp;lt;&amp;lt; ")" &amp;lt;&amp;lt; "\t";
        }
        std::cout &amp;lt;&amp;lt; std::endl;
    }
*/
}

int main(void) {

    int len = 198;
    char buf[198];
    mkl_get_version_string(buf, len);
    cout &amp;lt;&amp;lt; buf &amp;lt;&amp;lt; endl;

    Comon_vml();
    mkl_vml();
    return 0;
}&lt;/PRE&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;&amp;nbsp;Best Regards,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Ying&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2016 03:49:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sequential-VML-performance/m-p/780882#M1470</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2016-10-11T03:49:16Z</dc:date>
    </item>
  </channel>
</rss>

