<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Performance differences look in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/A-B-using-mkl-dcscmm/m-p/979802#M17375</link>
    <description>&lt;P&gt;Performance differences look small enough that they could be due to any of several factors:&lt;/P&gt;

&lt;P&gt;1) Apparently, you didn't invoke auto-vectorization. Even gprof ought to show whether that makes a difference.&lt;/P&gt;

&lt;P&gt;2) Differences (possibly accidental) in data alignment or total cache usage.&lt;/P&gt;

&lt;P&gt;.....&lt;/P&gt;</description>
    <pubDate>Sun, 09 Feb 2014 17:53:14 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2014-02-09T17:53:14Z</dc:date>
    <item>
      <title>A'*B using mkl_dcscmm</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/A-B-using-mkl-dcscmm/m-p/979801#M17374</link>
      <description>&lt;P&gt;I tried mkl_dcscmm to compute both A*B and A'*B&amp;nbsp; using a Matlab mex file (64-bit Linux, Matlab 2013a and 2013b) similar to the code posted in&lt;BR /&gt;
	&lt;A href="http://software.intel.com/en-us/forums/topic/472320" target="_blank"&gt;http://software.intel.com/en-us/forums/topic/472320&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;MKL is faster than MATLAB's built-in implementation for A*B. Strangely, MKL is slower than MATLAB's version for A'*B, and the results differ slightly.&lt;BR /&gt;
	&lt;BR /&gt;
	(In each cpu pair, the first value is the CPU time of MATLAB's implementation and the second is that of MKL; err measures the difference between the two results.)&lt;BR /&gt;
	seed:&amp;nbsp; 76080079, A*B: err 0.00e+00, cpu (0.91, 0.44), A'*B: err 1.43e-09, cpu (0.76, 0.71)&lt;BR /&gt;
	seed:&amp;nbsp; 66432737, A*B: err 0.00e+00, cpu (0.91, 0.43), A'*B: err 1.43e-09, cpu (0.75, 0.79)&lt;BR /&gt;
	seed:&amp;nbsp; 90643494, A*B: err 0.00e+00, cpu (0.92, 0.45), A'*B: err 1.43e-09, cpu (0.77, 0.88)&lt;BR /&gt;
	seed:&amp;nbsp; 75317986, A*B: err 0.00e+00, cpu (0.94, 0.46), A'*B: err 1.45e-09, cpu (0.75, 0.82)&lt;BR /&gt;
	seed:&amp;nbsp; 31023079, A*B: err 0.00e+00, cpu (0.92, 0.42), A'*B: err 1.43e-09, cpu (0.75, 0.80)&lt;BR /&gt;
	seed:&amp;nbsp; 86467634, A*B: err 0.00e+00, cpu (0.94, 0.48), A'*B: err 1.44e-09, cpu (0.76, 0.86)&lt;BR /&gt;
	seed:&amp;nbsp; 19834911, A*B: err 0.00e+00, cpu (0.93, 0.61), A'*B: err 1.42e-09, cpu (0.78, 0.76)&lt;BR /&gt;
	seed:&amp;nbsp; 79273667, A*B: err 0.00e+00, cpu (0.93, 0.48), A'*B: err 1.43e-09, cpu (0.75, 0.82)&lt;BR /&gt;
	seed:&amp;nbsp; 11976366, A*B: err 0.00e+00, cpu (0.93, 0.45), A'*B: err 1.42e-09, cpu (0.78, 0.89)&lt;BR /&gt;
	seed:&amp;nbsp; 16420430, A*B: err 0.00e+00, cpu (0.92, 0.40), A'*B: err 1.43e-09, cpu (0.75, 0.80)&lt;/P&gt;

&lt;P&gt;My code is attached. It can be compiled with&lt;BR /&gt;
	mex -O -largeArrayDims -output sfmult mkl-sfmult-v1.cpp&lt;BR /&gt;
	A*B and A'*B can then be computed as sfmult(A, B, 1) and sfmult(A, B, 2), respectively.&lt;/P&gt;
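&lt;P&gt;To make the two operations concrete, here is a plain reference sketch (not MKL's or MATLAB's actual code) of what sfmult(A, B, 1) and sfmult(A, B, 2) compute when A is stored in compressed sparse column (CSC) form. It assumes zero-based index arrays like MATLAB's jc/ir/pr and dense, column-major B and C; the type CscMatrix and both function names are hypothetical. In this formulation A*B scatters each stored entry of A into C, so every entry of C is accumulated in the column order of A, while each entry of A'*B is a sparse dot product that an optimized library may split into several partial sums. That is one plausible, unconfirmed reason why only the A'*B results differ slightly from MATLAB's.&lt;/P&gt;
<![CDATA[<PRE>
```cpp
// Reference sketch (not MKL's or MATLAB's code): C = A*B and C = A'*B for a
// sparse A in compressed sparse column (CSC) form with zero-based indices,
// as in MATLAB's jc/ir/pr arrays; dense B and C are stored column-major.
#include <cassert>
#include <cstddef>
#include <vector>

struct CscMatrix {                // hypothetical container type
    std::size_t rows, cols;
    std::vector<std::size_t> jc;  // column pointers, length cols + 1
    std::vector<std::size_t> ir;  // row index of each stored entry
    std::vector<double> pr;       // value of each stored entry
};

// C (rows x n) = A * B, where B is cols x n.
// Each stored entry A(i,j) is scattered into row i of C (axpy per column of A).
std::vector<double> csc_times_dense(const CscMatrix& A,
                                    const std::vector<double>& B, std::size_t n) {
    assert(B.size() == A.cols * n);
    std::vector<double> C(A.rows * n, 0.0);
    for (std::size_t col = 0; col < n; ++col)
        for (std::size_t j = 0; j < A.cols; ++j) {
            const double b = B[j + col * A.cols];
            for (std::size_t p = A.jc[j]; p < A.jc[j + 1]; ++p)
                C[A.ir[p] + col * A.rows] += A.pr[p] * b;
        }
    return C;
}

// C (cols x n) = A' * B, where B is rows x n.
// Each entry of C is a sparse dot product of column j of A with a column of B.
std::vector<double> csc_transpose_times_dense(const CscMatrix& A,
                                              const std::vector<double>& B, std::size_t n) {
    assert(B.size() == A.rows * n);
    std::vector<double> C(A.cols * n, 0.0);
    for (std::size_t col = 0; col < n; ++col)
        for (std::size_t j = 0; j < A.cols; ++j) {
            double sum = 0.0;  // one accumulator; a vectorized version might use several
            for (std::size_t p = A.jc[j]; p < A.jc[j + 1]; ++p)
                sum += A.pr[p] * B[A.ir[p] + col * A.rows];
            C[j + col * A.cols] = sum;
        }
    return C;
}
```
</PRE>]]>
&lt;P&gt;For example, with A = [1 0; 0 3; 2 0] and B = [1; 2; 3], csc_transpose_times_dense returns [7; 6], the same as A'*B in MATLAB.&lt;/P&gt;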

&lt;P&gt;Although A'*B can also be computed as sfmult(A', B, 1) by forming the transpose first, it is preferable to pass A itself and use the transpose flag (transa) of mkl_dcscmm, which avoids building an explicit transposed copy of A.&lt;BR /&gt;
	&lt;BR /&gt;
	Any suggestions or comments are welcome. Thanks!&lt;/P&gt;</description>
      <pubDate>Sun, 09 Feb 2014 12:44:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/A-B-using-mkl-dcscmm/m-p/979801#M17374</guid>
      <dc:creator>Zaiwen</dc:creator>
      <dc:date>2014-02-09T12:44:40Z</dc:date>
    </item>
    <item>
      <title>Performance differences look</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/A-B-using-mkl-dcscmm/m-p/979802#M17375</link>
      <description>&lt;P&gt;Performance differences look small enough that they could be due to any of several factors:&lt;/P&gt;

&lt;P&gt;1) Apparently, you didn't invoke auto-vectorization. Even gprof ought to show whether that makes a difference.&lt;/P&gt;

&lt;P&gt;2) Differences (possibly accidental) in data alignment or total cache usage.&lt;/P&gt;
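&lt;P&gt;A small sketch, not taken from the thread, of why vectorization (and data alignment, which can change how a compiler splits a vectorized loop) may change results and not only timings: floating-point addition is not associative, so summing the same numbers with several partial sums, as a SIMD-vectorized reduction typically does, can round differently from a plain left-to-right loop. The function names are hypothetical, and the input values are contrived to make the effect obvious; with ordinary random data the discrepancy is usually at the level of rounding error.&lt;/P&gt;
<![CDATA[<PRE>
```cpp
// Illustration (not MKL code): the same numbers summed in two different orders.
#include <cassert>
#include <cstddef>
#include <vector>

// Plain left-to-right summation: ((x0 + x1) + x2) + x3 + ...
double sum_sequential(const std::vector<double>& x) {
    double s = 0.0;
    for (double v : x) s += v;
    return s;
}

// Summation with `lanes` independent partial sums that are combined at the end,
// a simplified model of how a SIMD-vectorized reduction proceeds.
double sum_lanes(const std::vector<double>& x, std::size_t lanes) {
    assert(lanes > 0);
    std::vector<double> partial(lanes, 0.0);
    for (std::size_t i = 0; i < x.size(); ++i) partial[i % lanes] += x[i];
    double s = 0.0;
    for (double p : partial) s += p;
    return s;
}
```
</PRE>]]>
&lt;P&gt;With x = {1e16, 1.0, -1e16, 1.0}, sum_sequential returns 1.0 while sum_lanes(x, 2) returns 2.0, because 1e16 + 1.0 rounds back to 1e16 in double precision before the large terms cancel.&lt;/P&gt;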

&lt;P&gt;.....&lt;/P&gt;</description>
      <pubDate>Sun, 09 Feb 2014 17:53:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/A-B-using-mkl-dcscmm/m-p/979802#M17375</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2014-02-09T17:53:14Z</dc:date>
    </item>
    <item>
      <title>Thanks a lot for the quick</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/A-B-using-mkl-dcscmm/m-p/979803#M17376</link>
      <description>&lt;P&gt;Thanks a lot for the quick feedback. I am quite confused by the two factors.&lt;BR /&gt;
	*) Why the correctness depends on auto-vectorization and data alignment or total cache usage?&lt;BR /&gt;
	*) The error of A'*B can become larger if the size of the matrix increases. But the results of A*B are the same as these computed by Matlab.&lt;BR /&gt;
	*) Since these two operations have to be called tens to hundreds of times in my application, the performance differences can be quite large. Hence, I hope to first figure out the reason in some simple random examples.&lt;BR /&gt;
	&lt;BR /&gt;
	I am wondering if there is a bug in mkl_dcscmm when the transpose of A is used.&lt;/P&gt;</description>
      <pubDate>Tue, 11 Feb 2014 01:20:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/A-B-using-mkl-dcscmm/m-p/979803#M17376</guid>
      <dc:creator>Zaiwen</dc:creator>
      <dc:date>2014-02-11T01:20:06Z</dc:date>
    </item>
    <item>
      <title>Please refer the article on</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/A-B-using-mkl-dcscmm/m-p/979804#M17377</link>
      <description>&lt;P&gt;Please refer the article on the MKL feature called Conditional Numerical Reproducibility to get more details on causes of incorrect results&amp;nbsp;.&lt;/P&gt;

&lt;P&gt;&lt;A href="http://software.intel.com/en-us/articles/introduction-to-the-conditional-numerical-reproducibility-cnr"&gt;http://software.intel.com/en-us/articles/introduction-to-the-conditional-numerical-reproducibility-cnr&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;From MKL 11.1 onwards, we also support CNR mode on unaligned data. Could you try MKL 11.1 and check whether you still see the problem?&lt;/P&gt;
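&lt;P&gt;For MKL releases before 11.1, where CNR mode apparently required aligned data, one possible workaround is to copy MATLAB-owned arrays into aligned buffers before passing them to mkl_dcscmm. The sketch below assumes a 64-byte alignment (the value MKL documentation commonly recommends) and uses the standard C++17 std::aligned_alloc rather than MKL's own mkl_malloc; copy_aligned and AlignedBuffer are hypothetical names, not part of MKL.&lt;/P&gt;
<![CDATA[<PRE>
```cpp
// Hypothetical helper (not an MKL API): copy `count` doubles into a freshly
// allocated buffer aligned to `alignment` bytes (64 assumed by default).
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>

// Deleter so that memory from std::aligned_alloc is released with std::free.
struct FreeDeleter {
    void operator()(double* p) const { std::free(p); }
};
using AlignedBuffer = std::unique_ptr<double[], FreeDeleter>;

AlignedBuffer copy_aligned(const double* src, std::size_t count,
                           std::size_t alignment = 64) {
    // The alignment must be a power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // std::aligned_alloc expects the size to be a multiple of the alignment.
    std::size_t bytes = count * sizeof(double);
    bytes = (bytes + alignment - 1) / alignment * alignment;
    if (bytes == 0) bytes = alignment;
    double* p = static_cast<double*>(std::aligned_alloc(alignment, bytes));
    if (p == nullptr) return AlignedBuffer(nullptr);  // allocation failed
    assert(reinterpret_cast<std::uintptr_t>(p) % alignment == 0);
    if (count != 0) std::memcpy(p, src, count * sizeof(double));
    return AlignedBuffer(p);
}
```
</PRE>]]>
&lt;P&gt;The aligned copy (for example buf.get()) would then be passed to mkl_dcscmm in place of the original pointer. The extra copy costs time and memory, so it is only worth doing if reproducibility on an older MKL release is actually needed.&lt;/P&gt;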

&lt;P&gt;--Vipin&lt;/P&gt;</description>
      <pubDate>Thu, 06 Mar 2014 05:20:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/A-B-using-mkl-dcscmm/m-p/979804#M17377</guid>
      <dc:creator>VipinKumar_E_Intel</dc:creator>
      <dc:date>2014-03-06T05:20:30Z</dc:date>
    </item>
  </channel>
</rss>

