<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Bjoern, in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemm-executed-serial/m-p/954566#M15417</link>
    <description>&lt;P&gt;Bjoern,&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;pls try to set &amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="kwd" style="font-family: 'Courier New', Courier, monospace; color: rgb(51, 51, 51); font-size: 13.600000381469727px; line-height: 20px;"&gt;mkl_set_num_threads&lt;/SPAN&gt;&lt;SPAN class="sep" style="font-family: 'Courier New', Courier, monospace; color: rgb(51, 51, 51); font-size: 13.600000381469727px; line-height: 20px;"&gt;( #num of cores on your system&lt;/SPAN&gt;&lt;SPAN class="sep" style="font-family: 'Courier New', Courier, monospace; color: rgb(51, 51, 51); font-size: 13.600000381469727px; line-height: 20px;"&gt;&amp;nbsp;) before C=prod(A,B).&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN class="sep" style="font-family: 'Courier New', Courier, monospace; color: rgb(51, 51, 51); font-size: 13.600000381469727px; line-height: 20px;"&gt;What would you see in then case?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 10 Jan 2014 11:23:42 GMT</pubDate>
    <dc:creator>Gennady_F_Intel</dc:creator>
    <dc:date>2014-01-10T11:23:42Z</dc:date>
    <item>
      <title>MKL sgemm executed serial</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemm-executed-serial/m-p/954565#M15416</link>
      <description>&lt;P&gt;Hi people,&lt;/P&gt;

&lt;P&gt;I am struggling with the threading of SGEMM in MKL called from C++. I am working on a rather large piece of software that is parallelized with OpenMP. In one of the functions, I first set up matrices A (14000 x 1300) and B (1300 x 14000) within an OpenMP loop, and then want to calculate the product.&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;#pragma omp parallel for&lt;/P&gt;

	&lt;P&gt;for (.....){ fill A,B }&lt;/P&gt;

	&lt;P&gt;C=prod(A,B);&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
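
&lt;P&gt;For reference, the pattern above can be written out as a small self-contained sketch: the matrices are filled inside an OpenMP loop and the product is taken once, from the serial part of the program. The Matrix struct, the fill values, the function name fill_and_multiply and the naive prod() are hypothetical stand-ins (the real code is supposed to reach SGEMM through prod()), and the sizes are meant to be shrunk so that it runs quickly.&lt;/P&gt;

<![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix type standing in for the real one.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Naive stand-in for the SGEMM-backed prod(); a real build would call BLAS here.
Matrix prod(const Matrix& a, const Matrix& b) {
    assert(a.cols == b.rows);
    Matrix c(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < a.cols; ++k)
            for (std::size_t j = 0; j < b.cols; ++j)
                c.at(i, j) += a.at(i, k) * b.at(k, j);
    return c;
}

// Fill A (m x k) and B (k x m) in an OpenMP loop, then multiply once.
Matrix fill_and_multiply(int m, int k) {
    Matrix A(m, k), B(k, m);
    // Iteration i writes only row i of A and column i of B, so there are no races.
    #pragma omp parallel for
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < k; ++j) {
            A.at(i, j) = 1.0f;  // placeholder values
            B.at(j, i) = 2.0f;
        }
    }
    return prod(A, B);  // called after the parallel loop has finished
}
```
]]>

&lt;P&gt;Without OpenMP enabled at compile time the pragma is ignored and the fill loop simply runs serially, so the sketch builds either way.&lt;/P&gt;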

&lt;P&gt;prod() is supposed to call SGEMM. I also tried cblas_sgemm directly. In both cases, SGEMM is executed on one thread only, even though in other parts of the code, MKL calls to LAPACK are threaded. Any idea why it falls back to serial execution here?&lt;/P&gt;</description>
      <pubDate>Fri, 10 Jan 2014 11:07:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemm-executed-serial/m-p/954565#M15416</guid>
      <dc:creator>Bjoern_B_</dc:creator>
      <dc:date>2014-01-10T11:07:41Z</dc:date>
    </item>
    <item>
      <title>Bjoern,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemm-executed-serial/m-p/954566#M15417</link>
      <description>&lt;P&gt;Bjoern,&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;pls try to set &amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="kwd" style="font-family: 'Courier New', Courier, monospace; color: rgb(51, 51, 51); font-size: 13.600000381469727px; line-height: 20px;"&gt;mkl_set_num_threads&lt;/SPAN&gt;&lt;SPAN class="sep" style="font-family: 'Courier New', Courier, monospace; color: rgb(51, 51, 51); font-size: 13.600000381469727px; line-height: 20px;"&gt;( #num of cores on your system&lt;/SPAN&gt;&lt;SPAN class="sep" style="font-family: 'Courier New', Courier, monospace; color: rgb(51, 51, 51); font-size: 13.600000381469727px; line-height: 20px;"&gt;&amp;nbsp;) before C=prod(A,B).&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN class="sep" style="font-family: 'Courier New', Courier, monospace; color: rgb(51, 51, 51); font-size: 13.600000381469727px; line-height: 20px;"&gt;What would you see in then case?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 10 Jan 2014 11:23:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemm-executed-serial/m-p/954566#M15417</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2014-01-10T11:23:42Z</dc:date>
    </item>
    <item>
      <title>Thanks Gennady,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemm-executed-serial/m-p/954567#M15418</link>
      <description>&lt;P&gt;Thanks Gennady,&lt;/P&gt;

&lt;P&gt;It turns out the problem was not with MKL but with the build: cmake linked both the GSL and the MKL cblas, and the overloaded prod() picked up SGEMM from GSL.&lt;/P&gt;

&lt;P&gt;I have another question regarding parallelization. Currently I have the following construct:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;#pragma omp parallel for&lt;/P&gt;

	&lt;P&gt;for (int i = 0; i &amp;lt; mtot; i++) {&lt;/P&gt;

	&lt;P class="p1"&gt;&amp;nbsp; &amp;nbsp; B[i] = ub::prod(A, B[i]);&lt;/P&gt;

	&lt;P class="p2"&gt;}&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
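
&lt;P&gt;Written out as a self-contained sketch, the construct above might look roughly like this. The Matrix struct, the naive prod() and the function name apply_to_all are hypothetical stand-ins for the real ub::prod() call and its matrix types; only the loop structure is taken from the snippet.&lt;/P&gt;

<![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix type standing in for the real one.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Naive, single-threaded stand-in for ub::prod(); it returns a new matrix,
// so assigning the result back to one of its inputs is safe.
Matrix prod(const Matrix& a, const Matrix& b) {
    assert(a.cols == b.rows);
    Matrix c(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < a.cols; ++k)
            for (std::size_t j = 0; j < b.cols; ++j)
                c.at(i, j) += a.at(i, k) * b.at(k, j);
    return c;
}

// Replace every Bs[i] by A * Bs[i], one matrix per loop iteration.
void apply_to_all(const Matrix& A, std::vector<Matrix>& Bs) {
    const int mtot = static_cast<int>(Bs.size());
    #pragma omp parallel for
    for (int i = 0; i < mtot; ++i) {
        // A is only read and each iteration writes only Bs[i],
        // so the iterations are independent of each other.
        Bs[i] = prod(A, Bs[i]);
    }
}
```
]]>

&lt;P&gt;Whether a BLAS-backed prod() would additionally start its own threads inside this loop depends on the threading settings of the library in use; the sketch does not model that.&lt;/P&gt;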

&lt;P&gt;Is it advisable to do this? Will every OMP thread try to execute a threaded prod()? If so, I assume the threads would end up competing with each other. In my case, A is a 2700x2700 matrix and each B[i] is a 2700x2300 matrix.&lt;/P&gt;</description>
      <pubDate>Sat, 11 Jan 2014 13:25:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemm-executed-serial/m-p/954567#M15418</guid>
      <dc:creator>Bjoern_B_</dc:creator>
      <dc:date>2014-01-11T13:25:00Z</dc:date>
    </item>
  </channel>
</rss>

