<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic I have changed my code to in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/serial-vs-parellel-different-behaviour/m-p/1137595#M26145</link>
    <description>&lt;P&gt;I have changed my code to something like this:&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;const int mklThreads = mkl_get_max_threads();
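for (int step = 1; step &amp;lt; N_steps; ++step) {
    // serial code: some linear algebra (SVD / pseudoinverse)
    mkl_set_num_threads(1);           // MKL stays sequential inside the OpenMP loop
    #pragma omp parallel for
    for (...) { /* matrix multiplication */ }
    mkl_set_num_threads(mklThreads);  // restore the original MKL thread count
}&lt;/PRE&gt;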

&lt;P&gt;The problem still persists. I will try to reproduce the issue in a smaller project.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 21 Aug 2019 09:08:09 GMT</pubDate>
    <dc:creator>Ferrazzano__Vincenzo</dc:creator>
    <dc:date>2019-08-21T09:08:09Z</dc:date>
    <item>
      <title>serial vs parallel: different behaviour</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/serial-vs-parellel-different-behaviour/m-p/1137593#M26143</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We wrote a header-only library in which we use Intel MKL (wrapped by Armadillo) and OpenMP in a nested way.&lt;/P&gt;&lt;P&gt;In broad strokes, the header-only library does something like this:&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark; wrap-lines:false;"&gt;for (int step = 1; step &amp;lt; N_steps; ++step) {
	// serial code: some linear algebra (SVD / pseudoinverse)
	#pragma omp parallel for
	for (...) { /* matrix multiplication */ }
}
&lt;/PRE&gt;

&lt;P&gt;My projects usually have the following include structure:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; exe &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; uses Intel MKL in parallel mode (VS -&amp;gt; Properties -&amp;gt; Intel Performance Libraries -&amp;gt; Use Intel MKL)&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;^&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; static_lib &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; compiles the header-only library inside some functions; includes only the Intel MKL headers&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ^&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;header-only &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; includes the Intel MKL headers&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We repeat this structure for different projects, all of which share the header-only library.&lt;/P&gt;
&lt;P&gt;For &lt;STRONG&gt;SOME&lt;/STRONG&gt; of the projects, the code in the header-only library crashes in seemingly random ways. Sometimes it crashes in the serial part, where the SVD fails with the message:&amp;nbsp;&lt;BR /&gt;Intel MKL ERROR: Parameter 4 was incorrect on entry to DLASCL.&lt;/P&gt;
&lt;P&gt;Sometimes it crashes in the parallel loop, where an out-of-bounds location in a vector is accessed. If I remove the OpenMP pragma, it just fails in the SVD at some point.&lt;/P&gt;
&lt;P&gt;If I switch Intel MKL to serial in the exe options, it works just fine. The behaviour is the same whether I enable or disable OpenMP support in Visual Studio.&lt;/P&gt;
&lt;P&gt;Any clue what is causing this? The code spends most of its time in the parallel for, where Intel MKL should be serial anyway,&amp;nbsp;but we would like to use any speedup we can get.&lt;/P&gt;
&lt;P&gt;Our setup:&lt;/P&gt;
&lt;P&gt;C++ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;17&lt;/P&gt;
&lt;P&gt;VS &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;15.9.14&lt;/P&gt;
&lt;P&gt;Intel MKL &amp;nbsp; &amp;nbsp; &amp;nbsp;2019.4.245&lt;/P&gt;
&lt;P&gt;CPU &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Intel Xeon Gold 6126 CPU @ 2.59 GHz&lt;/P&gt;
&lt;P&gt;OS: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Windows 10&lt;/P&gt;
&lt;P&gt;We have also seen this issue on other machines, and with previous versions of VS and Intel MKL.&lt;/P&gt;
&lt;P&gt;Happy to provide any information you might require.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Aug 2019 11:13:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/serial-vs-parellel-different-behaviour/m-p/1137593#M26143</guid>
      <dc:creator>Ferrazzano__Vincenzo</dc:creator>
      <dc:date>2019-08-02T11:13:02Z</dc:date>
    </item>
    <item>
      <title>here is the link to the MKL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/serial-vs-parellel-different-behaviour/m-p/1137594#M26144</link>
      <description>&lt;P&gt;Here is the &lt;A href="https://software.intel.com/en-us/mkl-linux-developer-guide-calling-intel-mkl-functions-from-multi-threaded-applications"&gt;link&lt;/A&gt;&amp;nbsp;to the MKL usage model: disable Intel MKL internal threading for the whole application...&lt;/P&gt;&lt;P&gt;With regard to&amp;nbsp;"Intel MKL ERROR: Parameter 4 was incorrect on entry to DLASCL": this is an unknown issue for MKL 2019. Could you check that the input data doesn't contain NaNs or Infs?&lt;/P&gt;&lt;P&gt;If the inputs are correct, could you give us a reproducer that shows the problem?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 05 Aug 2019 03:42:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/serial-vs-parellel-different-behaviour/m-p/1137594#M26144</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2019-08-05T03:42:58Z</dc:date>
    </item>
    <item>
      <title>I have changed my code to</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/serial-vs-parellel-different-behaviour/m-p/1137595#M26145</link>
      <description>&lt;P&gt;I have changed my code to something like this:&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;const int mklThreads = mkl_get_max_threads();
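for (int step = 1; step &amp;lt; N_steps; ++step) {
    // serial code: some linear algebra (SVD / pseudoinverse)
    mkl_set_num_threads(1);           // MKL stays sequential inside the OpenMP loop
    #pragma omp parallel for
    for (...) { /* matrix multiplication */ }
    mkl_set_num_threads(mklThreads);  // restore the original MKL thread count
}&lt;/PRE&gt;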

&lt;P&gt;The problem still persists. I will try to reproduce the issue in a smaller project.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 21 Aug 2019 09:08:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/serial-vs-parellel-different-behaviour/m-p/1137595#M26145</guid>
      <dc:creator>Ferrazzano__Vincenzo</dc:creator>
      <dc:date>2019-08-21T09:08:09Z</dc:date>
    </item>
    <item>
      <title>Hi. I replace the PINV with</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/serial-vs-parellel-different-behaviour/m-p/1137596#M26146</link>
      <description>&lt;P&gt;Hi.&amp;nbsp;&lt;BR /&gt;I replaced the PINV with the MKL-only implementation suggested here:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://software.intel.com/en-us/articles/implement-pseudoinverse-of-a-matrix-by-intel-mkl" target="_blank"&gt;https://software.intel.com/en-us/articles/implement-pseudoinverse-of-a-matrix-by-intel-mkl&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Now, linking against the parallel version&amp;nbsp;&lt;/STRONG&gt;&lt;STRONG&gt;makes the&amp;nbsp;&lt;/STRONG&gt;dgesdd&amp;nbsp;routine return 2.&lt;/P&gt;&lt;P&gt;As I mentioned, this happens only for some of the projects that link our library. For others, everything works fine.&lt;/P&gt;&lt;P&gt;Another phenomenon that might hint in the right direction: after more testing and profiling, we realised that the number of threads set in our project is not really taken into account by Intel MKL, even in those projects where linking against the parallel version works fine. Regardless of the number of threads selected, the performance is the same, although the number of threads seems to be set correctly.&lt;/P&gt;&lt;P&gt;We tried setting the number of threads with any combination of:&lt;BR /&gt;omp_set_num_threads()&lt;/P&gt;&lt;P&gt;mkl_set_num_threads()&lt;/P&gt;&lt;P&gt;mkl_set_local_num_threads()&lt;/P&gt;&lt;P&gt;and restoring the previous number of threads after the operation is performed.&lt;/P&gt;&lt;P&gt;To be sure, I saved the matrix and tried the same function in a "fresh" project. There, the performance scales with the number of processors.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Sep 2019 15:16:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/serial-vs-parellel-different-behaviour/m-p/1137596#M26146</guid>
      <dc:creator>Ferrazzano__Vincenzo</dc:creator>
      <dc:date>2019-09-06T15:16:22Z</dc:date>
    </item>
    <item>
      <title>regarding - dgesdd routine to</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/serial-vs-parellel-different-behaviour/m-p/1137597#M26147</link>
      <description>&lt;P&gt;Regarding the&amp;nbsp;dgesdd&amp;nbsp;routine returning 2: please give us a reproducer and we will look at this case on our side.&lt;/P&gt;&lt;P&gt;With regard to performance: what is the typical problem size, and how many OpenMP threads do you run?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 07 Sep 2019 04:26:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/serial-vs-parellel-different-behaviour/m-p/1137597#M26147</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2019-09-07T04:26:53Z</dc:date>
    </item>
  </channel>
</rss>

