<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic dtrnlsp_solve spinning/sleeping when called from multiple threads in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dtrnlsp-solve-spinning-sleeping-when-called-from-multiple/m-p/1017287#M19542</link>
    <description>&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;I am using the trust region solver in MKL and am seeing cases where dtrnlsp_solve takes significantly longer than usual to complete. I have many threads that each need to run an optimization using the trust region solver; each optimization problem has about 200 residuals and 40-70 unknowns. When a large number of threads need to perform the optimization, concurrency profiling shows many of them blocked in the solve for up to 20 times longer than a normal solve. I start to see this behaviour when about 40-60 threads could call the trust region solver. I have tried two versions of MKL. Initially I was using version 11.1.2 and saw the trust region threads spinning with a call stack ending in mkl_serv_lock &amp;lt;- mkl_serv_deallocate. I then tried version 11.3.0 and saw the threads spinning or sleeping in tbb under mkl_serv_allocate.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;I'm using external threading, so I am running MKL in sequential mode. I'm also using the TBB allocator and the 64-bit versions.&lt;/P&gt;

&lt;P&gt;Ideally I would like to find a solution that works for MKL version 11.1.2. &amp;nbsp;There appears to be a small change in the solution produced by the optimization between 11.1.2 and 11.3.0 with the older version appearing to converge to a smaller overall error.&lt;/P&gt;

&lt;P&gt;Thanks in advance&lt;/P&gt;

&lt;P&gt;Steven&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 08 Sep 2015 12:50:14 GMT</pubDate>
    <dc:creator>Steven_H_1</dc:creator>
    <dc:date>2015-09-08T12:50:14Z</dc:date>
    <item>
      <title>dtrnlsp_solve spinning/sleeping when called from multiple threads</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dtrnlsp-solve-spinning-sleeping-when-called-from-multiple/m-p/1017287#M19542</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;I am using the trust region solver in MKL and am seeing cases where dtrnlsp_solve takes significantly longer than usual to complete. I have many threads that each need to run an optimization using the trust region solver; each optimization problem has about 200 residuals and 40-70 unknowns. When a large number of threads need to perform the optimization, concurrency profiling shows many of them blocked in the solve for up to 20 times longer than a normal solve. I start to see this behaviour when about 40-60 threads could call the trust region solver. I have tried two versions of MKL. Initially I was using version 11.1.2 and saw the trust region threads spinning with a call stack ending in mkl_serv_lock &amp;lt;- mkl_serv_deallocate. I then tried version 11.3.0 and saw the threads spinning or sleeping in tbb under mkl_serv_allocate.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;I'm using external threading, so I am running MKL in sequential mode. I'm also using the TBB allocator and the 64-bit versions.&lt;/P&gt;

&lt;P&gt;Ideally I would like to find a solution that works for MKL version 11.1.2. &amp;nbsp;There appears to be a small change in the solution produced by the optimization between 11.1.2 and 11.3.0 with the older version appearing to converge to a smaller overall error.&lt;/P&gt;

&lt;P&gt;Thanks in advance&lt;/P&gt;

&lt;P&gt;Steven&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 08 Sep 2015 12:50:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dtrnlsp-solve-spinning-sleeping-when-called-from-multiple/m-p/1017287#M19542</guid>
      <dc:creator>Steven_H_1</dc:creator>
      <dc:date>2015-09-08T12:50:14Z</dc:date>
    </item>
    <item>
      <title>Stephen,  How many of</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dtrnlsp-solve-spinning-sleeping-when-called-from-multiple/m-p/1017288#M19543</link>
      <description>&lt;P&gt;Steven, &lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;how many external threads do you create when calling the sequential version of the MKL routine, and how many hardware threads are available on your system?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 09 Sep 2015 06:51:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dtrnlsp-solve-spinning-sleeping-when-called-from-multiple/m-p/1017288#M19543</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2015-09-09T06:51:03Z</dc:date>
    </item>
    <item>
      <title>Hi Gennady</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dtrnlsp-solve-spinning-sleeping-when-called-from-multiple/m-p/1017289#M19544</link>
      <description>&lt;P&gt;Hi Gennady&lt;/P&gt;

&lt;P&gt;Thanks for getting back to me. &lt;SPAN style="font-size: 13.008px; line-height: 19.512px;"&gt;I originally noticed the problem in an application with about 60 external threads calling MKL for part of their processing. This was running on an 8-core i7 with hyperthreading.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;I have now run tests on three different computers. The first has a quad-core i7 and 8 GB RAM, with hyperthreading turned off. The second has an 8-core i7 and 16 GB RAM, with hyperthreading turned on. The third has two 6-core Xeons and 12 GB RAM, with hyperthreading turned off. I ran the same test on all three with 4, 8, 16, 20, 40 and 80 threads. In each test the total computation required is the same. All three machines show very similar behaviour. In the following results, which come from my test setup and were collected with MKL 11.1.2, the processing times are approximate and relative to the processing time for 4 threads on that machine.&lt;/P&gt;

&lt;P&gt;Quad core&lt;/P&gt;

&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Threads&lt;/TH&gt;&lt;TH&gt;Processing Time (relative to 4 threads)&lt;/TH&gt;&lt;TH&gt;Locking Observed&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;1.0&lt;/TD&gt;&lt;TD&gt;No&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;8&lt;/TD&gt;&lt;TD&gt;0.9&lt;/TD&gt;&lt;TD&gt;No&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;16&lt;/TD&gt;&lt;TD&gt;0.9&lt;/TD&gt;&lt;TD&gt;No&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;20&lt;/TD&gt;&lt;TD&gt;0.9&lt;/TD&gt;&lt;TD&gt;Infrequent&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;40&lt;/TD&gt;&lt;TD&gt;1.1&lt;/TD&gt;&lt;TD&gt;Yes&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;80&lt;/TD&gt;&lt;TD&gt;1.3&lt;/TD&gt;&lt;TD&gt;Yes&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;

&lt;P&gt;8 Core&lt;/P&gt;

&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Threads&lt;/TH&gt;&lt;TH&gt;Processing Time (relative to 4 threads)&lt;/TH&gt;&lt;TH&gt;Locking Observed&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;1.0&lt;/TD&gt;&lt;TD&gt;No&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;8&lt;/TD&gt;&lt;TD&gt;0.6&lt;/TD&gt;&lt;TD&gt;No&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;16&lt;/TD&gt;&lt;TD&gt;0.4&lt;/TD&gt;&lt;TD&gt;No&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;20&lt;/TD&gt;&lt;TD&gt;0.4&lt;/TD&gt;&lt;TD&gt;Infrequent&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;40&lt;/TD&gt;&lt;TD&gt;0.5&lt;/TD&gt;&lt;TD&gt;Yes&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;80&lt;/TD&gt;&lt;TD&gt;1.4&lt;/TD&gt;&lt;TD&gt;Yes&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;12 Core&lt;/P&gt;

&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Threads&lt;/TH&gt;&lt;TH&gt;Processing Time (relative to 4 threads)&lt;/TH&gt;&lt;TH&gt;Locking Observed&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;1.0&lt;/TD&gt;&lt;TD&gt;No&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;8&lt;/TD&gt;&lt;TD&gt;0.6&lt;/TD&gt;&lt;TD&gt;No&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;16&lt;/TD&gt;&lt;TD&gt;0.4&lt;/TD&gt;&lt;TD&gt;Infrequent&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;20&lt;/TD&gt;&lt;TD&gt;0.4&lt;/TD&gt;&lt;TD&gt;Infrequent&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;40&lt;/TD&gt;&lt;TD&gt;0.4&lt;/TD&gt;&lt;TD&gt;Yes&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;80&lt;/TD&gt;&lt;TD&gt;0.8&lt;/TD&gt;&lt;TD&gt;Yes&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;</description>
      <pubDate>Wed, 09 Sep 2015 13:08:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dtrnlsp-solve-spinning-sleeping-when-called-from-multiple/m-p/1017289#M19544</guid>
      <dc:creator>Steven_H_1</dc:creator>
      <dc:date>2015-09-09T13:08:48Z</dc:date>
    </item>
  </channel>
</rss>

