<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic LAPACKE_sgesdd stops using threads for 10k x 10k matrix in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACKE-sgesdd-stops-using-threads-for-10k-x-10k-matrix/m-p/1094920#M23490</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;When calling &lt;STRONG&gt;LAPACKE_sgesdd&lt;/STRONG&gt; with input matrices of different sizes, I've noticed that starting from a certain dimension, the computation runs in a single thread.&lt;/P&gt;

&lt;P&gt;Attached is code that calls the function for a matrix filled with random numbers drawn uniformly from [0, 1] and measures the execution time.&amp;nbsp;&lt;BR /&gt;
	&lt;SPAN style="font-size: 1em;"&gt;The project archive is available on my &lt;A href="https://drive.google.com/file/d/0B73nEa1JLW2qVnhidFBfZzFLVGM/view?usp=sharing"&gt;Google Drive&lt;/A&gt;.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;For a matrix with 10000 columns there is a sharp performance decrease when the number of rows reaches 9000. This effect does not appear, if the same code is compiled with Intel compiler.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;Is there any way to make the code work with MS compiler too?&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;&amp;gt;SVDProblem.exe 8000 10000
Time taken: 57.906423 s.

&amp;gt;SVDProblem.exe 8500 10000
Time taken: 63.765770 s.

&amp;gt;SVDProblem.exe 9000 10000
Time taken: 257.664138 s.&lt;/PRE&gt;

&lt;P&gt;Hardware:&lt;BR /&gt;
	Intel Core i7-6950X, 64 GB RAM&lt;/P&gt;

&lt;P&gt;Software:&lt;BR /&gt;
	&lt;SPAN style="font-size: 1em;"&gt;MKL 2017 Update 1 (statically linked &lt;STRONG&gt;mkl_core.lib&lt;/STRONG&gt;,&amp;nbsp;&lt;STRONG&gt;mkl_intel_lp64.lib&lt;/STRONG&gt;,&amp;nbsp;&lt;STRONG&gt;mkl_intel_thread.lib&lt;/STRONG&gt;)&lt;BR /&gt;
	Visual Studio 2015 Update 3, Intel Compiler 17.0 (&lt;/SPAN&gt;&lt;STRONG&gt;libiomp5md.lib&lt;/STRONG&gt; is statically linked, &lt;STRONG&gt;libiomp5md.dll&lt;/STRONG&gt; is copied to the binary folder)&lt;BR /&gt;
	Windows 7 Enterprise Service Pack 1&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Thank you!&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN style="font-size: 1em;"&gt;Igor&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 14 Feb 2017 02:12:18 GMT</pubDate>
    <dc:creator>Igor_C_Intel</dc:creator>
    <dc:date>2017-02-14T02:12:18Z</dc:date>
    <item>
      <title>LAPACKE_sgesdd stops using threads for 10k x 10k matrix</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACKE-sgesdd-stops-using-threads-for-10k-x-10k-matrix/m-p/1094920#M23490</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;When calling &lt;STRONG&gt;LAPACKE_sgesdd&lt;/STRONG&gt; with input matrices of different sizes, I've noticed that starting from a certain dimension, the computation runs in a single thread.&lt;/P&gt;

&lt;P&gt;Attached is code that calls the function for a matrix filled with random numbers drawn uniformly from [0, 1] and measures the execution time.&amp;nbsp;&lt;BR /&gt;
	&lt;SPAN style="font-size: 1em;"&gt;The project archive is available on my &lt;A href="https://drive.google.com/file/d/0B73nEa1JLW2qVnhidFBfZzFLVGM/view?usp=sharing"&gt;Google Drive&lt;/A&gt;.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;For a matrix with 10000 columns there is a sharp performance decrease when the number of rows reaches 9000. This effect does not appear, if the same code is compiled with Intel compiler.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;Is there any way to make the code work with MS compiler too?&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;&amp;gt;SVDProblem.exe 8000 10000
Time taken: 57.906423 s.

&amp;gt;SVDProblem.exe 8500 10000
Time taken: 63.765770 s.

&amp;gt;SVDProblem.exe 9000 10000
Time taken: 257.664138 s.&lt;/PRE&gt;

&lt;P&gt;Hardware:&lt;BR /&gt;
	Intel Core i7-6950X, 64 GB RAM&lt;/P&gt;

&lt;P&gt;Software:&lt;BR /&gt;
	&lt;SPAN style="font-size: 1em;"&gt;MKL 2017 Update 1 (statically linked &lt;STRONG&gt;mkl_core.lib&lt;/STRONG&gt;,&amp;nbsp;&lt;STRONG&gt;mkl_intel_lp64.lib&lt;/STRONG&gt;,&amp;nbsp;&lt;STRONG&gt;mkl_intel_thread.lib&lt;/STRONG&gt;)&lt;BR /&gt;
	Visual Studio 2015 Update 3, Intel Compiler 17.0 (&lt;/SPAN&gt;&lt;STRONG&gt;libiomp5md.lib&lt;/STRONG&gt; is statically linked, &lt;STRONG&gt;libiomp5md.dll&lt;/STRONG&gt; is copied to the binary folder)&lt;BR /&gt;
	Windows 7 Enterprise Service Pack 1&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Thank you!&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN style="font-size: 1em;"&gt;Igor&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 14 Feb 2017 02:12:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACKE-sgesdd-stops-using-threads-for-10k-x-10k-matrix/m-p/1094920#M23490</guid>
      <dc:creator>Igor_C_Intel</dc:creator>
      <dc:date>2017-02-14T02:12:18Z</dc:date>
    </item>
    <item>
      <title>thanks Igor, we will have a</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACKE-sgesdd-stops-using-threads-for-10k-x-10k-matrix/m-p/1094921#M23491</link>
      <description>&lt;P&gt;Thanks Igor, we will have a look at the problem ASAP.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Feb 2017 02:28:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACKE-sgesdd-stops-using-threads-for-10k-x-10k-matrix/m-p/1094921#M23491</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2017-02-14T02:28:40Z</dc:date>
    </item>
    <item>
      <title>Igor, checked the behavior on</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACKE-sgesdd-stops-using-threads-for-10k-x-10k-matrix/m-p/1094922#M23492</link>
      <description>&lt;P&gt;Igor, I &lt;SPAN style="font-size: 1em;"&gt;checked the behavior on the two systems available right now, with 2 and 24 threads. &amp;nbsp;I only added calls to the mkl_version and mkl_get_max_threads routines to report some needed details.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Below is what I see on my side:&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;_cl.exe &lt;/SPAN&gt;&lt;STRONG style="font-size: 1em;"&gt;8500 10000&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;Major version: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2017&lt;BR /&gt;
	Minor version: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 0&lt;BR /&gt;
	Update version: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;1&lt;BR /&gt;
	Product status: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Product&lt;BR /&gt;
	Build: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 20161005&lt;BR /&gt;
	Platform: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Intel(R) 64 architecture&lt;BR /&gt;
	Processor optimization: &amp;nbsp;Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors&lt;BR /&gt;
	================================================================&lt;/P&gt;

&lt;P&gt;n_rows = 8500&lt;BR /&gt;
	n_columns = 10000&lt;BR /&gt;
	&amp;nbsp;MKL #threads == 24&lt;BR /&gt;
	&lt;STRONG&gt;Time taken: 77.126904 s.&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;_cl.exe 9000 10000&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;MKL #threads == 24&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Time taken: 82.861420 s.&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;cl version&lt;BR /&gt;
	Microsoft (R) C/C++ Optimizing Compiler Version &lt;STRONG&gt;18.00.21005.1 for x64&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 14 Feb 2017 05:08:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACKE-sgesdd-stops-using-threads-for-10k-x-10k-matrix/m-p/1094922#M23492</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2017-02-14T05:08:34Z</dc:date>
    </item>
    <item>
      <title>and the similar with 2</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACKE-sgesdd-stops-using-threads-for-10k-x-10k-matrix/m-p/1094923#M23493</link>
      <description>&lt;P&gt;And similar results with 2 threads:&lt;/P&gt;

&lt;P&gt;_cl.exe 8500 10000&lt;BR /&gt;
	Major version: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2017&lt;BR /&gt;
	Minor version: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 0&lt;BR /&gt;
	Update version: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;1&lt;BR /&gt;
	Product status: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Product&lt;BR /&gt;
	Build: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 20161005&lt;BR /&gt;
	Platform: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Intel(R) 64 architecture&lt;BR /&gt;
	Processor optimization: &amp;nbsp;Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors&lt;BR /&gt;
	================================================================&lt;/P&gt;

&lt;P&gt;n_rows = 8500&lt;BR /&gt;
	n_columns = 10000&lt;BR /&gt;
	&amp;nbsp;MKL #threads == 2&lt;BR /&gt;
	Time taken:&lt;STRONG&gt; 323.315681 s.&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;_cl.exe 9000 10000&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;n_rows = 9000&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;n_columns = 10000&lt;BR /&gt;
	&amp;nbsp;MKL #threads == 2&lt;BR /&gt;
	Time taken: &lt;STRONG&gt;375.469913 s.&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 14 Feb 2017 05:15:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACKE-sgesdd-stops-using-threads-for-10k-x-10k-matrix/m-p/1094923#M23493</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2017-02-14T05:15:52Z</dc:date>
    </item>
    <item>
      <title>Gennady, thanks a lot for</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACKE-sgesdd-stops-using-threads-for-10k-x-10k-matrix/m-p/1094924#M23494</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Gennady, thanks a lot for prompt answer.&lt;BR /&gt;
	I inserted a call of&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;&lt;STRONG&gt;MKL_Get_Max_Threads&lt;/STRONG&gt;&amp;nbsp;routine to my code and the problem disappeared.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;After some experiments:&amp;nbsp;&lt;BR /&gt;
	If&amp;nbsp;&lt;SPAN style="font-size: 13.008px;"&gt;&lt;STRONG&gt;MKL_Get_Max_Threads&lt;/STRONG&gt; is called at the start, it returns 10 and the SVD uses 10 threads.&lt;/SPAN&gt;&lt;BR /&gt;
	If&amp;nbsp;&lt;SPAN style="font-size: 13.008px;"&gt;&lt;STRONG&gt;MKL_Get_Max_Threads&lt;/STRONG&gt; is called just before the&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;LAPACKE_sgesdd&lt;/STRONG&gt;&amp;nbsp;call, it returns 1 and the calculations&lt;BR /&gt;
	are performed using a single thread.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Debugger shows no threads are created till&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;&lt;STRONG&gt;LAPACKE_sgesdd&lt;/STRONG&gt; function call in both cases,&lt;BR /&gt;
	so race condition is excluded. Can it be attributed to unspecified order of static variables initialization in MKL libraries?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Also, the problem seems to be very uncommon... laptop, another desktop and even a virtual machine installed on&lt;BR /&gt;
	the problematic desktop work flawlessly. I'm going to try it on peers' computers and share an update. Anyway, I have a working&lt;BR /&gt;
	solution now (call&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&lt;STRONG&gt;MKL_Get_Max_Threads&lt;/STRONG&gt; in advance), so the problem is not urgent anymore.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;P.S.&amp;nbsp;&lt;BR /&gt;
	&lt;SPAN style="font-size: 1em;"&gt;&lt;STRONG&gt;MKL version:&lt;/STRONG&gt; Intel(R) Math Kernel Library Version 2017.0.1 Product Build 20161005 for Intel(R) 64 architecture applications&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN style="font-size: 13.008px;"&gt;&lt;STRONG&gt;Compiler:&lt;/STRONG&gt;&amp;nbsp;&lt;/SPAN&gt;Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 14 Feb 2017 07:04:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACKE-sgesdd-stops-using-threads-for-10k-x-10k-matrix/m-p/1094924#M23494</guid>
      <dc:creator>Igor_C_Intel</dc:creator>
      <dc:date>2017-02-14T07:04:00Z</dc:date>
    </item>
    <item>
      <title>I've just found a similar</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACKE-sgesdd-stops-using-threads-for-10k-x-10k-matrix/m-p/1094925#M23495</link>
      <description>&lt;P&gt;I've just found a description of a similar symptom at&amp;nbsp;&lt;BR /&gt;
	&lt;SPAN style="font-size: 13.008px;"&gt;&lt;A href="https://svn.artisynth.org/svn/artisynth_core/trunk/src/artisynth/core/driver/Main.java"&gt;https://svn.artisynth.org/svn/artisynth_core/trunk/src/artisynth/core/driver/Main.java&lt;/A&gt; :&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE style="color: rgb(0, 0, 0); word-wrap: break-word; white-space: pre-wrap;"&gt;/**
    * On Windows, we have sometimes seen that Pardiso getNumThreads() needs to
    * be called early, or otherwise the maximum number of threads returned by
    * mkl_get_max_threads() becomes fixed at 1. In particular, we seem to have
    * to do this before models are loaded.
*/&lt;/PRE&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 14 Feb 2017 07:31:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACKE-sgesdd-stops-using-threads-for-10k-x-10k-matrix/m-p/1094925#M23495</guid>
      <dc:creator>Igor_C_Intel</dc:creator>
      <dc:date>2017-02-14T07:31:52Z</dc:date>
    </item>
    <item>
      <title>Igor, I still couldn't</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACKE-sgesdd-stops-using-threads-for-10k-x-10k-matrix/m-p/1094926#M23496</link>
      <description>&lt;P&gt;Igor, I still couldn't reproduce the issue on my side on the different systems available. However, I use &lt;SPAN style="font-size: 1em;"&gt;cl version&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;&amp;nbsp;18.00.21005.1 for x64. This is the only difference I see.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;I will ask the owner of this code to help. We will keep you updated. Thanks for the case.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 14 Feb 2017 11:29:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACKE-sgesdd-stops-using-threads-for-10k-x-10k-matrix/m-p/1094926#M23496</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2017-02-14T11:29:57Z</dc:date>
    </item>
  </channel>
</rss>

