<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: LAPACK MKL faster on one thread than two - how come? in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871101#M8488</link>
    <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/246987"&gt;Tony Garratt&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;P&gt;On a dual-core win64 machine, I have some code that uses LAPACK and BLAS inside some numerical computations.&lt;BR /&gt;I experimented with MKL_NUM_THREADS as follows:&lt;BR /&gt;&lt;BR /&gt;Not set (i.e. let MKL use both cores): CPU time=54.5s; wall clock time=28.0s&lt;BR /&gt;=1: CPU time=27.6s; wall clock time=27.7s&lt;BR /&gt;&lt;BR /&gt;How come letting MKL use both cores uses more CPU and has a longer wall clock time?&lt;BR /&gt;&lt;BR /&gt;Note: my PC does not have HT enabled.&lt;/P&gt;
&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;&lt;BR /&gt;Tony,&lt;BR /&gt;what is the typical size of your task?&lt;BR /&gt;--Gennady&lt;BR /&gt;</description>
    <pubDate>Thu, 17 Sep 2009 09:36:50 GMT</pubDate>
    <dc:creator>Gennady_F_Intel</dc:creator>
    <dc:date>2009-09-17T09:36:50Z</dc:date>
    <item>
      <title>LAPACK MKL faster on one thread than two - how come?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871097#M8484</link>
      <description>&lt;P&gt;On a dual-core win64 machine, I have some code that uses LAPACK and BLAS inside some numerical computations.&lt;BR /&gt;I experimented with MKL_NUM_THREADS as follows:&lt;BR /&gt;&lt;BR /&gt;Not set (i.e. let MKL use both cores): CPU time=54.5s; wall clock time=28.0s&lt;BR /&gt;=1: CPU time=27.6s; wall clock time=27.7s&lt;BR /&gt;&lt;BR /&gt;How come letting MKL use both cores uses more CPU and has a longer wall clock time?&lt;BR /&gt;&lt;BR /&gt;Note: my PC does not have HT enabled.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Sep 2009 20:17:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871097#M8484</guid>
      <dc:creator>Tony_Garratt</dc:creator>
      <dc:date>2009-09-15T20:17:56Z</dc:date>
    </item>
    <item>
      <title>Re: LAPACK MKL faster on one thread than two - how come?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871098#M8485</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
I suppose it's not difficult to construct a case like that; there may be several ways. For example, if your case uses the entire cache with one thread, and performance is limited by cache, there may be no gain from 2 threads. According to a typical definition of CPU time (e.g. C clock() or Fortran cpu_time), it adds up the times spent in each thread. Achieving a CPU time which indicates all threads are running at 100% is sometimes given as a goal.&lt;BR /&gt;</description>
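To make the CPU time point concrete, here is a minimal Python sketch applied to the timings from the original post. The helper name busy_cores is made up for illustration, and it assumes the reported CPU time is summed over all threads, as described above. Dividing CPU time by wall clock time then gives a rough estimate of how many cores were busy on average.

```python
def busy_cores(cpu_seconds, wall_seconds):
    """Average number of busy cores, assuming CPU time is summed over threads."""
    return cpu_seconds / wall_seconds


# Timings reported in the original post (seconds).
default_threads = busy_cores(54.5, 28.0)  # MKL_NUM_THREADS not set
one_thread = busy_cores(27.6, 27.7)       # MKL_NUM_THREADS=1

print(f"not set: {default_threads:.2f} cores busy on average")
print(f"=1:      {one_thread:.2f} cores busy on average")
```

With these numbers the ratio comes out close to 2 when MKL_NUM_THREADS is not set and close to 1 when it is 1, so both cores are being kept busy even though the wall clock time does not improve.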
      <pubDate>Wed, 16 Sep 2009 12:46:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871098#M8485</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-09-16T12:46:24Z</dc:date>
    </item>
    <item>
      <title>Re: LAPACK MKL faster on one thread than two - how come?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871099#M8486</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/367365"&gt;tim18&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;I suppose it's not difficult to construct a case like that; there may be several ways. For example, if your case uses the entire cache with one thread, and performance is limited by cache, there may be no gain from 2 threads. According to a typical definition of CPU time (e.g. C clock() or Fortran cpu_time), it adds up the times spent in each thread. Achieving a CPU time which indicates all threads are running at 100% is sometimes given as a goal.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Thanks Tim for your reply. So basically we are saying that this behaviour is not unexpected. Or put another way, the algorithm inside MKL that decides how many threads to use may not always choose the optimal number of threads?</description>
      <pubDate>Wed, 16 Sep 2009 17:35:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871099#M8486</guid>
      <dc:creator>Tony_Garratt</dc:creator>
      <dc:date>2009-09-16T17:35:05Z</dc:date>
    </item>
    <item>
      <title>Re: LAPACK MKL faster on one thread than two - how come?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871100#M8487</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
The choice of the number of threads may be based mainly on whether the problem is large enough to use all available threads. There is some cache blocking in MKL functions where it is appropriate, but I suppose cases where this doesn't enable multi-thread scaling wouldn't be detected.&lt;BR /&gt;</description>
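As an illustration only, a caller could apply a similar size-based rule on its own side before the library starts. The sketch below is hypothetical: the function name choose_thread_count, the min_rows_per_thread threshold, and the example sizes are all made up, and this is not a description of what MKL does internally. The only thing taken from the thread is the MKL_NUM_THREADS environment variable.

```python
import os


def choose_thread_count(n, available_cores, min_rows_per_thread=500):
    """Hypothetical rule: one thread per min_rows_per_thread rows, capped at the core count."""
    wanted = max(1, n // min_rows_per_thread)
    return min(wanted, available_cores)


# Made-up problem sizes, purely for illustration.
small = choose_thread_count(n=150, available_cores=2)
large = choose_thread_count(n=4000, available_cores=2)
print(small, large)

# Assumption: the variable is set before the MKL-linked code initializes its threading.
os.environ["MKL_NUM_THREADS"] = str(small)
```

The threshold would have to be tuned by measurement on the target machine; the value here is a placeholder.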
      <pubDate>Wed, 16 Sep 2009 17:41:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871100#M8487</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-09-16T17:41:41Z</dc:date>
    </item>
    <item>
      <title>Re: LAPACK MKL faster on one thread than two - how come?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871101#M8488</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/246987"&gt;Tony Garratt&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;P&gt;On a dual-core win64 machine, I have some code that uses LAPACK and BLAS inside some numerical computations.&lt;BR /&gt;I experimented with MKL_NUM_THREADS as follows:&lt;BR /&gt;&lt;BR /&gt;Not set (i.e. let MKL use both cores): CPU time=54.5s; wall clock time=28.0s&lt;BR /&gt;=1: CPU time=27.6s; wall clock time=27.7s&lt;BR /&gt;&lt;BR /&gt;How come letting MKL use both cores uses more CPU and has a longer wall clock time?&lt;BR /&gt;&lt;BR /&gt;Note: my PC does not have HT enabled.&lt;/P&gt;
&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;&lt;BR /&gt;Tony,&lt;BR /&gt;what is the typical size of your task?&lt;BR /&gt;--Gennady&lt;BR /&gt;</description>
      <pubDate>Thu, 17 Sep 2009 09:36:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871101#M8488</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2009-09-17T09:36:50Z</dc:date>
    </item>
    <item>
      <title>Re: LAPACK MKL faster on one thread than two - how come?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871102#M8489</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/334681"&gt;Gennady Fedorov (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;&lt;BR /&gt;Tony,&lt;BR /&gt;what is the typical size of your task?&lt;BR /&gt;--Gennady&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;To be more accurate, we are using a third party sparse linear solver inside our application and this solver makes heavy use of the BLAS - so the issue is related to BLAS, not LAPACK as I first thought. Our problem size is n=163. The sparse solver makes use of at least level 1 and 2 BLAS and possibly level 3 (I can check if knowing this is important).&lt;BR /&gt;&lt;BR /&gt;We are using Fortran 10.0.25 and MKL 10.1.1 and this is on a Windows win64 (2-core, no HT) machine, but we are also seeing the same type of behaviour on Linux machines (8-core, HT) too.</description>
      <pubDate>Fri, 18 Sep 2009 15:22:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871102#M8489</guid>
      <dc:creator>Tony_Garratt</dc:creator>
      <dc:date>2009-09-18T15:22:43Z</dc:date>
    </item>
    <item>
      <title>Re: LAPACK MKL faster on one thread than two - how come?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871103#M8490</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;I tried another test. I extracted a matrix from our application and set up an off-line test to solve and factorise that matrix repeatedly. Here are the results (on a dual-core win64 machine):&lt;BR /&gt;&lt;BR /&gt;MKL_NUM_THREADS | CPU time | Wall clock&lt;BR /&gt;Not set | 91.5 | 47.66&lt;BR /&gt;1 | 54.4 | 54.61&lt;BR /&gt;&lt;BR /&gt;In this case, the CPU time is a lot higher when both cores are used, but the wall clock time does go down. What this tells me is that (not surprisingly) there is a cost to multi-threading, but that cost generally pays off.</description>
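The trade-off in this second test can be put into numbers with a short Python sketch. It assumes the second row of the results is MKL_NUM_THREADS=1 with CPU time 54.4 and wall clock 54.61 (the row is run together in the post, so this reading is an assumption), and that the times are in seconds. The function names are made up for illustration.

```python
def speedup(wall_one_thread, wall_multi):
    """Wall clock speedup of the multi-threaded run over the one-thread run."""
    return wall_one_thread / wall_multi


def parallel_efficiency(wall_one_thread, wall_multi, threads):
    """Speedup divided by thread count; 1.0 would be perfect scaling."""
    return speedup(wall_one_thread, wall_multi) / threads


def extra_cpu(cpu_one_thread, cpu_multi):
    """Additional CPU time spent by the multi-threaded run."""
    return cpu_multi - cpu_one_thread


cpu_1, wall_1 = 54.4, 54.61   # MKL_NUM_THREADS=1 (assumed reading of the row)
cpu_2, wall_2 = 91.5, 47.66   # MKL_NUM_THREADS not set, two cores

print(f"speedup:    {speedup(wall_1, wall_2):.2f}")
print(f"efficiency: {parallel_efficiency(wall_1, wall_2, 2):.2f}")
print(f"extra CPU:  {extra_cpu(cpu_1, cpu_2):.1f}")
```

Under that reading, two cores finish somewhat sooner in wall clock terms but spend noticeably more total CPU time, which matches the conclusion in the post.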
      <pubDate>Fri, 18 Sep 2009 18:42:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871103#M8490</guid>
      <dc:creator>Tony_Garratt</dc:creator>
      <dc:date>2009-09-18T18:42:30Z</dc:date>
    </item>
    <item>
      <title>Re: LAPACK MKL faster on one thread than two - how come?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871104#M8491</link>
      <description>&lt;BR /&gt;
&lt;P&gt;Garry,&lt;BR /&gt;If I understood you right, you are using a third party solver and a BLAS routine (is it dgemm or another routine?) with a square matrix (163x163). Am I right?&lt;BR /&gt;I guess the third party solver is not an MKL routine.&lt;BR /&gt;Could you send us similar performance numbers for the BLAS routine?&lt;BR /&gt;And one more question - what is the CPU type you are running on?&lt;BR /&gt;--Gennady&lt;/P&gt;</description>
      <pubDate>Mon, 21 Sep 2009 05:03:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871104#M8491</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2009-09-21T05:03:08Z</dc:date>
    </item>
    <item>
      <title>Re: LAPACK MKL faster on one thread than two - how come?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871105#M8492</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/334681"&gt;Gennady Fedorov (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;
&lt;P&gt;Garry,&lt;BR /&gt;If I understood you right, you are using a third party solver and a BLAS routine (is it dgemm or another routine?) with a square matrix (163x163). Am I right?&lt;BR /&gt;I guess the third party solver is not an MKL routine.&lt;BR /&gt;Could you send us similar performance numbers for the BLAS routine?&lt;BR /&gt;And one more question - what is the CPU type you are running on?&lt;BR /&gt;--Gennady&lt;/P&gt;
&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;The third party solver uses a variety of BLAS routines - I am not sure which one of these could be the culprit. The third party solver is not MKL's - it is a Fortran sparse linear solver. I need to dig into the third party code and maybe try to track it down, but it will take some time. It is likely to be DGEMM, but I am not 100% sure. The matrix that the third party solver is solving is 163x163, but I need to make sure that N=163 on the BLAS calls because it may be doing some partitioning.&lt;BR /&gt;&lt;BR /&gt;So, what you would like is for me to break the problem down and try to find out which BLAS routine is the culprit?&lt;BR /&gt;&lt;BR /&gt;I'm running on win64, chip details below, but we have also seen similar behaviour on Linux.&lt;BR /&gt;&lt;BR /&gt;Intel Xeon CPU 5150 @ 2.66GHz, no HT&lt;BR /&gt;&lt;BR /&gt;If you can confirm the next steps, I can work with you to diagnose this problem further...&lt;BR /&gt;&lt;BR /&gt;thank you!&lt;BR /&gt;Tony&lt;BR /&gt;</description>
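One way to narrow down the culprit would be to time each candidate routine on its own and record both CPU time and wall clock time, as in the earlier measurements. The Python sketch below shows the shape of such a harness. The names time_call and stand_in_kernel are hypothetical, and the kernel is a pure Python placeholder rather than a real BLAS call; in practice it would be replaced by a call into the routine under test.

```python
import time


def time_call(func, repeats=20):
    """Run func repeatedly; return (cpu_seconds, wall_seconds) for the whole loop."""
    cpu_start = time.process_time()    # CPU time of this process, all threads
    wall_start = time.perf_counter()   # elapsed wall clock time
    for _ in range(repeats):
        func()
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    return cpu_elapsed, wall_elapsed


def stand_in_kernel(n=30):
    """Placeholder workload: a naive n-by-n matrix product in pure Python."""
    a = [[1.0] * n for _ in range(n)]
    return [[sum(a[i][k] * a[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


cpu, wall = time_call(stand_in_kernel)
print(f"CPU time: {cpu:.3f}s, wall clock: {wall:.3f}s")
```

Running the same harness once with MKL_NUM_THREADS set to 1 and once with it unset, for each routine, would show which call accounts for the extra CPU time.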
      <pubDate>Mon, 21 Sep 2009 17:03:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871105#M8492</guid>
      <dc:creator>Tony_Garratt</dc:creator>
      <dc:date>2009-09-21T17:03:30Z</dc:date>
    </item>
    <item>
      <title>Re: LAPACK MKL faster on one thread than two - how come?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871106#M8493</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/246987"&gt;Tony Garratt&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;The third party solver uses a variety of BLAS routines - I am not sure which one of these could be the culprit. The third party solver is not MKL's - it is a Fortran sparse linear solver. I need to dig into the third party code and maybe try to track it down, but it will take some time. It is likely to be DGEMM, but I am not 100% sure. The matrix that the third party solver is solving is 163x163, but I need to make sure that N=163 on the BLAS calls because it may be doing some partitioning.&lt;BR /&gt;&lt;BR /&gt;So, what you would like is for me to break the problem down and try to find out which BLAS routine is the culprit?&lt;BR /&gt;&lt;BR /&gt;I'm running on win64, chip details below, but we have also seen similar behaviour on Linux.&lt;BR /&gt;&lt;BR /&gt;Intel Xeon CPU 5150 @ 2.66GHz, no HT&lt;BR /&gt;&lt;BR /&gt;If you can confirm the next steps, I can work with you to diagnose this problem further...&lt;BR /&gt;&lt;BR /&gt;thank you!&lt;BR /&gt;Tony&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Hi Gennady - any update please?</description>
      <pubDate>Wed, 23 Sep 2009 16:59:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871106#M8493</guid>
      <dc:creator>Tony_Garratt</dc:creator>
      <dc:date>2009-09-23T16:59:03Z</dc:date>
    </item>
    <item>
      <title>Re: LAPACK MKL faster on one thread than two - how come?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871107#M8494</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/246987"&gt;Tony Garratt&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; &lt;BR /&gt;Hi Gennady - any update please?&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
Just out of curiosity, what is the name of the third party sparse linear system solver?&lt;BR /&gt;&lt;BR /&gt;A solution time of 27 seconds is too slow for a 163-by-163 linear system, so I assume 163 is the BLAS block size? Or are you using an iterative solver?&lt;BR /&gt;</description>
      <pubDate>Mon, 28 Sep 2009 22:19:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871107#M8494</guid>
      <dc:creator>jaewonj</dc:creator>
      <dc:date>2009-09-28T22:19:40Z</dc:date>
    </item>
    <item>
      <title>Re: LAPACK MKL faster on one thread than two - how come?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871108#M8495</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/431061"&gt;jaewonj&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
Just out of curiosity, what is the name of the third party sparse linear system solver?&lt;BR /&gt;&lt;BR /&gt;A solution time of 27 seconds is too slow for a 163-by-163 linear system, so I assume 163 is the BLAS block size? Or are you using an iterative solver?&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Unfortunately, I cannot say which sparse solver we were using.</description>
      <pubDate>Thu, 05 Nov 2009 19:25:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-MKL-faster-on-one-thread-than-two-how-come/m-p/871108#M8495</guid>
      <dc:creator>Tony_Garratt</dc:creator>
      <dc:date>2009-11-05T19:25:08Z</dc:date>
    </item>
  </channel>
</rss>

