<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: BLAS Level 2 uses more than one core. in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/BLAS-Level-2-uses-more-than-one-core/m-p/875581#M8887</link>
    <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/367365"&gt;tim18&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;MKL has to include all the functionality of the standard BLAS versions of those functions. You should easily be able to improve on the performance of most Level 2 BLAS functions, particularly those like these that call Level 1 BLAS, by writing code for your own usage. I'm not so familiar with these particular functions; assuming that dspr2 or dspmv or the like may be important, they would require OpenMP schedule(guided) if threading were applied to the public source. So one would expect some gain from threading on Core i7, though not as large as for functions suited to the default schedule, for problems in a certain size range, provided the problem is not so large that cache misses dominate over the influence of threading.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;I used IDA Pro to inspect the Intel MKL functions. I have other algorithms of my own. You can look at my earlier work: &lt;A href="http://www.thesa-store.com/products/" target="_blank"&gt;http://www.thesa-store.com/products/&lt;/A&gt;</description>
    <pubDate>Thu, 26 Nov 2009 23:46:53 GMT</pubDate>
    <dc:creator>yuriisig</dc:creator>
    <dc:date>2009-11-26T23:46:53Z</dc:date>
    <item>
      <title>BLAS Level 2 uses more than one core.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/BLAS-Level-2-uses-more-than-one-core/m-p/875577#M8883</link>
      <description>I have noticed that on my Core i7 860 processor, BLAS Level 2 uses more than one core. What is the point? It would be better to implement a good algorithm on one core than to reduce the efficiency of the processor.</description>
      <pubDate>Thu, 26 Nov 2009 19:14:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/BLAS-Level-2-uses-more-than-one-core/m-p/875577#M8883</guid>
      <dc:creator>yuriisig</dc:creator>
      <dc:date>2009-11-26T19:14:53Z</dc:date>
    </item>
    <item>
      <title>Re: BLAS Level 2 uses more than one core.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/BLAS-Level-2-uses-more-than-one-core/m-p/875578#M8884</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/312233"&gt;yuriisig&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; I have noticed that on my Core i7 860 processor, BLAS Level 2 uses more than one core. What is the point? It would be better to implement a good algorithm on one core than to reduce the efficiency of the processor.&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
MKL didn't have Level 2 threading available until recently, but it was requested frequently. It takes a large vector size to make threading pay off. If your case is using more than the optimum number of threads, you have several options, including linking against mkl_sequential, setting the number of threads by environment variable or OpenMP call, or compiling from source.&lt;BR /&gt;</description>
      <pubDate>Thu, 26 Nov 2009 21:05:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/BLAS-Level-2-uses-more-than-one-core/m-p/875578#M8884</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-11-26T21:05:09Z</dc:date>
    </item>
    <item>
      <title>Re: BLAS Level 2 uses more than one core.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/BLAS-Level-2-uses-more-than-one-core/m-p/875579#M8885</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/367365"&gt;tim18&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;...but it was requested frequently...&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Why? I think it is related to inefficiency in the Intel MKL code. In my tridiagonalization of packed matrices, multiple cores are not required for BLAS Level 2. My DSPTRD on one core takes 21.1 s for 5000*5000 matrices, while Intel MKL DSPTRD takes 28.7 s and Intel MKL DSYTRD takes 26.4 s (i7 860).</description>
      <pubDate>Thu, 26 Nov 2009 22:12:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/BLAS-Level-2-uses-more-than-one-core/m-p/875579#M8885</guid>
      <dc:creator>yuriisig</dc:creator>
      <dc:date>2009-11-26T22:12:22Z</dc:date>
    </item>
    <item>
      <title>Re: BLAS Level 2 uses more than one core.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/BLAS-Level-2-uses-more-than-one-core/m-p/875580#M8886</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/312233"&gt;yuriisig&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;Why? I think it is related to inefficiency in the Intel MKL code. In my tridiagonalization of packed matrices, multiple cores are not required for BLAS Level 2. My DSPTRD on one core takes 21.1 s for 5000*5000 matrices, while Intel MKL DSPTRD takes 28.7 s and Intel MKL DSYTRD takes 26.4 s (i7 860).&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
MKL has to include all the functionality of the standard BLAS versions of those functions. You should easily be able to improve on the performance of most Level 2 BLAS functions, particularly those like these that call Level 1 BLAS, by writing code for your own usage. I'm not so familiar with these particular functions; assuming that dspr2 or dspmv or the like may be important, they would require OpenMP schedule(guided) if threading were applied to the public source. So one would expect some gain from threading on Core i7, though not as large as for functions suited to the default schedule, for problems in a certain size range, provided the problem is not so large that cache misses dominate over the influence of threading.&lt;BR /&gt;</description>
      <pubDate>Thu, 26 Nov 2009 23:26:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/BLAS-Level-2-uses-more-than-one-core/m-p/875580#M8886</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-11-26T23:26:08Z</dc:date>
    </item>
    <item>
      <title>Re: BLAS Level 2 uses more than one core.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/BLAS-Level-2-uses-more-than-one-core/m-p/875581#M8887</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/367365"&gt;tim18&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;MKL has to include all the functionality of the standard BLAS versions of those functions. You should easily be able to improve on the performance of most Level 2 BLAS functions, particularly those like these that call Level 1 BLAS, by writing code for your own usage. I'm not so familiar with these particular functions; assuming that dspr2 or dspmv or the like may be important, they would require OpenMP schedule(guided) if threading were applied to the public source. So one would expect some gain from threading on Core i7, though not as large as for functions suited to the default schedule, for problems in a certain size range, provided the problem is not so large that cache misses dominate over the influence of threading.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;I used IDA Pro to inspect the Intel MKL functions. I have other algorithms of my own. You can look at my earlier work: &lt;A href="http://www.thesa-store.com/products/" target="_blank"&gt;http://www.thesa-store.com/products/&lt;/A&gt;</description>
      <pubDate>Thu, 26 Nov 2009 23:46:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/BLAS-Level-2-uses-more-than-one-core/m-p/875581#M8887</guid>
      <dc:creator>yuriisig</dc:creator>
      <dc:date>2009-11-26T23:46:53Z</dc:date>
    </item>
    <item>
      <title>Re: BLAS Level 2 uses more than one core.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/BLAS-Level-2-uses-more-than-one-core/m-p/875582#M8888</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/312233"&gt;yuriisig&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;I used IDA Pro to inspect the Intel MKL functions. I have other algorithms of my own. You can look at my earlier work: &lt;A href="http://www.thesa-store.com/products/" target="_blank"&gt;http://www.thesa-store.com/products/&lt;/A&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Hello, &lt;BR /&gt;&lt;BR /&gt;Just to add some comments: &lt;BR /&gt;Some BLAS Level 1 and Level 2 functions have been threaded since MKL 10.2; please see &lt;BR /&gt;&lt;A href="http://software.intel.com/en-us/articles/threaded-blas-level-1-and-2-on-nehalem/" target="_blank"&gt;http://software.intel.com/en-us/articles/threaded-blas-level-1-and-2-on-nehalem/&lt;/A&gt;&lt;BR /&gt;or &lt;BR /&gt;&lt;A href="http://software.intel.com/en-us/articles/intel-mkl-threaded-functions/" target="_blank"&gt;http://software.intel.com/en-us/articles/intel-mkl-threaded-functions/&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;However, performance depends mainly on where the data sits in cache and on other factors. For example, the article&lt;BR /&gt;&lt;A href="http://software.intel.com/en-us/articles/performance-slow-down-when-dynamically-linking-with-intel-mkl/"&gt;http://software.intel.com/en-us/articles/performance-slow-down-when-dynamically-linking-with-intel-mkl/&lt;/A&gt;&lt;BR /&gt;describes a slowdown that occurs when: &lt;BR /&gt;1) The data set in the application is small.&lt;BR /&gt;2) The second run may perform better than the first run. &lt;BR /&gt;3) The problem happens when linking dynamically with Intel MKL.&lt;BR /&gt;&lt;BR /&gt;Please check these. If none of the above applies, could you provide a test case (including the input data)? It would be helpful.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Ying</description>
      <pubDate>Mon, 30 Nov 2009 03:21:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/BLAS-Level-2-uses-more-than-one-core/m-p/875582#M8888</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2009-11-30T03:21:45Z</dc:date>
    </item>
  </channel>
</rss>

