<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: MKL sgemv performance using multithreading in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemv-performance-using-multithreading/m-p/862853#M7618</link>
    <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;many thanks for your nice information, victor!&lt;BR /&gt;&lt;BR /&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/93649"&gt;Victor Pasko (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; &lt;/EM&gt;&lt;BR /&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
    <pubDate>Thu, 17 Dec 2009 04:38:08 GMT</pubDate>
    <dc:creator>pilot117</dc:creator>
    <dc:date>2009-12-17T04:38:08Z</dc:date>
    <item>
      <title>MKL sgemv performance using multithreading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemv-performance-using-multithreading/m-p/862851#M7616</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I ran several tests of the MKL function "sgemv":&lt;BR /&gt;&lt;BR /&gt;A matrix 20000 by 20000&lt;BR /&gt;b vec 20000 by 1&lt;BR /&gt;x vec 20000 by 1&lt;BR /&gt;Hardware: 2 quad-core Xeon CPUs.&lt;BR /&gt;&lt;SPAN class="Code"&gt;KMP_AFFINITY=verbose,compact&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;When I set the number of threads to 4:&lt;BR /&gt;&lt;BR /&gt;omp_set_num_threads(4);&lt;BR /&gt;float alpha=1.0;&lt;BR /&gt;float gama=0.0;&lt;BR /&gt;int index=1;&lt;BR /&gt;int N=20000;&lt;BR /&gt;sgemv("N", &amp;amp;N, &amp;amp;N, &amp;amp;alpha, A, &amp;amp;N, x, &amp;amp;index, &amp;amp;gama, y, &amp;amp;index); // takes ~92 ms&lt;BR /&gt;sgemv("T", &amp;amp;N, &amp;amp;N, &amp;amp;alpha, A, &amp;amp;N, x, &amp;amp;index, &amp;amp;gama, y, &amp;amp;index); // takes ~92 ms&lt;BR /&gt;&lt;BR /&gt;But if I set the number of threads to 8:&lt;BR /&gt;omp_set_num_threads(8);&lt;BR /&gt;sgemv("N", &amp;amp;N, &amp;amp;N, &amp;amp;alpha, A, &amp;amp;N, x, &amp;amp;index, &amp;amp;gama, y, &amp;amp;index); // takes ~100 ms&lt;BR /&gt;sgemv("T", &amp;amp;N, &amp;amp;N, &amp;amp;alpha, A, &amp;amp;N, x, &amp;amp;index, &amp;amp;gama, y, &amp;amp;index); // takes ~100 ms&lt;BR /&gt;&lt;BR /&gt;Increasing the number of threads gives no improvement; it is &lt;B&gt;actually worse!&lt;/B&gt; Why?&lt;BR /&gt;&lt;BR /&gt; &lt;SPAN class="Code"&gt;&lt;BR /&gt;With KMP_AFFINITY=verbose,scatter:&lt;BR /&gt;for 4 threads, sgemv("N"....) takes around 98~104 ms&lt;BR /&gt; sgemv("T"....) takes around 96~100 ms&lt;BR /&gt;&lt;BR /&gt;for 8 threads, &lt;/SPAN&gt;&lt;SPAN class="Code"&gt;sgemv("N"....) takes around 102~106 ms&lt;BR /&gt; sgemv("T"....) takes around 99 ms&lt;/SPAN&gt;&lt;BR /&gt;There seems to be no difference between 4 and 8 threads.&lt;BR /&gt;&lt;BR /&gt;So do these times result from caching? Could anyone explain this a little?&lt;BR /&gt;&lt;BR /&gt;Many thanks!&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Sun, 13 Dec 2009 03:17:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemv-performance-using-multithreading/m-p/862851#M7616</guid>
      <dc:creator>pilot117</dc:creator>
      <dc:date>2009-12-13T03:17:35Z</dc:date>
    </item>
    <item>
      <title>Re: MKL sgemv performance using multithreading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemv-performance-using-multithreading/m-p/862852#M7617</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/439804"&gt;pilot117&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;Seems no difference between 4 and 8 threads. &lt;BR /&gt;&lt;BR /&gt;So are these times are resulted from caching? Any one could explain it a little bit?&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;
&lt;P&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;You are asking about performance scalability. Ideally, performance would scale linearly with the number of CPUs. In practice, the parallelization algorithm used, the threading overhead, and the way memory is distributed across caches and nodes (e.g. ccNUMA) make scalability difficult to predict. It is therefore possible to reach peak performance at some number of threads, beyond which adding more threads only degrades performance.&lt;/P&gt;
&lt;P&gt;Also, the performance measuring technique is important. Please take a look at the modified sgemv example and my results below:&lt;/P&gt;
&lt;P&gt;#include &amp;lt;stdio.h&amp;gt;&lt;BR /&gt;#include &amp;lt;stdlib.h&amp;gt; /* for atoi */&lt;BR /&gt;#include &amp;lt;mkl.h&amp;gt;&lt;/P&gt;
&lt;P&gt;#ifndef SIZE&lt;BR /&gt;#define SIZE 20000&lt;BR /&gt;#endif&lt;/P&gt;
&lt;P&gt;#ifndef NT&lt;BR /&gt;#define NT 4&lt;BR /&gt;#endif&lt;/P&gt;
&lt;P&gt;#ifndef CYCLE&lt;BR /&gt;#define CYCLE 10&lt;BR /&gt;#endif&lt;/P&gt;
&lt;P&gt;float A[SIZE][SIZE];&lt;BR /&gt;float x[SIZE];&lt;BR /&gt;float y[SIZE];&lt;BR /&gt;float alpha = 1.0;&lt;BR /&gt;float gamma = 0.0;&lt;BR /&gt;int index = 1;&lt;BR /&gt;int N = SIZE;&lt;/P&gt;
&lt;P&gt;int main(int argc, char*argv[]) {&lt;BR /&gt; int i;&lt;BR /&gt; int nt;&lt;/P&gt;
&lt;P&gt; if (argc == 1)&lt;BR /&gt; nt = NT;&lt;BR /&gt; else&lt;BR /&gt;  nt = atoi(argv[argc-1]);&lt;/P&gt;
&lt;P&gt; MKL_Set_Num_Threads(nt);&lt;BR /&gt;&lt;BR /&gt; for (i=0; i &amp;lt; CYCLE; ++i) // repeat the call so that later iterations run on warm caches&lt;BR /&gt; sgemv("N", &amp;amp;N, &amp;amp;N, &amp;amp;alpha, &amp;amp;A[0][0], &amp;amp;N, x, &amp;amp;index, &amp;amp;gamma, y, &amp;amp;index);&lt;BR /&gt; return 0;&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;On Linux, I get the following results with KMP_AFFINITY=compact:&lt;/P&gt;
&lt;P&gt;% /usr/bin/time ./a.out 1&lt;BR /&gt;0.67user 0.15system 0:01.16elapsed 71%CPU&lt;BR /&gt;&lt;BR /&gt;% /usr/bin/time ./a.out 2&lt;BR /&gt;0.69user 0.15system 0:00.64elapsed 132%CPU&lt;/P&gt;
&lt;P&gt;% /usr/bin/time ./a.out 3&lt;BR /&gt;0.81user 0.15system 0:00.42elapsed 230%CPU&lt;/P&gt;
&lt;P&gt;% /usr/bin/time ./a.out 4&lt;BR /&gt;0.68user 0.20system 0:00.42elapsed 212%CPU&lt;/P&gt;
&lt;P&gt;% /usr/bin/time ./a.out 5&lt;BR /&gt;0.96user 0.20system 0:00.42elapsed 277%CPU&lt;/P&gt;
&lt;P&gt;% /usr/bin/time ./a.out 6&lt;BR /&gt;1.04user 0.23system 0:00.42elapsed 300%CPU&lt;/P&gt;
&lt;P&gt;% /usr/bin/time ./a.out 7&lt;BR /&gt;0.97user 0.51system 0:00.42elapsed 351%CPU&lt;/P&gt;
&lt;P&gt;% /usr/bin/time ./a.out 8&lt;BR /&gt;0.95user 0.62system 0:00.42elapsed 375%CPU&lt;/P&gt;
&lt;P&gt;% /usr/bin/time ./a.out 9&lt;BR /&gt;1.20user 0.26system 0:00.42elapsed 349%CPU&lt;BR /&gt;&lt;BR /&gt;Let me know if you have any questions about scalability.&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;--Victor&lt;/P&gt;</description>
      <pubDate>Wed, 16 Dec 2009 11:28:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemv-performance-using-multithreading/m-p/862852#M7617</guid>
      <dc:creator>barragan_villanueva_</dc:creator>
      <dc:date>2009-12-16T11:28:30Z</dc:date>
    </item>
    <item>
      <title>Re: MKL sgemv performance using multithreading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemv-performance-using-multithreading/m-p/862853#M7618</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;many thanks for your nice information, victor!&lt;BR /&gt;&lt;BR /&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/93649"&gt;Victor Pasko (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; &lt;/EM&gt;&lt;BR /&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Thu, 17 Dec 2009 04:38:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemv-performance-using-multithreading/m-p/862853#M7618</guid>
      <dc:creator>pilot117</dc:creator>
      <dc:date>2009-12-17T04:38:08Z</dc:date>
    </item>
    <item>
      <title>Re: MKL sgemv performance using multithreading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemv-performance-using-multithreading/m-p/862854#M7619</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
Victor, which CPU type are you running on?&lt;BR /&gt;Is it 64-bit code?&lt;BR /&gt;And I guess you used the latest version of MKL?&lt;BR /&gt;--Gennady&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 18 Dec 2009 07:37:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemv-performance-using-multithreading/m-p/862854#M7619</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2009-12-18T07:37:35Z</dc:date>
    </item>
    <item>
      <title>Re: MKL sgemv performance using multithreading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemv-performance-using-multithreading/m-p/862855#M7620</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/334681"&gt;Gennady Fedorov (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;Victor, what is CPU type you are running on?&lt;BR /&gt;is it 64-bit code? &lt;BR /&gt;and guess you used the latest version of MKL?&lt;BR /&gt;--Gennady&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Yes, Gennady. I run on Linux 64-bit using the latest MKL.</description>
      <pubDate>Fri, 18 Dec 2009 07:51:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-sgemv-performance-using-multithreading/m-p/862855#M7620</guid>
      <dc:creator>barragan_villanueva_</dc:creator>
      <dc:date>2009-12-18T07:51:15Z</dc:date>
    </item>
  </channel>
</rss>

