<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re:processing time sequential vs threaded+mkl_set_num_threads_local(1) in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/processing-time-sequential-vs-threaded-mkl-set-num-threads-local/m-p/1344834#M32468</link>
    <description>&lt;P&gt;This thread is being closed, and we will no longer respond to it. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.&lt;/P&gt;</description>
    <pubDate>Fri, 17 Dec 2021 03:46:14 GMT</pubDate>
    <dc:creator>Gennady_F_Intel</dc:creator>
    <dc:date>2021-12-17T03:46:14Z</dc:date>
    <item>
      <title>processing time sequential vs threaded+mkl_set_num_threads_local(1)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/processing-time-sequential-vs-threaded-mkl-set-num-threads-local/m-p/1341921#M32394</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In an OpenMP-threaded application, MKL is called from several threads in parallel, for example:&lt;/P&gt;
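&lt;LI-CODE lang="cpp"&gt;#pragma omp parallel for num_threads(4)
for(int i=0;i&amp;lt;4;++i){
  // restrict MKL to one thread for calls made from this OpenMP thread
  int save=mkl_set_num_threads_local(1);
  dgemm(...);
  // restore the previous thread-local setting
  mkl_set_num_threads_local(save);
}&lt;/LI-CODE&gt;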
&lt;P&gt;When calling the threaded MKL version, the number of local threads is deliberately set to one because the array sizes are very small.&lt;/P&gt;
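&lt;P&gt;The save/restore pair could also be wrapped in a small scope guard so that the previous setting is restored on every exit path. This is only a sketch under assumptions: ScopedThreadCount, fake_set_num_threads_local and threads_inside_scope are hypothetical names, not part of MKL. The stand-in merely mimics the set-and-return-previous behaviour relied on in the snippet above, so that the example compiles without MKL.&lt;/P&gt;
<![CDATA[
```cpp
#include <cassert>
#include <utility>

// Hypothetical RAII guard (not an MKL type): calls setter(n) on construction,
// keeps the value it returns, and passes that value back on destruction.
template <typename Setter>
class ScopedThreadCount {
public:
  ScopedThreadCount(Setter setter, int n)
      : setter_(std::move(setter)), saved_(setter_(n)) {}
  ~ScopedThreadCount() { setter_(saved_); }
  ScopedThreadCount(const ScopedThreadCount&) = delete;
  ScopedThreadCount& operator=(const ScopedThreadCount&) = delete;

private:
  Setter setter_;  // declared first so it is initialised before saved_
  int saved_;
};

// Stand-in for the MKL call (assumption: takes the new thread count and
// returns the previous thread-local value), so this compiles without MKL.
thread_local int fake_local_threads = 0;
int fake_set_num_threads_local(int n) {
  const int previous = fake_local_threads;
  fake_local_threads = n;
  return previous;
}

// Returns the thread count that is visible while the guard is alive.
int threads_inside_scope() {
  ScopedThreadCount<int (*)(int)> guard(fake_set_num_threads_local, 1);
  assert(fake_local_threads == 1);
  return fake_local_threads;
}  // the guard's destructor restores the saved value here
```
]]>
&lt;P&gt;In the real loop the guard would presumably be constructed with mkl_set_num_threads_local as the setter, right before the dgemm call, replacing the explicit save and restore lines.&lt;/P&gt;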
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have noticed a substantial speed difference depending on whether the sequential version of MKL (&lt;SPAN style="font-family: monospace;"&gt;&lt;SPAN style="color: #000000; background-color: #ffffff;"&gt;libmkl_sequential.a&lt;/SPAN&gt;&lt;/SPAN&gt;) or the threaded version (&lt;SPAN style="font-family: monospace;"&gt;&lt;SPAN style="color: #000000; background-color: #ffffff;"&gt;libmkl_intel_thread.a&lt;/SPAN&gt;&lt;/SPAN&gt;) is linked. The program takes approximately 1.5 times as long with the threaded library as with the sequential one.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am wondering whether anything can be done to have both versions running at the same speed.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Dec 2021 00:40:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/processing-time-sequential-vs-threaded-mkl-set-num-threads-local/m-p/1341921#M32394</guid>
      <dc:creator>may_ka</dc:creator>
      <dc:date>2021-12-07T00:40:12Z</dc:date>
    </item>
    <item>
      <title>Re: processing time sequential vs threaded+mkl_set_num_threads_local(1)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/processing-time-sequential-vs-threaded-mkl-set-num-threads-local/m-p/1342095#M32400</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for reaching out to us.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you please provide us with the following details so that we can work on it from our end?&lt;/P&gt;
&lt;P&gt;MKL Version&lt;/P&gt;
&lt;P&gt;Compiler used&lt;/P&gt;
&lt;P&gt;OS Details &amp;amp; type of CPU&lt;/P&gt;
&lt;P&gt;It would also help if you could share a complete sample reproducer (with steps to reproduce the issue, if any) and describe how you are measuring the time for both versions (sequential and threaded), so that we can get more insight into the issue.&lt;/P&gt;
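&lt;P&gt;For reference, one common way to take such a measurement is to repeat the call many times and keep the fastest wall-clock time, which tends to filter out one-off start-up costs. The following is a minimal sketch, not a prescribed method: min_seconds is a hypothetical helper name, and the callable passed to it stands for whatever MKL call is being timed.&lt;/P&gt;
<![CDATA[
```cpp
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper: runs `work` `repeats` times and returns the fastest
// wall-clock time of a single run in seconds.
// Returns +infinity if repeats is not positive.
template <typename Work>
double min_seconds(Work&& work, int repeats) {
  double best = std::numeric_limits<double>::infinity();
  for (int r = 0; r < repeats; ++r) {
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    const double s = std::chrono::duration<double>(t1 - t0).count();
    if (s < best) best = s;  // keep the minimum over all runs
  }
  return best;
}
```
]]>
&lt;P&gt;Linking the same timing code once against the sequential and once against the threaded library would then give two directly comparable numbers.&lt;/P&gt;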
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;Vidya.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Dec 2021 12:38:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/processing-time-sequential-vs-threaded-mkl-set-num-threads-local/m-p/1342095#M32400</guid>
      <dc:creator>VidyalathaB_Intel</dc:creator>
      <dc:date>2021-12-07T12:38:18Z</dc:date>
    </item>
    <item>
      <title>Re: processing time sequential vs threaded+mkl_set_num_threads_local(1)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/processing-time-sequential-vs-threaded-mkl-set-num-threads-local/m-p/1342276#M32402</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;thanks for your response.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;MKL version: oneAPI &lt;FONT face="andale mono,times"&gt;2021.2.0&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI&gt;Compiler: Intel oneAPI 2021.2.0 clang++&lt;/LI&gt;
&lt;LI&gt;OS: Linux&lt;/LI&gt;
&lt;LI&gt;CPU: &lt;SPAN style="font-family: monospace;"&gt;&lt;SPAN style="color: #000000; background-color: #ffffff;"&gt;i9-9980HK&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;I'll try to compile a stand-alone example.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Dec 2021 22:56:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/processing-time-sequential-vs-threaded-mkl-set-num-threads-local/m-p/1342276#M32402</guid>
      <dc:creator>may_ka</dc:creator>
      <dc:date>2021-12-07T22:56:53Z</dc:date>
    </item>
    <item>
      <title>Re: processing time sequential vs threaded+mkl_set_num_threads_local(1)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/processing-time-sequential-vs-threaded-mkl-set-num-threads-local/m-p/1342349#M32403</link>
      <description>&lt;P&gt;Hi Karl,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for providing the details.&lt;/P&gt;&lt;P&gt;We are working on your issue internally and will get back to you soon.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;&amp;gt;&amp;gt;I'll try to compile a stand-alone example.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;Meanwhile, you can share your example code, which would help us gain better insight into the issue.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Vidya.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Dec 2021 05:03:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/processing-time-sequential-vs-threaded-mkl-set-num-threads-local/m-p/1342349#M32403</guid>
      <dc:creator>VidyalathaB_Intel</dc:creator>
      <dc:date>2021-12-08T05:03:51Z</dc:date>
    </item>
    <item>
      <title>Re: processing time sequential vs threaded+mkl_set_num_threads_local(1)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/processing-time-sequential-vs-threaded-mkl-set-num-threads-local/m-p/1343541#M32429</link>
      <description>&lt;P&gt;Karl,&lt;/P&gt;&lt;P&gt;Is there a reproducer for this? Checking the problem on my end, I see roughly the same performance for moderate and large input problem sizes. The only difference we could see is when the input problem size is &amp;lt; 100. In such cases, if we run the gemm many times and measure the minimum execution time, the performance would be the same as well.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Dec 2021 03:51:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/processing-time-sequential-vs-threaded-mkl-set-num-threads-local/m-p/1343541#M32429</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2021-12-13T03:51:44Z</dc:date>
    </item>
    <item>
      <title>Re: processing time sequential vs threaded+mkl_set_num_threads_local(1)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/processing-time-sequential-vs-threaded-mkl-set-num-threads-local/m-p/1343620#M32431</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;Thanks for looking into this.&lt;/P&gt;
&lt;P&gt;Unfortunately, I cannot reproduce the problem with the program below, and the program in which it turned up is not small enough to serve as a reproducer. So I will leave it at that for the time being.&lt;/P&gt;
&lt;P&gt;Best&lt;/P&gt;
&lt;LI-CODE lang="cpp"&gt;#include &amp;lt;string&amp;gt;
#include &amp;lt;iostream&amp;gt;
#include &amp;lt;sstream&amp;gt;
#include "mkl.h"
#include &amp;lt;vector&amp;gt;
#include &amp;lt;random&amp;gt;
int main(int argc, char** argv){
  try{
    std::stringstream ss;std::string x,msg;
    if(argc!=4){
      msg="error. require 3 command line arguments: row dimension of matrix 1, column dimension of matrix 2, number of iterations.";throw msg;
    }
    long long i0nrow1=0,i0ncol2=0, niter=0;
    x=argv[1]; ss&amp;lt;&amp;lt;x;ss &amp;gt;&amp;gt; i0nrow1;ss.clear();
    x=argv[2]; ss&amp;lt;&amp;lt;x;ss &amp;gt;&amp;gt; i0ncol2;ss.clear();
    x=argv[3]; ss&amp;lt;&amp;lt;x;ss &amp;gt;&amp;gt; niter;ss.clear();
    if(i0nrow1&amp;lt;1 || i0ncol2&amp;lt;1 || niter&amp;lt;1){
      msg="error. invalid dimensions";
      throw msg;
    }
    std::random_device rd;
    std::default_random_engine eng(rd());
    std::uniform_real_distribution&amp;lt;double&amp;gt; distr(0,1);
    std::vector&amp;lt;std::vector&amp;lt;double&amp;gt;&amp;gt; a,b,c;
    a.resize(8);b.resize(8);c.resize(8);
    for(int i=0;i&amp;lt;8;++i){
      a[i].resize(i0nrow1*i0nrow1);b[i].resize(i0nrow1*i0ncol2);c[i].resize(i0nrow1*i0ncol2);
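      // bind by reference so the matrices are actually filled (a by-value loop variable only modifies a copy)
      for(auto&amp;amp; v : a[i]){v=distr(eng);}
      for(auto&amp;amp; v : b[i]){v=distr(eng);}
      for(auto&amp;amp; v : c[i]){v=0.0;}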
    }
#pragma omp parallel for num_threads(8)
    for(int j=0;j&amp;lt;a.size();++j){
      for(int i=0;i&amp;lt;niter;++i){
	int save=mkl_set_num_threads_local(1);
	cblas_dsymm(CblasColMajor,
		    CblasLeft,
		    CblasUpper,
		    i0nrow1,
		    i0ncol2,
		    1.0,
		    a[j].data(),
		    i0nrow1,
		    b[j].data(),
		    i0nrow1,
		    0.0,
		    c[j].data(),
		    i0nrow1
		    );
	mkl_set_num_threads_local(save);
      }
    }
  }catch(std::string msg){
    std::cout&amp;lt;&amp;lt;"an error has occurred: "+msg&amp;lt;&amp;lt;std::endl;
    return(1);
  }
  return(0);
}&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 13 Dec 2021 11:00:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/processing-time-sequential-vs-threaded-mkl-set-num-threads-local/m-p/1343620#M32431</guid>
      <dc:creator>may_ka</dc:creator>
      <dc:date>2021-12-13T11:00:57Z</dc:date>
    </item>
    <item>
      <title>Re: processing time sequential vs threaded+mkl_set_num_threads_local(1)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/processing-time-sequential-vs-threaded-mkl-set-num-threads-local/m-p/1344834#M32468</link>
      <description>&lt;P&gt;This thread is being closed, and we will no longer respond to it. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Dec 2021 03:46:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/processing-time-sequential-vs-threaded-mkl-set-num-threads-local/m-p/1344834#M32468</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2021-12-17T03:46:14Z</dc:date>
    </item>
  </channel>
</rss>

