<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re:MKL's simat_copy poor parallel performance in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-s-simat-copy-poor-parallel-performance/m-p/1284272#M31365</link>
    <description>&lt;P&gt;Hi Joao,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;I have been waiting for your reply about which OS (Windows or Linux) you were using.&lt;/P&gt;&lt;P&gt;You didn't even give me the instructions on how to build the app.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;I went ahead and built this code on both Windows and Linux.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Windows:&lt;/P&gt;&lt;P&gt;icl /Qopenmp parallel_test.c /Qmkl=parallel&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Linux:&lt;/P&gt;&lt;P&gt;gcc -DMKL_ILP64 -m64 -I"${MKLROOT}/include" parallel_test.c -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_ilp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;The code built and linked on both Windows and Linux. However, when I tried to run it, it failed with a segmentation fault.&lt;/P&gt;&lt;P&gt;I tested the code with the latest version of oneMKL, 2021.2.0.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Since it has been a long time, I assume that you have already resolved this issue. I will go ahead and close it.&lt;/P&gt;</description>
    <pubDate>Tue, 25 May 2021 01:25:11 GMT</pubDate>
    <dc:creator>Khang_N_Intel</dc:creator>
    <dc:date>2021-05-25T01:25:11Z</dc:date>
    <item>
      <title>MKL's simat_copy poor parallel performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-s-simat-copy-poor-parallel-performance/m-p/1250019#M30764</link>
      <description>&lt;P&gt;Hello all,&lt;/P&gt;
&lt;P&gt;I've been doing some testing with Intel MKL's simat_copy function (mkl_simatcopy) and noticed that its multi-threaded version is in most cases slower than its sequential counterpart, even for large matrices.&lt;/P&gt;
&lt;P&gt;The following results were obtained on an Intel i9-10980XE CPU, with the environment variables OMP_NUM_THREADS=N and OMP_DYNAMIC=false. I've also tested with OMP_DYNAMIC=true, but the results don't seem to change. The file was compiled with GCC using the Makefile from the transposition example.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Single-threaded:&lt;/P&gt;
&lt;P&gt;Number of threads:1&lt;BR /&gt;Major version: 2020&lt;BR /&gt;...&lt;BR /&gt;Platform: Intel(R) 64 architecture&lt;BR /&gt;Processor optimization: Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of Intel(R) Deep Learning Boost (Intel(R) DL Boost)&lt;BR /&gt;================================================================&lt;/P&gt;
&lt;P&gt;Transpose took 0.046586 seconds&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Multi-threaded:&lt;/P&gt;
&lt;P&gt;Number of threads 2: Transpose took 0.067779 seconds&lt;/P&gt;
&lt;P&gt;Number of threads 4: Transpose took 0.033118 seconds&lt;/P&gt;
&lt;P&gt;Number of threads 8: Transpose took 0.046896 seconds&lt;/P&gt;
&lt;P&gt;Number of threads 10: Transpose took 0.015994 seconds&lt;/P&gt;
&lt;P&gt;Number of threads 18: Transpose took 0.045859 seconds&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I find these results very strange and can't find a way to explain or improve them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any insights regarding how to optimize the parallel version will be deeply appreciated!&lt;/P&gt;</description>
      <pubDate>Tue, 26 Jan 2021 13:24:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-s-simat-copy-poor-parallel-performance/m-p/1250019#M30764</guid>
      <dc:creator>JoaoAlves95</dc:creator>
      <dc:date>2021-01-26T13:24:45Z</dc:date>
    </item>
    <item>
      <title>Re: MKL's simat_copy poor parallel performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-s-simat-copy-poor-parallel-performance/m-p/1250020#M30765</link>
      <description>&lt;P&gt;Forgot to add that the input matrix is 8000x8000 and also tested with variable.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Jan 2021 13:38:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-s-simat-copy-poor-parallel-performance/m-p/1250020#M30765</guid>
      <dc:creator>JoaoAlves95</dc:creator>
      <dc:date>2021-01-26T13:38:19Z</dc:date>
    </item>
    <item>
      <title>Re: MKL's simat_copy poor parallel performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-s-simat-copy-poor-parallel-performance/m-p/1251272#M30780</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for reporting this issue. I have forwarded your query to the MKL experts. They will get back to you.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Rahul&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 29 Jan 2021 11:28:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-s-simat-copy-poor-parallel-performance/m-p/1251272#M30780</guid>
      <dc:creator>RahulV_intel</dc:creator>
      <dc:date>2021-01-29T11:28:11Z</dc:date>
    </item>
    <item>
      <title>Re: MKL's simat_copy poor parallel performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-s-simat-copy-poor-parallel-performance/m-p/1284272#M31365</link>
      <description>&lt;P&gt;Hi Joao,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;I have been waiting for your reply about which OS (Windows or Linux) you were using.&lt;/P&gt;&lt;P&gt;You didn't even give me the instructions on how to build the app.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;I went ahead and built this code on both Windows and Linux.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Windows:&lt;/P&gt;&lt;P&gt;icl /Qopenmp parallel_test.c /Qmkl=parallel&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Linux:&lt;/P&gt;&lt;P&gt;gcc -DMKL_ILP64 -m64 -I"${MKLROOT}/include" parallel_test.c -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_ilp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;The code built and linked on both Windows and Linux. However, when I tried to run it, it failed with a segmentation fault.&lt;/P&gt;&lt;P&gt;I tested the code with the latest version of oneMKL, 2021.2.0.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Since it has been a long time, I assume that you have already resolved this issue. I will go ahead and close it.&lt;/P&gt;</description>
      <pubDate>Tue, 25 May 2021 01:25:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-s-simat-copy-poor-parallel-performance/m-p/1284272#M31365</guid>
      <dc:creator>Khang_N_Intel</dc:creator>
      <dc:date>2021-05-25T01:25:11Z</dc:date>
    </item>
  </channel>
</rss>

