<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re:OneAPI MKL vs parallel XE Cluster Edition MKL : mkl_sparse_s_mv inspector/executor performance in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1311012#M31951</link>
    <description>&lt;P&gt;Thanks, we will check the case on our end and keep this thread updated.&lt;/P&gt;&lt;BR /&gt;</description>
    <pubDate>Tue, 31 Aug 2021 03:51:58 GMT</pubDate>
    <dc:creator>Gennady_F_Intel</dc:creator>
    <dc:date>2021-08-31T03:51:58Z</dc:date>
    <item>
      <title>OneAPI MKL vs parallel XE Cluster Edition MKL : mkl_sparse_s_mv inspector/executor performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1310395#M31939</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have tested the performance of the&amp;nbsp;mkl_sparse_s_mv executor using the CSR format on a Linux machine, after setting the hints and optimizing via&amp;nbsp;&lt;SPAN&gt;mkl_sparse_optimize, for two different versions of Intel MKL&amp;nbsp;distributed&amp;nbsp;via&amp;nbsp;&lt;/SPAN&gt;OneAPI MKL version 2021.3.0 and&amp;nbsp;Parallel XE Cluster edition 2020.1.217.&lt;/P&gt;
&lt;P&gt;I found that the mkl_sparse_s_mv executor in the latest version is significantly slower than in&amp;nbsp;the previous version. This is observed especially for matrices that are probably converted to the DIA format internally after the inspector stage. Please look into the matter.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 27 Aug 2021 16:28:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1310395#M31939</guid>
      <dc:creator>Joy7</dc:creator>
      <dc:date>2021-08-27T16:28:11Z</dc:date>
    </item>
    <item>
      <title>Re:OneAPI MKL vs parallel XE Cluster Edition MKL : mkl_sparse_s_mv inspector/executor performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1310510#M31943</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;This is unexpected behavior.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you share more details on this matter? A reproducer plus the input matrix is the best and fastest way for us to check the problem on our end. Which CPU type did you run on? What performance difference did you observe? Which OS? Which threading runtime did you use?&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Sat, 28 Aug 2021 03:04:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1310510#M31943</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2021-08-28T03:04:38Z</dc:date>
    </item>
    <item>
      <title>Re: OneAPI MKL vs parallel XE Cluster Edition MKL : mkl_sparse_s_mv inspector/executor performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1310828#M31949</link>
      <description>&lt;P&gt;Thank you for the reply Gennady_F_Intel.&lt;/P&gt;
&lt;P&gt;For example, the following is the observed performance for the sparse matrix Watson/Baumann, &lt;A href="https://sparse.tamu.edu/Watson/Baumann" target="_self"&gt;https://sparse.tamu.edu/Watson/Baumann &lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;STRONG&gt;Parallel XE Cluster edition 2020.1.217 :&lt;/STRONG&gt; 21 GFLOPS&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;OneAPI MKL version 2021.3.0 :&amp;nbsp;&lt;/STRONG&gt;12 GFLOPS&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The system is an Intel Core i7-3930K with &lt;SPAN style="font-family: inherit;"&gt;6 &lt;/SPAN&gt;&lt;SPAN style="font-family: inherit;"&gt;× &lt;/SPAN&gt;&lt;SPAN style="font-family: inherit;"&gt;3.20 GHz cores, running&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-family: inherit;"&gt;Ubuntu Linux 18.04.5.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;The gcc/icc compiler options used to link libraries are&amp;nbsp;&lt;SPAN&gt;-lpfm -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
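      <!--
      The GFLOPS figures in this post can be related to the time per SpMV call.
      The sketch below assumes the common convention of 2 floating-point
      operations (one multiply, one add) per stored nonzero, which the poster
      does not state, and uses the NNZ of 760631 reported for Baumann.mtx later
      in this thread. The helper names spmv_gflops and spmv_seconds are
      hypothetical and are not part of MKL.

```c
#include <assert.h>

/* Assumption: y = A*x costs 2 flops (multiply + add) per stored nonzero. */
static double spmv_gflops(long long nnz, double seconds_per_call)
{
    assert(nnz >= 0 && seconds_per_call > 0.0);
    return 2.0 * (double)nnz / seconds_per_call / 1e9;
}

/* Inverse relation: seconds per call implied by a given GFLOPS rate. */
static double spmv_seconds(long long nnz, double gflops)
{
    assert(nnz >= 0 && gflops > 0.0);
    return 2.0 * (double)nnz / (gflops * 1e9);
}
```

      Under that assumption, spmv_seconds(760631, 21.0) is roughly 72
      microseconds per call and spmv_seconds(760631, 12.0) is roughly 127
      microseconds per call.
      -->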
      <pubDate>Mon, 30 Aug 2021 12:43:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1310828#M31949</guid>
      <dc:creator>Joy7</dc:creator>
      <dc:date>2021-08-30T12:43:37Z</dc:date>
    </item>
    <item>
      <title>Re:OneAPI MKL vs parallel XE Cluster Edition MKL : mkl_sparse_s_mv inspector/executor performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1311012#M31951</link>
      <description>&lt;P&gt;Thanks, we will check the case on our end and keep this thread updated.&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 31 Aug 2021 03:51:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1311012#M31951</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2021-08-31T03:51:58Z</dc:date>
    </item>
    <item>
      <title>Re:OneAPI MKL vs parallel XE Cluster Edition MKL : mkl_sparse_s_mv inspector/executor performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1311131#M31955</link>
      <description>&lt;P&gt;I checked this workload with the double-precision (mkl_sparse_d_mv) routine and don't see the issue:&lt;/P&gt;
&lt;P&gt;Matrix name = Baumann.mtx&lt;/P&gt;
&lt;P&gt;SIZE == 112211, NNZ == 760631&lt;/P&gt;
&lt;P&gt;MKL 2020.0: 1.050142e-05 sec&lt;/P&gt;
&lt;P&gt;MKL 2020.1: 1.013826e-05 sec&lt;/P&gt;
&lt;P&gt;MKL 2021.3: 1.004625e-05 sec&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;-- more details&lt;/P&gt;
&lt;P&gt;Major version: 2020&lt;/P&gt;
&lt;P&gt;Minor version: 0&lt;/P&gt;
&lt;P&gt;Update version: 0&lt;/P&gt;
&lt;P&gt;Product status: Product&lt;/P&gt;
&lt;P&gt;Build: 20191122&lt;/P&gt;
&lt;P&gt;Platform: Intel(R) 64 architecture&lt;/P&gt;
&lt;P&gt;Processor optimization: Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors&lt;/P&gt;
&lt;P&gt;SIZE == 112211, NNZ == 760631&lt;/P&gt;
&lt;P&gt;....IE SpBLAS MV Execution Time == &lt;B&gt;1.050142e-05 sec&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;***********************************&lt;/P&gt;
&lt;P&gt;Major version: 2020&lt;/P&gt;
&lt;P&gt;Minor version: 0&lt;/P&gt;
&lt;P&gt;Update version: 1&lt;/P&gt;
&lt;P&gt;Product status: Product&lt;/P&gt;
&lt;P&gt;Build: 20200208&lt;/P&gt;
&lt;P&gt;Platform: Intel(R) 64 architecture&lt;/P&gt;
&lt;P&gt;Processor optimization: Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors&lt;/P&gt;
&lt;P&gt;SIZE == 112211, NNZ == 760631&lt;/P&gt;
&lt;P&gt;....IE SpBLAS MV Execution Time == &lt;B&gt;1.013826e-05 sec&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;***********************************&lt;/P&gt;
&lt;P&gt;Major version: 2021&lt;/P&gt;
&lt;P&gt;Minor version: 0&lt;/P&gt;
&lt;P&gt;Update version: 3&lt;/P&gt;
&lt;P&gt;Product status: Product&lt;/P&gt;
&lt;P&gt;Build: 20210617&lt;/P&gt;
&lt;P&gt;Platform: Intel(R) 64 architecture&lt;/P&gt;
&lt;P&gt;Processor optimization: Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors&lt;/P&gt;
&lt;P&gt;SIZE == 112211, NNZ == 760631&lt;/P&gt;
&lt;P&gt;....IE SpBLAS MV Execution Time == &lt;B&gt;1.004625e-05 sec&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;Here are some CPU- and OS-specific details:&lt;/P&gt;
&lt;P&gt;CPU: 2x Xeon Gold 6148 2.4 GHz 20c (NP=40) (Skylake-SP)&lt;/P&gt;
&lt;P&gt;MEMORY: 192GB 2666 MHz DDR4 Dual-rank&lt;/P&gt;
&lt;P&gt;OS: CentOS Linux release 7.9.2009 (Core)&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;lp64, Intel OpenMP threading&lt;/P&gt;
&lt;P&gt;export KMP_AFFINITY=compact,1,0,granularity=fine&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;--Gennady&lt;/P&gt;</description>
      <pubDate>Tue, 31 Aug 2021 14:54:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1311131#M31955</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2021-08-31T14:54:30Z</dc:date>
    </item>
    <item>
      <title>Re: Re:OneAPI MKL vs parallel XE Cluster Edition MKL : mkl_sparse_s_mv inspector/executor performanc</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1311409#M31957</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;SPAN&gt;Gennady for checking.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;However, I tried it once again on my end while making sure that I only change the linked libraries from one version to another.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I confirm that I see the correct vector output in both runs, but&amp;nbsp;unfortunately&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;I continue to see the performance&amp;nbsp;degradation in the newer version. Note that I run the benchmark multiple times per measurement to increase the running time and make the measurement more accurate, and I repeat the experiment 30 times to&amp;nbsp;calculate the variance. &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I also pass the expected number of calls to the library via&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;mkl_sparse_set_mv_hint (A_csr, &amp;nbsp;SPARSE_OPERATION_NON_TRANSPOSE, &amp;nbsp;descr, &amp;nbsp;num_runs * 30);&lt;/SPAN&gt;&lt;/P&gt;</description>
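      <!--
      The variance over the 30 repeated experiments mentioned in this post could
      be computed along the following lines. This is only a sketch: the poster's
      actual code is in the attached tarball and is not shown in the thread, and
      the choice of the unbiased (n - 1) estimator and the function names are
      assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (n must be at least 1). */
static double sample_mean(const double *t, size_t n)
{
    assert(n >= 1);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum / (double)n;
}

/* Unbiased sample variance (divides by n - 1); n must be at least 2.
   Two-pass form: compute the mean first, then sum squared deviations. */
static double sample_variance(const double *t, size_t n)
{
    assert(n >= 2);
    double mean = sample_mean(t, n);
    double ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = t[i] - mean;
        ss += d * d;
    }
    return ss / (double)(n - 1);
}
```

      Here t would hold one timing (or one MFLOPS value) per experiment, with
      n = 30 in the setup described above.
      -->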
      <pubDate>Wed, 01 Sep 2021 10:52:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1311409#M31957</guid>
      <dc:creator>Joy7</dc:creator>
      <dc:date>2021-09-01T10:52:54Z</dc:date>
    </item>
    <item>
      <title>Re:OneAPI MKL vs parallel XE Cluster Edition MKL : mkl_sparse_s_mv inspector/executor performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1311424#M31958</link>
      <description>&lt;P&gt;Please give us your reproducer so that we can run it and check how it works on our side. We also run the routine 1000 times and report the average execution time. The pseudocode looks as follows:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;double t1 = dsecnd();&lt;/P&gt;&lt;P&gt;for (i=0;i&amp;lt;ncount;i++) &lt;/P&gt;&lt;P&gt;{&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;mkl_sparse_d_mv ( .....);&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;double t2 = dsecnd();&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;BR /&gt;</description>
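      <!--
      The timing loop in this post can be sketched as self-contained C. To keep
      the example buildable without MKL, a plain CSR kernel (csr_mv) stands in
      for mkl_sparse_d_mv and the C11 timespec_get call stands in for dsecnd();
      both substitutions, the function names, and the untimed warm-up call are
      assumptions and are not taken from the thread.

```c
#include <assert.h>
#include <time.h>

/* Stand-in for mkl_sparse_d_mv: y = A*x for a zero-based CSR matrix. */
static void csr_mv(int rows, const int *row_ptr, const int *col_idx,
                   const double *val, const double *x, double *y)
{
    for (int i = 0; i < rows; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}

/* Stand-in for dsecnd(): wall-clock seconds via C11 timespec_get. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Runs the kernel ncount times and returns the mean time per call. */
static double average_mv_seconds(int ncount, int rows, const int *row_ptr,
                                 const int *col_idx, const double *val,
                                 const double *x, double *y)
{
    assert(ncount > 0);
    /* Untimed warm-up call: an addition, not in the original pseudocode. */
    csr_mv(rows, row_ptr, col_idx, val, x, y);
    double t1 = now_seconds();
    for (int i = 0; i < ncount; i++)
        csr_mv(rows, row_ptr, col_idx, val, x, y);
    double t2 = now_seconds();
    return (t2 - t1) / ncount;
}
```

      Dividing the elapsed time by ncount gives the per-call average that is
      reported in the results earlier in the thread; with MKL, the csr_mv call
      would be replaced by the mkl_sparse_d_mv call from the pseudocode.
      -->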
      <pubDate>Wed, 01 Sep 2021 11:58:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1311424#M31958</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2021-09-01T11:58:15Z</dc:date>
    </item>
    <item>
      <title>Re: Re:OneAPI MKL vs parallel XE Cluster Edition MKL : mkl_sparse_s_mv inspector/executor performanc</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1311967#M31975</link>
      <description>&lt;P&gt;Please find the tarball attached. &amp;nbsp;Use the following commands to compile and run it.&lt;/P&gt;
&lt;P&gt;$ make float&lt;/P&gt;
&lt;P&gt;$ ./run_float &amp;lt;PATH-TO-MATRIX-FILE&amp;gt; &amp;lt;NUMBER-OF-THREADS&amp;gt;&lt;/P&gt;
&lt;P&gt;e.g.&lt;/P&gt;
&lt;P&gt;$ ./run_float ~/Watson/Baumann.mtx 5&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Output : Standard_Deviation, Performance in MFLOPS, Fletcher_Sum&lt;/P&gt;</description>
      <pubDate>Fri, 03 Sep 2021 12:46:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1311967#M31975</guid>
      <dc:creator>Joy7</dc:creator>
      <dc:date>2021-09-03T12:46:17Z</dc:date>
    </item>
    <item>
      <title>Re:OneAPI MKL vs parallel XE Cluster Edition MKL : mkl_sparse_s_mv inspector/executor performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1311968#M31976</link>
      <description>&lt;P&gt;OK, thanks for the tarball. We will check the case.&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 03 Sep 2021 12:50:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-MKL-vs-parallel-XE-Cluster-Edition-MKL-mkl-sparse-s-mv/m-p/1311968#M31976</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2021-09-03T12:50:00Z</dc:date>
    </item>
  </channel>
</rss>

