<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic the problem size is too small in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174635#M28836</link>
    <description>&lt;P&gt;The problem size is too small. Do you see a similar performance regression with a bigger problem size too?&lt;/P&gt;</description>
    <pubDate>Fri, 24 Aug 2018 04:27:52 GMT</pubDate>
    <dc:creator>Gennady_F_Intel</dc:creator>
    <dc:date>2018-08-24T04:27:52Z</dc:date>
    <item>
      <title>Big Performance Problem with PARDISO 2018 Update 3</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174633#M28834</link>
      <description>&lt;P style="margin: 0px 0px 10.66px;"&gt;&lt;SPAN lang="EN-US" style="margin: 0px;"&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;Hello folks,&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin: 0px 0px 10.66px;"&gt;&lt;SPAN lang="EN-US" style="margin: 0px;"&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;I have a strange performance problem with PARDISO on Windows. Before I open a support call, I hope to get some feedback in this forum. &lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin: 0px 0px 10.66px;"&gt;&lt;SPAN lang="EN-US" style="margin: 0px;"&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;I'm using Intel® Parallel Studio XE 2018 Update 3 Composer Edition for Fortran Windows*, Version 18.0.0040.&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin: 0px 0px 10.66px;"&gt;&lt;SPAN lang="EN-US" style="margin: 0px;"&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;I have noticed that &lt;STRONG&gt;parallel processing in PARDISO in MKL version 2018.0.3 does not work at all&lt;/STRONG&gt; and processing with only one thread is significantly slower than in version 2016.&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin: 0px 0px 10.66px;"&gt;&lt;SPAN lang="EN-US" style="margin: 0px;"&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;I have attached a small C++ test program and sample data that solve a small system multiple times. &lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin: 0px 0px 10.66px;"&gt;&lt;BR /&gt;
	&lt;SPAN lang="EN-US" style="margin: 0px;"&gt;&lt;FONT size="3"&gt;&lt;FONT color="#000000"&gt;&lt;FONT face="Calibri"&gt;When I run the program using the MKL DLLs from version 2018.0.3, I get the following result:&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:bash; class-name:dark;"&gt;&amp;gt;Release\pardiso.exe _data\mat.mm _data\b.mm
Intel(R) Math Kernel Library Version 2018.0.3 Product Build 20180406 for 32-bit applications
Solving matrix file _data\mat.mm with vector data _data\b.mm.
Data: rows=445, cols=445, values=1339
MKL threads: 6
Performance: Loops=10000, Time=2.785514 sec
&lt;/PRE&gt;

&lt;P style="margin: 0px 0px 10.66px;"&gt;&lt;SPAN lang="EN-US" style="margin: 0px;"&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;And now the funny stuff starts. The same program executed with the MKL DLLs from version 2016 (11.3.3) produces the following result:&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:bash; class-name:dark;"&gt;&amp;gt;Release\pardiso.exe _data\mat.mm _data\b.mm
Intel(R) Math Kernel Library Version 11.3.3 Product Build 20160413 for 32-bit applications
Solving matrix file _data\mat.mm with vector data _data\b.mm.
Data: rows=445, cols=445, values=1339
MKL threads: 6
Performance: Loops=10000, Time=1.171534 sec
&lt;/PRE&gt;

&lt;P style="margin: 0px 0px 10.66px;"&gt;&lt;SPAN lang="EN-US" style="margin: 0px;"&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;And it gets worse. The new PARDISO version 2018.0.3 uses a large amount of CPU time across multiple threads, yet it is slower than execution with a single thread!&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN lang="EN-US" style="margin: 0px; line-height: 107%; font-family: &amp;quot;Calibri&amp;quot;,sans-serif; font-size: 11pt;"&gt;&lt;FONT color="#000000"&gt;As far as I understand, I have configured everything correctly. And as can be seen, it works fine with the old MKL from 2016.&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 23 Aug 2018 18:19:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174633#M28834</guid>
      <dc:creator>Göttinger__Michael</dc:creator>
      <dc:date>2018-08-23T18:19:29Z</dc:date>
    </item>
    <item>
      <title>For better understanding I</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174634#M28835</link>
      <description>&lt;P style="margin: 0px 0px 10.66px;"&gt;&lt;SPAN lang="EN-US" style="margin: 0px;"&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;For better understanding, I have attached log files containing PARDISO diagnostic data. They show results from single-core and multi-core runs. They also make it clear that 6 threads are really used and that, at the same time, the MFLOPS performance decreases.&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin: 0px 0px 10.66px;"&gt;&lt;SPAN lang="EN-US" style="margin: 0px;"&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;This is the result from the 6-core parallel calculation:&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:plain; class-name:dark;"&gt;Statistics:
===========
Parallel Direct Factorization is running on 6 OpenMP

&amp;lt; Linear system Ax = b &amp;gt;
             number of equations:           445
             number of non-zeros in A:      1339
             number of non-zeros in A (%): 0.676177

             number of right-hand sides:    1

&amp;lt; Factors L and U &amp;gt;
             number of columns for each panel: 128
             number of independent subgraphs:  0
&amp;lt; Preprocessing with state of the art partitioning metis&amp;gt;
             number of supernodes:                    427
             size of largest supernode:               2
             number of non-zeros in L:                1153
             number of non-zeros in U:                672
             number of non-zeros in L+U:              1825
             gflop   for the numerical factorization: 0.000015

             gflop/s for the numerical factorization: 0.000479

Matrix Performance: Loops=1 Time=0.182937 sec&lt;/PRE&gt;

&lt;P style="margin: 0px 0px 10.66px;"&gt;&lt;SPAN lang="EN-US" style="margin: 0px;"&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;Here is the single-core result. It has better gflop/s performance than MKL with 6 cores:&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:plain; class-name:dark;"&gt;Statistics:
===========
Parallel Direct Factorization is running on 1 OpenMP

&amp;lt; Linear system Ax = b &amp;gt;
             number of equations:           445
             number of non-zeros in A:      1339
             number of non-zeros in A (%): 0.676177

             number of right-hand sides:    1

&amp;lt; Factors L and U &amp;gt;
             number of columns for each panel: 128
             number of independent subgraphs:  0
&amp;lt; Preprocessing with state of the art partitioning metis&amp;gt;
             number of supernodes:                    427
             size of largest supernode:               2
             number of non-zeros in L:                1153
             number of non-zeros in U:                672
             number of non-zeros in L+U:              1825
             gflop   for the numerical factorization: 0.000015

             gflop/s for the numerical factorization: 0.000532

Matrix Performance: Loops=1 Time=0.164001 sec
&lt;/PRE&gt;

&lt;P style="margin: 0px 0px 10.66px;"&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 23 Aug 2018 18:31:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174634#M28835</guid>
      <dc:creator>Göttinger__Michael</dc:creator>
      <dc:date>2018-08-23T18:31:00Z</dc:date>
    </item>
    <item>
      <title>the problem size is too small</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174635#M28836</link>
      <description>&lt;P&gt;The problem size is too small. Do you see a similar performance regression with a bigger problem size too?&lt;/P&gt;</description>
      <pubDate>Fri, 24 Aug 2018 04:27:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174635#M28836</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2018-08-24T04:27:52Z</dc:date>
    </item>
    <item>
      <title>Quote:Gennady F. (Intel)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174636#M28837</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Gennady F. (Intel) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;The problem size is too small. Do you see a similar performance regression with a bigger problem size too?&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;In my real application I can see the same performance problem with larger systems too.&lt;BR /&gt;
	&lt;BR /&gt;
	Anyway, I'll verify it in the small test program too. Please feel free to use my attached sample and any MM data file to verify it with a larger data set.&lt;/P&gt;

&lt;P&gt;The main problem for me is that it seems to be 3 times slower in MKL 2018 than it was in MKL 2016. I would be happy to get feedback about compiler options and other settings that can be changed to get better PARDISO performance in MKL 2018 (or at least the same performance as in the past).&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 24 Aug 2018 06:36:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174636#M28837</guid>
      <dc:creator>Göttinger__Michael</dc:creator>
      <dc:date>2018-08-24T06:36:20Z</dc:date>
    </item>
    <item>
      <title>Quote:Gennady F. (Intel)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174637#M28838</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Gennady F. (Intel) wrote:&lt;BR /&gt;&lt;BR /&gt;
	&lt;SPAN style="font-size: 1em;"&gt;The problem size is too small. Do you see a similar performance regression with a bigger problem size too?&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN style="font-size: 1em;"&gt;&lt;/SPAN&gt;&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;
	&lt;BR /&gt;
	I am interested in the performance of Pardiso for systems with a number of equations around 500.&lt;BR /&gt;
	So possible solutions are: a) do not use PARDISO for sparse systems with N&amp;lt;nnn. b)&amp;nbsp; use PARDISO but set the maximum number of cores to 1.&amp;nbsp; c) ......&lt;P&gt;&lt;/P&gt;
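
&lt;P&gt;Option b) can be tried without rebuilding anything by setting the MKL_NUM_THREADS environment variable before the solver process starts. The following is a minimal Python sketch under stated assumptions: the helper names single_thread_env and run_single_threaded are made up for illustration, and whether a given program honors the variable is an assumption, since an explicit call to mkl_set_num_threads inside the program takes precedence over the environment.&lt;/P&gt;

```python
import os
import subprocess


def single_thread_env(base_env=None):
    """Return a copy of the environment with MKL limited to one thread."""
    env = dict(os.environ if base_env is None else base_env)
    # MKL_NUM_THREADS is the environment variable MKL reads for its thread count.
    env["MKL_NUM_THREADS"] = "1"
    return env


def run_single_threaded(command):
    """Run a solver command (hypothetical here) with MKL restricted to one thread."""
    return subprocess.run(command, env=single_thread_env(), check=True)


# Example: show the setting that would be passed to the child process.
print(single_thread_env({"PATH": "C:\\tools"})["MKL_NUM_THREADS"])
```

&lt;P&gt;Inside a C or C++ program the same effect can be had by calling mkl_set_num_threads(1) before the first PARDISO call.&lt;/P&gt;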

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Best regards&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 24 Aug 2018 08:54:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174637#M28838</guid>
      <dc:creator>LRaim</dc:creator>
      <dc:date>2018-08-24T08:54:40Z</dc:date>
    </item>
    <item>
      <title>As mentioned in my previous</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174638#M28839</link>
      <description>&lt;P style="margin: 0px 0px 10.66px;"&gt;&lt;SPAN lang="EN-US" style="margin: 0px;"&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;As mentioned in my previous post, I have run a test with a somewhat larger matrix. The matrix is now 131458x131458 with 712722 non-zero values. This is the typical size for our application. &lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin: 0px 0px 10.66px;"&gt;&lt;SPAN lang="EN-US" style="margin: 0px;"&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;The same performance problem with MKL 2018 shows up here too:&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:plain; class-name:dark;"&gt;&amp;gt;Release\pardiso.exe _data\mat2.mm _data\b2.mm
Intel(R) Math Kernel Library Version 2018.0.3 Product Build 20180406 for 32-bit applications
Solving matrix file _data\mat2.mm with vector data _data\b2.mm.
Data: rows=131458, cols=131458, values=712722
MKL threads: 6
Performance: Loops=100, Time=8.250383 sec
&lt;/PRE&gt;

&lt;P style="margin: 0px 0px 10.66px;"&gt;&lt;SPAN lang="EN-US" style="margin: 0px;"&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;Same system solved with MKL 2016:&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:plain; class-name:dark;"&gt;&amp;gt;Release\pardiso.exe _data\mat2.mm _data\b2.mm
Intel(R) Math Kernel Library Version 11.3.3 Product Build 20160413 for 32-bit applications
Solving matrix file _data\mat2.mm with vector data _data\b2.mm.
Data: rows=131458, cols=131458, values=712722
MKL threads: 6
Performance: Loops=100, Time=4.823882 sec
&lt;/PRE&gt;
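
&lt;P&gt;As a rough check on the two runs above, the ratio of the wall-clock times can be computed directly. This is a small Python sketch, not part of the attached test program; the function name slowdown is made up for illustration, and the timings are copied from the output shown above.&lt;/P&gt;

```python
# Timings copied from the two runs above (100 loops each, 6 MKL threads).
def slowdown(new_time_sec, old_time_sec):
    """Return how many times longer the new run takes than the old one."""
    return new_time_sec / old_time_sec


time_mkl_2018 = 8.250383  # MKL 2018.0.3
time_mkl_2016 = 4.823882  # MKL 11.3.3

ratio = slowdown(time_mkl_2018, time_mkl_2016)
print(f"MKL 2018 takes {ratio:.2f}x the time of MKL 2016")
```

&lt;P&gt;On these numbers the 2018 run takes roughly 1.7 times as long as the 2016 run for the same 100 loops.&lt;/P&gt;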

&lt;P style="margin: 0px 0px 10.66px;"&gt;&lt;SPAN lang="EN-US" style="margin: 0px;"&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;As you can clearly see, the new MKL 2018 is about 70% slower than the older version (8.25 sec versus 4.82 sec for 100 loops). &lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 24 Aug 2018 09:08:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174638#M28839</guid>
      <dc:creator>Göttinger__Michael</dc:creator>
      <dc:date>2018-08-24T09:08:06Z</dc:date>
    </item>
    <item>
      <title>I recently did the same</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174639#M28840</link>
      <description>&lt;P&gt;I recently did the same version upgrade. I see a similar slowdown in run times, BUT it now gives better accuracy on my ill-conditioned matrices and now matches IMSL and SuperLU in this respect. Accuracy was quite poor before, and it is as important to me as speed.&lt;/P&gt;

&lt;P&gt;I cannot say anything about multi-core, as I gave up on that aspect of PARDISO long ago. But it might be worth another look now.&lt;/P&gt;

&lt;P&gt;My problem sizes are between 500 and 20000 degrees of freedom.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Aug 2018 17:19:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174639#M28840</guid>
      <dc:creator>Andrew_Smith</dc:creator>
      <dc:date>2018-08-24T17:19:14Z</dc:date>
    </item>
    <item>
      <title>thanks Andrew and Michael. I</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174640#M28841</link>
      <description>&lt;P&gt;Thanks Andrew and Michael. I managed to reproduce the issue on our side and the case has been escalated. We will keep you updated on the status.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 27 Aug 2018 10:21:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174640#M28841</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2018-08-27T10:21:28Z</dc:date>
    </item>
    <item>
      <title>Dears,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174641#M28842</link>
      <description>&lt;P&gt;Dear all,&lt;/P&gt;&lt;P&gt;Has this issue in PARDISO been fixed in any of the more recent&amp;nbsp;releases of MKL?&lt;/P&gt;&lt;P&gt;Thanks and kind regards&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jul 2019 17:35:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174641#M28842</guid>
      <dc:creator>Beccaria__Massimilia</dc:creator>
      <dc:date>2019-07-17T17:35:59Z</dc:date>
    </item>
    <item>
      <title>Quote:Gennady F. (Blackbelt)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174642#M28843</link>
      <description>&lt;P&gt;Dear all,&lt;/P&gt;&lt;P&gt;Is it possible to have an update on this?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Gennady F. (Blackbelt) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks Andrew and Michael. I managed to reproduce the issue on our side and the case has been escalated. We will keep you updated on the status.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2019 17:22:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Big-Performance-Problem-with-PARDISO-2018-Update-3/m-p/1174642#M28843</guid>
      <dc:creator>Beccaria__Massimilia</dc:creator>
      <dc:date>2019-07-23T17:22:41Z</dc:date>
    </item>
  </channel>
</rss>

