<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic By the way, when we solve the in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/subutilization-of-processor-resources-by-fgmres/m-p/1151891#M27208</link>
    <description>&lt;P&gt;By the way, when we solve the same systems of equations with the direct solver Pardiso, processor utilization is a constant 50% (the system has 32 virtual cores, 16 physical cores).&amp;nbsp; Gonzalo&lt;/P&gt;</description>
    <pubDate>Thu, 05 Jul 2018 19:24:57 GMT</pubDate>
    <dc:creator>Feijoo__Gonzalo</dc:creator>
    <dc:date>2018-07-05T19:24:57Z</dc:date>
    <item>
      <title>subutilization of processor resources by fgmres</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/subutilization-of-processor-resources-by-fgmres/m-p/1151890#M27207</link>
      <description>&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;Hi Everyone,&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;We are developing an application that uses the FGMRES routine in the MKL library to solve systems of linear equations as part of Newton iterations.&amp;nbsp; Recently we did some benchmarking and found that processor utilization goes down as the number of equations increases.&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;We instrumented the code and found that calls to dfgmres take a progressively larger share of the total solution time as the number of equations increases.&amp;nbsp; Specifically, we modified the "fgmres_full_fnct_c.c" file provided in the MKL examples directory and measured the elapsed time for different operations, such as the calls to dfgmres and the handling of reverse communication requests such as RCI_request=1 (matrix-vector product), RCI_request=3 (application of preconditioner), etc.&amp;nbsp; Here are a few numbers:&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;number of equations = 480k&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; total solution time = 8.6 s&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; &amp;nbsp; (rci_request = 1) = 0.7 s&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; &amp;nbsp; (rci_request = 3) = 2.2 s&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; &amp;nbsp; calls to dfgmres&amp;nbsp; = 4.9 s&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;number of equations = 950k&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; total solution time =&amp;nbsp; 27 s&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; &amp;nbsp; (rci_request = 1) = 1.8 s&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; &amp;nbsp; (rci_request = 3) = 5.7 s&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; &amp;nbsp; calls to dfgmres&amp;nbsp; =&amp;nbsp; 18 s&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;number of equations = 7,150k&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; total solution time = 820 s&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; &amp;nbsp; (rci_request = 1) =&amp;nbsp; 15 s&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; &amp;nbsp; (rci_request = 3) =&amp;nbsp; 83 s&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp; &amp;nbsp; calls to dfgmres&amp;nbsp; = 700 s&lt;/SPAN&gt;&lt;/DIV&gt;
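
&lt;DIV&gt;As a rough cross-check of the numbers above, the share of the total time spent inside dfgmres can be computed directly. This is only a minimal Python sketch on the timings quoted in this post; the labels "matvec" and "precond" are names chosen here for rci_request = 1 and rci_request = 3, not MKL identifiers.&lt;/DIV&gt;

```python
# Timings copied from the post (seconds); nothing here is measured anew.
# "matvec" and "precond" are illustrative labels for rci_request 1 and 3.
runs = {
    "480k": {"total": 8.6, "matvec": 0.7, "precond": 2.2, "dfgmres": 4.9},
    "950k": {"total": 27.0, "matvec": 1.8, "precond": 5.7, "dfgmres": 18.0},
    "7,150k": {"total": 820.0, "matvec": 15.0, "precond": 83.0, "dfgmres": 700.0},
}


def dfgmres_share(run):
    """Fraction of the total solution time spent inside calls to dfgmres."""
    return run["dfgmres"] / run["total"]


shares = {size: dfgmres_share(run) for size, run in runs.items()}

for size, share in shares.items():
    print(f"{size} equations: {share:.0%} of the total time is inside dfgmres")
```

&lt;DIV&gt;With these figures the share rises from about 57% to about 67% and then to about 85% as the system grows.&lt;/DIV&gt;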

&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;
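
&lt;DIV&gt;The per-phase timings above come from accumulating elapsed time around each step of the reverse communication loop. The pattern can be sketched as follows, in Python for brevity rather than the C of the modified example. Note that toy_solver_step is a hypothetical stand-in that only cycles through request codes; it is not the dfgmres interface and does no linear algebra. The request codes 1 and 3 mirror the ones named in this post.&lt;/DIV&gt;

```python
import time
from collections import defaultdict


def toy_solver_step(state):
    """Hypothetical stand-in for one solver call; returns the next request code.

    It alternates between 1 (matrix-vector product) and 3 (preconditioner)
    for a fixed number of rounds, then returns 0 (finished).
    """
    state["calls"] += 1
    if state["calls"] == 2 * state["rounds"] + 1:
        return 0
    return 1 if state["calls"] % 2 == 1 else 3


def timed_loop(rounds):
    """Accumulate elapsed wall time per phase of a reverse communication loop."""
    elapsed = defaultdict(float)
    counts = defaultdict(int)
    state = {"calls": 0, "rounds": rounds}
    while True:
        # Time spent inside the solver call itself.
        start = time.perf_counter()
        request = toy_solver_step(state)
        elapsed["solver"] += time.perf_counter() - start
        counts["solver"] += 1
        if request == 0:
            break
        # Time spent servicing the request in user code.
        phase = "matvec" if request == 1 else "precond"
        start = time.perf_counter()
        sum(i * i for i in range(1000))  # placeholder for the real callback work
        elapsed[phase] += time.perf_counter() - start
        counts[phase] += 1
    return dict(elapsed), dict(counts)


elapsed, counts = timed_loop(rounds=3)
print(counts)
print(elapsed)
```

&lt;DIV&gt;Keeping the solver timer separate from the request timers is what allows the time inside dfgmres to be reported apart from the matrix-vector and preconditioner work.&lt;/DIV&gt;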

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;We also took screenshots of the resource manager and noted that processor utilization is very low for long periods of time, as low as 4%, even though MKL correctly sets the maximum number of threads to the number of physical cores (16) in the system.&lt;/SPAN&gt;&lt;/DIV&gt;
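
&lt;DIV&gt;One possible reading of the 4% figure is sketched below. It assumes that the monitor reports an average over all 32 logical processors (the count given in the follow-up post), which has not been verified here. Under that assumption a single busy thread shows up as roughly 3% and 16 busy threads as 50%.&lt;/DIV&gt;

```python
# Assumption: the monitor averages load over every logical processor.
LOGICAL_PROCESSORS = 32  # from the follow-up post: 32 virtual, 16 physical cores


def utilization(busy_threads, logical_processors=LOGICAL_PROCESSORS):
    """Utilization shown if busy_threads threads each keep one processor busy."""
    return busy_threads / logical_processors


print(f"1 busy thread:   {utilization(1):.1%}")
print(f"16 busy threads: {utilization(16):.1%}")
```

&lt;DIV&gt;A reading near 4% would then be consistent with mostly single-threaded execution during those periods, although other causes cannot be ruled out from this arithmetic alone.&lt;/DIV&gt;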

&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;Does anybody have an idea of what is happening?&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;Sincerely,&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;Gonzalo&lt;/SPAN&gt;&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="font-size: 13.008px;"&gt;PS: We have several current licenses of Intel Parallel Studio, but Intel's support site is not letting me submit this question to priority support because I am not associated with the account that was used to register the product in our office.&lt;/SPAN&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 05 Jul 2018 19:21:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/subutilization-of-processor-resources-by-fgmres/m-p/1151890#M27207</guid>
      <dc:creator>Feijoo__Gonzalo</dc:creator>
      <dc:date>2018-07-05T19:21:22Z</dc:date>
    </item>
    <item>
      <title>By the way, when we solve the</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/subutilization-of-processor-resources-by-fgmres/m-p/1151891#M27208</link>
      <description>&lt;P&gt;By the way, when we solve the same systems of equations with the direct solver Pardiso, processor utilization is a constant 50% (the system has 32 virtual cores, 16 physical cores).&amp;nbsp; Gonzalo&lt;/P&gt;</description>
      <pubDate>Thu, 05 Jul 2018 19:24:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/subutilization-of-processor-resources-by-fgmres/m-p/1151891#M27208</guid>
      <dc:creator>Feijoo__Gonzalo</dc:creator>
      <dc:date>2018-07-05T19:24:57Z</dc:date>
    </item>
    <item>
      <title>The cause of the case may be</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/subutilization-of-processor-resources-by-fgmres/m-p/1151892#M27209</link>
      <description>&lt;P&gt;The cause may be that&amp;nbsp;&lt;SPAN style="font-size: 12px;"&gt;dfgmres&amp;nbsp;&lt;/SPAN&gt;is not threaded or is not implemented efficiently. Which version of MKL are you using? Could you please set the environment variable MKL_VERBOSE=1 and check the version number that is reported?&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jul 2018 04:02:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/subutilization-of-processor-resources-by-fgmres/m-p/1151892#M27209</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2018-07-06T04:02:16Z</dc:date>
    </item>
    <item>
      <title>Hi Gennady,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/subutilization-of-processor-resources-by-fgmres/m-p/1151893#M27210</link>
      <description>&lt;P&gt;Hi Gennady,&lt;/P&gt;

&lt;P&gt;Thank you for your reply!&amp;nbsp;&amp;nbsp;&lt;SPAN style="font-size: 1em;"&gt;We are using version 2017.1.143.&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;I was under the impression that dfgmres is parallelized, and would be surprised if it is not.&amp;nbsp; PARDISO and the sparse matrix-vector products are parallelized, so I thought this would extend to the functions implementing iterative solvers.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Please, let me know!&lt;/P&gt;

&lt;P&gt;Best,&amp;nbsp;&lt;SPAN style="font-size: 1em;"&gt;Gonzalo&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Jul 2018 14:57:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/subutilization-of-processor-resources-by-fgmres/m-p/1151893#M27210</guid>
      <dc:creator>Feijoo__Gonzalo</dc:creator>
      <dc:date>2018-07-10T14:57:21Z</dc:date>
    </item>
    <item>
      <title>Hello Gonzalo, actually</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/subutilization-of-processor-resources-by-fgmres/m-p/1151894#M27211</link>
      <description>&lt;P&gt;Hello Gonzalo,&amp;nbsp;&lt;SPAN style="font-size: 1em;"&gt;actually fgmres is not threaded, but we did not expect that to be a problem, because the performance bottleneck in this sort of computation is normally the matrix-vector multiplication and the preconditioner handling. Based on your results, however, the bottleneck is fgmres itself.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;How could we reproduce the problem on our side?&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;Thanks&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jul 2018 09:17:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/subutilization-of-processor-resources-by-fgmres/m-p/1151894#M27211</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2018-07-12T09:17:55Z</dc:date>
    </item>
  </channel>
</rss>

