<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic MKL routines are not threaded in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-routines-are-not-threaded/m-p/821639#M4832</link>
    <description>&amp;gt; RCI ISS routines (including dcg) are not threaded&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Why? The classical implementation of PCG is very straightforward and easy to parallelize. Are you planning to make it threaded in the future?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;gt; sparse matrix multiplication typically is memory bandwidth limited, with a high cache miss rate&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;I'm using &lt;I&gt;diagonal&lt;/I&gt; matrix storage. Are you sure this behavior is right?&lt;BR /&gt;&lt;BR /&gt;Update: I ran the BLAS &lt;B&gt;daxpy&lt;/B&gt; routine a couple of times and CPU usage was 100%. OK, so the main question now is: are you planning to make &lt;B&gt;dcg&lt;/B&gt; threaded in the future?&lt;/DIV&gt;</description>
    <pubDate>Thu, 29 Sep 2011 12:00:59 GMT</pubDate>
    <dc:creator>Mikhail_Matrosov</dc:creator>
    <dc:date>2011-09-29T12:00:59Z</dc:date>
    <item>
      <title>MKL routines are not threaded</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-routines-are-not-threaded/m-p/821637#M4830</link>
      <description>Hello,&lt;BR /&gt;&lt;BR /&gt;I'm using the MKL iterative sparse solver and several BLAS routines to solve my system of linear equations (SLE). I have a dual-core Intel Core2 Duo E6550 processor, but the application uses only one core: CPU usage stays at about 50% throughout the computation.&lt;BR /&gt;&lt;BR /&gt;I'm using the &lt;B&gt;dcg&lt;/B&gt; routine from the CG sparse solver and the &lt;B&gt;ddiasymv&lt;/B&gt; routine from BLAS. All they do is perform multiplications (of two vectors, or of a vector and a sparse matrix), so I expect very good parallelism. Obviously, MKL multithreading is somehow disabled.&lt;BR /&gt;&lt;BR /&gt;I'm using Microsoft Visual Studio 2010 and MKL v10.3 update 6. The project is built with the /MT CRT option. I'm linking statically against the following libraries:&lt;BR /&gt;&lt;BR /&gt;libirc.lib&lt;BR /&gt;mkl_solver.lib&lt;BR /&gt;mkl_intel_c.lib&lt;BR /&gt;mkl_intel_thread.lib&lt;BR /&gt;mkl_core.lib&lt;BR /&gt;libiomp5md.lib&lt;BR /&gt;&lt;BR /&gt;The &lt;B&gt;mkl_get_max_threads&lt;/B&gt; routine returns the expected value of 2.&lt;BR /&gt;&lt;BR /&gt;What should I do to enable parallelism?&lt;BR /&gt;</description>
      <pubDate>Thu, 29 Sep 2011 10:31:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-routines-are-not-threaded/m-p/821637#M4830</guid>
      <dc:creator>Mikhail_Matrosov</dc:creator>
      <dc:date>2011-09-29T10:31:34Z</dc:date>
    </item>
    <item>
      <title>MKL routines are not threaded</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-routines-are-not-threaded/m-p/821638#M4831</link>
      <description>&lt;BR /&gt;&lt;DIV&gt;&lt;P&gt;That's because the RCI ISS routines (including dcg)
are not threaded. Regarding ddiasymv: sparse matrix multiplication
is typically memory-bandwidth limited, with a high cache miss rate. In
such cases it is pretty difficult to achieve good scalability.&lt;/P&gt;&lt;P&gt;--Gennady&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 29 Sep 2011 11:52:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-routines-are-not-threaded/m-p/821638#M4831</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2011-09-29T11:52:51Z</dc:date>
    </item>
    <item>
      <title>MKL routines are not threaded</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-routines-are-not-threaded/m-p/821639#M4832</link>
      <description>&amp;gt; RCI ISS routines (including dcg) are not threaded&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Why? The classical implementation of PCG is very straightforward and easy to parallelize. Are you planning to make it threaded in the future?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;gt; sparse matrix multiplication typically is memory bandwidth limited, with a high cache miss rate&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;I'm using &lt;I&gt;diagonal&lt;/I&gt; matrix storage. Are you sure this behavior is right?&lt;BR /&gt;&lt;BR /&gt;Update: I ran the BLAS &lt;B&gt;daxpy&lt;/B&gt; routine a couple of times and CPU usage was 100%. OK, so the main question now is: are you planning to make &lt;B&gt;dcg&lt;/B&gt; threaded in the future?&lt;/DIV&gt;</description>
      <pubDate>Thu, 29 Sep 2011 12:00:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-routines-are-not-threaded/m-p/821639#M4832</guid>
      <dc:creator>Mikhail_Matrosov</dc:creator>
      <dc:date>2011-09-29T12:00:59Z</dc:date>
    </item>
    <item>
      <title>MKL routines are not threaded</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-routines-are-not-threaded/m-p/821640#M4833</link>
      <description>Yes, there are such plans, but I can't say exactly when they will be implemented.</description>
      <pubDate>Thu, 29 Sep 2011 14:52:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-routines-are-not-threaded/m-p/821640#M4833</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2011-09-29T14:52:22Z</dc:date>
    </item>
  </channel>
</rss>

