<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi zer0nes, in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-gets-worse-over-time-for-the-same-instructions/m-p/973384#M16876</link>
    <description>&lt;P&gt;Hi zer0nes,&lt;/P&gt;
&lt;P&gt;Things I'd try: (1) check for memory leaks (monitor task manager, as Sergey K proposed), (2) affinitize the threads (e.g. set KMP_AFFINITY=compact), (3) check if MKL memory allocator is the cause (set MKL_DISABLE_FAST_MM).&lt;/P&gt;
&lt;P&gt;Thanks&lt;BR /&gt;Dima&lt;/P&gt;</description>
    <pubDate>Thu, 13 Jun 2013 09:00:35 GMT</pubDate>
    <dc:creator>Dmitry_B_Intel</dc:creator>
    <dc:date>2013-06-13T09:00:35Z</dc:date>
    <item>
      <title>Performance gets worse over time for the same instructions</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-gets-worse-over-time-for-the-same-instructions/m-p/973381#M16873</link>
      <description>&lt;P&gt;First, I'm not sure whether this is the right forum for this question. I don't know the cause: it could be the hardware, MKL, .NET, or some other hidden factor.&lt;/P&gt;
&lt;P&gt;I have neural network code in C# that makes heavy use of MKL via PInvoke. I set a fixed number of threads and disabled MKL's dynamic threading. The C# code runs mainly before and after training; during training (i.e. between iterations), MKL does most of the computation. No memory is allocated and there's no I/O during training.&lt;/P&gt;
&lt;P&gt;I have observed unpredictable performance across iterations (example below) and would like to understand why. In some other runs, the number of connections processed per second dropped to ~600M for a few iterations (very strange). For the run below, training took 6h to finish (i.e. each iteration takes about 12 minutes on average). The performance fairly consistently degrades towards the end.&amp;nbsp;The performance numbers are more consistent when I run a smaller job (e.g. one that finishes in 20 minutes).&lt;/P&gt;
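&lt;P&gt;As a rough way to quantify the drift described above, the following sketch parses log lines in the format shown in this post and reports the first iteration whose throughput falls noticeably below the early baseline. The function names, the warm-up length, and the 1% tolerance are illustrative assumptions, not part of the original code.&lt;/P&gt;

```python
import re
from operator import lt
from statistics import median

# Matches lines such as
# "Iterations:12/30, 1498.91M connections processed per second".
LINE_RE = re.compile(
    r"Iterations:(\d+)/\d+, ([\d.]+)M connections processed per second"
)


def parse_log(text):
    """Return a list of (iteration, throughput in M connections/s) pairs."""
    return [(int(m.group(1)), float(m.group(2))) for m in LINE_RE.finditer(text)]


def first_degraded(samples, warmup=5, tolerance=0.01):
    """Find the first iteration that is slower than the early baseline.

    The baseline is the median throughput of the first `warmup` samples
    (at least one sample is required). An iteration counts as degraded when
    its throughput is below baseline * (1 - tolerance).
    Returns (iteration or None, baseline).
    """
    baseline = median(rate for _, rate in samples[:warmup])
    floor = baseline * (1.0 - tolerance)
    for iteration, rate in samples[warmup:]:
        if lt(rate, floor):  # lt(a, b) is the "a is less than b" test
            return iteration, baseline
    return None, baseline
```

&lt;P&gt;Applied to the log in this post with the defaults, the baseline is 1504.96M and the first iteration more than 1% below it is iteration 14, which gives a concrete point in the run to correlate with memory, temperature, or scheduling data.&lt;/P&gt;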
&lt;P&gt;The code is large and not sharable. If you can't pinpoint why, a hint to help me investigate further would also be appreciated.&lt;/P&gt;
&lt;P&gt;[plain]&lt;/P&gt;
&lt;P&gt;Iterations:1/30, 1504.65M connections processed per second&lt;BR /&gt;Iterations:2/30, 1505.16M connections processed per second&lt;BR /&gt;Iterations:3/30, 1505.16M connections processed per second&lt;BR /&gt;Iterations:4/30, 1504.96M connections processed per second&lt;BR /&gt;Iterations:5/30, 1503.38M connections processed per second&lt;BR /&gt;Iterations:6/30, 1504.68M connections processed per second&lt;BR /&gt;Iterations:7/30, 1502.40M connections processed per second&lt;BR /&gt;Iterations:8/30, 1506.11M connections processed per second&lt;BR /&gt;Iterations:9/30, 1503.20M connections processed per second&lt;BR /&gt;Iterations:10/30, 1504.95M connections processed per second&lt;BR /&gt;Iterations:11/30, 1502.34M connections processed per second&lt;BR /&gt;Iterations:12/30, 1498.91M connections processed per second&lt;BR /&gt;Iterations:13/30, 1490.70M connections processed per second&lt;BR /&gt;Iterations:14/30, 1477.59M connections processed per second&lt;BR /&gt;Iterations:15/30, 1459.92M connections processed per second&lt;BR /&gt;Iterations:16/30, 1433.61M connections processed per second&lt;BR /&gt;Iterations:17/30, 1402.28M connections processed per second&lt;BR /&gt;Iterations:18/30, 1356.30M connections processed per second&lt;BR /&gt;Iterations:19/30, 1342.68M connections processed per second&lt;BR /&gt;Iterations:20/30, 1306.84M connections processed per second&lt;BR /&gt;Iterations:21/30, 1263.10M connections processed per second&lt;BR /&gt;Iterations:22/30, 1236.72M connections processed per second&lt;BR /&gt;Iterations:23/30, 1209.60M connections processed per second&lt;BR /&gt;Iterations:24/30, 1183.91M connections processed per second&lt;BR /&gt;Iterations:25/30, 1157.60M connections processed per second&lt;BR /&gt;Iterations:26/30, 1140.60M connections processed per second&lt;BR /&gt;Iterations:27/30, 1112.54M connections processed per second&lt;BR /&gt;Iterations:28/30, 1086.06M connections processed per second&lt;BR 
/&gt;Iterations:29/30, 1071.61M connections processed per second&lt;BR /&gt;Iterations:30/30, 1055.94M connections processed per second&lt;/P&gt;
&lt;P&gt;[/plain]&lt;/P&gt;</description>
      <pubDate>Mon, 27 May 2013 20:47:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-gets-worse-over-time-for-the-same-instructions/m-p/973381#M16873</guid>
      <dc:creator>zer0nes</dc:creator>
      <dc:date>2013-05-27T20:47:10Z</dc:date>
    </item>
    <item>
      <title>Hello,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-gets-worse-over-time-for-the-same-instructions/m-p/973382#M16874</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;Which MKL functions does your application use? Also, is there enough memory for the computation during the iterations (bearing in mind that some MKL functions may allocate memory internally)?&lt;/P&gt;
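&lt;P&gt;One simple way to check whether memory use stays flat is to note the process's memory at a few points during training and fit a straight line through the readings. The sketch below does only the fitting; how the readings are collected (Task Manager, a performance counter, a log) is left open, and the function name and units are assumptions made for illustration.&lt;/P&gt;

```python
def growth_rate_mb_per_hour(samples):
    """Least-squares slope of memory usage over time, in MB per hour.

    `samples` is a sequence of (elapsed_seconds, megabytes) readings.
    A slope near zero suggests stable usage; a steady positive slope
    suggests something is accumulating.
    """
    n = len(samples)
    if n in (0, 1):
        raise ValueError("need at least two readings")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("readings must be taken at different times")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t * 3600.0
```

&lt;P&gt;For a run that lasts several hours, a handful of readings spread across the iterations is usually enough to tell a flat line from a steady climb.&lt;/P&gt;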
&lt;P&gt;It may also help to run a performance profiling tool, for example Intel VTune Amplifier, to profile your application and see which part of the code takes most of the time.&lt;/P&gt;
&lt;P&gt;Thanks,&lt;BR /&gt;Chao&lt;/P&gt;</description>
      <pubDate>Tue, 28 May 2013 00:57:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-gets-worse-over-time-for-the-same-instructions/m-p/973382#M16874</guid>
      <dc:creator>Chao_Y_Intel</dc:creator>
      <dc:date>2013-05-28T00:57:56Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...I have a neural network</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-gets-worse-over-time-for-the-same-instructions/m-p/973383#M16875</link>
      <description>&amp;gt;&amp;gt;...I have a neural network code in C# which heavily uses MKL via PInvoke. I set a fixed number of threads and disabled
&amp;gt;&amp;gt;dynamic threading of MKL. The C# code is used mainly before and after training. However, during training (i.e. between
&amp;gt;&amp;gt;iterations), MKL carries most of the computational body. No memory is allocated and there's no I/O during training...

Since your description is too generic, I would suggest commenting out parts of the code and re-running a set of tests after each change. Another simple check: look at Windows Task Manager for resource leaks, and verify that memory usage is stable (not growing).</description>
      <pubDate>Tue, 28 May 2013 04:33:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-gets-worse-over-time-for-the-same-instructions/m-p/973383#M16875</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-05-28T04:33:00Z</dc:date>
    </item>
    <item>
      <title>Hi zer0nes,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-gets-worse-over-time-for-the-same-instructions/m-p/973384#M16876</link>
      <description>&lt;P&gt;Hi zer0nes,&lt;/P&gt;
&lt;P&gt;Things I'd try: (1) check for memory leaks (monitor task manager, as Sergey K proposed), (2) affinitize the threads (e.g. set KMP_AFFINITY=compact), (3) check if MKL memory allocator is the cause (set MKL_DISABLE_FAST_MM).&lt;/P&gt;
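&lt;P&gt;Suggestions (2) and (3) can be tried without touching the training code by setting the variables in the environment of the process before it starts. The sketch below builds such an environment, assuming both settings are read from environment variables at start-up and that the trainer is launched as a child process; the helper name and the executable name in the comment are hypothetical.&lt;/P&gt;

```python
import os


def mkl_diagnostic_env(base=None, affinity="compact", disable_fast_mm=True):
    """Return a copy of an environment with the diagnostic settings applied.

    `base` defaults to the current process environment. The variables must
    be in place before the target process loads MKL and its threading
    runtime, so they are set for the child rather than changed mid-run.
    """
    env = dict(os.environ if base is None else base)
    env["KMP_AFFINITY"] = affinity
    if disable_fast_mm:
        # Assumption: any value enables the setting; "1" is used here.
        env["MKL_DISABLE_FAST_MM"] = "1"
    return env


# Hypothetical usage, with Trainer.exe standing in for the real program:
#   import subprocess
#   subprocess.run(["Trainer.exe"], env=mkl_diagnostic_env(), check=True)
```

&lt;P&gt;Changing one setting per run (for example passing disable_fast_mm=False first) makes it easier to tell which of the two, if either, affects the slowdown.&lt;/P&gt;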
&lt;P&gt;Thanks&lt;BR /&gt;Dima&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jun 2013 09:00:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-gets-worse-over-time-for-the-same-instructions/m-p/973384#M16876</guid>
      <dc:creator>Dmitry_B_Intel</dc:creator>
      <dc:date>2013-06-13T09:00:35Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...check if MKL memory</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-gets-worse-over-time-for-the-same-instructions/m-p/973385#M16877</link>
      <description>&amp;gt;&amp;gt;...check if MKL memory allocator is the cause ( &lt;STRONG&gt;set MKL_DISABLE_FAST_MM&lt;/STRONG&gt; ).

Hi Dmitry,

Could you explain where &lt;STRONG&gt;MKL_DISABLE_FAST_MM&lt;/STRONG&gt; comes from? Do you mean an environment variable, or a macro or a function?

I see that there is a description for &lt;STRONG&gt;MKL_Disable_Fast_MM&lt;/STRONG&gt; function declared in &lt;STRONG&gt;mkl_service.h&lt;/STRONG&gt; as:
...
_Mkl_Api( int, MKL_Disable_Fast_MM, ( void ) )
#define  mkl_disable_fast_mm MKL_Disable_Fast_MM
...
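
For reference, the declaration above describes an ordinary exported function that takes no arguments and returns an int, so it can be called from any language with a C foreign-function interface. The sketch below shows the shape of such a call using Python's ctypes; the library name "mkl_rt" is an assumption about how the MKL runtime is installed, and the call only affects the process that makes it, so in a C# application the equivalent would be a PInvoke declaration inside that application.

```python
import ctypes


def disable_mkl_fast_mm(library_name="mkl_rt"):
    """Try to call MKL_Disable_Fast_MM in the current process.

    Returns the function's integer result, or None when the library
    (name assumed, adjust to the local installation) or the symbol
    cannot be found.
    """
    try:
        mkl = ctypes.CDLL(library_name)
        func = mkl.MKL_Disable_Fast_MM
    except (OSError, AttributeError):
        return None
    func.restype = ctypes.c_int
    func.argtypes = []  # declared as taking void
    return func()
```

The call would need to happen early, before the MKL routines whose memory behaviour is under investigation are first used.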

Thanks in advance.</description>
      <pubDate>Fri, 14 Jun 2013 00:15:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-gets-worse-over-time-for-the-same-instructions/m-p/973385#M16877</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-06-14T00:15:26Z</dc:date>
    </item>
    <item>
      <title>Thanks to all :).</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-gets-worse-over-time-for-the-same-instructions/m-p/973386#M16878</link>
      <description>&lt;P&gt;Thanks to all :).&lt;/P&gt;</description>
      <pubDate>Mon, 17 Jun 2013 00:20:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-gets-worse-over-time-for-the-same-instructions/m-p/973386#M16878</guid>
      <dc:creator>zer0nes</dc:creator>
      <dc:date>2013-06-17T00:20:06Z</dc:date>
    </item>
  </channel>
</rss>

