<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Profiling MKL using Amplifier XE 2011 and _kmp_wait_sleep in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800384#M2971</link>
    <description>&lt;P&gt;kmp_wait_sleep is related to the OpenMP library that MKL uses for threading. Maybe your computation is not big enough for the number of threads you use.&lt;/P&gt;</description>
    <pubDate>Tue, 26 Oct 2010 17:13:44 GMT</pubDate>
    <dc:creator>VipinKumar_E_Intel</dc:creator>
    <dc:date>2010-10-26T17:13:44Z</dc:date>
    <item>
      <title>Profiling MKL using Amplifier XE 2011 and _kmp_wait_sleep</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800383#M2970</link>
      <description>I am profiling with Amplifier XE 2011 on a 4-core, 64-bit Windows machine and trying to optimize our use of MKL.&lt;BR /&gt;Amplifier shows that a significant amount of time is spent in _kmp_wait_sleep, called by BaseThreadStart. Our code uses MKL extensively. I am trying to understand "what this means" and how to improve it. We use MKL essentially as a black box, and a lot of MKL time is spent in [dz]gemm3.&lt;BR /&gt;&lt;BR /&gt;BTW, Amplifier XE 2011 is excellent: a worthy replacement for the late lamented Rational Quantify.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 26 Oct 2010 16:44:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800383#M2970</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2010-10-26T16:44:39Z</dc:date>
    </item>
    <item>
      <title>Profiling MKL using Amplifier XE 2011 and _kmp_wait_sleep</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800384#M2971</link>
      <description>&lt;P&gt;kmp_wait_sleep is related to the OpenMP library that MKL uses for threading. Maybe your computation is not big enough for the number of threads you use.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Oct 2010 17:13:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800384#M2971</guid>
      <dc:creator>VipinKumar_E_Intel</dc:creator>
      <dc:date>2010-10-26T17:13:44Z</dc:date>
    </item>
    <item>
      <title>Profiling MKL using Amplifier XE 2011 and _kmp_wait_sleep</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800385#M2972</link>
      <description>&lt;P&gt;Could you also provide the following?&lt;BR /&gt;&lt;BR /&gt;1. Problem size&lt;BR /&gt;2. Time spent in [dz]gemm&lt;BR /&gt;3. Time spent in _kmp_wait_sleep&lt;/P&gt;</description>
      <pubDate>Tue, 26 Oct 2010 17:20:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800385#M2972</guid>
      <dc:creator>VipinKumar_E_Intel</dc:creator>
      <dc:date>2010-10-26T17:20:36Z</dc:date>
    </item>
    <item>
      <title>Profiling MKL using Amplifier XE 2011 and _kmp_wait_sleep</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800386#M2973</link>
      <description>The matrix sizes are probably 512x512 double precision complex.&lt;BR /&gt;&lt;BR /&gt;_kmp_wait_sleep:&lt;BR /&gt;CPU Time: 248.483s&lt;BR /&gt;Overhead Time: 0usec&lt;BR /&gt;Wait Time: 1072.405s&lt;BR /&gt;Spin Time: 893.734s&lt;BR /&gt;Module: libguide40.dll&lt;BR /&gt;Function (Full): _kmp_wait_sleep&lt;BR /&gt;&lt;BR /&gt;The stack shows that this calling sequence is where the "time" is spent, not zgemm, I was wrong about that.&lt;BR /&gt;&lt;BR /&gt;_kmpc_invoke_task_func&amp;lt;-_kmp_launch_worker&amp;lt;-BaseThreadStart:&lt;BR /&gt;CPU Time: 248.399s&lt;BR /&gt;Overhead Time: 0usec&lt;BR /&gt;Wait Time: 1072.038s&lt;BR /&gt;Spin Time: [Unknown]&lt;BR /&gt;Module: libguide40.dll&lt;BR /&gt;Function (Full): _kmp_wait_sleep&lt;BR /&gt;&lt;BR /&gt;There is nothing in the stack "above" BaseThreadStart.&lt;BR /&gt;&lt;BR /&gt;The Summary says:&lt;BR /&gt;CPU: 1476s&lt;BR /&gt;Elapsed: 636s&lt;BR /&gt;Total thread count: 6&lt;BR /&gt;Spin time: 960s&lt;BR /&gt;Overhead: 0&lt;BR /&gt;&lt;BR /&gt;Top hot spots:&lt;BR /&gt;[libguide40.dll] 278&lt;BR /&gt;NtDelayExecution 277&lt;BR /&gt;_kmp_wait_sleep 248&lt;BR /&gt;daxpy 215&lt;BR /&gt;</description>
      <pubDate>Tue, 26 Oct 2010 18:54:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800386#M2973</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2010-10-26T18:54:19Z</dc:date>
    </item>
    <item>
      <title>Profiling MKL using Amplifier XE 2011 and _kmp_wait_sleep</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800387#M2974</link>
      <description>&lt;P&gt;Please try adjusting the KMP_BLOCKTIME environment variable, or call the kmp_set_blocktime() function. It controls how long threads wait before going to sleep. The default value is 200 ms. You could try, say, 100 ms, which may offer better overall performance.&lt;/P&gt;
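
A minimal sketch, not taken from the thread, of one way to experiment with the blocktime setting from Python: it launches a program with KMP_BLOCKTIME set in the child's environment. MKL_NUM_THREADS is included as an assumption (it is MKL's environment variable for limiting its thread count, which relates to the earlier suggestion that the computation may be too small for the number of threads). The helper names threading_env and run_with_settings, and the program name in the usage note, are hypothetical.

```python
import os
import subprocess


def threading_env(blocktime_ms, num_threads, base=None):
    """Return a copy of the environment with the threading settings applied."""
    # Copy so the parent process's own environment is left untouched.
    env = dict(os.environ if base is None else base)
    # Time (in ms) an idle OpenMP worker waits before going to sleep.
    env["KMP_BLOCKTIME"] = str(blocktime_ms)
    # Assumption: cap MKL's thread count as well, to test the small-problem theory.
    env["MKL_NUM_THREADS"] = str(num_threads)
    return env


def run_with_settings(command, blocktime_ms, num_threads):
    """Run command (a list of arguments) with the given settings."""
    env = threading_env(blocktime_ms, num_threads)
    return subprocess.run(command, env=env, check=True)
```

For example, run_with_settings(["my_app.exe"], blocktime_ms=100, num_threads=2) would start the hypothetical my_app.exe with both variables set. Comparing elapsed times across a few values is one way to see whether either setting matters for a given workload.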

</description>
      <pubDate>Wed, 27 Oct 2010 04:51:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800387#M2974</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2010-10-27T04:51:16Z</dc:date>
    </item>
    <item>
      <title>Profiling MKL using Amplifier XE 2011 and _kmp_wait_sleep</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800388#M2975</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;Please try using the libiomp5 library instead of libguide.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Oct 2010 05:19:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800388#M2975</guid>
      <dc:creator>barragan_villanueva_</dc:creator>
      <dc:date>2010-10-27T05:19:41Z</dc:date>
    </item>
    <item>
      <title>Profiling MKL using Amplifier XE 2011 and _kmp_wait_sleep</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800389#M2976</link>
      <description>If you set affinity, e.g. by KMP_AFFINITY, libiompprof5 can show you if certain threads spend extra time at idle (work imbalance). You'll have to decide what you want to do. Do you want idle threads to yield sooner, according to KMP_BLOCKTIME, or do you want to optimize threading for a number of threads which doesn't fit with the way a function is threaded in MKL, by providing your own source code? Certain commercial applications provide for logging the problem sizes submitted to ?gemm. For example, it seems that large N is required for efficient working of the threading built into MKL ?gemm. Large matrices, with A transposing argument set, would seem, according to public source, to be more dependent on tiling according to the dimensions. If loops are skipped according to zero elements, that could produce idle time.&lt;BR /&gt;</description>
      <pubDate>Wed, 27 Oct 2010 05:34:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800389#M2976</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2010-10-27T05:34:38Z</dc:date>
    </item>
    <item>
      <title>Profiling MKL using Amplifier XE 2011 and _kmp_wait_sleep</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800390#M2977</link>
      <description>I am not really getting anywhere with this. I switched to libiomp5 (quite a hassle due to some older libraries) and have experimented with KMP_BLOCKTIME. A smaller KMP_BLOCKTIME reduced overall process CPU time but did not change the elapsed time.&lt;BR /&gt;The profiler shows a lot of time spent in:&lt;BR /&gt;&lt;BR /&gt;RtlTryEnterCriticalSection 167.424s ntdll.dll&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 09 Nov 2010 19:06:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800390#M2977</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2010-11-09T19:06:18Z</dc:date>
    </item>
    <item>
      <title>Profiling MKL using Amplifier XE 2011 and _kmp_wait_sleep</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800391#M2978</link>
      <description>We have escalated this issue to our compiler engineering team and we will update you very soon.</description>
      <pubDate>Fri, 03 Dec 2010 12:46:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800391#M2978</guid>
      <dc:creator>VipinKumar_E_Intel</dc:creator>
      <dc:date>2010-12-03T12:46:01Z</dc:date>
    </item>
    <item>
      <title>Profiling MKL using Amplifier XE 2011 and _kmp_wait_sleep</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800392#M2979</link>
      <description>Were there any results?&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks,&lt;/DIV&gt;&lt;DIV&gt;Petros&lt;/DIV&gt;</description>
      <pubDate>Thu, 06 Oct 2011 14:50:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800392#M2979</guid>
      <dc:creator>Petros</dc:creator>
      <dc:date>2011-10-06T14:50:02Z</dc:date>
    </item>
    <item>
      <title>Profiling MKL using Amplifier XE 2011 and _kmp_wait_sleep</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800393#M2980</link>
      <description>Essentially, and not surprisingly, it is an issue when working with small matrices.</description>
      <pubDate>Thu, 06 Oct 2011 14:54:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Profiling-MKL-using-Amplifier-XE-2011-and-kmp-wait-sleep/m-p/800393#M2980</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2011-10-06T14:54:24Z</dc:date>
    </item>
  </channel>
</rss>

