<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Tey, in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Multithreading-with-MKl-Performance-Drop/m-p/1167254#M28315</link>
    <description>&lt;P&gt;Hi Tey,&lt;BR /&gt;
	If possible, could you please run export MKL_VERBOSE=1 before each of the two performance runs and copy the output here?&lt;/P&gt;

&lt;P&gt;Second, could you unset MKL_NUM_THREADS, set only OMP_NUM_THREADS=2 or 8 as in the article, and copy the result?&lt;/P&gt;

&lt;P&gt;&lt;BR /&gt;
	Best Regards,&lt;BR /&gt;
	Ying&lt;/P&gt;</description>
    <pubDate>Wed, 11 Apr 2018 02:46:16 GMT</pubDate>
    <dc:creator>Ying_H_Intel</dc:creator>
    <dc:date>2018-04-11T02:46:16Z</dc:date>
    <item>
      <title>Multithreading with MKL Performance Drop</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Multithreading-with-MKl-Performance-Drop/m-p/1167253#M28314</link>
      <description>&lt;P&gt;Hi all,&lt;BR /&gt;
	&lt;BR /&gt;
	I'm a first-time user of the MKL library, and I thought a good way to get the hang of it would be to replicate the results in this Intel blog post:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/blogs/2017/04/18/intel-and-facebook-collaborate-to-boost-caffe2-performance-on-intel-cpu-s"&gt;https://software.intel.com/en-us/blogs/2017/04/18/intel-and-facebook-collaborate-to-boost-caffe2-performance-on-intel-cpu-s&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Obviously I'm not using the same CPU, so I'm not expecting identical results. However, I'm seeing negative scaling when multi-threading.&lt;/P&gt;

&lt;P&gt;I built Caffe2 with MKL BLAS and OpenMP enabled. I'm using the same benchmark mentioned in the blog post: convnet_benchmarks.py (&lt;A href="https://github.com/pytorch/pytorch/blob/master/caffe2/python/convnet_benchmarks.py"&gt;https://github.com/pytorch/pytorch/blob/master/caffe2/python/convnet_benchmarks.py&lt;/A&gt;)&lt;/P&gt;

&lt;P&gt;From various reading I gathered that it's often best to set OMP_NUM_THREADS to 1 and MKL_NUM_THREADS to no more than the number of physical cores. So I run the benchmark like so:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;export MKL_NUM_THREADS="8"
export OMP_NUM_THREADS="1"
python convnet_benchmarks.py --batch_size 8 --model AlexNet --iterations 10 --warmup_iterations 1 --cpu&lt;/PRE&gt;
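
&lt;P&gt;For reference, a minimal sketch of how the physical core count could be checked before choosing a thread count, assuming Linux with lscpu available (count_physical_cores is just an illustrative helper name, not part of MKL or Caffe2):&lt;/P&gt;

```shell
# Illustrative helper (hypothetical name): count physical cores from
# `lscpu -p=Core,Socket` output read on stdin.
count_physical_cores() {
    # lscpu prints one "core,socket" line per logical CPU, plus header
    # lines starting with '#'. Hyperthread siblings share the same pair,
    # so the number of unique pairs is the number of physical cores.
    grep -v '^#' | sort -u | wc -l
}

# Usage (assumes lscpu from util-linux is installed):
# lscpu -p=Core,Socket | count_physical_cores
```

&lt;P&gt;The result is the upper bound I would use for MKL_NUM_THREADS.&lt;/P&gt;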

&lt;P&gt;I use mpstat to monitor core usage and can confirm that the benchmark is in fact running on multiple cores, yet the performance drops, even if I run it on only 2 threads. It seems to me that there is a lot of overhead when using MKL_NUM_THREADS. Has anyone else run into similar issues? I've noticed the topic of overhead come up here and there on the forums, but it doesn't seem to be the same issue.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Apr 2018 22:33:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Multithreading-with-MKl-Performance-Drop/m-p/1167253#M28314</guid>
      <dc:creator>tey__aaron</dc:creator>
      <dc:date>2018-04-09T22:33:36Z</dc:date>
    </item>
    <item>
      <title>Hi Tey,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Multithreading-with-MKl-Performance-Drop/m-p/1167254#M28315</link>
      <description>&lt;P&gt;Hi Tey,&lt;BR /&gt;
	If possible, could you please run export MKL_VERBOSE=1 before each of the two performance runs and copy the output here?&lt;/P&gt;

&lt;P&gt;Second, could you unset MKL_NUM_THREADS, set only OMP_NUM_THREADS=2 or 8 as in the article, and copy the result?&lt;/P&gt;
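
&lt;P&gt;As a rough sketch of those two runs, assuming a POSIX shell and the benchmark command from your post (BENCH_CMD and run_with_threads are illustrative names only, not part of MKL or Caffe2):&lt;/P&gt;

```shell
# Sketch only: BENCH_CMD and run_with_threads are hypothetical names.
# The default command is the one from the original post; override
# BENCH_CMD to try a different model or batch size.
BENCH_CMD="${BENCH_CMD:-python convnet_benchmarks.py --batch_size 8 --model AlexNet --iterations 10 --warmup_iterations 1 --cpu}"

run_with_threads() {
    # Run in a subshell so the environment changes do not leak
    # into the calling shell.
    (
        unset MKL_NUM_THREADS        # with this unset, MKL should follow OMP_NUM_THREADS
        export OMP_NUM_THREADS="$1"  # thread count under test
        export MKL_VERBOSE=1         # ask MKL to log each call it executes
        $BENCH_CMD                   # unquoted on purpose: split into command and arguments
    )
}

# Usage: the two thread counts suggested above.
# run_with_threads 2
# run_with_threads 8
```

&lt;P&gt;With MKL_VERBOSE=1 set, each logged MKL call should include its execution time and the number of threads used (NThr), which makes it easier to see where the time goes in each configuration.&lt;/P&gt;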

&lt;P&gt;&lt;BR /&gt;
	Best Regards,&lt;BR /&gt;
	Ying&lt;/P&gt;</description>
      <pubDate>Wed, 11 Apr 2018 02:46:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Multithreading-with-MKl-Performance-Drop/m-p/1167254#M28315</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2018-04-11T02:46:16Z</dc:date>
    </item>
  </channel>
</rss>

