<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Intel Python + Numpy much slower than Ubuntu + pip numpy in Intel® Distribution for Python*</title>
    <link>https://community.intel.com/t5/Intel-Distribution-for-Python/Intel-Python-Numpy-much-slower-than-Ubuntu-pip-numpy/m-p/1158061#M1176</link>
    <description>&lt;P&gt;I decided to try out the Intel Python version. I installed Conda, all the libraries, etc. However, when I run my code it is considerably slower than with the Python 3.5 that comes with Ubuntu 16.04, together with NumPy installed via pip. I use joblib to perform cross-validation, and with Intel Python it takes almost 100% more time using 2 cores (2 jobs). Without joblib, Intel Python takes roughly 1.4-1.5 times as much time as the OS Python. I am running on an Intel® Xeon(R) CPU E5-1603 0 @ 2.80GHz × 4 with 8 GB of memory.&lt;/P&gt;</description>
    <pubDate>Thu, 16 Nov 2017 23:01:27 GMT</pubDate>
    <dc:creator>hjelm__martin</dc:creator>
    <dc:date>2017-11-16T23:01:27Z</dc:date>
    <item>
      <title>Intel Python + Numpy much slower than Ubuntu + pip numpy</title>
      <link>https://community.intel.com/t5/Intel-Distribution-for-Python/Intel-Python-Numpy-much-slower-than-Ubuntu-pip-numpy/m-p/1158061#M1176</link>
      <description>&lt;P&gt;I decided to try out the Intel Python version. I installed Conda, all the libraries, etc. However, when I run my code it is considerably slower than with the Python 3.5 that comes with Ubuntu 16.04, together with NumPy installed via pip. I use joblib to perform cross-validation, and with Intel Python it takes almost 100% more time using 2 cores (2 jobs). Without joblib, Intel Python takes roughly 1.4-1.5 times as much time as the OS Python. I am running on an Intel® Xeon(R) CPU E5-1603 0 @ 2.80GHz × 4 with 8 GB of memory.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Nov 2017 23:01:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-for-Python/Intel-Python-Numpy-much-slower-than-Ubuntu-pip-numpy/m-p/1158061#M1176</guid>
      <dc:creator>hjelm__martin</dc:creator>
      <dc:date>2017-11-16T23:01:27Z</dc:date>
    </item>
    <item>
      <title>Well I use the Intel Xeon CPU</title>
      <link>https://community.intel.com/t5/Intel-Distribution-for-Python/Intel-Python-Numpy-much-slower-than-Ubuntu-pip-numpy/m-p/1158062#M1177</link>
      <description>&lt;P&gt;Well, I use an Intel Xeon CPU, and I have no idea whether I am using the SSEX instructions, so I am not sure this applies. All I am doing is massive amounts of matrix multiplication.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Nov 2017 14:35:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-for-Python/Intel-Python-Numpy-much-slower-than-Ubuntu-pip-numpy/m-p/1158062#M1177</guid>
      <dc:creator>hjelm__martin</dc:creator>
      <dc:date>2017-11-17T14:35:00Z</dc:date>
    </item>
    <item>
      <title>Hi Martin, </title>
      <link>https://community.intel.com/t5/Intel-Distribution-for-Python/Intel-Python-Numpy-much-slower-than-Ubuntu-pip-numpy/m-p/1158063#M1178</link>
      <description>&lt;P&gt;Hi Martin,&lt;/P&gt;

&lt;P&gt;It is likely that you are running into over-subscription. Joblib dispatches parts of the work to different processes, each of which calls MKL's GEMM function, which is itself multi-threaded.&lt;/P&gt;

&lt;P&gt;By default, MKL uses as many threads as there are cores available on your machine. Hence, with each extra concurrent process spawned by joblib, your computation creates more threads than the processor can service; they contend for resources, and a slowdown ensues.&lt;/P&gt;
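
&lt;P&gt;To make the arithmetic concrete, here is a minimal sketch of how the thread count grows. The function names are hypothetical and used only for illustration, and the sketch assumes every worker process uses one thread per core, as described above.&lt;/P&gt;

```python
import os


def total_threads(n_jobs, threads_per_process=None):
    """Estimate how many threads exist when each of n_jobs worker
    processes runs a library that uses threads_per_process threads."""
    if threads_per_process is None:
        # Assumption: the library defaults to one thread per available core.
        threads_per_process = os.cpu_count() or 1
    return n_jobs * threads_per_process


def oversubscription_factor(n_jobs, cores, threads_per_process=None):
    """Ratio of requested threads to available cores.

    A value above 1.0 means threads must share cores and contend
    for resources."""
    if threads_per_process is None:
        threads_per_process = cores
    return total_threads(n_jobs, threads_per_process) / cores


# The setup from the question: 2 joblib jobs on a 4-core machine.
print(total_threads(2, 4))            # 8 threads in total
print(oversubscription_factor(2, 4))  # 2.0, i.e. two threads per core
```

&lt;P&gt;With 2 jobs on 4 cores this gives 8 threads, twice what the processor can run at once. Actual slowdown depends on the workload, so treat the factor as a rough indicator only.&lt;/P&gt;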

&lt;P&gt;There are three possible ways to approach the problem while using the Intel Distribution for Python.&lt;/P&gt;

&lt;P&gt;1. Disable application-level parallelism, i.e. do not use joblib. This is suboptimal if your application has significant sequential segments.&lt;/P&gt;

&lt;P&gt;2. Use MKL in sequential mode by running ``env MKL_THREADING_LAYER=sequential python your_script.py``. This is suboptimal if your application has serial regions (not running in parallel) that use numpy/MKL.&lt;/P&gt;

&lt;P&gt;3. Use the TBB package included in the Intel Distribution for Python: ``python -m tbb --ipc your_script.py``. This should achieve the best of both worlds. Alternatively, you could try the SMP package (``conda install -c intel smp``), which should mitigate the oversubscription when you run ``python -m smp your_script.py``.&lt;/P&gt;
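
&lt;P&gt;If you prefer to launch the script from Python rather than from the shell, option 2 can be sketched roughly as follows. The helper names are hypothetical, and the sketch assumes that MKL reads ``MKL_THREADING_LAYER`` from the environment of the child process when it loads, which is what the ``env`` command line in option 2 relies on.&lt;/P&gt;

```python
import os
import subprocess
import sys


def sequential_mkl_env(base_env=None):
    """Return a copy of the environment with MKL set to sequential mode."""
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_THREADING_LAYER"] = "sequential"
    return env


def run_with_sequential_mkl(script_path):
    """Run script_path in a child interpreter, roughly equivalent to
    `env MKL_THREADING_LAYER=sequential python script_path`."""
    return subprocess.run(
        [sys.executable, script_path],
        env=sequential_mkl_env(),
    )
```

&lt;P&gt;For example, ``run_with_sequential_mkl("your_script.py")`` starts the script with the variable set while leaving the parent process environment untouched.&lt;/P&gt;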

&lt;P&gt;Please let us know if you run into further issues.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Nov 2017 16:39:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-for-Python/Intel-Python-Numpy-much-slower-than-Ubuntu-pip-numpy/m-p/1158063#M1178</guid>
      <dc:creator>Oleksandr_P_Intel</dc:creator>
      <dc:date>2017-11-17T16:39:00Z</dc:date>
    </item>
  </channel>
</rss>

