<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic You are spot on, it turns out in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/ifort-MKL-vs-Julia-OpenBLAS-ifort-MKL-using-less-cores/m-p/1142357#M26419</link>
    <description>&lt;P&gt;You are spot on: it turns out I have 2 physical cores, each of which hyperthreads. I had no idea, since Ubuntu does not distinguish between the real and fake cores. I can see why 2 threads is the optimal choice, and the basic compilation options correctly pick that out. Thanks!&lt;/P&gt;</description>
    <pubDate>Sat, 13 Oct 2018 13:18:08 GMT</pubDate>
    <dc:creator>d_3</dc:creator>
    <dc:date>2018-10-13T13:18:08Z</dc:date>
    <item>
      <title>ifort/MKL vs Julia/OpenBLAS: ifort/MKL using less cores</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/ifort-MKL-vs-Julia-OpenBLAS-ifort-MKL-using-less-cores/m-p/1142355#M26417</link>
      <description>&lt;P&gt;I am new to Fortran and I am comparing ifort's speed to Julia's. I would greatly appreciate any help or insight into my issues, and thank you for taking the time to answer my questions.&lt;/P&gt;

&lt;P&gt;When running what appears to be similar code, calling LAPACK to diagonalize a large matrix, Julia/OpenBLAS blasts all 4 cores of my laptop (Intel i5), while ifort/MKL only uses 2 of the 4. If I set the MKL_NUM_THREADS and MKL_DYNAMIC variables, I can force ifort/MKL to use all 4 cores, but the performance actually goes down a little. Even with only 2 cores running, Fortran still beats Julia/OpenBLAS (though not by much).&lt;/P&gt;

&lt;P&gt;Questions: I am just naively watching the CPU activity in the GNOME System Monitor. I understand that the Lanczos routines may not be well suited to parallel work, but why are exactly 2 cores being used? I would expect either 1 core or all of them. Is the CPU doing everything it can for the calculation? What is happening when I set MKL_DYNAMIC=false? Is data just being sloshed around between the cores in a slow manner, keeping them at "100%" without advancing the calculation?&lt;/P&gt;
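
&lt;P&gt;One way to check whether the CPUs shown in the system monitor are physical cores or hardware threads is to count both from /proc/cpuinfo. The sketch below assumes an x86 Linux system whose /proc/cpuinfo lists "processor", "physical id" and "core id" fields; the function name count_cpus and the SAMPLE text are illustrative only and not part of any tool mentioned in this thread.&lt;/P&gt;

```python
def count_cpus(cpuinfo_text):
    """Return (logical_cpus, physical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            # Every "processor" entry is one logical CPU seen by the OS.
            logical += 1
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            # A physical core is identified by its (socket, core) pair,
            # so hyperthread siblings collapse into a single entry.
            cores.add((physical_id, value))
    return logical, len(cores)


# Hypothetical sample: one socket, two cores, two hardware threads per core.
SAMPLE = "\n\n".join(
    "processor\t: {}\nphysical id\t: 0\ncore id\t\t: {}".format(cpu, cpu % 2)
    for cpu in range(4)
)

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE  # fall back to the sample where /proc is unavailable
    logical, physical = count_cpus(text)
    print("logical CPUs:", logical, "physical cores:", physical)
```

&lt;P&gt;If the two numbers differ, the extra logical CPUs are hardware threads rather than additional cores. On systems whose /proc/cpuinfo lacks these fields, the physical count will come out as 0 and another tool is needed.&lt;/P&gt;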

&lt;P&gt;Side note: Is it expected for MKL_DYNAMIC and other environment variables to be undefined, even after sourcing ~/intel/parallel_studio_xe_2019.0.045/psxevars.sh?&lt;/P&gt;

&lt;P&gt;Other info: This is on a fresh install of Ubuntu 18.04. I also had to set ulimit -s unlimited to avoid segfaults. The Parallel Studio GUI installer complained about 32-bit libraries; installing libc6-dev-i386 got rid of the warnings. Julia is the standard v1.0.1 build, as described on their GitHub.&lt;/P&gt;

&lt;P&gt;Below are the examples. The Julia eigen() function boils down to calling ?geev(x).&lt;/P&gt;

&lt;P&gt;Fortran: compiled with ifort -mkl my_test.f90&amp;nbsp;&lt;/P&gt;

&lt;P&gt;program my_test&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; implicit none&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; integer::n,info,lwork&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; real(8) :: test(1)&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; real(8), allocatable:: M(:,:),ansV(:,:),ansE(:),work(:)&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; n = 10000&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; allocate(M(1:n,1:n))&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; allocate(ansV(1:n,1:n))&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; allocate(ansE(1:n))&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; ! this first call is just to get the ideal workspace size&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; ! lwork = -1, so dsyev figures out the workspace size (contained in test(1)) and exits&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; call dsyev('V','U',n,ansV,n,ansE,test,-1,info)&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; lwork = int(test(1))&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; allocate(work(1:lwork))&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; call random_number(M)&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; M = M + transpose(M)&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; ansV = M&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; call dsyev('V','U',n,ansV,n,ansE,work,lwork,info)&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; print*,ansE&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; print*,lwork&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; print*,info&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; deallocate(M)&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; deallocate(ansV)&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; deallocate(ansE)&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; deallocate(work)&lt;BR /&gt;
	end program my_test&lt;/P&gt;

&lt;P&gt;Julia: run via julia my_test.jl&lt;/P&gt;

&lt;P&gt;using LinearAlgebra&lt;BR /&gt;
	mat = rand(10000,10000)&lt;BR /&gt;
	mat = mat + transpose(mat)&lt;BR /&gt;
	u,v = eigen(mat)&lt;BR /&gt;
	print(u)&lt;/P&gt;</description>
      <pubDate>Wed, 10 Oct 2018 17:31:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/ifort-MKL-vs-Julia-OpenBLAS-ifort-MKL-using-less-cores/m-p/1142355#M26417</guid>
      <dc:creator>d_3</dc:creator>
      <dc:date>2018-10-10T17:31:03Z</dc:date>
    </item>
    <item>
      <title>Do you have 4 core via</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/ifort-MKL-vs-Julia-OpenBLAS-ifort-MKL-using-less-cores/m-p/1142356#M26418</link>
      <description>&lt;P&gt;Do you have 4 cores via hyperthreading? Most older CPUs don't benefit from using the hyperthreads for intensive maths. That is probably why Intel have only used the real cores by default.&lt;/P&gt;</description>
      <pubDate>Sat, 13 Oct 2018 08:15:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/ifort-MKL-vs-Julia-OpenBLAS-ifort-MKL-using-less-cores/m-p/1142356#M26418</guid>
      <dc:creator>Andrew_Smith</dc:creator>
      <dc:date>2018-10-13T08:15:33Z</dc:date>
    </item>
    <item>
      <title>You are spot on, it turns out</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/ifort-MKL-vs-Julia-OpenBLAS-ifort-MKL-using-less-cores/m-p/1142357#M26419</link>
      <description>&lt;P&gt;You are spot on: it turns out I have 2 physical cores, each of which hyperthreads. I had no idea, since Ubuntu does not distinguish between the real and fake cores. I can see why 2 threads is the optimal choice, and the basic compilation options correctly pick that out. Thanks!&lt;/P&gt;</description>
      <pubDate>Sat, 13 Oct 2018 13:18:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/ifort-MKL-vs-Julia-OpenBLAS-ifort-MKL-using-less-cores/m-p/1142357#M26419</guid>
      <dc:creator>d_3</dc:creator>
      <dc:date>2018-10-13T13:18:08Z</dc:date>
    </item>
  </channel>
</rss>

