Using logical cores makes MKL slower

Gaiger_Chen · 04-08-2011

Hi:

I am using MKL in place of LAPACK and FFTW3 when running OpenMX, a materials simulation code.

OpenMX itself is built with gcc -msse4.2
(OpenMX cannot be built with icc; if you do, you get wrong results.)

======================
case 1:
ifort -msse4.2 for lapack and blas

icc -msse2 -openmp for fftw3

vs

case 2:
MKL
========================

My CPU is an i3 330M, which, as you know, has 2 physical cores and 4 logical cores.
The input files are in the work folder.

case 1 with 4 threads:

Met.dat : 41s
GaAs : 347s
C60: 81s

case 1 with 2 threads:

Met.dat : 41s
GaAs : 290s
C60: 87s

case 2 with 4 threads:

Met.dat : 40s
GaAs : 327s
C60: 94s
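
To make the comparison easier to read, here is a minimal sketch that turns the timings above into relative differences. The numbers are copied from the lists above; the names `timings`, `percent_change`, and `compare` are only illustrative and are not part of OpenMX or MKL.

```python
# Wall-clock times in seconds, copied from the three lists above.
timings = {
    "case 1, 4 threads": {"Met.dat": 41, "GaAs": 347, "C60": 81},
    "case 1, 2 threads": {"Met.dat": 41, "GaAs": 290, "C60": 87},
    "case 2, 4 threads": {"Met.dat": 40, "GaAs": 327, "C60": 94},
}


def percent_change(baseline, other):
    """Return how much slower (+) or faster (-) `other` is than `baseline`, in percent."""
    return 100.0 * (other - baseline) / baseline


def compare(baseline_name, other_name):
    """Compare two runs job by job, rounding each difference to one decimal place."""
    base = timings[baseline_name]
    other = timings[other_name]
    return {job: round(percent_change(base[job], other[job]), 1) for job in base}


if __name__ == "__main__":
    reference = "case 1, 4 threads"
    for name in ("case 1, 2 threads", "case 2, 4 threads"):
        print(f"{name} vs {reference}: {compare(reference, name)}")
```

With these numbers, case 2 with 4 threads is about 16% slower than case 1 with 4 threads on C60, while it is a few percent faster on Met.dat and GaAs. Each value is a single run, so small differences may just be noise.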

I do not understand why MKL is slower than the auto-vectorized build when using 4 threads.
MKL's LAPACK/BLAS routines are vectorized by hand rather than by the compiler, so they should be faster than compiler-generated code.
Does this mean that an application using MKL should set its thread count to the number of physical cores rather than the number of logical cores?
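
One way to test this would be to cap the thread count at the number of physical cores through the environment before launching the program. The sketch below is only an assumption about how such a test could be set up: `OMP_NUM_THREADS` and `MKL_NUM_THREADS` are the standard OpenMP and MKL environment variables, but the helper names `thread_env` and `run_with_threads` are hypothetical, and the physical core count (2 on this CPU) is passed in by hand rather than detected.

```python
import os
import subprocess
import sys


def thread_env(physical_cores, base_env=None):
    """Return a copy of the environment with the thread count capped.

    OMP_NUM_THREADS limits OpenMP regions (for example an OpenMP build of FFTW3);
    MKL_NUM_THREADS limits the threads MKL uses internally.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(physical_cores)
    env["MKL_NUM_THREADS"] = str(physical_cores)
    return env


def run_with_threads(command, physical_cores):
    """Run `command` (a list of arguments) with the capped environment."""
    return subprocess.run(
        command,
        env=thread_env(physical_cores),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in command that just echoes the variable; replace it with the
    # real simulation command line when running an actual test.
    probe = [sys.executable, "-c", "import os; print(os.environ['MKL_NUM_THREADS'])"]
    print(run_with_threads(probe, physical_cores=2).stdout.strip())
```

Running the same inputs with the count set to 2 and then to 4 would show directly whether the extra logical cores help or hurt with MKL.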

Thank you.

mecej4 · 04-08-2011

You may find some answers to your questions in this earlier thread:

http://software.intel.com/en-us/forums/showthread.php?t=77753&o=a&s=lr