Max core/thread utilization in MKL with Core 2 T5500 and EPD/numpy FFT?

rjsdotorg · 06-06-2011

Main question: How do I coerce the MKL to thread my fft calls?
We are using Enthought Python with MKL linked to numpy http://www.enthought.com/epd/mkl/.
A 2D complex FFT of random data with a shape of (2, 65536), run over 100 loops, still uses only 50% of the CPU.

Environment is XP Win32, py2.7, Intel MKL version: Intel Math Kernel Library Version 10.3.1 Product Build 20101110 for 32-bit applications, max Intel threads: 2
My test script is attached.
On a Core 2 Duo, I:
- removed the env var MKL_NUM_THREADS, rebooted, and used mkl.set_num_threads(2)
- verified that both cores have affinity checked.
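
The thread-related settings described above can be collected in one place with a small sketch like the one below. It assumes the `mkl` module from the script is importable and falls back gracefully when it is not. `report_mkl_threading` is a hypothetical helper name, and `OMP_NUM_THREADS` and `MKL_DYNAMIC` are included only because they are other environment variables that can affect MKL's thread count.

```python
import os


def report_mkl_threading():
    """Collect settings that can influence how many threads MKL uses."""
    info = {
        # Environment variables read by MKL / the OpenMP runtime (None if unset).
        "MKL_NUM_THREADS": os.environ.get("MKL_NUM_THREADS"),
        "OMP_NUM_THREADS": os.environ.get("OMP_NUM_THREADS"),
        "MKL_DYNAMIC": os.environ.get("MKL_DYNAMIC"),
    }
    try:
        import mkl  # assumption: the MKL service module used in the script
    except ImportError:
        info["mkl_module"] = False
        return info
    info["mkl_module"] = True
    info["max_threads"] = mkl.get_max_threads()
    return info


if __name__ == "__main__":
    for key, value in sorted(report_mkl_threading().items()):
        print(key, value)
```

If `MKL_NUM_THREADS` shows up as set here, it was not actually removed from the environment the Python process sees.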

I also read this:
http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/mklxe/mkl_userguide_win/MKL_UG_managing_performance/Threaded_Routines.htm#fft

Script results:
True
Intel MKL version: Intel Math Kernel Library Version 10.3.1 Product Build 20101110 for 32-bit applications
Intel cpu_clocks: 8903351421090
Intel cpu_frequency: 1.66251
max Intel threads: 2
using numpy 1.5.1
(2, 65536) items
simple loop 2.54858477242

___________ Script: _________

import numpy
import numpy.fft as fft
print numpy.use_fastnumpy
import time
import mkl

print 'Intel MKL version:', mkl.get_version_string()
print 'Intel cpu_clocks:', mkl.get_cpu_clocks()
print 'Intel cpu_frequency:', mkl.get_cpu_frequency()
#print 'Intel MKL, freeing buffer memory:', mkl.thread_free_buffers()

print 'max Intel threads:', mkl.get_max_threads()
mkl.set_num_threads(2)

N = 2**16

print 'using numpy', numpy.__version__
a = numpy.random.rand(2, N)
print a.shape, 'items'
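# Time an empty loop so its overhead can be subtracted below.
t0 = time.clock()
for i in range(10):
    continue
base = time.clock()-t0
fftn = fft.fftn
# Time 10 FFTs of length N along axis 1.
t0 = time.clock()
for i in range(10):
    r = fftn(a, (N,), (1,))
print 'simple loop', time.clock()-t0-base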
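
As an aside on the timing itself, here is a minimal sketch of a reusable timing helper. It is written for Python 3 (where `time.clock` no longer exists) rather than the py2.7 used above, and `best_time` is a hypothetical name, not part of numpy or MKL. Taking the fastest of several runs makes the numbers less sensitive to other activity on the machine.

```python
import time


def best_time(func, *args, repeats=10):
    """Return the fastest wall-clock time, in seconds, of `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - t0)
    return best
```

With numpy available, something like `best_time(numpy.fft.fft, a)` could then be compared across thread settings or across array shapes, for example (2, N) against a larger batch of rows.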

VipinKumar_E_Intel · 07-05-2012

Did you use the threaded MKL libraries when building NumPy/SciPy?
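
One rough way to start answering that from Python is to look at the build information NumPy reports about itself. The sketch below captures the output of `numpy.show_config()` and checks whether it mentions MKL at all. `numpy_mentions_mkl` is a hypothetical helper name, and a text match only suggests that MKL was referenced at build time; it does not prove that the threaded MKL layer was linked.

```python
import contextlib
import io


def numpy_mentions_mkl():
    """Return True/False if NumPy's build info mentions MKL, or None without NumPy."""
    try:
        import numpy
    except ImportError:
        return None
    buf = io.StringIO()
    # show_config() prints its report, so capture stdout to inspect the text.
    with contextlib.redirect_stdout(buf):
        numpy.show_config()
    return "mkl" in buf.getvalue().lower()


if __name__ == "__main__":
    print("NumPy build info mentions MKL:", numpy_mentions_mkl())
```

The full `numpy.show_config()` output lists the library names used for BLAS/LAPACK, which is where a sequential versus threaded MKL library would be visible.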