Main question: How do I coerce the MKL to thread my fft calls?
We are using the Enthought Python Distribution (EPD), whose numpy is linked against MKL: http://www.enthought.com/epd/mkl/
A 2D complex FFT of random data with shape (2, 65536), run over 100 loops, still only uses 50% of the CPU (the equivalent of one of the two cores).
Environment is Windows XP Win32, Python 2.7, Intel MKL version: Intel Math Kernel Library Version 10.3.1 Product Build 20101110 for 32-bit applications, max Intel threads: 2
My test script is attached.
On a Core 2 Duo, I:
- removed the env var MKL_NUM_THREADS, rebooted, and used mkl.set_num_threads(2) in the script instead
- verified that the process affinity has both cores checked.
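For anyone reproducing these checks, here is a minimal sketch that gathers the thread-related settings in one place. It is written for a current Python 3 rather than the py2.7 install above. The helper name mkl_thread_report is made up for illustration. MKL_NUM_THREADS, OMP_NUM_THREADS and MKL_DYNAMIC are the environment variables MKL consults, and mkl.get_max_threads() is the same call used in my script. The sketch assumes the mkl module may be missing and reports None in that case.

```python
import os


def mkl_thread_report(environ=os.environ):
    """Collect the settings that can influence MKL's thread count.

    mkl_thread_report is a hypothetical helper name, not part of any library.
    """
    report = {
        # MKL-specific setting; takes precedence over OMP_NUM_THREADS.
        "MKL_NUM_THREADS": environ.get("MKL_NUM_THREADS"),
        # Generic OpenMP setting that MKL falls back to.
        "OMP_NUM_THREADS": environ.get("OMP_NUM_THREADS"),
        # Controls whether MKL may lower the thread count on its own.
        "MKL_DYNAMIC": environ.get("MKL_DYNAMIC"),
        # Logical CPUs visible to Python.
        "cpu_count": os.cpu_count(),
    }
    try:
        import mkl  # assumption: the same mkl module as in the script
    except ImportError:
        report["mkl_max_threads"] = None
    else:
        report["mkl_max_threads"] = mkl.get_max_threads()
    return report


if __name__ == "__main__":
    for key, value in sorted(mkl_thread_report().items()):
        print(key, "=", value)
```

A value of None for one of the environment variables just means it is not set, which matches the state after I removed MKL_NUM_THREADS.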
I also read this:
http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/mklxe/mkl_userguide_win/MKL_UG_managing_performance/Threaded_Routines.htm#fft
Script results:
True
Intel MKL version: Intel Math Kernel Library Version 10.3.1 Product Build 20101110 for 32-bit applications
Intel cpu_clocks: 8903351421090
Intel cpu_frequency: 1.66251
max Intel threads: 2
using numpy 1.5.1
(2, 65536) items
simple loop 2.54858477242
___________ Script: _________
import numpy
import numpy.fft as fft
print numpy.use_fastnumpy
import time
import mkl
print 'Intel MKL version:', mkl.get_version_string()
print 'Intel cpu_clocks:', mkl.get_cpu_clocks()
print 'Intel cpu_frequency:', mkl.get_cpu_frequency()
#print 'Intel MKL, freeing buffer memory:', mkl.thread_free_buffers()
print 'max Intel threads:', mkl.get_max_threads()
mkl.set_num_threads(2)
N = 2**16
print 'using numpy', numpy.__version__
a = numpy.random.rand(2, N)
print a.shape, 'items'
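# Time an empty loop so its overhead can be subtracted below.
t0 = time.clock()
for i in range(10):
    continue
base = time.clock()-t0
fftn = fft.fftn
# Time 10 FFTs of length N along axis 1 (one transform per row).
t0 = time.clock()
for i in range(10):
    r = fftn(a, (N,), (1,))
print 'simple loop', time.clock()-t0-base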
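A single timing does not show whether the thread setting has any effect, so a variant that times the same FFT under 1 and 2 threads may be more telling. The following is only a sketch, written for Python 3 (time.perf_counter instead of time.clock). The helper names time_call and compare_thread_counts are made up for illustration, and it assumes the mkl module exposes set_num_threads as in the script above. If mkl cannot be imported it returns an empty dict rather than guessing.

```python
import time


def time_call(func, repeats=10):
    """Wall-clock seconds for `repeats` calls of func, minus empty-loop overhead."""
    t0 = time.perf_counter()
    for _ in range(repeats):
        pass
    base = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(repeats):
        func()
    return time.perf_counter() - t0 - base


def compare_thread_counts(func, counts=(1, 2), repeats=10):
    """Time func under each requested MKL thread count.

    Returns {thread_count: seconds}, or {} when the mkl module is unavailable.
    """
    try:
        import mkl  # assumption: same module as used in the script above
    except ImportError:
        return {}
    timings = {}
    for n in counts:
        mkl.set_num_threads(n)
        timings[n] = time_call(func, repeats)
    return timings


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        numpy = None
    if numpy is not None:
        N = 2**16
        a = numpy.random.rand(2, N)
        print(compare_thread_counts(lambda: numpy.fft.fftn(a, (N,), (1,))))
```

If the two timings come out about the same, the FFT call is probably not being threaded at all, which would match the 50% CPU usage I see.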
1 Reply
Did you use the threaded MKL libraries when building NumPy/SciPy?
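One way to check this is to look at the library names NumPy reports in its build configuration. The sketch below is written for Python 3 and captures the output of numpy.show_config(), then searches it for MKL threading-layer library names. The helper names classify_mkl_libs and numpy_config_text are made up for illustration. A prebuilt distribution may not list its link libraries this way, so an empty result is not proof of a sequential build.

```python
import contextlib
import io

# MKL library names and what each suggests about the threading layer.
THREADING_HINTS = {
    "mkl_intel_thread": "threaded (Intel OpenMP)",
    "mkl_gnu_thread": "threaded (GNU OpenMP)",
    "mkl_sequential": "sequential (no threading)",
    "mkl_rt": "single dynamic library (layer chosen at run time)",
}


def classify_mkl_libs(config_text):
    """Return the MKL library names found in config_text with a short hint."""
    text = config_text.lower()
    return {name: hint for name, hint in THREADING_HINTS.items() if name in text}


def numpy_config_text():
    """Capture what numpy.show_config() prints to stdout."""
    import numpy
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        numpy.show_config()
    return buf.getvalue()


if __name__ == "__main__":
    try:
        found = classify_mkl_libs(numpy_config_text())
    except ImportError:
        found = {}
    for name, hint in sorted(found.items()):
        print(name, "->", hint)
```

If only mkl_sequential shows up, the FFT calls will stay on one core no matter what mkl.set_num_threads is given.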