Intel® oneAPI Math Kernel Library

Multithreading with MKL Performance Drop


Hi all,

I'm a first-time user of the MKL library, and I thought a good way to get the hang of it would be to replicate the results in this Intel blog post:

Obviously I'm not using the same CPU, so I'm not expecting identical results. However, I'm seeing negative scaling when multithreading.

I built Caffe2 with MKL BLAS and OpenMP enabled. I'm using the same benchmark mentioned in the blog post: (

From various reading, I gathered that it's often best to set OMP_NUM_THREADS to 1 and MKL_NUM_THREADS to no more than the number of physical cores. So I run the benchmark like so:

export MKL_NUM_THREADS="8"
export OMP_NUM_THREADS="1"
python --batch_size 8 --model AlexNet --iterations 10 --warmup_iterations 1 --cpu

I use mpstat to monitor core usage and have confirmed that the benchmark is in fact running on multiple cores, yet performance drops even when I run it on only 2 threads. It seems to me that there is a lot of overhead in using MKL_NUM_THREADS. Has anyone else run into similar issues? I've noticed the topic of overhead come up here and there on the forums, but it doesn't seem to be the same issue.
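To put numbers on the scaling, the same command can be timed at several thread counts. The following is only a minimal sketch: the helper names (thread_env, time_command, sweep) are made up for illustration, and the benchmark command itself has to be supplied by the caller. It measures wall-clock time for the whole process, so interpreter start-up and model setup are included in each figure.

```python
import os
import subprocess
import time


def thread_env(mkl_threads, omp_threads):
    """Return a copy of the current environment with both thread limits set."""
    env = dict(os.environ)
    env["MKL_NUM_THREADS"] = str(mkl_threads)
    env["OMP_NUM_THREADS"] = str(omp_threads)
    return env


def time_command(cmd, env):
    """Run cmd once under env and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


def sweep(cmd, mkl_counts=(1, 2, 4, 8), omp_threads=1):
    """Time cmd once per MKL thread count; returns {count: seconds}."""
    return {n: time_command(cmd, thread_env(n, omp_threads)) for n in mkl_counts}
```

Calling sweep() with the benchmark command as a list of arguments gives one timing per MKL_NUM_THREADS value, which makes it easy to see at which count the slowdown starts.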



Hi Tey,
If possible, could you please run export MKL_VERBOSE=1 before the two performance runs and copy the output here?

Second, what happens if you unset MKL_NUM_THREADS and just set OMP_NUM_THREADS to 2 or 8, as in the article? Please copy that result as well.
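If it helps, here is a rough sketch of how those two runs could be prepared from Python. The function names are hypothetical. The parser assumes the usual verbose format, where each logged call starts with MKL_VERBOSE and carries an NThr:<n> field; please check that against your actual output.

```python
import os
import re


def verbose_env(omp_threads):
    """Environment with MKL_VERBOSE=1, MKL_NUM_THREADS unset, OMP_NUM_THREADS set."""
    env = dict(os.environ)
    env.pop("MKL_NUM_THREADS", None)  # let OMP_NUM_THREADS alone control threading
    env["OMP_NUM_THREADS"] = str(omp_threads)
    env["MKL_VERBOSE"] = "1"
    return env


def threads_per_call(output):
    """Collect the NThr value from every MKL_VERBOSE line in captured output."""
    counts = []
    for line in output.splitlines():
        if line.startswith("MKL_VERBOSE"):
            match = re.search(r"NThr:(\d+)", line)  # assumed field name
            if match:
                counts.append(int(match.group(1)))
    return counts
```

Passing verbose_env(2) or verbose_env(8) as the env of the benchmark process, and feeding its captured output to threads_per_call(), would show how many threads each MKL call actually used.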

Best Regards,