Intel® oneAPI Math Kernel Library

Multithreading with MKL Performance Drop

tey__aaron
Beginner

Hi all,

I'm a first-time user of the MKL library, and I thought a good way to get the hang of it would be to replicate the results in this Intel blog post:

https://software.intel.com/en-us/blogs/2017/04/18/intel-and-facebook-collaborate-to-boost-caffe2-performance-on-intel-cpu-s

Obviously I'm not using the same CPU, so I'm not expecting identical results. However, I'm seeing negative scaling when multithreading.

I built Caffe2 with MKL BLAS and OpenMP enabled. I'm using the same benchmark mentioned in the blog post: convnet_benchmarks.py (https://github.com/pytorch/pytorch/blob/master/caffe2/python/convnet_benchmarks.py)

From various reading, I found that it's often best to set OMP_NUM_THREADS to 1 and MKL_NUM_THREADS to no more than the number of physical cores. So I run the benchmark like so:

export MKL_NUM_THREADS="8"
export OMP_NUM_THREADS="1"
python convnet_benchmarks.py --batch_size 8 --model AlexNet --iterations 10 --warmup_iterations 1 --cpu

I use mpstat to monitor core usage and confirm that it is in fact running on multiple cores, and yet performance drops, even when I run the benchmark on only 2 threads. It seems to me that there is a lot of overhead in using MKL_NUM_THREADS. Has anyone else run into similar issues? I've noticed the topic of overhead come up here and there on the forums, but it doesn't seem to be the same issue.

 

1 Reply
Ying_H_Intel
Employee

Hi Tey,
If possible, could you please run export MKL_VERBOSE=1 before running the two performance tests and copy the results here?

Second, could you unset MKL_NUM_THREADS and just try OMP_NUM_THREADS=2 or 8, as in the article, and copy the results?
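
One way to collect both sets of results in a single pass is a small shell helper like the sketch below. The function name run_scaling is hypothetical (it is not part of MKL or Caffe2); it only wraps the environment settings suggested above, clearing MKL_NUM_THREADS and setting MKL_VERBOSE=1 and OMP_NUM_THREADS for each run inside a subshell so your current shell settings are left alone.

```shell
# Hypothetical helper: run a command once per thread count with
# MKL_NUM_THREADS unset and MKL_VERBOSE=1 enabled.
# Usage: run_scaling "<space-separated thread counts>" <command> [args...]
run_scaling() {
    threads_list=$1
    shift
    for n in $threads_list; do
        echo "=== OMP_NUM_THREADS=$n ==="
        (
            # The subshell keeps these changes local to this one run.
            unset MKL_NUM_THREADS
            export MKL_VERBOSE=1
            export OMP_NUM_THREADS="$n"
            "$@"
        )
    done
}
```

For example, run_scaling "2 8" python convnet_benchmarks.py --batch_size 8 --model AlexNet --iterations 10 --warmup_iterations 1 --cpu > mkl_verbose.log 2>&1 would capture the verbose output of both runs in one file that can be pasted here.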


Best Regards,
Ying

 

 

 
