Intel® oneAPI Math Kernel Library

Optimization/Parallelization in MKL

Aaron_M_
Beginner
384 Views

I have a 16-core system with Hyper-Threading (32 hardware threads). The matrix calculations are on a 4096x4096 matrix. Using "dfeast_syev", it takes a few minutes, and using "cblas_dgemm" it takes a few seconds. When I monitor my computer, it shows that only 16 threads are being used in the calculation. I want to use all 32 to speed up the program. I am using OpenMP where I can, but I thought that before running my program I could just type at the command line:

"export OMP_NUM_THREADS=32" and that would do it. But it doesn't work. It still only uses 16. Then I tried:

"export MKL_NUM_THREADS=32", and that does nothing either. When I enter "echo $OMP_NUM_THREADS" or "echo $MKL_NUM_THREADS" it returns 32 in both cases, but when I monitor my CPU it still only shows 16 threads being used.

Any suggestions?

Aaron.

4 Replies
Bernard
Valued Contributor I

Probably there is contention between the two hardware threads running on each core. Because both threads share the core's SIMD vector execution units, one thread can stall while it waits for the other to finish using them.

Gennady_F_Intel
Moderator

That happens because MKL_DYNAMIC is true by default. In that case, MKL chooses the best number of threads itself. Hyper-Threading doesn't help for this sort of computation.
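
For anyone who wants to test this on their own machine, here is a minimal sketch of the environment settings involved, assuming a bash-like shell and an MKL build that uses OpenMP threading. With MKL_DYNAMIC set to FALSE, MKL tries to use the requested thread count instead of capping it itself. As noted above, running 32 threads on 16 physical cores is not expected to be faster for dgemm-style work and may well be slower, so treat this as an experiment rather than a fix.

```shell
# Assumption: bash-like shell; run these before launching the program.
# Turn off MKL's dynamic thread adjustment (the default is TRUE).
export MKL_DYNAMIC=FALSE
# Request 32 threads from MKL and from the OpenMP runtime.
export MKL_NUM_THREADS=32
export OMP_NUM_THREADS=32
# Note the "$" when reading the values back.
echo "MKL_DYNAMIC=$MKL_DYNAMIC MKL_NUM_THREADS=$MKL_NUM_THREADS OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Timing the same run with MKL_NUM_THREADS=16 and MKL_NUM_THREADS=32 is the simplest way to see whether the extra hardware threads help on a given system.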

Bernard
Valued Contributor I

Thanks for the explanation.

Aaron_M_
Beginner

Is there no way to speed up the computations? I have 8 dgemm calls inside two nested for-loops. Each loop runs from 1 to 4000, so there are roughly 16 million calculations. I was hoping that by parallelizing the BLAS functions further I could speed up the computations.

Thank you,

Aaron.
