What are the sizes of m, n, and p, and how do you set KMP_AFFINITY for the operation?
Could you please set MKL_VERBOSE=1 and KMP_AFFINITY=compact,
or export MKL_VERBOSE=1, run your.exe, and observe the result?
Also, please submit your question to our official support channel: Online Service Center - Intel Support.
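For reference, the request above might look like this in a bash-like shell. This is only a sketch: `your.exe` is a placeholder for your own binary, so the script runs it only if it exists in the current directory.

```shell
# Ask MKL to log every call it executes (function, arguments, time, thread count).
export MKL_VERBOSE=1

# Pack OpenMP threads onto neighbouring hardware threads/cores.
export KMP_AFFINITY=compact

# Placeholder binary name: replace with the program that calls cblas_dgemm.
if [ -x ./your.exe ]; then
    ./your.exe
else
    echo "your.exe not found; environment is set: MKL_VERBOSE=${MKL_VERBOSE}, KMP_AFFINITY=${KMP_AFFINITY}"
fi
```

Each MKL call should then print one MKL_VERBOSE line, which is the output worth posting here.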
Thank you very much Ying and Tim for your help.
In our case, m, n, and p are all set to 100. We ran a test with MKL_VERBOSE=1 and KMP_AFFINITY=compact, and here is the result we got:
For the parallel version:
MKL_VERBOSE Intel(R) MKL 2018.0 Update 1 Product build 20171007 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) for Intel(R) Many Integrated Core Architecture (Intel(R) MIC Architecture) enabled processors, Lnx 1.30GHz lp64 intel_thread NMICDev:0
We still have another question, and we would really appreciate your help:
How can we know which threads are executing a given cblas_dgemm call? If we knew that, we would be able to put those threads close to each other using proc_list with KMP_AFFINITY.
Thank you very much
Manually, you can put those threads close to each other using proc_list with KMP_AFFINITY, and you can also find out which threads execute a given cblas_dgemm call, but that leads into a lot of technical discussion. So, if you want to, you can bind cblas_dgemm's OpenMP threads to a proc_list through KMP_AFFINITY.
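As a rough sketch of that manual approach, assuming a bash-like shell and the Intel OpenMP runtime: the `verbose` modifier of KMP_AFFINITY makes the runtime print which OS processors each OpenMP thread is bound to, and the `proclist` modifier together with the `explicit` type pins the threads to the processors you list. The processor IDs 0-3 and the thread count of 4 are made-up values for illustration, and `your.exe` is again a placeholder.

```shell
# Hypothetical values: 4 OpenMP threads pinned to OS processors 0-3.
# Pick IDs that are actually adjacent on your machine.
PROCS="0,1,2,3"
export OMP_NUM_THREADS=4

# verbose  -> the runtime reports the thread-to-processor bindings at startup
# proclist -> the explicit list of OS processors to use
# explicit -> bind threads to exactly the processors in proclist
export KMP_AFFINITY="verbose,granularity=fine,proclist=[${PROCS}],explicit"

echo "KMP_AFFINITY=${KMP_AFFINITY}"

# Placeholder binary name: run it only if it is present.
if [ -x ./your.exe ]; then
    ./your.exe
fi
```

The verbose report covers all OpenMP threads in the process, not only those inside one cblas_dgemm call, so it tells you where the threads MKL uses can run rather than tracing a single call.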
MKL threading is based on OpenMP. You can control the threads as described in the MKL developer guide: https://software.intel.com/en-us/node/528550
or in the Intel compiler documentation: https://software.intel.com/en-us/cpp-compiler-18.0-developer-guide-and-reference-thread-affinity-int...
and in other discussions.
In principle, though, we don't recommend that.
As for performance, as you tested: if the same sgemm function is called from multiple threads, using MKL's internal multithreading may perform better than a thread affinity you design yourself.