Hi
What are the sizes of m, n and p, and how do you set KMP_AFFINITY for the operation?
Could you please set MKL_VERBOSE=1 and KMP_AFFINITY=compact,
or export MKL_VERBOSE=1, run your.exe, and observe the result?
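The two settings requested above can also be applied for a single run, without changing the shell environment. The following is only a minimal sketch: the helper names mkl_debug_env and run_with_mkl_verbose are hypothetical, and ./your.exe in the usage note is just the placeholder executable name from this post.

```python
import os
import subprocess


def mkl_debug_env(affinity="compact", base_env=None):
    """Return a copy of the environment with MKL_VERBOSE and KMP_AFFINITY set.

    base_env defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_VERBOSE"] = "1"        # ask MKL to log each call it executes
    env["KMP_AFFINITY"] = affinity  # OpenMP thread placement policy
    return env


def run_with_mkl_verbose(command, affinity="compact"):
    """Run `command` (a list of arguments) with the debug environment.

    Returns the CompletedProcess so the caller can inspect the captured
    stdout and stderr for the MKL_VERBOSE lines.
    """
    return subprocess.run(
        command,
        env=mkl_debug_env(affinity),
        capture_output=True,
        text=True,
        check=False,
    )
```

A call such as run_with_mkl_verbose(["./your.exe"]) would then return an object whose stdout and stderr attributes can be searched for lines starting with MKL_VERBOSE.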
Please also submit your question to our official support channel: Online Service Center - Intel Support.
Best regards,
Ying
Thank you very much Ying and Tim for your help.
In our case, m, n and p are all set to 100. We ran a test with MKL_VERBOSE=1 and KMP_AFFINITY=compact, and here is the result we got:
For the parallel version:
MKL_VERBOSE Intel(R) MKL 2018.0 Update 1 Product build 20171007 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) for Intel(R) Many Integrated Core Architecture (Intel(R) MIC Architecture) enabled processors, Lnx 1.30GHz lp64 intel_thread NMICDev:0
Thank you very much, Tim. It will help us a lot. Thanks again!
Hi again,
We still have another question, and we will really appreciate your help:
How can we know which threads are executing a given cblas_dgemm call? If we knew that, we could place those threads close to each other using proclist with KMP_AFFINITY.
Thank you very much
Hi Gheibi,
Yes, you can do this manually: bind the OpenMP threads used by cblas_dgemm to specific processors with the proclist modifier of KMP_AFFINITY, which places those threads close to each other, and find out which threads are executing a given cblas_dgemm call. Be aware that this raises all kinds of technical questions.
MKL threading is based on OpenMP. You can control the threads as described in the MKL developer guide: https://software.intel.com/en-us/node/528550
or in the Intel compiler documentation: https://software.intel.com/en-us/cpp-compiler-18.0-developer-guide-and-reference-thread-affinity-interface-linux-and-windows#LOW_LEVEL_AFFINITY_API
https://software.intel.com/en-us/node/528546#92D6DAD0-A858-4824-9A90-AC2AD2A9C2E1
and in other discussions:
https://software.intel.com/en-us/articles/using-threaded-intel-mkl-in-multi-thread-application
https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/283564
In principle, we don't recommend manual binding.
As for performance, as you tested: if the same sgemm function is called from multiple threads, MKL's internal multithreading may perform better than a thread affinity of your own design.
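If you still want to experiment with manual binding, the KMP_AFFINITY value can be generated instead of typed by hand. This is only a sketch under stated assumptions: explicit_affinity is a hypothetical helper, and the processor IDs in the usage note are placeholders that must be replaced with IDs valid on your machine. It uses the explicit type together with the proclist, granularity and verbose modifiers from the thread affinity documentation linked above; the verbose modifier asks the OpenMP runtime to print how its threads are bound to OS processors.

```python
def explicit_affinity(proc_ids, verbose=True):
    """Build a KMP_AFFINITY value that pins OpenMP threads to `proc_ids`.

    proc_ids: iterable of non-negative OS processor IDs, in binding order.
    verbose:  if True, prepend the `verbose` modifier so the OpenMP
              runtime reports the resulting thread bindings at startup.
    """
    ids = [int(p) for p in proc_ids]
    if not ids:
        raise ValueError("proc_ids must contain at least one processor ID")
    if any(p < 0 for p in ids):
        raise ValueError("processor IDs must be non-negative")

    parts = []
    if verbose:
        parts.append("verbose")
    parts.append("granularity=fine")  # bind each thread to a single hardware thread
    parts.append("proclist=[" + ",".join(str(p) for p in ids) + "]")
    parts.append("explicit")          # affinity type that consumes the proclist
    return ",".join(parts)
```

For example, explicit_affinity([0, 1, 2, 3]) yields a string that can be exported as KMP_AFFINITY before starting the program. Setting OMP_NUM_THREADS (or MKL_NUM_THREADS) to the length of the list keeps the thread count consistent with the number of processors listed.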
Best Regards,
Ying