Some time ago I opened a similar issue with Premier Support. In that case the difference was between MKL and some OS subroutines.
Now in the MKL documentation I can find:
___________________________________________
mkl_get_max_threads
Gets the number of OpenMP* threads targeted for parallelism
____________________________________________________
The following piece of code:
================================================
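PROGRAM THREAD_COUNTS
USE OMP_LIB
IMPLICIT NONE
INTEGER :: NMAXTH, NMAXTH0, NPROC, ITH, NTMKL
INTEGER, EXTERNAL :: MKL_GET_MAX_THREADS
! Upper bound on the OpenMP team size for the next parallel region
NMAXTH = OMP_GET_MAX_THREADS()
! Number of logical processors available to the program
NPROC = OMP_GET_NUM_PROCS()
! Index of the calling thread (0 outside a parallel region)
ITH = OMP_GET_THREAD_NUM()
!
! MX_THREADS = NMAXTH
NMAXTH0 = NMAXTH
! Number of threads MKL targets for its internal parallel regions
NTMKL = MKL_GET_MAX_THREADS()
PRINT *, 'NMAXTH=', NMAXTH, ' NPROC=', NPROC, ' NTMKL=', NTMKL
END PROGRAM THREAD_COUNTS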
======================================================
gives:
NMAXTH = 8, NPROC = 8, NTMKL = 4.
The workstation has an Intel Core i7-4810MQ CPU at 2.80 GHz.
Compiler:
Intel® Parallel Studio XE 2016 Update 4 Composer Edition for Fortran Windows* Integration for Microsoft Visual Studio* 2015, Version 16.0.0063.14, Copyright © 2002-2016 Intel Corporation. All rights reserved.
Intel should clarify why these values differ.
This would be better asked in the MKL forum.
mkl_get_max_threads returns the number of OpenMP threads for Intel MKL to use in internal parallel regions. This number depends on whether dynamic adjustment of the number of threads by Intel MKL is disabled (by an environment setting or in a function call):
- If the dynamic adjustment is disabled, the function inspects the environment settings and return values of the function calls below, in the order they are listed, until it finds a non-zero value:
  - A call to mkl_set_num_threads_local
  - The last of the calls to mkl_set_num_threads or mkl_domain_set_num_threads( …, MKL_DOMAIN_ALL)
  - The MKL_DOMAIN_NUM_THREADS environment variable with the MKL_DOMAIN_ALL tag
  - The MKL_NUM_THREADS environment variable
  - A call to omp_set_num_threads
  - The OMP_NUM_THREADS environment variable
- If the dynamic adjustment is enabled, the function returns the number of physical cores on your system.
The number of threads returned by this function is a hint, and Intel MKL may actually use a different number.
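The rules above can be sketched as a small Python model. This is only an illustration of the documented lookup order, not MKL's implementation: the function name `modeled_max_threads`, the `settings` dictionary, and its keys are hypothetical labels invented for this sketch. The documentation text quoted above does not say what is returned when dynamic adjustment is disabled and nothing is set, so the model returns `None` in that case.

```python
# Lookup order from the description above, highest priority first.
# These strings are hypothetical labels for this sketch, not MKL identifiers
# that can be passed to any real API.
LOOKUP_ORDER = [
    "mkl_set_num_threads_local",   # thread-local setting
    "mkl_set_num_threads",         # or mkl_domain_set_num_threads(..., MKL_DOMAIN_ALL)
    "MKL_DOMAIN_NUM_THREADS",      # environment variable, MKL_DOMAIN_ALL tag
    "MKL_NUM_THREADS",             # environment variable
    "omp_set_num_threads",         # OpenMP runtime call
    "OMP_NUM_THREADS",             # environment variable
]


def modeled_max_threads(settings, physical_cores, dynamic=True):
    """Model of the value described for mkl_get_max_threads.

    settings       -- dict mapping a label from LOOKUP_ORDER to a thread count
                      (missing or 0 means "not set")
    physical_cores -- number of physical cores on the machine
    dynamic        -- whether MKL's dynamic adjustment is enabled
    """
    if dynamic:
        # Dynamic adjustment enabled: the number of physical cores is returned.
        return physical_cores
    # Dynamic adjustment disabled: the first non-zero setting wins.
    for source in LOOKUP_ORDER:
        value = settings.get(source, 0)
        if value != 0:
            return value
    return None  # not specified in the text above


# The i7-4810MQ from the original post has 4 physical cores and 8 logical
# processors. With dynamic adjustment enabled, the model gives 4, which
# matches the reported NTMKL = 4 while OpenMP reports 8.
print(modeled_max_threads({}, physical_cores=4))

# With dynamic adjustment disabled, MKL_NUM_THREADS outranks OMP_NUM_THREADS.
print(modeled_max_threads({"MKL_NUM_THREADS": 8, "OMP_NUM_THREADS": 2},
                          physical_cores=4, dynamic=False))
```

Under this reading, the values in the original post are consistent if dynamic adjustment was left enabled: the OpenMP calls count logical processors, while MKL counts physical cores.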
Sergey Kostrov wrote:
>>...The number of threads returned by this function is a hint, and Intel MKL may actually use a different number.
By default, the number of threads used by MKL is equal to the number of cores.
Yes, although I am hoping this will change soon for Intel Xeon Phi x200 processors, where the best performance may often come from 2 threads per core.
What is the exact reason this cannot be changed to what everyone would obviously expect, namely using the full power of the CPU (all hardware threads) by default, regardless of the number of threads per core?
This is the most tiresome MKL problem.
MKL performance is usually best with 1 hardware thread per core. For many HPC kernels and applications it is best not to use all available hardware threads.
For memory-bound functions like vdMul, I double the performance by setting the number of threads manually.
For VM functions I don't think there is any cache reuse...
Is it not possible to differentiate between different parallelization regimes, one for BLAS functions and something else for the VM/VSL pack?
I turned off hyper-threading for now...
"Intel® Math Kernel Library (Intel® MKL) accelerates math processing routines that increase application performance and reduce development time."
Seriously, VM functions running at 50% capacity on an Intel processor with hyper-threading (Intel's invention)?
If there is one company that should be able to solve this, it is definitely Intel.
Mikhail Kovalev wrote:
For VM functions I don't think there is any cache reuse...
Is it not possible to differentiate between different parallelization regimes, one for BLAS functions and something else for the VM/VSL pack?
I turned off hyper-threading for now...
"Intel® Math Kernel Library (Intel® MKL) accelerates math processing routines that increase application performance and reduce development time."
Seriously, VM functions running at 50% capacity on an Intel processor with hyper-threading (Intel's invention)?
If there is one company that should be able to solve this, it is definitely Intel.
Intel never intended hyperthreading to give a major boost to floating point applications with normal cache locality. Intel tried a slightly different tack with the MIC KNC but the current KNL returns to the favoritism for 1 thread per core with MKL. There have also been CPUs without HT, but Intel didn't consider the benefit large enough to continue (if you consider that a solution).
If you read these forums carefully, you will see reports about how hyperthreading gives a small benefit in many cases even though just 1 thread per core is active at the MKL level.
You may have a point that specialized applications which use VML may have low cache locality, but certainly not all of them would be that way.
