Solved: HyperThreading and CPU usage

bo_y_ · ‎06-25-2017

Hi everyone,

I tried LAPACKE_dgels and did not change any thread-number settings. I guess the default thread number (the same as the number of physical cores) is used. When I watch the CPU usage while the code is running, it peaks at 50%. I guess that means using 50% of the CPU made the calculation run as fast as it could, and using more than 50% of the CPU through hyper-threading would only slow it down? Do I understand this correctly?

McCalpinJohn · ‎06-27-2017

There is a difficulty with the use of the term "CPU" in this context.

When HyperThreading is enabled, each "physical core" is configured to support (typically) two "logical processors". Either of the two "logical processors" is capable of using all of the resources of the "physical core", though most of the time processes don't use all the resources of the "physical core", and it is usually possible to get increased throughput by using both "logical processors". Sometimes performance goes down when using both "logical processors" on a physical core. This is usually due to increased cache misses, but sometimes due to more complex and/or subtle details of the implementation. Most routines in MKL provide better performance using one "logical processor" per "physical core", so MKL selects the number of threads to use and places them accordingly.

The problem is defining "CPU utilization" by the number of "logical processors" in use. Using one "logical processor" per "physical core" could be considered to be 100% "CPU utilization", but it is typically reported to be 50% "CPU utilization". This confuses a lot of people. There is no unambiguous "right answer" -- especially since you can reach "50% CPU utilization" either by using one "logical processor" on each "physical core" or by using two "logical processors" on one half of the "physical cores". The latter is not usually what you want to do, but with a single metric for "CPU utilization" it is not possible to tell the difference.
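To make the ambiguity concrete, here is a minimal Python sketch. It is not from the thread: the 4-core, 2-way HyperThreading machine and the function name `reported_utilization` are made up for illustration, and it assumes utilization is reported as busy "logical processors" divided by total "logical processors".

```python
def reported_utilization(busy, topology):
    """Percent of logical processors that are busy.

    topology maps a physical core id to its list of logical processor ids.
    busy is the set of logical processor ids currently running a thread.
    """
    total = sum(len(lps) for lps in topology.values())
    return 100.0 * len(busy) / total


# Hypothetical machine: 4 physical cores, 2 logical processors each.
topology = {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}

spread = {0, 2, 4, 6}  # one logical processor on every physical core
packed = {0, 1, 2, 3}  # both logical processors on half of the cores

print(reported_utilization(spread, topology))  # 50.0
print(reported_utilization(packed, topology))  # 50.0
```

Both placements print the same 50%, even though the first keeps every physical core busy and the second leaves half of them idle.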

One possible alternative would be to provide two numbers -- one that shows how many "physical cores" have active processes and a second that tells the average number of active "logical processors" on each of the active "physical cores". For small core counts, graphical displays are convenient, but these become a bit unwieldy when dealing with (for example) the 68 physical cores and 4 logical processors per core on the Xeon Phi 7250.
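The two-number report described above could be computed roughly as follows. This is only a sketch under assumptions: the function name `core_occupancy` and the 4-core, 2-way topology are hypothetical, and a real tool would have to read the core-to-logical-processor mapping from the operating system.

```python
def core_occupancy(busy, topology):
    """Return (active physical cores, average busy logical processors per active core).

    topology maps a physical core id to its list of logical processor ids.
    busy is the set of logical processor ids currently running a thread.
    """
    per_core = {
        core: sum(1 for lp in lps if lp in busy)
        for core, lps in topology.items()
    }
    # Keep only the physical cores that have at least one busy logical processor.
    active = [count for count in per_core.values() if count > 0]
    if not active:
        return 0, 0.0
    return len(active), sum(active) / len(active)


# Hypothetical machine: 4 physical cores, 2 logical processors each.
topology = {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}

print(core_occupancy({0, 2, 4, 6}, topology))  # (4, 1.0)
print(core_occupancy({0, 1, 2, 3}, topology))  # (2, 2.0)
```

The two placements that a single percentage cannot tell apart now give different pairs: four active cores with one thread each, versus two active cores with two threads each.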

Zhen_Z_Intel · ‎06-27-2017

Hi,

If you do not set the number of MKL threads, then for functions that support multi-threading MKL chooses the number of threads that gives the best performance (flops); this is not directly related to CPU utilization. If you would like to increase CPU utilization, you could try setting the maximum number of threads through OpenMP. However, that does not mean the performance will be better. We do not recommend using hyper-threading, because the performance/utilization of MKL would be worse.

Best regards,
Fiona

Ying_H_Intel · ‎06-28-2017

Hi Bo,

You have Hyper-Threading switched on, right?

The discussion "Intel MKL threading behavior on Hyper-Threading systems" may help you understand how MKL behaves:

https://software.intel.com/en-us/mkl-windows-developer-guide-improving-performance-with-threading

https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/294954.

The key points:

By default, Intel MKL uses a number of OpenMP threads equal to the number of physical cores on the system.

Hyper-Threading Technology (HT Technology) is especially effective when each thread is performing different types of operations and when there are under-utilized resources on the processor. Intel MKL fits neither of these criteria as the threaded portions of the library execute at high efficiencies using most of the available resources and perform identical operations on each thread. You may obtain higher performance when using Intel MKL without HT Technology enabled.
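As a rough illustration of the first key point, the sketch below derives the physical core count from the logical processor count and builds an environment that caps MKL at that many threads. `MKL_NUM_THREADS` is MKL's environment variable for the thread limit; the helper names `physical_cores` and `mkl_thread_env` and the `./solver` program are hypothetical, and the sketch assumes every core has the same number of logical processors.

```python
import os


def physical_cores(logical_processors, threads_per_core):
    """Physical core count, assuming every core has the same number of logical processors."""
    if threads_per_core < 1 or logical_processors % threads_per_core != 0:
        raise ValueError("logical_processors must be a multiple of threads_per_core")
    return logical_processors // threads_per_core


def mkl_thread_env(num_threads, base_env=None):
    """Return a copy of the environment with MKL_NUM_THREADS set to num_threads."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_NUM_THREADS"] = str(num_threads)
    return env


# Example: 8 logical processors with 2-way HyperThreading -> 4 threads.
env = mkl_thread_env(physical_cores(8, 2), base_env={})
print(env["MKL_NUM_THREADS"])  # 4

# A hypothetical MKL-linked program could then be launched with, for example:
#   subprocess.run(["./solver"], env=mkl_thread_env(physical_cores(8, 2)))
```

Setting the variable explicitly only matters when you want something other than the default; as noted above, MKL already picks one thread per physical core on its own.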

Best Regards,

Ying

bo_y_ · ‎07-01-2017

Thank you all! Now I understand Intel's physical/logical cores and the CPU "usage" much better.

HyperThreading and CPU usage