Intel® oneAPI Math Kernel Library

Question about LINPACK benchmark

NKnmt
Beginner

Hello,


Our customer is using the LINPACK benchmark below to measure GFLOPS.

https://www.intel.com/content/www/us/en/docs/onemkl/developer-guide-windows/2024-2/intel-distribution-for-linpack-benchmark.html


They're using a Core i7-1187GRE, which has 4 cores and 8 threads.

However, they could not get all 8 threads to run during the LINPACK benchmark measurement.

They tried thread settings from 1 to 8 for the LINPACK runs; the results are attached.


Is there any way to get all threads to work when LINPACK is running?


Regards,

Nobuharu

1 Solution
Mahan
Moderator

Hi Nobuharu,


I understand if the customer needs more time to do the tests, but as I pointed out, it is better not to use the logical threads but rather to set OMP_NUM_THREADS to the number of physical cores.

I am closing the current ticket; please feel free to open a new one if you face any more issues.
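As a rough illustration of that recommendation, the Python sketch below derives a thread count from the logical-processor count, assuming every core exposes the same number of hardware threads (2 with Hyper-Threading enabled, as on the 4-core/8-thread part in this thread). The helper names are hypothetical. `os.cpu_count()` reports logical processors, not physical cores, so the threads-per-core value has to be supplied by you, and the assumption does not hold on hybrid designs that mix cores with and without Hyper-Threading.

```python
import os

def physical_core_threads(logical_cpus=None, threads_per_core=2):
    """Hypothetical helper: one thread per physical core.

    Assumes every core exposes the same number of hardware threads
    (2 with Hyper-Threading enabled). os.cpu_count() reports logical
    processors, so it is divided by threads_per_core.
    """
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, logical_cpus // threads_per_core)

def env_with_omp_threads(base, logical_cpus=None, threads_per_core=2):
    """Copy of `base` with OMP_NUM_THREADS set to the physical-core count."""
    env = dict(base)
    env["OMP_NUM_THREADS"] = str(physical_core_threads(logical_cpus, threads_per_core))
    return env

if __name__ == "__main__":
    print("OMP_NUM_THREADS =", physical_core_threads())
```

For the 4-core/8-thread Core i7-1187GRE, `physical_core_threads(8)` returns 4.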


Gennady_F_Intel
Moderator

Nobuharu,

By default, this benchmark uses all available physical cores. You can control the number of threads by explicitly setting the number of OpenMP threads, as follows:

MKL_NUM_THREADS=4 ./xlinpack_xeon64 lininput_xeon64

You can check how many OpenMP threads LINPACK uses by enabling MKL_VERBOSE mode:

MKL_VERBOSE=1 MKL_NUM_THREADS=4 ./xlinpack_xeon64 lininput_xeon64

--Gennady
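The same two settings can also be applied from a launcher script. The Python sketch below passes them to the child process through the `env` argument of `subprocess.run`, which behaves the same way on Linux and Windows. The helper names `linpack_env` and `run_linpack` are hypothetical; the binary and input-file names are the ones from the commands above, and the executable name may differ in the Windows package.

```python
import os
import subprocess

def linpack_env(num_threads, verbose=True, base=None):
    """Return a copy of the environment with oneMKL threading variables set.

    MKL_NUM_THREADS sets how many threads oneMKL may use;
    MKL_VERBOSE=1 enables oneMKL's verbose mode.
    """
    env = dict(os.environ if base is None else base)
    env["MKL_NUM_THREADS"] = str(num_threads)
    if verbose:
        env["MKL_VERBOSE"] = "1"
    else:
        env.pop("MKL_VERBOSE", None)
    return env

def run_linpack(num_threads, binary="./xlinpack_xeon64", input_file="lininput_xeon64"):
    # Equivalent to: MKL_VERBOSE=1 MKL_NUM_THREADS=<n> ./xlinpack_xeon64 lininput_xeon64
    return subprocess.run([binary, input_file], env=linpack_env(num_threads), check=True)
```

Usage: `run_linpack(4)` launches the benchmark with four oneMKL threads and verbose mode on. Copying the parent environment first keeps `PATH` and the other variables the benchmark may rely on.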

NKnmt
Beginner

Gennady,

 

Thank you for your feedback.

 

My customer tried "MKL_VERBOSE=1 MKL_NUM_THREADS=4 ./xlinpack_xeon64 lininput_xeon64".
However, the CPU load displayed in Task Manager was unchanged: only 5 threads were busy.
The LINPACK report recorded "4" threads.

 

Originally, the customer used the "OMP_NUM_THREADS=8" setting.
Even with 8 threads explicitly set, Task Manager showed CPU load on only 5 threads.
However, the LINPACK report recorded the number of threads as "8".

 

Regards,

Nobuharu

Mahan
Moderator

Hello Nobuharu,


Please check the following:

  1. Check if Hyper-Threading is enabled in BIOS: The Core i7-1187GRE has 4 physical cores but supports 8 threads with Hyper-Threading technology. Ensure that Hyper-Threading is enabled in your system's BIOS settings.
  2. Update to the latest version of the benchmark: If you're using an older version of the Intel Optimized LINPACK Benchmark, consider updating to the latest version. Newer versions may have better support for utilizing all available threads.

Finally, you can set the environment variables MKL_VERBOSE=1 and MKL_NUM_THREADS=8 before running the benchmark.


NKnmt
Beginner

Hello Mahan,


Thank you for your feedback.
Here are the answers to your checklist:

1. Hyper-Threading is enabled in the BIOS.

2. Our customer tried the latest LINPACK, but the issue has not been solved.
They downloaded the latest LINPACK from here:
https://www.intel.com/content/www/us/en/developer/articles/technical/onemkl-benchmarks-suite.html

Finally, you can set the envs MKL_VERBOSE=1 MKL_NUM_THREADS=8 before running the benchmark
===>
They have set up the following in the runme_xeon64.bat file. Is this correct?
...
set KMP_AFFINITY=nowarnings,compact,1,0,granularity=fine
set MKL_VERBOSE=1 MKL_NUM_THREADS=8 . /xlinpack_xeon64 lininput_xeon64

 

BTW, when they run the above .bat file with the files exactly as unzipped from the package, the command prompt closes immediately (an application error occurs).
If they copy "libiomp5md.dll", which was used when LINPACK ran on another platform in the past, into the same folder as the .bat file, the command prompt stays open and LINPACK runs.
According to the "Distributing Your Custom Dynamic-link Library" section of the "Developer Guide for Intel® oneAPI Math Kernel Library for Windows":
To enable use of your custom DLL in a threaded mode, distribute libiomp5md.dll along with the custom DLL.

Do they need to customize "libiomp5md.dll" to eliminate the thread limitation?

 

Regards,

Nobuharu

Mahan
Moderator

Hi Nobuharu,


Thanks for the reply.


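To answer your first question about environment variables, you can set them in two ways:

  1. Using the "set" command before running the application, e.g. set OMP_NUM_THREADS=8 (with no spaces around "=", because cmd.exe treats such spaces as part of the variable name and value).
  2. Setting them on the same line as the command, e.g. OMP_NUM_THREADS=8 ./application. This inline form works in Linux shells such as bash, but not in the Windows command prompt.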
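To put such variables into a Windows .bat file like the runme_xeon64.bat quoted earlier in this thread, each variable needs its own `set NAME=value` line, followed by the command on a separate line. The Python sketch below writes such a batch body; the helper `runme_bat` and the executable name in the example are hypothetical, so use whatever command your package's runme_xeon64.bat actually invokes.

```python
def runme_bat(env_vars, command):
    """Hypothetical helper: return the text of a .bat file that sets each
    variable with its own `set NAME=value` line and then runs `command`.

    cmd.exe keeps spaces around `=` as part of the name or value, so names
    containing whitespace are rejected and values are stripped.
    """
    lines = ["@echo off"]
    for name, value in env_vars.items():
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid variable name: {name!r}")
        lines.append(f"set {name}={str(value).strip()}")
    lines.append(command)
    # Batch files conventionally use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"

if __name__ == "__main__":
    print(runme_bat(
        {
            "KMP_AFFINITY": "nowarnings,compact,1,0,granularity=fine",
            "MKL_VERBOSE": 1,
            "MKL_NUM_THREADS": 8,
        },
        "xlinpack_xeon64.exe lininput_xeon64",  # executable name is an assumption
    ))
```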


To answer your second question

The "libiomp5md.dll" file is the Intel OpenMP runtime library, a pre-built library provided by Intel that enables threaded execution when distributed alongside your custom DLL.

In your case, when you store the "libiomp5md.dll" file in the same folder as the batch file, it allows Linpack to run properly, likely by enabling threaded execution. The application error you encountered earlier was possibly due to the absence of this library file, which is required for threaded execution.

So, you do not need to customize the "libiomp5md.dll" file itself. You just need to ensure that it is present in the same directory as your Linpack executable and other required files for proper threaded execution.
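A small pre-flight check along those lines could report which required runtime DLLs are missing from the benchmark folder before launching. In this Python sketch the helper names are hypothetical, and `libiomp5md.dll` is the only DLL named in this thread; add any others your package needs. Names are compared case-insensitively because Windows file names are case-insensitive.

```python
import os

REQUIRED_DLLS = ("libiomp5md.dll",)  # Intel OpenMP runtime named in this thread

def missing_dlls(present_names, required=REQUIRED_DLLS):
    """Return the required DLL names not found in present_names (case-insensitive)."""
    present = {name.lower() for name in present_names}
    return [dll for dll in required if dll.lower() not in present]

def check_benchmark_folder(folder):
    """List `folder` and raise FileNotFoundError if any required DLL is missing."""
    missing = missing_dlls(os.listdir(folder))
    if missing:
        raise FileNotFoundError(f"missing next to the benchmark: {', '.join(missing)}")
    return True
```

Usage: call `check_benchmark_folder(".")` from the benchmark directory before starting the run.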


Mahan
Moderator

Hi Nobuharu,


Please let me know if there is any update.


NKnmt
Beginner

Hi  Mahan,

 

Our customer used set OMP_NUM_THREADS=8, but even then at most 5 threads reached 100% (the others were near 0%).

They also tried running the tool from the latest package (w_onemklbench_p_2024.2.0_525.zip), but at most 4 threads ran at 100%.
The remaining 4 threads fluctuated (occasionally spiking to 100%), with an average load of about 75%.
When they ran the test with 4 threads (set OMP_NUM_THREADS=4), the load pattern remained almost the same.

Please refer to the attached files below:
TDP15_8threads_linpack.png
TDP15_4threads_linpack.png

 

Regards,

Nobuharu

Mahan
Moderator

Hello Nobuharu,


Thanks for your reply.


I have also performed a similar test at my end, and the issue appears there too. I used 12 threads on a 16-thread machine to run the benchmark, but only 8 of them were at 100% use; the remaining 4 were mostly at 50-75%.

I think it might be an issue with how the Windows OS allots threads to an application; this will require more investigation.

Please allow me some time to get back to you.


Thank you for your patience.


Mahan
Moderator

Hi Nobuharu,


On further investigation at our end, we found that the environment variable MKL_DYNAMIC is set to "TRUE" by default. According to the developer reference (https://www.intel.com/content/www/us/en/docs/onemkl/developer-guide-windows/2024-1/mkl-dynamic.html):

“If the requested number of threads exceeds the number of physical cores (perhaps because of using the Intel® Hyper-Threading Technology), Intel® oneAPI Math Kernel Library (oneMKL) scales down the number of OpenMP threads to the number of physical cores.”

So, you could try setting MKL_DYNAMIC to "false" to utilize all the available logical threads, but in our experience using more threads than the available physical cores does not necessarily improve oneMKL's performance.
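The quoted rule can be written as a tiny model. This Python sketch (the function name is hypothetical) captures only the physical-core cap described in the developer reference; the real library has other threading heuristics that the sketch ignores, so treat it as an explanation of the observed behavior rather than a prediction.

```python
def effective_mkl_threads(requested, physical_cores, mkl_dynamic=True):
    """Simplified model of the quoted MKL_DYNAMIC rule.

    With MKL_DYNAMIC enabled (the default), a request larger than the
    number of physical cores is scaled down to the physical-core count.
    With MKL_DYNAMIC disabled, the requested count is kept.
    """
    if requested < 1 or physical_cores < 1:
        raise ValueError("thread and core counts must be positive")
    if mkl_dynamic and requested > physical_cores:
        return physical_cores
    return requested
```

Under this model, a request for 8 threads on the 4-core part discussed here is reduced to 4 unless MKL_DYNAMIC is set to false.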


NKnmt
Beginner

Hi Mahan,

 

Thank you for your feedback.
Our customer will try setting MKL_DYNAMIC to "false" next week.
Could you wait for their update?

 

Regarding your comment below, we have some questions.
What factors limit GFLOPS performance in this case?
What should we do to achieve maximum GFLOPS? Would measures such as increasing DRAM capacity and enabling Turbo Boost be effective in improving performance?
>>>>So, you could try setting MKL_DYNAMIC to "false" to utilize all the available logical threads, but in our experience using more threads than the available physical cores does not necessarily improve oneMKL's performance.

 

Regards,

Nobuharu

Mahan
Moderator

Hello Nobuharu,


Thanks for the update. I can wait a week for the customer's results.


To answer your GFLOPS question,

  • Increasing DRAM capacity: While increasing DRAM capacity can improve overall system performance by allowing larger datasets to be processed, it may not directly translate to higher GFLOPS performance. GFLOPS performance is more dependent on the computational capabilities of the CPU, GPU, or accelerator, as well as the memory bandwidth and latency.
  • Enabling Turbo Boost: Turbo Boost is a feature in Intel CPUs that allows for dynamic overclocking of CPU cores when thermal and power conditions allow. Enabling Turbo Boost can potentially improve GFLOPS performance for workloads that can effectively utilize the additional CPU frequency boost. However, the impact will depend on the specific workload characteristics and the CPU's thermal and power constraints.


In my experience, at least with MKL functions, setting OMP_NUM_THREADS=<number of physical cores> gives better performance and scalability than populating all the available logical threads. This is because the hardware threads of a physical core share that core's execution resources.
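A back-of-the-envelope way to see which factors matter is the usual theoretical-peak formula: peak GFLOPS = physical cores × sustained clock (GHz) × floating-point operations per cycle per core. Logical (Hyper-Threading) threads do not appear in it because they share the core's floating-point units, Turbo Boost acts only through the clock term (the sustained all-core clock under load, which power and thermal limits may keep below the maximum Turbo frequency), and DRAM capacity does not appear at all. The numbers in this Python sketch are hypothetical placeholders, not specifications of the Core i7-1187GRE; for example, 16 double-precision FLOPs per cycle corresponds to two 256-bit FMA units (2 units × 4 doubles × 2 operations).

```python
def peak_gflops(physical_cores, clock_ghz, flops_per_cycle):
    """Theoretical peak: physical cores x clock (GHz) x FLOPs per cycle per core."""
    return physical_cores * clock_ghz * flops_per_cycle

def efficiency(measured_gflops, peak):
    """Fraction of theoretical peak reached by a measured LINPACK result."""
    return measured_gflops / peak

if __name__ == "__main__":
    # Hypothetical values for illustration only.
    peak = peak_gflops(physical_cores=4, clock_ghz=3.0, flops_per_cycle=16)
    print(f"peak = {peak:.1f} GFLOPS")                    # prints: peak = 192.0 GFLOPS
    print(f"efficiency = {efficiency(150.0, peak):.1%}")  # prints: efficiency = 78.1%
```

Comparing a measured LINPACK result against this peak shows how much headroom remains; raising the thread count past the physical-core count does not raise the peak itself.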



Mahan
Moderator

Please let me know if there is any update.


Mahan
Moderator

Please let me know if there is any update from the customer.

