Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Performance issue of dgemm on Gold 6230R CPU

lianchen
Beginner

Hi, I have some questions about DGEMM performance on an Intel(R) Xeon(R) Gold 6230R CPU. On my machine, DGEMM behaves oddly: when the number of threads is large, the performance curve rises and then falls, which I find very difficult to explain. Details are below. I would really appreciate your help, thanks.


My Machine
CPU(s): 104
On-line CPU(s) list: 0-103
Thread(s) per core: 2
Core(s) per socket: 26
Socket(s): 2
NUMA node(s): 2
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-25,52-77
NUMA node1 CPU(s): 26-51,78-103

Core topology: two sockets, 26 cores per socket, 52 cores total
SMT status: enabled, but not utilized
Max clock rate: 2.0 GHz (single-core and multicore)
Peak performance:
--single-core: 64 GFLOPS (double-precision)
--multicore: 64 GFLOPS/core (double-precision)
I have fixed the CPU frequency at 2.0 GHz with the following commands:
sudo cpupower -c all frequency-set -u 2.0GHz
sudo cpupower -c all frequency-set -d 2.0GHz
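
The 64 GFLOPS/core figure above is the clock rate multiplied by the double-precision FLOPs a core can issue per cycle. A minimal Python sketch of that arithmetic follows. It assumes two AVX-512 FMA units per core (8 doubles per vector, 2 FLOPs per FMA, so 32 FLOPs/cycle); that unit count and all function and constant names are illustrative assumptions, not taken from the post.

```python
# Theoretical double-precision peak.
# ASSUMPTION: 2 AVX-512 FMA units per core; names here are illustrative only.
DOUBLES_PER_VECTOR = 8   # 512-bit register / 64-bit double
FLOPS_PER_FMA = 2        # one multiply plus one add
FMA_UNITS_PER_CORE = 2   # assumed for this CPU model


def peak_gflops(cores, ghz=2.0):
    """Peak GFLOPS for `cores` cores running at a fixed `ghz` clock."""
    flops_per_cycle = DOUBLES_PER_VECTOR * FLOPS_PER_FMA * FMA_UNITS_PER_CORE
    return cores * ghz * flops_per_cycle


def efficiency(measured_gflops, cores, ghz=2.0):
    """Fraction of theoretical peak reached by a measured DGEMM rate."""
    return measured_gflops / peak_gflops(cores, ghz)


# Thread counts used in the runs described in this post.
for cores in (1, 8, 13, 26, 52):
    print(f"{cores:2d} cores: {peak_gflops(cores):7.1f} GFLOPS peak")
```

Dividing a measured rate by this peak gives a per-run efficiency that is comparable across thread counts. Note that the sustained clock under heavy AVX-512 load may be lower than the nominal 2.0 GHz setting, in which case the achievable peak would be lower than this estimate.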

The dgemm performance on my machine
Multithreaded (8-core) execution

export GOMP_CPU_AFFINITY="0-7:1" MKL_NUM_THREADS=8

Multithreaded (13-core) execution

export GOMP_CPU_AFFINITY="0-12:1" MKL_NUM_THREADS=13

Multithreaded (26-core) execution

export GOMP_CPU_AFFINITY="0-25:1" MKL_NUM_THREADS=26

Multithreaded (52-core) execution

export GOMP_CPU_AFFINITY="0-51:1" MKL_NUM_THREADS=52
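
According to the lscpu listing above, logical CPUs 0-25 belong to NUMA node 0 and 26-51 to NUMA node 1, so the 8-, 13-, and 26-thread runs stay on one socket while the 52-thread run spans both. The Python sketch below makes that mapping explicit. The helper names are hypothetical, and the node ranges are copied from the listing (first hardware thread of each core only).

```python
# NUMA layout copied from the lscpu listing (SMT siblings 52-103 are left out,
# since the runs above do not use them). Helper names are hypothetical.
NUMA_NODES = {0: range(0, 26), 1: range(26, 52)}


def affinity_env(threads):
    """Environment settings matching the runs above for a given thread count."""
    return {
        "GOMP_CPU_AFFINITY": f"0-{threads - 1}:1",
        "MKL_NUM_THREADS": str(threads),
    }


def threads_per_node(threads):
    """Count how many of CPUs 0..threads-1 fall on each NUMA node."""
    cpus = range(threads)
    return {
        node: sum(1 for cpu in cpus if cpu in ids)
        for node, ids in NUMA_NODES.items()
    }


for n in (8, 13, 26, 52):
    print(n, affinity_env(n)["GOMP_CPU_AFFINITY"], threads_per_node(n))
```

Only the 52-thread configuration places threads on both nodes, so memory placement across the two sockets is one factor that may be worth checking when comparing it with the single-socket runs.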

Attached images: core_1.jpg, core_8.jpg, core_13.jpg, core_26.jpg, core_52.jpg

3 Replies
VidyalathaB_Intel
Moderator

Hi,

 

Thanks for reaching out to us.

>>When the number of threads is large, the performance curve will rise and then fall

Could you please tell us which MKL version you are using?

By default, MKL uses all available physical cores when it runs in parallel mode.

https://www.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/managing-performance-and-memory/improving-performance-with-threading.html

For Intel compilers, the option is -qmkl=parallel.

Here are some more details on managing multi-core performance:

https://www.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/managing-performance-and-memory/improving-performance-with-threading/managing-multi-core-performance.html

You can also use the compile and link options that the Link Line Advisor recommends for your environment.

Additionally, could you please share a sample reproducer, along with the commands you use to compile and run it, so that we can test it on our end as well?
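
When sharing numbers from a reproducer, it helps to state how the GFLOPS values were derived. A minimal Python sketch follows, assuming the conventional 2*m*n*k FLOP count for a GEMM and a best-of-several timing. `dgemm_gflops` and `best_time` are illustrative names, and the timed callable is only a placeholder for the actual dgemm call.

```python
import time


def dgemm_gflops(m, n, k, seconds):
    """GFLOPS for C = alpha*A*B + beta*C, counting 2*m*n*k floating-point ops."""
    return 2.0 * m * n * k / seconds / 1e9


def best_time(fn, repeats=5):
    """Smallest wall-clock time in seconds over several runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Placeholder workload (NOT a dgemm); replace with the real call being measured.
elapsed = best_time(lambda: sum(range(100_000)))
print(f"placeholder run took {elapsed:.6f} s")
```

Reporting the matrix dimensions, the thread count, and the best time alongside the derived GFLOPS makes the results easier to reproduce.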

 

Regards,

Vidya.

 

VidyalathaB_Intel
Moderator

Hi,


Reminder:

Could you please give us an update on your issue? If it still persists, please provide the details requested above.


Regards,

Vidya.


VidyalathaB_Intel
Moderator

As we haven't heard back from you, we are closing this thread. Please post a new question if you need any additional assistance from Intel, as this thread will no longer be monitored.


Regards,

Vidya.

