topic Re: Performance issue of dgemm on Gold 6230R CPU in Intel® oneAPI Math Kernel Library

Performance issue of dgemm on Gold 6230R CPU

lianchen — Sat, 11 Jun 2022 02:05:44 GMT

Hi, I have some questions about the performance of dgemm on an Intel(R) Xeon(R) Gold 6230R CPU. On my machine, DGEMM behaves unexpectedly: when the number of threads is large, the performance curve rises and then falls, which I cannot explain. Details are below. I would really appreciate your help, thanks.

My Machine
CPU(s): 104
On-line CPU(s) list: 0-103
Thread(s) per core: 2
Core(s) per socket: 26
Socket(s): 2
NUMA node(s): 2
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-25,52-77
NUMA node1 CPU(s): 26-51,78-103

Core topology: two sockets, 26 cores per socket, 52 cores total
SMT status: enabled, but not utilized
Max clock rate: 2.0GHz (single-core and multicore)
Peak performance:
--single-core: 64 GFLOPS (double-precision)
--multicore: 64 GFLOPS/core (double-precision)
I have fixed the CPU frequency at 2.0GHz with the following commands:
sudo cpupower -c all frequency-set -u 2.0GHz
sudo cpupower -c all frequency-set -d 2.0GHz
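For reference, the 64 GFLOPS per-core figure is consistent with 2.0 GHz x 2 FMA units x 8 doubles per 512-bit vector x 2 flops per FMA. A minimal sketch of that arithmetic follows; the two AVX-512 FMA units per core are an assumption that the post does not state, and the function name peak_gflops is purely illustrative.

```shell
# Theoretical double-precision peak in GFLOPS, using integer arithmetic.
# Assumptions (not stated in the post): 2 AVX-512 FMA units per core,
# 8 doubles per 512-bit vector, 2 flops per fused multiply-add.
peak_gflops() {
    freq_mhz=$1   # fixed core clock in MHz
    cores=$2      # number of physical cores in use
    echo $(( freq_mhz * 2 * 8 * 2 * cores / 1000 ))
}

peak_gflops 2000 1    # one core at 2.0 GHz
peak_gflops 2000 52   # all 52 physical cores at 2.0 GHz
```

Measured dgemm GFLOPS can then be compared against these numbers to express each run as a fraction of peak.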

The dgemm performance on my machine
Multithreaded (8 core) execution

export GOMP_CPU_AFFINITY="0-7:1" MKL_NUM_THREADS=8

Multithreaded (13 core) execution

export GOMP_CPU_AFFINITY="0-12:1" MKL_NUM_THREADS=13

Multithreaded (26 core) execution

export GOMP_CPU_AFFINITY="0-25:1" MKL_NUM_THREADS=26

Multithreaded (52 core) execution

export GOMP_CPU_AFFINITY="0-51:1" MKL_NUM_THREADS=52
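One thing worth noting about these pinnings: according to the lscpu output above, CPUs 0-25 belong to NUMA node0 and CPUs 26-51 to NUMA node1, so only the 52-thread run crosses the socket boundary. The sketch below just tabulates that split; numa_split is a hypothetical helper name, and the 26-cores-per-node layout is taken from the machine description in this post.

```shell
# Count how many threads of a "0-(n-1)" pinning land on each NUMA node.
# Layout assumed from the lscpu output in this post:
# node0 = CPUs 0-25, node1 = CPUs 26-51 (physical cores only).
numa_split() {
    n=$1
    if [ "$n" -le 26 ]; then node0=$n; else node0=26; fi
    node1=$(( n - node0 ))
    echo "node0=$node0 node1=$node1"
}

for n in 8 13 26 52; do
    echo "threads=$n $(numa_split "$n")"
done
```

On the real machine, numactl --hardware reports the actual node-to-CPU mapping and can be used to confirm this layout.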

Re: Performance issue of dgemm on Gold 6230R CPU

VidyalathaB_Intel — Mon, 13 Jun 2022 11:29:52 GMT

Hi,

Thanks for reaching out to us.

>>When the number of threads is large, the performance curve will rise and then fall

Could you please provide us with the MKL version being used in this case?

Also, by default MKL uses all the available physical cores when it runs in parallel (threaded) mode.

https://www.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/managing-performance-and-memory/improving-performance-with-threading.html

For Intel compilers the option is -qmkl=parallel

Here are some more details about managing multi-core performance:

https://www.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/managing-performance-and-memory/improving-performance-with-threading/managing-multi-core-performance.html
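As a rough illustration of the kind of settings those pages discuss, a threading environment for a 52-thread run might look like the sketch below. The values are assumptions chosen for the machine described in this thread, not a verified fix, and KMP_AFFINITY only takes effect when MKL is linked against the Intel OpenMP runtime.

```shell
# Illustrative threading environment for a 52-thread dgemm run.
# The values are assumptions for the machine described in this thread.
export MKL_NUM_THREADS=52      # one thread per physical core
export MKL_DYNAMIC=FALSE       # do not let MKL lower the thread count itself
# Place threads on distinct physical cores when Hyper-Threading is enabled
# (honored by the Intel OpenMP runtime only).
export KMP_AFFINITY=granularity=fine,compact,1,0
# Make MKL log each call; the log also reports the MKL version in use.
export MKL_VERBOSE=1
```

With MKL_VERBOSE=1 set, the output of the first MKL call includes the library version, which also answers the version question above.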

You can also use the Link Line Advisor to get recommended compile and link options for the environment you are working with.

Additionally, could you please provide us with a sample reproducer and the commands you use to compile and run it, so that we can test it on our end as well?
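A driver script along the following lines would be enough to reproduce the sweep. This is only a sketch: ./dgemm_bench is a hypothetical binary that runs one dgemm of the given size and prints the elapsed seconds, and the matrix size 8192 is an illustrative value rather than one taken from the original post.

```shell
# Convert a measured dgemm time into GFLOPS: multiplying an (m x k) matrix
# by a (k x n) matrix performs 2*m*n*k floating-point operations.
gflops() {
    awk -v m="$1" -v n="$2" -v k="$3" -v t="$4" \
        'BEGIN { printf "%.1f\n", 2 * m * n * k / t / 1e9 }'
}

# Hypothetical benchmark binary: runs one dgemm and prints elapsed seconds.
BENCH=${BENCH:-./dgemm_bench}
N=8192   # illustrative square matrix size, not taken from the post

# Sweep the same thread counts and pinnings as in the original post.
for t in 8 13 26 52; do
    export MKL_NUM_THREADS=$t GOMP_CPU_AFFINITY="0-$(( t - 1 )):1"
    if [ -x "$BENCH" ]; then
        secs=$("$BENCH" "$N" "$N" "$N")
        echo "threads=$t gflops=$(gflops "$N" "$N" "$N" "$secs")"
    else
        echo "threads=$t (benchmark binary not found, skipping)"
    fi
done
```

Keeping the problem size fixed across thread counts makes the runs directly comparable.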

Regards,

Vidya.

Re: Performance issue of dgemm on Gold 6230R CPU

VidyalathaB_Intel — Mon, 20 Jun 2022 04:54:54 GMT

Hi,

Reminder:

Could you please provide us with an update regarding your issue? Please provide us with the above-mentioned details if your issue still persists.

Regards,

Vidya.

Re: Performance issue of dgemm on Gold 6230R CPU

VidyalathaB_Intel — Mon, 27 Jun 2022 04:13:24 GMT

As we haven't heard back from you, we are closing this thread. Please post a new question if you need any additional assistance from Intel as this thread will no longer be monitored.

Regards,

Vidya.