Hi, I have some questions about DGEMM performance on an Intel(R) Xeon(R) Gold 6230R CPU. On my machine, DGEMM performance behaves strangely: when the number of threads is large, the performance curve rises and then falls, which I find very difficult to explain. The details are below. I would really appreciate your help, thanks.
My Machine
CPU(s): 104
On-line CPU(s) list: 0-103
Thread(s) per core: 2
Core(s) per socket: 26
Socket(s): 2
NUMA node(s): 2
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-25,52-77
NUMA node1 CPU(s): 26-51,78-103
Core topology: two sockets, 26 cores per socket, 52 cores total
SMT status: enabled, but not utilized
Max clock rate: 2.0 GHz (single-core and multicore)
Peak performance:
--single-core: 64 GFLOPS (double precision)
--multicore: 64 GFLOPS/core (double precision)
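The peak figures above follow from simple arithmetic, and the same arithmetic turns measured DGEMM times into an efficiency that can be compared across thread counts. The sketch below assumes two AVX-512 FMA units per core, 8 doubles per AVX-512 register, 2 flops per FMA, and that the cores actually sustain the pinned 2.0 GHz while running AVX-512 code. It counts 2*m*n*k flops per DGEMM call. The function names are hypothetical.

```python
# Sketch: theoretical DP peak of the Xeon Gold 6230R as configured above, and
# achieved GFLOPS / efficiency of a DGEMM call.
# Assumptions: 2 AVX-512 FMA units per core, 8 doubles per 512-bit vector,
# 2 flops per FMA, cores sustain the pinned 2.0 GHz under AVX-512 load.
FMA_UNITS_PER_CORE = 2
DOUBLES_PER_VECTOR = 8
FLOPS_PER_FMA = 2
CLOCK_GHZ = 2.0  # frequency pinned with cpupower


def peak_gflops(cores: int, clock_ghz: float = CLOCK_GHZ) -> float:
    """Theoretical peak double-precision GFLOPS for `cores` physical cores."""
    flops_per_cycle = FMA_UNITS_PER_CORE * DOUBLES_PER_VECTOR * FLOPS_PER_FMA  # 32
    return cores * flops_per_cycle * clock_ghz


def dgemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved GFLOPS for C = alpha*A*B + beta*C, counting 2*m*n*k flops."""
    return 2.0 * m * n * k / seconds / 1e9


def efficiency(m: int, n: int, k: int, seconds: float, cores: int) -> float:
    """Fraction of theoretical peak reached by one timed DGEMM call."""
    return dgemm_gflops(m, n, k, seconds) / peak_gflops(cores)


if __name__ == "__main__":
    for cores in (1, 8, 13, 26, 52):
        print(f"{cores:2d} cores: {peak_gflops(cores):7.1f} GFLOPS peak")
```

Running it prints the peak for 1, 8, 13, 26 and 52 cores (64, 512, 832, 1664 and 3328 GFLOPS). Plotting efficiency instead of raw GFLOPS puts runs with different thread counts on the same 0 to 1 scale.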
I have fixed the CPU frequency at 2.0 GHz with the following commands:
sudo cpupower -c all frequency-set -u 2.0GHz
sudo cpupower -c all frequency-set -d 2.0GHz
DGEMM performance on my machine
Multithreaded (8 core) execution
export GOMP_CPU_AFFINITY="0-7:1" MKL_NUM_THREADS=8
Multithreaded (13 core) execution
export GOMP_CPU_AFFINITY="0-12:1" MKL_NUM_THREADS=13
Multithreaded (26 core) execution
export GOMP_CPU_AFFINITY="0-25:1" MKL_NUM_THREADS=26
Multithreaded (52 core) execution
export GOMP_CPU_AFFINITY="0-51:1" MKL_NUM_THREADS=52
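Each of the four runs above pins its threads to CPUs 0 through N-1. If you cross-check those lists against the NUMA node CPU lists from lscpu, the 8-, 13- and 26-thread runs stay entirely on socket 0, and the 52-thread run splits 26/26 across both sockets. Cross-socket (NUMA) effects are therefore one possible factor to investigate for the 52-thread curve, not a confirmed cause. The small Python sketch below performs that check. It assumes the libgomp list syntax: single CPUs or inclusive a-b ranges with an optional :stride, separated by spaces or commas. The helper names are hypothetical.

```python
# Sketch: count how many pinned CPUs of each GOMP_CPU_AFFINITY setting above
# land on each NUMA node. The node CPU lists are copied from the lscpu output.
NUMA_NODES = {0: "0-25,52-77", 1: "26-51,78-103"}


def parse_cpu_list(spec: str) -> list[int]:
    """Expand items like '5', '0-7' or '0-7:2', separated by commas or spaces."""
    cpus = []
    for item in spec.replace(",", " ").split():
        rng, _, stride = item.partition(":")
        step = int(stride) if stride else 1
        if "-" in rng:
            lo, hi = (int(x) for x in rng.split("-"))
            cpus.extend(range(lo, hi + 1, step))  # ranges are inclusive
        else:
            cpus.append(int(rng))
    return cpus


# Map every logical CPU id to its NUMA node once, up front.
CPU_TO_NODE = {cpu: node for node, spec in NUMA_NODES.items()
               for cpu in parse_cpu_list(spec)}


def threads_per_node(affinity: str) -> dict[int, int]:
    """Number of pinned CPUs on each NUMA node for one affinity string."""
    counts = {node: 0 for node in NUMA_NODES}
    for cpu in parse_cpu_list(affinity):
        counts[CPU_TO_NODE[cpu]] += 1
    return counts


if __name__ == "__main__":
    runs = {8: "0-7:1", 13: "0-12:1", 26: "0-25:1", 52: "0-51:1"}
    for threads, affinity in runs.items():
        print(threads, threads_per_node(affinity))
```

Running it prints one line per run, ending with 52 {0: 26, 1: 26} for the 52-thread setting.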
Hi,
Thanks for reaching out to us.
>>When the number of threads is large, the performance curve will rise and then fall
Could you please provide us with the MKL version being used in this case?
By default, MKL uses all available physical cores when it runs in parallel (threaded) mode.
With the Intel compilers, parallel mode is selected with the -qmkl=parallel option.
For more details, see Managing Multi-core Performance.
You can also make use of the suggestions recommended by the Link Line Advisor for compiling and linking options depending on the environment you are working with.
Additionally, could you please provide a sample reproducer and the commands you use to compile and run it, so that we can test it on our end as well?
Regards,
Vidya.
Hi,
Reminder:
Could you please provide us with an update regarding your issue? Please provide us with the above-mentioned details if your issue still persists.
Regards,
Vidya.
As we haven't heard back from you, we are closing this thread. Please post a new question if you need any additional assistance from Intel as this thread will no longer be monitored.
Regards,
Vidya.