Intel® oneAPI Math Kernel Library

DGEQRF and DPOTRF performance with lda=m and lda=m+64

codecircuit
Beginner

I am benchmarking DGEQRF and DPOTRF from MKL 2023.2.0 on a two-socket Intel(R) Xeon(R) Platinum 8480CL system with hyperthreading enabled. I prefix my benchmark executable with `KMP_AFFINITY=granularity=fine,compact,1,0 MKL_NUM_THREADS=56 numactl -N0 -m0` and, as expected, htop shows that only the first 56 cores are busy while the benchmark runs.

Now, I measure:

DPOTRF(uplo='L', n=32768, lda=32768) -> 5.55 s

DPOTRF(uplo='L', n=32768, lda=32832) -> 4.74 s

DGEQRF(m=32768, n=32768, lda=32768) -> 31.18 s

DGEQRF(m=32768, n=32768, lda=32832) -> 18.55 s

1. Is this performance expected? It would be very helpful if you could share the maximum performance of these routines from your benchmarks on the Intel(R) Xeon(R) Platinum 8480CL. 

2. Why is there such a big performance improvement when I set `lda=m+64`?

3. What is the expected maximum parallel and sequential DGEMM GFLOPS on this chip?
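
For reference, the timings in the question can be converted to approximate GFLOP/s using the standard LAPACK operation counts (about n^3/3 for Cholesky and 2mn^2 - (2/3)n^3 for Householder QR). This is only a rough sketch: the helper names are hypothetical, lower-order terms are ignored, and it says nothing about the hardware peak.

```python
# Rough GFLOP/s estimate from wall-clock time, using leading-order
# LAPACK operation counts. Helper names are illustrative only.

def potrf_flops(n):
    """Leading-order flop count for Cholesky (DPOTRF): n^3 / 3."""
    return n**3 / 3.0

def geqrf_flops(m, n):
    """Leading-order flop count for Householder QR (DGEQRF), m >= n."""
    return 2.0 * m * n**2 - (2.0 / 3.0) * n**3

def gflops(flops, seconds):
    """Convert a flop count and a runtime in seconds to GFLOP/s."""
    return flops / seconds / 1e9

if __name__ == "__main__":
    n = 32768
    # Timings copied from the question (seconds).
    runs = [
        ("DPOTRF lda=32768", potrf_flops(n), 5.55),
        ("DPOTRF lda=32832", potrf_flops(n), 4.74),
        ("DGEQRF lda=32768", geqrf_flops(n, n), 31.18),
        ("DGEQRF lda=32832", geqrf_flops(n, n), 18.55),
    ]
    for name, flops, seconds in runs:
        print(f"{name}: {gflops(flops, seconds):.0f} GFLOP/s")
```

Under these assumptions the four runs come out at roughly 2.1 and 2.5 TFLOP/s for DPOTRF and roughly 1.5 and 2.5 TFLOP/s for DGEQRF, so with the padded leading dimension the QR run reaches about the same rate as the Cholesky run.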

IntelSupport
Community Manager

Hi,

Thanks for posting in Intel Communities.

Could you please provide a sample reproducer and your OS details so that we can replicate the issue and investigate further on our end?

Regards,
Jilani

JilaniS_Intel
Employee

Hi,

We have not heard back from you. Could you please provide us with an update?

Regards,
Jilani

Gennady_F_Intel
Moderator
  1. Yes, this performance is expected.
  2. 32768 is a “bad” leading dimension because it is a large power of two, hence the poor performance. See the notes on padding matrices in the documentation on offloading computations, in particular Rule 2: “For best performance, leading dimensions should not be a multiple of a large power of 2 (e.g. 4096 bytes). Increasing the leading dimension slightly (e.g. from 4096 bytes to 4096+64 bytes) can improve performance in some cases.”
  3. See the official MKL product page (https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html#gs.qw2c2p) for SGEMM performance results on the same CPU. Double-precision results would be roughly 2x lower.

--Gennady
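
The padding rule quoted in the reply above can be sketched in a few lines. This is a minimal illustration, not MKL code: `padded_lda` and `distinct_cache_sets` are hypothetical helpers, the 4096-byte and 64-byte values come from the quoted rule, and the cache model (64 sets of 64-byte lines) is a toy assumption. It is used only to illustrate one commonly cited reason why power-of-two strides hurt: consecutive columns can land in the same cache set.

```python
# Sketch of leading-dimension padding for column-major matrices.
# All names and the cache geometry are assumptions for illustration.

def padded_lda(rows, elem_bytes=8, bad_multiple_bytes=4096, pad_bytes=64):
    """Return a leading dimension (in elements) >= rows whose byte
    stride is not a multiple of bad_multiple_bytes."""
    pad_elems = max(1, pad_bytes // elem_bytes)
    lda = rows
    while (lda * elem_bytes) % bad_multiple_bytes == 0:
        lda += pad_elems
    return lda

def distinct_cache_sets(lda, columns=64, elem_bytes=8,
                        line_bytes=64, num_sets=64):
    """Count how many cache sets the first element of each of
    `columns` consecutive columns maps to in a toy set-indexed cache."""
    stride = lda * elem_bytes  # byte distance between adjacent columns
    return len({(j * stride // line_bytes) % num_sets
                for j in range(columns)})

if __name__ == "__main__":
    for lda in (32768, 32832, padded_lda(32768)):
        print(lda, distinct_cache_sets(lda))
```

In this toy model, lda=32768 maps 64 consecutive columns onto a single set, lda=32832 spreads them over 8 sets, and lda=32776 (the result of `padded_lda(32768)`) spreads them over all 64. Real caches and MKL's internal blocking are more complicated, so treat this as intuition only and measure on the target machine.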

JilaniS_Intel
Employee

Hi,

A gentle reminder: we have not heard back from you. Could you please provide us with an update?

Regards,
Jilani
