DGEQRF and DPOTRF performance with lda=m and lda=m+64

codecircuit · 10-24-2023

I am benchmarking DGEQRF and DPOTRF from MKL 2023.2.0 on a two-socket Intel(R) Xeon(R) Platinum 8480CL system with hyperthreading enabled. I prefix my benchmark executable with `KMP_AFFINITY=granularity=fine,compact,1,0 MKL_NUM_THREADS=56 numactl -N0 -m0` and, as expected, htop shows that only the first 56 cores are busy during the benchmark.

These are my measurements:

DPOTRF(uplo='L', n=32768, lda=32768) -> 5.55 s

DPOTRF(uplo='L', n=32768, lda=32832) -> 4.74 s

DGEQRF(m=32768, n=32768, lda=32768) -> 31.18 s

DGEQRF(m=32768, n=32768, lda=32832) -> 18.55 s

1. Is this performance expected? It would be very helpful if you could share the maximum performance of these routines from your benchmarks on the Intel(R) Xeon(R) Platinum 8480CL.

2. Why is there such a big performance improvement when I set `lda=m+64`?

3. What is the expected maximum parallel and sequential DGEMM GFLOPS on this chip?
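
For comparison with published GFLOPS figures, the timings above can be converted to rates. The following is a minimal sketch, assuming the standard leading-order LAPACK operation counts (n^3/3 for Cholesky, 2mn^2 - (2/3)n^3 for QR with m >= n); the function names are illustrative and not part of MKL.

```python
# Convert the measured wall-clock times into GFLOPS.
# Assumption: leading-order operation counts only (lower-order terms dropped).

def potrf_flops(n):
    """Approximate flop count of a Cholesky factorization of an n x n matrix."""
    return n**3 / 3.0

def geqrf_flops(m, n):
    """Approximate flop count of a Householder QR of an m x n matrix, m >= n."""
    return 2.0 * m * n**2 - (2.0 / 3.0) * n**3

def gflops(flops, seconds):
    """Rate in GFLOPS (1e9 floating-point operations per second)."""
    return flops / seconds / 1e9

n = 32768
measurements = [
    ("DPOTRF lda=32768", potrf_flops(n), 5.55),
    ("DPOTRF lda=32832", potrf_flops(n), 4.74),
    ("DGEQRF lda=32768", geqrf_flops(n, n), 31.18),
    ("DGEQRF lda=32832", geqrf_flops(n, n), 18.55),
]

for label, flops, seconds in measurements:
    print(f"{label}: {gflops(flops, seconds):.0f} GFLOPS")
```

Under these assumptions the four runs come out at roughly 2.1, 2.5, 1.5 and 2.5 TFLOPS respectively, so with the padded leading dimension both routines reach a similar rate.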

IntelSupport · 10-27-2023

Hi,

Thanks for posting in Intel Communities.

Could you please provide a sample reproducer and your OS details so that we can replicate the issue and investigate further on our end?

Regards,

Jilani

JilaniS_Intel · 11-03-2023

Hi,

We have not heard back from you. Could you please provide us with an update?

Regards,

Jilani

Gennady_F_Intel · 11-09-2023

Yes, this performance is expected.
32768 is a “bad” leading dimension (it is a large power of two), hence the poor performance. Please check the notes about padding matrices in this documentation about offloading computations, especially Rule 2:
“Rule 2: For best performance, leading dimensions should not be a multiple of a large power of 2 (e.g. 4096 bytes). Increasing the leading dimension slightly (e.g. from 4096 bytes to 4096+64 bytes) can improve performance in some cases.”
Please check the official MKL product page - https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html#gs.qw2c2p to see SGEMM performance results on the same CPU. Double-precision results should be roughly half of the single-precision figures.

--Gennady
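
The padding advice in Rule 2 quoted above can be sketched as a small helper that picks a leading dimension before the matrix is allocated. This is only an illustration: `padded_lda` and `storage_bytes` are hypothetical names, the 4096-byte threshold is taken from the example in the quoted rule, and the pad of 64 elements mirrors the `lda=m+64` used in this thread rather than any value prescribed by MKL.

```python
# Choose a leading dimension that avoids a stride that is a multiple of a
# large power of two. All defaults are assumptions for illustration:
#   elem_bytes=8          -> double precision
#   bad_stride_bytes=4096 -> example threshold from the quoted Rule 2
#   pad_elems=64          -> the padding used in this thread (lda = m + 64)

def padded_lda(rows, elem_bytes=8, bad_stride_bytes=4096, pad_elems=64):
    """Return rows, or rows + pad_elems if the column stride in bytes
    would be a multiple of bad_stride_bytes."""
    lda = rows
    if (lda * elem_bytes) % bad_stride_bytes == 0:
        lda += pad_elems
    return lda

def storage_bytes(lda, cols, elem_bytes=8):
    """Bytes needed for a column-major matrix with leading dimension lda."""
    return lda * cols * elem_bytes

m = n = 32768
lda = padded_lda(m)
extra = storage_bytes(lda, n) - storage_bytes(m, n)
print(f"lda = {lda}, column stride = {lda * 8} bytes")
print(f"extra memory for padding = {extra / 2**20:.0f} MiB")
```

For the 32768 x 32768 case in this thread, the helper returns 32832, and the padding costs 16 MiB on top of the 8 GiB matrix. In column-major storage, element (i, j) then lives at offset `i + j*lda`, and `lda` is what gets passed to DPOTRF or DGEQRF.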

JilaniS_Intel · 11-16-2023

Hi,

A gentle reminder:

We have not heard back from you. Could you please provide us with an update?

Regards,

Jilani
