Intel® oneAPI Math Kernel Library

bad performance with cblas_dger using AVX2 on i7 12th gen

Manuel7

We are currently evaluating Intel MKL to improve the performance of our application. However, we found that on computers with an Intel i7 12th gen CPU, performance decreased significantly when using Intel MKL. Profiling the application showed that two MKL BLAS functions were taking up most of the CPU time, namely

  • [MKL BLAS]@avx2_xdaxpy
  • [MKL BLAS]@avx2_dger

We are able to reproduce the issue with the attached modified MKL sample program.

With this program we can see that the MKL function cblas_dger runs considerably slower on an i7 12th gen CPU when using the AVX2 instruction set with a single thread than when using the AVX instruction set with a single thread.

Running the same code on an i7 10th gen CPU showed increased performance when using the AVX2 instruction set.
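
For readers without access to the attachment, here is a minimal sketch of the operation being timed. It is not the attached sample and not the MKL kernel: `ref_dger` and `time_ref_dger` are hypothetical names, the matrix is assumed to be row-major with unit-stride vectors, and `clock()` measures CPU time, which only approximates wall time for a single thread. It shows the rank-1 update A := alpha * x * y^T + A that cblas_dger computes; each row update has the form of an axpy.

```c
#include <stddef.h>
#include <time.h>

/* Plain C reference for the rank-1 update A := alpha * x * y^T + A.
 * Assumptions (not taken from the attached sample): row-major A with
 * leading dimension lda >= n, unit-stride x (length m) and y (length n). */
void ref_dger(size_t m, size_t n, double alpha,
              const double *x, const double *y,
              double *a, size_t lda)
{
    for (size_t i = 0; i < m; ++i) {
        const double ax = alpha * x[i];   /* scalar for row i */
        double *row = a + i * lda;
        for (size_t j = 0; j < n; ++j)
            row[j] += ax * y[j];          /* row_i := ax * y + row_i */
    }
}

/* Runs `calls` repeated updates and returns the elapsed CPU time in seconds. */
double time_ref_dger(size_t calls, size_t m, size_t n, double alpha,
                     const double *x, const double *y,
                     double *a, size_t lda)
{
    const clock_t start = clock();
    for (size_t k = 0; k < calls; ++k)
        ref_dger(m, n, alpha, x, y, a, lda);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_ref_dger(1000, m, n, alpha, x, y, a, n)` on caller-allocated buffers gives a baseline for the same problem size. Timing the same loop around cblas_dger with each instruction set selected would reproduce the comparison described above.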

 

See the attached screenshot for the timing of 1'000 calls to this function on an i7-12700K.

screen_behaviour.PNG

 

oneMKL version used: oneMKL 2023.0 Product build 20221128

ShanmukhS_Intel
Moderator

Hi Manuel,


Thanks for posting on Intel Communities and for sharing the feedback. We have informed the development team about this issue and will get back to you soon with an update.


Best Regards,

Shanmukh.SS




ShanmukhS_Intel
Moderator

Hi Manuel,

 

The performance difference comes from the core architecture. Recent desktop parts use Cove cores, which have a larger cache and more memory channels than older AVX2 desktop cores. As a result, the behavior differs, and simultaneous memory accesses perform better on recent desktop parts. These measurements were taken on ICX. The "test" variant is based on Fortran code, and its behavior is similar to AVX. Please find the performance charts attached.

Multiple memory accesses cause performance degradation on AVX2-based Xeon processors. MKL has no mechanism to distinguish between old and new AVX2-based architectures, so this performance improvement could not be made.

 

Best Regards,

Shanmukh.SS

 

ShanmukhS_Intel
Moderator

Hi Manuel,

 

A gentle reminder:

Has the information provided helped? Could you please let us know whether we can close this case on our end?

 

Best Regards,

Shanmukh.SS

 

ShanmukhS_Intel
Moderator

Hi Manuel,


We assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Best Regards,

Shanmukh.SS

