
- Intel Community
- Software
- Software Development SDKs and Libraries
- Intel® oneAPI Math Kernel Library


Ding__Jian

Beginner


02-21-2020
03:41 PM

213 Views

AVX512 is slower than AVX2 when running CGESDD/SGESDD on Xeon Gold 6130

I am evaluating the performance of Intel MKL on Xeon Gold 6130 processors, which have two AVX512 FMA units per core. I see a performance improvement with AVX512 for matrix multiplication and FFT. However, for singular value decomposition, AVX512 performs worse than AVX2. I tested the single-precision complex (CGESDD) and single-precision real (SGESDD) routines.

My question is: what causes the AVX512 slowdown for CGESDD/SGESDD? Are these functions not optimized for AVX512, or did I do something wrong?

Below is the output with MKL_VERBOSE enabled:

MKL_VERBOSE Intel(R) MKL 2020.0 Product build 20191122 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.10GHz lp64 sequential

I set MKL_ENABLE_INSTRUCTIONS to AVX2 or AVX512 to compare their performance, and I linked against the sequential (single-threaded) library.
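The setup described above can be sketched as follows. Note that MKL reads MKL_ENABLE_INSTRUCTIONS once at process startup, so each instruction-set variant must be a fresh run; `./bench_gesdd` is a placeholder name for whatever benchmark driver is being timed, not a real binary from this thread.

```shell
# Print the dispatched code path (AVX2 vs AVX-512) for verification.
export MKL_VERBOSE=1

# Run the same benchmark binary once per instruction-set cap.
MKL_ENABLE_INSTRUCTIONS=AVX2   ./bench_gesdd
MKL_ENABLE_INSTRUCTIONS=AVX512 ./bench_gesdd
```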

-----------------------------------------------------------------

For SGESDD/CGESDD, AVX2 outperforms AVX512 in most cases

64x64 matrix:

- SGESDD: AVX2: 536.91us AVX512: 703.39us
- CGESDD: AVX2: 766.52us AVX512: 861.09us

1000x1000 matrix:

- SGESDD: AVX2: 305.60ms AVX512: 360.65ms
- CGESDD: AVX2: 744.38ms AVX512: 696.96ms (AVX512 is slightly better)

-----------------------------------------------------------------

For SGEMM/CGEMM, AVX512 outperforms AVX2

64x64 matrix:

- SGEMM: AVX2: 8.58us AVX512: 7.08us
- CGEMM: AVX2: 43.55us AVX512: 23.06us

1000x1000 matrix:

- SGEMM: AVX2: 27.98ms AVX512: 18.40ms
- CGEMM: AVX2: 109.17ms AVX512: 69.49ms

-----------------------------------------------------------------


5 Replies

Ruqiu_C_Intel

Employee


02-25-2020
12:24 AM


Hello Ding, Jian,

Thank you for raising the topic! We will investigate the problem and report back here once there is an update.

One quick question: based on your tests, does the performance issue exist only in MKL 2020.0, or do other versions have the same problem?

Best Regards,

Ruqiu

Ding__Jian

Beginner


02-25-2020
12:37 AM


Thanks Ruqiu. I have tested MKL 2019.0 as well and it has the same problem.

Best,

Jian

Gennady_F_Intel

Moderator


03-04-2020
09:00 PM


Gennady_F_Intel

Moderator


04-03-2020
11:07 PM


Jian,

The time spent in xGESDD depends heavily on the distribution of singular values. Could you recheck the results using exactly the same input matrix when calling ?GESDD under AVX2 and under AVX512?
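A minimal way to guarantee identical inputs is to generate the matrix from a fixed seed. The sketch below uses SciPy's `linalg.svd` with `lapack_driver="gesdd"`, which calls SGESDD for a float32 matrix; it assumes a SciPy build linked against MKL (e.g. from Intel's Python distribution), and the script would be launched once with MKL_ENABLE_INSTRUCTIONS=AVX2 and once with AVX512.

```python
import time
import numpy as np
from scipy import linalg

# Fixed seed: every process run factors exactly the same matrix,
# so AVX2 and AVX-512 timings are comparable.
rng = np.random.default_rng(42)
a = rng.standard_normal((1000, 1000)).astype(np.float32)

t0 = time.perf_counter()
u, s, vt = linalg.svd(a, lapack_driver="gesdd")  # divide-and-conquer SVD
elapsed_ms = (time.perf_counter() - t0) * 1e3
print(f"SGESDD on 1000x1000: {elapsed_ms:.2f} ms")

# Sanity check: U * diag(S) * Vt reconstructs A (float32 tolerance).
assert np.allclose((u * s) @ vt, a, atol=1e-3)
```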

Gennady_F_Intel

Moderator


04-21-2020
09:54 PM


Jian, have you tried to check the problem with the same inputs?


For more complete information about compiler optimizations, see our Optimization Notice.