Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Intel Community
- Software
- Software Development SDKs and Libraries
- Intel® oneAPI Math Kernel Library
- AVX512 slower than AVX2? What I am doing wrong?

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

tomasz_j_2

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-17-2018
01:03 PM

318 Views

AVX512 slower than AVX2? What I am doing wrong?

Hello All,

I was so excited to test new the new Intel Xeon Silver 4114 CPU just to find out that with AVX512 enabled the performance of the matrix multiplication is the same as with legacy SSE4. If I restrict the MKL library to use AVX2 only, then the speed of the computation is twice as fast. What I am doing wrong here? The library seem to respond OK to the following call (here in FORTRAN):

stat=mkl_cbwr_set (MKL_CBWR_AVX512)

with stat == 0. But the computation slows down by a factor of two compared to the speed I get after setting the environment variable MKL_ENABLE_INSTRUCTIONS to AVX2. Is possible that this is what I should get for this particular CPU? The MKL version is 2018.2.199.

Thanks!

Tomasz

Link Copied

6 Replies

Gennady_F_Intel

Moderator

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-17-2018
07:46 PM

318 Views

Tomasz, what input size do you observe such gap? We will check. Is that Lin* OS?

tomasz_j_2

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-18-2018
05:26 AM

318 Views

Yes, it is Linux, kernel 4.9.0-0, Debian OS.

I am testing on matrices that are 3000x3000 in dimension, double precision numbers. I did some research and I suspect that this is what I should get. Intel website says that Silver 4114 has one FPU per core which is capable of AVX512. If this is true, then the increase of efficiency coming from AVX512 is offset by less FPUs available on the chip (I suspect there are 2 FPUs capable of AVX2). The numbers I get for 3000x3000 matrices are as follows:

24.668 Gflop/s AVX512

40.540 Gflop/s AVX2

21.739 Gflop/s AVX

11.668 Gflop/s SSE4_2

If my suspicion about the number of FPUs is correct then MKL should fall back to AVX2 on Xeon Silver to get the max throughput.

I hope my guess is not correct, otherwise what would be goal of retrofitting the CPU with crippled AVX512 capability?

Thanks!

Tomasz

Gennady_F_Intel

Moderator

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-19-2018
09:23 PM

318 Views

tomasz_j_2

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-20-2018
05:37 AM

318 Views

Here is the result:

MKL_VERBOSE Intel(R) MKL 2018.0 Update 2 Product build 20180127 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.20GHz lp64 sequential

MKL_VERBOSE DGEMM(n,n,3000,3000,3000,0x7ffe29d0bda8,0x7f8b5bfe1620,3000,0x7f8b6048b820,3000,0x7ffe29d0bdb0,0x7f8b64935a20,3000) 2.14s CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1

tomasz_j_2

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-24-2018
01:29 PM

318 Views

So, can I conclude that this is what I should get from this processor, and AVX512 is, in fact, slower than legacy AVX2 on Xeon Silver? I see that the the "Gold" series has two FPUs per core.

I still hope that the answer is negative and something can be done.

Bernard

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

12-12-2018
09:06 AM

318 Views

PMU unit of low-end "Silver" processors will probably more eagerly lower the reference clock of cores which execute AVX512 code.

You should invest in Gold SKU or maybe in HEDT Skylake-X processors.

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

For more complete information about compiler optimizations, see our Optimization Notice.