SVD weird performance issues

Guillaume_A_

New Contributor I

06-12-2019
09:03 AM

Hi,

I am facing performance issues with the function **dgesvd** when running in **64bit** with **AVX2** (MKL_CBWR=AVX2).

For some matrix sizes the SVD takes **25 times longer** in 64bit than in 32bit!

You can reproduce this with the attached test. On my side, I get these durations for one SVD on an m x n matrix:

- 101x63: 32bit = 2 ms, 64bit = 1.4 ms;
- 101x64: 32bit = 2 ms, 64bit = **20** ms;
- 102x64: 32bit = 2 ms, 64bit = 1.4 ms;
- 103x103: 32bit = 4 ms, 64bit = **100** ms.

There is no problem with MKL_CBWR=AVX.
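One way to repeat the AVX vs. AVX2 comparison is to launch the attached test once per code branch, setting MKL_CBWR (and MKL_VERBOSE=1) only in the child process's environment, so MKL sees them from startup and nothing else is affected. Below is a minimal Python sketch under that assumption; the helper names are illustrative and not part of MKL.

```python
import os
import subprocess


def mkl_env(cbwr_branch, verbose=True, base=None):
    """Return a copy of `base` (default: os.environ) with MKL_CBWR set.

    MKL_VERBOSE=1 is added too unless verbose=False.
    The input mapping itself is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["MKL_CBWR"] = cbwr_branch
    if verbose:
        env["MKL_VERBOSE"] = "1"
    return env


def run_with_branch(cmd, cbwr_branch):
    """Run `cmd` once under the given CNR branch and return its stdout.

    The variables are set only for the child process, so the parent's
    environment is left unchanged.
    """
    result = subprocess.run(
        cmd,
        env=mkl_env(cbwr_branch),
        capture_output=True,
        text=True,
        check=True,  # raise if the test program fails
    )
    return result.stdout
```

For example, `for branch in ("AVX", "AVX2"): print(run_with_branch(["svd_test.exe"], branch))`, where `svd_test.exe` is a hypothetical name standing in for the attached test. Standard output is captured because that is where the MKL_VERBOSE lines are expected to appear.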

Could you please have a look?

My configuration:

- Composer 2019 Update 4 (same behaviour with 2018 Update 4)
- BasePlatformToolSet : vc12
- Win 10 Enterprise 64bit
- CPU: i7-6820HQ

Regards,

Guillaume A.

Gennady_F_Intel

Moderator

06-12-2019
09:19 AM

Is that threaded mode? Could you add the verbose-mode output for the 103x103 case?

Guillaume_A_

New Contributor I

06-13-2019
12:56 AM

Hello Gennady,

Indeed, I forgot to mention: I am using the sequential mode.

Here are the outputs with MKL_VERBOSE=1 for one SVD:

64bit:

MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Win 2.70GHz cdecl sequential

MKL_VERBOSE DGESVD(A,A,103,103,0000000000FE5E40,103,0000000000FFAA40,0000000000FFAE00,103,000000000100FA00,103,0000000000CFF520,-1,0) 147.89us CNR:AVX2 Dyn:1 FastMM:1 TID:0 NThr:1

MKL_VERBOSE DGESVD(A,A,103,103,0000000001055300,103,0000000000FFAA40,0000000001069F80,103,000000000107EB80,103,000000000104E200,3605,0) 112.26ms CNR:AVX2 Dyn:1 FastMM:1 TID:0 NThr:1

32bit:

MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for 32-bit Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Win 2.70GHz sequential

MKL_VERBOSE DGESVD(A,A,103,103,01045D40,103,0105A900,0105ACC0,103,0106F8C0,103,004FF604,-1,0) 116.49us CNR:AVX2 Dyn:1 FastMM:1 TID:0 NThr:1

MKL_VERBOSE DGESVD(A,A,103,103,010B5180,103,0105A900,010C9D80,103,010DE980,103,010AE000,3605,0) 4.64ms CNR:AVX2 Dyn:1 FastMM:1 TID:0 NThr:1
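These verbose lines can be read mechanically. In LAPACK's dgesvd argument order (JOBU, JOBVT, M, N, A, LDA, S, U, LDU, VT, LDVT, WORK, LWORK, INFO), the first call of each run passes LWORK = -1, which is only a workspace-size query; the second call, with LWORK = 3605, does the actual decomposition. A minimal Python sketch that parses such output and sums the time of the non-query DGESVD calls follows; the regular expression and helper names are assumptions based only on the lines shown here, not on a documented MKL_VERBOSE format.

```python
import re
from dataclasses import dataclass

# Multipliers converting MKL_VERBOSE time units to seconds.
_UNIT_TO_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}

# One call record: "MKL_VERBOSE <ROUTINE>(<args>) <time><unit>".
# The banner line ("MKL_VERBOSE Intel(R) MKL ...") does not match,
# because no number follows its parentheses.
_CALL_RE = re.compile(r"MKL_VERBOSE\s+(\w+)\(([^)]*)\)\s+([\d.]+)(ns|us|ms|s)\b")


@dataclass
class VerboseCall:
    routine: str
    args: list
    seconds: float

    @property
    def is_workspace_query(self) -> bool:
        # For DGESVD the 13th argument (index 12) is LWORK;
        # LWORK = -1 only asks for the optimal workspace size.
        return self.routine.upper() == "DGESVD" and self.args[12] == "-1"


def parse_verbose(text):
    """Extract every routine call, with its elapsed time, from verbose output."""
    calls = []
    for m in _CALL_RE.finditer(text):
        routine, args, value, unit = m.groups()
        seconds = float(value) * _UNIT_TO_SECONDS[unit]
        calls.append(VerboseCall(routine, args.split(","), seconds))
    return calls


def compute_time(text):
    """Total seconds spent in DGESVD calls that did real work (queries excluded)."""
    return sum(
        c.seconds
        for c in parse_verbose(text)
        if c.routine.upper() == "DGESVD" and not c.is_workspace_query
    )
```

Applied to the two runs above, `compute_time` gives 112.26 ms for 64-bit and 4.64 ms for 32-bit, a ratio of about 24, so the workspace query itself is not where the time goes.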

Regards,

Guillaume A.

Gennady_F_Intel

Moderator

06-16-2019
09:18 PM

Yes, I see roughly the same performance problem when linking with the mkl_sequential library. The gap is about 15 times for this specific problem size.

32 bit : [ PERF --> ] 0.004 clock for 1 iteration

64 bit : [ PERF --> ] 0.062 clock for 1 iteration

The ratio is ~15 times (0.062 / 0.004 ≈ 15.5).

But there is no problem when linking with the threaded version of MKL (2019.4).

If optimizing these specific problem sizes in the ia32 version of MKL is important to you, please submit a request to the Intel Online Service Center so that it can be followed up internally.

Gennady_F_Intel

Moderator

06-16-2019
09:20 PM

Intel Online Service Center: https://supporttickets.intel.com/?lang=en-US

Guillaume_A_

New Contributor I

06-17-2019
05:29 AM

OK, thanks. Here is the ticket: 04232883

Please note that this behaviour can be observed for many other matrix sizes: 160x160, 200x200, 302x302, ...

I have attached an Excel file comparing 32-bit vs. 64-bit SVD durations for matrices extracted from real use cases in my production.

My point of view: there is a performance issue in **64bit and AVX2** for the SVD. We do not need any special optimizations for particular problem sizes. We just need performance in 64bit to be as good as in 32bit, whatever the size of the matrix ;)
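The requirement stated above (64-bit at least as fast as 32-bit, whatever the matrix size) can be expressed as a simple check over measured timings. A minimal Python sketch follows, using the four timings from the first post as sample data; `slowdown`, `regressions` and the `tolerance` parameter are illustrative names, not part of MKL or any Intel tool.

```python
# Sample data: timings (ms) for one SVD per (m, n) shape, as reported in the
# first post of this thread, stored as (32-bit, 64-bit).
TIMINGS_MS = {
    (101, 63): (2.0, 1.4),
    (101, 64): (2.0, 20.0),
    (102, 64): (2.0, 1.4),
    (103, 103): (4.0, 100.0),
}


def slowdown(t32, t64):
    """How many times longer the 64-bit run takes than the 32-bit run."""
    return t64 / t32


def regressions(timings, tolerance=1.0):
    """Shapes whose 64-bit/32-bit ratio exceeds `tolerance` (1.0 = any slowdown)."""
    return sorted(
        shape
        for shape, (t32, t64) in timings.items()
        if slowdown(t32, t64) > tolerance
    )


for (m, n), (t32, t64) in sorted(TIMINGS_MS.items()):
    print(f"{m}x{n}: 64-bit/32-bit = {slowdown(t32, t64):.1f}")
```

With the default tolerance of 1.0, the shapes flagged are 101x64 (ratio 10.0) and 103x103 (ratio 25.0); timings for further sizes, such as those in the attached spreadsheet, could be added to the dictionary in the same form.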

Regards,

Guillaume A.

Guillaume_A_

New Contributor I

12-17-2019
08:33 AM

Hi,

The issue has been fixed in the latest MKL release (2020) :)

Thanks,

Guillaume A.

Gennady_F_Intel

Moderator

12-17-2019
07:01 PM

Thanks, Guillaume!