Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

SVD weird performance issues

Guillaume_A_

Hi,

I am seeing performance issues with the function dgesvd when running in 64-bit with AVX2 (MKL_CBWR=AVX2).

For some matrix sizes, the SVD takes up to 25 times longer in 64-bit than in 32-bit!

You can reproduce this with the attached test. On my machine I get these durations for one SVD of an m×n matrix:

  • 101x63: 32-bit = 2 ms, 64-bit = 1.4 ms
  • 101x64: 32-bit = 2 ms, 64-bit = 20 ms
  • 102x64: 32-bit = 2 ms, 64-bit = 1.4 ms
  • 103x103: 32-bit = 4 ms, 64-bit = 100 ms
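
To make the gap explicit, here is a minimal Python sketch that turns the durations listed above into 64-bit / 32-bit ratios. The dictionary layout and the helper name `slowdown` are only illustrative; the numbers are the ones reported in this post.

```python
# Durations in milliseconds as reported in the post:
# (rows, cols) -> (32-bit duration, 64-bit duration).
timings_ms = {
    (101, 63): (2.0, 1.4),
    (101, 64): (2.0, 20.0),
    (102, 64): (2.0, 1.4),
    (103, 103): (4.0, 100.0),
}


def slowdown(t32_ms, t64_ms):
    """Return how many times slower the 64-bit run is than the 32-bit run."""
    return t64_ms / t32_ms


ratios = {shape: slowdown(*pair) for shape, pair in timings_ms.items()}

for (m, n), ratio in ratios.items():
    print(f"{m}x{n}: 64-bit / 32-bit = {ratio:.1f}x")
```

A ratio below 1 means the 64-bit build is faster for that size; only the 101x64 and 103x103 cases show the regression.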

There is no problem with MKL_CBWR=AVX.

Could you please have a look?

My configuration:

  • Composer 2019 Update 4 (same behaviour with 2018 Update 4)
  • BasePlatformToolSet : vc12
  • Win 10 Enterprise 64bit
  • CPU: i7-6820HQ

Regards,

Guillaume A.

 

Gennady_F_Intel

Is that in threaded mode? Could you add the verbose-mode output for the 103x103 case?

Guillaume_A_

Hello Gennady,

Indeed, I forgot to specify: I am using sequential mode.

Here are the outputs with MKL_VERBOSE=1 for one SVD:

64-bit:

MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Win 2.70GHz cdecl sequential
MKL_VERBOSE DGESVD(A,A,103,103,0000000000FE5E40,103,0000000000FFAA40,0000000000FFAE00,103,000000000100FA00,103,0000000000CFF520,-1,0) 147.89us CNR:AVX2 Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE DGESVD(A,A,103,103,0000000001055300,103,0000000000FFAA40,0000000001069F80,103,000000000107EB80,103,000000000104E200,3605,0) 112.26ms CNR:AVX2 Dyn:1 FastMM:1 TID:0  NThr:1

32-bit:

MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for 32-bit Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Win 2.70GHz sequential
MKL_VERBOSE DGESVD(A,A,103,103,01045D40,103,0105A900,0105ACC0,103,0106F8C0,103,004FF604,-1,0) 116.49us CNR:AVX2 Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE DGESVD(A,A,103,103,010B5180,103,0105A900,010C9D80,103,010DE980,103,010AE000,3605,0) 4.64ms CNR:AVX2 Dyn:1 FastMM:1 TID:0  NThr:1
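
For anyone who wants to compare such logs automatically, here is a rough Python sketch that extracts the elapsed time from the DGESVD lines above and computes the 64-bit / 32-bit ratio. The regular expression is only an assumption based on the line layout visible in these logs, not a documented format, and the names `parse_call` and `compute_time_ms` are made up for this example. The first DGESVD call in each log has lwork = -1, which is the standard LAPACK workspace query, so it is skipped.

```python
import re

# Assumed layout (taken from the logs in this post): the elapsed time follows
# the closing parenthesis of the argument list, e.g. "...,3605,0) 112.26ms CNR:AVX2".
CALL_RE = re.compile(r"^MKL_VERBOSE\s+(\w+)\((.*)\)\s+([\d.]+)(us|ms|s)\b")
UNIT_TO_MS = {"us": 1e-3, "ms": 1.0, "s": 1e3}


def parse_call(line):
    """Return (function, lwork, elapsed_ms) for a DGESVD-style line, else None."""
    match = CALL_RE.match(line)
    if match is None:
        return None  # banner line or unrelated output
    name, args, value, unit = match.groups()
    # For DGESVD the last two arguments are lwork and info.
    lwork = int(args.split(",")[-2])
    return name, lwork, float(value) * UNIT_TO_MS[unit]


def compute_time_ms(log_text):
    """Sum the time of DGESVD calls, skipping workspace queries (lwork == -1)."""
    total = 0.0
    for line in log_text.splitlines():
        parsed = parse_call(line.strip())
        if parsed and parsed[0] == "DGESVD" and parsed[1] != -1:
            total += parsed[2]
    return total


# DGESVD lines copied from the two logs above.
LOG_64 = """
MKL_VERBOSE DGESVD(A,A,103,103,0000000000FE5E40,103,0000000000FFAA40,0000000000FFAE00,103,000000000100FA00,103,0000000000CFF520,-1,0) 147.89us CNR:AVX2 Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE DGESVD(A,A,103,103,0000000001055300,103,0000000000FFAA40,0000000001069F80,103,000000000107EB80,103,000000000104E200,3605,0) 112.26ms CNR:AVX2 Dyn:1 FastMM:1 TID:0  NThr:1
"""
LOG_32 = """
MKL_VERBOSE DGESVD(A,A,103,103,01045D40,103,0105A900,0105ACC0,103,0106F8C0,103,004FF604,-1,0) 116.49us CNR:AVX2 Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE DGESVD(A,A,103,103,010B5180,103,0105A900,010C9D80,103,010DE980,103,010AE000,3605,0) 4.64ms CNR:AVX2 Dyn:1 FastMM:1 TID:0  NThr:1
"""

ratio = compute_time_ms(LOG_64) / compute_time_ms(LOG_32)
print(f"64-bit / 32-bit DGESVD time: {ratio:.1f}x")
```

With the two logs above, this compares 112.26 ms against 4.64 ms, which is consistent with the roughly 25x gap reported in the first post.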

Regards,

Guillaume A.

Gennady_F_Intel

Yes, I see roughly the same performance problem when linking with the mkl_sequential library. The gap is about 15x for these specific problem sizes.

32 bit :  [ PERF --> ] 0.004 clock for 1 iteration   

64 bit : [ PERF --> ] 0.062 clock for 1 iteration  

The ratio is about 15x.

However, there is no problem when linking with the threaded version of MKL (2019.4).

If optimization for these specific problem sizes and the ia32 version of MKL is important to you, could you please submit a request to the Intel Online Service Center so that it can be followed up internally?

Gennady_F_Intel

Intel Online Service Center: https://supporttickets.intel.com/?lang=en-US

Guillaume_A_

OK, thanks. Here is the ticket: 04232883

Please note that this behaviour can be observed for many other matrix sizes: 160x160, 200x200, 302x302, ...

I have attached an Excel file comparing the 32-bit and 64-bit SVD durations for matrices taken from real production use cases.

My point of view: there is a performance issue with the SVD in 64-bit with AVX2. We do not need any size-specific optimizations. We just need performance in 64-bit to be as good as in 32-bit, whatever the size of the matrix ;)

Regards,

Guillaume A.

Guillaume_A_

Hi,

The issue has been fixed in the latest MKL release (2020) :)

Thanks,

Guillaume A.

Gennady_F_Intel

Thanks, Guillaume!
