Solved: SVD weird performance issues

Guillaume_A_ · ‎06-12-2019

Hi,

I am facing performance issues with the function dgesvd when running in 64-bit with AVX2 (MKL_CBWR=AVX2).

For some matrix sizes, the SVD takes up to 25 times longer in 64-bit than in 32-bit!

You can reproduce this with the attached test. On my side, I get these durations for one SVD of an m x n matrix:

101x63 : 32bit = 2ms, 64bit = 1.4ms;
101x64 : 32bit = 2ms, 64bit = 20ms;
102x64 : 32bit = 2ms, 64bit = 1.4ms;
103x103 : 32bit = 4ms, 64bit = 100ms;
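To make the size dependence explicit, here is a small Python sketch that computes the 64-bit / 32-bit ratio for each of the timings listed above. The 2x threshold used to flag a size as "slow" is an arbitrary assumption for illustration, and the variable names are made up for this example.

```python
# (rows, cols) -> (32-bit ms, 64-bit ms), copied from the timings listed above
timings_ms = {
    (101, 63): (2.0, 1.4),
    (101, 64): (2.0, 20.0),
    (102, 64): (2.0, 1.4),
    (103, 103): (4.0, 100.0),
}

SLOW_THRESHOLD = 2.0  # assumption: flag anything more than 2x slower in 64-bit


def slowdown(t32_ms, t64_ms):
    """How many times longer the 64-bit run takes than the 32-bit run."""
    return t64_ms / t32_ms


# Keep only the sizes where the 64-bit build is clearly slower.
slow_sizes = {
    size: slowdown(*pair)
    for size, pair in timings_ms.items()
    if slowdown(*pair) > SLOW_THRESHOLD
}

for (m, n), ratio in slow_sizes.items():
    print(f"{m}x{n}: 64-bit is {ratio:.0f}x slower than 32-bit")
```

Only 101x64 (10x) and 103x103 (25x) are flagged; the neighbouring sizes 101x63 and 102x64 are actually slightly faster in 64-bit.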

There is no problem with MKL_CBWR=AVX.

Could you please have a look?

My configuration:

Composer 2019 update 4 (same behaviour with 2018 up4)
BasePlatformToolSet : vc12
Win 10 Enterprise 64bit
CPU: i7-6820HQ

Regards,

Guillaume A.

Guillaume_A_ · ‎12-17-2019

Hi,

The issue has been fixed in the latest MKL release (2020) :)

Thanks,

Guillaume A.

Gennady_F_Intel · ‎06-12-2019

Is that in threaded mode? Could you add the verbose-mode output for the 103x103 case?

Guillaume_A_ · ‎06-13-2019

Hello Gennady,

Indeed, I forgot to specify: I am using sequential mode.

Here are the outputs with MKL_VERBOSE=1 for one SVD:

64bit:

MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Win 2.70GHz cdecl sequential
MKL_VERBOSE DGESVD(A,A,103,103,0000000000FE5E40,103,0000000000FFAA40,0000000000FFAE00,103,000000000100FA00,103,0000000000CFF520,-1,0) 147.89us CNR:AVX2 Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE DGESVD(A,A,103,103,0000000001055300,103,0000000000FFAA40,0000000001069F80,103,000000000107EB80,103,000000000104E200,3605,0) 112.26ms CNR:AVX2 Dyn:1 FastMM:1 TID:0  NThr:1

32bit:

MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for 32-bit Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Win 2.70GHz sequential
MKL_VERBOSE DGESVD(A,A,103,103,01045D40,103,0105A900,0105ACC0,103,0106F8C0,103,004FF604,-1,0) 116.49us CNR:AVX2 Dyn:1 FastMM:1 TID:0  NThr:1
MKL_VERBOSE DGESVD(A,A,103,103,010B5180,103,0105A900,010C9D80,103,010DE980,103,010AE000,3605,0) 4.64ms CNR:AVX2 Dyn:1 FastMM:1 TID:0  NThr:1
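In the verbose output above, each build makes two DGESVD calls: the first passes -1 as LWORK (the second-to-last argument), which in LAPACK is a workspace-size query, and the second passes 3605 and performs the actual decomposition. As a rough illustration, here is a minimal Python sketch that extracts the durations from those lines and compares the two builds. The helper names (parse_call, compute_time_ms) are made up for this example, and the regular expression is only an assumption based on the line format shown above, not a documented MKL_VERBOSE grammar.

```python
import re

# Assumed line shape: "MKL_VERBOSE NAME(args) <number><unit> ..."
LINE_RE = re.compile(r"MKL_VERBOSE\s+(\w+)\((.*?)\)\s+([\d.]+)(us|ms|s)\b")
UNIT_TO_MS = {"us": 1e-3, "ms": 1.0, "s": 1e3}


def parse_call(line):
    """Return (routine, lwork, duration_ms) for a call line, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    routine, args, value, unit = m.groups()
    fields = args.split(",")
    lwork = int(fields[-2])  # DGESVD's argument list ends with (..., work, lwork, info)
    return routine, lwork, float(value) * UNIT_TO_MS[unit]


def compute_time_ms(lines):
    """Total time of the real decompositions, skipping lwork == -1 workspace queries."""
    calls = [c for c in map(parse_call, lines) if c is not None]
    return sum(ms for _, lwork, ms in calls if lwork != -1)


# Call lines copied verbatim from the verbose output above.
LOG_64 = [
    "MKL_VERBOSE DGESVD(A,A,103,103,0000000000FE5E40,103,0000000000FFAA40,0000000000FFAE00,103,000000000100FA00,103,0000000000CFF520,-1,0) 147.89us CNR:AVX2 Dyn:1 FastMM:1 TID:0  NThr:1",
    "MKL_VERBOSE DGESVD(A,A,103,103,0000000001055300,103,0000000000FFAA40,0000000001069F80,103,000000000107EB80,103,000000000104E200,3605,0) 112.26ms CNR:AVX2 Dyn:1 FastMM:1 TID:0  NThr:1",
]
LOG_32 = [
    "MKL_VERBOSE DGESVD(A,A,103,103,01045D40,103,0105A900,0105ACC0,103,0106F8C0,103,004FF604,-1,0) 116.49us CNR:AVX2 Dyn:1 FastMM:1 TID:0  NThr:1",
    "MKL_VERBOSE DGESVD(A,A,103,103,010B5180,103,0105A900,010C9D80,103,010DE980,103,010AE000,3605,0) 4.64ms CNR:AVX2 Dyn:1 FastMM:1 TID:0  NThr:1",
]

t64 = compute_time_ms(LOG_64)
t32 = compute_time_ms(LOG_32)
ratio = t64 / t32
print(f"64-bit: {t64:.2f} ms, 32-bit: {t32:.2f} ms, ratio: {ratio:.1f}x")
```

With the workspace queries excluded, the ratio comes out at roughly 24x, which is in line with the 25x reported for 103x103 in the first post.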

Regards,

Guillaume A.

Gennady_F_Intel · ‎06-16-2019

Yes, I see roughly the same performance problem when linking with the mkl_sequential library. The gap is about 15 times for these specific problem sizes.

32 bit : [ PERF --> ] 0.004 clock for 1 iteration

64 bit : [ PERF --> ] 0.062 clock for 1 iteration

The ratio is ~15 times,

but there is no problem when linking with the threaded version of MKL (2019.4).

If optimization for these specific problem sizes and the ia32 version of MKL is important to you, could you please submit a request to the Intel Online Service Center so that it can be followed up internally?

Gennady_F_Intel · ‎06-16-2019

Intel Online Service Center: https://supporttickets.intel.com/?lang=en-US

Guillaume_A_ · ‎06-17-2019

OK, thanks. Here is the ticket: 04232883

Please note that this behaviour may be observed for many other sizes of matrices: 160x160, 200x200, 302x302, ...

I have attached an Excel file comparing the 32-bit and 64-bit SVD durations for matrices extracted from real production use cases.

My point of view: there is a performance issue with the SVD in 64-bit with AVX2. We do not need optimizations for specific problem sizes. We just need performance in 64-bit to be as good as in 32-bit, regardless of the matrix size ;)

Regards,

Guillaume A.

Gennady_F_Intel · ‎12-17-2019

Thanks, Guillaume!