Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.
6980 Discussions

GESVD is slow on small matrix compared to numerical recipes svdcmp (FORTRAN)

ZlamalJakub
New Contributor III
723 Views

I need to calculate Singula value decomposition of matrix of max size 16x8 (but very often, let say 10000 for different matrices). I use it to calculate least squares interpation

When I use GESVD routine (mkl95_LAPACK module) it is about 2x slower than SVD from numerical recipes.

I think that problem can by in GESVD which each time allocates internal arrays. But declaration of GESVD for FORTRAN95

call gesvd(a, s [,u] [,vt] [,ww] [,job] [,info])

does not contain possibility to specify working array.

FORTRAN77 version have this possibility

call dgesvd(jobu, jobvt, m, n, a, lda, s, u, ldu, vt, ldvt, work, lwork, info)

Is it possible speed up FORTRAN95 interface or I should use FORTRAN77

Jakub

0 Kudos
10 Replies
ArturGuzik
Valued Contributor I
723 Views

Hi,

the F95 interface is nothing more than, well, interface to F77 routine. So there is nothing you can speed up in F95 version. I believe, you're left with 2 choices: (a) just use F77 (b) write your own (extend existing) wrapper to F95 and provide working array as argument.

A.

0 Kudos
Vladimir_Koldakov__I
New Contributor III
723 Views

Hello,Jakub,

You can find the code of the F95 dgesvd interface in the MKL release
MKLROOT/interfaces/lapack95/source/dgesvd.f90

As you can see its really nothing more than awrapper over the F77 routine.

So to get the high performance you should call theoriginal F77 routine.

Thanks,
Vladimir

0 Kudos
Gennady_F_Intel
Moderator
723 Views

Jakub, one more comment to your question:

because of the inputmatrices are very small,I wouldrecommendyou to try IPP's functionality. Please refer to the IPP image processing manual, You can find there the simialr functionality ( e.g.ippsSVD(, ,,,,,,) which performs single value decompositin of matrix). IPP functionality is better optimized for small sizes, but supports only C API.

--Gennady

0 Kudos
ZlamalJakub
New Contributor III
723 Views

I wrote my GESVD calling routine

[fortran]SUBROUTINE DGESVD_MKL(A,M,N,LDA,S,VT,LDVT,LWORK,INFO)
IMPLICIT NONE
INTEGER, PARAMETER :: WP = KIND(1.0D0)
INTEGER, INTENT(OUT):: INFO
REAL(WP), INTENT(INOUT) :: A(:,:)
REAL(WP), INTENT(OUT) :: S(:)
REAL(WP), INTENT(OUT),TARGET :: VT(:,:)
CHARACTER(LEN=1) :: JOBU
CHARACTER(LEN=1) :: JOBVT
INTEGER :: M
INTEGER :: N
INTEGER :: LDA
INTEGER :: LDU
INTEGER :: LDVT
INTEGER :: LWORK
REAL(WP), POINTER :: O_U(:,:)
REAL(WP) WORK(LWORK)

LDU = 1
JOBU = 'O'
JOBVT = 'A'
CALL F77_GESVD(JOBU,JOBVT,M,N,A,LDA,S,O_U,LDU,VT,LDVT,WORK,LWORK,INFO)
if (INFO /=0) then
CALL F77_XERBLA('GESVD',INFO)
endif
END SUBROUTINE DGESVD_MKL[/fortran]

and used following calling routine

[sectionBody]subroutine testspeedmklF77(A,B,x,m,n)
USE mkl95_PRECISION, ONLY: WP => DP
use MKL_SVD
use mkl95_blas , only: GEMV
IMPLICIT NONE
INTEGER :: I, J, INFO, M, N
real*8 A(m,n),B(m),x(m)
real*8 S(min(m,n)),VT(n,n),aux(m)
real*8 AA(m,n)
integer*4 lda,ldvt,lwork

AA=A
!M = SIZE(A,1)
!N = SIZE(A,2)
!LDA = MAX(1,SIZE(A,1))
!LDVT = MAX(1,SIZE(VT,1))
LDA=M
LDVT=N
lwork=max(3*min(m, n)+max(m, n), 5*min(m,n))
CALL DGESVD_MKL( AA,m,n,LDA,S,VT,LDVT,lWork,INFO ) ! aa vraci u
call GEMV(aa,b,aux,trans='T')
aux=aux/S
call GEMV(vt,aux,x,trans='T')
end subroutine


[/sectionBody]
  • I obtained following results for 100000 calls to SVD
  • Numerical recipes routine - 0.22 sec
  • MKL FORTRAN 95 wrapper - 1.05 sec
  • MKL my F77 wrapper - 0.87 sec
  • So MKL is about 4 times slower. I think that it is due to small matrix and probably MKL internal allocations.
  • I add my test code.
  • Jakub
[sectionBody][/sectionBody]

0 Kudos
ZlamalJakub
New Contributor III
723 Views

I did not found ippsSVD function in "Reference Manual, Volume 2: Image and Video Processing".

Jakub

0 Kudos
Gennady_F_Intel
Moderator
723 Views

sorry, my fault - this is signal processing manual ( ippsman.chm)

and as an additionally,see, ippsr.h e,g:IPPAPI(IppStatus, ippsSVD_64f_D2,(const Ipp64f* pSrcA, Ipp64f* pDstU, int height,

Ipp64f* pDstW, Ipp64f* pDstV, int width, int step, int nIter))

--Gennady

0 Kudos
ZlamalJakub
New Contributor III
723 Views

I downloaded evaluation copy of IPP and tried to call it

[fortran]subroutine testspeedipp(A,B,x,m,n)
      USE mkl95_PRECISION, ONLY: WP => DP
      use mkl95_blas , only: GEMV
      IMPLICIT NONE

    interface
        !ippsSVD_64f_D2L_I(Ipp64f** mSrcDstA, int height, Ipp64f* pDstW,Ipp64f** mDstV, int width, int nIter);
	    integer*4 function ippsSVD_64f_D2L_I(mSrcDstA,height,pDstW,mDstV,width,nIter)
		    !DEC$ IF DEFINED(_X86_)
		    !DEC$ ATTRIBUTES STDCALL, ALIAS : '_ippsSVD_64f_D2L_I@24' :: ippsSVD_64f_D2L_I
		    !DEC$ ELSE
		    !DEC$ ATTRIBUTES STDCALL, ALIAS : 'ippsSVD_64f_D2L_I' :: ippsSVD_64f_D2L_I
		    !DEC$ ENDIF
            !DEC$ ATTRIBUTES REFERENCE :: mSrcDstA,mDstV,pDstW
            !DEC$ ATTRIBUTES VALUE :: height,width,nIter
		    use ifwinty
		    integer*4 height,width,nIter
		    real*8 mSrcDstA(width,height)
		    real*8 mDstV(width,width)
		    real*8 pDstW(width)
        end function
    end interface

      INTEGER*4 :: M, N
      real*8 A(m,n),B(m),x(m)
      real*8 S(min(m,n)),V(n,n),aux(m)
      real*8 AA(m,n)
      integer*4 iret,nIter
      AA=A
      nIter=60
      iret=ippsSVD_64f_D2L_I(AA,m,s,v,n,nIter)
      call GEMV(aa,b,x,trans='T')
      aux=x/S  
      call GEMV(v,aux,x,trans='N')

end
[/fortran]
But I obtained error message

Unhandled exception at 0x0245b7ac in Program.exe: 0xC0000005: Access violation reading location 0x00000000.

when I tried to pass line calling ippsSVD_64f_D2L_I

I suppose that routine ippsSVD_64f_D2L_I uses STDCALL calling convention so I defined my FORTRAN interface correctly.

I called testspeedipp(A,B,x,m,n) routine with A(1:5,1:5),B(1:5),x(1:5) and m=n=5 values.

Where can be problem?

Jakub

0 Kudos
TimP
Honored Contributor III
723 Views

I suppose that routine ippsSVD_64f_D2L_I uses STDCALL calling convention so I defined my FORTRAN interface correctly.

___________________

Unlikely;you could easily find out by dumpbin or equivalent.

May require iso_c_binding, as it isn't written to accomodate Fortran.

0 Kudos
ZlamalJakub
New Contributor III
723 Views

I do not understand why to use dumpbin on ipp library. What I am able to find with?

I tried to use

[fortran]    interface
        !ippsSVD_64f_D2L_I(Ipp64f** mSrcDstA, int height, Ipp64f* pDstW,Ipp64f** mDstV, int width, int nIter);
	    integer*4 function ippsSVD_64f_D2L(mSrcDstA,height,pDstW,mDstV,width,nIter) bind(C,name="ippsSVD_64f_D2L_I@24")
            !DEC$ ATTRIBUTES REFERENCE :: mSrcDstA,mDstV,pDstW
            !DEC$ ATTRIBUTES VALUE :: height,width,nIter
		    use ifwinty
            use, intrinsic :: iso_c_binding
		    integer(C_INT):: height,width,nIter
		    real(C_DOUBLE):: mSrcDstA(width,height)
		    real(C_DOUBLE) mDstV(width,width)
		    real(C_DOUBLE) pDstW(width)
        end function
    end interface
[/fortran]

But I stil did not succeeded.

0 Kudos
ArturGuzik
Valued Contributor I
723 Views

Tim's suggestion is to use dumpbin to get the exported symbols to check the calling convention related issues.

A.

0 Kudos
Reply