Decrease the computation time for DGELSD subroutine.

pankaj_p_ · ‎03-25-2017

Hello all,

I am writing a code in fortran that involve computation of linear least squares (A^(-1)*b). For this I have used the subroutine "DGELSD". I am able to get the correct answer from my code. I have cross checked my solution with that of the data available from MATLAB.

Now when it comes to the computation time, MATLAB is taking much less time than that of my .f90 code. The main motivation for me to write the .f90 code was to do heavy computations, as MATLAB was unable to handle this problem (as the matrix size is increased, it takes more and more time). I am talking of the order of matrix around (10^6 x 10^6).

I know that I may be lacking somewhere around the vectorization or parallelization of code (as I am new to it). But would it make any difference? As the subroutine "DGELSD" is already highly optimized. Also I am using Intel ifort compiler and visual studio as an editor.

I have attached the part of the main code below for reference. Please suggest what can be done to decrease the computation time. I am having good hardware configuration to run heavy computation. It is extremely required. Please help.

Workstation specification: Intel Xeon 32 core - 64 bit processor, 64 GB RAM.

Code information:

a) I have used 'DGELSD' example available on Intel MKL example site..

b) I am using x64 architecture.

c) This is the code I am using in MATLAB for comparison

function min_norm_check(m,n)
tic
a=15*m*n;
b=15*m*n;
c=6;
A=rand(a,b);
B=rand(b,c);
C=A\B;
toc
end

! Program to check the time required to
! calculate linear least square solution 
! using DGELSD subroutine

PROGRAM MAIN

      IMPLICIT NONE
      
      INTEGER :: M,N,LDA,LDB,NRHS,NB,NG

      REAL T1,T2,START,FINISH
      
      DOUBLE PRECISION, DIMENSION (:,:), ALLOCATABLE :: D_CAP,A,B,TG,DG
      
      INTEGER :: I=0

      NB=10
      NG=10
      
      M = 15*NB*NG
      N = 15*NB*NG
      NRHS = 6
!!
      
      LDA   = MAX(M,N)
      LDB   = MAX(M,NRHS)
!
      ALLOCATE (A(LDA,N))
      ALLOCATE (B(LDB,NRHS))


      A = 0
      B = 0
      
      CALL RANDOM_NUMBER(A)
      
      CALL RANDOM_NUMBER(B)
      
      CALL CPU_TIME(START)


      DO I=1,1
          
          WRITE(*,*) I
          
          CALL CALC_MIN_NORM_D(M,N,LDA,LDB,NRHS,A,B,D_CAP)   
          
      ENDDO
      
      CALL CPU_TIME(FINISH)
      
      WRITE(*,*)'("TIME =',FINISH-START,'SECONDS.")'

      END

      SUBROUTINE CALC_MIN_NORM_D(M,N,LDA,LDB,NRHS,A,B,D_CAP)
!
!     SUBROUTINE DEFINITION TO CALCULATE THE 
!     LINEAR LEAST SQUARE SOLUTION OF (^-1*B)
!
      IMPLICIT NONE

      INTEGER :: M,N,NRHS,LDA,LDB,LWMAX,INFO,LWORK,RANK

      DOUBLE PRECISION RCOND

      INTEGER, ALLOCATABLE, DIMENSION(:) :: IWORK
      
      DOUBLE PRECISION :: A(LDA,N),B(LDB,NRHS),D_CAP(LDB,NRHS) 
      DOUBLE PRECISION, ALLOCATABLE, DIMENSION(:) :: S,WORK
!
      WRITE(*,*)'IN CALC MIN NORM BEGINING'
      ...

Thanks

Pankaj

jimdempseyatthecove · ‎03-26-2017

Please show your compile and link options. Version number of ifort may be helpful too.

Jim Dempsey

pankaj_p_ · ‎03-26-2017

Hello Jimdempseyatthecove

I have attached the build log having compilation information:

---------------------------------------------------------------------------------------------------------

Compiling with Intel(R) Visual Fortran Compiler 17.0.2.187 [IA-32]...
ifort /nologo /debug:full /Od /warn:interfaces /module:&quotDebug\\&quot /object:&quotDebug\\&quot /Fd&quotDebug\vc110.pdb&quot /traceback /check:bounds /check:stack /libs:dll /threads /dbglibs /c /Qlocation,link,&quotC:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\\bin&quot /Qm32 &quotD:\Google Drive\Friday_May_27_2016\Mtech Thesis Job\All new work\Fortran coding\fortran hfgmc\Console6\Console6\Console6.f90&quot
Linking...
Link /OUT:&quotDebug\Console6.exe&quot /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:&quotDebug\Console6.exe.intermediate.manifest&quot /MANIFESTUAC:&quotlevel='asInvoker' uiAccess='false'&quot /DEBUG /PDB:&quotD:\Google Drive\Friday_May_27_2016\Mtech Thesis Job\All new work\Fortran coding\Console6\Console6\Debug\Console6.pdb&quot /SUBSYSTEM:CONSOLE /IMPLIB:&quotD:\Google Drive\Friday_May_27_2016\Mtech Thesis Job\All new work\Fortran coding\fortran hfgmc\Console6\Console6\Debug\Console6.lib&quot -qm32 &quotDebug\Console6.obj&quot
Embedding manifest...
mt.exe /nologo /outputresource:&quotD:\Google Drive\Friday_May_27_2016\Mtech Thesis Job\All new work\Fortran coding\fortran hfgmc\Console6\Console6\Debug\Console6.exe;#1&quot /manifest &quotDebug\Console6.exe.intermediate.manifest&quot

Console6 - 0 error(s), 0 warning(s)

--------------------------------------------------------------------------------------------------------------------------

Also for further information, I have attached the small report on the time taken by Intel visual fortran compiler and MATLAB respectively.

I have compiled the program and the output has been written in file 'OUTPUT_FILE.TXT'.

IN CALC MIN NORM BEGINING

DGELSD EXAMPLE PROGRAM RESULTS

EFFECTIVE RANK = 1500

IN CALC MIN NORM ENDING

("TIME TAKEN TO SOLVE DGELSD= 4.290028 SECONDS.")

On the other hand MATLAB gives the following result in the output command window:

min_norm_check(10,10)

Elapsed time is 0.224172 seconds.

-------------------------------------------------------------------------

Also I run my programs from visual studio. I don't use command line interface. And thus I am using default compile and link options (but I had already included intel mkl library). Also in the properties of the program I find many options in Linker. So can you please tell me which information is required and how can I figure it out on my system.

Thanks again

Jimdempseyatthecove

mecej4 · ‎03-26-2017

The options /Od means "disable optimizations" and when you specify /debug and various checking options, extra code may be placed in the EXE to implement those requests. There are few instances where it makes sense to time runs of unoptimized code or to compare the run times of two pieces of software, only one of which has been optimized.

Furthermore, versions of Matlab released in recent years have included MKL DLLs, so there should be no difference in the time taken in the actual execution of the DGELSD call itself. What you are measuring may actually be a difference in gathering and organizing the input data to DGELSD and reporting the output.

pankaj_p_ · ‎03-26-2017

thanks mecej4,

But i just wanted to make sure that computations involving "DGELSD" should take less time.

Thanks

mecej4 · ‎03-26-2017

This thread should probably be moved to the MKL forum.

pankaj pandya wrote:

But i just wanted to make sure that computations involving "DGELSD" should take less time.

How did you "make sure"? By making the other parts of your program slow by turning off optimization and turning on run-time checking?

From the IFort documentation of CPU_TIME:

If you want to estimate performance or scaling of multithreaded applications, you should use intrinsic subroutine SYSTEM_CLOCK or portability function DCLOCK. Both of these routines return the elapsed time from a single clock.

In effect, you are comparing the sum of the times consumed in many simultaneous threads with time on the wall clock.

Secondly, your Matlab code does not ask for a minimum norm solution at all, since you create and use a square matrix of full rank. In Matlab, the "mldivide" operator is overloaded. Matlab is probably using DGESV, but your Fortran code is calling DGELSD. Similarly, in your Fortran code you are using GELSD for a problem to solve which there are better routines. Therefore, your comparison of run times is of no use.

Your code for sizing and allocating the work array WORK is wrong. In short, you make a workspace query, and then ignore the returned information. In the case when WORK(1) > 100,000,000, your second call to DGELSD will fail.

Please read the documentation of GELSD in the MKL manual and follow its prescriptions.