Solved: advice for getting rid of temporary creation and fortran95 interface for dsecnd

utab · ‎06-29-2014

I am using intel fortran compiler and intel mkl for a performance check. I am passing some array sections to Fortran 77 interface with calls like

<code>call dgemm( transa,transb,sz_s,P,P,&
            a, Ts_tilde,&
            sz_s,R_alpha,P,b,tr(:sz_s,:),sz_s)</code>

as evident, tr(:sz_s,:) is not contiguous in memory and the Fortran 77 interface is expecting a continuous block and creating a temporary for this.

What I was wondering is that will there be a difference if I create my temporary array explicitly in the code for tr and copy information from that temporary back and forth before and after the operation, or will that be the same as compiler itself creating the temporary from a performance point of view? I guess compiler will always be more efficient.

And of course any more suggestions to eliminate these temporaries are welcome.

One more point, If I use the Fortran 95 interface of the library apparently, with a similar call on a simpler test problem, no warning is issued for the creation of a temporary. Then I read in the manual of mkl that Fortran 95 interface uses assumed shape arrays which explains why temporaries are not created. Is this the logical way to continue if I can not reshape the above code to work with contiguous blocks?

However at that point, while testing the Fortran95 interfaces, I run into a problem with the support function dsecnd. With the below code using mkl_service module I am getting

dgemm95_test.f90(30): error #6404: This name does not have a type, and must have an explicit type. [DSECND]
t1 = dsecnd()
-----^

Any idea for this problem is also welcome. The simple code for the dsecnd problem is

<code>program dgemm95_test
! some modules for Fortran 95 interface
use mkl_service
use mkl95_precision
use mkl95_blas
!
implicit none
!
double precision, dimension(4,3) :: a
double precision, dimension(6,4) :: b
double precision, dimension(5,5) :: r ! result array
double precision, dimension(3,2) :: dummy_b
!
character(len=1) :: transa
character(len=1) :: transb
!
double precision :: alpha, beta, t1, t2, t
integer :: sz1, sz2

! initialize some variables
alpha = 1.0
beta = 0.0
a = 2.3
b = 4.5
r = 0.0
transa = 'n'
transb = 'n'
dummy_b = 0.0
! Fortran 95 interface
t1 = dsecnd()
call gemm( a, b(4:6,1:3:2), r(2:5,3:4),&
 transa, transb, alpha, beta )
t2 = dsecnd()
!
write(*,*) r
dummy_b  = r(2:4,4:5)
!
end program dgemm95_test</code>

Any help and advice on these points are highly appreciated.

Best regards,

Umut

mecej4 · ‎06-30-2014

Here is an example. With a 2 X 4 matrix A, a 4 X 5 matrix B, and a 2 X 5 matrix C, the program computes (i) α A B + β C, (ii) α A₁ B₁ + β C₁, where A₁ is A(2:,:), B₁ is B(:,2:) and C₁ is C(2:,2:). You may try the same example in a C.A. system such as Matlab.

      program dgmsub
      implicit none
      integer m,n,k,i
      parameter (m=2,n=5,k=4)
      double precision A(m,k),B(k,n),C(m,n)
      data A/1.5d0,2.22d0,6.3d0,9d0,1d0,-4d0,0.2d0,7.5d0/
      data B/1d0,2d0,3d0,4d0,1d0,2d0,3d0,4d0,1d0,2d0,3d0,4d0,
     +      1d0,2d0,3d0,4d0,1d0,2d0,3d0,4d0/
      data C/4*0d0,6*1d0/
      double precision alpha,beta
c
      alpha=0.5d0
      beta=-1.2d0
      write(*,*)' C = beta * C + alpha * A * B'
      write(*,*)
      call dgemm('N','N',m,n,k,alpha,A,m,B,k,beta,C,m)
      do i=1,m
         write(*,10)(C(i,1:n))
      end do
C
C     Restore C
C
      do i=1,2
         C(:,i) = 0d0
      end do
      do i=3,5
         C(:,i) = 1d0
      end do
      write(*,*)
      write(*,*)' C1 = beta * C1 + alpha * A1 * B1, where'
      write(*,*)' C1 = C(2:,2:), A1 = A(2:,:), B1 = B(:,2:)'
      write(*,*)
      call dgemm('N','N',m-1,n-1,k,alpha,A(2,1),m,B(1,2),k,beta,
     +           C(2,2),m)
      do i=2,m
         write(*,10)(C(i,2:n))
      end do
   10 format(1x,5F9.2)
      end

View solution in original post

mecej4 · ‎06-29-2014

The array section is being assembled because you asked for it. If, instead, you had used tr and ldtr as the last two arguments to dgemm, where ldtr is the declared leading dimension of tr, you would have used the BLAS routine as it was designed to be, and no array section would be needed. Whether this would be slower or faster than code in which you assemble the sub-matrix yourself is debatable, and would depend on how often this code is executed and with which argument values.

Is dsecnd() declared in the modules that you included? If not, because of the IMPLICIT NONE statement that you included, you should either declare the type of the function, or include/USE a file where that declaration is made available, such as mkl_lapack.fi.

utab · ‎06-29-2014

mecej4 wrote:

The array section is being assembled because you asked for it. If, instead, you had used tr and ldtr as the last two arguments to dgemm, where ldtr is the declared leading dimension of tr, you would have used the BLAS routine as it was designed to be, and no array section would be needed.

I did not clearly understand, I need the upper half part of the block of tr in this case. I did not completely get how this is possible with your explanation. Can you maybe rephrase a bit? Say, for instance, tr is of size 3922X4 originally and I need the upper block represented with tr(1:1490,:) where I would like to assign the result in that multiplication. How will that accomplish the correct addressing in this case and moreover how will that do it if I did not start sectioning from 1 but 100, namely, tr(100:1490,:)? Can you comment on these points also a bit?

Thanks.

Umut

Edit: In the below code, indeed if I keep your advice, it does the right thing but if I start from row 2, it creates a temporary, I did not clearly understand the difference between these.

program dgemm77_test
! some modules for Fortran 95 interface
use mkl95_precision
use mkl95_blas
!
implicit none
!
double precision, dimension(4,3) :: a
double precision, dimension(6,4) :: b
double precision, dimension(5,5) :: r ! result array
double precision, dimension(3,2) :: dummy_b
!
character(len=1) :: transa
character(len=1) :: transb
!
double precision :: alpha, beta
integer :: sz1, sz2, i, j
!initialize some variables
alpha = 1.0
beta = 0.0
a = 2.3
b = 4.5
r = 0.0
transa = 'n'
transb = 'n'
dummy_b = 0.0
!Fortran 77 interface
!creates a temporary
!call dgemm( transa,transb,4,2,3,&
!     alpha, a,&
!     4,b,3,beta,r(2:,:),5)


! does not create a temporary
! start adressing from row 1
! I did not get how the striding is
! done correctly although
call dgemm( transa,transb,4,2,3,&
     alpha, a,&
     4,b,3,beta,r,5)
!
do i=1,5
   write(*,'(1X5E12.5)') (r(i,j),j=1,5)
end do

end program dgemm77_test

mecej4 · ‎06-30-2014

Here is an example. With a 2 X 4 matrix A, a 4 X 5 matrix B, and a 2 X 5 matrix C, the program computes (i) α A B + β C, (ii) α A₁ B₁ + β C₁, where A₁ is A(2:,:), B₁ is B(:,2:) and C₁ is C(2:,2:). You may try the same example in a C.A. system such as Matlab.

      program dgmsub
      implicit none
      integer m,n,k,i
      parameter (m=2,n=5,k=4)
      double precision A(m,k),B(k,n),C(m,n)
      data A/1.5d0,2.22d0,6.3d0,9d0,1d0,-4d0,0.2d0,7.5d0/
      data B/1d0,2d0,3d0,4d0,1d0,2d0,3d0,4d0,1d0,2d0,3d0,4d0,
     +      1d0,2d0,3d0,4d0,1d0,2d0,3d0,4d0/
      data C/4*0d0,6*1d0/
      double precision alpha,beta
c
      alpha=0.5d0
      beta=-1.2d0
      write(*,*)' C = beta * C + alpha * A * B'
      write(*,*)
      call dgemm('N','N',m,n,k,alpha,A,m,B,k,beta,C,m)
      do i=1,m
         write(*,10)(C(i,1:n))
      end do
C
C     Restore C
C
      do i=1,2
         C(:,i) = 0d0
      end do
      do i=3,5
         C(:,i) = 1d0
      end do
      write(*,*)
      write(*,*)' C1 = beta * C1 + alpha * A1 * B1, where'
      write(*,*)' C1 = C(2:,2:), A1 = A(2:,:), B1 = B(:,2:)'
      write(*,*)
      call dgemm('N','N',m-1,n-1,k,alpha,A(2,1),m,B(1,2),k,beta,
     +           C(2,2),m)
      do i=2,m
         write(*,10)(C(i,2:n))
      end do
   10 format(1x,5F9.2)
      end

utab · ‎06-30-2014

mecej4 wrote:

do i=1,2 C(:,i) = 0d0 end do do i=3,5 C(:,i) = 1d0 end do write(*,*) write(*,*)' C1 = beta * C1 + alpha * A1 * B1, where' write(*,*)' C1 = C(2:,2:), A1 = A(2:,:), B1 = B(:,2:)' write(*,*) call dgemm('N','N',m-1,n-1,k,alpha,A(2,1),m,B(1,2),k,beta, + C(2,2),m) do i=2,m write(*,10)(C(i,2:n)) end do 10 format(1x,5F9.2) end

ok, now it makes sense, thanks for the example.

Giving a starting adress like C(2,2)(already passed by reference, which is the default in Fortran, in contrary to C like languages) and the leading dimension makes fortran to fill up the matrix without making temporaries. I guess this interpretation is correct, right?

I should read a bit more carefully on the information that is passed to subroutines when you pass an array. Any pointers on this? And moreover, does this information change with the type of the array, such as an assumed-shape array, assumed-size array, allocatable array, and so on?

Many thanks for the explanations and very helpful example.

Umut

mecej4 · ‎06-30-2014

Giving a starting adress like C(2,2)(already passed by reference, which is the default in Fortran, in contrary to C like languages) and the leading dimension makes fortran to fill up the matrix without making temporaries.

Conceptually, and in effect, yes. However, there is no need to "fill up the matrix". Passing by reference is common practice, but is not required by the Fortran standard. In Fortran, many lower level details are left out from the standard as "implementation details".

In older versions of Fortran, array arguments had to be of type "assumed size", and enough information had to be available to compute a linear address from a multi-dimensional array reference. Newer versions of Fortran allow array arguments to be assumed shape, allocatable, and arguments may be optional. For such arguments to be treated properly, additional information (such as interfaces) is required, as specified in the language standard. The actual details of how this information is transmitted are implementation-dependent, and you should not bother with such details if you want your code to be portable.

TimP · ‎06-30-2014

assumed shape array requires explicit interface on both sides of the call. The only explicit blas interfaces available for MKL BLAS are the blas95/lapack95 ones. The original BLAS which you called Fortran 77 interface uses what became called assumed size throughout. Allocatable doesn't affect what is passed down to BLAS.

If you restrict your view to Fortran implementations which pass by reference, it's probably OK to consider notation like C(2,2) as passing a C pointer to that element of the array. I don't know whether any BLAS implementations exist on platforms supporting both C and Fortran which don't work this way.