- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

I am using intel fortran compiler and intel mkl for a performance check. I am passing some array sections to Fortran 77 interface with calls like

<code>call dgemm( transa,transb,sz_s,P,P,& a, Ts_tilde,& sz_s,R_alpha,P,b,tr(:sz_s,:),sz_s)</code>

as evident, tr(:sz_s,:) is not contiguous in memory and the Fortran 77 interface is expecting a continuous block and creating a temporary for this.

What I was wondering is that will there be a difference if I create my temporary array explicitly in the code for tr and copy information from that temporary back and forth before and after the operation, or will that be the same as compiler itself creating the temporary from a performance point of view? I guess compiler will always be more efficient.

And of course any more suggestions to eliminate these temporaries are welcome.

One more point, If I use the Fortran 95 interface of the library apparently, with a similar call on a simpler test problem, no warning is issued for the creation of a temporary. Then I read in the manual of mkl that Fortran 95 interface uses assumed shape arrays which explains why temporaries are not created. Is this the logical way to continue if I can not reshape the above code to work with contiguous blocks?

However at that point, while testing the Fortran95 interfaces, I run into a problem with the support function dsecnd. With the below code using mkl_service module I am getting

dgemm95_test.f90(30): error #6404: This name does not have a type, and must have an explicit type. [DSECND]

t1 = dsecnd()

-----^

Any idea for this problem is also welcome. The simple code for the dsecnd problem is

<code>program dgemm95_test ! some modules for Fortran 95 interface use mkl_service use mkl95_precision use mkl95_blas ! implicit none ! double precision, dimension(4,3) :: a double precision, dimension(6,4) :: b double precision, dimension(5,5) :: r ! result array double precision, dimension(3,2) :: dummy_b ! character(len=1) :: transa character(len=1) :: transb ! double precision :: alpha, beta, t1, t2, t integer :: sz1, sz2 ! initialize some variables alpha = 1.0 beta = 0.0 a = 2.3 b = 4.5 r = 0.0 transa = 'n' transb = 'n' dummy_b = 0.0 ! Fortran 95 interface t1 = dsecnd() call gemm( a, b(4:6,1:3:2), r(2:5,3:4),& transa, transb, alpha, beta ) t2 = dsecnd() ! write(*,*) r dummy_b = r(2:4,4:5) ! end program dgemm95_test</code>

Any help and advice on these points are highly appreciated.

Best regards,

Umut

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Here is an example. With a 2 X 4 matrix A, a 4 X 5 matrix B, and a 2 X 5 matrix C, the program computes (i) α A B + β C, (ii) α A_{1} B_{1} + β C_{1}, where A_{1} is A(2:,:), B_{1} is B(:,2:) and C_{1} is C(2:,2:). You may try the same example in a C.A. system such as Matlab.

program dgmsub implicit none integer m,n,k,i parameter (m=2,n=5,k=4) double precision A(m,k),B(k,n),C(m,n) data A/1.5d0,2.22d0,6.3d0,9d0,1d0,-4d0,0.2d0,7.5d0/ data B/1d0,2d0,3d0,4d0,1d0,2d0,3d0,4d0,1d0,2d0,3d0,4d0, + 1d0,2d0,3d0,4d0,1d0,2d0,3d0,4d0/ data C/4*0d0,6*1d0/ double precision alpha,beta c alpha=0.5d0 beta=-1.2d0 write(*,*)' C = beta * C + alpha * A * B' write(*,*) call dgemm('N','N',m,n,k,alpha,A,m,B,k,beta,C,m) do i=1,m write(*,10)(C(i,1:n)) end do C C Restore C C do i=1,2 C(:,i) = 0d0 end do do i=3,5 C(:,i) = 1d0 end do write(*,*) write(*,*)' C1 = beta * C1 + alpha * A1 * B1, where' write(*,*)' C1 = C(2:,2:), A1 = A(2:,:), B1 = B(:,2:)' write(*,*) call dgemm('N','N',m-1,n-1,k,alpha,A(2,1),m,B(1,2),k,beta, + C(2,2),m) do i=2,m write(*,10)(C(i,2:n)) end do 10 format(1x,5F9.2) end

Link Copied

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

The array section is being assembled because you asked for it. If, instead, you had used **tr **and **ldtr **as the last two arguments to **dgemm**, where **ldtr **is the declared leading dimension of **tr**, you would have used the BLAS routine as it was designed to be, and no array section would be needed. Whether this would be slower or faster than code in which you assemble the sub-matrix yourself is debatable, and would depend on how often this code is executed and with which argument values.

Is dsecnd() declared in the modules that you included? If not, because of the IMPLICIT NONE statement that you included, you should either declare the type of the function, or include/USE a file where that declaration is made available, such as mkl_lapack.fi.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

mecej4 wrote:

The array section is being assembled because you asked for it. If, instead, you had used

trandldtras the last two arguments todgemm, whereldtris the declared leading dimension oftr, you would have used the BLAS routine as it was designed to be, and no array section would be needed.

I did not clearly understand, I need the upper half part of the block of tr in this case. I did not completely get how this is possible with your explanation. Can you maybe rephrase a bit? Say, for instance, tr is of size 3922X4 originally and I need the upper block represented with tr(1:1490,:) where I would like to assign the result in that multiplication. How will that accomplish the correct addressing in this case and moreover how will that do it if I did not start sectioning from 1 but 100, namely, tr(100:1490,:)? Can you comment on these points also a bit?

Thanks.

Umut

Edit: In the below code, indeed if I keep your advice, it does the right thing but if I start from row 2, it creates a temporary, I did not clearly understand the difference between these.

program dgemm77_test ! some modules for Fortran 95 interface use mkl95_precision use mkl95_blas ! implicit none ! double precision, dimension(4,3) :: a double precision, dimension(6,4) :: b double precision, dimension(5,5) :: r ! result array double precision, dimension(3,2) :: dummy_b ! character(len=1) :: transa character(len=1) :: transb ! double precision :: alpha, beta integer :: sz1, sz2, i, j !initialize some variables alpha = 1.0 beta = 0.0 a = 2.3 b = 4.5 r = 0.0 transa = 'n' transb = 'n' dummy_b = 0.0 !Fortran 77 interface !creates a temporary !call dgemm( transa,transb,4,2,3,& ! alpha, a,& ! 4,b,3,beta,r(2:,:),5) ! does not create a temporary ! start adressing from row 1 ! I did not get how the striding is ! done correctly although call dgemm( transa,transb,4,2,3,& alpha, a,& 4,b,3,beta,r,5) ! do i=1,5 write(*,'(1X5E12.5)') (r(i,j),j=1,5) end do end program dgemm77_test

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Here is an example. With a 2 X 4 matrix A, a 4 X 5 matrix B, and a 2 X 5 matrix C, the program computes (i) α A B + β C, (ii) α A_{1} B_{1} + β C_{1}, where A_{1} is A(2:,:), B_{1} is B(:,2:) and C_{1} is C(2:,2:). You may try the same example in a C.A. system such as Matlab.

program dgmsub implicit none integer m,n,k,i parameter (m=2,n=5,k=4) double precision A(m,k),B(k,n),C(m,n) data A/1.5d0,2.22d0,6.3d0,9d0,1d0,-4d0,0.2d0,7.5d0/ data B/1d0,2d0,3d0,4d0,1d0,2d0,3d0,4d0,1d0,2d0,3d0,4d0, + 1d0,2d0,3d0,4d0,1d0,2d0,3d0,4d0/ data C/4*0d0,6*1d0/ double precision alpha,beta c alpha=0.5d0 beta=-1.2d0 write(*,*)' C = beta * C + alpha * A * B' write(*,*) call dgemm('N','N',m,n,k,alpha,A,m,B,k,beta,C,m) do i=1,m write(*,10)(C(i,1:n)) end do C C Restore C C do i=1,2 C(:,i) = 0d0 end do do i=3,5 C(:,i) = 1d0 end do write(*,*) write(*,*)' C1 = beta * C1 + alpha * A1 * B1, where' write(*,*)' C1 = C(2:,2:), A1 = A(2:,:), B1 = B(:,2:)' write(*,*) call dgemm('N','N',m-1,n-1,k,alpha,A(2,1),m,B(1,2),k,beta, + C(2,2),m) do i=2,m write(*,10)(C(i,2:n)) end do 10 format(1x,5F9.2) end

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

mecej4 wrote:

do i=1,2 C(:,i) = 0d0 end do do i=3,5 C(:,i) = 1d0 end do write(*,*) write(*,*)' C1 = beta * C1 + alpha * A1 * B1, where' write(*,*)' C1 = C(2:,2:), A1 = A(2:,:), B1 = B(:,2:)' write(*,*) call dgemm('N','N',m-1,n-1,k,alpha,A(2,1),m,B(1,2),k,beta, + C(2,2),m) do i=2,m write(*,10)(C(i,2:n)) end do 10 format(1x,5F9.2) end

ok, now it makes sense, thanks for the example.

Giving a starting adress like C(2,2)(already passed by reference, which is the default in Fortran, in contrary to C like languages) and the leading dimension makes fortran to fill up the matrix without making temporaries. I guess this interpretation is correct, right?

I should read a bit more carefully on the information that is passed to subroutines when you pass an array. Any pointers on this? And moreover, does this information change with the type of the array, such as an assumed-shape array, assumed-size array, allocatable array, and so on?

Many thanks for the explanations and very helpful example.

Umut

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Giving a starting adress like C(2,2)(already passed by reference, which is the default in Fortran, in contrary to C like languages) and the leading dimension makes fortran to fill up the matrix without making temporaries.Conceptually, and in effect, yes. However, there is no need to "fill up the matrix". Passing by reference is common practice, but is not required by the Fortran standard. In Fortran, many lower level details are left out from the standard as "implementation details".

In older versions of Fortran, array arguments had to be of type "assumed size", and enough information had to be available to compute a linear address from a multi-dimensional array reference. Newer versions of Fortran allow array arguments to be assumed shape, allocatable, and arguments may be optional. For such arguments to be treated properly, additional information (such as interfaces) is required, as specified in the language standard. The actual details of how this information is transmitted are implementation-dependent, and you should not bother with such details if you want your code to be portable.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

assumed shape array requires explicit interface on both sides of the call. The only explicit blas interfaces available for MKL BLAS are the blas95/lapack95 ones. The original BLAS which you called Fortran 77 interface uses what became called assumed size throughout. Allocatable doesn't affect what is passed down to BLAS.

If you restrict your view to Fortran implementations which pass by reference, it's probably OK to consider notation like C(2,2) as passing a C pointer to that element of the array. I don't know whether any BLAS implementations exist on platforms supporting both C and Fortran which don't work this way.

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page