Correct way to compile and call blas subroutines in F95

Cayo_G_ · ‎02-10-2021

Hello,

I am having a lot of trouble to understand how to use fortran 95 versions of BLAS subroutines.

For example, I would like to use dgemm to perform an outer product of two vectors in a subroutine that is inside a module, called by a main program.

I am trying a simple test, take a vector [1,2,3] and do the outer product with itself, which should return me the matrix:

1 2 3
2 4 6
3 6 9

So, at the beginning of my module I use:

include "blas.f90"

In my subroutine where I want to perform this operation, I try:

real(dp) :: matrix(3,3),w(3,1),wt(1,3),u(3)
matrix = 0.0_dp
u = [1.0_dp,2.0_dp,3.0_dp]
w(:,1)=u(:)
wt(1,:)=u(:)
!call dgemm('T', 'N', 3, 3, 1, 1.0_dp, u, 1, u, 1, 1.0_dp, matrix, 3)  ! WORKS
!call dgemm('N', 'N', 3, 3, 1, 1.0_dp, w, 3, wt, 1, 1.0_dp, matrix, 3) ! WORKS
call dgemm_f95(w,wt,matrix)                       ! DO NOT WORK
call dgemm_f95(w,wt,matrix,'N','N',1.0_dp,1.0_dp) ! DO NOT WORK
call dgemm_f95(u,u,matrix)                        ! DO NOT WORK
write(*,'(3f5.2)')matrix(1,:)
write(*,'(3f5.2)')matrix(2,:)
write(*,'(3f5.2)')matrix(3,:)

Any of the uncommented call commands works, returning me a zero matrix, as if dgemm_f95 is not doing anything.

The compilation is a little complex because I do a lot of different stuff in the main program, and I wanted to ensure 64 bits inetgers:

h5pfc -c -qopenmp -i8 -I${MKLROOT}/include/intel64/ilp64 -mkl=parallel ${MKLROOT}/lib/intel64/libmkl_blas95_ilp64.a ${MKLROOT}/lib/intel64/libmkl_lapack95_ilp64.a -liomp5 -lpthread -lm -ldl module.f90
h5pfc -c -qopenmp -i8 -I${MKLROOT}/include/intel64/ilp64 -mkl=parallel ${MKLROOT}/lib/intel64/libmkl_blas95_ilp64.a ${MKLROOT}/lib/intel64/libmkl_lapack95_ilp64.a -liomp5 -lpthread -lm -ldl main.f90 
h5pfc    -qopenmp -i8 -I${MKLROOT}/include/intel64/ilp64 -mkl=parallel ${MKLROOT}/lib/intel64/libmkl_blas95_ilp64.a ${MKLROOT}/lib/intel64/libmkl_lapack95_ilp64.a -liomp5 -lpthread -lm -ldl main.o module.o -o exe

I would like also to do the same for cross products with these vectors, but I am lost which BLAS subroutine does that.

Can someone help me with this, specially if I am using somehow the wrong flags?

Many thanks in advance,

Cayo Gonçalves

mecej4 · ‎02-10-2021

Instead of haphazardly trying various calls as you did, please follow the directions given in the MKL documentation, and study the example driver source files that are provided with MKL. Instead of resorting to recipes, trust reasoning.

You should have almost no occasion for using routines with _f95 in their names. Instead, use generic names such as DGEMM, and make sure that in any routine that calls DGEMM you have a USE BLAS95 statement (and similarly a USE LAPACK95 statement for routine that invoke LAPACK95 routines). The module files for these USE statements can be generated by compiling the BLAS.f90 and LAPACK.f90 source files in the MKL include directory, after copying them to a working directory, and then placing the resulting *.MOD files in a place where the compiler can access them (see the -I and -module options).

Cayo_G_ · ‎02-11-2021

Hello mecej4, thanks for your reply.

I am sorry but MKL documentation is extremely not user friendly. The examples I found were all using fortran 77, and the idea of creating 95 versions of BLAS functions and subroutines is to simplify a lot their use, by excluding the need to specify nowadays normally useless information as dimensions and increments. Would be very helpful if in the MKL documention on the web, for each function or subroutine it was included an example of code in f77 and f95, since the calls, the compiling flags, and include and use statements are different.

Otherwise it becomes very misleading like what is happening to me. I obviously don't do several calls one after the other as in my example, I do one by one. I just showed the calls I tried by following MKL documentation. Since the direct application of what is in the documentation didn't work, I tried to use different ways of declaring the variables.

Besides, your suggestion didn't work, giving a segmentation fault on the execution. I tried the most simple test, compiling with the flag -mkl only:

include "blas.f90"
program test
use blas95

integer, parameter :: dp = selected_real_kind(15, 307)
real(dp) :: v1(3),v2(3,1),v3(1,3),m1(3,3)

v1 = [1.0_dp,2.0_dp,3.0_dp]
v2(:,1) = v1
v3(1,:) = v1

m1 = 0.0_dp
call dgemm(v1,v1,m1)
write(*,'(3f6.1)')m1(1,:)
write(*,'(3f6.1)')m1(2,:)
write(*,'(3f6.1)')m1(3,:)

end program test

You see, either the f95 in badly implemented, or the way of using it is not that trivial as you think.

mecej4 · ‎02-13-2021

Cayo_G, using the BLAS95 interface enables convenient calls with short argument lists, but the user has the responsibility to follow the language rules regarding optional arguments, etc. When it comes to numerical analysis and mathematics, mathematical correctness overrides so-called user-friendliness.

Prasanth showed you the F77 way of calling GEMM. Here is the F95+ way:

!include "blas.f90"
program test
use blas95

integer, parameter :: dp = selected_real_kind(15, 307)
real(dp) :: v1(3),v2(3,1),v3(1,3),m1(3,3)

v1 = [1.0_dp,2.0_dp,3.0_dp]
v2(:,1) = v1
v3(1,:) = v1

!m1 = 0.0_dp
call gemm(v2,v3,m1)
write(*,'(3f6.1)')m1(1,:)
write(*,'(3f6.1)')m1(2,:)
write(*,'(3f6.1)')m1(3,:)

end program test

Your call,

call dgemm(v1,v1,m1)

does not make any sense at all since v1, being a vector, i.e., a 3 X1 matrix, cannot be multiplied by itself.

PrasanthD_intel · ‎02-12-2021

Hi Cayo,

Could you replace the dgemm call parameters with example given in this documentation (Multiplying Matrices Using dgemm Multiplying Matrices Using dgemm (intel.com) ).

I am attaching the code please try and let me know if this helps:

include "blas.f90"
program test
use blas95

integer, parameter :: dp = selected_real_kind(15, 307)
DOUBLE PRECISION ALPHA, BETA
INTEGER          M, K, N
PARAMETER        (M=3, K=1, N=3)
real(dp) :: v1(3),v2(3,1),v3(1,3),m1(3,3)

v1 = [1.0_dp,2.0_dp,3.0_dp]
v2(:,1) = v1
v3(1,:) = v1

ALPHA = 1.0
BETA = 0.0


m1 = 0.0_dp
CALL DGEMM('N','N',M,N,K,ALPHA,v2,M,v3,K,BETA,m1,M)
!call dgemm(v1,v1,m1)
write(*,'(3f6.1)')m1(1,:)
write(*,'(3f6.1)')m1(2,:)
write(*,'(3f6.1)')m1(3,:)

end program test

Regards

Prasanth

PrasanthD_intel · ‎02-18-2021

Hi Cayo,

The F95+ code snippet gave by mecej4 solves your segmentation problem.

Let us know if your issue is resolved so we can close the thread.

Regards

Prasanth

PrasanthD_intel · ‎02-25-2021

Hi Cayo,

We are forwarding this query to the internal team.

Regards

Prasanth

Gennady_F_Intel · ‎03-05-2021

You may see the f90 blas example into mklroot\examples\blas95\source\dgemmx.f90 and check how to build it by using the makefile. The example shows how to make the GEMM call for double precision.

The topic is closing and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.