Re: Problems with ifort: GEMM matrix multiplication in BLAS

kevinkarin · ‎04-05-2009

Hi all,

In the subroutine sgemm in BLAS,

sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)

What is ``transa'' and ``transb''?

It says

transa CHARACTER*1. Specifies the form of op(A) used in the matrix
multiplication:

if transa = 'N' or 'n', then op(A) = A;

if transa = 'T' or 't', then op(A) = A';

if transa = 'C' or 'c', then op(A) = conjg(A').

I didn't understand this.

I wrote a program to test sgemm:

[cpp]c matrix multiplication c=a*b
c using sgemm(transa, transb, m, n, k,
c           alpha, a, lda, b, ldb, beta, c, ldc)

      program bar
      real a(3,2),b(2,3),c(3,3)

      data a / 1,2,3,4,5,6 /
      data b / 7,8,9,10,11,12 /

      write(*,*) a(1,1), a(1,2)
      write(*,*) a(2,1), a(2,2)
      write(*,*) a(3,1), a(3,2)
      write(*,*) '-------------------------------------'
      write(*,*) b(1,1), b(1,2), b(1,3)
      write(*,*) b(2,1), b(2,2), b(2,3)
      write(*,*) '-------------------------------------'

      call sgemm('N','N',3,3,2,1,a,3,b,2,1,c,3)

      write(*,*) c(1,1), c(1,2), c(1,3)
      write(*,*) c(2,1), c(2,2), c(2,3)
      write(*,*) c(3,1), c(3,2), c(3,3)

      stop
      end
[/cpp]

and the result is weird, c!=a*b at all, instead, c=0:

$ ifort data.f -lmkl -lguide -liomp5 -lpthread

$ ./a.out
1.000000 4.000000
2.000000 5.000000
3.000000 6.000000
-------------------------------------
7.000000 9.000000 11.00000
8.000000 10.00000 12.00000
-------------------------------------
0.0000000E+00 0.0000000E+00 0.0000000E+00
0.0000000E+00 0.0000000E+00 0.0000000E+00
0.0000000E+00 0.0000000E+00 0.0000000E+00

I am quite sure it's because of my misuse of gemm, but I don't know where my
mistake is.

Any suggestion?

Thanks in advance.

Tabrez_Ali · ‎04-05-2009

program main
!use mkl95_blas
real :: a(3,2), b(2,3), c(3,3)

a=reshape((/ 1.00, 2.00, 3.00, 4.00, 5.00, 6.00 /), shape=(/3,2/))
b=reshape((/ 7.00, 8.00, 9.00, 10.0, 11.0, 12.0 /), shape=(/2,3/))

!call gemm(a,b,c)
call sgemm('N','N',3,3,2,1.0,a,3,b,2,1.0,c,3)
print'(3F10.3)', c
end

stali@dell-4300:~$ ./a.out
39.000 54.000 69.000
49.000 68.000 87.000
59.000 82.000 105.000

kevinkarin · ‎04-05-2009

Quoting - Tabrez Ali

call sgemm('N','N',3,3,2,1.0,a,3,b,2,1.0,c,3)

Thanks!

I changed ``1'' on the alpha and beta position of sgemm to ``1.0'' and it works immediately.

This is so weird. Shouldn't Fortran realize automatically that my ``1'' is a real ``1.0''?

TimP · ‎04-05-2009

Quoting - kevinkarin

I changed ``1'' on the alpha and beta position of sgemm to ``1.0'' and it works immediately.

This is so weird. Shouldn't Fortran realize automatically that my ``1'' is a real ``1.0''?

No, in the absence of an interface block, such as you would get from the MKL include file, Fortran has to assume that 1 corresponds with a default integer. With the interface block, it would throw an error, but at least you know then where the problem is. If you write in the style of 20 years ago, you take on the corresponding responsibility.

kevinkarin · ‎04-06-2009

Quoting - tim18

No, in the absence of an interface block, such as you would get from the MKL include file, Fortran has to assume that 1 corresponds with a default integer. With the interface block, it would throw an error, but at least you know then where the problem is. If you write in the style of 20 years ago, you take on the corresponding responsibility.

I'm sorry. I've learned Fortran for only 3 days.

I am working on a course project dealing with scientific computing on GPUs. I am not familiar with matrix computation at all, so I bought a very old book called ``Matrix Computation'' by Gene Golub. Then I found out that the time is so limited, so I skip that book and tried to use BLAS and LAPACK.

Currently, my program only achieved 5Gflop/s on an ATI GPU which has 2Tflop/s in theory. I will fail the course if I cannot achieve a decent result in two weeks. =( I have to find out what is the problem with the GEMM running on GPUs.

The only book I have about Fortran and LAPCK is ``LINPACK Users' Guide'' by J.J Dongarra,Published by SIAM, 1979. That's why I am writing Fortran in the old style. Is there any better book about Fortran and algorithms in LAPACK? Or any material that maybe helpful

Thanks.

TimP · ‎04-06-2009

Surely, for a result to be expected in 2 weeks, your course material must give some guidance. If you are using a canned BLAS, you take what you get, and maybe do some preliminary research into why it doesn't match the advertisements. Those references are excellent, but the subject you raise is far off topic from the references and this forum.

Tabrez_Ali · ‎04-06-2009

Quoting - kevinkarin

Currently, my program only achieved 5Gflop/s on an ATI GPU which has 2Tflop/s in theory. I will fail the course if I cannot achieve a decent result in two weeks. =( I have to find out what is the problem with the GEMM running on GPUs.

But then why are you using MKL? Shouldnt you be using ACML-GPU.

http://developer.amd.com/gpu/acmlgpu/Pages/default.aspx

Or did I miss something?

kevinkarin · ‎04-08-2009

Quoting - Tabrez Ali

But then why are you using MKL? Shouldnt you be using ACML-GPU.

http://developer.amd.com/gpu/acmlgpu/Pages/default.aspx

Or did I miss something?

Yes, thanks for the URL. I was learning fortran on my laptop with mkl so I used mkl instead to get familiar with LAPACK and BLAS.

kevinkarin · ‎04-08-2009

Quoting - tim18

Surely, for a result to be expected in 2 weeks, your course material must give some guidance. If you are using a canned BLAS, you take what you get, and maybe do some preliminary research into why it doesn't match the advertisements. Those references are excellent, but the subject you raise is far off topic from the references and this forum.

Thanks for the information. This project is expected to be implemented independently by oneself from scratch. My only guidance is Google. Anyway I will try my best.