Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Waht is wrong with cblas_dgemv

pawan_k_
Beginner
1,829 Views

Hello,

   I am interested in the following operation:

Y = A^T * X

using the cblas_dgemv or cblas_dgemm routine in MKL. In the code I use small letters: y, a, and x.

I have a test example as follows:

#include <cstdio>
#include <cstdlib>
#include "mkl.h"

void main()
{

 double a[6] = {2., 3., 1., 1., 0., 2. }; // 3 x 2 matrix
  double x[3] = {1., 1., 1.};
  double y[2] = {0., 0.};
  int m = 3;
  int n = 2;
  double alpha = 1.0;
  double beta = 1.0;
  int incx = 1;
  int incy = 1;
  int i,j;

  for (i=0; i<m; i++){
    for (j=0; j<n; j++)
      printf("%g ", a[i*n + j]);
    printf("\n");
  }

  cblas_dgemv(CblasRowMajor, CblasTrans, m, n, alpha, a, m, x, incx, beta, y, incy );

  for (i=0; i<n; i++)
    printf("%g\n", y);

}

After running the code, I get wrong results. Can someone point out the mistake?

Regards,

Pawan

0 Kudos
2 Solutions
mecej4
Honored Contributor III
1,829 Views

The seventh argument to cblas_dgemv should be n. The convention is that a matrix is passed by specifying four attributes:

  •  the address of the base, i.e., the address of the (1,1) element in Fortran or the [0][0] element in C/C++, 
  • the number of rows, m
  • the number of columns, n
  • the stride from one row to the next, if using row-major order as in C/C++, or the stride from one column to the next, if using column-major order as in Fortran.

You are passing a 3 X 2 matrix using row-major order; therefore, the stride from each row to the next is 2.

There is some avoidable confusion in the MKL documentation; this is caused by using Fortran-centric terms such as "leading dimension", which a C/C++ programmer may take to be something different.

View solution in original post

0 Kudos
Ying_H_Intel
Employee
1,829 Views

Hi Pawan,

Thank you a lot for your comments about dgemv and dgemm.

Just add some

based on the documentation of dgemm: Cmxn=op(A)mxk*op(B)kxn

m INTEGER. Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero

k, INTEGER. Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B).

Where the documetnation of dgemv, the m. n is from A and LDA is  A.

m INTEGER. Specifies the number of rows of the matrix A. The value of m must be at least zero.
n INTEGER. Specifies the number of columns of the matrix A. The value of n must be at least zero.

So you are exctaly right, that for dgemm, we are required to put number of rows and number of colums of A^T when transA=T.

But LDA is  the stride of A both the dgemv and dgemm, whatever transA= N or T.

I guess, you may have tried

1. cblas_dgemm(CblasRowMajor, CblasTrans, CblasNoTrans, 2, 1, 3,  1.0, a, 2, x, 1, 0, y, 1);   get y=(6, 3)

2. cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 2, 1, 3,  1.0, a, 3, x, 1, 0, y, 1); get y=(3,6)

thus happen to get the conclustion of  LDA=A.nrows when trans='N'.  Actually, it is not correct because the A are different under the two cases.  for example, if you change the original A =  {2., 3., 5., 1., 0., 2. }, you will see the difference more obviously.

2 3 )T        1      m=3, n=2                                     2   3    5               1        m=2, n=3

5 1     *     1   =   7                                               1   0     2   *          1   =     10

(0 2    *      1        6                                                                          1            3

Hope it can lighten some confustion.

Best Regards,

Ying

 

 

 

View solution in original post

0 Kudos
13 Replies
mecej4
Honored Contributor III
1,830 Views

The seventh argument to cblas_dgemv should be n. The convention is that a matrix is passed by specifying four attributes:

  •  the address of the base, i.e., the address of the (1,1) element in Fortran or the [0][0] element in C/C++, 
  • the number of rows, m
  • the number of columns, n
  • the stride from one row to the next, if using row-major order as in C/C++, or the stride from one column to the next, if using column-major order as in Fortran.

You are passing a 3 X 2 matrix using row-major order; therefore, the stride from each row to the next is 2.

There is some avoidable confusion in the MKL documentation; this is caused by using Fortran-centric terms such as "leading dimension", which a C/C++ programmer may take to be something different.

0 Kudos
pawan_k_
Beginner
1,829 Views
Thanks. It works now. This was tricky! I will apply this trick also to the dgemm routine. regards, Pawan
0 Kudos
pawan_k_
Beginner
1,829 Views

For dgemm, it was confusing as well. I put an working

example below for someone in need.

Goal: C := alpha A^T * B + beta * C, with alpha = 1; beta = 0;

Use the following:

 cblas_dgemm(CblasRowMajor, CblasTrans, CblasNoTrans, A.ncols(), B.ncols(), A.nrows(),

                       alpha, A.vec, A.ncols(), B.vec, B.ncols(), beta, C.vec, B.ncols())

  Here I take:

     LDA: stride for A = A.ncols();  // it would have been A.nrows(), if transA = 'N'

NOTE: Notice that this is a bit confusing when compared to dgemv, because for dgemm, the number of rows and number of columns of A^T are input. By this logic, one may take LDA to be A.nrows() above, because the rows of A^T have strides of A.nrows(). In other words, although for dgemm, we are required to put number of rows and number of columns of A^T, for LDA we must consider strides for A not for A^T.

     LDB: stride for B = B.ncols();

     LDC: stride for C = B.ncols();

Regards,

Pawan

 

 

0 Kudos
Ying_H_Intel
Employee
1,830 Views

Hi Pawan,

Thank you a lot for your comments about dgemv and dgemm.

Just add some

based on the documentation of dgemm: Cmxn=op(A)mxk*op(B)kxn

m INTEGER. Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero

k, INTEGER. Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B).

Where the documetnation of dgemv, the m. n is from A and LDA is  A.

m INTEGER. Specifies the number of rows of the matrix A. The value of m must be at least zero.
n INTEGER. Specifies the number of columns of the matrix A. The value of n must be at least zero.

So you are exctaly right, that for dgemm, we are required to put number of rows and number of colums of A^T when transA=T.

But LDA is  the stride of A both the dgemv and dgemm, whatever transA= N or T.

I guess, you may have tried

1. cblas_dgemm(CblasRowMajor, CblasTrans, CblasNoTrans, 2, 1, 3,  1.0, a, 2, x, 1, 0, y, 1);   get y=(6, 3)

2. cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 2, 1, 3,  1.0, a, 3, x, 1, 0, y, 1); get y=(3,6)

thus happen to get the conclustion of  LDA=A.nrows when trans='N'.  Actually, it is not correct because the A are different under the two cases.  for example, if you change the original A =  {2., 3., 5., 1., 0., 2. }, you will see the difference more obviously.

2 3 )T        1      m=3, n=2                                     2   3    5               1        m=2, n=3

5 1     *     1   =   7                                               1   0     2   *          1   =     10

(0 2    *      1        6                                                                          1            3

Hope it can lighten some confustion.

Best Regards,

Ying

 

 

 

0 Kudos
pawan_k_
Beginner
1,829 Views
Thanks Ying for a correction. I quote your correction: "But LDA is the stride of A both the dgemv and dgemm, whatever transA= N or T. " Regards, Pawan
0 Kudos
Anas_A_
Beginner
1,829 Views

Hello!

I have a problem with the function dgemv so I thought to post it here. If I should open a new topic please let me know to fix it.

 

Let A1, A2, A3 be 4x4 matrices.

A1 = 5  6  7  8     , A2 = 21 22 23 24  , A3 = 37 38 39 40

       9 10 11 12              25 26 27 28            41 42 43 44

      13 14 15 16             29 30 31 32            45 46 47 48

      17 18 19 20             33 34 35 36             49 50 51 52

I defined in this way in my code

double A[48];

double y[4];

for( int i=0; i<48; i++) {A = i+5;}.

 

I want to multiply the matrix A1 with the vector

 

double pi0 = {0.1, 0.2, 0.3, 0.4};

so I did this

     

		 cblas_dgemv(CblasRowMajor, CblasNoTrans, 4, 4, 1.0, &A[0:15], 4, pi0, 1, 0.0, y, 1);

The result I got is {21, 25, 29,31} , although the true result is {7, 11 , 15 ,19};

One solution that worked was to copy the first 15 values of the matrix A to another matrix B, then when I did
 

int B[16];

        for(int i=0; i<16; i++)
        {B =A;
        }
cblas_dgemv(CblasRowMajor, CblasNoTrans, 4, 4, 1.0, B, 4, pi0, 1, 0.0, y, 1);

it worked.

 

The problem is that I don't understand what went wrong in the first way. Also if I first copy the matrix I think that this will require more time. Please, help me.

Thank you.

0 Kudos
pawan_k_
Beginner
1,829 Views

Hi

Replace &A[0:15] by A.

 Does it work now?

  cblas_dgemv(CblasRowMajor, CblasNoTrans, 4, 4, 1.0, A, 4, pi0, 1, 0.0, y, 1);

Regards,

Pawan

0 Kudos
Anas_A_
Beginner
1,829 Views

Hi! Yes, now it works. Could I ask you something? How do we find the lda? I have understood that if we have a matrix mxn and we save it row-by-row then the lda is equal to the number of columns. But, how do we decide in a situation like here? I mean when we have 3 matrices which we have defined firstly the first matrix row-by-row, then the second matrix row-by-row and etc. 

If  we want to compute the product of the matrix A2 (or A3) with the vector pi0 how do we select the lda?

In general how do we decide the lda  in we have P matrices (not 3). (I have a problem like this but with P matrices, where P =0,1,2,3,...)

If you know a book, notes , anything that could help me with that please tell me.

 

Thanks for helping me!

0 Kudos
Ying_H_Intel
Employee
1,829 Views

Hi Anas,

Right, lda is generally equal to the number of columns when A is mxn.

About the complex situation like A[48], it is a 1-D array in C. and

A1={A[0],  A[15]};

A2={A[16],  A[31]};

A3={A[32], .. A[47];

As you see, the A1, A2, A3, can be accessed by regular patten as a 1-D C-array.  The adress of A2[mxn] is &A[16]. and it's mxn elements are continous elements in A[16] .. A[31], so lda is still n. you can just move the first pointer to access A2, A3. for example,   

if  A2 * pio,  call cblas_dgemv(CblasRowMajor, CblasNoTrans, 4, 4, 1.0, &A[16], 4, pi0, 1, 0.0, y, 1);  or A+16

if A3* pio,   call cblas_dgemv(CblasRowMajor, CblasNoTrans, 4, 4, 1.0, A+32 , pi0, 1, 0.0, y, 1);   or &A[32]

The key point is how you access the array in C array. if only the array can be accessed by some regular pattern as a C array, then you can decide the input parameter of A.

Best Regards,

Ying

P.S. Please ignore the syntax I mentioned last time about Array notation like A[0:15]. It is special syntax only for Array notation, can't be taken as a C pointer to pass to MKL funtion.

0 Kudos
Anas_A_
Beginner
1,829 Views

I had serious problems with my pc and I just saw the answer. Ying, you are great! I have no words to express my appreciation. Thank you.

0 Kudos
Eman_H_
Beginner
1,829 Views

can anyone please explain what does each of the parameters incX and incY indicate ?

0 Kudos
mecej4
Honored Contributor III
1,829 Views

Eman H. wrote:

can anyone please explain what does each of the parameters incX and incY indicate ?

These are described in the MKL documentation, as well at http://www.netlib.org/lapack/explore-html/dc/da8/dgemv_8f.html .

The ?gemv routines compute α A x + β y or α AT x + β y, where A is a matrix and x, y are vectors. Sometimes, we wish to extract a row from a matrix and regard that as a vector. In Fortran, storage is by columns, so the distance between successive row elements to be extracted is not equal to 1, as in a column vector of the same matrix, but is equal to the number of columns in the matrix. Similar considerations apply to the destination of the computed vector.

0 Kudos
Ying_H_Intel
Employee
1,829 Views

Hi Eman,

In simple word, the incX is the distance of  each  element  (who will involved the compute A*X)  or the stride of X[1] and X[2].     In most of case, the vector(involved the compute) is continuous,  it is 1. 

In some times, if the element (will involve the compute) is not continuous, but can be accessed by regular way, for example,  X[1], X[3], X[5], then the incX = 2.  

Best Regards,
Ying

0 Kudos
Reply