Showing results for

- Intel Community
- Software Development SDKs and Libraries
- Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library
- Waht is wrong with cblas_dgemv

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Highlighted

pawan_k_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-06-2014
10:09 AM

54 Views

Hello,

I am interested in the following operation:

Y = A^T * X

using the cblas_dgemv or cblas_dgemm routine in MKL. In the code I use small letters: y, a, and x.

I have a test example as follows:

#include <cstdio>

#include <cstdlib>

#include "mkl.h"

void main()

{

double a[6] = {2., 3., 1., 1., 0., 2. }; // 3 x 2 matrix

double x[3] = {1., 1., 1.};

double y[2] = {0., 0.};

int m = 3;

int n = 2;

double alpha = 1.0;

double beta = 1.0;

int incx = 1;

int incy = 1;

int i,j;

for (i=0; i<m; i++){

for (j=0; j<n; j++)

printf("%g ", a[i*n + j]);

printf("\n");

}

cblas_dgemv(CblasRowMajor, CblasTrans, m, n, alpha, a, m, x, incx, beta, y, incy );

for (i=0; i<n; i++)

printf("%g\n", y*);*

}

After running the code, I get wrong results. Can someone point out the mistake?

Regards,

Pawan

Accepted Solutions

Highlighted

mecej4

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-06-2014
01:03 PM

54 Views

The seventh argument to cblas_dgemv should be n. The convention is that a matrix is passed by specifying four attributes:

- the address of the base, i.e., the address of the (1,1) element in Fortran or the [0][0] element in C/C++,
- the number of rows, m
- the number of columns, n
- the stride from one row to the next, if using row-major order as in C/C++, or the stride from one column to the next, if using column-major order as in Fortran.

You are passing a 3 X 2 matrix using row-major order; therefore, the stride from each row to the next is 2.

There is some avoidable confusion in the MKL documentation; this is caused by using Fortran-centric terms such as "leading dimension", which a C/C++ programmer may take to be something different.

Highlighted

Ying_H_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-06-2014
10:25 PM

54 Views

Hi Pawan,

Thank you a lot for your comments about dgemv and dgemm.

Just add some

based on the documentation of dgemm: C_{mxn=}op(A)_{mxk}*op(B)_{kxn}

m INTEGER. Specifies the number of rows of the matrix **op(A) **and of the matrix C. The value of m must be at least zero

k, INTEGER. Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B).

Where the documetnation of dgemv, the m. n is from A and LDA is A.

m INTEGER. Specifies the number of rows of the matrix A. The value of m must be at least zero.

n INTEGER. Specifies the number of columns of the matrix A. The value of n must be at least zero.

So you are exctaly right, that for dgemm, we are required to put number of rows and number of colums of A^T when transA=T.

But **LDA is the stride of A both the dgemv and dgemm, whatever transA= N or T.**

I guess, you may have tried

1. cblas_dgemm(CblasRowMajor, CblasTrans, CblasNoTrans, 2, 1, 3, 1.0, a, 2, x, 1, 0, y, 1); get y=(6, 3)

2. cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 2, 1, 3, 1.0, a, 3, x, 1, 0, y, 1); get y=(3,6)

thus happen to get the conclustion of LDA=A.nrows when trans='N'. Actually, it is not correct because the A are different under the two cases. for example, if you change the original A = {2., 3., **5.,** 1., 0., 2. }, you will see the difference more obviously.

2 3 )T 1 m=3, n=2 2 3 5 1 m=2, n=3

5 1 * 1 = 7 1 0 2 * 1 = 10

(0 2 * 1 6 1 3

Hope it can lighten some confustion.

Best Regards,

Ying

13 Replies

Highlighted

mecej4

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-06-2014
01:03 PM

55 Views

The seventh argument to cblas_dgemv should be n. The convention is that a matrix is passed by specifying four attributes:

- the address of the base, i.e., the address of the (1,1) element in Fortran or the [0][0] element in C/C++,
- the number of rows, m
- the number of columns, n
- the stride from one row to the next, if using row-major order as in C/C++, or the stride from one column to the next, if using column-major order as in Fortran.

You are passing a 3 X 2 matrix using row-major order; therefore, the stride from each row to the next is 2.

There is some avoidable confusion in the MKL documentation; this is caused by using Fortran-centric terms such as "leading dimension", which a C/C++ programmer may take to be something different.

Highlighted
Thanks. It works now.
This was tricky! I will apply this trick also to the dgemm routine.
regards,
Pawan

pawan_k_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-06-2014
03:08 PM

54 Views

Highlighted

pawan_k_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-06-2014
04:23 PM

54 Views

For dgemm, it was confusing as well. I put an working

example below for someone in need.

Goal: C := alpha A^T * B + beta * C, with alpha = 1; beta = 0;

Use the following:

cblas_dgemm(CblasRowMajor, CblasTrans, CblasNoTrans, A.ncols(), B.ncols(), A.nrows(),

alpha, A.vec, A.ncols(), B.vec, B.ncols(), beta, C.vec, B.ncols())

Here I take:

LDA: stride for A = A.ncols(); // it would have been A.nrows(), if transA = 'N'

NOTE: Notice that this is a bit confusing when compared to dgemv, because for dgemm, the number of rows and number of columns of A^T are input. By this logic, one may take LDA to be A.nrows() above, because the rows of A^T have strides of A.nrows(). In other words, although for dgemm, we are required to put number of rows and number of columns of A^T, for LDA we must consider strides for A not for A^T.

LDB: stride for B = B.ncols();

LDC: stride for C = B.ncols();

Regards,

Pawan

Highlighted

Ying_H_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-06-2014
10:25 PM

55 Views

Hi Pawan,

Thank you a lot for your comments about dgemv and dgemm.

Just add some

based on the documentation of dgemm: C_{mxn=}op(A)_{mxk}*op(B)_{kxn}

m INTEGER. Specifies the number of rows of the matrix **op(A) **and of the matrix C. The value of m must be at least zero

k, INTEGER. Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B).

Where the documetnation of dgemv, the m. n is from A and LDA is A.

m INTEGER. Specifies the number of rows of the matrix A. The value of m must be at least zero.

n INTEGER. Specifies the number of columns of the matrix A. The value of n must be at least zero.

So you are exctaly right, that for dgemm, we are required to put number of rows and number of colums of A^T when transA=T.

But **LDA is the stride of A both the dgemv and dgemm, whatever transA= N or T.**

I guess, you may have tried

1. cblas_dgemm(CblasRowMajor, CblasTrans, CblasNoTrans, 2, 1, 3, 1.0, a, 2, x, 1, 0, y, 1); get y=(6, 3)

2. cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 2, 1, 3, 1.0, a, 3, x, 1, 0, y, 1); get y=(3,6)

thus happen to get the conclustion of LDA=A.nrows when trans='N'. Actually, it is not correct because the A are different under the two cases. for example, if you change the original A = {2., 3., **5.,** 1., 0., 2. }, you will see the difference more obviously.

2 3 )T 1 m=3, n=2 2 3 5 1 m=2, n=3

5 1 * 1 = 7 1 0 2 * 1 = 10

(0 2 * 1 6 1 3

Hope it can lighten some confustion.

Best Regards,

Ying

Highlighted
Thanks Ying for a correction. I quote your correction:
"But LDA is the stride of A both the dgemv and dgemm, whatever transA= N or T. "
Regards,
Pawan

pawan_k_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-07-2014
01:19 AM

54 Views

Highlighted

Anas_A_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-06-2014
07:53 AM

54 Views

Hello!

I have a problem with the function dgemv so I thought to post it here. If I should open a new topic please let me know to fix it.

Let A1, A2, A3 be 4x4 matrices.

A1 = 5 6 7 8 , A2 = 21 22 23 24 , A3 = 37 38 39 40

9 10 11 12 25 26 27 28 41 42 43 44

13 14 15 16 29 30 31 32 45 46 47 48

17 18 19 20 33 34 35 36 49 50 51 52

I defined in this way in my code

double A[48]; double y[4]; for( int i=0; i<48; i++) {A= i+5;}.

I want to multiply the matrix A1 with the vector

double pi0 = {0.1, 0.2, 0.3, 0.4};

so I did this

cblas_dgemv(CblasRowMajor, CblasNoTrans, 4, 4, 1.0, &A[0:15], 4, pi0, 1, 0.0, y, 1);

The result I got is {21, 25, 29,31} , although the true result is {7, 11 , 15 ,19};

One solution that worked was to copy the first 15 values of the matrix A to another matrix B, then when I did

int B[16]; for(int i=0; i<16; i++) {B=A; } cblas_dgemv(CblasRowMajor, CblasNoTrans, 4, 4, 1.0, B, 4, pi0, 1, 0.0, y, 1);

it worked.

The problem is that I don't understand what went wrong in the first way. Also if I first copy the matrix I think that this will require more time. Please, help me.

Thank you.

Highlighted

pawan_k_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-06-2014
12:54 PM

54 Views

Hi

Replace &A[0:15] by A.

Does it work now?

cblas_dgemv(CblasRowMajor, CblasNoTrans, 4, 4, 1.0, A, 4, pi0, 1, 0.0, y, 1);

Regards,

Pawan

Highlighted

Anas_A_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-06-2014
03:36 PM

54 Views

Hi! Yes, now it works. Could I ask you something? How do we find the lda? I have understood that if we have a matrix mxn and we save it row-by-row then the lda is equal to the number of columns. But, how do we decide in a situation like here? I mean when we have 3 matrices which we have defined firstly the first matrix row-by-row, then the second matrix row-by-row and etc.

If we want to compute the product of the matrix A2 (or A3) with the vector pi0 how do we select the lda?

In general how do we decide the lda in we have P matrices (not 3). (I have a problem like this but with P matrices, where P =0,1,2,3,...)

If you know a book, notes , anything that could help me with that please tell me.

Thanks for helping me!

Highlighted

Ying_H_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-06-2014
06:04 PM

54 Views

Hi Anas,

Right, lda is generally equal to the number of columns when A is mxn.

About the complex situation like A[48], it is a 1-D array in C. and

A1={A[0], A[15]};

A2={A[16], A[31]};

A3={A[32], .. A[47];

As you see, the A1, A2, A3, can be accessed by regular patten as a 1-D C-array. The adress of A2[mxn] is &A[16]. and it's mxn elements are continous elements in A[16] .. A[31], so lda is still n. you can just move the first pointer to access A2, A3. for example,

if A2 * pio, call cblas_dgemv(CblasRowMajor, CblasNoTrans, 4, 4, 1.0, &A[16], 4, pi0, 1, 0.0, y, 1); or A+16

if A3* pio, call cblas_dgemv(CblasRowMajor, CblasNoTrans, 4, 4, 1.0, A+32 , pi0, 1, 0.0, y, 1); or &A[32]

The key point is how you access the array in C array. if only the array can be accessed by some regular pattern as a C array, then you can decide the input parameter of A.

Best Regards,

Ying

P.S. Please ignore the syntax I mentioned last time about Array notation like A[0:15]. It is special syntax only for Array notation, can't be taken as a C pointer to pass to MKL funtion.

Highlighted
I had serious problems with my pc and I just saw the answer. Ying, you are great! I have no words to express my appreciation. Thank you.

Anas_A_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-09-2014
11:07 AM

54 Views

Highlighted

Eman_H_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-13-2016
06:08 AM

54 Views

can anyone please explain what does each of the parameters incX and incY indicate ?

Highlighted

mecej4

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-13-2016
06:28 AM

54 Views

Eman H. wrote:

can anyone please explain what does each of the parameters incX and incY indicate ?

These are described in the MKL documentation, as well at http://www.netlib.org/lapack/explore-html/dc/da8/dgemv_8f.html .

The ?gemv routines compute α A x + β y or α A^{T} x + β y, where A is a matrix and x, y are vectors. Sometimes, we wish to extract a row from a matrix and regard that as a vector. In Fortran, storage is by columns, so the distance between successive row elements to be extracted is not equal to 1, as in a column vector of the same matrix, but is equal to the number of columns in the matrix. Similar considerations apply to the destination of the computed vector.

Highlighted

Ying_H_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-13-2016
07:53 PM

54 Views

Hi Eman,

In simple word, the incX is the distance of each element (who will involved the compute A*X) or the stride of X[1] and X[2]. In most of case, the vector(involved the compute) is continuous, it is 1.

In some times, if the element (will involve the compute) is not continuous, but can be **accessed by regular way**, for example, X[1], X[3], X[5], then the incX = 2.

Best Regards,

Ying

For more complete information about compiler optimizations, see our Optimization Notice.