Vector-vector product, cblas_sgemm problem

apolo74 · ‎07-21-2011

I have a couple of vectors ( ax=[5, 2, 6, 1, 7] and ay=[8, 1, 7] ). The result of doing ay' * ax should be

[ 40 16 48 8 56

5 2 6 1 7

35 14 42 7 49]

But when running this operation using MKL (CBLAS):

cblas_sgemm( CblasRowMajor, CblasTrans, CblasNoTrans, 3,

5, 5, 1.0f, ay, 3, ax, 5,0.0f, aa, 5 );

being 'aa' the matrix where I'm planning to save the result of size 15. I get ALMOST what I'm supposed to get, since every time I run the code the element in the middle (row 2, col 3, aa[7] = 6) gives noise values, but all the other values are as they should. If I plot 'aa' before calling cblas_sgemm(), then some times I get the right result and some others I get 6.0001 or some others it is aa[8]=1.0001. So it seems that I'm missing some initialization, but according to the examples I don't see any initizalization either. Can someone tell me what I'm doing wrong? Thanks for any help,

Boris

mecej4 · ‎07-21-2011

Since CBlas is merely an interface to the Fortran-77 BLAS routines, it is easy to get some of the parameters wrong, especially to a user unfamiliar with Fortran-77 conventions regarding multidimensional arrays.

That is why there is a cautionary note at the beginning of Appendix D:

"Warning

Users of the CBLAS interface should be aware that the CBLAS are just a C interface to the BLAS, which is based on the FORTRAN standard and subject to the FORTRAN standard restrictions. In particular, the output parameters should not be referenced through more than one argument."

"In the descriptions of CBLAS interfaces, links provided for each function group lead to the descriptions of the respective Fortran-interface BLAS functions."

From the MKL documentation page for ?GEMM, we have

A, B and C are matrices:

op(A) is an m-by-k matrix,

op(B) is a k-by-n matrix,

C is an m-by-n matrix.

For your example, op(A)=ay' is 3 X 1, op(B)=ax is 1 X 5. Thus, arguments #4, #5, #6, which are m,n,k, should be 3,5,1, not 3,5,5 as you have.

One could probe the consequences of how the incorrect arguments that you provided caused the mystefying behavior in the results that you observed, but I do not think that it is worthwhile. Your conjecture as to initialization being the cause turns out to be incorrect since, when argument beta = 0, the matrix C need not be initialized, as the documentation clearly states.

The next time you post a similar query, it would make it easier to analyze the reported problem if you gave a complete example that could be compiled and run to reproduce the problem.

[cpp]#include 
#include 

main(){
float ax[]={5, 2, 6, 1, 7}, ay[]={8, 1, 7}; 
float aa[5*3]; 
int i,j;
cblas_sgemm( CblasRowMajor, CblasTrans, CblasNoTrans, 
             3, 5, 1, 
			 1.0f, 
			 ay, 3, ax, 5, 
			 0.0f, 
			 aa, 5 );
for(i=0; i<3; i++){
   for(j=0; j<5; j++)printf("%9.4f ",aa[5*i+j]);
   printf("n");
   }
}   
[/cpp]

TimP · ‎07-22-2011

I'm a little hesitant to rely on the beta=0f. eliminating the need for initialization, although I do this myself. If the uninitialized data patterns happen to include NaN or Inf, you would be depending on the library code explicitly zeroing the data for that case. Also, it's conceivable that the result might depend on whether gradual underflow is enabled (no-ftz), as unitialized data typically includes de-normals.

Murat_G_Intel · ‎07-22-2011

Tim, I understand your hesitation. However, when the beta=0.0, the library code should ignore the values of the C matrix. In other words, the values of C Matrix are overwritten. Therefore, as long as you set beta=0.0, Nan, Inf or subnormals inC matrix should not change the results.

mecej4 · ‎07-22-2011

In other words, the code would do something similar to

if(beta != 0.0){
for(i...) for(j...)
C += ...
}
else{
for(i...) for(j...)
C = ...
}

I have seen code with this structure in the Fortran-77 sources of BLAS and Lapack.

Murat_G_Intel · ‎07-22-2011

Precisely, mecej4. The code treats beta == 0.0 as a special case and does C = alpha*op(A)*op(B). It DOES NOT explicity multiply the entries of C with 0.0.

This behaviour is described in MKL Reference Manual, under the entry for C matrix.All Level3 BLAS should havethe same behaviour except from TRSM and TRMM. There is no C matrix for TRSM and TRMM.

apolo74 · ‎07-25-2011

Dear mecej4 and all, thanks for your help and sorry for bothering you with this simple error (for you at least)... in my defense I started using MKL a couple of days ago and I'm not that familiar with the logic and documentation. Thanks again for the explanation and next time I'll build some simple code of my problem.

Boris