cblas_zgemm problem

andragon · ‎08-23-2010

First of all hello to everyone. I am having some problems with the cblas_zgemm function. I am trying to multiply two matrices, and all the input parameters are correct and in the right order. For some reason, i dont know why if the matrices are smaller than 15x15 the calculations are correct, and if they are bigger than that the 15-th column is filled only with zeros, and those are not the correct values, because i checked them in matlab. All other values are correct, only the 15-th column is wrong. I am using mkl 8.0.1. This gives me great trouble because the error than stackes, because there are more multiplications further. I would appreciate any help.

Just ot add that no matter what the matrix size iz, if it is greater than 15x15 the 15-th column gives bad results, then the 30-th column... The arguments ae as follows:

cblas_zgemm(CblasRowMajor, CblasNoTrans, CblasTrans, N, N, N1, α, A, N1, A, N1, β, C, N) where

alpha=(1,0) and beta=(0,0). Thans in advance.

Gennady_F_Intel · ‎08-23-2010

Actually this version is no longer support.

Please check this probelm with the latest 10.3 beta and let us know if the probelm is still there.

Here is the linkwhichinvites you to participate in the Intel MKL 10.3 Beta program.

--Gennady

andragon · ‎08-24-2010

Ok, will try. I would realy appreciate if anyone would go to the trouble to repeat my results, even with a different version of the mkl library. I have a relatively old computer, a 32 bit architecture, so i dont know if the new beta is realy for me. I made an experiment, and an arbitrary complex matrix works fine, but a mode matching matrix i am working with gives those fanthom zeros that drive me crazy. I mean there should be zeros in the matrix but not in those columns, or at least not in most of it. Here is the code, i forgot to copy the header files with the declarations but you can get a good picture from the code. Just call the function and see in the error . txt what you get. Any and all sugestions are welcome. Thank you all in advance.

#include
#include
#include

extern "C" {
#include"mkl_cblas.h"
#include"mkl_lapack.h"
}

using namespace std;

void Alfa::hstep(char* file, double a, double c, double x1, const int N, double f1, double f2, int k) {
const double c1 = 300000000;//299792458
const double PI = 3.141592653589793;
f1*=1000000000;
f2*=1000000000;
double st = (f2-f1)/k;
const int N1 = ((c/a*N)<(floor(c/a*N)+0.5)) ? (int)floor(c/a*N) : (int)ceil(c/a*N);

FILE *fs11 = fopen("Step.txt", "w");
FILE *fs12 = fopen("error.txt", "w");

complex *Bna, *Bnc;
Bna = new complex;
Bnc = new complex[N1];
MKL_Complex16 *Le = new MKL_Complex16[N*N1];
MKL_Complex16 *L = new MKL_Complex16[N*N];
MKL_Complex16 *Li = new MKL_Complex16[N*N], *Ld = new MKL_Complex16[N*N];
int *NLi = new int(N), *MLi = new int(N), *ldaLi = new int(N), *ipivLi = new int, *infoLi = new int(0);
int lworkLi = -1;
MKL_Complex16 *s11 = new MKL_Complex16[N*N];
MKL_Complex16 *s12 = new MKL_Complex16[N*N1];
MKL_Complex16 *s21 = new MKL_Complex16[N1*N];
MKL_Complex16 *s22 = new MKL_Complex16[N1*N1];

MKL_Complex16 *neka = new MKL_Complex16[N*N];
MKL_Complex16 *neka1 = new MKL_Complex16[N*N];

//

MKL_Complex16 *dummywork=new MKL_Complex16[1];
zgetri(NLi, Li, ldaLi, ipivLi, dummywork, &lworkLi, infoLi);
lworkLi=(int)(dummywork[0].real);
MKL_Complex16 *work=new MKL_Complex16[lworkLi];

//

for (double temp=f1; temp<=f2; temp+=st) {
double k0=2*PI*temp/c1;

for (int i=0; i = conj(sqrt(complex(k0*k0-((i+1)*PI/a)*((i+1)*PI/a),0)));

for (int i=0; i = conj(sqrt(complex(k0*k0-((i+1)*PI/c)*((i+1)*PI/c),0)));

for (int i=0; i for (int j=0; j complex prvi(c*a*Bnc.real(), c*a*Bnc.imag());
complex ctemp = sqrt(Bna/prvi);
double itemp = int1(a,c,i,j,x1);
complex ftemp(2*itemp*ctemp.real(), 2*itemp*ctemp.imag());
Le[i*N1+j].real = ftemp.real();
Le[i*N1+j].imag = ftemp.imag();
}

complex alpha(1,0), beta(0,0);
cblas_zgemm(CblasRowMajor, CblasNoTrans, CblasTrans, N, N, N1, α, Le, N1, Le, N1, β, L, N); <<<<<------------------------------------This is where the problem appeares

for (int i=0; i fprintf(fs12, "\n");
for (int j=0; j fprintf(fs12, "%lf\t %1d %.18lf %.18lf\n", temp/1000000000, i*N+j, L[i*N+j].real, L[i*N+j].imag);
}
}

////////////////////////////

int1.cpp

///////////////////////////

#include
using namespace std;

double Alfa::int1(double a, double c, int n, int m, double x1) {
const double PI = 3.141592653589793;
double ret = 0;
n++;
m++;
if (n*c!=a*m) {
double e = a*c*sin(PI*n*(c+x1)/a-PI*m)/(2*PI*(c*n-a*m));
double e1 = a*c*sin(PI*n*(c+x1)/a+PI*m)/(2*PI*(c*n+a*m));
double e2 = a*a*c*m*sin(PI*n*x1/a)/(PI*(a*m+c*n)*(a*m-c*n));
ret = e-e1+e2;
} else {

ret = c/2*cos(m*PI*x1/c);
}

return ret;
}

Gennady_F_Intel · ‎08-24-2010

don't doubt - the latest version supports all IA-32 Architecture systems generally compatible with the Intel Pentium processors, (for example, Intel Pentium 4 processor or Intel Xeon processor), or processors from other manufacturers supporting the same instruction set and running a 32-bit operating system.

--Gennady

andragon · ‎08-25-2010

I've tried the same problem in mkl 10.2.2.025, the result was the same. Sorry about the bad news. Maybe i did something wrong in the definitions, or some parameter was wrong, or something else, but the fanthom zeros are there. If anybody is willing to repeat my results i would be gratefull. Any other idea is welcome. Thanks in advance.

mecej4 · ‎08-25-2010

Please give a specific example where the claimed error occurs, and state which compiler you used.

Please give source code that is complete, so that someone interested can compile and run the program to reproduce the errors that you saw. The code that you have given is incomplete (no main(), missing definition of Alfa), and you have not specified the values of variables (a, c, n, m, etc.) that are needed to enable running the program.

andragon · ‎08-25-2010

Will do. Im working in visual studio 2005. I will put all of the components, and the values that i used are the ones in the main program.

///////////////////////////

int1.h

//////////////////////////

#ifndef _int1_h_
#define _int1_h_

namespace Alfa {
double int1(double a, double c, int n, int m, double x1);
}

#endif

///////////////////////////

hstep.h

//////////////////////////

#ifndef _hstep_h_
#define _hstep_h_

namespace Alfa {
void hstep(char* file, double a, double c, double x1, int N, double f1, double f2, int k);
}

#endif

///////////////////////////

int1.cpp

//////////////////////////

#include "int1.h"
#include
using namespace std;

double Alfa::int1(double a, double c, int n, int m, double x1) {
const double PI = 3.141592653589793;
double ret = 0;
n++;
m++;
if (n*c!=a*m) {
double e = a*c*sin(PI*n*(c+x1)/a-PI*m)/(2*PI*(c*n-a*m));
double e1 = a*c*sin(PI*n*(c+x1)/a+PI*m)/(2*PI*(c*n+a*m));
double e2 = a*a*c*m*sin(PI*n*x1/a)/(PI*(a*m+c*n)*(a*m-c*n));
ret = e-e1+e2;
} else {

ret = c/2*cos(m*PI*x1/c);
}

return ret;
}

///////////////////////////

hstep.cpp

//////////////////////////

#include "hstep.h"
#include "int1.h"
#include
#include
#include

extern "C" {
#include"mkl_cblas.h"
#include"mkl_lapack.h"
}

using namespace std;

void Alfa::hstep(char* file, double a, double c, double x1, const int N, double f1, double f2, int k) {
const double c1 = 300000000;//299792458
const double PI = 3.141592653589793;
f1*=1000000000;
f2*=1000000000;
double st = (f2-f1)/k;
const int N1 = ((c/a*N)<(floor(c/a*N)+0.5)) ? (int)floor(c/a*N) : (int)ceil(c/a*N);

FILE *fs11 = fopen("Hstep.txt", "w");
FILE *fs12 = fopen("error.txt", "w");

complex *Bna, *Bnc;
Bna = new complex;
Bnc = new complex[N1];
MKL_Complex16 *Le = new MKL_Complex16[N*N1];
MKL_Complex16 *L = new MKL_Complex16[N*N];
MKL_Complex16 *Li = new MKL_Complex16[N*N], *Ld = new MKL_Complex16[N*N];
int *NLi = new int(N), *MLi = new int(N), *ldaLi = new int(N), *ipivLi = new int, *infoLi = new int(0);//*ipivLi = new int(0)
int lworkLi = -1;
MKL_Complex16 *s11 = new MKL_Complex16[N*N];
MKL_Complex16 *s12 = new MKL_Complex16[N*N1];
MKL_Complex16 *s21 = new MKL_Complex16[N1*N];
MKL_Complex16 *s22 = new MKL_Complex16[N1*N1];

MKL_Complex16 *neka = new MKL_Complex16[N*N];
MKL_Complex16 *neka1 = new MKL_Complex16[N*N];

//

MKL_Complex16 *dummywork=new MKL_Complex16[1];
zgetri(NLi, Li, ldaLi, ipivLi, dummywork, &lworkLi, infoLi);
lworkLi=(int)(dummywork[0].real);
MKL_Complex16 *work=new MKL_Complex16[lworkLi];

//

for (double temp=f1; temp<=f2; temp+=st) {
double k0=2*PI*temp/c1;

for (int i=0; i = conj(sqrt(complex(k0*k0-((i+1)*PI/a)*((i+1)*PI/a),0)));

for (int i=0; i = conj(sqrt(complex(k0*k0-((i+1)*PI/c)*((i+1)*PI/c),0)));

for (int i=0; i for (int j=0; j complex prvi(c*a*Bnc.real(), c*a*Bnc.imag());
complex ctemp = sqrt(Bna/prvi);
double itemp = int1(a,c,i,j,x1);
complex ftemp(2*itemp*ctemp.real(), 2*itemp*ctemp.imag());
Le[i*N1+j].real = ftemp.real();
Le[i*N1+j].imag = ftemp.imag();
}

complex alpha(1,0), beta(0,0);
cblas_zgemm(CblasRowMajor, CblasNoTrans, CblasTrans, N, N, N1, α, Le, N1, Le, N1, β, L, N); /*<<<<<<<<<<<-------------------This is the place where the problem begins, it happens in all the other places with cblas_zgemm*/

for (int i=0; i fprintf(fs12, "\n");
for (int j=0; j fprintf(fs12, "%lf\t %1d %.18lf %.18lf\n", temp/1000000000, i*N+j, L[i*N+j].real, L[i*N+j].imag);
}
}

for (int i=0; i for (int j=0; j Li[i*N+j]=L[i*N+j];
Ld[i*N+j]=L[i*N+j];
if (i==j) {
Li[i*N+j].real++;
Ld[i*N+j].real--;
}
}

zgetrf(MLi, NLi, Li, ldaLi, ipivLi, infoLi);

zgetri(NLi, Li, ldaLi, ipivLi, work, &lworkLi, infoLi);

cblas_zgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, N, N, N, α, Li, N, Ld, N, β, s11, N);

/*for (int i=0; i fprintf(fs12, "\n");
for (int j=0; j fprintf(fs12, "%lf\t %1d %.15lf %.15lf\n", temp/1000000000, i*N+j, s11[i*N+j].real, s11[i*N+j].imag);
}
}*/

alpha+=complex(1,0);

cblas_zgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, N, N1, N, α, Li, N, Le, N1, β, s12, N1);

for (int i=0; i for (int j=0; j neka[i*N+j].real = -s11[i*N+j].real;
neka[i*N+j].imag = -s11[i*N+j].imag;
if (i==j) neka[i*N+j].real++;
}

alpha-=complex(1,0);

cblas_zgemm(CblasRowMajor, CblasTrans, CblasNoTrans, N1, N, N, α, Le, N1, neka, N, β, s21, N);

cblas_zgemm(CblasRowMajor, CblasTrans, CblasNoTrans, N1, N1, N, α, Le, N1, s12, N1, β, neka1, N1);

for (int i=0; i for (int j=0; j s22[i*N1+j].real = -neka1[i*N1+j].real;
s22[i*N1+j].imag = -neka1[i*N1+j].imag;
if (i==j) s22[i*N1+j].real++;
}
if (temp==f1)
fprintf(fs11," > GHz %1d\n ", lworkLi);
//else if (temp>=f1) {
fprintf(fs11, "%lf\t %1d %1d 0 0 0 0 %.15lf %.15lf\n", temp/1000000000, 1, 1, s11[0].real, s11[0].imag);
fprintf(fs11, "%lf\t %1d %1d 0 0 0 0 %.15lf %.15lf\n", temp/1000000000, 1, 2, s12[0].real, s12[0].imag);
fprintf(fs11, "%lf\t %1d %1d 0 0 0 0 %.15lf %.15lf\n", temp/1000000000, 2, 1, s21[0].real, s21[0].imag);
fprintf(fs11, "%lf\t %1d %1d 0 0 0 0 %.15lf %.15lf\n", temp/1000000000, 2, 2, s22[0].real, s22[0].imag);
//}
}

fclose(fs11);
fclose(fs12);

delete[] Bna;
delete[] Bnc;
delete[] Le;
delete[] L;
delete[] Li;
delete NLi;
delete MLi;
delete ldaLi;
delete[] Ld;
delete[] work;
delete[] s11;
delete[] s12;
delete[] s21;
delete[] s22;
delete[] neka;
delete[] neka1;
delete[] ipivLi;
delete infoLi;

}

///////////////////////////

prog.cpp

//////////////////////////

#include
#include
#include
#include "int1.h"
#include "hstep.h"

extern "C" {
#include"mkl_cblas.h"
#include"mkl_lapack.h"
}

using namespace std;

int main() {

clock_t t1 = clock();
Alfa::hstep("", 0.1, 0.08, 0, 20, 1.8, 2.4, 3);
clock_t t2 = clock();

cout << (double)(t2-t1)/CLOCKS_PER_SEC;

}

Thanks to all, i hope that i didnt make any stupid errors.

mecej4 · ‎08-25-2010

The error is not caused by MKL, but by a misuse on your part of a floating point expression. When you call function int1 with a=0.1, c=0.08, n=14 and m=11, the test if (n*c!=a*m) can fail to perform as you intended it to. The test succeeds only if all bits of n*c and a*m coincide. However, because of the inability to represent numbers exactly, 15*0.08 need not equal 12*0.1. To see this, add the following statement prior to the test and look at the output produced:

[cpp]    if(n==15 && m==12)printf("n=%d, m=%d, n*c==a*m? %sn",n,m,n*c==a*m ? "yes":"no");
[/cpp]

To avoid this error, replace the test by something similar to

[bash]if(fabs(n*c/(a*m)-1) > 1e-8)
[/bash]

in whatever programming language you use (C++, Fortran, Matlab, etc.).

For helpful discussions on this and related topics, see Goldberg's article.

There is a second lesson to be drawn from this example: it would have been impossible to make a diagnosis based on your first post in this thread. With the full code available, on the other hand, one could have given the Intel compiler the -Wall option to receive the helpful warning:

[bash]int1.cpp(12): remark #1572: floating-point equality and inequality comparisons are unreliable
      if (n*c!=a*m) {
[/bash]

After you have corrected for this point, you will need to fix another potential source of error. The GEMM routines perform the operation

C = alpha op(A) op(B) + beta C

and, therefore, C has to be initialized before calling the routine, unless beta is zero and the elements of C are well-defined.

andragon · ‎08-29-2010

Thanks man, you have helped me alot. Sorry for the stupid error, i am kind of a begginer. Live and learn, right. Thanks again.