Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Coding mistake or numerical problem?

jonnin
Beginner
576 Views
I am new to the math kernel, but this simple example should have worked. Is there a code problem, or did I just happen to choose a bad example? The result is closeto the original. It is trying to factor A into QBP using dorgbr.
Here is the code. It uses a few of my own routines (transpose, copy, and display). Results are at the end. Copy is abused to create larger memory than the orig so there is room for the result.
-------------------------------
//required parameters, oversized, same notation as the documentation
int n,m, lda, lwork, info;
double d[100],e[100], tauq[100],taup[100];
char ul, pq;
double work[100];

double a[] = //this is another test problem from the book, C style
{
4,11,14,
8,7,-2
};
double madeup[] = //what P should be, more or less, hand computed
{
-4/18.2482875908, 11/18.2482875908, 14/18.2482875908,
-.91,.541,.446,
0,0,0 //padding
};

lda = 2;
m = 2;
n = 3;
lwork = 100;

double * at = transpose(a,m,n); //at = a transposed for fortran memory problems
//step 1 -- make a bidiagonal matrix from A and copy it -- this matrix is called 'B'
dgebrd(&m,&n,at,&lda,d,e,tauq,taup,work,&lwork,&info);
double * p = ct_copy(at, n,n); //save a copy of b for gettin P&Q
double * q = ct_copy(at, n,n); //save another copy of b (these allocate n*n locations even though matrix is smaller)
//Now try to create P and Q such that A = QBP -- (p is transposed?)
pq = 'Q'; //get Q
if(m>n)
dorgbr(&pq,&m,&n,&n,q,&lda,tauq,work,&lwork,&info);
else
dorgbr(&pq,&m,&m,&n,q,&lda,tauq,work,&lwork,&info);
pq = 'P'; // get P
if(n > m)
dorgbr(&pq,&m,&n,&m,p,&lda,taup,work,&lwork,&info);
else
dorgbr(&pq,&n,&n,&m,p,&lda,taup,work,&lwork,&info);
//Right, so now we can supposedly get A back from QBP or QB P(transposed)
double * Q;
double * P;
double * B;

Q = transpose(q,m,m); //convert back to C memory alignment and multiply
cout << "Q ___________________________________"<< endl;
display(Q, m,m,3); //this seems reasonable
P = transpose(p,m,n);
//P = ct_copy(p,m,n); //its already transposed, just copy it?
cout << "P ___________________________________"<< endl;
display(P, m,n,10);

B = asgn(m,n,0.0); //fill an m*n with zeros
B[0] = d[0]; //reconstruct B (manually for this example)
B[4] = d[1];
B[3] = e[0];
cout << "B ___________________________________"<< endl;
display(B,m,n,3);
double * new_result = multiply(B,m,n,madeup,3,3);
double * tr = multiply(Q,m,m,B,m,n);
double *res = multiply(tr,m,n,P,n,n);
cout << "A ___________________________________"<< endl;
display(a,m,n,1);
cout << "Result ___________________________________"<< endl;
display(res,m,n,5); //wrong answer - Where is the problem???
-------------------------------
Here is the output:
Q ___________________________________
1.000 0.000
0.000 1.000
P ___________________________________
-0.2191986497 -0.4383972994 -0.7123956116
-0.7671952740 -0.6027962867 0.5479966243
B ___________________________________
-18.248 0.000 0.000
-4.438 -9.863 0.000
A ___________________________________
4.0 11.0 14.0
8.0 7.0 -2.0
Result ___________________________________
3.99999 7.99999 13.00000
8.54054 7.89189 -2.24324
Press any key to continue
0 Kudos
4 Replies
michael_chuvelev
Beginner
576 Views
Hello.
I think you've made a mistake in your code while computing PT. You want PT, dimension nxn, n=3,be returned in p and pass the leading dimension of p lda=2.
Here is my code that should work correctly.
------------------------------------------------------------
#include "stdio.h"
int main() {
double A[6] = { 4, 8, 11, 7, 14, -2 };
int m = 2, n = 3, lda = 2, ldaa = 3, lwork = 100, info;
int i, j;
double D[3], E[3], TAUQ[3], TAUP[3], WORK[100], AA[9];
double temp;

printf( "On entry: A= " );
for( i = 0; i < m; i++ ) {
for( j = 0; j < n; j++ ) {
printf( "%f ", A[i+j*lda] );
}
printf( " " );
}
// Reduce to bidiagonal
dgebrd_( &m, &n, A, &lda, D, E, TAUQ, TAUP, WORK, &lwork, &info );
printf( "After DGEBRD: A= " );
for( i = 0; i < m; i++ ) {
for( j = 0; j < n; j++ ) {
printf( "%f ", A[i+j*lda] );
AA[i+j*ldaa] = A[i+j*lda];
}
printf( " " );
}
printf( "After DGEBRD: D= %f, %f ", D[0], D[1] );
printf( "After DGEBRD: E= %f ", E[0] );
printf( "After DGEBRD: TAUQ= %f, %f ", TAUQ[0], TAUQ[1] );
printf( "After DGEBRD: TAUP= %f, %f ", TAUP[0], TAUP[1] );
// Restore Q
dorgbr_( "Q", &m, &m, &n, A, &lda, TAUQ, WORK, &lwork, &info, 1 );
printf( "After DORGBR: Q= " );
for( i = 0; i < m; i++ ) {
for( j = 0; j < m; j++ ) {
printf( "%f ", A[i+j*lda] );
}
printf( " " );
}
// Restore PT
dorgbr_( "P", &n, &n, &m, AA, &ldaa, TAUP, WORK, &lwork, &info, 1 );
printf( "After DORGBR: PT= " );
for( i = 0; i < n; i++ ) {
for( j = 0; j < n; j++ ) {
printf( "%f ", AA[i+j*ldaa] );
}
printf( " " );
}
// Compute B*PT
AA[1] = E[0] * AA[0] + D[1] * AA[1];
AA[4] = E[0] * AA[3] + D[1] * AA[4];
AA[7] = E[0] * AA[6] + D[1] * AA[7];
AA[0] = D[0] * AA[0];
AA[3] = D[0] * AA[3];
AA[6] = D[0] * AA[6];
// Compute Q*B*PT
temp = A[0]*AA[0] + A[2]*AA[1];
AA[1] = A[1]*AA[0] + A[3]*AA[1];
AA[0] = temp;
temp = A[0]*AA[3] + A[2]*AA[4];
AA[4] = A[1]*AA[3] + A[3]*AA[4];
AA[3] = temp;
temp = A[0]*AA[6] + A[2]*AA[7];
AA[7] = A[1]*AA[6] + A[3]*AA[7];
AA[6] = temp;
printf( "Q*B*PT= " );
for( i = 0; i < m; i++ ) {
for( j = 0; j < n; j++ ) {
printf( "%f ", AA[i+j*ldaa] );
}
printf( " " );
}
return 0;
}
------------------------------------------------------------
The output:
On entry: A=
4.000000 11.000000 14.000000
8.000000 7.000000 -2.000000
After DGEBRD: A=
-18.248288 0.494420 0.629262
-4.438773 -9.863939 -0.917237
After DGEBRD: D= -18.248288, -9.86393 9
After DGEBRD: E= -4.438773
After DGEBRD: TAUQ= 0.000000, 0.000000
After DGEBRD: TAUP= 1.219199, 1.086175
After DORGBR: Q=
1.000000 0.000000
0.000000 1.000000
After DORGBR: PT=
-0.219199 -0.602796 -0.767195
-0.712396 -0.438397 0.547997
-0.666667 0.666667 -0.333333
Q*B*PT=
4.000000 11.000000 14.000000
8.000000 7.000000 -2.000000
0 Kudos
jonnin
Beginner
576 Views
Thank you. I will compare the two to see where I went wrong. I have n = 3, lda 2, and a 3x3 allocated, so it will require a little digging to spot the mistake.
0 Kudos
jonnin
Beginner
576 Views
I found it now.
if(m>n)
dorgbr(&pq,&m,&n,&n,q,&lda,tauq,work,&lwork,&info);
else
dorgbr(&pq,&m,&m,&n,q,&lda,tauq,work,&lwork,&info);
This does the else -- which has m m n not n n m, because of the fortran transpose again. (which also means my lda was wrong too, I see it all now) That problem is probably going to bite me a few times untilI get my head around it all... Thanks again --it works now!
0 Kudos
evgeny1
Beginner
576 Views

Hi,

I am new to this forum but used MKL a bit. I wonder if memory allocation is a problem. MKL manual says arrays have to be aligned at 16 byte boundary. It can be arranged in C /C++ as well by calling a special allocator.

Do you know if not aligning memeory at 16 byte boundary is more than performance issue but also can cause wrong answers ?

Thank you,

Evgeny

0 Kudos
Reply