Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.
6827 Discussions

## Parallel execution, CPU time of PARDISO for multiple rhs and wrong result of LAPACK dgetrf Beginner
234 Views
Hello,
First of all, please understand my poor english.
I have two problems.
1. CPU time of PARDISO
I use the PARDISO to solve a sparse matrix.
Just one sparse matrix and multiple nrhs...
(typically, 10k~10M dimension and 100~10k rhs)
So, phase=33 is used to solve iteratively.
Problem is that,, PARDISO gives just 1.6X faster result whan compared to sequential mode (4 core used).
(dimension 45k, 35 rhs, time solve: 0.13s vs 0.23s, time total: 0.5s vs 0.8s)
I expected more than 3X faster result because each column of B can be solved independently from each other.
What is problem?
If I execute PARDISO in parallel for each column, can i get better speed?
Can I share arguments for PARDISO when i execute it in that manner?
eg>
phase = 12
PARDISO(...)
for each column of B
PARDISO(...) <-- parallel for each column
end
2. Wrong result of LAPACK dgetri when liked with thread library
I use dgetrf and dgetri to inverse a unsymmetric dense matrix.
Other LAPACK functions give correct results except dgetri.
My code is...
```[cpp]typedef struct MATRIX{
double *data;

int nRow;
int nCol;
int isSet;

int ld;
int mCol;
} MATRIX;
int inverseMatrix(MATRIX *M){/*{{{*/
int *ipiv;
int lwork;
int info;
int ispec = 1;
double *work;

if (M->nRow != M->nCol) return 0;
ipiv = (int *) calloc (M->nRow, sizeof(int));
dgetrf(&(M->nRow), &(M->nCol), M->data, &(M->ld), ipiv, &info);
if (info < 0){
fprintf(stderr, "%d element of Matrix is illegal value!\n", -info);
return 0;
}
else if (info > 0){
fprintf(stderr, "Factored matrix is singular!\n");
return 0;
}

lwork = ilaenv(&ispec, "dgetri", "", \
&(M->nRow), &(M->nCol), &info, &info);
lwork = lwork * M->nRow;
work = (double *) calloc(lwork, sizeof(double));
dgetri (&(M->nRow), M->data, &(M->ld), ipiv, work, &lwork, &info);
if (info < 0){
fprintf(stderr, "%d element of Matrix is illegal value!\n", -info);
free(work);
free(ipiv);
return 0;
}
else if (info > 0){
fprintf(stderr, "Factored matrix is singular!");
free(work);
free(ipiv);
return 0;
}
free(work);
free(ipiv);
return 1;

}/*}}}*/[/cpp]```
When i checked line-by-line using gdb, dgetrf (linked with threaded lib) gave correct result
and both threaded and sequential programs gave same results.
however, results of dgetri were different, and threaded dgetri was wrong.
What is the problem?
My system is...
6GB memory
Ubuntu 10.10 x86_64 (Linux 2.6.35-26-generic)
icc 12.0
MKL 10.3
compile option = -g -m64 -w -Wall
Thanks,
Jinwook Kim.
3 Replies Employee
234 Views
HiJinwook Kim,
In general, the bigger NRHS value (number of RHS passed to PARDISO), the better performance can be expected. It also makes sense to choose NRHS so that NRHS = K * num_threads, where K is integer. So, I would recommend you to improve NRHS up to 64 or 128. Of course, it won't be possible for dimension=10M task due to high memory overhead, but at least try for task up to 1M.
I ran similar test, dimension=50K. It was ~2x solve speed-up on NRHS=32 and more than 3x on NRHS=128.
Also I should note that you use rather old processor (without integrated memory controller yet) that has less efficient memory subsystem. For example, running the same test on X5650 system (2x6 cores) lunched in Q1'2010 I obtained almost 4x speed-up on 4 threads and 8x speed-up on 12 threads for NRHS=128.
Regards,
Konstantin Beginner
234 Views
Hi Konstantin,  