Phase 11 slow with PARDISO

julien_doublet · ‎07-29-2009

Hi,

I tested the example examples\solver\source\pardiso_unsym_c.c with exactly the same parameters and timed it with QueryPerformanceCounter. The output is correct, but phase 11 is very slow (158 ms on an Intel Core 2 Duo at 2 GHz) and the reported numbers of nonzeros in the L and U matrices are impossible (17179869199 and 17179869188!).

The example is:

MKL_INT main( void ) {
/* Matrix data. */
int n = 5;
int ia[ 6] = { 1, 4, 6, 9, 12, 14};
int ja[13] = { 1, 2, 4,
1, 2,
3, 4, 5,
1, 3, 4,
2, 5 };
double a[18] = { 1.0, -1.0, -3.0,
-2.0, 5.0,
4.0, 6.0, 4.0,
-4.0, 2.0, 7.0,
8.0, -5.0 };
int mtype = 11; /* Real unsymmetric matrix */
/* RHS and solution vectors.*/
double b[5], x[5];
int nrhs = 1; /* Number of right hand sides. */
/* Internal solver memory pointer pt, */
/* 32-bit: int pt[64]; 64-bit: long int pt[64] */
/* or void *pt[64] should be OK on both architectures */
void *pt[64];
/* Pardiso control parameters.*/
int iparm[64];
int maxfct, mnum, phase, error, msglvl;
/* Auxiliary variables.*/
int i;
double ddum; /* Double dummy */
int idum; /* Integer dummy. */

__int64 ticksPerSecond ;
__int64 tick_start, tick_end ;
QueryPerformanceFrequency((LARGE_INTEGER *)&ticksPerSecond);

/* -------------------------------------------------------------------- */
/* .. Setup Pardiso control parameters. */
/* -------------------------------------------------------------------- */
for (i = 0; i < 64; i++) {
iparm[i] = 0;
}
iparm[0] = 1; /* No solver default */
iparm[1] = 2; /* Fill-in reordering from METIS */
/* Numbers of processors, value of MKL_NUM_THREADS */
iparm[2] = 1;//mkl_get_max_threads();
iparm[3] = 0; /* No iterative-direct algorithm */
iparm[4] = 0; /* No user fill-in reducing permutation */
iparm[5] = 0; /* Write solution into x */
iparm[6] = 0; /* Not in use */
iparm[7] = 2; /* Max numbers of iterative refinement steps */
iparm[8] = 0; /* Not in use */
iparm[9] = 13; /* Perturb the pivot elements with 1E-13 */
iparm[10] = 1; /* Use nonsymmetric permutation and scaling MPS */
iparm[11] = 0; /* Not in use */
iparm[12] = 0; /* Not in use */
iparm[13] = 0; /* Output: Number of perturbed pivots */
iparm[14] = 0; /* Not in use */
iparm[15] = 0; /* Not in use */
iparm[16] = 0; /* Not in use */
iparm[17] = -1; /* Output: Number of nonzeros in the factor LU */
iparm[18] = -1; /* Output: Mflops for LU factorization */
iparm[19] = 0; /* Output: Numbers of CG Iterations */
maxfct = 1; /* Maximum number of numerical factorizations. */
mnum = 1; /* Which factorization to use. */
msglvl = 1; /* Print statistical information in file */
error = 0; /* Initialize error flag */
/* -------------------------------------------------------------------- */
/* .. Initialize the internal solver memory pointer. This is only */
/* necessary for the FIRST call of the PARDISO solver. */
/* -------------------------------------------------------------------- */
for (i = 0; i < 64; i++) {
pt[i] = 0;
}
/* -------------------------------------------------------------------- */
/* .. Reordering and Symbolic Factorization. This step also allocates */
/* all memory that is necessary for the factorization. */
/* -------------------------------------------------------------------- */

QueryPerformanceCounter((LARGE_INTEGER *)&tick_start) ;

phase = 11;
PARDISO (pt, &maxfct, &mnum, &mtype, &phase,
&n, a, ia, ja, &idum, &nrhs,
iparm, &msglvl, &ddum, &ddum, &error);
if (error != 0) {
printf("\nERROR during symbolic factorization: %d", error);
exit(1);
}

QueryPerformanceCounter((LARGE_INTEGER *)&tick_end) ;
printf("time = %f\n", (float)( (tick_end-tick_start)*1000.0/ticksPerSecond ));

return 0;
}

and the return:

=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

local PARDISO version is 106

================ PARDISO: solving a real nonsymmetric system ================

Summary PARDISO: ( reorder to reorder )
================

Times:
======
Time fulladj: 0.007895 s
Time reorder: 0.002547 s
Time symbfct: 0.000416 s
Time parlist: 0.000001 s
Time malloc : 0.010157 s
Time total : 0.025498 s total - sum: 0.004482 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 2
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 5
#non-zeros in A: 13
non-zeros in A (%): 52.000000

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 3
size of largest supernode: 3
number of nonzeros in L 17179869199
number of nonzeros in U 17179869188
number of nonzeros in L+U 17179869203
time = 158.445938

I don't understand what is wrong with this code. Is it normal for the reordering
phase to be this slow with this matrix?

best regards, Julien.

Sergey_K_Intel1 · ‎07-30-2009

Dear Julien,

PARDISO is intended for solving large sparse linear systems with at least several tens of thousands of rows. It would be more appropriate to look at the performance of reordering and symbolic factorization for matrices of such dimensions. Normally, reordering and symbolic factorization take no more than 10% of the total time, although there are some exceptions.

As for the wrong number of nonzeros in the L and U factors, this was a specific problem in MKL 10.1, observed only on Windows. It was resolved in the recently released MKL 10.2. I tested your code with MKL 10.2 under Win32, and PARDISO from this release reports the correct numbers:

Summary PARDISO: ( reorder to reorder )
================

Times:
======
Time fulladj: 0.000021 s
Time reorder: 0.000184 s
Time symbfct: 0.004639 s
Time malloc : 0.003127 s
Time total : 0.013814 s total - sum: 0.005843 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>
#equations: 5
#non-zeros in A: 13
non-zeros in A (%): 52.000000

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 128
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 3
size of largest supernode: 3
number of nonzeros in L 15
number of nonzeros in U 4
number of nonzeros in L+U 19
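
A side note on the two outputs: each of the impossible counts from MKL 10.1 differs from the correct MKL 10.2 count by exactly 2^34 = 17179869184 (17179869199 = 2^34 + 15, 17179869188 = 2^34 + 4, 17179869203 = 2^34 + 19). So the low 32 bits of the wrong values already hold the right answers. The small sketch below only illustrates that arithmetic. The helper names low32 and high_part are made up for this example and are not part of MKL, and the cause of the stray upper bits is not stated in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low 32 bits of a 64-bit count. */
uint32_t low32(int64_t value) {
    return (uint32_t)((uint64_t)value & 0xFFFFFFFFu);
}

/* Hypothetical helper: the part of the value that lies above bit 31. */
int64_t high_part(int64_t value) {
    return value - (int64_t)low32(value);
}
```

Calling low32 on the three counts printed by MKL 10.1 gives 15, 4 and 19, the values reported by MKL 10.2, and high_part gives 17179869184 for each of them.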

All the best
Sergey