PARDISO hanging on factorization...

ljbetche · ‎07-03-2009

Hello,

My understanding is that when using the direct-iterative preconditioning CGS routine in PARDISO, the full LU factorization is computed (i.e. direct solve) the first time a given system is solved, with the CGS, preconditioned with the initial LU matrices, used in subsequent solves. In attempting to solve an admittedly large system (~450,000 equations, ~25,000,000 non-zero elements) on a system with 8 Xeon cores and 16 GB RAM, I get the following on the first solve of the system:

=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization
0 % 1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 100 %
local PARDISO version is 106

================ PARDISO: solving a real nonsymmetric system ================

Summary PARDISO: ( reorder to solve )
================

Times:
======
Time fulladj: 0.788188 s
Time reorder: 0.765143 s
Time symbfct: 2.455881 s
Time parlist: 0.047076 s
Time A to LU: 0.000000 s
Time numfct : 348.804676 s
Time cgs : 7.084059 s cgx iterations 1

Time malloc : 1.102595 s
Time total : 362.927283 s total - sum: 1.879665 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 8
< Hybrid Solver PARDISO with CGS/CG Iteration >

< Linear system Ax = b>
#equations: 452232
#non-zeros in A: 27749790
non-zeros in A (%): 0.013569

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 72
#independent subgraphs: 0
< Preprocessing with input permutation >
#supernodes: 34772
size of largest supernode: 12456
number of nonzeros in L 667911177
number of nonzeros in U 655048539
number of nonzeros in L+U 1322959716
gflop for the numerical factorization: 8263.608500

gflop/s for the numerical factorization: 23.691221

Is the more than 5 minutes taken to compute the factorization reasonable given the size of this problem? It seems odd, since the single-processor CGS solver with ILU preconditioning I had previously been using solves the same problem in a few seconds. Also, the program appears to "hang" at one point, remaining at the "Percentage of computed non-zeros for LL^T factorization" line for most of the 5 minutes before printing any of the percentages, which leads me to believe that something is wrong. Note that the routine does, however, find the correct solution to the system. My code is written in Fortran 77 and compiled with ifort, and the parameters I am using are:

MTYPE = 11
IPARM(1) = 1
IPARM(2) = 2
IPARM(3) = 8
IPARM(4) = 121
IPARM(5) = 2
IPARM(6) = 0
IPARM(7) = ITERMO
IPARM(8) = 2500
IPARM(10) = 13
IPARM(11) = 1
IPARM(13) = 1
IPARM(18) = 0
IPARM(19) = 0

where ITERMO is the unit number for output and all other elements of IPARM are 0. Before calling PARDISO, I call OMP_SET_NUM_THREADS(8). Does it look like my code is hanging, or is this amount of time to be expected? If the latter, is there any way to reduce the time? Thank you.
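As a rough sanity check on the settings and the log above, here is a small Python sketch. It assumes the documented MKL convention that IPARM(4) = 10*L + K, where K = 1 selects CGS preconditioned by a previously computed LU and the stopping tolerance is 10^-L; the helper name `decode_iparm4` is made up for illustration. It also recomputes the factorization rate from the reported gflop count and the "Time numfct" value.

```python
def decode_iparm4(value):
    """Split IPARM(4) = 10*L + K into (K, tolerance 10**-L).

    Assumption: the MKL PARDISO convention for IPARM(4) applies.
    """
    l_exp, k = divmod(value, 10)
    return k, 10.0 ** (-l_exp)


# IPARM(4) = 121 from the parameter list above.
k, tol = decode_iparm4(121)
print("K =", k, "tolerance =", tol)  # K = 1 means CGS; tolerance is about 1e-12

# Numbers copied from the PARDISO statistics output above.
gflop = 8263.608500
numfct_seconds = 348.804676

# Sustained rate during numerical factorization, in gflop/s.
rate = gflop / numfct_seconds
print("factorization rate: %.2f gflop/s" % rate)
```

The recomputed rate agrees with the 23.69 gflop/s that PARDISO reports, which would suggest the roughly 349 seconds are spent doing arithmetic on about 8264 gflop of work rather than waiting in a stalled state, although this alone does not rule out other problems.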

Lee

ljbetche · ‎07-06-2009

Hello,

As an update, thinking the problem may be a memory issue, I tried running with a single core and OOC PARDISO (IPARM(60) = 2). The total time for a solve of the same system for which the output is given above was substantially longer (>17 minutes), but the code hangs at the same point as before. Finally, I have noted that if I try to use IPARM(2) = 3, which is supposed to be the OpenMP version of the METIS algorithm, the code crashes with an ERROR = -1, "input inconsistent"; perhaps this suggests a further problem?
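To put the memory question in numbers, here is a back-of-the-envelope estimate in Python based on the non-zero counts in the original log. It assumes 8 bytes per stored value (double-precision reals) and counts only the factor values, not index arrays or workspace, so it should be read as a lower bound rather than PARDISO's actual footprint.

```python
# Counts copied from the PARDISO statistics output in the first post.
nnz_a = 27_749_790        # non-zeros in A
nnz_lu = 1_322_959_716    # non-zeros in L+U

bytes_per_value = 8       # assumption: double-precision real entries

# Storage for the factor values alone.
factor_bytes = nnz_lu * bytes_per_value
factor_gib = factor_bytes / 2**30

# How much denser the factors are than the original matrix.
fill_ratio = nnz_lu / nnz_a

print("L+U values: %.2f GiB" % factor_gib)
print("fill-in: %.1fx the non-zeros of A" % fill_ratio)
```

Under these assumptions the factor values alone come to a bit under 10 GiB of the 16 GB available, and the factors hold roughly 48 times as many non-zeros as A, which is consistent with a full factorization costing far more than an ILU-preconditioned iterative solve.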

Lee

Sergey_P_Intel2 · ‎07-06-2009

Hi, Lee!

The parallel version of METIS has only been available since MKL 10.2 Beta, so the "input inconsistent" error means that your MKL version is (probably) 10.1 Update xx. Could you check the version of MKL? Also, please try decreasing the number of refinement steps (iparm(8)) from 2500 to, for example, 10. It looks like PARDISO performed many steps to refine the solution.

With best regards,
Sergey

ljbetche · ‎07-07-2009

Sergey,

Thanks, I'll try reducing the refinement steps. According to the MKL info page for SHARCNet, our network of large cluster systems, we are using version 11.0.083. Does this sound correct? Is there another way to check the version?

Lee

ljbetche · ‎07-07-2009

Sergey,

As an update, reducing the number of refinement steps, even to IPARM(8) = 1, has no effect upon the solution time.

Lee

Gennady_F_Intel · ‎07-14-2009

Lee.

Compiler Professional Edition build 083 contains MKL version 10.1 Update 1.

Please see the KB article: "Which version of Intel IPP, Intel MKL and Intel TBB is installed by the Intel Compiler Professional Edition?"

The problem you are discussing in this thread was fixed in the next version, 10.2.

--Gennady