Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Pardiso_64 crashes on big problems

phollox
I'm using Pardiso_64 to solve an unstructured finite element problem. The linking procedure was performed according to the Link Line Advisor:
-Wl,--start-group libmkl_intel_ilp64.a libmkl_gnu_thread.a libmkl_core.a -Wl,--end-group
After compiling with gcc, the code works without problems in most cases.
The machine used is a server with 8 processor nodes, each with 8 cores, and 1.5 TB of RAM. For a fluid flow problem, on a mesh with 3 million nodes, the finite element problem generates a sparse matrix with 10.8 million rows and columns, and Pardiso_64 starts and finishes without problems. The process takes around 37% of the memory at its peak.
For a slightly larger problem, with 3.5 million nodes, a sparse matrix of around 13 million rows and columns is generated. In this case, Pardiso_64 performs the analysis (phase 11), and just after the start of the factorization (phase 22), after reporting 0% of the non-zeros of the LL^T factorization computed, it crashes with a signal 11.

Summary PARDISO: ( clean )
================
cleaned memory, deleted number of L&U-factorizations: 0
Percentage of computed non-zeros for LL^T factorization
0 %
Program received signal SIGSEGV, Segmentation fault.

For bigger meshes, producing matrices with over 25 million equations, Pardiso_64 is unable to start due to insufficient available memory, a behaviour that is expected for very large problems.

Summary PARDISO: ( clean )
================
cleaned memory, deleted number of L&U-factorizations: 0
*** Error in PARDISO ( insufficient_memory) error_num= -1100
*** Error in PARDISO memory allocation: FACT_L&U, size to allocate: 1504047336 bytes
total memory wanted here: 1813408125 kbyte
symbolic (max): 0 symbolic (permanent): 0
real(including 1 factor): 1813408125

My question is: what could be producing this signal 11? Is it coming from Pardiso_64, or from somewhere else?
I would like to attach some files, but our finite element code is huge, with hundreds of source and header files. I will try to reproduce the error with a simpler code, or maybe dump the matrix.
Konstantin_A_Intel
Hello,
The sizes of the problems you solve are really impressive! Most likely, you have reached a memory limitation starting from the 13M x 13M matrix.
Could you please give me a bit more detail so I can try to investigate the test case remotely:
1) Since you link with the ILP64 interface, there is no need to use pardiso_64, as it is exactly equivalent to pardiso. The difference between pardiso and pardiso_64 only matters with LP64. This is not an error, just information that may be useful to you.
2) Can you provide a full log file of the output with msglvl=1?
3) Do you have a chance to use the Intel compiler? If so, please try icc instead of gcc, replacing libmkl_gnu_thread.a with libmkl_intel_thread.a as well.
Thanks,
Konstantin
phollox
Thanks for your answer,
1) From the manual and the mkl_pardiso.h file, I understand that the ILP64 interface is the one recommended for pardiso_64, because it uses the "long long int" integer type. I'm not sure whether GCC supports this type definition. Is this correct? Does that mean that I can link with the plain pardiso, using ILP64, and it will work the same?
We switched to MKL Pardiso because the UBasel Pardiso does not have a 64-bit interface. With matrices of more than 2*10^9 elements (approximately), it did not work.
2) I'm running the simulations for the successful 10.8 million case and the failed 13 million case, in order to attach the output of both for comparison. I will attach them as soon as possible.
3) I will ask the IS department of my university whether we have access to the Intel compiler.
Konstantin_A_Intel
Hi,
A few more words about pardiso/pardiso_64 (you were mostly right about it):
The pardiso interface accepts 32-bit integer data in LP64 mode (when linked with the libmkl_intel_lp64 interface library) and 64-bit integers in ILP64 mode (libmkl_intel_ilp64). On the other hand, pardiso_64 was designed to always accept 64-bit integers in either mode (INTEGER*8 in Fortran or long long int in C). So, if you use the ILP64 interface, then all MKL functions accept 64-bit integers, including both pardiso and pardiso_64. This works with the GCC compiler as well. So it's OK that you use pardiso_64, no problem :)
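For illustration, here is a minimal, self-contained sketch of a pardiso_64 call in which every integer argument is a 64-bit long long int, as described above. The tiny 2x2 test matrix, mtype=2 and the parameter choices are purely illustrative, not taken from your code:
[cpp]#include <stdio.h>
#include <mkl_pardiso.h>

int main(void)
{
    /* Upper triangle of the SPD matrix [[4,1],[1,3]] in 1-based CSR,
       with all index arrays declared as 64-bit integers. */
    long long int n = 2, ia[3] = {1, 3, 4}, ja[3] = {1, 2, 2};
    double a[3] = {4.0, 1.0, 3.0}, b[2] = {1.0, 2.0}, x[2];

    void *pt[64] = {0};                 /* PARDISO internal data handle    */
    long long int iparm[64] = {0};      /* iparm[0] = 0: use default iparm */
    long long int maxfct = 1, mnum = 1, mtype = 2;   /* real SPD -> LL^T   */
    long long int nrhs = 1, msglvl = 0, error = 0, perm = 0, phase;

    phase = 13;   /* analysis + factorization + solve in one call */
    pardiso_64(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
               &perm, &nrhs, iparm, &msglvl, b, x, &error);
    printf("error = %lld, x = [%g, %g]\n", error, x[0], x[1]);

    phase = -1;   /* release all internal memory */
    pardiso_64(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
               &perm, &nrhs, iparm, &msglvl, b, x, &error);
    return 0;
}[/cpp]
With the ILP64 link line from your first post, calling plain pardiso with the same 64-bit arrays should behave identically.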
OK, I'm looking forward to your log files.
Thanks,
Konstantin
phollox
Thanks again for your answer,

For me, it would be better to use the regular pardiso, because not all of my problems require the ILP64 interface, and I read that phase 1 of pardiso_64 is slower than that of the regular one.

The log file for the failed run is attached as failed.log. Three things about this log file. First, the computer took a couple of hours between printing the 0% completed message during the factorization and actually reporting the signal 11. Second, the peak memory usage reported by the time command was about 1.38 TB, 92% of the total 1.5 TB of memory. However, the peak that I observed using top and htop was around 23% of the total memory. Third, a few of the lines in the log are printed by our code to monitor the solution stage. Those lines are in French: "debut de pardiso_symbolic, fin de pardiso_symbolic" ("start of pardiso_symbolic, end of pardiso_symbolic"), etc.

This error generates a huge core dump file, which I was unable to view because this feature was disabled on our server. The file is 578.4 GB, and it probably filled my user hard-drive quota on the server. When I read this file with gdb -c core.96687 (the process ID), I got the information in gdb.log.

I sent two jobs to the server: a smaller one with 10M unknowns, which will be solved successfully, and a bigger one with 23M unknowns, which will crash because of insufficient available memory. I will attach those log files as soon as possible.

Thanks for your help,

Félix
phollox
Hello,

I'm attaching two log files.

"success.log" is a successful simulation for a problem with 10.8 million of unknowns. Is a non linear problem (fluid flow), and we only recalculate the matrix and its factorization if the convergence rate is not good. The file is slightly long because it performed 10 iterations. The converge rate is printed at the end of the Pardiso calculations, in lines like:
[bash]I   2, R 8.68e-03, RS 8.68e-03, D 5.11e+00, C 0.00e+00, SM(767.51 s) FM(398991 s)[/bash]

where "I 2" means that it is the 2nd iteration.

"memory.log" is a failed simulation, with 25.6 million of unknowns. An "insuficient memory" error is printed, and then the simulation was killed by me (signal 15). Otherwise, it will continue trying to solve the problem, until enough memory is eventualy available, or the maximum number of iterations is reached.
Konstantin_A_Intel
I have several questions:
1) Do you have data on the memory consumption for the "success" run as reported by the time command?
2) Did you intentionally switch on the two-level factorization algorithm? I would try switching it off, just as an experiment.
3) It seems you turned on matching and scaling for the 23M test. Did you try the same for the 13M test case?
The possibility of hitting the memory limit still looks high, as time reported 1.4 TB of consumption. It is possible that swapping started, which finally caused the problems... How much memory do the other parts of your program require?
Regards,
Konstantin
phollox
Answers:
1) Do you have data on the memory consumption for the "success" run as reported by the time command?

Yes. It is included in the log file, at the end. It reports a maximum usage of about 2.16 TB. I do not trust that value, because the memory usage reported by "top" was around 37% of the total. Other users were also running tasks on the server and they did not crash or have any problems. Consequently, I think that the actual memory usage was the value reported by "top", not the one given by the "time" command.

2) Did you intentionally switch on the two-level factorization algorithm? I would try switching it off, just as an experiment.
What exactly is the two-level factorization algorithm? Is it one of the IPARM values? These are the values I'm using (C++ code, 0-based indexing):
[cpp]/* -------------------------------------------------------------------- */
/* .. Setup Pardiso control parameters. */
/* -------------------------------------------------------------------- */
   IPARM[0] = 0;	/* (0)Use default values for IPARM */
   IPARM[1] = 2;	/* Fill-in reordering: (0)Min Degree, (2)METIS, (3)OpenMP */
   IPARM[3] = 0;	/* No iterative-direct (Krylov-Subspace) algorithm */
   IPARM[4] = 0;	/* No user fill-in reducing permutation */
   IPARM[5] = 0;	/* Write solution into (0)x, (1)b */
   IPARM[7] = 1;	/* Max number of iterative refinement steps. (0)=2 iterations */
   IPARM[9] = 13;	/* Perturb the pivot elements with 1E-13 */
   IPARM[10] = 1;	/* (0)Disable scaling, (1)Use nonsymmetric permutation and scaling MPS */
   IPARM[12] = 1;	/* Maximum weighted matching algorithm is switched-on (default for non-symmetric) */
   IPARM[17] = 0;	/* Report the number of nonzeros in the factor LU */
   IPARM[18] = 0;	/* Report the Mflops for LU factorization */
   IPARM[20] = 1;	/* Pivoting: (0)1x1 diagonal, (1)2x2 Bunch-Kaufman  */
   IPARM[26] = 0;	/* (0)No check matrix CSR format, (1)Check*/
   IPARM[27] = 0;	/* (0)double precision, (1)single precision*/
   IPARM[34] = 0;	/* (0)Fortran 1-based index, (1) C 0-based index*/
   IPARM[59] = 0;	/* (0)in-core Pardiso, (2)out-of-core Pardiso */[/cpp]
The remaining parameters should have their default values, according to IPARM[0].

3) It seems you turned on matching and scaling for the 23M test. Did you try the same for the 13M test case?
According to IPARM[10] and IPARM[12], yes, I used scaling and matching. I used these parameters for all the cases: the 10M, the 13M and the 23M.

4) The possibility of hitting the memory limit still looks high, as time reported 1.4 TB of consumption. It is possible that swapping started, which finally caused the problems... How much memory do the other parts of your program require?
Basically, our program computes numerical integrals of a lot of quantities. The results of those integrations are stored in an array, which is then passed to Pardiso. So our memory consumption is actually quite low, and the most memory-intensive stage is the solution of the linear system. However, there are several users running jobs on the server, and their processes consume some of the available memory, but normally it is not a significant amount, around 100 GB on average.

Thanks for your interest in my problem. If I wasn't clear enough in my answers, please let me know. Thanks again,
---
Félix
Konstantin_A_Intel
Hi Felix,
I'm sorry for not responding for such a long time; it seems I just lost track of this topic.
Are you still interested in investigating this issue? If so, I can provide you with a simple tool that lets you measure memory consumption (I'm rather confident in this tool).
Regards,
Konstantin
phollox
Yes, I would like to solve this issue; I still don't have a solution for it. We are considering switching to an iterative solver, to compare its performance with Pardiso and avoid the long solution times.

Regarding the memory consumption tool, yes, I would like to have a better estimate than the ones I use right now: "htop" and the "time" command. I would like to see whether there is a jump in memory usage causing the signal 11. However, I am not the admin of the server, so I warn you: any installation or configuration step that requires root permissions will take longer than expected (I actually have a sign with that message next to my screen, like the warning on wing mirrors).
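In the meantime, here is a minimal sketch (my own idea, not the tool you mention) of the kind of in-process check I could add around the PARDISO calls: it reads the peak resident set size with getrusage and needs no root permissions.
[cpp]#include <stdio.h>
#include <sys/resource.h>

/* Print the peak resident set size of the calling process so far.
   Call it, e.g., before and after the factorization phase. */
static void print_peak_rss(const char *label)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0)
        /* On Linux, ru_maxrss is reported in kilobytes. */
        printf("%s: peak RSS = %ld MB\n", label, ru.ru_maxrss / 1024L);
}

int main(void)
{
    print_peak_rss("at startup");   /* would also be called after phase 22 */
    return 0;
}[/cpp]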

Thanks,
Konstantin_A_Intel
I've got a simpler idea! You may print the values of iparm(15), iparm(16) and iparm(17) after the reordering stage (phase=11). In fact, iparm(16)+iparm(17) should give us a good estimate of the memory needed for PARDISO.
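For example, a self-contained sketch of that check; the tiny 2x2 matrix below is only a placeholder for your real CSR arrays, and with 0-based C arrays iparm[14] corresponds to iparm(15) of the documentation:
[cpp]#include <stdio.h>
#include <mkl_pardiso.h>

int main(void)
{
    /* Placeholder matrix: upper triangle of [[4,1],[1,3]] in 1-based CSR. */
    long long int n = 2, ia[3] = {1, 3, 4}, ja[3] = {1, 2, 2};
    double a[3] = {4.0, 1.0, 3.0}, ddum = 0.0;   /* ddum: unused rhs/solution */

    void *pt[64] = {0};
    long long int iparm[64] = {0};               /* iparm[0] = 0: defaults */
    long long int maxfct = 1, mnum = 1, mtype = 2, nrhs = 1;
    long long int msglvl = 0, error = 0, perm = 0, phase;

    phase = 11;                                  /* analysis / reordering only */
    pardiso_64(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
               &perm, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);

    printf("error = %lld\n", error);
    printf("iparm(15) peak symbolic memory       = %lld KB\n", iparm[14]);
    printf("iparm(16) permanent symbolic memory  = %lld KB\n", iparm[15]);
    printf("iparm(17) factorization/solve memory = %lld KB\n", iparm[16]);
    /* Total peak need is roughly max(iparm(15), iparm(16) + iparm(17)). */

    phase = -1;                                  /* release internal memory */
    pardiso_64(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
               &perm, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);
    return 0;
}[/cpp]
If the estimate printed for the 13M case approaches the 1.5 TB available on your node, that would point to the memory hypothesis.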

Regards,
Konstantin