I'm trying to use Cluster PARDISO to solve a large system of equations with a complex Hermitian matrix. I have many (>1000) right-hand sides but the matrix A is the same for all of them, so I want to run phases 11 and 22 once and then reuse the factorization for every right-hand side. I know PARDISO can solve all the right-hand sides in one call, but that ran out of memory because my matrix is very large. So I ran phases 11 and 22 first, then read each right-hand side from a binary file and called phase 33 in a do loop. For the first 20 or so right-hand sides the code runs fine and appears to compute the correct solution, but then the program crashes. I suspect a memory leak.
Any suggestions or ideas are welcome. Here is the error message; I ran the code on 4 nodes. Line 171 is where I call PARDISO in phase 33. Thanks.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
z2d.exe 0000000005A33DC5 Unknown Unknown Unknown
z2d.exe 0000000005A319E7 Unknown Unknown Unknown
z2d.exe 00000000059EDD14 Unknown Unknown Unknown
z2d.exe 00000000059EDB26 Unknown Unknown Unknown
z2d.exe 00000000059B0B36 Unknown Unknown Unknown
z2d.exe 00000000059B411E Unknown Unknown Unknown
libpthread.so.0 00000036B220F710 Unknown Unknown Unknown
z2d.exe 00000000059B4040 Unknown Unknown Unknown
libpthread.so.0 00000036B220F710 Unknown Unknown Unknown
libmpi.so.12 00002B23E8A8BEE0 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
z2d.exe 0000000005A33DC5 Unknown Unknown Unknown
z2d.exe 0000000005A319E7 Unknown Unknown Unknown
z2d.exe 00000000059EDD14 Unknown Unknown Unknown
z2d.exe 00000000059EDB26 Unknown Unknown Unknown
z2d.exe 00000000059B0B36 Unknown Unknown Unknown
z2d.exe 00000000059B411E Unknown Unknown Unknown
libpthread.so.0 0000003A8BC0F710 Unknown Unknown Unknown
libmpi.so.12 00002B21F3490FB2 Unknown Unknown Unknown
libmpi.so.12 00002B21F32D6FBC Unknown Unknown Unknown
libmpi.so.12 00002B21F33E2E39 Unknown Unknown Unknown
libmpi.so.12 00002B21F33E347A Unknown Unknown Unknown
libmpi.so.12 00002B21F32C2788 Unknown Unknown Unknown
libmpi.so.12 00002B21F32C100A Unknown Unknown Unknown
libmpi.so.12 00002B21F32C02CF Unknown Unknown Unknown
libmpi.so.12 00002B21F32C3A2B Unknown Unknown Unknown
libmpi.so.12 00002B21F32C343E Unknown Unknown Unknown
z2d.exe 00000000015DCDE2 Unknown Unknown Unknown
z2d.exe 00000000005D3CEB Unknown Unknown Unknown
z2d.exe 0000000000557699 Unknown Unknown Unknown
z2d.exe 0000000000553999 MAIN__ 171 z2d_1by1.f90
z2d.exe 0000000000552C1E Unknown Unknown Unknown
libc.so.6 0000003A8B81ED1D Unknown Unknown Unknown
z2d.exe 0000000000552B29 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
z2d.exe 0000000005A33DC5 Unknown Unknown Unknown
z2d.exe 0000000005A319E7 Unknown Unknown Unknown
z2d.exe 00000000059EDD14 Unknown Unknown Unknown
z2d.exe 00000000059EDB26 Unknown Unknown Unknown
z2d.exe 00000000059B0B36 Unknown Unknown Unknown
z2d.exe 00000000059B411E Unknown Unknown Unknown
libpthread.so.0 0000003B9840F710 Unknown Unknown Unknown
libmpi.so.12 00002B5AFB7954A0 Unknown Unknown Unknown
libmpi.so.12 00002B5AFB8C5FD0 Unknown Unknown Unknown
libmpi.so.12 00002B5AFB70BFBC Unknown Unknown Unknown
libmpi.so.12 00002B5AFB817E39 Unknown Unknown Unknown
libmpi.so.12 00002B5AFB81847A Unknown Unknown Unknown
libmpi.so.12 00002B5AFB6F7788 Unknown Unknown Unknown
libmpi.so.12 00002B5AFB6F600A Unknown Unknown Unknown
libmpi.so.12 00002B5AFB6F52CF Unknown Unknown Unknown
libmpi.so.12 00002B5AFB6F8A2B Unknown Unknown Unknown
libmpi.so.12 00002B5AFB6F843E Unknown Unknown Unknown
z2d.exe 00000000015DCDE2 Unknown Unknown Unknown
z2d.exe 00000000005D3CEB Unknown Unknown Unknown
z2d.exe 0000000000557699 Unknown Unknown Unknown
z2d.exe 0000000000553999 MAIN__ 171 z2d_1by1.f90
z2d.exe 0000000000552C1E Unknown Unknown Unknown
libc.so.6 0000003B9801ED1D Unknown Unknown Unknown
z2d.exe 0000000000552B29 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
z2d.exe 0000000005A33DC5 Unknown Unknown Unknown
z2d.exe 0000000005A319E7 Unknown Unknown Unknown
z2d.exe 00000000059EDD14 Unknown Unknown Unknown
z2d.exe 00000000059EDB26 Unknown Unknown Unknown
z2d.exe 00000000059B0B36 Unknown Unknown Unknown
z2d.exe 00000000059B411E Unknown Unknown Unknown
libpthread.so.0 000000382FC0F710 Unknown Unknown Unknown
z2d.exe 00000000059B4040 Unknown Unknown Unknown
libpthread.so.0 000000382FC0F710 Unknown Unknown Unknown
libmpi.so.12 00002AAD1016F800 Unknown Unknown Unknown
libmpi.so.12 00002AAD0FFB114B Unknown Unknown Unknown
libmpi.so.12 00002AAD100BCE39 Unknown Unknown Unknown
libmpi.so.12 00002AAD100BCB32 Unknown Unknown Unknown
libmpi.so.12 00002AAD0FF962F9 Unknown Unknown Unknown
libmpi.so.12 00002AAD0FF95D5D Unknown Unknown Unknown
libmpi.so.12 00002AAD0FF95BDC Unknown Unknown Unknown
libmpi.so.12 00002AAD0FF95B0C Unknown Unknown Unknown
libmpi.so.12 00002AAD0FF97932 Unknown Unknown Unknown
z2d.exe 00000000015DCCB9 Unknown Unknown Unknown
z2d.exe 00000000005C59D5 Unknown Unknown Unknown
z2d.exe 0000000000557A80 Unknown Unknown Unknown
z2d.exe 0000000000553999 MAIN__ 171 z2d_1by1.f90
z2d.exe 0000000000552C1E Unknown Unknown Unknown
libc.so.6 000000382F81ED1D Unknown Unknown Unknown
z2d.exe 0000000000552B29 Unknown Unknown Unknown
Letian, are you talking about MKL version 11.3.3? How can we reproduce this case? Can you provide an example?
Gennady,
Please find the attached example. Here is what I tried:
- Using one right-hand side, I got the correct solution.
- Putting an N x 63 matrix into B and solving with 63 right-hand sides also gave the correct solutions.
- Putting an N x 1000 matrix into B, the system ran out of memory during back substitution and the program crashed.
- As an alternative, I ran the substitution one right-hand side at a time in a do loop; the program then crashed after several solutions.
Attached is the test code with which I could reproduce the problem on the GE Global Research Linux cluster:
- Z2d_1by1_demo.f90: source code; generates a large sparse matrix (3.5M x 3.5M), runs phase=11, then phase=22, then 2000 phase=33 calls in a do loop
- Z2d.out: output file; stopped at the 245th right-hand side
- Use_script.stderr: error messages
- Rank?????.error: the ERROR value returned from cluster_pardiso after each iteration; every returned ERROR was 0, but the code crashed at iteration 245
I ran this code with 4 MPI processes and 20 OpenMP threads per process.
Please test this on the Intel side and see whether you can reproduce the same error. If my code has a problem, please let me know.
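For reference, here is a rough back-of-the-envelope sketch of why the 1000-column call is so much heavier than the 63-column one, and how the right-hand sides could be split into batches. It is only an estimate under stated assumptions: 16 bytes per entry (double-precision complex) and B and X each held as a dense N x nrhs array. The helper names `rhs_memory_gib` and `batches` are made up for this illustration and are not part of MKL; the sketch ignores the factors and any solver workspace.

```python
# Rough memory estimate for the dense right-hand-side (B) and solution (X)
# arrays, plus a batching plan that splits the right-hand sides into chunks.
# Assumptions (not from the original post): 16-byte complex entries, and
# B and X each stored as a dense N x nrhs array.

BYTES_PER_ENTRY = 16  # assumed double-precision complex


def rhs_memory_gib(n, nrhs):
    """Return the combined size of B and X in GiB for nrhs columns."""
    return 2 * n * nrhs * BYTES_PER_ENTRY / 2**30


def batches(total, chunk):
    """Yield (first_column, column_count) pairs covering all columns."""
    for start in range(0, total, chunk):
        yield start, min(chunk, total - start)


if __name__ == "__main__":
    n = 3_500_000  # matrix order from the test case (about 3.5M)
    for nrhs in (1, 63, 1000):
        gib = rhs_memory_gib(n, nrhs)
        print(f"nrhs={nrhs:5d}: B and X need about {gib:8.2f} GiB")
    plan = list(batches(1000, 63))
    print(f"{len(plan)} solve calls, last batch has {plan[-1][1]} columns")
```

Under these assumptions, 1000 columns need on the order of 100 GiB for B and X alone, while a batch of 63 needs roughly 6.6 GiB. Solving in batches of the largest size that fits (63 is known to work here) would cut the number of phase 33 calls from 1000 to 16, which may also reduce the per-call overhead.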
By the way, this is the output I see during the substitution phase:
Times:
======
Time spent in direct solver at solve step (solve) : 1.819070 s
Time spent in additional calculations : 16.224561 s
Total time spent : 18.043631 s
The additional calculations seem to take much longer than the solve itself, which makes the back substitution inefficient. What are these additional calculations?
Thanks.
Letian
