Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

PARDISO segmentation fault

Koshkarev_A_
Beginner
23,819 Views

idbc reported the following after about 80% of the LL' factorization:

Program received signal SIGSEGV
mkl_blas_mc_sgem2vu_odd () in /mnt/storage/opt/intel/composer_xe_2013_sp1.0.080/mkl/lib/intel64/libmkl_mc.so

The attachment contains the matrix, the program, and a makefile to reproduce this fault.

The matrix is stored in CSR format (3-array variation, 1-based indexing) as the upper triangular part of a Hermitian matrix. It is 64000x64000 with about 22,000,000 nonzeros.

The same program worked with smaller matrices; the largest size tested successfully was 17280x17280.

The program was run on: MACHTYPE=x86_64-suse-linux; HP DL580 G5 with 4x Intel Xeon 7350

0 points
46 Replies
John48
Beginner
8,115 Views

Hi Black Belt,

Thank you for this; presumably, your work proves that I have compiler or linking (as I suspect) issues?

Did you use the same matrix problem as in the code I sent you as the results are not the same as in the manual - please see attached pages 43-46.

Many thanks again,

John

 

0 points
mecej4
Honored Contributor III
8,088 Views

John48 wrote: "Did you use the same matrix problem as in the code I sent you as the results are not the same as in the manual - please see attached pages 43-46."

I did, but there are inconsistencies in the matrix and vector values between the source code listings in the manual and the file pardiso_sym.c from the Pardiso-project.org site. Check the element in row 5, col 6, i.e.,  A(5,6) (= +1 or -1?) and the values in the array b ( 0 to 7, or 1 to 8?).

0 points
John48
Beginner
8,056 Views

Thank you Black Belt for all of your help.  

I am trying the MKL approach.

Best regards,

John

0 points
John48
Beginner
8,007 Views

Hi Black Belt,

If I run j4.c I get the output below which, by looking at your output files, is not the same as when you run it.

Please let me have any ideas you may have.

Many thanks,

John

with e.g. export PARDISOLICMESSAGE=1
***************************************************************************
[PARDISO]: License check was successful ...
[PARDISO]: Matrix type : real symmetric
[PARDISO]: Matrix dimension : 8
[PARDISO]: Matrix non-zeros : 18
[PARDISO]: Abs. coeff. range: min 0.00e+00 max 1.10e+01
[PARDISO]: RHS no. 1: min 0.00e+00 max 7.00e+00

================ PARDISO: solving a symmetric indef. system ================


Summary PARDISO 6.0.0: ( reorder to reorder )
=======================

Times:
======

Time fulladj: 0.000013 s
Time reorder: 0.000180 s
Time symbfct: 0.000066 s
Time parlist: 0.000009 s
Time malloc : -0.000285 s
Time total : 0.000793 s total - sum: 0.000810 s

Statistics:
===========
< Parallel Direct Factorization with #cores: > 1
< and #nodes: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>
#equations: 8
#non-zeros in A: 18
non-zeros in A (%): 28.125000
#right-hand sides: 1

< Factors L and U >
#columns for each panel: 80
# of independent subgraphs: 0
< preprocessing with state of the art partitioning metis>
#supernodes: 5
size of largest supernode: 4
number of nonzeros in L 29
number of nonzeros in U 1
number of nonzeros in L+U 30
number of perturbed pivots 0
number of nodes in solve 8
Gflop for the numerical factorization: 0.000000

Reordering completed ...
Number of nonzeros in factors = 30
Number of factorization MFLOPS = 0
================ PARDISO: solving a symmetric indef. system ================


Summary PARDISO 6.0.0: ( factorize to factorize )
=======================

Times:
======

Time A to LU: 0.000001 s
Time numfct : 0.000069 s
Time malloc : -0.000661 s
Time total : 0.000774 s total - sum: 0.001365 s

Statistics:
===========
< Parallel Direct Factorization with #cores: > 1
< and #nodes: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>
#equations: 8
#non-zeros in A: 18
non-zeros in A (%): 28.125000
#right-hand sides: 1

< Factors L and U >
#columns for each panel: 80
# of independent subgraphs: 0
< preprocessing with state of the art partitioning metis>
#supernodes: 5
size of largest supernode: 4
number of nonzeros in L 29
number of nonzeros in U 1
number of nonzeros in L+U 30
number of perturbed pivots 0
number of nodes in solve 8
Gflop for the numerical factorization: 0.000000
Gflop/s for the numerical factorization: 0.001076

Factorization completed ...
./runfile: line 14: 2802 Segmentation fault (core dumped) ./pardiso_sym
root@DESKTOP-8HR6ET2:/mnt/c/PARDISO# cp j4.c pardiso_sym.c

0 points
mecej4
Honored Contributor III
7,998 Views

The discrepancy is caused by setting the R.H.S. vector equal to [0, 1, 2, ..., 7] in j4.c instead of [1, 2, 3, ..., 8]; the latter is what you can see on p.43 of the Pardiso 7.2 manual.

0 points
John48
Beginner
7,991 Views

Hi Black Belt,

Thank you for this but I was under the assumption that j4.c was the code you used for your runs?  Perhaps, I have misunderstood you?

I put b[i] = i+1; in the code but it made no difference to my result.  It seems that the main matrix is causing the problems?

Regards,

John

0 points
mecej4
Honored Contributor III
7,982 Views

Please attach a zip of the actual source file that you are using, and state the version of Pardiso that you are using. This thread is now rather long, and we now have too many versions of the code that you and I could have in mind, causing confusion.

Unlike you, I did not see access violations, only different results because of the different RHS vectors.

0 points
John48
Beginner
7,967 Views

Hi Black Belt,

Please see attached which should be the j4.c you sent me but with b[i]=i+1 as discussed.

Regards,

John

0 points
mecej4
Honored Contributor III
7,957 Views

As I wrote previously, you have to set A(5,6) (in C notation, A[12]) to -1, rather than +1, to make the code match the manual. On line 47 of the source (counting starting with 1, not 0!), change "1" to "-1".

0 points
John48
Beginner
7,943 Views

Hi Black Belt

OK thanks; my output now agrees with the manual but not with yours.

It also still segfaults as before, at the solve-to-solve stage.  Could this be a linking problem?

Thank you for your help and patience.

Regards,

John

 

0 points
John48
Beginner
7,934 Views

Hi Black Belt

Would you mind please letting me have your program which we know works?

Thank you for your help again.

Regards,

John

0 points
mecej4
Honored Contributor III
7,962 Views

Program source and results from Pardiso 6 are attached.

0 points
John48
Beginner
7,954 Views

Can't see an attachment.

John

0 points
John48
Beginner
7,931 Views

Thanks Black Belt but the job still crashes at the same place.

The following is my compile command:

gcc -g -o pardiso_sym pardiso_sym.c -L. -lpardiso600-GNU800-X86-64 -llapack -lrefblas -lgfortran -fopenmp -lpthread -lstdc++ -lm

Do you think the problem could be due to the one zero diagonal element?

Best regards,

John

 

0 points
mecej4
Honored Contributor III
7,923 Views

No, I do not think that the matrix entries lead to any problem at all, because the same problem runs fine for me on Windows using two different versions of Lugano Pardiso (5 and 6), as well as the Pardiso in MKL, using drivers written in Fortran as well as C.

I do not have access to a Linux system currently, so I cannot try and see what the problems may be when the same program is compiled and run on Linux using gcc.

0 points
John48
Beginner
7,908 Views

Thanks again Black Belt.

It would be good if I could have outputs of the values in the matrices during your run to compare with mine.  Can this be done easily?

Regards,

John

0 points
mecej4
Honored Contributor III
7,899 Views

I already gave you the output printed by the program. If you want to insert additional printf statements, you can do so. However, I think that you are wasting your time doing so, if you want to find what is going wrong and causing an access violation. Nor do I want to become your remote debugger.

Instead, use GDB or another debugger, and find out the line in the program that is causing the access violation.

0 points
John48
Beginner
7,885 Views

Hi Black Belt,

Thank you for all you have done.  Your work has shown that my problem(s) are most probably due to linking and I will further investigate your output.

I did try the MKL option but the software would not download.

Best regards,

John

 

0 points
Kirill_V_Intel
8,148 Views

Hi John,

So is there any reason you don't want to try oneMKL PARDISO?   

Best,
Kirill

0 points
John48
Beginner
8,111 Views

Hi Kirill,

I could try if you think it would help.

Regards,

John

0 points
Kirill_V_Intel
22,384 Views

Hi John!

Of course I have a personal bias, but I believe you would get better support if you start using Intel oneMKL PARDISO.

I am pretty sure that you will not have an issue like the one you described if you follow the documented steps (such as how to compile and link your code with oneMKL, e.g. from here https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl/link-line-advisor.html). Even if something goes wrong, on this forum you'd get an answer about what is wrong or how to do it properly.

Best,
Kirill

0 points