Riccardo_Rossi
Beginner
85 Views

Problems with MKL PARDISO - compiler 11.1.072

Dear list,

I am writing to report our problems with the latest version of the Intel MKL PARDISO solver on Linux.

Versions up to 11.1.038 appear to work fine (we tested on different hardware, including Core i7, Core, and AMD machines).

Unfortunately, when we tried 11.1.072 we got a segmentation fault during the solve in release mode.

When we run under valgrind, we get clean output (in release mode) and correct results.

Our interface code can be found at:

http://kratos.cimne.upc.es/trac/browser/kratos/applications/mkl_solvers_application/external_include...


I saw in other threads that a classic cause of this problem is related to ILP64. We checked this carefully and believe that the problem is not there.
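For readers hitting the same ILP64 question: with the ILP64 interface every integer PARDISO reads (the CSR index arrays, iparm, and so on) is 64 bits wide, so index arrays kept in another integer type have to be converted before the call. The sketch below is a hypothetical helper, `convert_indices`, not part of MKL; it assumes the application stores its indices in some integer type of its own (for example std::size_t) and copies them into the type the solver expects, refusing values that do not fit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical helper (not an MKL function): copies an index array into the
// integer type the solver interface expects, e.g. a 64-bit type for the ILP64
// interface or a 32-bit type for LP64, and rejects values that do not fit.
template <typename To, typename From>
std::vector<To> convert_indices(const std::vector<From>& in) {
    static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                  "index types must be integers");
    std::vector<To> out;
    out.reserve(in.size());
    for (const From v : in) {
        const To w = static_cast<To>(v);
        // The round trip catches truncation; the sign test catches values
        // whose sign flips when moving between signed and unsigned types.
        const bool same_sign = (w < To{}) == (v < From{});
        if (static_cast<From>(w) != v || !same_sign) {
            throw std::out_of_range("index does not fit in the solver integer type");
        }
        out.push_back(w);
    }
    return out;
}
```

For example, `convert_indices<std::int64_t>(row_ptr)` would produce an ILP64-sized copy; in real code the target type would be MKL_INT from the MKL header rather than a hard-coded std::int64_t.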

On the other hand, we are using "bjam" for linking, which means we cannot prescribe the order in which the libraries are linked.
I understand that this may be an issue for static libraries, but it should not be one in our case, as we link shared libraries.
Am I correct in this assumption?

Any help would be very welcome, as we are stuck.

Thank you,
Riccardo

Chao_Y_Intel
Employee

Hi Riccardo,

Is the application linked against the threaded or the non-threaded MKL libraries? Also, in which phase does the solver report the error?

Thanks,
Chao

Riccardo_Rossi
Beginner

Dear Chao,
thank you for your attention.

Our intention is to link against the threaded libraries.

This is the output of ldd on our shared library (it is compiled as a Python module):

rrossi@rrossi-desktop:~/kratos/libs$ ldd KratosMKLSolversApplication.so
linux-vdso.so.1 => (0x00007fffe8fc8000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007f7818652000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f781844d000)
libguide.so => /opt/intel/Compiler/11.1/072/lib/intel64/libguide.so (0x00007f78182a5000)
libboost_python.so.1.43.0 => /home/rrossi/compiled_libraries/boost_1_43/lib/libboost_python.so.1.43.0 (0x00007f7818052000)
libpython2.6.so.1.0 => /usr/lib/libpython2.6.so.1.0 (0x00007f7817ba0000)
libmkl_p4n.so => /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_p4n.so (0x00007f7816816000)
libmkl_lapack.so => /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_lapack.so (0x00007f7815baf000)
libmkl_mc.so => /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_mc.so (0x00007f781469e000)
libmkl_core.so => /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_core.so (0x00007f78142eb000)
libmkl_intel_thread.so => /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_thread.so (0x00007f78130a7000)
libmkl_intel_ilp64.so => /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_ilp64.so (0x00007f7812cf6000)
librt.so.1 => /lib/librt.so.1 (0x00007f7812aee000)
libimf.so => /opt/intel/Compiler/11.1/072/lib/intel64/libimf.so (0x00007f781275a000)
libsvml.so => /opt/intel/Compiler/11.1/072/lib/intel64/libsvml.so (0x00007f7812543000)
libm.so.6 => /lib/libm.so.6 (0x00007f78122c0000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f7811fac000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f7811d94000)
libintlc.so.5 => /opt/intel/Compiler/11.1/072/lib/intel64/libintlc.so.5 (0x00007f7811c56000)
libc.so.6 => /lib/libc.so.6 (0x00007f78118d3000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7818bae000)
libutil.so.1 => /lib/libutil.so.1 (0x00007f78116cf000)
libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x00007f781147f000)
libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x00007f78110ef000)
libz.so.1 => /lib/libz.so.1 (0x00007f7810ed7000)

I tried linking iomp5 instead of guide, but nothing changes.
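Since swapping guide for iomp5 changed nothing, it may be worth confirming which OpenMP runtime(s) are actually mapped into the running Python process: ldd only shows the dependencies of this one module, while the interpreter may load other extension modules that bring their own runtime. Mixing two OpenMP runtimes in one process is a commonly reported source of erratic behavior, so the hedged sketch below lists the loaded shared objects with glibc's `dl_iterate_phdr` (Linux only) and tags the usual runtime file names (libguide, libiomp5, libgomp). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <link.h>  // dl_iterate_phdr (glibc / Linux)
#include <set>
#include <string>
#include <vector>

// Returns "guide", "iomp5" or "gomp" if the file name looks like one of those
// OpenMP runtimes (legacy Intel, current Intel, GCC), otherwise "".
std::string openmp_runtime_tag(const std::string& path) {
    // npos + 1 wraps to 0, so a bare file name is used as-is
    const std::string base = path.substr(path.find_last_of('/') + 1);
    if (base.rfind("libguide", 0) == 0) return "guide";
    if (base.rfind("libiomp5", 0) == 0) return "iomp5";
    if (base.rfind("libgomp", 0) == 0) return "gomp";
    return "";
}

// Distinct OpenMP runtimes found in a list of shared-object paths.
std::set<std::string> openmp_runtimes(const std::vector<std::string>& paths) {
    std::set<std::string> found;
    for (const auto& p : paths) {
        const std::string tag = openmp_runtime_tag(p);
        if (!tag.empty()) found.insert(tag);
    }
    return found;
}

// Callback for dl_iterate_phdr: records the path of each loaded object.
static int collect_object(struct dl_phdr_info* info, std::size_t, void* data) {
    auto* names = static_cast<std::vector<std::string>*>(data);
    if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0') {
        names->push_back(info->dlpi_name);
    }
    return 0;  // keep iterating
}

// Paths of the shared objects currently mapped into this process.
std::vector<std::string> loaded_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collect_object, &names);
    return names;
}
```

Calling `openmp_runtimes(loaded_objects())` just before the solve and printing the result would show whether more than one runtime is present in the process.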

The system information follows:
Linux rrossi-desktop 2.6.32-24-generic #38-Ubuntu SMP Mon Jul 5 09:20:59 UTC 2010 x86_64 GNU/Linux

The gcc version installed on the system (in case it matters) is:
gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3

The weird thing is that it segfaults when I run it normally, while it runs fine (and without leaks) under valgrind. This makes me suspect a threading problem.

Investigating a little further, I realized that I can make it work by changing the number of threads:

export OMP_NUM_THREADS=1 ---> WORKS
export OMP_NUM_THREADS=2 ---> WORKS
export OMP_NUM_THREADS=4 ---> segfault
export OMP_NUM_THREADS=8 ---> segfault (note that I have only 4 physical cores; the other 4 are Hyper-Threading)

The system size is of the order of 100k DOFs.

If needed, I can print the matrix and the vector in Matrix Market (.mm) format.
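Dumping the system in Matrix Market format makes it easy to share and to reproduce outside the application. A minimal sketch, assuming the matrix is available as zero-based CSR arrays (`row_ptr`, `col_idx`, `values`) and the right-hand side as a std::vector<double>; the function names are hypothetical. Doubles are written with 17 significant digits so they read back exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Writes a CSR matrix with zero-based indices as a Matrix Market
// "coordinate real general" file; Matrix Market indices are one-based.
void write_matrix_market(std::ostream& os, std::size_t n_rows, std::size_t n_cols,
                         const std::vector<std::size_t>& row_ptr,
                         const std::vector<std::size_t>& col_idx,
                         const std::vector<double>& values) {
    if (row_ptr.size() != n_rows + 1 || col_idx.size() != values.size() ||
        row_ptr.front() != 0 || row_ptr.back() != values.size()) {
        throw std::invalid_argument("inconsistent CSR arrays");
    }
    for (std::size_t i = 0; i < n_rows; ++i) {
        if (row_ptr[i] > row_ptr[i + 1]) throw std::invalid_argument("row_ptr not monotone");
    }
    const std::streamsize old_precision = os.precision(17);  // round-trip doubles
    os << "%%MatrixMarket matrix coordinate real general\n";
    os << n_rows << ' ' << n_cols << ' ' << values.size() << '\n';
    for (std::size_t i = 0; i < n_rows; ++i) {
        for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            os << (i + 1) << ' ' << (col_idx[k] + 1) << ' ' << values[k] << '\n';
        }
    }
    os.precision(old_precision);
}

// Writes a dense vector (e.g. the right-hand side) in Matrix Market "array" format.
void write_vector_market(std::ostream& os, const std::vector<double>& v) {
    const std::streamsize old_precision = os.precision(17);
    os << "%%MatrixMarket matrix array real general\n";
    os << v.size() << " 1\n";
    for (const double x : v) os << x << '\n';
    os.precision(old_precision);
}
```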

Thank you again for your attention,
Riccardo

P.S. I forgot to say that the error appears to occur after the reordering; looking at the code at

http://kratos.cimne.upc.es/trac/browser/kratos/applications/mkl_solvers_application/external_include...

the error appears after line 252 and before line 266 (no error code is reported; it simply segfaults).
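Because the run dies without an error code somewhere between the reordering and the factorization, it can help to log each PARDISO phase with an explicit flush: output still sitting in a stream buffer is lost when the process segfaults, so unflushed trace lines can make the crash appear to happen earlier than it really did. The sketch below assumes a user-written wrapper around the real pardiso() call, modeled as the hypothetical `call_phase` callable returning the `error` output; the phase codes are the documented ones (11 analysis, 22 numerical factorization, 33 solve, -1 release memory).

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// PARDISO phase codes as documented for MKL: 11 = analysis (reordering and
// symbolic factorization), 22 = numerical factorization, 33 = solve with
// iterative refinement, -1 = release all internal memory.
std::string phase_name(int phase) {
    switch (phase) {
        case 11: return "analysis";
        case 22: return "numerical factorization";
        case 33: return "solve";
        case -1: return "release memory";
        default: return "other";
    }
}

// Hypothetical driver: `call_phase` stands in for your own wrapper around the
// real pardiso() call and returns its `error` output. std::endl flushes after
// every message, so everything logged before a crash reaches the terminal/file.
int run_phases(const std::function<int(int)>& call_phase, std::ostream& log) {
    int first_error = 0;
    for (const int phase : {11, 22, 33}) {
        log << "entering phase " << phase << " (" << phase_name(phase) << ")" << std::endl;
        const int error = call_phase(phase);
        log << "left phase " << phase << ", error = " << error << std::endl;
        if (error != 0) {
            first_error = error;
            break;
        }
    }
    call_phase(-1);  // release solver memory even if an earlier phase failed
    return first_error;
}
```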



Hi,

Could you print the value of iparm[63] after the reordering step, and attach the initial matrix and right-hand side for a deeper investigation of the problem?
With best regards,
Alexander Kalinkin

Riccardo_Rossi
Beginner

After printing the matrix, the solver finished... this really sounds like some sort of threading problem.

The output follows:

Size of the problem: 117040
Size of index1_vector: 117041
Size of index2_vector: 1621688
pardiso_solver: line 156
pardiso_solver: line 161
number of threads: 8
pardiso_solver: line 241
pardiso_solver: line 251
Reordering completed ...
iparm[63] : 102000113
Factorization completed ...
pardiso_solver: line 267
pardiso_solver: line 277
pardiso_solver: line 285
pardiso_solver: line 289
pardiso_solver: line 294
pardiso_solver: line 296
pardiso_solver: line 298
#### SOLVER TIME: 2.25507 ####
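The sizes printed above are consistent with square CSR storage: index1_vector has n + 1 = 117041 entries (row pointers) and index2_vector has the 1621688 column indices. Assuming that reading, a cheap way to rule out malformed input before the solver sees it is to validate the arrays against the CSR conventions PARDISO documents: non-decreasing row pointers, column indices in range and in increasing order inside each row, one-based by default. `check_csr` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical CSR sanity check for a square n x n matrix. `base` is 1 for
// one-based indices (PARDISO's default) or 0 for zero-based. Returns an empty
// string if the arrays look consistent, otherwise a description of the problem.
template <typename Index>
std::string check_csr(std::size_t n, const std::vector<Index>& row_ptr,
                      const std::vector<Index>& col_idx, int base) {
    const Index b = static_cast<Index>(base);
    if (row_ptr.size() != n + 1) return "row_ptr must have n+1 entries";
    if (row_ptr[0] != b) return "row_ptr[0] must equal the index base";
    for (std::size_t i = 0; i < n; ++i) {
        if (row_ptr[i + 1] < row_ptr[i]) return "row_ptr must be non-decreasing";
    }
    if (static_cast<std::size_t>(row_ptr[n] - b) != col_idx.size()) {
        return "row_ptr[n] - base must equal the number of nonzeros";
    }
    for (std::size_t i = 0; i < n; ++i) {
        for (Index k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            const Index c = col_idx[static_cast<std::size_t>(k - b)];
            if (c < b || static_cast<std::size_t>(c - b) >= n) {
                return "column index out of range";
            }
            if (k > row_ptr[i] && c <= col_idx[static_cast<std::size_t>(k - 1 - b)]) {
                return "column indices must be strictly increasing within a row";
            }
        }
    }
    return "";
}
```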

The value of iparm[63] is:
iparm[63] : 102000113

I have attached the matrix.

Thanks a lot for your attention,
Riccardo


Artem_V_Intel
Employee

Hello Riccardo,

Thank you for the test case. Could you please also show the link line you used to link against MKL?

Best regards,
Artem
Riccardo_Rossi
Beginner

Hi,
sorry for the delay in replying; I did not realize there was an answer.

"icpc" -L"/home/rrossi/compiled_libraries/boost_1_43/lib" -L"/opt/intel/Compiler/11.1/072/mkl/lib/em64t" -o "/home/rrossi/kratos/applications/mkl_solvers_application/bin/intel-linux/release/threading-multi/KratosMKLSolversApplication.so" -Wl,-soname -Wl,KratosMKLSolversApplication.so -shared "/home/rrossi/kratos/applications/mkl_solvers_application/bin/intel-linux/release/threading-multi/add_linear_solvers_to_python.o" "/home/rrossi/kratos/applications/mkl_solvers_application/bin/intel-linux/release/threading-multi/mkl_solvers_python_application.o" "/home/rrossi/kratos/applications/mkl_solvers_application/bin/intel-linux/release/threading-multi/mkl_solvers_application.o" "../../kratos/bin/intel-linux/release/link-static/threading-multi/libkratos.a" "/home/rrossi/kratos/external_libraries/gidpost/bin/intel-linux/release/link-static/threading-multi/libgidpost.a" -lpthread -ldl -lguide -lboost_python -lpython2.6 -lmkl_p4n -lpthread -lmkl_lapack -lmkl_mc3 -lmkl_mc -lmkl_solver_ilp64 -lmkl_core -lmkl_intel_thread -lmkl_intel_ilp64 -lrt -Wl,--strip-all -pthread



This is the command line we use, as generated by bjam.
We cannot prescribe the order of the libraries, as the Boost.Build system does not allow it (at least to the best of my knowledge).

Thank you,
Riccardo

barragan_villanueva_
Valued Contributor I

Hi,

If you are using the ILP64 MKL libraries, then the -DMKL_ILP64 option is needed for the C++ compiler.

Also, why are the MKL CPU-specific libraries used directly?
I mean: -lmkl_p4n ... -lmkl_mc

And why -pthread at the end?

Riccardo_Rossi
Beginner

Hi Victor,
-DMKL_ILP64 was correctly set; the piece I posted was only the link step.

Indeed, I can exclude -lmkl_p4n and -lmkl_mc and still run (with occasional failures).

What I simply cannot exclude is mkl_mc3.

On the other hand, I included pthread following the MKL Link Line Advisor tool that I found on the Intel website.
Does it do any harm?

Changing subject, I have the feeling that two runs of the program are not independent of each other, as if OpenMP were still active or something of the sort. Can something like this happen?

In any case, thank you for your attention,
Riccardo

barragan_villanueva_
Valued Contributor I

Well, you added -lpthread according to the MKL Link Line Advisor, but I still see -pthread at the end of the link line.

As for the CPU-specific libraries: if you create your own shared library based on the MKL libraries, then it needs to link the MKL CPU-specific libraries that are required (possibly including -lmkl_def to work on older architectures such as the Pentium III). mkl_mc3 is needed to run on Nehalem.

Also, please try using -liomp5 instead of -lguide.

Riccardo_Rossi
Beginner

Hi,

Concerning -pthread and -lpthread, I guess that Boost.Build adds them automatically, so it is out of my control. Does it have any impact?

I recompiled everything to use iomp5, and the code appears to work... most of the time. It still stops from time to time, not very differently from what it was doing before.
What is the difference between iomp5 and guide? I saw in the documentation that guide is deprecated, but is there any significant difference between the two?

I just wonder if a failed run may corrupt something in the following runs.
What I often do is kill the application and then relaunch it; can this mean that the OpenMP library is not correctly stopped, or anything of the sort?

The only "strange" thing I do in the application is that I have an array of omp_locks, and I initialize it in parallel. Is this allowed?

I also tried to put a
#pragma omp barrier

straight before the call to the solver, to ensure that the threads are correctly synchronized at that point (they should actually synchronize automatically at the end of the parallel region, and I never use nowait, but I made the attempt anyway).
Nothing changes.
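On the two OpenMP questions above, as far as I understand the OpenMP rules: initializing an array of locks inside a parallel loop is allowed, provided each lock is initialized exactly once before any thread sets it and destroyed only after all threads are done with it; and a `#pragma omp barrier` outside a parallel region binds to a team of one thread, so it cannot change anything before a serial solver call. The sketch below illustrates that pattern with hypothetical names (`RowLocks`, `parallel_assemble`); it falls back to serial stand-ins when compiled without OpenMP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-ins so the sketch also compiles without -fopenmp
// (assumption: with OpenMP disabled everything runs on one thread).
typedef int omp_lock_t;
inline void omp_init_lock(omp_lock_t* l) { *l = 0; }
inline void omp_destroy_lock(omp_lock_t*) {}
inline void omp_set_lock(omp_lock_t* l) { *l = 1; }
inline void omp_unset_lock(omp_lock_t* l) { *l = 0; }
#endif

// Per-row locks: each lock is initialized exactly once (in parallel, which is
// fine because every iteration touches a different lock) and destroyed once.
class RowLocks {
public:
    explicit RowLocks(std::size_t n) : locks_(n) {
        const long long m = static_cast<long long>(n);
#pragma omp parallel for
        for (long long i = 0; i < m; ++i) omp_init_lock(&locks_[static_cast<std::size_t>(i)]);
    }
    ~RowLocks() {
        for (auto& l : locks_) omp_destroy_lock(&l);
    }
    RowLocks(const RowLocks&) = delete;
    RowLocks& operator=(const RowLocks&) = delete;
    void lock(std::size_t i) { omp_set_lock(&locks_[i]); }
    void unlock(std::size_t i) { omp_unset_lock(&locks_[i]); }

private:
    std::vector<omp_lock_t> locks_;
};

// Parallel assembly into a shared vector, each row guarded by its own lock.
std::vector<double> parallel_assemble(std::size_t n_rows, std::size_t n_contributions) {
    std::vector<double> rhs(n_rows, 0.0);
    RowLocks locks(n_rows);
    const long long m = static_cast<long long>(n_contributions);
#pragma omp parallel for
    for (long long k = 0; k < m; ++k) {
        const std::size_t row = static_cast<std::size_t>(k) % n_rows;
        locks.lock(row);
        rhs[row] += 1.0;
        locks.unlock(row);
    }
    // The implicit barrier at the end of the loop above has already synchronized
    // all threads; the locks are destroyed when `locks` goes out of scope.
    return rhs;
}
```

The solver would then be called from serial code with the returned vector, after the parallel region has ended.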





barragan_villanueva_
Valued Contributor I

Quote: "Concerning -pthread and -lpthread, I guess that Boost.Build adds them automatically, so it is out of my control. Does it have any impact?"


-pthreads is simply an incorrect option.

I'd recommend that you double-check that you use the MKL ILP64 libraries correctly:
1) All files that use MKL should have #include
2) and be compiled with the -DMKL_ILP64 option
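The second point of this checklist can be enforced at compile time. The sketch below assumes a project header that every MKL-using file includes; `solver_int`, `EXPECT_MKL_ILP64` and `solver_int_bits` are hypothetical names, and `solver_int` only mirrors the choice the MKL headers make for MKL_INT (64-bit under MKL_ILP64, 32-bit otherwise).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of what the MKL headers do for MKL_INT: a 64-bit integer
// when MKL_ILP64 is defined, a 32-bit int otherwise. In real code use MKL_INT
// from the MKL header instead of this alias.
#ifdef MKL_ILP64
using solver_int = std::int64_t;
#else
using solver_int = std::int32_t;
#endif

// Hypothetical build switch: define EXPECT_MKL_ILP64 project-wide whenever the
// link line uses the ILP64 interface (-lmkl_intel_ilp64). Any translation unit
// that includes this header but lacks -DMKL_ILP64 then fails to compile,
// instead of silently passing 32-bit integers to a 64-bit interface.
#if defined(EXPECT_MKL_ILP64) && !defined(MKL_ILP64)
#error "linking the ILP64 MKL interface, but MKL_ILP64 is not defined here"
#endif

// Width of the solver integer type, handy to print once at start-up.
int solver_int_bits() {
    return static_cast<int>(sizeof(solver_int) * 8);
}
```

With -DEXPECT_MKL_ILP64 added project-wide, a file compiled without -DMKL_ILP64 stops the build at the #error line; printing solver_int_bits() at start-up gives a quick run-time confirmation.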

Please add the -Wall compiler option everywhere and analyze all compiler warnings.


Hi Riccardo,
The code at the link you provided receives matrix A as an external parameter of the function solv, so it isn't clear which storage format you use in this case. Could you describe it? Alternatively, it would be great if you combined reading matrix A from the file you attached earlier with your code, to speed up the investigation of your problem.
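For the kind of reproducer requested here, the matrix file could be read straight back into CSR arrays. A sketch under stated assumptions: the attachment is a Matrix Market "coordinate real general" file, and the target is one-based CSR with column indices sorted within each row. `CsrMatrix` and `read_matrix_market` are hypothetical names; symmetric storage, duplicate entries and other header variants are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <tuple>
#include <vector>

// One-based CSR arrays, the default indexing PARDISO expects.
struct CsrMatrix {
    std::size_t n_rows = 0;
    std::size_t n_cols = 0;
    std::vector<long long> row_ptr;  // n_rows + 1 entries
    std::vector<long long> col_idx;  // nnz entries, sorted within each row
    std::vector<double> values;      // nnz entries
};

// Hypothetical reader for a Matrix Market "coordinate real general" stream.
CsrMatrix read_matrix_market(std::istream& in) {
    std::string line;
    if (!std::getline(in, line) ||
        line.rfind("%%MatrixMarket matrix coordinate real general", 0) != 0) {
        throw std::runtime_error("expected a coordinate real general header");
    }
    bool have_size_line = false;
    while (std::getline(in, line)) {  // skip comment and blank lines
        if (!line.empty() && line[0] != '%') { have_size_line = true; break; }
    }
    std::size_t nr = 0, nc = 0, nnz = 0;
    std::istringstream size_line(line);
    if (!have_size_line || !(size_line >> nr >> nc >> nnz)) {
        throw std::runtime_error("missing or malformed size line");
    }
    std::vector<std::tuple<std::size_t, std::size_t, double>> entries;
    entries.reserve(nnz);
    for (std::size_t k = 0; k < nnz; ++k) {
        std::size_t i = 0, j = 0;
        double v = 0.0;
        if (!(in >> i >> j >> v) || i < 1 || i > nr || j < 1 || j > nc) {
            throw std::runtime_error("malformed entry");
        }
        entries.emplace_back(i, j, v);
    }
    std::sort(entries.begin(), entries.end());  // by row, then by column
    CsrMatrix m;
    m.n_rows = nr;
    m.n_cols = nc;
    m.row_ptr.assign(nr + 1, 0);
    for (const auto& e : entries) ++m.row_ptr[std::get<0>(e)];  // count per row
    m.row_ptr[0] = 1;
    for (std::size_t r = 1; r <= nr; ++r) m.row_ptr[r] += m.row_ptr[r - 1];
    for (const auto& e : entries) {
        m.col_idx.push_back(static_cast<long long>(std::get<1>(e)));
        m.values.push_back(std::get<2>(e));
    }
    return m;
}
```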
With best regards,
Alexander Kalinkin