My end goal is to integrate an Intel MKL solver into my SMP program to replace the current SOR method, in the hope that it will be faster.
I adapted the preconditioned CG sample code to solve a linear system with i, j, k = 100 (code uploaded).
If the code is compiled with
ifort -qmkl cg_jacobi_precon.f90
it runs normally, with implicit parallelization as expected.
However, if I compile it with the -qopenmp flag (without even adding OpenMP directives to the code)
ifort -qmkl -qopenmp -mcmodel=large -shared-intel cg_jacobi_precon.f90
and run it with ./a.out, a segmentation fault occurs.
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line     Source
libifcoremt.so.5   0000149902209359  for__signal_handl  Unknown  Unknown
libpthread-2.27.s  00001498F61B4980  Unknown            Unknown  Unknown
libmkl_avx512.so.  00001498F501C64C  mkl_spblas_lp64_a  Unknown  Unknown
libmkl_intel_thre  00001498FE44E595  mkl_spblas_lp64_d  Unknown  Unknown
libiomp5.so        00001498F944A893  __kmp_invoke_micr  Unknown  Unknown
libiomp5.so        00001498F93BDCB3  Unknown            Unknown  Unknown
libiomp5.so        00001498F93BEF7D  __kmp_fork_call    Unknown  Unknown
libiomp5.so        00001498F937A425  __kmpc_fork_call   Unknown  Unknown
libmkl_intel_thre  00001498FE44E11B  mkl_spblas_lp64_d  Unknown  Unknown
libmkl_intel_thre  00001498FE1CD051  mkl_spblas_lp64_m  Unknown  Unknown
a.out              0000000000401CC0  Unknown            Unknown  Unknown
a.out              0000000000400F62  Unknown            Unknown  Unknown
libc-2.27.so       00001498F5DD2C87  __libc_start_main  Unknown  Unknown
a.out              0000000000400E6A  Unknown            Unknown  Unknown
If I reduce the size of the system to i, j = 100, k = 10, everything seems fine.
Going back to i, j, k = 100, I tried
ulimit -s unlimited
but that did not help.
I am not familiar with Intel MKL, so please tell me if I have done anything wrong.
※By the way, 100^3 is not really a large array: I run simulations with a 10x larger system in the same environment with OpenMP and no problems occur.
Seeing that you are calling MKL from a threaded application...
... are you mistakenly linking with the MKL threaded library?
Threaded applications should (generally) link with the sequential MKL library.
Sequential applications should (generally) link with the threaded MKL library.
IOW threading takes place in one or the other not both.
Edit: use -qmkl=sequential when your application is parallel (e.g. OpenMP)
Thank you for replying. -qmkl=sequential will slow the program down terribly, which is the opposite of my aim.
The MKL call is outside the OpenMP parallel region, so it should be fine (under certain conditions, I think).
The proof is that the sample code I uploaded works at array size 100x100x10;
however, it fails at array size 100x100x100 (it fails when compiled with the OpenMP flag, while the sequential build works well).
Something is going wrong. Why?
My idea is to call the threaded MKL library in the sequential part of my parallel program. Here is the pseudocode.
Please tell me how to do this some other way if I cannot put it all in one program.
Program Start
  Use library and Declare variable
  !$OMP PARALLEL DO
    calculation unrelated to MKL
  !$OMP END PARALLEL DO
  Call threaded MKL library
  !$OMP PARALLEL DO
    other calculation unrelated to MKL
  !$OMP END PARALLEL DO
END
I (mistakenly) presumed your MKL calls were within a parallel region.
>> it failed at array-size 100x100x100(Failed when compile with openmp flag. And sequential code works well)
This sounds like the master thread succeeds while one or more of the additional OpenMP threads fail.
Use the OMP_STACKSIZE environment variable to specify the stack size for the threads created by OpenMP.
ulimit specifies the process (i.e. master thread) stack size.
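To make the distinction concrete, here is a minimal sketch of setting both limits from the shell before launching the program. The value 100M is only an illustrative starting point, and the commented-out ./a.out line assumes the executable name used earlier in the thread.

```shell
# Stack size for the threads created by the OpenMP runtime
# (this does not apply to the master thread).
export OMP_STACKSIZE=100M

# Stack limit of the process itself, i.e. the master thread.
# Printed here only for comparison; "ulimit -s unlimited" would raise it.
echo "process stack limit: $(ulimit -s)"
echo "OpenMP thread stack size: $OMP_STACKSIZE"

# Run the program from this same shell so it inherits both settings:
# ./a.out
```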
>>libmkl_avx512.so. 00001498F501C64C mkl_spblas_lp64_a Unknown Unknown
This indicates that the failure is located inside MKL (presumably called from the sequential part of your application).
IOW it is not in the sequential part of the application itself.
>> it failed at array-size 100x100x100
100x100x100x8 = sizeof(array) = 8,000,000 bytes
The working space for the MKL call might be 3x this, or 24MB.
Try KMP_STACKSIZE=100m (that failing, 50m)
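The estimate above can be checked with shell arithmetic. This is only a sketch: the 3x workspace factor is the guess from this post, not a documented MKL figure, and treating 100m as 100 * 1024 * 1024 bytes is an assumption.

```shell
n=100                 # grid points per dimension (i = j = k = 100)
bytes_per_value=8     # one double-precision value

array_bytes=$((n * n * n * bytes_per_value))
workspace_bytes=$((3 * array_bytes))        # guessed 3x working space
stack_bytes=$((100 * 1024 * 1024))          # 100m, assumed binary megabytes

echo "array:     $array_bytes bytes"
echo "workspace: $workspace_bytes bytes (estimate)"
echo "stack:     $stack_bytes bytes"

# Report whether the guessed workspace fits in the proposed thread stack.
if [ "$workspace_bytes" -lt "$stack_bytes" ]; then
    echo "estimated workspace fits in a 100m thread stack"
fi
```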
Also, enable line numbers in the traceback (for example by compiling with -g -traceback) to get the line number of the CALL into MKL. In front of that call place
IF(OMP_IN_PARALLEL()) STOP "OMP_IN_PARALLEL()"
This is to verify you are not blind-sided by other parts of your code.
There is nothing wrong with this: your code will use 16 threads and MKL will use 16 threads (31 threads in total, since the master thread of each is the same thread). Once it is working, you might want to experiment with KMP_BLOCKTIME=0 (or some small number).
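As a sketch of how that experiment might be set up from the shell: the thread count of 16 is taken from the description above, and setting MKL_NUM_THREADS explicitly is optional and shown only to make the configuration visible.

```shell
# Threads for the application's own OpenMP regions.
export OMP_NUM_THREADS=16
# Threads MKL may use when called from the sequential part of the program.
export MKL_NUM_THREADS=16
# Time in milliseconds that worker threads keep spinning after a parallel
# region ends before going to sleep; 0 makes them sleep immediately.
export KMP_BLOCKTIME=0

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS MKL_NUM_THREADS=$MKL_NUM_THREADS KMP_BLOCKTIME=$KMP_BLOCKTIME"

# Run the program from this same shell so it inherits the settings:
# ./a.out
```

Whether a small block time helps depends on how often the program alternates between its own parallel loops and the MKL call, so it is worth timing a few values.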