Intel® Fortran Compiler

Segmentation fault on a large array when using MKL in code compiled with the OpenMP option

SamW1
Beginner

My end goal is to use an Intel MKL solver in my SMP program in place of the current SOR method, in the hope that it will be faster.

I adapted the Preconditioned CG sample code to solve a linear system with i, j, k = 100 (code uploaded).

If the code is compiled with

ifort -qmkl cg_jacobi_precon.f90

it runs normally, with implicit parallelization as expected.

However, if I compile it with the -qopenmp flag (without even adding OpenMP directives to the code)

ifort -qmkl -qopenmp -mcmodel=large -shared-intel cg_jacobi_precon.f90

and run it with ./a.out, a segmentation fault occurs.

$OMP_NUM_THREADS=16

 

 

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
libifcoremt.so.5   0000149902209359  for__signal_handl     Unknown  Unknown
libpthread-2.27.s  00001498F61B4980  Unknown               Unknown  Unknown
libmkl_avx512.so.  00001498F501C64C  mkl_spblas_lp64_a     Unknown  Unknown
libmkl_intel_thre  00001498FE44E595  mkl_spblas_lp64_d     Unknown  Unknown
libiomp5.so        00001498F944A893  __kmp_invoke_micr     Unknown  Unknown
libiomp5.so        00001498F93BDCB3  Unknown               Unknown  Unknown
libiomp5.so        00001498F93BEF7D  __kmp_fork_call       Unknown  Unknown
libiomp5.so        00001498F937A425  __kmpc_fork_call      Unknown  Unknown
libmkl_intel_thre  00001498FE44E11B  mkl_spblas_lp64_d     Unknown  Unknown
libmkl_intel_thre  00001498FE1CD051  mkl_spblas_lp64_m     Unknown  Unknown
a.out              0000000000401CC0  Unknown               Unknown  Unknown
a.out              0000000000400F62  Unknown               Unknown  Unknown
libc-2.27.so       00001498F5DD2C87  __libc_start_main     Unknown  Unknown
a.out              0000000000400E6A  Unknown               Unknown  Unknown

 

 

If I reduce the system size to i, j = 100, k = 10, everything seems fine.

Back to i,j,k=100, I tried

ulimit -s unlimited

export KMP_STACKSIZE=1G

but neither solved the problem.

I am not familiar with Intel MKL, so please tell me if I have done anything wrong with it.

 

※By the way, 100^3 isn't really a large array. I run computational simulations on a 10x larger system in the same environment with OpenMP and no problems occur.

 

6 Replies
jimdempseyatthecove
Honored Contributor III

Seeing that you are calling MKL from a threaded application...

... are you mistakenly linking with the MKL threaded library?

Threaded applications should (generally) link with the sequential MKL library.

Sequential applications should (generally) link with the threaded MKL library.

IOW, threading takes place in one or the other, not both.

 

Edit: use -qmkl=sequential when your application is parallel (e.g. OpenMP)

 

Jim Dempsey

SamW1
Beginner

Thank you for replying. -qmkl=sequential would slow the program down terribly, which is the opposite of my aim.

The MKL call is outside the OpenMP parallel region, so it should be fine (under certain conditions, I think).

The proof is that the sample code I uploaded works at array-size 100x100x10,

however, it failed at array-size 100x100x100 (it failed when compiled with the OpenMP flag, while the sequential code works well).

Something goes wrong. Why?

 

My idea is to call the threaded MKL library from the sequential part of my parallel program. Here is the pseudocode.

Please tell me how else to do this if I cannot put it all in "ONE program".

 

Program Start
Use library and Declare variable

!$OMP PARALLEL DO 
    calculation unrelated to MKL
!$OMP END PARALLEL DO

Call threaded MKL library

!$OMP PARALLEL DO 
    other calculation unrelated to MKL
!$OMP END PARALLEL DO

END

 

 

jimdempseyatthecove
Honored Contributor III

I (mistakenly) presumed your MKL calls were within a parallel region.

>> it failed at array-size 100x100x100 (it failed when compiled with the OpenMP flag, while the sequential code works well)

This sounds like the master thread succeeds while one or more of the additional OpenMP threads fail.

Use the OMP_STACKSIZE environment variable to specify the stack size for the threads created by OpenMP.

ulimit specifies the process (i.e. master thread) stack size.
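
The distinction between the two limits can be sketched in the shell. This is only an illustration: the 100M value is an arbitrary example, not a recommendation, and the program has to be launched from the same shell to inherit the settings.

```shell
# Stack limit of the process (i.e. the master thread), as seen by this shell.
# Prints a size in KB, or "unlimited".
echo "process stack limit: $(ulimit -s)"

# Stack size for the additional threads created by the OpenMP runtime.
# 100M is an illustrative value only.
export OMP_STACKSIZE=100M
echo "OMP_STACKSIZE=${OMP_STACKSIZE}"

# ./a.out    # run from this shell so both settings apply
```

Note that the Intel runtime also reads KMP_STACKSIZE; if both variables are set, setting only one of them avoids ambiguity about which value is used.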

 

Jim Dempsey

 

SamW1
Beginner

I exported OMP_STACKSIZE=1g, both with and without KMP_STACKSIZE=1g.

A segmentation fault occurred in both cases,

and one of them printed "OMP: Warning #182: OMP_STACKSIZE: ignored because KMP_STACKSIZE has been defined"

jimdempseyatthecove
Honored Contributor III

>>libmkl_avx512.so. 00001498F501C64C mkl_spblas_lp64_a Unknown Unknown

This indicates that the failure is located in MKL (presumably called from the sequential part of your application).

IOW, it is not related to the parallel regions of your own code.

>>..._lp64

Indicates the LP64 interface (32-bit integers); I am assuming the data is real(8)

>> it failed at array-size 100x100x100

100x100x100x8 bytes = sizeof(array) = 8,000,000 bytes (8MB)

The working space for the MKL call might be 3x this, or 24MB.

Try KMP_STACKSIZE=100m (and if that fails, 50m)
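
The estimate above can be reproduced with shell arithmetic. This is a rough sketch only: the 3x workspace factor is the guess made in this post, not a documented MKL figure, and the variable names are made up for illustration.

```shell
# Problem size from the thread: i = j = k = 100, 8 bytes per element.
n=100
bytes_per_elem=8
array_bytes=$(( n * n * n * bytes_per_elem ))

# Assumed workspace factor (a guess from the post, not an MKL specification).
workspace_factor=3
workspace_bytes=$(( array_bytes * workspace_factor ))

echo "one array: ${array_bytes} bytes"
echo "workspace: $(( workspace_bytes / 1000000 )) MB (decimal)"
```

Either figure is far below the 1G stack size already tried earlier in the thread, which is worth keeping in mind when choosing the next value to test.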

 

Also, enable traceback line numbering (for example, by compiling with -g -traceback) to get the line number of the CALL into MKL. In front of that call place

     IF(OMP_IN_PARALLEL()) STOP "OMP_IN_PARALLEL()"

This is to verify you are not blind-sided by other parts of your code.

 

>>$OMP_NUM_THREADS=16

Nothing wrong with this: your code will use 16 threads and MKL will use 16 threads (a total of 31 threads, with the master thread of each being the same thread). Once it is working, you might want to experiment with KMP_BLOCKTIME=0 (or some small number).
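
A minimal sketch of the environment for such an experiment, assuming the Intel OpenMP runtime (libiomp5, as in the traceback above). The values are the ones mentioned in this thread, not tuned recommendations.

```shell
# Thread count used by the OpenMP runtime (value from the original post).
export OMP_NUM_THREADS=16

# Time, in milliseconds, that a thread waits after a parallel region
# before going to sleep (Intel OpenMP runtime setting).
export KMP_BLOCKTIME=0

echo "OMP_NUM_THREADS=${OMP_NUM_THREADS} KMP_BLOCKTIME=${KMP_BLOCKTIME}"

# ./a.out    # run from this shell so the settings are inherited
```

Comparing run times with and without the KMP_BLOCKTIME line is a simple way to see whether the setting matters for a given program.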

 

Jim Dempsey

 

 

 

 

Barbara_P_Intel
Employee

I suggest you ask about this on the Intel MKL Forum.
