Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Using MPI in parallel OpenMP regions

philippe_m_
Beginner

Hi all,

I am trying to call MPI from within OpenMP regions, but I cannot get it to work properly. My program compiles fine with mpiicc (4.1.1.036) and icc (13.1.2 20130514), and I checked that it is linked against the thread-safe libraries (libmpi_mt.so shows up when I run ldd).

But when I try to run it (2 Ivy Bridge nodes x 2 MPI tasks x 12 OpenMP threads), I get a SIGSEGV without any backtrace:

/opt/softs/intel/impi/4.1.1.036/intel64/bin/mpirun -np 4 -ppn 2 ./mpitest.x

APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)

Or with the debug level set to 5:

/opt/softs/intel/impi/4.1.1.036/intel64/bin/mpirun -np 4 -ppn 2 ./mpitest.x
[1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1
[0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1
[1] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[0] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[1] MPI startup(): shm and dapl data transfer modes
[0] MPI startup(): shm and dapl data transfer modes
[2] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1
[3] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1
[2] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[2] MPI startup(): shm and dapl data transfer modes
[3] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[3] MPI startup(): shm and dapl data transfer modes
[0] MPI startup(): Rank    Pid      Node name   Pin cpu
[0] MPI startup(): 0       90871    beaufix522  {0,1,2,3,4,5,6,7,8,9,10,11,24,25,26,27,28,29,30,31,32,33,34,35}
[0] MPI startup(): 1       90872    beaufix522  {12,13,14,15,16,17,18,19,20,21,22,23,36,37,38,39,40,41,42,43,44,45,46,47}
[0] MPI startup(): 2       37690    beaufix523  {0,1,2,3,4,5,6,7,8,9,10,11,24,25,26,27,28,29,30,31,32,33,34,35}
[0] MPI startup(): 3       37691    beaufix523  {12,13,14,15,16,17,18,19,20,21,22,23,36,37,38,39,40,41,42,43,44,45,46,47}
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_DIST=10,15,15,10
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_MAP=mlx4_0:0
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2
[0] MPI startup(): I_MPI_PIN_MAPPING=2:0 0,1 12
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)

Of course, if I use a single OpenMP thread, everything works fine. I also tried wrapping the MPI calls in critical regions, which works, but that is not what I want.

My program is just a small test case to figure out whether I can use this pattern inside a bigger program. In each MPI task, all OpenMP threads are used to send messages to the other tasks, and afterwards all OpenMP threads are used to receive messages from the other tasks.
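A simplified sketch of the pattern (not my actual test case; the message contents and tags are only illustrative, and I use nonblocking sends here just to keep the sketch deadlock-free) looks like this:

/* Sketch only, not the original test program: every OpenMP thread of every
 * MPI rank sends one small message to each other rank, then every thread
 * receives the messages addressed to it.  Thread ids are used as tags so
 * sends and receives pair up per thread.  Build with mpiicc plus the
 * compiler's OpenMP option, linking the thread-safe MPI library. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int provided, rank, size;

    /* Request full thread support; MPI returns the level actually granted. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available (got %d)\n", provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int msg[2] = { rank, tid };
        int npeers = size - 1;
        MPI_Request *reqs = malloc(npeers * sizeof(MPI_Request));
        int peer, r = 0;

        /* Phase 1: each thread posts one send to every other rank
         * (tag = thread id). */
        for (peer = 0; peer < size; peer++)
            if (peer != rank)
                MPI_Isend(msg, 2, MPI_INT, peer, tid, MPI_COMM_WORLD, &reqs[r++]);

        /* Phase 2: each thread receives one message from every other rank. */
        for (peer = 0; peer < size; peer++) {
            if (peer != rank) {
                int recvbuf[2];
                MPI_Recv(recvbuf, 2, MPI_INT, peer, tid, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            }
        }

        MPI_Waitall(npeers, reqs, MPI_STATUSES_IGNORE);
        free(reqs);
    }

    MPI_Finalize();
    return 0;
}

The critical-region workaround I mention above amounts to wrapping each of these MPI calls in an omp critical section, which serializes them within a process.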

My questions are:

  • does my program conform to the MPI_THREAD_MULTIPLE thread level (which, by the way, is the level returned by MPI_Init_thread)?
  • is Intel MPI supposed to run it correctly?
  • if not, will it work someday?
  • what can I do in the meantime (extra tests, etc.)?

Best regards,

Philippe

4 Replies
TimP
Honored Contributor III

Among the simpler possibilities: you may need to allow more stack, either in the shell (ulimit -s) or via KMP_STACKSIZE, or both.

philippe_m_
Beginner

Hello Tim,

I already had OMP_STACKSIZE=20000M and ulimit -s unlimited; I added KMP_STACKSIZE=20000M and got this:

Fatal error in MPI_Bsend: Internal MPI error!, error stack:
MPI_Bsend(195)..............: MPI_Bsend(buf=0x2ae0f1fff3ec, count=2, MPI_INT, dest=2, tag=0, MPI_COMM_WORLD) failed
MPIR_Bsend_isend(226).......:
MPIR_Bsend_check_active(456):
MPIR_Test_impl(63)..........:
MPIR_Request_complete(227)..: INTERNAL ERROR: unexpected value in case statement (value=0)
APPLICATION TERMINATED WITH THE EXIT STRING: Interrupt (signal 2)

Regards,

Philippe

James_T_Intel
Moderator
(Accepted solution)

Hi Philippe,

One solution is to use MPI_Send or MPI_Isend instead of MPI_Bsend.  Will either of these work in your program?
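For illustration, a minimal sketch of both options (the function and variable names are placeholders, not taken from your program):

#include <mpi.h>

/* Sketch of the suggested substitution.  The failing call in your trace was
 * essentially:  MPI_Bsend(buf, 2, MPI_INT, dest, tag, MPI_COMM_WORLD); */

/* Option 1: standard blocking send. */
static void send_pair_blocking(int *buf, int dest, int tag)
{
    MPI_Send(buf, 2, MPI_INT, dest, tag, MPI_COMM_WORLD);
}

/* Option 2: nonblocking send; buf must stay valid until the wait completes. */
static void send_pair_nonblocking(int *buf, int dest, int tag)
{
    MPI_Request req;
    MPI_Isend(buf, 2, MPI_INT, dest, tag, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}

In real code you would normally post all the MPI_Isend calls first and complete them later with MPI_Wait or MPI_Waitall, rather than waiting immediately as in this sketch.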

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

TimP
Honored Contributor III

I have never seen KMP_STACKSIZE successfully set to more than 40M. OMP_STACKSIZE would be the preferable spelling, but it means the same thing. With 24 threads each given a KMP_STACKSIZE of 20 GB, you would need 480 GB per node just for the thread stacks, and I haven't seen a system where ulimit -s unlimited could give you that much.
