Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Intel MPI THREAD SPLIT Model causes segmentation fault

zhang__eric
Beginner

I ran thread_split_omp_for.c, thread_split_omp_task.c, and thread_split_pthreads.c from the examples folder. However, when I set I_MPI_THREAD_SPLIT=1, these examples failed.

For thread_split_omp_for.c, I tried the following configurations:

MPI versions (release_mt): 2019.0.117 (no output), 19.1 (seg fault), 2019.3.199 (seg fault)

OFED version: MLNX_OFED_LINUX-4.4-2.0.7.0 (OFED-4.4-2.0.7)

Environment variables (and several combinations of these variables):

export I_MPI_THREAD_SPLIT=1
export I_MPI_THREAD_RUNTIME=openmp
export I_MPI_THREAD_MAX=2
export I_MPI_FABRICS=tcp:tcp 
export I_MPI_DEBUG=5

The output is either empty or as shown below:

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 98389 RUNNING AT i1
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 98390 RUNNING AT i1
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

I don't know whether it is a configuration mistake or an MPI bug. Could you help me with this?
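For context, the pattern these examples exercise is roughly the sketch below: one duplicated communicator per OpenMP thread, with each thread issuing its own allreduce. This is only a simplified illustration of the thread-split model, not the exact code shipped in the examples folder.

/* Simplified thread-split sketch: each OpenMP thread communicates on its
 * own duplicated communicator, which is what I_MPI_THREAD_SPLIT=1 maps
 * onto separate endpoints. NOT the shipped thread_split_omp_for.c. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int nthreads = omp_get_max_threads();
    MPI_Comm comm[nthreads];               /* one communicator per thread */
    for (int i = 0; i < nthreads; i++)
        MPI_Comm_dup(MPI_COMM_WORLD, &comm[i]);

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int in = rank + tid, out = 0;
        /* Each thread communicates only on its own communicator comm[tid]. */
        int err = MPI_Allreduce(&in, &out, 1, MPI_INT, MPI_SUM, comm[tid]);
        printf("Thread %d: allreduce returned %d\n", tid, err);
    }

    for (int i = 0; i < nthreads; i++)
        MPI_Comm_free(&comm[i]);
    MPI_Finalize();
    return 0;
}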

3 Replies
zhang__eric
Beginner

What's more, sometimes I only get output from the master threads:

xxx@i1:~/examples$ I_MPI_DEBUG=5 ./mpi_run.sh
MPI startup(): tcp:tcp fabric is unknown or has been removed from the product, please use ofi or shm:ofi instead

[0] MPI startup(): libfabric version: 1.8.0a1-impi

[0] MPI startup(): libfabric provider: verbs;ofi_rxm

[0] MPI startup(): THREAD_SPLIT mode is switched on, 1 endpoints in use
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       76995    i1         {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77}
[0] MPI startup(): 1       76996    i1         {26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103}
[0] MPI startup(): I_MPI_ROOT=/opt/spack/spack-avx512/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mpi-2019.3.199-ooicvtdn7kvu2yr7dzoggbn4tizn5eri/compilers_and_libraries_2019.3.199/linux/mpi
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_FABRICS=tcp:tcp
[0] MPI startup(): I_MPI_THREAD_SPLIT=1
[0] MPI startup(): I_MPI_THREAD_RUNTIME=openmp
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vnis: 1
[0] MPI startup(): threading: app_threads: 52
[0] MPI startup(): threading: runtime: openmp
[0] MPI startup(): threading: is_threaded: 1
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: num_pools: 64
[0] MPI startup(): threading: lock_level: nolock
[0] MPI startup(): threading: enable_sep: 0
[0] MPI startup(): threading: direct_recv: 0
[0] MPI startup(): threading: zero_op_flags: 0
[0] MPI startup(): threading: num_am_buffers: 8
[0] MPI startup(): threading: library is built with per-object thread granularity
Thread 0: allreduce returned 0
Thread 0: allreduce returned 0

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 76995 RUNNING AT i1
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 76996 RUNNING AT i1
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

with environment as

export I_MPI_THREAD_SPLIT=1
export I_MPI_THREAD_RUNTIME=openmp
export OMP_NUM_THREADS=2
export I_MPI_THREAD_MAX=2
export I_MPI_DEBUG=5
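
For reference, a minimal check that the release_mt library actually grants MPI_THREAD_MULTIPLE (which thread-split requires) could look like the sketch below; this is just a diagnostic idea, not one of the shipped examples.

/* Sketch: print the threading level actually provided by the library.
 * Thread-split needs MPI_THREAD_MULTIPLE from the release_mt build. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    printf("requested level %d (MPI_THREAD_MULTIPLE), provided level %d\n",
           MPI_THREAD_MULTIPLE, provided);
    MPI_Finalize();
    return 0;
}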

Mikhail_S_Intel
Employee

Hi Eric,

Please try IMPI 2019 Update 4 (U4). Up to U4 we supported thread-split for Omni-Path only; since U4 we also support InfiniBand and Ethernet. I would suggest starting with the IMB-MT benchmark as an example:

source mpivars.sh release_mt

I_MPI_THREAD_SPLIT=1 OMP_NUM_THREADS=4 I_MPI_DEBUG=1 mpiexec.hydra -n 2 -ppn 1 -hosts h1,h2 IMB-MT -thread_level multiple allreducemt

...
[0] MPI startup(): THREAD_SPLIT mode is switched on, 4 endpoints in use

...
# #processes = 2 (threads: 4)
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
           16         1000        13.60        16.39        14.31
           32         1000        13.65        16.45        14.40
           64         1000        15.38        18.61        16.28
          128         1000        15.48        18.59        16.30
 

zhang__eric
Beginner

Success with IMPI 2019 U4. Thanks a lot.

BTW, if the version requirements were listed in the Developer Reference, it would help others avoid this pitfall. :)
