Intel® Fortran Compiler

Segfaults seen with Open MPI compiled against the Intel Fortran compiler

edward_s_3
Beginner

I compiled Open MPI against the Intel Fortran compiler by following the instructions here:

https://software.intel.com/en-us/articles/performance-tools-for-software-developers-building-open-mpi-with-the-intel-compilers
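The build essentially just points configure at the Intel compilers, roughly along these lines (the prefix matches where my mpif90 ended up; exact options may differ on your system):

./configure --prefix=/usr/local CC=icc CXX=icpc FC=ifort
make all install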

When I use the resulting mpif90 to compile my example code below, I see a SIGSEGV at the MPI_FINALIZE call.  The crash depends on the size of the array sent via MPI_SEND, which is set by the count variable: with count at 505 or less I never get a segfault, and with 506 or more I always do.  I do not see this behavior with Open MPI compiled against gfortran.

Sending and receiving messages seems to work fine.  I have used arrays of up to 5000 elements, printed the received data, and inspected it visually without finding any problem.  MPI_SEND and MPI_RECV are blocking, so they must have completed before the MPI_FINALIZE call.  The ierr value returned by both MPI_SEND and MPI_RECV was zero.
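Roughly, the kind of check I mean (a minimal sketch rather than the exact code I ran; MPI_SUCCESS and MPI_ABORT come from the mpi module):

call MPI_SEND( data, count, MPI_DOUBLE_PRECISION, to, tag, MPI_COMM_WORLD, ierr )
if (ierr .ne. MPI_SUCCESS) then
   ! never taken in my runs: ierr is always 0 (MPI_SUCCESS)
   print *, 'MPI_SEND failed on rank ', rank, ' with ierr = ', ierr
   call MPI_ABORT( MPI_COMM_WORLD, 1, ierr )
end if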

I followed the instructions here:

https://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors

but could not determine the cause of the segfault.  Has this behavior been observed before, and if so, is there a workaround?
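The example program is below.  For completeness, the usual Intel compiler diagnostics (e.g. -traceback and bounds checking) pass straight through the mpif90 wrapper, along these lines (flags illustrative):

mpif90 -g -traceback -check bounds -o main main.f90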

program main
   use mpi

   integer :: rank, size, to, from, tag, count, i, ierr
   integer :: src, dest
   integer :: st_source, st_tag, st_count
   integer :: status(MPI_STATUS_SIZE)
   double precision, allocatable :: data(:)

   call MPI_INIT( ierr )
   call MPI_COMM_RANK( MPI_COMM_WORLD, rank, ierr )
   call MPI_COMM_SIZE( MPI_COMM_WORLD, size, ierr )
   print *, 'Process ', rank, ' of ', size, ' is alive'

   dest  = size - 1
   src   = 0
   count = 506
   allocate(data(count))

   if (rank .eq. src) then
      to  = dest
      tag = 2001
      do i = 1, count
         data(i) = i
      end do
      call MPI_SEND( data, count, MPI_DOUBLE_PRECISION, to, tag, MPI_COMM_WORLD, ierr )
   else if (rank .eq. dest) then
      tag  = MPI_ANY_TAG
      from = MPI_ANY_SOURCE
      call MPI_RECV( data, count, MPI_DOUBLE_PRECISION, from, tag, MPI_COMM_WORLD, status, ierr )
   end if

   call MPI_FINALIZE( ierr )

end program

 

Xiaoping_D_Intel
Employee

Which versions of Open MPI and the Intel compiler are you using? I tried Open MPI 1.8.4 with Intel compiler 16.0 update 3 and couldn't reproduce the segmentation fault using your sample code.

Thanks,

Xiaoping Duan

Intel Customer Support

edward_s_3
Beginner

I used:
Intel compiler 16.0.2
Open MPI 1.10.2

I will try Open MPI 1.8.4 today.  Are newer versions of Open MPI not supported?  The newest stable release is 2.0.0, but I have not tried that yet.

Xiaoping_D_Intel
Employee

Yes, 1.10 is supported. It is actually better than 1.8, because there is a known issue in the 1.8 runtime library which may cause a segmentation fault at startup for an application built with Open MPI + the Intel Fortran compiler.

I just tried the same configuration as yours and found no error. Can you share the complete command lines you used to build and run your sample code?

 

Thanks,

Xiaoping Duan

Intel Customer Support

edward_s_3
Beginner

I have tried Open MPI 1.8.4 and 2.0.0 since my last post, and both have the same issue.  I compile with:

mpif90 -g -o main main.f90

and then I run with either:

mpirun -n 3 main

or

mpirun -n 3 xterm -e gdb main

if I want to see the backtrace. 

My mpif90 is configured like this:

$ which mpif90
/usr/local/bin/mpif90
$ which ifort
/opt/intel/compilers_and_libraries_2016.2.181/linux/bin/intel64/ifort
$ mpif90 --showme
ifort -I/usr/local/include -I/usr/local/lib -Wl,-rpath -Wl,/usr/local/lib -Wl,--enable-new-dtags -L/usr/local/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi

The backtrace is below:

(gdb) bt full
#0  0x00007fffefe36bd0 in ?? ()
No symbol table info available.
#1  0x00007ffff0ef0166 in mca_mpool_grdma_finalize ()
   from /usr/local/lib/openmpi/mca_mpool_grdma.so
No symbol table info available.
#2  0x00007ffff74bf249 in mca_mpool_base_close ()
   from /usr/local/lib/libmpi.so.12
No symbol table info available.
#3  0x00007ffff61df25d in mca_base_framework_close ()
   from /usr/local/lib/libopen-pal.so.13
No symbol table info available.
#4  0x00007ffff7479079 in ompi_mpi_finalize () from /usr/local/lib/libmpi.so.12
No symbol table info available.
#5  0x00007ffff778655a in pmpi_finalize__ ()
   from /usr/local/lib/libmpi_mpifh.so.12
No symbol table info available.
#6  0x00000000004085a8 in main () at main.f90:32
        st_count = 0
        st_tag = 0
        st_source = 0
        dest = 2
        src = 0
        ierr = 0
---Type <return> to continue, or q <return> to quit---
        i = 507
        tag = 2001
        size = 3
        rank = 0
        count = 506
        from = 0
        to = 2
        status = (0, 0, 0, 0, 0, 0)
        data = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, ...)
        mpi_bottom = 0
        mpi_in_place = 0
        mpi_argv_null = ('\000')
---Type <return> to continue, or q <return> to quit---
        mpi_argvs_null = ('\000')
        mpi_errcodes_ignore = (0)
        mpi_status_ignore = (0, 0, 0, 0, 0, 0)
        mpi_statuses_ignore = (( 0, 0, 0, 0, 0, 0) )
        mpi_unweighted = 0
        mpi_weights_empty = 0
#7  0x000000000040811e in main ()
No symbol table info available.

 

Xiaoping_D_Intel
Employee

I used the same command lines and still couldn't reproduce the error. I built and installed the library into my home folder rather than into "/usr/local".

I even tried removing "mca_mpool_grdma.so" (the location of the failure point in your dump) from the lib path, and only got errors like the following during the run:

[snb04:23072] mca: base: component_find: unable to open /home/xxx/openmpi-110-icc16/lib/openmpi/mca_mpool_grdma.so: File not found (ignored)

The code can still print its output and exit without a segfault.

 

Thanks,

Xiaoping Duan

Intel Customer Support
