I compiled Open MPI against the Intel Fortran compiler by following the instructions here:
When I use the resulting mpif90 to compile my example code below, I see a SIGSEGV segfault on the MPI_FINALIZE line. I found that this depends on the size of the array I send via MPI_SEND. The array size is set via the count variable: if I set it to 505 or less I never get a segfault; if I set it to 506 or more I always get one. I do not see this behavior with Open MPI compiled against gfortran.
Sending and receiving messages seems to work fine. I have used arrays of up to 5000 elements, and when I printed and visually inspected the received data I saw no problem with it. MPI_SEND and MPI_RECV are blocking, so they must have completed before I reach the MPI_FINALIZE call. I also checked the ierr value from MPI_SEND and MPI_RECV, and it was zero for both.
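As an aside, receipt can be verified more rigorously than by eye: MPI_GET_COUNT on the receive status reports how many elements were actually delivered. A minimal sketch of that check, placed after the MPI_RECV call (it reuses the st_count, st_source, and st_tag variables already declared in the example):

```fortran
! After MPI_RECV on the dest rank: query how many double-precision
! elements actually arrived, and which source/tag the message carried.
call MPI_GET_COUNT(status, MPI_DOUBLE_PRECISION, st_count, ierr)
st_source = status(MPI_SOURCE)   ! rank that sent the message
st_tag    = status(MPI_TAG)      ! tag the message arrived with
if (st_count /= count) then
   print *, 'rank ', rank, ': expected ', count, ' elements, got ', st_count
end if
```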
I followed the instructions here:
https://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors
and could not determine the cause of the segfault. Has this behavior been observed before, and if so, is there a workaround?
```fortran
program main
   use mpi
   integer :: rank, size, to, from, tag, count, i, ierr
   integer :: src, dest
   integer :: st_source, st_tag, st_count
   integer :: status(MPI_STATUS_SIZE)
   double precision, allocatable :: data(:)

   call MPI_INIT( ierr )
   call MPI_COMM_RANK( MPI_COMM_WORLD, rank, ierr )
   call MPI_COMM_SIZE( MPI_COMM_WORLD, size, ierr )
   print *, 'Process ', rank, ' of ', size, ' is alive'

   dest = size - 1
   src = 0
   count = 506
   allocate(data(count))

   if (rank .eq. src) then
      to = dest
      tag = 2001
      do i = 1, count
         data(i) = i
      end do
      call MPI_SEND( data, count, MPI_DOUBLE_PRECISION, to, tag, &
                     MPI_COMM_WORLD, ierr )
   else if (rank .eq. dest) then
      tag = MPI_ANY_TAG
      from = MPI_ANY_SOURCE
      call MPI_RECV( data, count, MPI_DOUBLE_PRECISION, from, tag, &
                     MPI_COMM_WORLD, status, ierr )
   end if

   call MPI_FINALIZE( ierr )
end program
```
---
Which versions of Open MPI and the Intel compiler are you using? I tried Open MPI 1.8.4 with Intel compiler 16.0 update 3 and couldn't reproduce the segmentation fault with your sample code.
Thanks,
Xiaoping Duan
Intel Customer Support
---
I used:
Intel compiler 16.0.2
Open MPI 1.10.2
I will try Open MPI 1.8.4 today. Are newer versions of Open MPI not supported? The newest stable release is 2.0.0, but I have not tried that yet.
---
Yes, 1.10 is supported. It is better than 1.8, because there is a known issue in the 1.8 runtime library that may cause a segmentation fault at startup for applications built with Open MPI + the Intel Fortran compiler.
I just tried the same configuration as yours and found no error. Can you share the complete command lines you use to build and run your sample code?
Thanks,
Xiaoping Duan
Intel Customer Support
---
I have tried Open MPI 1.8.4 and 2.0.0 since my last post, and both have the same issue. I compile with:

```shell
mpif90 -g -o main main.f90
```

and then run with either:

```shell
mpirun -n 3 main
```

or, if I want to see the backtrace:

```shell
mpirun -n 3 xterm -e gdb main
```
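For what it's worth, Intel Fortran has built-in diagnostics that can narrow down memory errors; a hedged variant of the same compile/run commands, assuming mpif90 forwards unrecognized flags to ifort as usual:

```shell
# -traceback: print a symbolic Fortran traceback on a fatal error (ifort flag)
# -check bounds: enable runtime array-bounds checking (ifort flag)
mpif90 -g -traceback -check bounds -o main main.f90
mpirun -n 3 main
```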
My mpif90 is configured like this:
```shell
$ which mpif90
/usr/local/bin/mpif90
$ which ifort
/opt/intel/compilers_and_libraries_2016.2.181/linux/bin/intel64/ifort
$ mpif90 --showme
ifort -I/usr/local/include -I/usr/local/lib -Wl,-rpath -Wl,/usr/local/lib -Wl,--enable-new-dtags -L/usr/local/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
```
The backtrace is below:

```
(gdb) bt full
#0  0x00007fffefe36bd0 in ?? ()
No symbol table info available.
#1  0x00007ffff0ef0166 in mca_mpool_grdma_finalize () from /usr/local/lib/openmpi/mca_mpool_grdma.so
No symbol table info available.
#2  0x00007ffff74bf249 in mca_mpool_base_close () from /usr/local/lib/libmpi.so.12
No symbol table info available.
#3  0x00007ffff61df25d in mca_base_framework_close () from /usr/local/lib/libopen-pal.so.13
No symbol table info available.
#4  0x00007ffff7479079 in ompi_mpi_finalize () from /usr/local/lib/libmpi.so.12
No symbol table info available.
#5  0x00007ffff778655a in pmpi_finalize__ () from /usr/local/lib/libmpi_mpifh.so.12
No symbol table info available.
#6  0x00000000004085a8 in main () at main.f90:32
        st_count = 0
        st_tag = 0
        st_source = 0
        dest = 2
        src = 0
        ierr = 0
        i = 507
        tag = 2001
        size = 3
        rank = 0
        count = 506
        from = 0
        to = 2
        status = (0, 0, 0, 0, 0, 0)
        data = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
          21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
          41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
          61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
          81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100,
          101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120,
          121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140,
          141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160,
          161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180,
          181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, ...)
        mpi_bottom = 0
        mpi_in_place = 0
        mpi_argv_null = ('\000')
        mpi_argvs_null = ('\000')
        mpi_errcodes_ignore = (0)
        mpi_status_ignore = (0, 0, 0, 0, 0, 0)
        mpi_statuses_ignore = ((0, 0, 0, 0, 0, 0))
        mpi_unweighted = 0
        mpi_weights_empty = 0
#7  0x000000000040811e in main ()
No symbol table info available.
```
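Since the crash is inside mca_mpool_grdma (Open MPI's pinned-memory registration cache), one experiment worth trying is to steer the run away from that component. The MCA parameters below are standard Open MPI knobs, but whether they avoid this particular crash is an assumption, not a confirmed fix:

```shell
# Disable the "leave pinned" registration-cache behavior
mpirun --mca mpi_leave_pinned 0 -n 3 main

# Or restrict the byte-transfer layers to shared memory, TCP, and self,
# avoiding RDMA transports that exercise the grdma mpool
mpirun --mca btl sm,tcp,self -n 3 main
```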
---
Using the same command lines, I still couldn't reproduce the error. I built and installed the library into my home folder rather than into "/usr/local".
I even tried removing "mca_mpool_grdma.so" (where the failure point in your dump is located) from the lib path, and during the run only got errors like:
[snb04:23072] mca: base: component_find: unable to open /home/xxx/openmpi-110-icc16/lib/openmpi/mca_mpool_grdma.so: File not found (ignored)
The code still printed its output and exited without a segfault.
Thanks,
Xiaoping Duan
Intel Customer Support