Hi,
using Intel ifort 15, I compiled OpenMPI 1.8.4 with the following configure line:
../configure --prefix=<path to installdir> --with-openib --with-sge CC=icc FC=ifort CXX=icpc
Unfortunately, compiling our hybrid MPI + OpenMP code with the resulting MPI compiler wrapper produces a binary that segfaults immediately after startup.
Please consider the following example which shows the same behavior as our large code:

program test
  use mpi
  integer :: ierr
  real(8) :: a
  call mpi_init(ierr)
  call random_number(a)
  write(*,*)"hello"
  call mpi_finalize(ierr)
end program test
Compiler Versions:
>mpif90 --version
ifort (IFORT) 15.0.1 20141023
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
>icc --version
icc (ICC) 15.0.1 20141023
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
>icpc --version
icpc (ICC) 15.0.1 20141023
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
How to compile:
>mpif90 -openmp test.f90
Error:
>./a.out
Segmentation fault
Best,
Bastian
I am not seeing any evidence in the thread you linked to that it is a compiler bug, just a "toss it over the wall" speculation. The program seems to work fine with Intel MPI. Do you have anything more concrete to suggest it is a compiler issue? Can you tell where it is getting the segfault?
Steve,
please find the specific post that suggested filing a bug report here: http://www.open-mpi.org/community/lists/users/2014/11/25837.php
Debug flags do not reveal any additional information:
>mpif90 -check all -traceback -fp-stack-check -fpe0 -fpe-all=0 -ftrapuv -fstack-protector-all -mp1 -g -no-opt-assume-safe-padding -openmp test.f90
>./a.out
Segmentation fault
However, gdb suggests that the problem might be in init_resource():
>gdb ./a.out
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-64.el6_5.2)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /kernph/schaba00/programs/openmpi-1.8.4_Intel15/a.out...done.
(gdb) run
Starting program: /kernph/schaba00/programs/openmpi-1.8.4_Intel15/a.out
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
0x000000000040b984 in init_resource ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6_5.4.x86_64 libgcc-4.4.7-4.el6.x86_64 libxml2-2.7.6-14.el6_5.2.x86_64 zlib-1.2.3-29.el6.x86_64
Best,
Bastian
I saw that - but there's no evidence it's a compiler bug. init_resource is not our routine.
By the way, don't bother using -ftrapuv for anything.
Out of curiosity, what happens with your #1 test program when you comment out the call to random_number and the write statement (make no other changes)?
What I am probing at is something that may be related to a different thread, one where the user called omp_get_wtime() before the first parallel region and then issued a fork(), causing child processes to inherit an initialized OpenMP environment. His symptom was also a segfault.
Note, although your little test program is not calling an OpenMP library routine, it may be interacting with OpenMP in an unexpected manner. You should consider single-stepping through a disassembly window, first stepping over function calls and then into them, to locate where the error occurs.
I think Steve is on to something with "init_resource is not our routine". You may need to discover where this call is made from.
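For example, something along these lines should work (gdb commands from memory, so verify against your gdb version). The idea is to break at the compiler-generated main, step over calls instruction by instruction until the fault appears, then step into the offending call:

>gdb ./a.out
(gdb) break main
(gdb) run
(gdb) nexti       # step over calls, one instruction at a time
(gdb) stepi       # step into a call once you have narrowed it down
(gdb) x/10i $pc   # show the disassembly around the current instruction
(gdb) backtrace   # after the SIGSEGV, show where you ended up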
Jim Dempsey
Hi Jim,
when the random_number call and the write statement are commented out, the program runs without any error. The segfault only occurs when the call to the random number generator is present.
The code below also produces the segfault (write statement commented out). Please find the corresponding assembly code below as well.
program test
  use mpi
  integer :: ierr
  real(8) :: a
  call mpi_init(ierr)
  call random_number(a)
  ! write(*,*)"hello"
  call mpi_finalize(ierr)
end program test
# mark_description "Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.1.133 Build 2";
# mark_description "0141023";
        .file "test.f90"
        .text
..TXTST0:
# -- Begin  MAIN__
        .text
# mark_begin;
        .align    16,0x90
        .globl MAIN__
MAIN__:
..B1.1:                         # Preds ..B1.0
..___tag_value_MAIN__.1:                                #test.f90:1.9
        pushq     %rbp                                  #test.f90:1.9
..___tag_value_MAIN__.3:                                #
        movq      %rsp, %rbp                            #test.f90:1.9
..___tag_value_MAIN__.4:                                #
        andq      $-128, %rsp                           #test.f90:1.9
        subq      $128, %rsp                            #test.f90:1.9
        xorl      %esi, %esi                            #test.f90:1.9
        movl      $3, %edi                              #test.f90:1.9
        call      __intel_new_feature_proc_init         #test.f90:1.9
                                # LOE rbx r12 r13 r14 r15
..B1.10:                        # Preds ..B1.1
        stmxcsr   (%rsp)                                #test.f90:1.9
        movl      $.2.4_2_kmpc_loc_struct_pack.1, %edi  #test.f90:1.9
        xorl      %esi, %esi                            #test.f90:1.9
        orl       $32832, (%rsp)                        #test.f90:1.9
        xorl      %eax, %eax                            #test.f90:1.9
        ldmxcsr   (%rsp)                                #test.f90:1.9
..___tag_value_MAIN__.6:                                #test.f90:1.9
        call      __kmpc_begin                          #test.f90:1.9
..___tag_value_MAIN__.7:                                #
                                # LOE rbx r12 r13 r14 r15
..B1.2:                         # Preds ..B1.10
        movl      $__NLITPACK_0.0.1, %edi               #test.f90:1.9
        call      for_set_reentrancy                    #test.f90:1.9
                                # LOE rbx r12 r13 r14 r15
..B1.3:                         # Preds ..B1.2
        lea       (%rsp), %rdi                          #test.f90:5.10
..___tag_value_MAIN__.8:                                #test.f90:5.10
        call      mpi_init_                             #test.f90:5.10
..___tag_value_MAIN__.9:                                #
                                # LOE rbx r12 r13 r14 r15
..B1.4:                         # Preds ..B1.3
        call      for_random_number                     #test.f90:6.10
                                # LOE rbx r12 r13 r14 r15
..B1.5:                         # Preds ..B1.4
        lea       (%rsp), %rdi                          #test.f90:8.10
..___tag_value_MAIN__.10:                               #test.f90:8.10
        call      mpi_finalize_                         #test.f90:8.10
..___tag_value_MAIN__.11:                               #
                                # LOE rbx r12 r13 r14 r15
..B1.6:                         # Preds ..B1.5
        movl      $.2.4_2_kmpc_loc_struct_pack.12, %edi #test.f90:9.1
        xorl      %eax, %eax                            #test.f90:9.1
..___tag_value_MAIN__.12:                               #test.f90:9.1
        call      __kmpc_end                            #test.f90:9.1
..___tag_value_MAIN__.13:                               #
                                # LOE rbx r12 r13 r14 r15
..B1.7:                         # Preds ..B1.6
        movl      $1, %eax                              #test.f90:9.1
        movq      %rbp, %rsp                            #test.f90:9.1
        popq      %rbp                                  #test.f90:9.1
..___tag_value_MAIN__.14:                               #
        ret                                             #test.f90:9.1
        .align    16,0x90
..___tag_value_MAIN__.16:                               #
                                # LOE
# mark_end;
        .type   MAIN__,@function
        .size   MAIN__,.-MAIN__
        .data
        .align 4
        .align 4
.2.4_2_kmpc_loc_struct_pack.1:
        .long   0
        .long   2
        .long   0
        .long   0
        .quad   .2.4_2__kmpc_loc_pack.0
        .align 4
.2.4_2__kmpc_loc_pack.0:
        .byte   59
        .byte   117
        .byte   110
        .byte   107
        .byte   110
        .byte   111
        .byte   119
        .byte   110
        .byte   59
        .byte   77
        .byte   65
        .byte   73
        .byte   78
        .byte   95
        .byte   95
        .byte   59
        .byte   49
        .byte   59
        .byte   49
        .byte   59
        .byte   59
        .space 3, 0x00  # pad
        .align 4
.2.4_2_kmpc_loc_struct_pack.12:
        .long   0
        .long   2
        .long   0
        .long   0
        .quad   .2.4_2__kmpc_loc_pack.11
        .align 4
.2.4_2__kmpc_loc_pack.11:
        .byte   59
        .byte   117
        .byte   110
        .byte   107
        .byte   110
        .byte   111
        .byte   119
        .byte   110
        .byte   59
        .byte   77
        .byte   65
        .byte   73
        .byte   78
        .byte   95
        .byte   95
        .byte   59
        .byte   57
        .byte   59
        .byte   57
        .byte   59
        .byte   59
        .section .rodata, "a"
        .align 8
        .align 8
__NLITPACK_0.0.1:
        .long   0x00000002,0x00000000
        .data
# -- End  MAIN__
        .data
        .comm mpi_fortran_bottom_,4,32
        .comm mpi_fortran_in_place_,4,32
        .comm mpi_fortran_argv_null_,1,32
        .comm mpi_fortran_argvs_null_,1,32
        .comm mpi_fortran_errcodes_ignore_,4,32
        .comm mpi_fortran_status_ignore_,24,32
        .comm mpi_fortran_statuses_ignore_,24,32
        .comm mpi_fortran_unweighted_,4,32
        .comm mpi_fortran_weights_empty_,4,32
        .section .note.GNU-stack, ""
// -- Begin DWARF2 SEGMENT .eh_frame
        .section .eh_frame,"a",@progbits
.eh_frame_seg:
        .align 8
        .4byte 0x00000014
        .8byte 0x7801000100000000
        .8byte 0x0000019008070c10
        .4byte 0x00000000
        .4byte 0x00000034
        .4byte 0x0000001c
        .8byte ..___tag_value_MAIN__.1
        .8byte ..___tag_value_MAIN__.16-..___tag_value_MAIN__.1
        .byte 0x04
        .4byte ..___tag_value_MAIN__.3-..___tag_value_MAIN__.1
        .2byte 0x100e
        .byte 0x04
        .4byte ..___tag_value_MAIN__.4-..___tag_value_MAIN__.3
        .4byte 0x8610060c
        .2byte 0x0402
        .4byte ..___tag_value_MAIN__.14-..___tag_value_MAIN__.4
        .8byte 0x00000000c608070c
        .2byte 0x0000
# End
Can you do a WRITE before the call to mpi_init and see the output? My guess is that the segfault is happening inside the call to mpi_init.
RANDOM_NUMBER uses thread-local storage and maybe there's something in OpenMPI that doesn't like that - just a guess.
The program crashes before mpi_init (at least the "hello" doesn't appear on stdout):
>cat test.f90
program test
  use mpi
  integer :: ierr
  real(8) :: a
  write(*,*)"hello"
  call mpi_init(ierr)
  call random_number(a)
  call mpi_finalize(ierr)
end program test
>mpif90 -openmp test.f90
>./a.out
Segmentation fault
How about a write to unit 10, followed by a close of unit 10, and then look for fort.10? I'm trying to figure out how far it gets.
The fort.10 file isn't even created:
>cat test.f90
program test
  use mpi
  integer :: ierr
  write(10,*)"hello"
  close(10)
  call mpi_init(ierr)
  call random_number(a)
  call mpi_finalize(ierr)
contains
end program test
>build/install/bin/mpif90 -openmp test.f90
>./a.out
Segmentation fault
>cat fort.10
cat: fort.10: No such file or directory
Do you have an idea why the crash happens so early?
Unseen in the program listing above is the hidden call that initializes the Fortran runtime library. A guess at what is going on is that the environment-specified library load path points to an incompatible Fortran runtime library; IOW, the .so file(s) are found, but they are incompatible. What's odd about this is that the program works (possibly by accident) when you comment out a few calls.
I seem to recall that there is an option to display the .so library paths as they are loaded. Can't remember what it is or if it is available on Linux.
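On Linux, something like the following should do it (from memory, so double-check the exact spelling): ldd shows which .so each dependency resolves to before the program runs, and LD_DEBUG makes the dynamic loader trace its library search at run time.

>ldd ./a.out
>LD_DEBUG=libs ./a.out 2>&1 | less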
Jim Dempsey
I now tried a newer version of the Intel compiler. This resulted in a somewhat more detailed report:
>mpif90 --version
ifort (IFORT) 15.0.2 20150121
Copyright (C) 1985-2015 Intel Corporation. All rights reserved.
>cat test.f90
program test
  use mpi
  integer :: ierr
  real(8) :: a
  call mpi_init(ierr)
  call random_number(a)
  write(*,*)"hello"
  call mpi_finalize(ierr)
end program test
>mpif90 -check all -traceback -fp-stack-check -fpe0 -fpe-all=0 -fstack-protector-all -mp1 -g -no-opt-assume-safe-padding -openmp test.f90
>./a.out
a.out:21535 terminated with signal 11 at PC=40bb94 SP=7fffc9b45f10. Backtrace:
./a.out[0x40bb94]
./a.out[0x40bafa]
./a.out(for__reentrancy_init+0x118)[0x40ba18]
/opt/soft/apps/OpenMPI/1.8.4-iccifort-2015.2.164-GCC-4.9.2/lib/libmpi_usempif08.so.0(for_rtl_init_+0x55)[0x2add2343d405]
./a.out[0x403aa9]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x2add24ba1d5d]
./a.out[0x403939]
Hi Bastian,
since you're combining MPI and OpenMP, I'd suggest using MPI_Init_thread with MPI_THREAD_FUNNELED as the required level of thread support.
Hi John,
this doesn't make a difference:
>cat test.f90
program test
  use mpi
  integer :: ierr, iprov
  real(8) :: a
  call mpi_init_thread(MPI_THREAD_FUNNELED,iprov,ierr)
  call random_number(a)
  write(*,*)"hello"
  call mpi_finalize(ierr)
end program test
>mpif90 --version
ifort (IFORT) 15.0.2 20150121
Copyright (C) 1985-2015 Intel Corporation. All rights reserved.
>mpif90 -check all -traceback -fp-stack-check -fpe0 -fpe-all=0 -fstack-protector-all -mp1 -g -no-opt-assume-safe-padding -openmp test.f90
>./a.out
a.out:79331 terminated with signal 11 at PC=40bba4 SP=7fffeb4ef190. Backtrace:
./a.out[0x40bba4]
./a.out[0x40bb0a]
./a.out(for__reentrancy_init+0x118)[0x40ba28]
/opt/soft/apps/OpenMPI/1.8.4-iccifort-2015.2.164-GCC-4.9.2/lib/libmpi_usempif08.so.0(for_rtl_init_+0x55)[0x2abceb27e405]
./a.out[0x403aa9]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x2abcec9e2d5d]
./a.out[0x403939]
Although the standard specifies the use of mpi_init_thread, we've seen that OpenMPI and Intel MPI support the customer expectation that it's not required for MPI_THREAD_FUNNELED usage. In any case, the quoted test case doesn't exercise threading at all.
It used to be that running under gdb (there is a special gdb to be installed with ifort/icc) could produce a more complete backtrace. If you suspect a fault in icc-compiled code, you may need to find the corresponding location in your compiled code and view that.
It still may be that the OpenMPI support mailing list could be more helpful. It should be possible to build OpenMPI with gcc/g++/ifort as a check to verify your belief that icc is at fault. There would not necessarily be any advantage in replacing gcc/g++ in the MPI build.
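For example, a variant of the configure line from the first post, with only the Fortran compiler kept as ifort, might look like this (a sketch only; keep your original prefix and options):

>../configure --prefix=<path to installdir> --with-openib --with-sge CC=gcc CXX=g++ FC=ifort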
Tim,
I find it curious that for__reentrancy_init and for_rtl_init are not located within the same library.
./a.out(for__reentrancy_init+0x118)[0x40ba28]
/opt/soft/apps/OpenMPI/1.8.4-iccifort-2015.2.164-GCC-4.9.2/lib/libmpi_usempif08.so.0(for_rtl_init_+0x55)[0x2abceb27e405]
Jim Dempsey
jimdempseyatthecove wrote:
Tim,
I find it curious that for__reentrancy_init and for_rtl_init are not located within the same library.
./a.out(for__reentrancy_init+0x118)[0x40ba28]
/opt/soft/apps/OpenMPI/1.8.4-iccifort-2015.2.164-GCC-4.9.2/lib/libmpi_usempif08.so.0(for_rtl_init_+0x55)[0x2abceb27e405]
Jim Dempsey
I'm not that familiar with OpenMPI, nor would I expect others on the Intel forums to know much about it. I might guess that OpenMPI has a separate Fortran library for the Fortran 2008 bindings.
This is an interesting observation. It suggests that the OpenMPI .so is including pieces of the Intel Fortran library. This is not good.
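One way to check would be to list the dynamic symbols of the library that appears in the backtrace and see whether the Fortran runtime entry points are defined there rather than merely referenced (a sketch, assuming standard binutils is available):

>nm -D /opt/soft/apps/OpenMPI/1.8.4-iccifort-2015.2.164-GCC-4.9.2/lib/libmpi_usempif08.so.0 | grep for_

A "T" next to for_rtl_init_ would mean the routine is actually defined inside the OpenMPI library; a "U" would mean it is only an undefined reference resolved elsewhere at run time.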
Possibly an overly presumptuous use (fault) of IPO.
Jim Dempsey
It was also reported that a build of MVAPICH2 2.0 with the Intel 15.0 compilers caused the same issue - instant SEGV on the test case provided here.
I did a debug build from scratch of MVAPICH2-2.0.1 using ifort/icc-15.0.2. Since I don't have a host with infiniband, I configured for TCP/shared mem. For reference, my configuration:
./configure --prefix=/usr/local/MVAPICH2-2.0.1 --with-device=ch3:sock CC=icc CXX=icpc F77=ifort FC=ifort --enable-cxx --enable-hybrid --enable-fc --enable-g=all --enable-error-messages=all
Using the test case given in the description, I can't reproduce the issue:
[hacman@starch7]$ ~/bin/mpirun_rsh -hostfile ~/hosts -np 8 ./CQ367787.f90-mvapich.x
hello
hello
hello
hello
hello
hello
hello
hello
[hacman@starch7]$
Since this was also configured as a hybrid build, I checked that as well:
[hacman@starch7 LNX]$ ~/bin/mpif90 --version
ifort (IFORT) 15.0.2 20150121
Copyright (C) 1985-2015 Intel Corporation. All rights reserved.
[hacman@starch7 LNX]$ ~/bin/mpif90 -openmp hello_omp_mpi.f90 -o hello_omp_mpi.f90-mvapich.x
[hacman@starch7 LNX]$ ~/bin/mpirun_rsh -hostfile ~/hosts -np 4 ./hello_omp_mpi.f90-mvapich.x
MPI Process number 0 of 4 is alive
MPI Process number 3 of 4 is alive
MPI Process number 1 of 4 is alive
MPI Process number 2 of 4 is alive
Number of OMP threads 12 on process 3
Number of OMP threads 12 on process 0
Number of OMP threads 12 on process 2
Process 3 got an MPI message from process 0
Process 2 got an MPI message from process 0
Number of OMP threads 12 on process 1
Process 1 got an MPI message from process 0
[hacman@starch7 LNX]$
Patrick
