Hello,
The following code example compiled with `mpiifort` produces a segfault error:
    module test_intel_mpi_mod
      implicit none
      integer, parameter :: dp = kind(1.0d0)

      type :: Container
        complex(kind=dp), allocatable :: arr(:, :, :)
      end type

    contains

      subroutine test_intel_mpi()
        use mpi_f08, only: &
          MPI_Init_thread, &
          MPI_THREAD_SINGLE, &
          MPI_Finalize, &
          MPI_Comm_rank, &
          MPI_COMM_WORLD, &
          MPI_COMPLEX16, &
          MPI_Bcast

        integer :: provided
        integer :: rank
        type(Container) :: cont

        call MPI_Init_thread(MPI_THREAD_SINGLE, provided)
        call MPI_Comm_rank(MPI_COMM_WORLD, rank)

        allocate(cont % arr(1, 1, 1))
        if (rank == 0) then
          cont % arr(1, 1, 1) = (1.0_dp, 2.0_dp)
        endif

        ! This works fine ---> call MPI_Bcast(cont % arr(1, 1, 1), 1, MPI_COMPLEX16, 0, MPI_COMM_WORLD)
        call MPI_Bcast(cont % arr(:, :, 1), 1, MPI_COMPLEX16, 0, MPI_COMM_WORLD)

        print *, rank, " after Bcast: ", cont % arr(1, 1, 1)

        call MPI_Finalize()
      end subroutine test_intel_mpi

    end module test_intel_mpi_mod

    program test_mpi
      use test_intel_mpi_mod
      call test_intel_mpi()
    end program test_mpi
The code is compiled simply as follows: `mpiifort -o test_mpi test_mpi.f90` and executed as `mpirun -np N ./test_mpi` (N = 1, 2, ...).
The output for N=2 is the following (`-g -traceback` was also added in this case):
    0 after Bcast: (1.00000000000000,2.00000000000000)
    1 after Bcast: (1.00000000000000,2.00000000000000)
    forrtl: severe (174): SIGSEGV, segmentation fault occurred
    Image              PC                Routine            Line     Source
    test_mpi           000000000041475A  Unknown            Unknown  Unknown
    libpthread-2.17.s  00002AE5C8C0A5D0  Unknown            Unknown  Unknown
    test_mpi           000000000040941D  Unknown            Unknown  Unknown
    test_mpi           0000000000409D79  Unknown            Unknown  Unknown
    test_mpi           00000000004044C0  test_intel_mpi_mo  44       test_mpi.f90
    test_mpi           00000000004044E0  MAIN__             50       test_mpi.f90
    test_mpi           0000000000403BA2  Unknown            Unknown  Unknown
    libc-2.17.so       00002AE5C913B3D5  __libc_start_main  Unknown  Unknown
    test_mpi           0000000000403AA9  Unknown            Unknown  Unknown
    forrtl: severe (174): SIGSEGV, segmentation fault occurred
    Image              PC                Routine            Line     Source
    test_mpi           000000000041475A  Unknown            Unknown  Unknown
    libpthread-2.17.s  00002AB3DEFB75D0  Unknown            Unknown  Unknown
    test_mpi           000000000040941D  Unknown            Unknown  Unknown
    test_mpi           0000000000409D79  Unknown            Unknown  Unknown
    test_mpi           00000000004044C0  test_intel_mpi_mo  44       test_mpi.f90
    test_mpi           00000000004044E0  MAIN__             50       test_mpi.f90
    test_mpi           0000000000403BA2  Unknown            Unknown  Unknown
    libc-2.17.so       00002AB3DF4E83D5  __libc_start_main  Unknown  Unknown
    test_mpi           0000000000403AA9  Unknown            Unknown  Unknown
The program crashes when it tries to exit the subroutine. The problem seems to be related to passing the array section `cont % arr(:, :, 1)` to MPI_Bcast, as opposed to a reference to the first element, `cont % arr(1, 1, 1)` (that version of the call is left commented out in the source code above). At the same time, my understanding of the standard is that array sections, contiguous or not, are explicitly allowed in MPI 3.x (e.g., see https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report/node409.htm).
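As a side check, the MPI 3.x Fortran bindings tie support for array sections as choice buffers to the named constant `MPI_SUBARRAYS_SUPPORTED`. A minimal sketch for seeing what a given installation reports is below. The program name `check_subarrays` is made up for illustration and is not part of the reproducer; the values printed depend on the MPI library and compiler, so no output is shown.

```fortran
program check_subarrays
  ! Hypothetical helper, not part of the reproducer above.
  ! Prints two logical constants defined by the MPI 3.x Fortran bindings.
  use mpi_f08, only: MPI_Init, MPI_Finalize, &
                     MPI_SUBARRAYS_SUPPORTED, &
                     MPI_ASYNC_PROTECTS_NONBLOCKING
  implicit none

  call MPI_Init()   ! ierror is an optional argument in mpi_f08

  ! .true. means the library declares its choice buffers so that
  ! array sections can be passed directly, per the MPI standard.
  print *, "MPI_SUBARRAYS_SUPPORTED        = ", MPI_SUBARRAYS_SUPPORTED
  print *, "MPI_ASYNC_PROTECTS_NONBLOCKING = ", MPI_ASYNC_PROTECTS_NONBLOCKING

  call MPI_Finalize()
end program check_subarrays
```

If the first constant is reported as true, the call with `cont % arr(:, :, 1)` should be valid as far as the MPI standard is concerned, which would point at the compiler or library rather than at the user code.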
The following details in the source are needed to reproduce the segfault:
- The crash happens only if MPI_Bcast is called -- commenting it out prevents the error
- The subroutine must be in a module
- The array must be at least 3-dimensional, allocatable, and be contained in a derived type object
- Non-blocking MPI_Ibcast, as well as other collectives implying broadcast (e.g., Allreduce), give the same result
Compiler/library versions:
Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.1.1.217 Build 20200306
IntelMPI is from the same build: 2019.7.pre-intel-19.1.0.166-7
Output with I_MPI_DEBUG=6:
    [0] MPI startup(): libfabric version: 1.9.0a1-impi
    [0] MPI startup(): libfabric provider: psm2
    [0] MPI startup(): Rank    Pid      Node name  Pin cpu
    [0] MPI startup(): 0       278604   l49        {0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23}
    [0] MPI startup(): 1       278605   l49        {8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31}
    [0] MPI startup(): I_MPI_ROOT=....
    [0] MPI startup(): I_MPI_MPIRUN=mpirun
    [0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
    [0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
    [0] MPI startup(): I_MPI_DEBUG=6
OS: CentOS Linux release 7.6.1810 (Core)
Kernel: 3.10.0-957.10.1.el7.x86_64
Hi Oleg,
We tried the code and reproduced the same behavior at our end.
Can you share a use case explaining why you want to pass the array to MPI_Bcast as a section like arr(:, :, 1)?
We will investigate this further at our end and get back to you.
Thanks
Prasanth
Alternative workaround:
    module test_intel_mpi_mod
      implicit none
      integer, parameter :: dp = kind(1.0d0)

      type :: Container
        complex(kind=dp), allocatable :: arr(:, :, :)
      end type

    contains

      subroutine test_intel_mpi()
        ! use mpi_f08, only: &
        !   MPI_Init_thread, &
        !   MPI_THREAD_SINGLE, &
        !   MPI_Finalize, &
        !   MPI_Comm_rank, &
        !   MPI_COMM_WORLD, &
        !   MPI_COMPLEX16, &
        !   MPI_Bcast
        use mpi

        integer :: provided
        integer :: rank
        type(Container) :: cont
        integer :: ierror

        call MPI_Init_thread(MPI_THREAD_SINGLE, provided, ierror)
        call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror)

        allocate(cont % arr(1, 1, 1))
        if (rank == 0) then
          cont % arr(1, 1, 1) = (1.0_dp, 2.0_dp)
        endif

        ! This works fine ---> call MPI_Bcast(cont % arr(1, 1, 1), 1, MPI_COMPLEX16, 0, MPI_COMM_WORLD)
        !*bug* call MPI_Bcast(cont % arr(:, :, 1), 1, MPI_COMPLEX16, 0, MPI_COMM_WORLD, ierror)
        call my_Bcast(cont % arr(:, :, 1), ierror)

        print *, rank, " after Bcast: ", cont % arr(1, 1, 1)

        ! ierror is a mandatory argument with the mpi module
        call MPI_Finalize(ierror)
      end subroutine test_intel_mpi

      ! Wrapper: receives the section as an assumed-size (flat) array
      subroutine my_Bcast(arr, ierror)
        use mpi
        complex(kind=dp) :: arr(*)
        integer :: ierror
        call MPI_Bcast(arr(1), 1, MPI_COMPLEX16, 0, MPI_COMM_WORLD, ierror)
      end subroutine my_Bcast

    end module test_intel_mpi_mod
I used Intel's `mpi` module in the test. In `my_Bcast`, `arr(1)` should be equivalent to the first element (lowest bounds) of `cont%arr(:,:,1)` in the caller.
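The mapping that `my_Bcast` relies on can be seen without MPI. Below is a minimal sketch, assuming a contiguous section; the names `seq_assoc_demo` and `show_first` are made up for illustration. Passing `arr(:, :, 2)` to an assumed-size dummy hands the callee a flat sequence whose element 1 is `arr(1, 1, 2)`.

```fortran
program seq_assoc_demo
  ! Hypothetical stand-alone illustration; no MPI involved.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  complex(kind=dp), allocatable :: arr(:, :, :)

  allocate(arr(2, 2, 2))
  arr = (0.0_dp, 0.0_dp)
  arr(1, 1, 2) = (1.0_dp, 2.0_dp)

  ! The section arr(:, :, 2) is contiguous in memory, so the
  ! assumed-size dummy sees its elements in array element order.
  call show_first(arr(:, :, 2))

contains

  subroutine show_first(flat)
    ! Assumed-size dummy: rank and shape of the actual argument are dropped.
    complex(kind=dp), intent(in) :: flat(*)
    ! flat(1) corresponds to arr(1, 1, 2) in the caller.
    print *, "first element seen by callee: ", flat(1)
  end subroutine show_first

end program seq_assoc_demo
```

For a non-contiguous section the compiler would typically pass a temporary copy instead, which is fine for a blocking call but not safe for non-blocking ones such as MPI_Ibcast.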
Jim Dempsey
jimdempseyatthecove (Blackbelt) wrote:
>>Alternate work around:
Thanks for the tip. Yes, this could be an option. It is likely that any workaround with a mapping onto an effective 1D array will work.
After I wrote that, I realized this would be much better:
    ASSOCIATE (blob => cont % arr(:, :, 1))
      call MPI_Bcast(blob, 1, MPI_COMPLEX16, 0, MPI_COMM_WORLD, ierror)
    END ASSOCIATE
Jim Dempsey
jimdempseyatthecove (Blackbelt) wrote:
>>After I wrote that, this would be much better:
>>ASSOCIATE (blob => cont % arr(:, :, 1))
>>call MPI_Bcast(blob, 1, MPI_COMPLEX16, 0, MPI_COMM_WORLD, ierror)
>>END ASSOCIATE
>>Jim Dempsey
This workaround indeed works. And it is definitely more elegant. I wonder what makes the compiler fail with the explicit expression.
    module test_assumed_rank_mod
      implicit none
      integer, parameter :: dp = kind(1.0d0)

      type :: Container
        real(kind=dp), allocatable :: arr(:, :, :)
      end type

      interface
        subroutine c_fun(arr, n) bind(C, name='c_fun')
          use iso_c_binding, only: c_ptr, c_int
          type(c_ptr), value :: arr
          integer(c_int), value :: n
        end subroutine
      end interface

    contains

      subroutine fun(arr, n)
        use iso_c_binding, only: c_ptr, c_int, c_loc
        !> arr will be treated as an array of real(dp)
        type(*), intent(inout) :: arr(..)
        integer, intent(in) :: n

        type(c_ptr) :: p_arr
        integer(c_int) :: c_n

        p_arr = c_loc(arr)
        c_n = n
        call c_fun(p_arr, c_n)
      end subroutine fun

      subroutine test_assumed_rank_arg()
        type(Container) :: cont

        allocate(cont % arr(1, 1, 1))
        cont % arr(1, 1, 1) = 42.0_dp

        print *, " before `c_fun`: ", cont % arr(1, 1, 1)
        !---> works call fun(cont % arr(:, 1, 1), 1)
        call fun(cont % arr(:, :, 1), 1)
        print *, " after `c_fun`: ", cont % arr(1, 1, 1)
      end subroutine test_assumed_rank_arg

    end module test_assumed_rank_mod

    program test_poly
      use test_assumed_rank_mod
      call test_assumed_rank_arg()
    end program
The C function is defined as follows (in c_fun.c):
    #include <stdio.h>

    void c_fun(void *p, int n)
    {
        // Assume that `p` points to an array of double
        double *p_arr = (double *)p;
        printf("n = %d\n", n);
        printf("`p` contains: %lf\n", *p_arr);
    }
Compiled together, this results in the same behavior as in the MPI example. Should I re-post this to the Intel-Compiler forum instead?
>>Should I re-post this to the Intel-Compiler forum instead?
Prasanth should be able to take it from here.
If Prasanth does not follow up, then on this forum's thread list page (where you select between threads, not in a reply to this thread) there is a toolbar button to report a bug. You can click on that and then post your sample (or a hyperlink to this thread) as a bug report.
Jim Dempsey
???? The Report Bug button is now missing ???
Make a new posting on the Fortran forum. Include a hyperlink to this thread:
https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/852075
Jim Dempsey
>>but also all HTML markup buttons have disappeared.
I see that occurring occasionally too.
When that happens, (with the browser in focus) I press Ctrl-N to open a new browser window at the same URL. The new window has the buttons back (then I close the older browser window).
Jim Dempsey
Hi Oleg,
Since you have raised a separate thread on the Fortran forum, we are closing this thread here.
If needed you can always start a new thread in this forum.
Regards
Prasanth