Hi there,
it took me a while to put together an example that reproduces the problem in my library, but I finally have one. Compiled with "ifort -qopenmp -heap-arrays -O3", the code below produces either a segfault (if pt in subroutine subset is nullified at the end) or nonsense values in tmp (if pt is not nullified in subroutine subset).
Module mod1
  Type :: type1
    real :: val
    real, Allocatable :: tmp(:,:)
    character(:), allocatable :: what
  contains
    procedure, pass :: set => subset
    procedure, pass :: sel => subselect
  End type type1
  Type :: type1pt
    class(type1), Pointer :: pt
  End type type1pt
  private :: subset, subselect
contains
  subroutine subselect(this)
    class(type1), Intent(InOUt) :: this
    Select case(this%what)
    Case("a")
      write(*,*) this%what
      call this%Set()
    End Select
  end subroutine subselect
  subroutine subset(this)
    class(type1), Intent(InOUt), target :: this
    real, Pointer :: pt(:,:)=>null()
    integer :: i
    write(*,*) "I am here"
    allocate(this%tmp(30000,15))
    nullify(pt)
    pt=>this%tmp
    pt=0.0
    Do i=1,10000
      pt=this%val
    End Do
    !nullify(pt)
  End subroutine subset
End Module mod1
Module ModMom
  use Mod1
  Type :: mother
    Type(type1pt), Allocatable :: tvmyfolks(:)
  contains
    Procedure, Pass :: SetFolks=>SubSetFolks
  End type mother
contains
  Subroutine SubSetFolks(this)
    class(mother), intent(inout) :: this
    integer :: i,j
    j=size(this%TVMyfolks)
    !$OMP PARALLEL DO
    Do i=1,j
      call this%TVMyfolks(i)%pt%Sel()
    End Do
    !$OMP END PARALLEL DO
  End Subroutine SubSetFolks
End Module ModMom
Program Test
  use mod1
  use modmom
  Type(type1), Target :: a,b,c,d
  Type(mother) :: mom
  integer :: i
  allocate(mom%TVMyfolks(4))
  mom%TVMyfolks(1)%pt=>a
  mom%TVMyfolks(2)%pt=>b
  mom%TVMyfolks(3)%pt=>c
  mom%TVMyfolks(4)%pt=>d
  Do i=1,4
    mom%TVMyfolks(i)%pt%val=i
    mom%TVMyFolks(i)%pt%what="a"
  End Do
  call mom%SetFolks()
  Do i=1,4
    write(*,*) mom%TVMyfolks(i)%pt%tmp(1,:)
  end Do
End Program Test
The problem vanishes if -heap-arrays is omitted from the compiler flags. I don't know whether my code is wrong or whether this is a compiler bug. However, I don't have problems when using gfortran (I think Steve once mentioned in one of my OpenMP threads that gfortran has "-heap-arrays" on by default).
While the pointer in "subset" is not really necessary, the whole behaviour concerns me. In fact, it first appeared when pt in subset was not declared at all, but used in an associate statement.
Any ideas?
Thanks a lot
Karl
PS: ifort version 16.0.3, linux kernel 4.6.3
Maybe this is an optimization problem that arises when unnecessary code is eliminated. What happens when you comment out lines 31, 32 and 34 (essentially doing what the optimizer should do)? If this "corrects" the problem, then this is a good reproducer for Intel to use in diagnosing it.
Jim Dempsey
Hi,
just a follow-up on this. The problem vanishes when the pointer is not used, and it also does not show up when using associate(x=>this%tmp). However, I am wondering whether the pointer construct isn't supposed to work as well.
cheers
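For reference, the associate variant mentioned above could look roughly like the sketch below. It is a cut-down stand-in for the original reproducer, not the library code: the names assoc_mod, holder and fill are made up for illustration, and the array is much smaller than in the original. Without an OpenMP flag the directives are plain comments and the program runs serially.

```fortran
module assoc_mod
  implicit none
  type :: holder                       ! hypothetical stand-in for type1
    real :: val
    real, allocatable :: tmp(:,:)
  end type holder
contains
  subroutine fill(this)
    class(holder), intent(inout) :: this
    allocate(this%tmp(1000,15))
    ! x is a construct-local alias for this%tmp; no pointer variable is declared.
    associate (x => this%tmp)
      x = this%val
    end associate
  end subroutine fill
end module assoc_mod

program assoc_demo
  use assoc_mod
  implicit none
  type(holder) :: h(4)
  integer :: i
  do i = 1, 4
    h(i)%val = real(i)
  end do
  !$omp parallel do
  do i = 1, 4
    call fill(h(i))                    ! each iteration touches only its own element
  end do
  !$omp end parallel do
  do i = 1, 4
    if (any(h(i)%tmp /= real(i))) error stop "unexpected value in tmp"
  end do
  write(*,*) "every tmp holds its own val"
end program assoc_demo
```

The associate name exists only inside the construct, so there is no declared pointer that could end up in static storage.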
I was able to reproduce the behavior. This looks like a race condition. I saw it with the version 16 compiler, but not with earlier ones.
By default, the compiler allocates temporary variables on the stack, which automatically makes them thread safe in threaded regions where each thread has its own stack. -heap-arrays overrides this and causes temporaries to be allocated on the heap, which may require complex synchronization to make it work for parallel regions.
However, I believe your source code contains a race condition on the variable pt in subroutine subset, due to the declaration
real, Pointer :: pt(:,:) =>null()
This initialization forces the pointer to be stored statically (as if it had the SAVE attribute) instead of on the stack; otherwise, it wouldn't be available for initialization at program startup. As a result, every thread uses the same static pointer, which leads to the race condition and the overwrites. You should be able to confirm this using Intel Inspector. The fix is simply to remove the initialization from the declaration statement:
real, Pointer :: pt(:,:)
It is not needed, since you have nullify(pt) before pt is ever used. A separate copy of pt is then allocated on the stack for each thread at run time; -heap-arrays converts these to heap allocations at run time that remain thread safe, and the program runs correctly.
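As a minimal sketch of the fix described above, the example below uses a local pointer without an initializer and nullifies it with an executable statement instead. It is a reduced stand-in for the original reproducer, not the original code: the names fill_mod, holder and fill are hypothetical, and the array is smaller. Without an OpenMP flag the directives are treated as comments and the program runs serially.

```fortran
module fill_mod
  implicit none
  type :: holder                       ! hypothetical stand-in for type1
    real :: val
    real, allocatable :: tmp(:,:)
  end type holder
contains
  subroutine fill(this)
    type(holder), intent(inout), target :: this
    ! No "=> null()" here: an initializer in the declaration would imply SAVE
    ! and give all threads one shared static pointer.
    real, pointer :: pt(:,:)
    allocate(this%tmp(1000,15))
    nullify(pt)                        ! run-time nullification, done on every call
    pt => this%tmp
    pt = this%val
  end subroutine fill
end module fill_mod

program fill_demo
  use fill_mod
  implicit none
  type(holder), target :: h(4)
  integer :: i
  do i = 1, 4
    h(i)%val = real(i)
  end do
  !$omp parallel do
  do i = 1, 4
    call fill(h(i))                    ! each call has its own local pt
  end do
  !$omp end parallel do
  do i = 1, 4
    if (any(h(i)%tmp /= real(i))) error stop "unexpected value in tmp"
  end do
  write(*,*) "every tmp holds its own val"
end program fill_demo
```

The same rule applies to any local variable: giving it a value in its declaration makes it static, so in threaded code such variables should be set in executable statements.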
I think there is a potential race condition, whether or not you use -heap-arrays, and with whichever compiler, but it may not get exposed, depending on the actual memory layout. I was able to reproduce the problem even without the -heap-arrays switch.
Nevertheless, I do not recommend the use of -heap-arrays with threaded code in general. Apart from the risk of thread safety issues, there is a performance impact due to the additional synchronization that increases with the number of threads. Yes, without -heap-arrays, you may need to increase the shell stack limit, e.g. with ulimit -s unlimited (I do this automatically for any OpenMP program). You may sometimes need to increase the thread stack size above its default value of a few MB using the environment variable OMP_STACKSIZE. But it's a safer, more efficient way of working.
Hi Martyn,
thank you for this very elaborate answer. The original recommendation of using -heap-arrays came somewhat out of this thread:
https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/610751
While the above code does not make a private copy of each array for each thread (as in the above link), the tmp arrays within each child may be large (e.g. 10,000,000 x 100), and each child is assigned to a single thread. Is there any chance of running into segfaults again due to running out of stack?
Thanks a lot
Karl
-heap-arrays shouldn't be a problem: the pointer is kept on the stack and the allocation is done in a thread-safe manner, as all other allocations are. At least that's the way it's supposed to work. I would go ahead and use -heap-arrays unless you find it creates problems.
Yes, you will run out of stack unless you set OMP_STACKSIZE appropriately, as well as ulimit -s unlimited.
If you do a lot of allocations to the heap from inside a parallel region, you will pay a price in performance due to the synchronization. But if only a few, that won't be important and -heap-arrays may be easier.