program main_tmp
   implicit none
   integer :: i, j, n
   integer, allocatable :: x(:)
   !$omp parallel do default(none) private(i,j,n,x)
   do i = 1, 6
      n = 10**i
      allocate (x(n), source=[(j, j=1,n)])
   end do
   !$omp end parallel do
end program
As far as I am aware, you need to deallocate before you can reallocate after the first allocation.
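For reference, here is a minimal sketch of how the cut-down example could be made legal, with an explicit deallocate at the end of each iteration (my own reworking, not the original code):
program main_tmp
   implicit none
   integer :: i, j, n
   integer, allocatable :: x(:)
   !$omp parallel do default(none) private(i,j,n,x)
   do i = 1, 6
      n = 10**i
      allocate (x(n), source=[(j, j=1,n)])
      ! ... use x here ...
      deallocate (x)   ! release before the next iteration allocates again
   end do
   !$omp end parallel do
end program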
You are correct. I tried to cut down the code and I shouldn't have deleted the external procedure.
The bug still arises:
program main_tmp
   implicit none
   external :: foo
   integer :: i
   !$omp parallel do default(none) private(i)
   do i = 1, 6
      call foo(i)
   end do
   !$omp end parallel do
end program

subroutine foo(i)
   integer, intent(in) :: i
   integer :: j, n
   integer, allocatable :: x(:)
   n = 10**i
   allocate (x(n), source=[(j, j=1,n)])
end subroutine
Even worse: just supplying a large argument to any procedure gives a segmentation fault! (See procedure bar below.)
module foo_m
   implicit none
contains
   subroutine bar(iarr)
      integer, intent(in) :: iarr(:)
      integer :: k
      k = iarr(1) ! just any operation, so the compiler does not optimize this routine away
   end subroutine
   subroutine foo(i)
      integer, intent(in) :: i
      integer :: j, n
      integer, allocatable :: x(:)
      n = 10**i
      allocate (x(n), source=[(j, j=1,n)])
   end subroutine
end module
program main_tmp
   use foo_m
   implicit none
   integer :: i, j
   !$omp parallel do default(none) private(i,j)
   do i = 1, 6
      ! call foo(i)                ! uncomment to see that allocate(.., source=<something large>) will break
      ! call bar([(j, j=1,10**i)]) ! uncomment to see that procedure(<something large>) will break
   end do
   !$omp end parallel do
end program
Hi Paul,
I cannot reproduce the error on Windows with PSXE 2020 Update 2 (19.1.2.254). Your example code in the post directly above works as expected in debug mode.
My command line:
/nologo /debug:full /MP /Od /Qopenmp /warn:all /module:"x64\Debug\\" /object:"x64\Debug\\" /Fd"x64\Debug\vc150.pdb" /traceback /check:all /libs:static /threads /dbglibs /c
Do you work on GNU/Linux? What does your build command line look like?
BR, Johannes
I have already replied multiple times, but the website doesn't show it.
Anyway, I uploaded my build instructions to pastebin. See here.
PS: I do use GNU/Linux.
I ran it on the latest VS 2019 Preview with the latest Intel compiler as a standard Fortran program. It ran on 5 cores and then crashed with a stack error, which you can fix with heap-arrays 0 in VS.
If the heap-arrays option helps, then the line
allocate (x(n), source=[(j, j=1,n)])
might create a temporary array on the stack due to the implied-do loop. One could split the allocation and the initialization into a separate do loop to check this (see the sketch below). There was a discussion some time ago regarding implied-do loops, temporaries, and stack, but I cannot find it.
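A minimal sketch of that check, as a hypothetical variant of the foo routine from the example above: allocate first, then fill with an ordinary do loop so that no array-constructor temporary is needed.
subroutine foo(i)
   integer, intent(in) :: i
   integer :: j, n
   integer, allocatable :: x(:)
   n = 10**i
   allocate (x(n))   ! plain allocation, no source= temporary
   do j = 1, n       ! initialize with an explicit loop instead of an implied do
      x(j) = j
   end do
end subroutine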
Furthermore, OpenMP itself creates overhead, which could have an impact on the stack, but I'm unsure about this.
If speed doesn't matter, you could stick to the heap option; the stack is generally faster than the heap. If you can identify the code section that requires the heap, you could maybe improve the speed. But remember: premature optimization is the root of all evil.
PS: You override the default real and integer kinds in your build command (-i8, ...), and it is specified twice. I would use this only for legacy code and otherwise change the kinds in the code.
The heap-arrays option did fix the segmentation fault, so it is clear that temporary arrays placed on the stack do not work nicely together with OpenMP.
I guess each thread gets a fixed stack size, and if the temporary array is too large it overwrites the stack of the next thread. Shouldn't the compiler know the stack sizes and avoid that? It could automatically put such temporaries on the heap.
So now the programmer has to take care not to use implied-do loops in OpenMP parallel regions (unless heap-arrays 0 is used)?
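One workaround, sketched below as a hypothetical helper (not from the original code), is to build the actual argument in an allocatable array before the call, so that no array-constructor temporary has to live on the thread's stack:
subroutine call_bar(i)   ! hypothetical helper, assumes the foo_m module from the example above
   use foo_m, only: bar
   implicit none
   integer, intent(in) :: i
   integer :: j
   integer, allocatable :: tmp(:)
   allocate (tmp(10**i))
   do j = 1, 10**i
      tmp(j) = j
   end do
   call bar(tmp)   ! contiguous allocatable passed to an assumed-shape dummy: no copy needed
end subroutine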
It is important to know where private copies of arrays are located (on the shared heap or on each thread's stack), and then to know how big each thread's stack needs to be.
My experience in Win-64 is that:
- private copies of allocatable arrays are placed on the heap (i.e. arrays seen as allocatable at the !$OMP region).
- private copies of automatic, local and argument arrays are placed on the thread stack.
- this location can be modified with compiler options.
- the master thread and other threads each have a stack size, which should be known and managed.
- for the master thread, private copies of arrays will be duplicated, which means the master thread's stack is likely to require a larger size.
I am still learning how to use the stack in a 64-bit environment. My latest OpenMP approach is to declare all stacks large (500 MB). This only reserves virtual address space; only the used portion of each stack is backed by physical memory. In other words, making the stacks much larger than "necessary" does not increase physical memory demand, but you need to have an idea of what "necessary" can be. (You may also need to consider the virtual memory limit.) ifort takes a similar approach with memory addressing when locating heap extensions.
This problem comes about because each thread's stack is sized when the thread is created. Thread stacks cannot be extended, unlike the heap, which can grow up to the physical/virtual memory limit. If you are using !$OMP to speed up your program, relying on virtual memory is not really an option.
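As a small illustration of the distinction described above (my own sketch, assuming allocatable data goes to the heap and automatic arrays go to the calling thread's stack):
subroutine work(n)   ! hypothetical illustration, not from the thread
   implicit none
   integer, intent(in) :: n
   integer, allocatable :: a(:)   ! allocated storage: heap, limited by physical/virtual memory
   integer :: b(n)                ! automatic array: thread stack, limited by that thread's stack size
   allocate (a(n))
   b = 1
   a = b
end subroutine
If work is called from inside an !$OMP parallel region, b consumes the calling thread's stack, while the data of a still goes to the shared heap.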
I agree, Paul, that the memory handling for OpenMP, left to the user's responsibility, is not solved optimally in the current Intel compiler. There seem to be some changes coming with 19.0.x and newer, maybe not always for the best.
A thread that is loosely related to this issue can be found here.
Maybe Intel can improve the user experience with OpenMP. Maybe the upcoming oneAPI compilers will do a better job on this? Maybe we should just use coarrays and let the compiler handle the MPI/OpenMP layer.