program main_tmp
   implicit none
   integer :: i, j, n
   integer, allocatable :: x(:)
   !$omp parallel do default(none) private(i, j, n, x)
   do i = 1, 6
      n = 10**i
      allocate (x(n), source=[(j, j=1,n)])
   end do
   !$omp end parallel do
end program
As far as I am aware, you need to deallocate before you can reallocate after the first allocation.
You are correct. I tried to cut down the code and I shouldn't have deleted the external procedure.
The bug still arises.
program main_tmp
   implicit none
   external :: foo
   integer :: i
   !$omp parallel do default(none) private(i)
   do i = 1, 6
      call foo(i)
   end do
   !$omp end parallel do
end program

subroutine foo(i)
   integer, intent(in) :: i
   integer :: j, n
   integer, allocatable :: x(:)
   n = 10**i
   allocate (x(n), source=[(j, j=1,n)])
end subroutine
Just supplying large arguments to any procedure gives a segmentation fault!? (See procedure bar.)
module foo_m
   implicit none
contains
   subroutine bar(iarr)
      integer, intent(in) :: iarr(:)
      integer :: k
      k = iarr(1) ! just any operation; otherwise the compiler might optimize this procedure away?!
   end subroutine
   subroutine foo(i)
      integer, intent(in) :: i
      integer :: j, n
      integer, allocatable :: x(:)
      n = 10**i
      allocate (x(n), source=[(j, j=1,n)])
   end subroutine
end module

program main_tmp
   use foo_m
   implicit none
   integer :: i, j
   !$omp parallel do default(none) private(i, j)
   do i = 1, 6
      ! call foo(i)                ! uncomment to see that allocate(..., source=<something large>) will break
      ! call bar([(j, j=1,10**i)]) ! uncomment to see that procedure(<something large>) will break
   end do
   !$omp end parallel do
end program
I cannot reproduce the error on Windows with PSXE 2020 u2. Your example code in the post directly above works as expected in debug mode.
My command line:
/nologo /debug:full /MP /Od /Qopenmp /warn:all /module:"x64\Debug\\" /object:"x64\Debug\\" /Fd"x64\Debug\vc150.pdb" /traceback /check:all /libs:static /threads /dbglibs /c
Do you work on GNU/Linux? What does your build command line look like?
If the heap-arrays option helps, then the line
allocate (x(n), source=[(j, j=1,n)])
might create a temporary array on the stack due to the implied do loop. One could split the allocation and the initialization into separate steps, using an explicit do loop, to check this. There was a discussion some time ago regarding implied do loops, temporaries and the stack... but I can't find it.
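A minimal sketch of that check, reusing the loop bounds from the examples above: the source= constructor is replaced by a plain allocate followed by an explicit do loop, so no large temporary should be needed (the deallocate is added so each iteration starts clean).

```fortran
program split_alloc
   implicit none
   integer :: i, j, n
   integer, allocatable :: x(:)
   !$omp parallel do default(none) private(i, j, n, x)
   do i = 1, 6
      n = 10**i
      allocate (x(n))    ! no source= : no implied-do temporary
      do j = 1, n        ! explicit loop writes directly into x
         x(j) = j
      end do
      deallocate (x)     ! free before the next iteration reallocates
   end do
   !$omp end parallel do
end program split_alloc
```

If this version runs without the segmentation fault while the source= version crashes, that would point at the implied-do temporary on the thread stack.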
Further, OMP creates overhead, which could have an impact on the stack. But I'm unsure about this.
If speed doesn't matter, you could stick to the heap option. The stack is generally faster than the heap, so if you can identify the code section that actually requires the heap, you could maybe improve the speed. But remember, premature optimization is the root of all evil.
PS: You override the default real and integer kinds in your build command (-i8, ...), and that's defined twice. I would use this only for legacy code and otherwise change it in the code.
The heap-arrays option did fix the segmentation fault, so it's clear that temporary arrays placed on the stack don't play nicely with OMP.
I guess each thread gets a specific stack size, and if the temporary array is too large it will overwrite the stack of the next thread... Shouldn't the compiler know the stack sizes and avoid that? It could automatically place such arrays on the heap...
Now the programmer has to take care not to use implied do loops in OMP parallel regions (unless heap-arrays 0 is used)?
It is important to know where private copies of arrays are defined/located (either on the shared heap or on each thread's stack), and then how big each thread's stack needs to be.
My experience in Win-64 is that:
- private copies of allocatable arrays are placed on the heap. (seen as allocatable at the !$OMP region)
- private copies of automatic, local and argument arrays are placed on the thread stack.
- this location can be modified with compiler options.
- the master thread and other threads each have a stack size, which should be known and managed.
- for the master thread, private copies of arrays will be duplicated which means the master thread stack is likely to require a larger size.
I am learning how to use the stack in a 64-bit environment. My latest OMP usage is to declare all stacks large (500 MB). This only defines a virtual address space; only the used portion of each stack is allocated physical memory. I.e., making the stacks much larger than "necessary" does not affect physical memory demand, but you need to have an idea of what "necessary" can be. (You may also need to consider the virtual memory limit.) ifort takes a similar approach with memory addresses when locating heap extensions.
This problem comes about because each thread stack is defined when the thread is initiated. Thread stacks cannot be extended, unlike the heap, which can grow up to the physical/virtual memory limit. If you are using !$OMP to speed up your program, relying on virtual memory is not a realistic option.
I agree, Paul, that the memory handling for OMP, left to the user's responsibility, is not solved optimally in the current Intel compiler. There seem to be some recent changes coming with 19.0.x and newer. Maybe not always for the best?
A thread, which is loosely related to this issue, can be found here.
Maybe Intel can improve the user experience with OMP. Maybe the upcoming oneAPI compilers will do a better job here? Maybe we should use coarrays and let the compiler handle the MPI/OpenMP layer.