We have recently been seeing problems with the allocation of thread-private data in OpenMP loops, specifically on Macs. I have not been very successful in creating a good reproducer, but the example below demonstrates some of the problems. We are using Xcode SDK 11.7 and compiling with -qopenmp and -O2. The following program demonstrates the issues we are observing:
Confirmed: we are seeing the same issues with ifort 21.1.
This is an area where macOS differs from Linux. macOS has hard (and small) default stack size limits, and neither ulimit nor OMP_STACKSIZE can overcome them. Observe:
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 256
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 1392
virtual memory (kbytes, -v) unlimited
clin17-mobl1:~ rwgreen$ ulimit -s unlimited
-bash: ulimit: stack size: cannot modify limit: Operation not permitted
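To see why the `ulimit -s unlimited` request above is refused, it can help to print the soft and hard stack limits separately: an unprivileged process can raise its soft limit only as far as the hard limit. A minimal sketch, assuming a bash-compatible shell where `-S` and `-H` select the soft and hard limit; the function name `show_stack_limits` is only for illustration:

```shell
# Print the soft and hard stack-size limits as reported by the shell
# (in kbytes, or "unlimited").
show_stack_limits() {
    echo "soft stack limit: $(ulimit -S -s)"
    echo "hard stack limit: $(ulimit -H -s)"
}

show_stack_limits
```

If the hard limit printed here is a finite number, that is the ceiling `ulimit -s` can reach from the shell.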
Some years back I wrote an article about this. On macOS you need to increase the stack size through the linker:
ifort -Wl,-stack_size,0x10000000
Try this and let us know if it fixes the issue.
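The value given to `-stack_size` is a byte count written in hexadecimal, so it is worth checking what `0x10000000` actually requests. A small sketch of the arithmetic in the shell; the helper name `hex_to_mib` is hypothetical:

```shell
# Convert a hexadecimal byte count (as passed to -Wl,-stack_size,...) to MiB.
hex_to_mib() {
    bytes=$(( $1 ))                  # shell arithmetic accepts 0x... constants
    echo $(( bytes / 1024 / 1024 ))
}

hex_to_mib 0x10000000   # prints 256
```

So the linker option above asks for a 256 MiB stack, compared with the 8192 kbytes (8 MiB) shown by `ulimit -a`.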
Ronald,
I think Tony already responded to your response from user support, so apologies for a double response. I just thought I would post a response here as well, so there is a record of it on the Forum for others.
I compiled the above minimal example with the command:
ifort -qopenmp -Wl,-stack_size,0x10000000 test.F90
on Intel Fortran 19.1.3.
I then ran it 100 times as follows:
for i in {1..100}; do ./a.out; done
Roughly 3 out of every 100 runs fail with an error.
Repeating this with Intel Fortran 19.0.8, no errors occur.
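Because the failure is intermittent, tallying failed runs makes it easier to compare compiler versions than reading the output of 100 runs by eye. A rough sketch that extends the loop above; `count_failures` is an illustrative helper, and it assumes a crashing run exits with a non-zero status:

```shell
# Run a command N times and print how many runs exited with non-zero status.
# Usage: count_failures <runs> <command> [args...]
count_failures() {
    runs=$1
    shift
    fails=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard program output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails"
}

# Example: count_failures 100 ./a.out
```

Running it once per compiler build gives a single number for each, which is easier to quote in a report.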
I'm trying to understand the example. I suspect it looks odd because the real code is, hopefully, different.
First, the allocation with third-dimension bounds 1:0:
allocate(arr(1:3,1:8,1:0), source=zero)
This allocation will succeed and create a 0-sized array. Does your real application do this also? Is this what you expected? Here's a sample with f(:,:,:) allocated first
f(1:3,1:8,1:1) ! reasonable
then
f(1:3,1:8,1:0) ! empty array
more foo.f90
program foo
  implicit none
  real(8), parameter :: zero = 0.0_8
  real(8), allocatable :: f(:,:,:)
  integer :: allocate_status

  ! Ordinary case: 3 x 8 x 1 array
  allocate( f(1:3,1:8,1:1), source=zero, stat=allocate_status )
  print*, "allocate( f(1:3,1:8,1:1)"
  print*, "size f", size(f), " alloc stat=", allocate_status
  print*, "shape f ", shape(f)
  print*, ""
  deallocate( f )

  ! Third dimension 1:0 gives a zero-sized array
  allocate( f(1:3,1:8,1:0), source=zero, stat=allocate_status )
  print*, "allocate( f(1:3,1:8,1:0)"
  print*, "size f", size(f), " alloc stat=", allocate_status
  print*, "shape f ", shape(f)
end program foo
depepper-MOBL1:q04958028 rwgreen$
depepper-MOBL1:q04958028 rwgreen$ ifort -O0 -g -check all -o foo foo.f90
depepper-MOBL1:q04958028 rwgreen$ ./foo
allocate( f(1:3,1:8,1:1)
size f 24 alloc stat= 0
shape f 3 8 1
allocate( f(1:3,1:8,1:0)
size f 0 alloc stat= 0
shape f 3 8 0
So I just changed your sample to 1:1 to remove anomalies with that strange array allocation. I assume it's a fluke.
I did see the error on the exit once, so I believe there is something there. I'm trying to see if I can reproduce it more consistently.
Thanks for looking into this.
In some cases the allocation really is `(1:n)` with n=0 in the real application. There is logic elsewhere in the code to avoid accessing any of the zero-sized arrays. We certainly could (and probably should) avoid this, but it never showed up as an issue in the past, so I imagine the developers left it in there.
I only used those dimensions because it was the one that showed up most consistently in the real application. I think in reality it is showing up consistently at that location because it is the site of the first allocate we have inside an OpenMP loop in the real application. We have observed failures (both during the allocation and during the program close out) with allocations of a reasonable size.
I have confirmed that I see the failure occasionally with an allocation with 1:2 for the third dimension in the reproducer. I was able to get the code to fail a little more consistently (1 in 20 or so, not sure it actually changed) if I modified the code by calling the alloc and dealloc, and looping over it multiple times. I attached the changed input.
The resulting error is:
test-openmp 0000000108D2D2D4 for__signal_handl Unknown Unknown
libsystem_platfor 00007FFF688755FD _sigtramp Unknown Unknown
test-openmp 0000000108D4249C for_alloc_allocat Unknown Unknown
test-openmp 0000000108D071A2 MAIN__ Unknown Unknown
Simon