sbolding
Beginner
322 Views

Ifort 19.1.3, OpenMP SEGFAULT for Allocate and deallocate on macOSX 10.15.6 Catalina

We have recently been observing issues with allocation of thread-private data in OpenMP loops, specifically on Macs. I have not been very successful in creating a good reproducer, but the example below demonstrates some of the problems. We are using Xcode SDK 11.7 and compiling with -qopenmp and -O2. The following program demonstrates the issues we are observing:

```Fortran

module memory

use iso_fortran_env, only : real64

implicit none

real(real64), public, allocatable :: arr(:,:,:)
real(real64), parameter :: zero = 0.0_real64
!$OMP threadprivate(arr)

interface
module subroutine alloc()
end subroutine

module subroutine dealloc()
end subroutine
end interface
end module


submodule (memory) basic
contains
module subroutine alloc()
integer :: i(10000)
!$OMP parallel
! These dimensions caused problem in real code, but don't seem to be the root cause
allocate(arr(1:3,1:8,1:0), source=zero)
!$OMP end parallel
end subroutine

module subroutine dealloc()
!$OMP parallel
if (allocated(arr)) deallocate(arr)
!$OMP end parallel
end subroutine
end submodule

program test_openmp
!$ use omp_lib, only: omp_set_num_threads, omp_get_thread_num
use iso_fortran_env, only : real64
use memory, only : alloc, dealloc
implicit none

integer :: N, i, nthreads

N = 100000
nthreads = 25

call omp_set_num_threads(nthreads)

call alloc

! alternatively, tried doing many allocations and deallocations, don't see failures very often
!do i=1,N
!   call alloc()
!  call dealloc() 
!end do
end program
```
 
I occasionally see failures during cleanup at the end of the program running this example (in the real program there are many thread-private allocations that are not cleaned up manually):

```
test-openmp 000000010E9AD344 for__signal_handl Unknown Unknown
libsystem_platfor 00007FFF6AC565FD _sigtramp Unknown Unknown
test-openmp 000000010E9C250C for_alloc_allocat Unknown Unknown
test-openmp 000000010E987418 MAIN__ Unknown Unknown
```

I couldn't get it to reproduce on this file, but in the real program we also occasionally see a failure during the allocation (I commented out some lines that try to mimic this by doing many allocations and deallocations).  The error is usually something like the following:
 
```
<exe-name> 0000000101864DC4 Unknown Unknown Unknown
libsystem_platfor 00007FFF6AC565FD Unknown Unknown Unknown
<exe-name> 0000000101898F11 Unknown Unknown Unknown
libsystem_pthread 00007FFF6AC60009 Unknown Unknown Unknown
libsystem_pthread 00007FFF6AC62512 Unknown Unknown Unknown
libsystem_pthread 00007FFF6AC62114 Unknown Unknown Unknown
<lines where allocate was called>
```

Adding an OMP CRITICAL around the allocation is a workaround that seems to fix the failures during allocation, but I wanted to check on the issue because we have C++ code that also allocates inside OpenMP loops and would be more difficult to wrap in a critical section.
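
For anyone landing here later, a minimal sketch of that kind of workaround is shown below. It is not the code from the real application: the module name `tp_buffer`, the routines `alloc_serialized` and `dealloc_serialized`, and the critical-section name `tp_alloc` are all made up for illustration. It only serializes the allocate and deallocate calls and does not address the underlying runtime issue.

```Fortran
module tp_buffer
   use iso_fortran_env, only: real64
   implicit none
   private
   public :: alloc_serialized, dealloc_serialized

   ! Each OpenMP thread gets its own copy of arr
   real(real64), allocatable :: arr(:,:,:)
   !$omp threadprivate(arr)

contains

   subroutine alloc_serialized(n3)
      integer, intent(in) :: n3
      !$omp parallel
      ! Named critical section (hypothetical name): only one thread at a
      ! time executes the allocate, at the cost of serializing this step
      !$omp critical (tp_alloc)
      if (.not. allocated(arr)) allocate(arr(3, 8, n3), source=0.0_real64)
      !$omp end critical (tp_alloc)
      !$omp end parallel
   end subroutine alloc_serialized

   subroutine dealloc_serialized()
      !$omp parallel
      ! Same name as above, so allocate and deallocate never overlap
      !$omp critical (tp_alloc)
      if (allocated(arr)) deallocate(arr)
      !$omp end critical (tp_alloc)
      !$omp end parallel
   end subroutine dealloc_serialized

end module tp_buffer

program demo_serialized_alloc
   use tp_buffer, only: alloc_serialized, dealloc_serialized
   implicit none
   integer :: i

   ! Repeat the allocate/deallocate cycle many times, as in the reproducer
   do i = 1, 1000
      call alloc_serialized(2)
      call dealloc_serialized()
   end do
   print '(a)', 'completed all allocate/deallocate cycles'
end program demo_serialized_alloc
```

Using one name for both critical sections keeps allocation and deallocation mutually exclusive across threads. The trade-off is that every thread waits its turn, so this is only reasonable when allocation is not on the hot path.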
 
 This does not show up on Linux with the same compiler version.  We do have a lot of large stack arrays in the real application, but setting -heap-arrays and adjusting OMP_STACKSIZE seem to have no effect.  Setting -pthreads also had no effect, and removing the submodule had no effect.

Any guidance on what is going wrong or a more general workaround would be greatly appreciated!  
 
Simon

 

7 Replies
sbolding
Beginner
279 Views

Confirmed that we are seeing the same issues with Ifort 21.1

Ronald_G_Intel
Moderator
254 Views

This is an area where macOS differs from Linux.  macOS has hard (and small) default stacksizes.  ulimit and OMP_STACKSIZE cannot overcome this limit.  Observe:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1392
virtual memory          (kbytes, -v) unlimited
clin17-mobl1:~ rwgreen$ ulimit -s unlimited
-bash: ulimit: stack size: cannot modify limit: Operation not permitted

 

Some years back I wrote THIS ARTICLE

On macOS you need to increase the stack size through the linker:

ifort -Wl,-stack_size,0x10000000  

Try this and let us know if it fixes the issue.

sbolding
Beginner
234 Views

Ronald,

I think Tony already responded to your response from user support, so apologies for a double response.  I just thought I would post a response here as well, so there is a record of it on the Forum for others. 

I compiled the above minimal example with the command:

ifort -qopenmp -Wl,-stack_size,0x10000000 test.F90

on Intel Fortran 19.1.3.

 

I then ran it 100 times as follows:

for i in {1..100}; do ./a.out; done

 

Roughly 3/100 times:

 

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
a.out              0000000109673404  for__signal_handl     Unknown  Unknown
libsystem_platfor  00007FFF6AC565FD  _sigtramp             Unknown  Unknown
a.out              00000001096885CC  for_alloc_allocat     Unknown  Unknown
a.out              000000010964D4D4  MAIN__                Unknown  Unknown

 

 

Repeating this with Intel Fortran 19.0.8, no errors occur.

Ronald_G_Intel
Moderator
193 Views

I'm trying to understand the example. I suspect the real code is different (at least I hope so).

First the allocation with the 3rd dimension bounds 1:0

allocate(arr(1:3,1:8,1:0), source=zero)

This allocation will succeed and create a 0-sized array.  Does your real application do this also?  Is this what you expected?  Here's a sample with f(:,:,:) allocated first

f(1:3,1:8,1:1).  ! reasonable

then

f(1:3,1:8,1:0). ! empty array

more foo.f90
program foo
implicit none
real(8), parameter :: zero = 0.0_8
real(8), allocatable :: f(:,:,:)
integer :: allocate_status

allocate( f(1:3,1:8,1:1), source=zero,stat=allocate_status )
print*, "allocate( f(1:3,1:8,1:1)"
print*, "size f", size(f), " alloc stat=",allocate_status 
print*, "shape f ", shape(f)
print*, ""
deallocate( f )

allocate( f(1:3,1:8,1:0), source=zero,stat=allocate_status )
print*, "allocate( f(1:3,1:8,1:0)"
print*, "size f", size(f), " alloc stat=",allocate_status
print*, "shape f ", shape(f)

end program foo
depepper-MOBL1:q04958028 rwgreen$ 
depepper-MOBL1:q04958028 rwgreen$ ifort -O0 -g -check all -o foo foo.f90 
depepper-MOBL1:q04958028 rwgreen$ ./foo
 allocate( f(1:3,1:8,1:1)
 size f          24  alloc stat=           0
 shape f            3           8           1
 
 allocate( f(1:3,1:8,1:0)
 size f           0  alloc stat=           0
 shape f            3           8           0

   

So I just changed your sample to 1:1 to remove anomalies with that strange array allocation.  I assume it's a fluke.

I did see the error on the exit once, so I believe there is something there.  I'm trying to see if I can reproduce it more consistently.

sbolding
Beginner
183 Views

Thanks for looking into this.

In some cases the allocation really is `(1:n)` with n=0 in the real application.  There is logic elsewhere in the code to avoid accessing any of the zero-sized arrays.  We certainly could (and probably should) avoid this, but it never showed up as an issue in the past, so I imagine the developers left it in there.
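
As a small illustration of why zero-sized allocations like this are usually harmless on their own (a sketch only, not the application's actual guard logic; the program and variable names are made up): a loop over an empty extent runs zero times, and whole-array intrinsics accept empty arrays.

```Fortran
program zero_size_demo
   use iso_fortran_env, only: real64
   implicit none
   real(real64), allocatable :: a(:,:,:)
   integer :: k, n

   n = 0
   ! Legal: allocates an array whose third extent is zero
   allocate(a(3, 8, n), source=0.0_real64)

   ! Zero-trip loop: size(a, 3) is 0, so the body never executes
   do k = 1, size(a, 3)
      a(:, :, k) = 1.0_real64
   end do

   ! Explicit guard before touching any individual element
   if (size(a) > 0) print *, 'first element:', a(1, 1, 1)

   ! Whole-array intrinsics are well defined for empty arrays
   print *, 'size =', size(a), ' sum =', sum(a)

   deallocate(a)
end program zero_size_demo
```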

I only used those dimensions because it was the one that showed up most consistently in the real application.  I think in reality it is showing up consistently at that location because it is the site of the first allocate we have inside an OpenMP loop in the real application.  We have observed failures (both during the allocation and during the program close out) with allocations of a reasonable size.  

I have confirmed that I see the failure occasionally with an allocation with 1:2 for the third dimension in the reproducer.  I was able to get the code to fail a little more consistently (1 in 20 or so, not sure it actually changed) if I modified the code by calling the alloc and dealloc, and looping over it multiple times.  I attached the changed input.

The resulting error is:

test-openmp 0000000108D2D2D4 for__signal_handl Unknown Unknown
libsystem_platfor 00007FFF688755FD _sigtramp Unknown Unknown
test-openmp 0000000108D4249C for_alloc_allocat Unknown Unknown
test-openmp 0000000108D071A2 MAIN__ Unknown Unknown

Simon




Ronald_G_Intel
Moderator
109 Views

Good news on this one - a bug report came in on this before this thread and the 2 Premier issues were opened from this thread.  There is a fix.

I have tested the fix: 10,000 iterations with ZERO failures with a nightly build of the 2021 compiler. This fix will come out in the next oneAPI release. I hate to commit to dates on when these will come out, as we can slip. We have publicly said we release oneAPI updates once a quarter. Well, the quarter ends in about 2-3 weeks, so that is all I can say. Bottom line: SOON. Look for this release; its numbering will be 2021.2.

Ronald_G_Intel
Moderator
57 Views

The 2021.2 compiler is available now. I have tested it and can no longer get the runtime segfault.

Can others confirm the fix?
