Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Possible bug in ifort + OMP

Luckner__Paul
Novice
4,313 Views

The allocate statement seems to be broken within an OMP loop.

Setup:

  1. Create an OMP parallel loop.
  2. Call allocate and pass something large as the optional source argument.
  3. Result: segmentation fault.

 

Compiler version:
$ ifort -v
ifort version 19.1.1.217

 

 

0 Kudos
12 Replies
JohnNichols
Honored Contributor I
4,302 Views
program main_tmp

  implicit none

  integer              :: i,j,n
  integer, allocatable :: x(:)

  !$omp parallel do default(none) private(i,j,n,x)
  do i = 1, 6
    n = 10**i
    allocate (x(n), source=[(j, j=1,n)])
  end do
  !$omp end parallel do

end program

 

As far as I am aware, you need to deallocate before you allocate again after the first allocation.

0 Kudos
Luckner__Paul
Novice
4,299 Views

You are correct. I tried to cut down the code, and I shouldn't have deleted the external procedure.

 

The bug still arises.

program main_tmp

  implicit none

  external :: foo
  integer :: i

  !$omp parallel do default(none) private(i)
  do i = 1, 6
    call foo(i)
  end do
  !$omp end parallel do
end program

subroutine foo(i)
  integer, intent(in) :: i

  integer              :: j, n
  integer, allocatable :: x(:)

  n = 10**i
  allocate (x(n), source=[(j, j=1,n)])
end subroutine
0 Kudos
Luckner__Paul
Novice
4,293 Views

Even worse!!

Just passing large arguments to any procedure gives a segmentation fault!? (See procedure bar.)

 

module foo_m
  implicit none
contains
  subroutine bar(iarr)
    integer, intent(in) :: iarr(:)

    integer :: k
    k = iarr(1)                    ! any operation will do; otherwise the compiler might optimize this procedure away
  end subroutine

  subroutine foo(i)
    integer, intent(in) :: i

    integer              :: j, n
    integer, allocatable :: x(:)

    n = 10**i
    allocate (x(n), source=[(j, j=1,n)])
  end subroutine
end module

program main_tmp
  use foo_m
  implicit none

  integer :: i,j

  !$omp parallel do default(none) private(i,j)
  do i = 1, 6
    ! call foo(i)                     ! uncomment to see that   allocate(..,source=<something large>)   will break
    ! call bar([(j, j=1,10**i)])      ! uncomment to see that   procedure(<something large>)            will break
  end do
  !$omp end parallel do
end program

 

 

0 Kudos
Johannes_Rieke
New Contributor III
4,273 Views

Hi Paul,

I cannot reproduce the error on Windows with PSXE 2020 u2 (19.1.2.254). Your example code in the post directly above works as expected in debug mode.

My command line:

/nologo /debug:full /MP /Od /Qopenmp /warn:all /module:"x64\Debug\\" /object:"x64\Debug\\" /Fd"x64\Debug\vc150.pdb" /traceback /check:all /libs:static /threads /dbglibs /c

 

Do you work on GNU/Linux? What does your build command line look like?

 

BR, Johannes

 

0 Kudos
Luckner__Paul
Novice
4,239 Views

I have already replied multiple times, but the website doesn't show it!?!?

Anyway, I uploaded my build instructions to pastebin. See here:

https://pastebin.com/j8AYtQun

 

PS: I do use GNU/Linux.

0 Kudos
JohnNichols
Honored Contributor I
4,215 Views

It works in Intel Fortran with heap arrays set to 0.

Luckner__Paul
Novice
4,210 Views

I guess you are talking about this option?

$ ifort -heap-arrays 0

https://software.intel.com/......  

JohnNichols
Honored Contributor I
4,194 Views

I ran it on the latest VS 2019 Preview with the latest Intel compiler as a standard Fortran program. It ran on 5 cores and then crashed with a stack error, which you fix by setting heap arrays to 0 in VS.

0 Kudos
Johannes_Rieke
New Contributor III
4,177 Views

If the heap-arrays option helps, then the line

allocate (x(n), source=[(j, j=1,n)])

might create a temporary array on the stack because of the implied do loop. To check this, one could split the allocation and the initialization, filling the array in a subsequent do loop. There was a discussion some time ago regarding implied do loops, temporaries and the stack... but I can't find it.
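The suggestion above, splitting the allocation from the initialization, could be tried with something like the following. This is only a sketch: the names foo_split_m, foo_split and the extra output argument last are hypothetical and not part of the code posted in this thread, and whether it avoids the crash depends on the compiler actually creating a stack temporary for the array constructor, which is the assumption being tested.

```fortran
module foo_split_m
  implicit none
contains
  ! Variant of foo from this thread: allocate first, then fill the array
  ! with an explicit do loop, so no array-constructor temporary is needed.
  subroutine foo_split(i, last)
    integer, intent(in)  :: i
    integer, intent(out) :: last      ! hypothetical output, so x is actually used

    integer              :: j, n
    integer, allocatable :: x(:)

    n = 10**i
    allocate (x(n))                   ! no source= argument here
    do j = 1, n
      x(j) = j
    end do
    last = x(n)
    ! x is deallocated automatically when the subroutine returns
  end subroutine
end module

program main_split
  use foo_split_m
  implicit none

  integer :: i, last

  !$omp parallel do default(none) private(i, last)
  do i = 1, 6
    call foo_split(i, last)
    print *, 'i =', i, ' last element =', last
  end do
  !$omp end parallel do
end program
```

If this version runs without -heap-arrays while the source= version crashes, that would support the stack-temporary explanation. The order of the printed lines is not deterministic, since the iterations run on different threads.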

Furthermore, OMP creates overhead, which could have an impact on the stack. But I'm unsure about this.

If speed doesn't matter, you could stick to the heap option. The stack is generally faster than the heap. If you can identify the code section that requires the heap, you might be able to improve the speed. But remember, premature optimization is the root of all evil.

PS: Your build command overrides the default real and integer kinds (-i8, ...), and does so twice. I would use this only for legacy code and otherwise set the kinds in the source.

0 Kudos
Luckner__Paul
Novice
4,167 Views

The heap-arrays option did fix the segmentation fault, so it is clear that temporary arrays placed on the stack don't work nicely together with OMP.

I guess different threads get a specific stack size and if the temporary array is too large it will overwrite the stack of the next thread... Shouldn't the compiler know the stack sizes and not do that? It could automatically save them to the heap...

Now the programmer has to take care not to use implied do loops in OMP parallel regions (unless heap-arrays 0 is used)?

0 Kudos
John_Campbell
New Contributor II
4,157 Views

It is important to know where private copies of arrays are located, either on the shared heap or on each thread's stack, and then to know how big each thread's stack needs to be.

My experience in Win-64 is that:

  • private copies of allocatable arrays are placed on the heap. (seen as allocatable at the !$OMP region)
  • private copies of automatic, local and argument arrays are placed on the thread stack.
  • this location can be modified with compiler options.
  • the master thread and other threads each have a stack size, which should be known and managed.
  • for the master thread, private copies of arrays will be duplicated which means the master thread stack is likely to require a larger size.
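As an illustration of the first bullet, each thread can be given its own allocatable work array, which an ALLOCATE statement typically places on the heap rather than on the thread stack. This is a minimal sketch under that assumption; the program name and the variables work, checks and n are hypothetical and chosen only for this example.

```fortran
program private_heap_work
  implicit none

  integer, parameter   :: n = 1000000
  integer              :: i, j
  integer              :: checks(6)
  integer, allocatable :: work(:)

  ! work is unallocated on entry, so each thread's private copy also
  ! starts out unallocated and is allocated separately per thread.
  !$omp parallel default(none) private(i, j, work) shared(checks)
  allocate (work(n))
  !$omp do
  do i = 1, 6
    do j = 1, n
      work(j) = i
    end do
    checks(i) = work(n)               ! one distinct element per iteration
  end do
  !$omp end do
  deallocate (work)
  !$omp end parallel

  print *, checks
end program
```

The large array never becomes an automatic or temporary array here, so the per-thread stack requirement stays small regardless of n.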

I am learning how to use the stack in a 64-bit environment. My latest OMP approach is to declare all stacks large (500 MB). This only reserves virtual address space, while only the used portion of each stack is allocated physical memory; i.e., making the stacks much larger than "necessary" does not affect physical memory demand, but you need to have an idea of what "necessary" can be. (You may also need to consider the virtual memory limit.) ifort takes a similar approach to memory addresses when placing heap extensions.

This problem comes about because each thread's stack is defined when the thread is started. It cannot be extended, unlike the heap, which can be extended up to the physical/virtual memory limit. If you are using !$OMP to speed up your program, relying on virtual memory is not a likely option.
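One way to see which stack size request the worker threads will get is to read the standard OpenMP environment variable OMP_STACKSIZE at startup. The sketch below only reports the setting; it does not change it, and the program name is hypothetical. Note that OMP_STACKSIZE applies to threads created by the OpenMP runtime, not to the initial thread, whose stack is set by other means (for example `ulimit -s` on GNU/Linux).

```fortran
program show_omp_stacksize
  use omp_lib, only: omp_get_max_threads
  implicit none

  character(len=64) :: val
  integer           :: length, stat

  ! stat is 0 when the variable exists and its value fits into val
  call get_environment_variable('OMP_STACKSIZE', val, length, stat)

  if (stat == 0 .and. length > 0) then
    print *, 'OMP_STACKSIZE = ', val(1:length)
  else
    print *, 'OMP_STACKSIZE is not set; the runtime default applies'
  end if

  print *, 'max threads   = ', omp_get_max_threads()
end program
```

Running it with, say, OMP_STACKSIZE=500M in the environment would correspond to the "declare all stacks large" approach described above; the value 500M is taken from that description and is not a recommendation.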

0 Kudos
Johannes_Rieke
New Contributor III
4,144 Views

I agree, Paul, that memory handling for OMP, which is left to the user's responsibility, is not solved optimally in the current Intel compiler. There seem to have been some changes recently, coming with 19.0.x and newer. Maybe not always for the better?

A thread that is loosely related to this issue can be found here.

Maybe Intel can improve the user experience with OMP. Maybe the upcoming oneAPI compilers will do a better job of this? Maybe we should use coarrays and let the compiler handle the MPI/OpenMP/whatever layer.

 

0 Kudos