Intel® Fortran Compiler

Memory leak of allocatable arrays inside OpenMP sections

foxtran
New Contributor III


Hello! The following simple example allocates a temporary array inside OpenMP sections and leaks memory, because the private array `a` is not deallocated at the end of the scope, as it normally would be in Fortran. gfortran does not leak memory here.

subroutine test(n, q, v)
  integer :: n
  real :: q(n, n), v(n, n, n)
  real, allocatable :: a(:, :)
  !$omp parallel default(none) &
  !$omp private(a) &
  !$omp shared(n, q, v)
  allocate(a(n, n))
  a = 0.0  ! zero each thread's partial sum before accumulating

  !$omp do
  do i = 1, n
    a = a + v(:, :, i)
  end do
  !$omp end do nowait

  !$omp critical (sum_q)
  q = q + a
  !$omp end critical (sum_q)

  !$omp end parallel
end subroutine test

program main
  integer, parameter :: n = 1000
  real, allocatable :: q(:, :), v(:, :, :)
  integer :: i
  allocate(q(n, n), v(n, n, n))
  q = 0.0
  call random_number(v)
  do i = 1, n
    call test(n, q, v)
  end do
  print *, sum(q)
end program main


Tested with ifx 2025.2.0.

Compile command: ifx -fopenmp alloc.f90 -O3

For reference, gfortran completes this task on 8 threads in approximately 40 seconds.

 

15 Replies
Andrew_Smith
Valued Contributor I

I don't see any OpenMP sections, only an OpenMP parallel do.

I tested with IFX 2025.3.1 on Windows and saw no obvious increase in memory use while the program ran.

 

Mark_Lewy
Valued Contributor I

I think you need to explicitly deallocate a before the end of the parallel region. Otherwise, this still leaks in IFX 2025.3.
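A minimal sketch of that workaround, assuming the reproducer's routine is otherwise unchanged. The module name `leak_workaround` and the routine name `test_explicit_free` are made up for illustration, and the `a = 0.0` line is an addition so that each partial sum starts from a defined value:

```fortran
module leak_workaround
  implicit none
contains
  ! Hypothetical variant of the reproducer's `test` routine: every thread
  ! releases its own private copy of `a` before leaving the parallel region.
  subroutine test_explicit_free(n, q, v)
    integer, intent(in) :: n
    real, intent(inout) :: q(n, n)
    real, intent(in) :: v(n, n, n)
    real, allocatable :: a(:, :)
    integer :: i

    !$omp parallel default(none) private(a) shared(n, q, v)
    allocate(a(n, n))
    a = 0.0                      ! start each thread's partial sum at zero

    !$omp do
    do i = 1, n
      a = a + v(:, :, i)
    end do
    !$omp end do nowait

    !$omp critical (sum_q)
    q = q + a                    ! merge this thread's partial sum into q
    !$omp end critical (sum_q)

    deallocate(a)                ! explicit per-thread release
    !$omp end parallel
  end subroutine test_explicit_free
end module leak_workaround
```

With the explicit `deallocate`, the result no longer depends on whether the compiler frees private allocatables at the end of the region.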

foxtran
New Contributor III

Yep, I have to do the deallocation myself, but it would be much better to be consistent with other Fortran implementations. I am still not sure whether this is a problem with the Intel compilers or whether the other compilers are just too clever...

Consider this code rewritten in C++:

#pragma omp parallel
{
  std::vector v{100};
  …
  // Fortran/C++ RAII kills v
}

So every thread deallocates v at the end of the scope. The same should happen in Fortran.

jimdempseyatthecove
Honored Contributor III

I believe @Mark_Lewy is correct, though Steve could comment on this.

I believe the auto deallocation (when realloc lhs is in effect) is required to be performed at procedure exit.

At the point of procedure exit, only the main thread is running, thus only its allocation is required to be deallocated.

gfortran performing the deallocation (for non-master threads) when exiting a parallel region may be a non-standard feature.

Do not rely on non-standard (vendor specific) behavior.

 

Jim Dempsey

foxtran
New Contributor III

I suppose the procedure exit is at !$omp end parallel, at least as I understand how OpenMP regions are generated in the binary.

I've checked other compilers:
nvfortran 23.11
sun fortran compiler 12.6
Cray Fortran 18

All of these compilers do deallocation at the end of scope in all threads.

jimdempseyatthecove
Honored Contributor III

This is because std::vector has a destructor. By contrast:

#pragma omp parallel
{
float* array = (float*)malloc(1234);
...
}

Would have a memory leak if you don't include the free.

In reading this section, there is no mention of exiting a parallel region.

I couldn't locate a Fortran section in the OpenMP reference addressing this subject.

 

Jim Dempsey

foxtran
New Contributor III

This is what I see in OpenMP 5.2:

> Finalization of a list item of a finalizable type or subobjects of a list item of a finalizable type occurs at the end of the region.

in Chapter 5, Data Environment; Section 5.3, List Item Privatization; p. 107, lines 20-22.

So, allocatable arrays should be finalized at the end of the region.

OpenMP 3.0 specifies the behaviour in other words:

> The value and/or allocation status of the original list item will change only:
>  - if accessed and modified via pointer,
>  - if (possibly) accessed in the region but outside of the construct, or
>  - as a side effect of directives or clauses.

in Chapter 2, Directives; Section 2.9.3.3, private clause; p. 90, lines 4-7.

Actually, with ifx 2025.2.0 the main thread does deallocate, but the other threads do not.

jimdempseyatthecove
Honored Contributor III

The issue may relate to the definition of "finalizable type".

While a user-defined type can have one or more final procedures, it is not clear that an allocatable array (descriptor) has a final procedure.

This is Fortran; an array descriptor is not necessarily the same as a C++ container.

Whether or not the underlying construct is a C++ container may be a vendor-specific implementation choice.
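To make the distinction concrete, here is a small serial sketch, not taken from the thread. It assumes the usual Fortran meaning of "finalizable": a derived type with a FINAL binding. The type `buffer`, the counter `finalized_count`, and the routine `use_buffer` are hypothetical names used only for illustration.

```fortran
module finalizable_demo
  implicit none

  ! Hypothetical derived type: the FINAL binding is what makes it finalizable.
  type :: buffer
    real, allocatable :: data(:, :)
  contains
    final :: buffer_final
  end type buffer

  integer :: finalized_count = 0   ! counts calls to the final procedure

contains

  subroutine buffer_final(self)
    type(buffer), intent(inout) :: self
    if (allocated(self%data)) deallocate(self%data)
    finalized_count = finalized_count + 1
  end subroutine buffer_final

  subroutine use_buffer(n)
    integer, intent(in) :: n
    type(buffer) :: b             ! finalizable local: buffer_final runs at exit
    real, allocatable :: a(:, :)  ! intrinsic type: no final procedure; it is
                                  ! auto-deallocated at exit, not finalized
    allocate(b%data(n, n), a(n, n))
    b%data = 0.0
    a = 0.0
  end subroutine use_buffer
end module finalizable_demo
```

Both `b` and `a` are released when `use_buffer` returns, but only `b` goes through a final procedure. Whether the OpenMP wording about finalizable types is meant to cover a plain allocatable like `a` is exactly the open question here.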

 

It would be nice if Steve could comment on this.

 

Jim Dempsey

foxtran
New Contributor III

@Igor_V_Intel , what do you think about this bug (or maybe it is not a bug)? Could you please have a look?

Igor_V_Intel
Moderator

In principle, I agree with Jim here. OpenMP only defines when the scope of a private variable begins and ends.
The compiler decides when storage for the private copy is created and destroyed. OpenMP does not constrain the implementation's handling of Fortran allocatables, and Fortran requires deallocation only at procedure exit (by the main thread). So this appears to be standard-conforming behavior on the compiler's part. However, I agree with you and Ron that it is not what a user would expect. The fact that other compilers deallocate the memory shows that we should discuss this topic and consider aligning with them.

foxtran
New Contributor III

> requires deallocation only at procedure exit

As far as I understand, OMP generates a new procedure for each OMP region (at least, that is one possible way to implement OpenMP directives). So, technically, the end of the parallel region is an exit from a procedure, and the deallocation should happen in all threads.

Ron_Green
Moderator

There is definitely a memory leak, as seen on Linux using 'top' to watch in a second window. This is also interesting in that ifx and ifort are using the stack for the allocation, whereas gfortran is probably using the heap. With the option -heap-arrays you will find ifx runs as fast as gfortran. Without that option, ifx seems quite slow by comparison.

 

So the question is whether this is legal OMP or not. Your syntax certainly gives each thread a private version of the array descriptor for 'a', and the allocation is done per thread. So at the end of the OMP parallel region, is it the programmer's responsibility to release the allocation, or should Fortran scoping rules apply? Good question. If this were C, it would be illegal without a free(), I would think. So why would the OMP rules for Fortran differ? I don't know. My expectation is like yours: I would expect Fortran to free the allocation on each thread at the end of the region. But an expectation is not the same as compliance with the standards.

I'll write up a bug report and we'll have a discussion internally as to the legality of this example.  This is a good example, thank you for sending this in.

BTW - ifort gives the same behavior so it's something in our Fortran front-end I think. 

 

Ron

Ron_Green
Moderator

The bug ID is CMPLRLLVM-71966.

jimdempseyatthecove
Honored Contributor III

This is something that the standards committee has to address (OpenMP standard, assuming it is not addressed already).

You can also address this with the following, assuming you want automatic deallocation (untested code):

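subroutine test(n, q, v)
  integer :: n
  real :: q(n, n), v(n, n, n)
  !$omp parallel default(none) &
  !$omp shared(n, q, v)
  ! BLOCK opens before the worksharing loop, because !$omp do must be
  ! followed directly by the do loop.
  block
    real, allocatable :: a(:, :)  ! block-local, so one copy per thread
    allocate(a(n, n))
    a = 0.0                       ! zero the partial sum before accumulating
    !$omp do
    do i = 1, n
      a = a + v(:, :, i)
    end do
    !$omp end do nowait

    !$omp critical (sum_q)
    q = q + a
    !$omp end critical (sum_q)
  end block                       ! a is deallocated here on every thread
  !$omp end parallel
end subroutine test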

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

>>As I can understand, OMP generates a new procedure for each OMP region...

Do you have anything to substantiate this?

subroutine foo(array)
real :: array(:)
integer :: i
do i=1,size(array)
   ...
end do
end subroutine foo

The procedure is foo

subroutine foo(array)
real :: array(:)
integer :: i
!$omp parallel do
do i=1,size(array)
   ...
end do
!$omp end parallel do
end subroutine foo

The procedure is still foo, and by definition it contains a parallel region.

 

Jim Dempsey
