Intel® Fortran Compiler

Memory leak of allocatable arrays inside OpenMP sections

foxtran
New Contributor III
7,872 Views


Hello! The following simple example, which allocates a temporary array inside OpenMP sections, leaks memory: array `a` is not deallocated at the end of the scope, as usually happens in Fortran. gfortran does not leak memory here.

subroutine test(n, q, v)
  integer :: n
  real :: q(n, n), v(n, n, n)
  real, allocatable :: a(:, :)
  !$omp parallel default(none) &
  !$omp private(a) &
  !$omp shared(n, q, v)
  allocate(a(n, n))

  !$omp do
  do i = 1, n
    a = a + v(:, :, i)
  end do
  !$omp end do nowait

  !$omp critical (sum_q)
  q = q + a
  !$omp end critical (sum_q)

  !$omp end parallel
end subroutine test

program main
  integer, parameter :: n = 1000
  real, allocatable :: q(:, :), v(:, :, :)
  integer :: i
  allocate(q(n, n), v(n, n, n))
  q = 0.0
  call random_number(q)
  do i = 1, n
    call test(n, q, v)
  end do
  print *, sum(q)
end program main


Tested ifx: 2025.2.0

Compilation flags: ifx -fopenmp alloc.f90 -O3

With gfortran, this test runs on 8 threads in approximately 40 seconds.

 

15 Replies
Andrew_Smith
Valued Contributor I
7,769 Views

I don't see any OpenMP sections, only OpenMP parallel do.

I tested with IFX 2025.3.1 on Windows and saw no obvious increase in memory use while the program ran.

 

Mark_Lewy
Valued Contributor I
7,763 Views

I think you need to explicitly deallocate `a` before the end of the parallel region. Otherwise, this still leaks in IFX 2025.3.
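
For reference, the explicit deallocation suggested here might look like the following minimal sketch. It reuses the subroutine from the original post; the `a = 0.0` initialization, the `intent` attributes and the small `demo` driver are assumptions added so the example is self-contained, and are not part of the original reproducer.

```fortran
! Sketch only: the reproducer's subroutine with an explicit deallocate.
subroutine test(n, q, v)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: q(n, n)
  real, intent(in) :: v(n, n, n)
  real, allocatable :: a(:, :)
  integer :: i

  !$omp parallel default(none) private(a) shared(n, q, v)
  allocate(a(n, n))
  a = 0.0            ! assumption: the accumulator is meant to start at zero

  !$omp do
  do i = 1, n
    a = a + v(:, :, i)
  end do
  !$omp end do nowait

  !$omp critical (sum_q)
  q = q + a
  !$omp end critical (sum_q)

  deallocate(a)      ! every thread frees its own private copy
  !$omp end parallel
end subroutine test

! Hypothetical driver with a small n, only to make the sketch runnable.
program demo
  implicit none
  integer, parameter :: n = 4
  real :: q(n, n), v(n, n, n)
  q = 0.0
  v = 1.0
  call test(n, q, v)
  print *, sum(q)
end program demo
```

With the `deallocate` inside the region, freeing the private copies no longer depends on how a given compiler treats them at `!$omp end parallel`.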

foxtran
New Contributor III
7,712 Views

Yep, I have to deallocate explicitly, but it would be much better to be consistent with other Fortran implementations. I am still not sure whether this is a problem with the Intel compilers or whether the other compilers are just too clever...

Consider rewriting this code in C++:

#pragma omp parallel
{
  std::vector v{100};
  …
  // Fortran/C++ RAII kills v
}

So, all threads should deallocate v at the end of the scope. The same should happen in Fortran too.

jimdempseyatthecove
Honored Contributor III
7,744 Views

I believe @Mark_Lewy is correct, though Steve could comment on this.

I believe the auto deallocation (when realloc lhs is in effect) is required to be performed at procedure exit.

At the point of procedure exit, only the main thread is running, thus only its allocation is required to be deallocated.

gfortran performing the deallocation (for non-master threads) when exiting a parallel region may be a non-standard feature.

Do not rely on non-standard (vendor specific) behavior.

 

Jim Dempsey

foxtran
New Contributor III
7,715 Views

I suppose the procedure exit is at !$omp end parallel, at least as far as I understand how OMP sections are generated in the binary.

I've checked other compilers:
nvfortran 23.11
sun fortran compiler 12.6
Cray Fortran 18

All of these compilers do deallocation at the end of scope in all threads.

jimdempseyatthecove
Honored Contributor III
7,695 Views

This is because std::vector has a dtor.

#pragma omp parallel
{
float* array = (float*)malloc(1234);
...
}

Would have a memory leak if you don't include the free.

In reading this section, there is no mention of exiting a parallel region.

I couldn't locate a Fortran section in the OpenMP reference addressing this subject.

 

Jim Dempsey

foxtran
New Contributor III
7,666 Views

This is what I see in OpenMP 5.2:

> Finalization of a list item of a finalizable type or subobjects of a list item of a finalizable type occurs at the end of the region.

at CHAPTER 5. DATA ENVIRONMENT; 5.3 List Item Privatization; p 107, lines 20-22.


So, allocatable arrays should be finalized at the end of region.


OpenMP 3.0 specifies the behaviour in other words:

> The value and/or allocation status of the original list item will change only:
>  - if accessed and modified via pointer,
>  - if (possibly) accessed in the region but outside of the construct, or
>  - as a side effect of directives or clauses.

at Chapter 2. Directives; 2.9.3.3 private clause, p 90, lines 4-7.

Actually, with ifx 2025.2.0 the deallocation does happen for the main thread, but not for the other threads.

jimdempseyatthecove
Honored Contributor III
7,509 Views

The issue may relate to the definition of "finalizable type".

While a User Defined Type can have a final procedure(s), it is not clear that an allocatable array (descriptor) has a final procedure.

This is Fortran, an array descriptor is not necessarily the same as a C++ container.

Whether or not the underlying construct is a C++ container may be a vendor-specific implementation choice.
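
To illustrate the distinction being drawn here, the following minimal sketch contrasts a derived type that is finalizable (it has a FINAL procedure) with a plain allocatable array of intrinsic type (it has none). The names `buffer_mod`, `buffer_t`, `buffer_final` and `use_buffer` are hypothetical and chosen only for this example. Nothing here involves OpenMP; it only shows what "finalizable type" means on the Fortran side.

```fortran
module buffer_mod
  implicit none
  private
  public :: use_buffer

  ! Hypothetical type: finalizable because it declares a FINAL procedure.
  type :: buffer_t
    real, allocatable :: values(:, :)
  contains
    final :: buffer_final
  end type buffer_t

contains

  ! User-written final procedure, invoked when a buffer_t is finalized.
  subroutine buffer_final(self)
    type(buffer_t), intent(inout) :: self
    print *, 'buffer_final called'
  end subroutine buffer_final

  subroutine use_buffer(n)
    integer, intent(in) :: n
    type(buffer_t) :: b            ! finalizable local variable
    real, allocatable :: a(:, :)   ! intrinsic type: no final procedure exists

    allocate(b%values(n, n), a(n, n))
    b%values = 0.0
    a = 0.0
    ! On return, the standard finalizes b (buffer_final) and deallocates
    ! both b%values and a; a is deallocated but never "finalized".
  end subroutine use_buffer

end module buffer_mod

program demo
  use buffer_mod, only: use_buffer
  implicit none
  call use_buffer(4)
end program demo
```

Under this reading, automatic deallocation of `a` at procedure exit is a separate Fortran rule from finalization, so it is at least arguable that the OpenMP wording about finalizable types does not cover a plain allocatable array.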

 

It would be nice if Steve could comment on this.

 

Jim Dempsey

foxtran
New Contributor III
7,415 Views

@Igor_V_Intel , what do you think about this bug (or maybe it is not a bug)? Could you please have a look?

Igor_V_Intel
Moderator
7,274 Views

In principle, I agree with Jim here. OpenMP only defines when the scope of a private variable begins and ends.
The compiler decides when storage for the private copy is created and destroyed. OpenMP does not constrain the implementation’s handling of Fortran allocatables and Fortran requires deallocation only at procedure exit (by main thread). So, it appears to be a standard-conforming behavior of the compiler. However, I agree with you and Ron that it is not what the user may expect. The fact that other compiler implementations deallocate the memory shows that we should discuss this topic and consider aligning it.

foxtran
New Contributor III
7,207 Views

> requires deallocation only at procedure exit

As far as I understand, OMP generates a new procedure for each OMP region (at least, that is one possible way to implement OpenMP pragmas). So, technically, there is an exit from a procedure at the end of the parallel region, and the deallocation should happen in all threads.
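
Whatever a compiler does internally, the same effect can be written by hand. The following is a minimal sketch, not a description of how ifx lowers parallel regions: the region body is moved into a contained procedure (the hypothetical `region_body`), so that `a` becomes an unsaved local allocatable, which the Fortran standard requires to be deallocated when that procedure returns, in every thread that calls it. The `a = 0.0` initialization, the `intent` attributes and the `demo` driver are assumptions added for completeness.

```fortran
subroutine test(n, q, v)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: q(n, n)
  real, intent(in) :: v(n, n, n)

  !$omp parallel default(none) shared(n, q, v)
  call region_body(n, q, v)      ! executed once by every thread
  !$omp end parallel

contains

  ! Hypothetical helper standing in for the body of the parallel region.
  subroutine region_body(m, qq, vv)
    integer, intent(in) :: m
    real, intent(inout) :: qq(m, m)
    real, intent(in) :: vv(m, m, m)
    real, allocatable :: a(:, :)   ! local to this call, so private per thread
    integer :: i

    allocate(a(m, m))
    a = 0.0                        ! assumption: accumulator starts at zero

    !$omp do
    do i = 1, m                    ! orphaned worksharing loop
      a = a + vv(:, :, i)
    end do
    !$omp end do nowait

    !$omp critical (sum_q)
    qq = qq + a
    !$omp end critical (sum_q)
    ! a is deallocated automatically here, at procedure exit.
  end subroutine region_body

end subroutine test

! Hypothetical driver with a small n, only to make the sketch runnable.
program demo
  implicit none
  integer, parameter :: n = 4
  real :: q(n, n), v(n, n, n)
  q = 0.0
  v = 1.0
  call test(n, q, v)
  print *, sum(q)
end program demo
```

Here the "procedure exit" is explicit in the source rather than an implementation detail, so the automatic deallocation does not rest on any assumption about outlining.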

Ron_Green
Moderator
7,293 Views

There is definitely a memory leak, as seen on Linux using 'top' to watch in a second window.  This is also interesting in that ifx and ifort are using the stack for the allocation, whereas gfortran is probably using the heap.  Using the option -heap-arrays, you will find ifx runs as fast as gfortran.  Without that option, ifx seems quite slow by comparison.

 

So the question is whether this is legal OMP or not.  For sure your syntax gives each thread a private version of the array descriptor for 'a', but the allocation is done per thread.  So at the end of the OMP parallel region, is it the programmer's responsibility to release the allocation, or should Fortran scoping rules apply?  Good question.  If this were C, it would be illegal without a free(), I would think.  So why would the OMP rules for Fortran differ?  Don't know.  My expectation is like yours: I would expect Fortran to free the allocation on each thread at the end of the region.  But an expectation does not mean compliance with the Standards.

I'll write up a bug report and we'll have a discussion internally as to the legality of this example.  This is a good example, thank you for sending this in.

BTW - ifort gives the same behavior so it's something in our Fortran front-end I think. 

 

Ron

Ron_Green
Moderator
7,285 Views

The bug ID is CMPLRLLVM-71966.

jimdempseyatthecove
Honored Contributor III
7,257 Views

This is something that the standards committee has to address (OpenMP standard, assuming it is not addressed already).

You can also address this with the following, assuming you want automatic deallocation (untested code).

subroutine test(n, q, v)
  integer :: n
  real :: q(n, n), v(n, n, n)
  !$omp parallel default(none) &
  !$omp shared(n, q, v)
  block
  real, allocatable :: a(:, :)
  allocate(a(n, n))
  !$omp do
  do i = 1, n
    a = a + v(:, :, i)
  end do
  !$omp end do nowait

  !$omp critical (sum_q)
  q = q + a
  !$omp end critical (sum_q)
  end block
  !$omp end parallel
end subroutine test

Jim Dempsey

jimdempseyatthecove
Honored Contributor III
7,156 Views

>>As I can understand, OMP generates a new procedure for each OMP region...

Do you have anything to substantiate this?

subroutine foo(array)
real :: array(:)
integer :: i
do i=1,size(array)
   ...
end do
end subroutine foo

The procedure is foo

subroutine foo(array)
real :: array(:)
integer :: i
!$omp parallel do
do i=1,size(array)
   ...
end do
!$omp end parallel do
end subroutine foo

The procedure is still foo, and by definition, it contains a parallel region.

 

Jim Dempsey
