Hello! The following simple example, which allocates a temporary array inside OpenMP sections, has a memory leak: the private array `a` is not deallocated at the end of its scope, as normally happens in Fortran. gfortran does not leak memory here.
```fortran
subroutine test(n, q, v)
  implicit none
  integer :: n
  real :: q(n, n), v(n, n, n)
  real, allocatable :: a(:, :)
  integer :: i
  !$omp parallel default(none) &
  !$omp private(a, i) &
  !$omp shared(n, q, v)
  allocate(a(n, n))      ! each thread allocates its own private copy of a
  a = 0.0                ! start each thread's partial sum at zero
  !$omp do
  do i = 1, n
    a = a + v(:, :, i)
  end do
  !$omp end do nowait
  !$omp critical (sum_q)
  q = q + a
  !$omp end critical (sum_q)
  ! a is not deallocated explicitly before the region ends
  !$omp end parallel
end subroutine test

program main
  implicit none
  integer, parameter :: n = 1000
  real, allocatable :: q(:, :), v(:, :, :)
  integer :: i
  allocate(q(n, n), v(n, n, n))
  q = 0.0
  call random_number(v)
  do i = 1, n
    call test(n, q, v)
  end do
  print *, sum(q)
end program main
```
Tested with ifx 2025.2.0.
Compilation command: `ifx -fopenmp alloc.f90 -O3`
With gfortran, this runs in approximately 40 seconds on 8 threads.
I don't see any OpenMP sections, only an OpenMP parallel do.
I tested with IFX 2025.3.1 on Windows and saw no obvious increase in memory use while the program ran.
I think you need to explicitly deallocate `a` before the end of the parallel region; otherwise, this still leaks in IFX 2025.3.
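A minimal, self-checking sketch of this workaround, assuming a smaller problem size: `deallocate(a)` sits just before `!$omp end parallel`, so each thread releases its own private copy instead of relying on the compiler to do it. The names `explicit_dealloc` and `accumulate` and the size `n = 50` are illustrative choices, not taken from the original report. Because the directives are ignored without OpenMP, the program should also give the same result when compiled serially.

```fortran
! Sketch: explicit deallocation of a private allocatable inside the region
program explicit_dealloc
  implicit none
  integer, parameter :: n = 50
  real, allocatable :: q(:, :), v(:, :, :)
  integer :: k

  allocate(q(n, n), v(n, n, n))
  call random_number(v)
  q = 0.0
  do k = 1, 3
    call accumulate(n, q, v)
  end do

  ! each call adds sum(v, dim=3) to q exactly once
  if (maxval(abs(q - 3.0 * sum(v, dim=3))) > 1.0e-2) error stop "wrong result"
  print *, "ok"

contains

  subroutine accumulate(m, q, v)
    integer, intent(in) :: m
    real, intent(inout) :: q(m, m)
    real, intent(in) :: v(m, m, m)
    real, allocatable :: a(:, :)
    integer :: i

    !$omp parallel default(none) private(a, i) shared(m, q, v)
    allocate(a(m, m))          ! per-thread private copy
    a = 0.0                    ! per-thread partial sum
    !$omp do
    do i = 1, m
      a = a + v(:, :, i)
    end do
    !$omp end do nowait
    !$omp critical (sum_q)
    q = q + a
    !$omp end critical (sum_q)
    deallocate(a)              ! release this thread's copy before the region ends
    !$omp end parallel
  end subroutine accumulate

end program explicit_dealloc
```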
Yes, I have to deallocate explicitly, but it would be much better to be consistent with other Fortran implementations. I am still not sure whether this is a problem with the Intel compilers or whether the other compilers are being too clever...
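Consider the equivalent code in C++:
```cpp
#include <vector>

void work() {
    #pragma omp parallel
    {
        std::vector<float> v(100);  // each thread constructs its own v
        // ...
    }   // RAII: v's destructor runs here in every thread and frees its storage
}
```
So every thread deallocates its `v` at the end of the scope. The same should happen in Fortran.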
I believe @Mark_Lewy is correct, though Steve could comment on this.
I believe the automatic deallocation of an unsaved local allocatable is required to be performed at procedure exit.
At the point of procedure exit, only the main thread is running, thus only its allocation is required to be deallocated.
gfortran performing the deallocation (for non-master threads) when exiting a parallel region may be a non-standard feature.
Do not rely on non-standard (vendor specific) behavior.
Jim Dempsey
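To illustrate the Fortran rule referred to here, a minimal single-threaded sketch with hypothetical procedure names `unsaved_local` and `saved_local`: an unsaved local allocatable is unallocated again at each entry to its procedure, because it is deallocated automatically when the procedure returns, whereas a `SAVE`d one keeps its allocation between calls. This checks allocation status only; it cannot show whether the memory itself was handed back to the system.

```fortran
! Sketch: allocation status of unsaved vs. saved local allocatables across calls
program auto_dealloc_demo
  implicit none
  logical :: on_entry

  call unsaved_local(on_entry)
  if (on_entry) error stop "unexpected allocation on first entry"
  call unsaved_local(on_entry)
  if (on_entry) error stop "local allocatable survived procedure exit"

  call saved_local(on_entry)
  if (on_entry) error stop "unexpected allocation on first entry"
  call saved_local(on_entry)
  if (.not. on_entry) error stop "SAVE should keep the allocation"

  print *, "ok"

contains

  subroutine unsaved_local(was_allocated)
    logical, intent(out) :: was_allocated
    real, allocatable :: a(:)         ! no SAVE: deallocated automatically at return
    was_allocated = allocated(a)      ! status on entry
    allocate(a(1000))
  end subroutine unsaved_local

  subroutine saved_local(was_allocated)
    logical, intent(out) :: was_allocated
    real, allocatable, save :: a(:)   ! SAVE: allocation persists between calls
    was_allocated = allocated(a)      ! status on entry
    if (.not. was_allocated) allocate(a(1000))
  end subroutine saved_local

end program auto_dealloc_demo
```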
I suppose the procedure exit is at `!$omp end parallel`, at least as I understand how OpenMP regions are compiled into a binary.
I've checked other compilers:
- nvfortran 23.11
- Sun Fortran compiler 12.6
- Cray Fortran 18

All of these compilers deallocate at the end of the scope in all threads.
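This is because `std::vector` has a destructor. By contrast, the following would have a memory leak if you don't include the `free()`:
```c
#include <stdlib.h>

void work(void) {
    #pragma omp parallel
    {
        float *array = (float *)malloc(1234);  /* one allocation per thread */
        /* ... */
    }   /* no destructor runs here: without free(array), every thread's block leaks */
}
```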
In reading this section, there is no mention of exiting a parallel region.
I couldn't locate a Fortran section in the OpenMP reference addressing this subject.
Jim Dempsey
This is what I see in OpenMP 5.2:
> Finalization of a list item of a finalizable type or subobjects of a list item of a finalizable type occurs at the end of the region.
(Chapter 5, Data Environment; Section 5.3, List Item Privatization; p. 107, lines 20-22.)
So allocatable arrays should be finalized at the end of the region.
OpenMP 3.0 specifies the behavior in different words:
> The value and/or allocation status of the original list item will change only:
> - if accessed and modified via pointer,
> - if (possibly) accessed in the region but outside of the construct, or
> - as a side effect of directives or clauses.
(Chapter 2, Directives; Section 2.9.3.3, private clause; p. 90, lines 4-7.)
Actually, with ifx 2025.2.0 the main thread's copy is deallocated, but the other threads' copies are not.
The issue may relate to the definition of "finalizable type".
While a User Defined Type can have a final procedure(s), it is not clear that an allocatable array (descriptor) has a final procedure.
This is Fortran; an array descriptor is not necessarily the same as a C++ container.
Whether the underlying construct is a C++ container or not may be a vendor-specific implementation choice.
It would be nice if Steve could comment on this.
Jim Dempsey
@Igor_V_Intel, what do you think about this bug (or maybe it is not a bug)? Could you please take a look?
In principle, I agree with Jim here. OpenMP only defines when the scope of a private variable begins and ends.
The compiler decides when storage for the private copy is created and destroyed. OpenMP does not constrain the implementation's handling of Fortran allocatables, and Fortran requires deallocation only at procedure exit (by the main thread). So the compiler's behavior appears to be standard-conforming. However, I agree with you and Ron that it is not what users would expect. The fact that other compiler implementations deallocate the memory shows that we should discuss this topic and consider aligning with them.
> requires deallocation only at procedure exit
As I understand it, an OpenMP implementation generates a new procedure for each OpenMP region (at least, that is one possible way to implement OpenMP directives). So, technically, the end of a parallel region is a procedure exit, and deallocation should happen in all threads.
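One way to get procedure-exit semantics explicitly, without depending on how a particular compiler outlines the region, is to move the body of the parallel region into a procedure that every thread calls. Under OpenMP's rules, unsaved local variables of a routine called inside the region are private, so each thread gets its own `a`, and Fortran deallocates it automatically when that thread returns. The names `manual_outline` and `thread_body` are hypothetical, the orphaned `!$omp do` binds to the enclosing parallel region, and whether ifx actually releases the memory in this form has not been verified here.

```fortran
! Hypothetical manual-outlining sketch: each thread calls thread_body, whose
! unsaved local allocatable is deallocated automatically when the call returns.
program manual_outline
  implicit none
  integer, parameter :: n = 50
  real, allocatable :: q(:, :), v(:, :, :)

  allocate(q(n, n), v(n, n, n))
  call random_number(v)
  q = 0.0

  !$omp parallel shared(q, v)
  call thread_body(n, q, v)        ! every thread executes this call
  !$omp end parallel

  ! q should now hold the sum of v over its third dimension
  if (maxval(abs(q - sum(v, dim=3))) > 1.0e-2) error stop "wrong result"
  print *, "ok"

contains

  subroutine thread_body(m, q, v)
    integer, intent(in) :: m
    real, intent(inout) :: q(m, m)
    real, intent(in) :: v(m, m, m)
    real, allocatable :: a(:, :)   ! unsaved local: private to the calling thread
    integer :: i

    allocate(a(m, m))
    a = 0.0
    !$omp do
    do i = 1, m                    ! orphaned loop: iterations split among threads
      a = a + v(:, :, i)
    end do
    !$omp end do nowait
    !$omp critical (sum_q)
    q = q + a
    !$omp end critical (sum_q)
  end subroutine thread_body        ! a is deallocated automatically on return

end program manual_outline
```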
There is definitely a memory leak, as seen on Linux by watching `top` in a second window. This is also interesting in that ifx and ifort use the stack for the allocation, whereas gfortran probably uses the heap. With the option `-heap-arrays`, you will find that ifx runs as fast as gfortran; without that option, ifx seems quite slow by comparison.
So the question is whether this is legal OpenMP or not. Your directives certainly give each thread a private copy of the array descriptor for `a`, and the allocation is done per thread. So, at the end of the OpenMP parallel region, is it the programmer's responsibility to release the allocation, or should Fortran scoping rules apply? Good question. If this were C, it would leak without a `free()`, I would think. So why would the OpenMP rules for Fortran differ? I don't know. My expectation is the same as yours: I would expect Fortran to free the allocation on each thread at the end of the region. But expecting it does not mean the Standards require it.
I'll write up a bug report and we'll have a discussion internally as to the legality of this example. This is a good example, thank you for sending this in.
BTW - ifort gives the same behavior so it's something in our Fortran front-end I think.
Ron
The bug ID is CMPLRLLVM-71966.
This is something the OpenMP standards committee has to address (assuming it has not been addressed already).
You can also work around this as follows, assuming you want automatic deallocation (untested code):
```fortran
subroutine test(n, q, v)
  implicit none
  integer :: n
  real :: q(n, n), v(n, n, n)
  integer :: i
  !$omp parallel default(none) &
  !$omp private(i) shared(n, q, v)
  block
    ! a is declared inside the parallel region, so each thread has its own copy,
    ! and Fortran deallocates it automatically when the BLOCK completes
    real, allocatable :: a(:, :)
    allocate(a(n, n))
    a = 0.0
    !$omp do
    do i = 1, n
      a = a + v(:, :, i)
    end do
    !$omp end do nowait
    !$omp critical (sum_q)
    q = q + a
    !$omp end critical (sum_q)
  end block
  !$omp end parallel
end subroutine test
```
Jim Dempsey
>>As I can understand, OMP generates a new procedure for each OMP region...
Do you have anything to substantiate this?
```fortran
subroutine foo(array)
  real :: array(:)
  integer :: i
  do i = 1, size(array)
    ! ...
  end do
end subroutine foo
```
The procedure is `foo`.
```fortran
subroutine foo(array)
  real :: array(:)
  integer :: i
  !$omp parallel do
  do i = 1, size(array)
    ! ...
  end do
  !$omp end parallel do
end subroutine foo
```
The procedure is still `foo`, and by definition, it contains a parallel region.
Jim Dempsey