Hello,
I have a memory leak with nested OpenMP parallel regions when both regions run with multiple threads. When only one of the regions uses multiple threads and the other runs single-threaded, there is no leak.
A C# program calls a Fortran DLL multiple times. In the Fortran DLL I have nested parallel regions structured like this:
subroutine Sub1()
  ! ... some work ...
!$ call OMP_SET_NESTED(.TRUE.)
!$OMP parallel do num_threads(n1) shared(...) private(...)
  do i=1,2
    ! allocation of private dynamic arrays
    ! ... some work ...
    call Sub2(args)
    ! ... some work ...
    ! deallocation of private arrays
  end do
!$OMP end parallel do
end subroutine

subroutine Sub2(args)
  ! allocation of dynamic arrays
  ! ... some work ...
!$OMP parallel do num_threads(n2) shared(...) private(...)
  do i=1,2
    ! allocation of private dynamic arrays
    ! ... some work ...
    ! deallocation of private dynamic arrays
  end do
!$OMP end parallel do
  ! deallocation of dynamic arrays
end subroutine
So I have two loops, each with 2 iterations. When I run the code with n1=2, n2=1 or n1=1, n2=2 (the thread counts in Sub1 and Sub2), everything is fine, but when I run it with n1=2, n2=2 I get a memory leak, and the program crashes after some time once memory usage reaches 2 GB (it is built as a 32-bit application).
The VMMap tool shows that most of the memory is taken by "Private Data", where I can see many memory blocks of size 1024 KB with a total WS of 1000 KB, and the number of such blocks increases over time. Because of the round size (exactly 1 MB) I suspect these are system blocks, perhaps the stacks of OpenMP threads?
I tried both the 15.0 and 16.0 (beta) compilers; the behavior is the same.
I suggest you add code to the beginning of Sub1 to verify that the C# program is not calling Sub1 from one C# thread while it is concurrently running Sub1 on a different C# thread.
Note that it is not necessarily wrong to do so; however, if the rate of C# calls exceeds the throughput rate, you will build up a backlog of work, with each call consuming stack and thread resources.
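A minimal sketch of such a check (the counter name and warning message are illustrative, not code from this thread): an atomically updated counter flags concurrent entries into Sub1.

subroutine Sub1()
  implicit none
  integer, save :: active_calls = 0   ! shared across all callers of Sub1
  integer :: my_count

  ! count this entry and capture the number of simultaneous calls
  !$omp atomic capture
  active_calls = active_calls + 1
  my_count = active_calls
  !$omp end atomic

  if (my_count > 1) then
    write (*,*) 'Sub1 entered concurrently; simultaneous calls =', my_count
  end if

  ! ... existing body of Sub1 (outer parallel region, call to Sub2) ...

  ! mark this call as finished
  !$omp atomic
  active_calls = active_calls - 1
end subroutine Sub1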
Jim Dempsey
Jim,
It appears that this issue is unrelated to C# and the DLL. I've managed to create a simple example, a Fortran console application with nested OpenMP loops that shows the memory leak; the code is below. Again, if I disable OMP_NESTED or set the number of threads in at least one of the OpenMP loops to 1, everything works fine. But as soon as I have 2 threads in both the inner and outer loops, I get the memory leak and see many 1 MB memory blocks under Private Data in VMMap.
program OmpNestedMemLeak
  implicit none
  double precision h
  double precision estimate
  integer i
  integer n,n1,n2
  double precision sum2
  double precision x

  n1=100000
  n2=1000000
  h = 1.0D+00 / dble ( 2 * n1 )
  sum2 = 0.0D+00
  write (*,*) "Entering main loop.."

!$ call OMP_SET_NESTED(.TRUE.)
!$omp parallel shared ( h, n1, n2 ) private ( i, x ) num_threads(2)
!$omp do reduction ( + : sum2 )
  do i = 1, n1
    call nested_sub(n2)
    x = h * dble ( 2 * i - 1 )
    sum2 = sum2 + 1.0D+00 / ( 1.0D+00 + x**2 )
  end do
!$omp end do
!$omp end parallel

  estimate = 4.0D+00 * sum2 / dble ( n1 )
  write (*,*) "Main sub result", estimate
end program OmpNestedMemLeak

subroutine nested_sub(n)
  double precision h
  double precision estimate
  integer i
  integer n
  double precision sum2
  double precision x

  h = 1.0D+00 / dble ( 2 * n )
  sum2 = 0.0D+00
!$omp parallel shared ( h, n ) private ( i, x ) num_threads(2)
!$omp do reduction ( + : sum2 )
  do i = 1, n
    x = h * dble ( 2 * i - 1 )
    sum2 = sum2 + 1.0D+00 / ( 1.0D+00 + x**2 )
  end do
!$omp end do
!$omp end parallel
end subroutine
Using that small test case built with the 16.0 compiler and running under Inspector, for 32-bit only, there does appear to be continually increasing memory usage when OMP_NESTED is enabled. The same does not occur for Intel 64.
I directed this to the attention of our OpenMP Development team for some deeper analysis and will let you know what I hear back.
(Internal tracking id: DPD200375133)
(Resolution Update on 12/09/2015): This defect is fixed in the Intel® Parallel Studio XE 2016 Update 1 Release (PSXE 2016.1.051 / CnL 2016.1.146 - Windows)
Thanks for the example posted above. I was interested in understanding nested !$OMP, so I have taken this example and tested it with and without OMP_NESTED.
I have also investigated reporting some statistics during multi-threaded runs, and I have learnt something new about initialising and accumulating statistics in a parallel region.
Finally, comparing nested to non-nested performance: a large outer loop count "n1" makes the nested approach a poor alternative. Based on these tests, I would expect there are few situations where OMP_NESTED is a good approach, perhaps where n1 is small or there is poor load balance between threads. Once n1 is greater than the number of threads available, I would expect nesting to be unfavourable. Or am I missing something here?
I have derived two examples: omp_nest_v5.f90 is nested, while omp_nest_v4.f90 is not.
In my testing, their run times are dramatically different, although I am not sure what the optimiser has done to the loop in nested_sub for v4.
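For reference, a sketch of what a non-nested comparison might look like (reconstructed from the reproducer above; this is not the actual omp_nest_v4.f90 attachment): the inner routine simply drops its parallel region and runs serially inside each iteration of the already-parallel outer loop.

subroutine nested_sub_serial(n)
  ! Sketch only: serial version of nested_sub from the reproducer above,
  ! with the inner !$omp parallel region removed.
  implicit none
  integer n
  integer i
  double precision h, sum2, x

  h = 1.0D+00 / dble ( 2 * n )
  sum2 = 0.0D+00
  do i = 1, n
    x = h * dble ( 2 * i - 1 )
    sum2 = sum2 + 1.0D+00 / ( 1.0D+00 + x**2 )
  end do
end subroutine nested_sub_serial

With n1 much larger than the number of available threads, the outer parallel do already keeps every core busy, so the inner parallel region can only add thread-creation and synchronisation overhead, which is consistent with the timing differences reported here.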
John
