- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Colleagues,
I have code with two nested loops, the start of the 2nd (inner) loop is separated by a considerable amount of work/code from the start of the 1st (outer) loop. This work/code contains arrays that, depending on the problem and available memory, will not permit threading the outer loop. In that case I would like to thread the inner loop. This structure does not line up with what is usually described as nested loops and I have not been able to get the inner loop to thread,
Code extract of the nested loops follows. Before dropping into the nested loops, I know whether there is sufficient memory to thread the outer loop. If that is NOT the case, then NumSurfToProcess is set to 1. In that case I would like the outer loop to have 1 thread from the team, and the inner loop to be threaded with the rest of the threads. If there IS enough memory, then the outer loop should be threaded with all the threads and the inner loop not threaded. I had thought that the the clauses involving NumSurfToProesss in the loops' parallel setups would turn threading on/off as required, but that doesn't work.
I must have not things set correctly (yet), since regardless of the value of NumSurfProcess, the inner loop never gets threaded.
if( NumSurfToProcess == 1) then NumThreadsSendingLoop = 1 else NumThreadsSendingLoop = omp_get_num_threads() end if call omp_set_nested(.true.) !$OMP parallel do default(private), & !$omp & shared( Variables), & !$omp & num_threads(NumThreadsSendingLoop), & !$omp & schedule(dynamic), & !$omp & if(NumSurfToProcess > 1 ) sending_loop: & do SendLoopCounter = 1,NumSurfToProcess ...code... if( NumSurfToProcess == 1) then NumThreadsReceivingLoop = omp_get_num_threads()-1 else NumThreadsReceivingLoop = 1 end if !$OMP parallel do default(private), & !$omp & shared( Variables ), & !$omp & num_threads(NumThreadsReceivingLoop), & !$omp & schedule(dynamic), & !$omp & if(NumSurfToProcess == 1 ) receiving_loop: & do NumDICU = 1,NumDoISeeYou ...code... end do receiving_loop !$OMP end parallel do end do sending_loop !$OMP end parallel do
Thanks in advance for any help.
David
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The problem you (may) have is a case of oversubscription and, as a result, potentially unnecessary memory loss due to unnecessary stack allocation.
In order to relieve this, instead of using !$OMP ... num_threads(... to control the point of threading, simply CALL either the threaded loop, or unthreaded loop as a subroutine.
subroutine YourOuterScopedSubroutine(...) ... if( NumSurfToProcess == 1) then do SendLoopCounter = 1,NumSurfToProcess call ContainedSubroutineToProcessSurface(SendLoopCounter) !$omp parallel do do NumDICU = 1,NumDoISeeYou call ContainedSubroutineToProcessISeeYou(NumDIC) end do !$OMP end parallel do end do else !$omp parallel do do SendLoopCounter = 1,NumSurfToProcess call ContainedSubroutineToProcessSurface(SendLoopCounter) do NumDICU = 1,NumDoISeeYou call ContainedSubroutineToProcessISeeYou(NumDIC) end do end do !$OMP end parallel do end if ... ! in same subroutine as above code contains subroutine ContainedSubroutineToProcessSurface(SendLoopCounter) implicit none integer :: SendLoopCounter ... ! note, outer (caller) scoped variables visible to contained subroutine end subroutine ContainedSubroutineToProcessSurface subroutine ContainedSubroutineToProcessISeeYou(NumDIC) implicit none integer :: NumDIC ... end subroutine ContainedSubroutineToProcessISeeYou end subroutine YourOuterScopedSubroutine
You may also want to consider using OpenMP Task method of programming.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
subroutine YourOuterScopedSubroutine(...) ... !$omp parallel !$omp master do SendLoopCounter = 1,NumSurfToProcess !$omp task private(SendLoopCounter) call ContainedSubroutineToProcessSurface(SendLoopCounter) ! or code block do NumDICU = 1,NumDoISeeYou !$omp task private(NumDIC) call ContainedSubroutineToProcessISeeYou(NumDIC) ! or code block !$omp end task end do !$omp end task end do !$omp end master !$omp end parallel ... ! in same subroutine as above code contains subroutine ContainedSubroutineToProcessSurface(SendLoopCounter) implicit none integer :: SendLoopCounter ... ! note, outer (caller) scoped variables visible to contained subroutine end subroutine ContainedSubroutineToProcessSurface subroutine ContainedSubroutineToProcessISeeYou(NumDIC) implicit none integer :: NumDIC ... end subroutine ContainedSubroutineToProcessISeeYou end subroutine YourOuterScopedSubroutine
You will have to work out what needs to be shared or private.
The benefit of using the task structure is experienced on the boundary situations where you (formerly) decided whether to parallelize the outer region or inner region, the task method automatically balances the load choice. If the inner level has relatively small processing load, then you might consider having the inner loop take a stride of 2 or more (then conditionally executing up to a stride full of iterations without tasking).
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Jim, I'll try both suggestions. Surely there's a way to make this work.
David
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page