Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Threading nested loops in OpenMP

David_DiLaura1
834 Views

Colleagues,

I have code with two nested loops, the start of the 2nd (inner) loop is separated by a considerable amount of work/code from the start of the 1st (outer) loop. This work/code contains arrays that, depending on the problem and available memory, will not permit threading the outer loop. In that case I would like to thread the inner loop. This structure does not line up with what is usually described as nested loops and I have not been able to get the inner loop to thread,

Code extract of the nested loops follows.  Before dropping into the nested loops, I know whether there is sufficient memory to thread the outer loop. If that is NOT the case, then NumSurfToProcess is set to 1. In that case I would like the outer loop to have 1 thread from the team, and the inner loop to be threaded with the rest of the threads. If there IS enough memory, then the outer loop should be threaded with all the threads and the inner loop not threaded. I had thought that the the clauses involving NumSurfToProesss in the loops' parallel setups would turn threading on/off as required, but that doesn't work.

I must have not things set correctly (yet), since regardless of the value of NumSurfProcess, the inner loop never gets threaded.

if( NumSurfToProcess == 1) then
	NumThreadsSendingLoop = 1
else
	NumThreadsSendingLoop =  omp_get_num_threads()
end if
call omp_set_nested(.true.)
!$OMP parallel do	default(private),					&
!$omp &				shared( Variables),					&
!$omp &				num_threads(NumThreadsSendingLoop),	&
!$omp &				schedule(dynamic),					&
!$omp &             if(NumSurfToProcess > 1 )
sending_loop:	&
do SendLoopCounter = 1,NumSurfToProcess
	...code...
	if( NumSurfToProcess == 1) then
		NumThreadsReceivingLoop = omp_get_num_threads()-1
	else
		NumThreadsReceivingLoop =	1
	end if
	!$OMP parallel do	default(private),						&
	!$omp &				shared( Variables ),					&
	!$omp &				num_threads(NumThreadsReceivingLoop),	&
	!$omp &				schedule(dynamic),						&
	!$omp &				if(NumSurfToProcess == 1 )
	receiving_loop: &
	do NumDICU = 1,NumDoISeeYou
		...code...
	end do receiving_loop
	!$OMP end parallel do
end do sending_loop
!$OMP end parallel do

Thanks in advance for any help.

David

0 Kudos
3 Replies
jimdempseyatthecove
Honored Contributor III
834 Views

The problem you (may) have is a case of oversubscription and, as a result, potentially unnecessary memory loss due to unnecessary stack allocation.

In order to relieve this, instead of using !$OMP ... num_threads(... to control the point of threading, simply CALL either the threaded loop, or unthreaded loop as a subroutine.

subroutine YourOuterScopedSubroutine(...)
  ...
  if( NumSurfToProcess == 1) then
    do SendLoopCounter = 1,NumSurfToProcess
      call ContainedSubroutineToProcessSurface(SendLoopCounter)
      !$omp parallel do
      do NumDICU = 1,NumDoISeeYou
        call ContainedSubroutineToProcessISeeYou(NumDIC)
      end do
      !$OMP end parallel do
    end do
  else
    !$omp parallel do
    do SendLoopCounter = 1,NumSurfToProcess
      call ContainedSubroutineToProcessSurface(SendLoopCounter)
      do NumDICU = 1,NumDoISeeYou
        call ContainedSubroutineToProcessISeeYou(NumDIC)
      end do
    end do
    !$OMP end parallel do
  end if
  ...
  ! in same subroutine as above code
contains
  subroutine ContainedSubroutineToProcessSurface(SendLoopCounter)
    implicit none
    integer :: SendLoopCounter
    ...
    ! note, outer (caller) scoped variables visible to contained subroutine
  end subroutine ContainedSubroutineToProcessSurface
  subroutine ContainedSubroutineToProcessISeeYou(NumDIC)
    implicit none
    integer :: NumDIC
    ...
  end subroutine ContainedSubroutineToProcessISeeYou
end subroutine YourOuterScopedSubroutine

You may also want to consider using OpenMP Task method of programming.

Jim Dempsey

0 Kudos
jimdempseyatthecove
Honored Contributor III
834 Views
subroutine YourOuterScopedSubroutine(...)
  ...
  !$omp parallel
  !$omp master
  do SendLoopCounter = 1,NumSurfToProcess
    !$omp task private(SendLoopCounter)
    call ContainedSubroutineToProcessSurface(SendLoopCounter) ! or code block
    do NumDICU = 1,NumDoISeeYou
      !$omp task private(NumDIC)
      call ContainedSubroutineToProcessISeeYou(NumDIC) ! or code block
      !$omp end task
    end do
    !$omp end task
  end do
  !$omp end master
  !$omp end parallel
  ...
  ! in same subroutine as above code
contains
  subroutine ContainedSubroutineToProcessSurface(SendLoopCounter)
    implicit none
    integer :: SendLoopCounter
    ...
    ! note, outer (caller) scoped variables visible to contained subroutine
  end subroutine ContainedSubroutineToProcessSurface
  subroutine ContainedSubroutineToProcessISeeYou(NumDIC)
    implicit none
    integer :: NumDIC
    ...
  end subroutine ContainedSubroutineToProcessISeeYou
end subroutine YourOuterScopedSubroutine

You will have to work out what needs to be shared or private.

The benefit of using the task structure is experienced on the boundary situations where you (formerly) decided whether to parallelize the outer region or inner region, the task method automatically balances the load choice. If the inner level has relatively small processing load, then you might consider having the inner loop take a stride of 2 or more (then conditionally executing up to a stride full of iterations without tasking).

Jim Dempsey

0 Kudos
David_DiLaura1
834 Views

Thanks Jim, I'll try both suggestions. Surely there's a way to make this work.

David

0 Kudos
Reply