Intel® Fortran Compiler

OpenMP deadlock in Taskwait

Tue_B_

I have a piece of code that ends in deadlock whenever I throw many threads at it.

Essentially the code looks like the one below. When I run it with 20 threads, 1 thread gets through the routine and waits at the next barrier, while the remaining 19 threads get stuck at the last !$OMP TASKWAIT. What I do not understand is how these threads can get stuck at the TASKWAIT.

The way I understand a taskwait is that, when a thread encounters it, the thread checks whether any tasks have been created; if there are any, it starts executing them, and once no more tasks remain it moves past the taskwait. Obviously this isn't what happens here, so I'm hoping someone can tell me what I'm doing wrong and what I'm misunderstanding about taskwait.

 

!$OMP PARALLEL       
do 
  !$OMP CRITICAL (refine)
    !Locate NTask that need to be done
  !$OMP END CRITICAL (refine)   
  !Now we create all the tasks that need to be done.
  if (NTask.eq.0)  ExitAutosampling=.true.
  do i=1,NTask
    !$OMP TASK    
      call Refinement(Task(i))
    !$OMP END TASK
  end do
   
  !$OMP TASKWAIT    !execute whatever tasks are defined
  if(ExitAutosampling) then
    !$OMP CRITICAL (done)
      ThreadsDone=ThreadsDone+1 !Counts the number of threads that are done spawning tasks
    !$OMP END CRITICAL (done)  
    do 
      if(ThreadsDone.ge.NCPUs) then !Once all threads reaches this point, we just need to finish the remaining tasks and we are done.
        !$OMP TASKWAIT   
        exit          
      else
        call sleep(0.1)  !Not all threads are done spawning tasks, so we wait a bit and try again
        !$OMP TASKWAIT   !<--- This is where most threads get stuck
      end if
    end do
    exit
  end if
end do   

 

jimdempseyatthecove

Generally you program like this:

!$OMP PARALLEL
! all threads running here
!$OMP MASTER
! only master running here
do 
    !Locate NTask that need to be done
    ...
  !Now we create all the tasks that need to be done.
  if (NTask.eq.0)  Exit
  do i=1,NTask
    !$OMP TASK    
      call Refinement(Task(i))
    !$OMP END TASK
  end do
   
  !$OMP TASKWAIT    !execute whatever tasks are defined
end do   
!$OMP END MASTER
!$OMP END PARALLEL

Jim Dempsey

jimdempseyatthecove

Also, your original code (unusual as it was) assumes that (ThreadsDone.ge.NCPUs) will be true at some point.

You did not show that ThreadsDone was initialized to 0, and you are assuming that NCPUs (presumably obtained somewhere else) is the same as the number of threads used by this parallel region (which is not necessarily the case).
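
To illustrate the point about NCPUs, here is a minimal sketch that asks the OpenMP runtime for the actual team size from inside the parallel region rather than relying on a count obtained elsewhere. It is a standalone test program, not the original code; the names nteam and arrived are assumptions made for the illustration.

```fortran
program team_size
  use omp_lib
  implicit none
  integer :: nteam, arrived

  nteam = 0
  arrived = 0

  !$omp parallel shared(nteam, arrived)
  !$omp single
  ! The team size is only known reliably from inside the region.
  nteam = omp_get_num_threads()
  !$omp end single
  ! (implicit barrier at end single, so nteam is visible to all threads)

  !$omp atomic update
  arrived = arrived + 1
  !$omp end parallel

  ! Every thread of the team checked in exactly once.
  if (arrived /= nteam) error stop 'thread count does not match team size'

  print '(a,i0,a,i0)', 'processors: ', omp_get_num_procs(), ', team size: ', nteam
end program team_size
```

The processor count and the team size can differ, for example when OMP_NUM_THREADS is set or when the runtime adjusts the number of threads, so a termination test such as ThreadsDone.ge.NCPUs is safer when it compares against the value returned by omp_get_num_threads().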

Jim Dempsey

Tue_B_

jimdempseyatthecove wrote:

Generally you program like this:

I'm aware that is the normal way to do it, but the reason I program it my way is that I want all threads to be able to spawn more tasks, mainly because each task has the possibility of leading to more tasks. I don't want to keep the master thread idle, just spawning tasks, nor do I want to engage it and risk all the other threads ending up idle while they wait for the master thread to finish so it can spawn more tasks.

ThreadsDone is initialised to zero, and NCPUs is also set correctly. This is clearly working, since the last thread actually triggers the exit and leaves the subroutine, while the remaining threads are stuck at the last TASKWAIT.
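
A pattern in which tasks may themselves create further tasks can be sketched with nested tasks inside a TASKGROUP. As far as I understand the OpenMP specification, TASKWAIT waits only for the child tasks of the current task, whereas the end of a TASKGROUP waits for all descendant tasks. The sketch below is a standalone illustration under that assumption; refine_mod, refine, and evaluated are hypothetical names, and the binary recursion merely stands in for a refinement that may create more work.

```fortran
module refine_mod
  implicit none
  integer :: evaluated = 0   ! shared count of finished evaluations (illustrative)
contains
  recursive subroutine refine(depth)
    integer, intent(in) :: depth
    integer :: child

    ! Stand-in for computing f(x) at one point.
    !$omp atomic update
    evaluated = evaluated + 1

    ! Each evaluation may lead to more work: here, two children per level.
    if (depth > 0) then
      do child = 1, 2
        !$omp task firstprivate(depth)
          call refine(depth - 1)
        !$omp end task
      end do
    end if
  end subroutine refine
end module refine_mod

program taskgroup_demo
  use refine_mod
  implicit none

  !$omp parallel
  !$omp single
  !$omp taskgroup
  !$omp task
    call refine(3)
  !$omp end task
  !$omp end taskgroup
  ! All descendants are complete here: 1 + 2 + 4 + 8 = 15 evaluations.
  if (evaluated /= 15) error stop 'descendant tasks still pending'
  print '(a,i0)', 'evaluations completed: ', evaluated
  !$omp end single
  !$omp end parallel
end program taskgroup_demo
```

In this sketch only one thread creates the root task, but any thread that executes a task creates the next level of tasks, so task creation is not confined to one thread. The other threads wait at the implicit barrier of the SINGLE construct, where they are allowed to execute queued tasks.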

 

Andrew_Smith

The taskwait statement is a barrier. You have two taskwait statements, with only the final task stopping at the alternate taskwait. At this point the other task threads are patiently waiting for the final task thread to hit their barrier, but it never does.

Tue_B_

Andrew Smith wrote:

The taskwait statement is a barrier. You have two taskwait statements, with only the final task stopping at the alternate taskwait. At this point the other task threads are patiently waiting for the final task thread to hit their barrier, but it never does.

I guess this is the part I do not understand. Is it a barrier that no thread can pass before all threads hit it, or is it a barrier that no thread can pass as long as there are scheduled tasks remaining? My understanding was that it was the latter, but what you are saying is consistent with the deadlock I'm currently getting.

Is there any way of telling all the idle threads to pick up any remaining tasks then, since taskwait does not get the job done?

jimdempseyatthecove

Added to Andrew's comment, in your original post:

a) All threads were executing the critical section, doing something that includes specifying a value for NTask. Your !$OMP PARALLEL does not state that NTask is private, so presumably all threads will end up with the same value for NTask .AND. they will not modify NTask after the first thread makes it through the critical section. It does not seem plausible that this requirement would be met.

b) The do i=1,NTask loop is also executed by all threads. IOW call Refinement(Task(i)) is called nThreads*NTask times (each specific i executed nThreads times). The only way (I don't like using "only") for this to be meaningful is if the array Task is in thread-private storage. If this is true, then you should have stated this in your first post. Also, if this is true, then each thread could potentially have generated a different set of tasks, i.e. a different NTask, but NTask is shared. Something isn't right here.

c) In the existing design, you have no master task-spawning thread. IOW each thread has its own "root" level. Thus the taskwait at the main loop level is executed by every thread, permitting each thread to pass out of the taskwait when all tasks that that thread spawned complete. IOW the first taskwait does not act like a barrier to all threads; it only acts like a barrier to the tasks it spawns and the tasks that those tasks spawn, etc.

Now, before I continue critiquing your code, you need to resolve or explain a), b), and c).

I think you should take another look at the code sample I provided. You may find that it meets your needs.
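
The per-thread scope of TASKWAIT described in c) can be checked with a small standalone program. This is only a sketch: the counter array done and the variables mine and bad are assumptions made for the illustration. Each thread creates its own child tasks, waits at TASKWAIT, and then verifies that its own children have finished, without any statement about tasks created by other threads.

```fortran
program taskwait_scope
  use omp_lib
  implicit none
  integer, parameter :: ntask = 8
  integer, allocatable :: done(:)   ! done(t): tasks finished that thread t created
  integer :: me, i, mine, bad

  allocate(done(0:omp_get_max_threads() - 1))
  done = 0
  bad = 0

  !$omp parallel private(me, i, mine) shared(done, bad)
  me = omp_get_thread_num()

  ! Every thread spawns its own set of child tasks.
  do i = 1, ntask
    !$omp task firstprivate(me) shared(done)
      !$omp atomic update
      done(me) = done(me) + 1
    !$omp end task
  end do

  ! Waits only for the children created by this thread's implicit task.
  !$omp taskwait

  !$omp atomic read
  mine = done(me)
  if (mine /= ntask) then
    !$omp atomic update
    bad = bad + 1
  end if
  !$omp end parallel

  if (bad /= 0) error stop 'a taskwait returned before the thread''s own tasks finished'
  print '(a,i0,a)', 'each thread saw its own ', ntask, ' child tasks complete'
end program taskwait_scope
```

Note that the child tasks may be executed by any thread of the team; the counter is indexed by the creating thread, which is what TASKWAIT is concerned with.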

Jim Dempsey

Tue_B_

jimdempseyatthecove wrote:

Added to Andrew's comment, in your original post:

a) All threads were executing the critical section, doing something that includes specifying a value for NTask. Your !$OMP PARALLEL does not state that NTask is private, so presumably all threads will end up with the same value for NTask .AND. they will not modify NTask after the first thread makes it through the critical section. It does not seem plausible that this requirement would be met.

I did not specify any of my variables, because I wanted to keep the code simple and clean, and talk more about the conceptual idea behind the code example than the actual code. 

But yes, all threads run the critical section and generate tasks, and this is done in such a way that once one thread locks in a task, the other threads cannot lock in the same task. 

So yes NTask is private.

jimdempseyatthecove wrote:

b) The do i=1,NTask loop is also executed by all threads. IOW call Refinement(Task(i)) is called nThreads*NTask times (each specific i executed nThreads times). The only way (I don't like using "only") for this to be meaningful is if the array Task is in thread-private storage. If this is true, then you should have stated this in your first post. Also, if this is true, then each thread could potentially have generated a different set of tasks, i.e. a different NTask, but NTask is shared. Something isn't right here.

Task is indeed thread private as well

jimdempseyatthecove wrote:

c) In the existing design, you have no master task-spawning thread. IOW each thread has its own "root" level. Thus the taskwait at the main loop level is executed by every thread, permitting each thread to pass out of the taskwait when all tasks that that thread spawned complete. IOW the first taskwait does not act like a barrier to all threads; it only acts like a barrier to the tasks it spawns and the tasks that those tasks spawn, etc.

Now, before I continue critiquing your code, you need to resolve or explain a), b), and c).

And this is really the crucial issue here. It was my expectation that the taskwait would force all tasks to be completed, but I can now see that this is not the case.

jimdempseyatthecove wrote:

I think you should take another look at the code sample I provided. You may find that it meets your needs.

The problem with the code sample is that it will perform suboptimally, as previously noted. That being said, perhaps I can mix the two methods into something that performs well; I just can't afford to spawn tasks only from the master thread.

jimdempseyatthecove

>> because I wanted to keep the code simple and clean

Sometimes (as in this case) making it too simple complicates the effort required of your responders.

With your #8 response, I will list my assumptions about your code (I can only base my assumptions on the information provided).

a) Each thread in the parallel region (line 1 in your original example) runs the same do-until-done loop and produces a per-thread task list (in the private array Task).

b) As each thread completes its initial task list, it checks whether those tasks generated additional tasks to be performed after the current run of the thread's task list.

c) When a thread does not generate additional tasks, it is free to work on other threads' task lists (this is a stretched assumption on my part).

Potential problems (due to lack of detail in your description):

It is unclear whether each spawned task (line 9 in #1) enters additional tasks into a common pool that is later picked from via the critical section, or whether the additional tasks are entered into a list that is unique to the thread. Should the latter be true, there would be no need for the critical section.

If the case is the common pool of additional tasks (generated from the initially spawned tasks), then you have the issue of the taskwait (line 14 in #1) being a per-thread barrier. To correct for this, add !$OMP BARRIER following this taskwait.

Now comes the additional problem at if(ExitAutosampling).

It is possible that on some iteration NTask is 0 for some threads and non-zero for others, yet because of the common pool, the next iteration could have generated sufficient tasks to keep all threads of the parallel region active. It appears (my assumption) that your intention is for this thread to cross over and work on tasks spawned by other threads.

For this to remotely work as I think you want it to work, consider (untested):

NTaskNECount = 0
!$OMP PARALLEL shared(NTaskNECount)
do 
  !$OMP CRITICAL (refine)
    ... !Locate NTask that need to be done
    if(NTask > 0) NTaskNECount = NTaskNECount + 1 ! shared count of threads that found things to do
  !$OMP END CRITICAL (refine)
  if(NTaskNECount == 0) exit
  if(NTask == 0) then
    call sleepqq(1000) ! SLEEPQQ takes milliseconds, so this is 1 s (change as required)
    ! Note: taskwait is meaningless here (nothing pending for this thread).
    ! This thread cannot perform other threads' root tasks,
    ! nor root siblings, nor root sibling descendant tasks
    ! at this point in time.
    ! However, on the next/future iteration above, it may receive a non-zero NTask
    ! generated by other threads.
  else   
    !Now we create all the (selected) tasks that need to be done.
    do i=1,NTask
      !$OMP TASK    
        call Refinement(Task(i))
      !$OMP END TASK
    end do
    !$OMP TASKWAIT    !execute whatever tasks are defined *** by this thread ***
    !$OMP CRITICAL (refine)
      ! now decrement the NTask .ne. 0 counter (*** use the same named critical section as above)
      NTaskNECount = NTaskNECount - 1 ! do not use atomic, as it would not be coordinated with the first critical section
    !$OMP END CRITICAL (refine)
  endif
end do   
!$OMP END PARALLEL

Jim Dempsey

jimdempseyatthecove

RE: (NTask == 0) sleepqq

You might want to explore using untied tasks (clause).

Jim Dempsey

 

Tue_B_

jimdempseyatthecove wrote:

a) Each thread in the parallel region (line 1 in your original example) runs the same do-until-done loop and produces a per-thread task list (in the private array Task).

b) As each thread completes its initial task list, it checks whether those tasks generated additional tasks to be performed after the current run of the thread's task list.

c) When a thread does not generate additional tasks, it is free to work on other threads' task lists (this is a stretched assumption on my part).

Yes to all 3.

 

jimdempseyatthecove wrote:

It is unclear whether each spawned task (line 9 in #1) enters additional tasks into a common pool that is later picked from via the critical section, or whether the additional tasks are entered into a list that is unique to the thread. Should the latter be true, there would be no need for the critical section.

Essentially my code needs to find f(x) for 169 values of x. Around half of these values can be interpolated if the x values that are actually calculated are chosen smartly. So each task is a computation of f(x) for an additional x value.
The reason for the critical section is to make sure no two threads lock in the same values of x for calculation.
 
So the initial idea was that all threads should put tasks in a common pool, which all threads should then be able to work on.
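
The "lock in" step can be sketched as claiming indices from a shared pool inside a named critical section. This is a simplified standalone illustration, not the actual refinement code: the array claimed, the cursor next, and the sequential claiming order are assumptions, and incrementing claimed(mine) stands in for computing f(x).

```fortran
program claim_pool
  use omp_lib
  implicit none
  integer, parameter :: npoints = 169
  integer :: claimed(npoints)   ! how many times each x index was locked in
  integer :: next, mine

  claimed = 0
  next = 1

  !$omp parallel private(mine) shared(claimed, next)
  do
    ! Only one thread at a time may pick the next unclaimed point.
    !$omp critical (refine)
    mine = next
    if (mine <= npoints) next = next + 1
    !$omp end critical (refine)

    if (mine > npoints) exit

    ! Stand-in for computing f(x(mine)); the index belongs to this thread alone.
    claimed(mine) = claimed(mine) + 1
  end do
  !$omp end parallel

  if (any(claimed /= 1)) error stop 'a point was claimed zero or several times'
  print '(a,i0,a)', 'all ', npoints, ' points were claimed exactly once'
end program claim_pool
```

Because each index is handed out exactly once inside the critical section, the work on that index needs no further locking.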
 
 
I'm about to head home now, but I will try to incorporate your suggestion tomorrow and see whether it pans out.

 

Tue_B_

Based on your suggestions, Jim, I decided to start simple and then try to optimize further afterwards, but even my simple case is behaving unexpectedly. (Note that the code is once again simplified, and not the actual code.)

The following should be about as close to a classic example as I can get, and yet it still ends in a deadlock. Does anyone know why?

It runs fine when I throw only a few threads at it (1-10 threads), but once I throw 15-20 threads at it, it always ends in a deadlock, and I don't understand why.

When I run it in debugging mode in Visual Studio, it ends in a deadlock, and if I then pause and check, all 20 active threads appear to be at the last OMP BARRIER but seem unable to go past it. Any suggestions for what I'm doing wrong?

!$OMP PARALLEL    
   do j=1,15! Just make sure to run enough times to get them all
     !$OMP MASTER 
       do k=1,30 !Just enough times to make sure plenty of tasks are spawned for all threads    
         !$OMP CRITICAL (refine)
           !Locate and lock in task
         !$OMP END CRITICAL (refine)
         !$OMP TASK         
           !Compute f(x) 
         !$OMP END TASK
       end do      
     !$OMP END MASTER
     !$OMP TASKWAIT
    end do    
    !$OMP BARRIER !For some reason all threads reach this point in debugging, but none are able to move past it.
!$OMP END PARALLEL    

 

jimdempseyatthecove

The taskwait belongs inside master (for the above code).

The barrier (in above code) will inhibit non-master threads (reaching the barrier) from servicing tasks submitted by the master thread.
The end parallel has an implicit barrier (excepting that the threads reaching the end of the region are permitted to join in task processing).

Jim Dempsey

 

Tue_B_

jimdempseyatthecove wrote:

The taskwait belongs inside master (for the above code).

I'm still not sure why that would be the case. I tried it, but it had no influence on the deadlock.
 
 

jimdempseyatthecove wrote:

The barrier (in above code) will inhibit non-master threads (reaching the barrier) from servicing tasks submitted by the master thread.
The end parallel has an implicit barrier (excepting that the threads reaching the end of the region are permitted to join in task processing).

This is not what I see. When I run my example, all the threads appear to be working on the tasks like they are supposed to. But even if the barrier prevented the non-master threads from doing so, everything should still be able to move past the barrier once the master thread is done with all the tasks, no? (Which is not what happens.)

The reason for the barrier is that the whole code is parallelized much further out and needs to remain in the parallel region after this refinement is done, and before I can start the next job I need to ensure that all tasks are completed here.
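
For reference, the requirement "all tasks completed before the next job, while staying inside the parallel region" can be reduced to the following standalone sketch. As far as I can tell from the OpenMP specification, a barrier is a task scheduling point, and no thread leaves it until all explicit tasks bound to the parallel region have completed; the sketch relies on that reading. The counter finished and the task count ntask are assumptions made for the illustration, and the atomic increment stands in for computing f(x).

```fortran
program barrier_completes_tasks
  use omp_lib
  implicit none
  integer, parameter :: ntask = 100
  integer :: finished, seen, k

  finished = 0
  seen = -1

  !$omp parallel shared(finished, seen) private(k)

  ! One thread generates the tasks; the others go straight to the barrier.
  !$omp master
  do k = 1, ntask
    !$omp task shared(finished)
      ! Stand-in for computing f(x).
      !$omp atomic update
      finished = finished + 1
    !$omp end task
  end do
  !$omp end master

  ! Threads waiting here may execute queued tasks; nobody passes
  ! until all explicit tasks of the region are complete.
  !$omp barrier

  ! Still inside the parallel region: the "next job" could start here.
  !$omp master
  seen = finished
  !$omp end master

  !$omp end parallel

  if (seen /= ntask) error stop 'tasks were still pending after the barrier'
  print '(a,i0,a)', 'all ', seen, ' tasks finished before the barrier released'
end program barrier_completes_tasks
```

If a reduced case like this one also hangs with 15-20 threads, that would point at the runtime or the environment rather than at the task structure; if it runs, the deadlock is more likely inside the parts that were simplified away, such as the critical section or the task body.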
