I have a piece of code that deadlocks whenever I run it with many threads.
Essentially, the code looks like the one below. When I run it with 20 threads, 1 thread gets through the routine and is stuck at the next barrier, while the remaining 19 threads get stuck at the last !$OMP TASKWAIT. What I do not understand is how these threads can get stuck at the TASKWAIT.
The way I understand taskwait is this: when a thread encounters it, the thread checks whether any tasks have been created, and if there are any, it starts executing them; once no more tasks are found, it moves past the taskwait. Obviously this isn't what happens here, so I'm hoping someone can enlighten me on what I'm doing wrong and what I'm misunderstanding about taskwait.
    !$OMP PARALLEL
    do
      !$OMP CRITICAL (refine)
      !Locate NTask that need to be done
      !$OMP END CRITICAL (refine)
      !Now we create all the tasks that need to be done.
      if (NTask.eq.0) ExitAutosampling=.true.
      do i=1,NTask
        !$OMP TASK
        call Refinement(Task(i))
        !$OMP END TASK
      end do
      !$OMP TASKWAIT !execute whatever tasks are defined
      if(ExitAutosampling) then
        !$OMP CRITICAL (done)
        ThreadsDone=ThreadsDone+1 !Counts the number of threads that are done spawning tasks
        !$OMP END CRITICAL (done)
        do
          if(ThreadsDone.ge.NCPUs) then !Once all threads reaches this point, we just need to finish the remaining tasks and we are done.
            !$OMP TASKWAIT
            exit
          else
            call sleep(0.1) !Not all threads are done spawning tasks, so we wait a bit and try again
            !$OMP TASKWAIT !<--- This is where most threads get stuck
          end if
        end do
        exit
      end if
    end do
Generally you program like this:
    !$OMP PARALLEL
    ! all threads running here
    !$OMP MASTER
    ! only master running here
    do
      !Locate NTask that need to be done
      ...
      !Now we create all the tasks that need to be done.
      if (NTask.eq.0) Exit
      do i=1,NTask
        !$OMP TASK
        call Refinement(Task(i))
        !$OMP END TASK
      end do
      !$OMP TASKWAIT !execute whatever tasks are defined
    end do
    !$OMP END MASTER
    !$OMP END PARALLEL
Jim Dempsey
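For comparison, here is a minimal C sketch of the same master-only producer pattern. It assumes a C/OpenMP translation is acceptable, since each `#pragma omp` maps onto the corresponding `!$OMP` directive. The helpers `refinement()` and `locate_tasks()` are hypothetical stand-ins for `Refinement(Task(i))` and the "Locate NTask" step. Here `locate_tasks()` simply hands out 4 items per round for 3 rounds.

```c
/* Hypothetical stand-in for Refinement(Task(i)): just counts processed items. */
static int refined_count = 0;

static void refinement(int item) {
    (void)item;                      /* real work on the item would go here */
    #pragma omp atomic
    refined_count++;
}

/* Hypothetical stand-in for "Locate NTask that need to be done":
   4 items per round for rounds 0..2, then 0 to signal that we are done. */
static int locate_tasks(int round, int *task, int cap) {
    if (round >= 3)
        return 0;
    int n = cap < 4 ? cap : 4;
    for (int i = 0; i < n; i++)
        task[i] = round * 4 + i;
    return n;
}

int run_master_producer(void) {
    refined_count = 0;
    #pragma omp parallel
    {
        #pragma omp master           /* only the master thread generates tasks */
        {
            int task[8];
            for (int round = 0; ; round++) {
                int ntask = locate_tasks(round, task, 8);
                if (ntask == 0)
                    break;
                for (int i = 0; i < ntask; i++) {
                    int item = task[i];
                    #pragma omp task firstprivate(item)
                    refinement(item);
                }
                #pragma omp taskwait  /* master waits for the tasks it just spawned */
            }
        }
    }   /* implicit barrier: the other threads may execute queued tasks here */
    return refined_count;
}
```

Only the master generates work. The other threads skip the master block and wait at the implicit barrier at the end of the region, where they may pick up the queued tasks. Compiled without OpenMP, the pragmas are ignored and the same code runs serially.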
Also, your original code (unusual as it was) assumes that (ThreadsDone.ge.NCPUs) will become true at some point.
You did not show that ThreadsDone was initialized to 0, and you are assuming that NCPUs (presumably obtained elsewhere) equals the number of threads in this parallel region, which is not necessarily the case.
Jim Dempsey
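On the NCPUs point: inside a region, `omp_get_num_threads()` reports the size of the team actually executing it. That team can be smaller than the number requested, for example when dynamic adjustment is enabled, when a thread limit applies, or in a nested region while nesting is inactive. A small C sketch follows. `actual_team_size()` is a hypothetical helper name, and a counter such as ThreadsDone would be compared against this value rather than against a separately computed NCPUs.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: size of the team that actually ran a region which
   asked for `requested` threads (may be fewer than requested). */
int actual_team_size(int requested) {
    int team_size = 1;               /* serial fallback when built without OpenMP */
    #pragma omp parallel num_threads(requested)
    {
        #pragma omp single
        {
#ifdef _OPENMP
            team_size = omp_get_num_threads();   /* real team size, not NCPUs */
#endif
        }
    }
    return team_size;
}
```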
jimdempseyatthecove wrote:
Generally you program like this:
I'm aware that is the normal way to do it, but I program it my way because I want all threads to be able to spawn more tasks. Each task can potentially lead to more tasks, and I don't want to keep the master thread idle just spawning tasks. Nor do I want to engage it in the work and risk all the other threads sitting idle while they wait for the master thread to finish so it can spawn more tasks.
ThreadsDone is initialised to zero and NCPUs is also set correctly. This clearly works, since the last thread does trigger the exit and leaves the subroutine, while the remaining threads are stuck at the last TASKWAIT.
The taskwait statement is a barrier. You have two taskwait statements, and only the final task stops at the alternate taskwait. At that point the other task threads are patiently waiting for the final task thread to hit their barrier, but it never does.
Andrew Smith wrote:
The taskwait statement is a barrier. You have two taskwait statements, and only the final task stops at the alternate taskwait. At that point the other task threads are patiently waiting for the final task thread to hit their barrier, but it never does.
I guess this is the part I do not understand. Is it a barrier that no thread can pass until all threads have hit it, or a barrier that no thread can pass as long as scheduled tasks remain? My understanding was the latter, but what you are saying is consistent with the deadlock I'm currently getting.
Is there any way of telling all the idle threads to pick up any remaining tasks then, since taskwait does not get the job done?
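On the question of getting idle threads to pick up the remaining tasks, the scheduling rules for tied tasks in the OpenMP specification are the relevant part. Tied is the default. A thread suspended at a `taskwait` in its implicit task may only start new tied tasks that descend from that task, which means its own tasks. A thread waiting at a barrier, by contrast, may execute any outstanding task of the team, and no thread leaves the barrier until all explicit tasks of the region are complete.

Below is a C sketch of the "every thread spawns" layout with a barrier after the taskwait. The function `spawn_then_barrier()` and its counter are hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Every thread spawns `per_thread` tasks of its own (no master producer).
   Returns 1 if, after the barrier, every thread sees the whole team's tasks done. */
int spawn_then_barrier(int per_thread) {
    int processed = 0;               /* shared: incremented once per task */
    int ok = 1;                      /* shared: cleared if any thread sees a short count */
    #pragma omp parallel
    {
        int nthreads = 1;            /* serial fallback without OpenMP */
#ifdef _OPENMP
        nthreads = omp_get_num_threads();
#endif
        for (int i = 0; i < per_thread; i++) {
            #pragma omp task
            {
                #pragma omp atomic
                processed++;
            }
        }
        #pragma omp taskwait         /* only this thread's own child tasks are done here */
        #pragma omp barrier          /* all explicit tasks of the region are done past here */
        int seen;
        #pragma omp atomic read
        seen = processed;
        if (seen != per_thread * nthreads) {
            #pragma omp atomic write
            ok = 0;
        }
    }
    return ok;
}
```

The taskwait could be dropped here. The barrier alone already guarantees that every task generated in the region has finished before any thread continues.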
Added to Andrew's comment, in your original post:
a) All threads were executing the critical section, doing something that includes setting a value for NTask. Your !$OMP PARALLEL does not declare NTask private, so presumably all threads will end up with the same value for NTask .AND. they will not modify NTask after the first thread makes it through the critical section. It does not seem plausible that this requirement would be met.
b) The do i=1,NTask loop is also executed by all threads. IOW call Refinement(Task(i)) is called nThreads*NTask times (each specific i executed nThreads times). The only way (I don't like using "only") for this to be meaningful is if the array Task is in thread-private storage. If this is true, then you should have stated it in your first post. Also, if this is true, then each thread could potentially have generated a different set of tasks, i.e. a different NTask, but NTask is shared. Something isn't right here.
c) In the existing design, you have no master task-spawning thread. IOW each thread has its own "root" level. Thus the taskwait at the main loop level is executed by each (all) thread(s), permitting each thread to pass out of the taskwait when all tasks that that thread spawned complete. IOW the first taskwait does not act like a barrier to all threads; it only acts like a barrier to all tasks it spawns and tasks that those tasks spawned, etc.
Now, before I continue critiquing your code, you need to resolve or explain a), b), and c).
I think you should take another look at the code sample I provided. You may find that it meets your needs.
Jim Dempsey
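One refinement to point c), per the OpenMP specification: `taskwait` waits only for the child tasks of the current task, not for their descendants. To wait for a whole subtree of tasks, OpenMP 4.0 added the `taskgroup` construct.

Below is a minimal C sketch using hypothetical `child()` and `grandchild()` tasks. The child spawns a grandchild and does not wait for it.

```c
/* Counts finished tasks. */
static int finished = 0;

static void grandchild(void) {
    #pragma omp atomic
    finished++;
}

static void child(void) {
    #pragma omp task                 /* a descendant, not a child, of the spawner */
    grandchild();
    #pragma omp atomic               /* child may finish before the grandchild runs */
    finished++;
}

/* Returns how many tasks had finished immediately after the wait. */
int finished_after_taskgroup(void) {
    int seen = 0;
    finished = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp taskgroup        /* waits for child AND grandchild */
        {
            #pragma omp task
            child();
        }
        /* With a plain '#pragma omp taskwait' after the task instead of the
           taskgroup, only child() is guaranteed done, so this could read 1 or 2. */
        #pragma omp atomic read
        seen = finished;
    }
    return seen;
}
```

With the taskgroup, the value read after the wait is always 2.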
jimdempseyatthecove wrote:
Added to Andrew's comment, in your original post:
a) All threads were executing the critical section, doing something that includes setting a value for NTask. Your !$OMP PARALLEL does not declare NTask private, so presumably all threads will end up with the same value for NTask .AND. they will not modify NTask after the first thread makes it through the critical section. It does not seem plausible that this requirement would be met.
I did not specify any of my variables because I wanted to keep the code simple and clean, and to talk more about the conceptual idea behind the code example than about the actual code.
But yes, all threads run the critical section and generate tasks, and this is done in such a way that once one thread locks in a task, the other threads cannot lock in the same task.
So yes, NTask is private.
jimdempseyatthecove wrote:
b) The do i=1,NTask loop is also executed by all threads. IOW call Refinement(Task(i)) is called nThreads*NTask times (each specific i executed nThreads times). The only way (I don't like using "only") for this to be meaningful is if the array Task is in thread-private storage. If this is true, then you should have stated it in your first post. Also, if this is true, then each thread could potentially have generated a different set of tasks, i.e. a different NTask, but NTask is shared. Something isn't right here.
Task is indeed thread private as well.
jimdempseyatthecove wrote:
c) In the existing design, you have no master task-spawning thread. IOW each thread has its own "root" level. Thus the taskwait at the main loop level is executed by each (all) thread(s), permitting each thread to pass out of the taskwait when all tasks that that thread spawned complete. IOW the first taskwait does not act like a barrier to all threads; it only acts like a barrier to all tasks it spawns and tasks that those tasks spawned, etc.
Now, before I continue critiquing your code, you need to resolve or explain a), b), and c).
And this is really the crucial issue here. I expected the taskwait to force all tasks to be completed, but I can now see that this is not the case.
jimdempseyatthecove wrote:
I think you should take another look at the code sample I provided. You may find that it meets your needs.
The problem with the code sample is that it will perform suboptimally, as noted previously. That said, perhaps I can mix the two methods into something that performs well; I just can't afford to spawn tasks only from the master thread.
>> because I wanted to keep the code simple and clean
Sometimes (as in this case) making it too simple complicates the effort required of your responders.
Based on your #8 response, here are my assumptions about your code (I can only base them on the information provided).
a) Each thread in the parallel region (line 1 in your original example) runs the same do-until-done loop and produces a per-thread task list (in the private array Task).
b) As each thread completes its initial task list, it checks whether those tasks generated additional tasks to be performed after the current run of the thread's task list.
c) When the thread does not generate additional tasks, it is free to work on other threads' task lists (this is a stretched assumption on my part).
Potential problems (due to lack of detail in your description):
It is unclear whether each spawned task (line 9 in #1) enters additional tasks into a common pool that is later picked from via the critical section, or whether the additional tasks are entered into a list that is unique to the thread. If the latter is true, then there would be no need for the critical section.
If the case is a common pool of additional tasks (generated by the initially spawned tasks), then you have the issue of the taskwait (line 14 in #1) being a per-thread barrier. To correct for this, add !$OMP BARRIER following this taskwait.
Now comes the additional problem at if(ExitAutosampling)
It is possible that on some iteration NTask is 0 for some threads and non-zero for other threads. Yet because of the common pool, the tasks generated for the next iteration could have been enough to keep all threads of the parallel region active. It appears (my assumption) that your intention is for this thread to cross over and work on tasks spawned by other threads.
For this to remotely work as I think you want it to work, consider (untested)
    NTaskNECount = 0
    !$OMP PARALLEL shared(NTaskNECount)
    do
      !$OMP CRITICAL (refine)
      ... !Locate NTask that need to be done
      if(NTask > 0) NTaskNECount = NTaskNECount + 1 ! shared count of things to do found by this thread
      !$OMP END CRITICAL (refine)
      if(NTaskNECount == 0) exit
      if(NTask == 0) then
        call sleepqq(1) ! 1 ms (change as required; SLEEPQQ takes milliseconds)
        ! Note taskwait is meaningless here (nothing pending for this thread)
        ! This thread cannot perform other threads' root
        ! nor root siblings nor root sibling descendant tasks
        ! at this point in time.
        ! however on next/future iteration above, it may receive a non-zero NTask
        ! generated by other threads
      else
        !Now we create all the (selected) tasks that need to be done.
        do i=1,NTask
          !$OMP TASK
          call Refinement(Task(i))
          !$OMP END TASK
        end do
        !$OMP TASKWAIT !execute whatever tasks are defined *** by this thread ***
        !$OMP CRITICAL (refine)
        ! now decrement NTask .ne. 0 counter (*** use same named critical section as above)
        NTaskNECount = NTaskNECount - 1 ! do not use atomic, as it would not be coordinated with first critical section
        !$OMP END CRITICAL (refine)
      endif
    end do
    !$OMP END PARALLEL
Jim Dempsey
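The same termination idea can be sketched in C with a hypothetical shared work pool. Items are depths, and refining an item of depth d > 0 pushes two items of depth d-1 back into the pool, standing in for tasks that generate further work. The sketch differs from the Fortran above in one deliberate way: the exit decision is taken inside the named critical section, so no thread reads the shared counter unsynchronized. All names and sizes are assumptions.

```c
#define POOL_CAP 4096
#define MAX_GRAB 4

/* Hypothetical shared pool standing in for "Locate NTask that need to be done". */
static int pool[POOL_CAP];
static int pool_len = 0;
static int active = 0;      /* plays the role of NTaskNECount */
static long refined = 0;    /* total items processed */

/* Stand-in for Refinement(Task(i)): may push new work into the shared pool.
   Work is silently dropped if the fixed-size pool is full. */
static void refine_item(int depth) {
    #pragma omp critical (refine)
    {
        refined++;
        if (depth > 0 && pool_len + 2 <= POOL_CAP) {
            pool[pool_len++] = depth - 1;
            pool[pool_len++] = depth - 1;
        }
    }
}

long run_pool(int root_depth) {
    pool_len = 0;
    active = 0;
    refined = 0;
    pool[pool_len++] = root_depth;

    #pragma omp parallel
    {
        int task[MAX_GRAB];              /* private per-thread list, like Task */
        for (;;) {
            int ntask = 0, done = 0;
            #pragma omp critical (refine)
            {
                while (ntask < MAX_GRAB && pool_len > 0)
                    task[ntask++] = pool[--pool_len];
                if (ntask > 0)
                    active++;            /* this thread now owns unfinished work */
                else if (active == 0)
                    done = 1;            /* pool empty and no owner can refill it */
            }
            if (done)
                break;
            if (ntask == 0)
                continue;                /* others may still add work: poll again */
            for (int i = 0; i < ntask; i++) {
                int d = task[i];
                #pragma omp task firstprivate(d)
                refine_item(d);
            }
            #pragma omp taskwait         /* waits for this thread's children only */
            #pragma omp critical (refine)
            active--;                    /* same critical as the exit test */
        }
    }
    return refined;
}
```

Seeded with a single depth-3 item, it processes 1 + 2 + 4 + 8 = 15 items regardless of thread count. Idle threads busy-poll here; a real code might back off between polls.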
RE: (NTask == 0) sleepqq
You might want to explore using untied tasks (clause).
Jim Dempsey
jimdempseyatthecove wrote:
a) Each thread in the parallel region (line 1 in your original example) runs the same do-until-done loop and produces a per-thread task list (in the private array Task).
b) As each thread completes its initial task list, it checks whether those tasks generated additional tasks to be performed after the current run of the thread's task list.
c) When the thread does not generate additional tasks, it is free to work on other threads' task lists (this is a stretched assumption on my part).
Yes to all 3.
Based on your suggestions, Jim, I decided to start simple and then try to optimize further afterwards, but even my simple case is behaving unexpectedly. (Note that the code is once again simplified, and not the actual code.)
The following should be about as close to a classic example as I can get, and yet it still ends in a deadlock. Does anyone know why?
It runs fine when I use only a few threads (1-10), but with 15-20 threads it always ends in a deadlock, and I don't understand why.
When I run it in debugging mode in Visual Studio, it deadlocks, and if I then pause and check, all 20 active threads appear to be at the last omp barrier but seem unable to get past it. Any suggestions for what I'm doing wrong?
    !$OMP PARALLEL
    do j=1,15 ! Just make sure to run enough times to get them all
      !$OMP MASTER
      do k=1,30 !Just enough times to make sure plenty of tasks are spawned for all threads
        !$OMP CRITICAL (refine)
        !Locate and lock in task
        !$OMP END CRITICAL (refine)
        !$OMP TASK
        !Compute f(x)
        !$OMP END TASK
      end do
      !$OMP END MASTER
      !$OMP TASKWAIT
    end do
    !$OMP BARRIER !For some reason all threads reach this point in debugging, but none are able to move past it.
    !$OMP END PARALLEL
The taskwait belongs inside master (for the above code).
The barrier (in above code) will inhibit non-master threads (reaching the barrier) from servicing tasks submitted by the master thread.
The end parallel has an implicit barrier (except that threads reaching the end of the region are permitted to join in task processing).
Jim Dempsey
jimdempseyatthecove wrote:
The taskwait belongs inside master (for the above code).
The barrier (in above code) will inhibit non-master threads (reaching the barrier) from servicing tasks submitted by the master thread.
The end parallel has an implicit barrier (except that threads reaching the end of the region are permitted to join in task processing).
This is not what I see: when I run my example, all the threads appear to be working on the tasks as they are supposed to. But even if the barrier blocked the non-master threads, everything should still be able to move past the barrier once the master thread is done with all the tasks, shouldn't it? (That is not what happens.)
The reason for the barrier is that the whole code is parallelized much further out and needs to remain inside the parallel region after this refinement is done. Before I can start the next job, I need to ensure that all tasks here are completed.
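Per the OpenMP specification, a barrier is a task scheduling point. Threads waiting there may execute outstanding tasks, and none continue until all explicit tasks generated in the region are complete. An explicit barrier is therefore a valid way to finish the refinement while staying inside the enclosing region.

Below is a C sketch of the simplified example with the taskwait moved inside master, as suggested above. The cursor `next_item` and the counter `computed` are hypothetical stand-ins for "Locate and lock in task" and "Compute f(x)". The sketch does not diagnose the reported hang.

```c
/* Runs `outer` rounds; in each, the master locks in `inner` items and spawns
   one task per item. Returns how many f(x) evaluations ran. */
int run_fixed(int outer, int inner) {
    int next_item = 0;   /* shared: next item to lock in */
    int computed = 0;    /* shared: number of evaluations that ran */
    #pragma omp parallel
    {
        for (int j = 0; j < outer; j++) {
            #pragma omp master
            {
                for (int k = 0; k < inner; k++) {
                    int item;
                    #pragma omp critical (refine)
                    item = next_item++;            /* lock in one item */
                    #pragma omp task firstprivate(item)
                    {
                        (void)item;                /* compute f(x) for this item */
                        #pragma omp atomic
                        computed++;
                    }
                }
                #pragma omp taskwait  /* inside master: waits for master's own tasks */
            }
        }
        #pragma omp barrier  /* every task of the region is finished past here */
        /* ...the next job could start here, still inside the parallel region... */
    }
    return computed;
}
```

Because master has no implied barrier, the non-master threads run through the j loop quickly and wait at the explicit barrier, where they may execute the master's tasks.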
