I have a context where I first set up a bunch of custom tasks, each holding a member object that says whether the task should be rescheduled or not.
Basically, my custom task type's execute() function looks like this:
[cpp]
tbb::task* execute() override
{
    // ... here the code that executes some task-specific work
    switch( task_result )
    {
        case RESCHEDULE:
        {
            // Make sure this task is re-executed as required.
            // See: http://stackoverflow.com/questions/11847689/how-to-reschedule-a-concurrent-task/11849373#11849373
            this->increment_ref_count();
            this->recycle_as_safe_continuation();
            break;
        }
        case END:
        {
            // No need for rescheduling.
            break;
        }
        default:
        {
            UCX_ASSERT_IMPOSSIBLE( "Unmanaged task result: '" << to_string(task_result)
                                   << "' Task: " << to_string(m_task.id()) );
            break;
        }
    }
    return nullptr;
}
[/cpp]
All my task instances do this. To instantiate them, I'm using the trick described in the documentation to have the main thread wait for all tasks to end: I have an empty root task, and each time I create a task I allocate it as a child of the root task, then spawn it via the root task.
Currently I'm seeing a behaviour where a task that is set up to reschedule until some specific time has passed is "blocking" the execution of other tasks.
Once this task stops rescheduling, it ends and the other tasks are executed.
However, I need the other tasks to be executed in parallel. The current load isn't heavy,
but I suspect my way of rescheduling tasks is wrong.
I don't have much data, and I'm not a multithreading specialist yet, so maybe I'm wrong.
Anyway, I'm looking for a way to make sure that when I reschedule a task, it is put at the end of the worker thread's task queue, making it easily stealable by other threads.
The tasks I'm spawning shouldn't, when rescheduled, be continuations of themselves as the current code makes them;
they should just be rescheduled to execute later, not immediately.
Unfortunately, I tweaked the rescheduling code but couldn't find any way to do this so far: any variant of the present code trips an assertion inside TBB.
So I'm asking: is there a safe way to reschedule a task not as a continuation, but as if it had just been spawned and put into some worker thread's queue?
Instead of (atomically) incrementing the reference count before calling recycle_as_safe_continuation(), simply set it to 1. The meaning of continuation is not about when the task will be executed next but about having the same position in the hierarchy of dependencies, viewed as an unordered tree; its timing will not be any different from that of a newly allocated and spawned task, as tasks are always added to the side of the deque where the worker thread takes the next task to execute (other threads steal from the other side). If you want near-FIFO scheduling, use enqueue() instead of spawn().

I tried both suggestions with no apparent change in behaviour. I suppose setting the ref count to 1 and using enqueue() instead of spawn() is indeed what I originally wanted, so I'll keep the code this way for now. I still have a task blocking other tasks anyway.
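To make the deque behaviour described above concrete, here is a toy, single-threaded model of one worker's task pools. None of this is TBB's real API or data layout; the names (ToyWorker, take_own, and so on) are purely illustrative of the ordering rules: spawned tasks are LIFO for the owner, stolen from the opposite end, and enqueued tasks are served near-FIFO.

```cpp
#include <deque>
#include <string>

// Toy model of one worker's task pools, only to illustrate the ordering
// described above; none of these names are TBB's real API.
struct ToyWorker {
    std::deque<std::string> spawn_deque;  // spawn(): owner works LIFO here
    std::deque<std::string> fifo_queue;   // enqueue(): near-FIFO service

    void spawn(const std::string& t)   { spawn_deque.push_back(t); }
    void enqueue(const std::string& t) { fifo_queue.push_back(t); }

    // The owning thread takes the most recently spawned task next...
    std::string take_own() {
        std::string t = spawn_deque.back();
        spawn_deque.pop_back();
        return t;
    }
    // ...while thieves steal from the opposite (oldest) end,
    std::string steal() {
        std::string t = spawn_deque.front();
        spawn_deque.pop_front();
        return t;
    }
    // and enqueued tasks are served oldest-first.
    std::string take_enqueued() {
        std::string t = fifo_queue.front();
        fifo_queue.pop_front();
        return t;
    }
};
```

So a task recycled (or spawned) by its own thread tends to run next on that same thread, while enqueue() pushes it behind any older enqueued work, which is why it gives near-FIFO behaviour.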
Maybe it's also useful to explain what you want to achieve, for a second opinion about this arrangement.

Other than just having tasks that are guaranteed not to be interdependent in their execution thread (which I supposed TBB would avoid as much as possible), I'm not sure what more to add. One important thing is that the rescheduling system (in my custom task class) is also used to "delay" the execution of a task. What I mean is that the execution code will first check whether it is "time" (based on a virtual time, which can be slowed or paused, or on real time) to really execute the task. If not, it just reschedules (after checking a potential other condition to not reschedule). Basically I want my tasks to be able to: 1. delay their execution; 2. be rescheduled an undefined number of times. For 1) I don't know how to tell TBB that it can forget about the task for some time (is there a way?), so I used the simpler approach for now: reschedule until execution. For 2) the current code would be OK if it didn't apparently block other tasks while the task is just rescheduling until its real execution. Strangely, I have other tasks that reschedule the same way as the blocking task, but it's the only one that blocks others. Maybe it's in my code; I'm actively trying to find the source of this, but it's not obvious so far. Any suggestion on reorganizing this code to achieve this efficiently is welcome.
One important thing is that the rescheduling system (in my custom task class) is also used to "delay" the execution of a task. What I mean is that the execution code will first check whether it is "time" (based on a virtual time, which can be slowed or paused, or on real time) to really execute the task. If not, it just reschedules (after checking a potential other condition to not reschedule).

This means that you have many tasks constantly polling, creating a lot of overhead. Even worse, as I neglected to point out before, when you are rescheduling as a safe continuation with a reference count of one, at the end of execute() the scheduler has to assume that this is a task that probably had children/predecessors that were spawned, stolen, and have all finished executing, so I would expect it to immediately spawn the task (putting it at the head of the queue where it would be picked up next), or even bypass that step as a possible or actual (?) internal optimisation when execute() returns nullptr; in either case the worker thread effectively busy-waits for the right "time". But then I don't see how changing to enqueue() in run_and_wait() could possibly make any difference; maybe someone else can point out something I have overlooked? (Added) Is a previously enqueued task that is rescheduled as a continuation always enqueued again instead of spawned, perhaps?
(Added) Is a previously enqueued task that is rescheduled as a continuation always enqueued again instead of spawned, perhaps?

I'm not sure, but I tried to enqueue explicitly and it can't work before returning from execute(), so I guess it gets re-enqueued as you say. We really need a specialist to clarify this, because I don't understand all the TBB code...
Raf Schietekat wrote:

I'm using a Core i7 M 620; it shows 4 cores in the task manager (hyper-threading seems deactivated).
Another part of the explanation would be that you have only two hardware threads that are kept busy polling, leaving no resources available to steal and execute a third task. I don't assume you are trying this on a single core, are you? That would be even more problematic. TBB encourages optional parallelism without mandatory concurrency (of any degree), and this is quite the opposite of that.
I will try a variant: I have a task that updates virtual clocks. I could make the clock update check whether there are tasks to enqueue. I believe that would be equivalent. Transferring the task body from one task instance to the other seems possible; I'll check exactly what it implies.

If such is the case (and even if it's not), the solution is to get rid of the busy-waiting.
At least at first, you should probably try without recycling. As an experiment, just enqueue() (not spawn()) a copy of the new task with the same polling logic (which can be allocated as an additional child of the dummy root task), and let the current task be automatically deleted at the end of execute().
Next, instead of immediately enqueue()'ing the new task, give it to an independent thread that will enqueue() or perhaps spawn() it at the appropriate time, and stop polling inside the task.
Once that works, you can start thinking about recycling again. If you're worried about the cost of not recycling in the interim, consider embedding a pointer to an allocated structure that you hand over from one task to the next. Otherwise, you'll have to have some way to ensure that a task is not executed again while it is still executing from the previous run, which could happen owing to unfortunate timing related to preemption, unless you either have an external thread poll for the reference count to drop or use a dummy child that is spawned at the appropriate time (which implies automatic spawn() of its parent), which may defeat any gains from recycling anyway. It would be nice to have some more hooks into the scheduler, but these are the workarounds familiar to me.
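The "hand over a pointer from one task to the next" idea can be sketched in plain C++ like this. The names (WorkState, PollingTask) are invented for illustration; in the real TBB version the successor would be allocated as another child of the dummy root task and enqueue()'d, rather than returned and driven by a loop.

```cpp
#include <memory>

// Illustrative sketch: instead of recycling a task, each incarnation hands
// its (possibly large) state to a freshly allocated successor. Only a
// pointer moves; there is no deep copy.
struct WorkState {
    int ticks_remaining;  // stand-in for the task's real payload
};

struct PollingTask {
    std::unique_ptr<WorkState> state;  // moved from the previous incarnation

    // Returns the successor task if rescheduling is still needed, or null.
    std::unique_ptr<PollingTask> execute() {
        if (--state->ticks_remaining <= 0)
            return nullptr;  // done: this incarnation simply dies
        // Cheap handoff: move the state pointer into a fresh task object.
        auto next = std::make_unique<PollingTask>();
        next->state = std::move(state);
        return next;
    }
};
```

The old task's only remaining job after the handoff is to be deleted, so with a scalable allocator the per-incarnation cost is just one small allocation and one pointer move.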
This is not a pipeline I'm setting up. Several different systems have updating tasks that need to be executed repeatedly. It's OK if they are busy; what's not OK is if one task blocks all the others. All the tasks are designed to be non-blocking. I first tried the Flow Graph, as it indeed looks like what I was trying to do (but it's a bit less flexible). I hit a problem at the time, after many experiments. I don't remember exactly what I figured out, but it made me think it was not the appropriate construct for my case.

That's only if you really, really want to roll your own task-based solution, of course, because I still don't know why you wouldn't just use one of the existing algorithms instead (a pipeline comes to mind, where the input stage could pick the next item from a time-sorted heap of future events after consuming them from a concurrent queue).
I'm using a Core i7 M 620; it shows 4 cores in the task manager (hyper-threading seems deactivated).

Then it remains a mystery to me how one out of three tasks (as the exact grand total?) could possibly be blocked.
This is not a pipeline I'm setting up. Several different systems have updating tasks that need to be executed repeatedly.

It all depends on what you call a task: TBB tasks are more of an underlying mechanism for implementing higher-level algorithms, and recycling is more of an optimisation than a semantics thing, so you could definitely consider another mapping than one-to-one between concepts that merely happen to share the name "task". I initially thought of parallel_do/while(), but managing the order in a separate input stage made me consider a pipeline; if the fixed number of tokens in flight is no obstacle, why not (rhetorical)? But I don't know enough about the situation to say more.
Raf Schietekat wrote:

I'm using a Core i7 M 620; it shows 4 cores in the task manager (hyper-threading seems deactivated).

Then it remains a mystery to me how one out of three tasks (as the exact grand total?) could possibly be blocked.

It is not the grand total: there are more tasks, many of them executed only once, some of them rescheduling until the end of the application (which is a game, but not architectured in a classical way).
Let me try to clarify then. First, I will use algorithms from TBB on top of this system; I'm not avoiding high-level constructs, I just needed something different in addition to them. Second, the system in question is just a layer over the task scheduler. Its goal is to abstract it, in case I want to spawn a task manually, because I know I will have to port the system to other platforms that TBB can't compile on. Third, I'm in a context where the workload differs widely from second to second. However, I'm using a different task system outside my task scheduler abstraction. Forget about TBB for a minute; I'll get back to it. My task types are described like this: Task<T>, the template parameter being the kind of input the task will work with. These Task objects can be put (moved, to be precise) into two kinds of systems:
- my task scheduler abstraction (which only accepts Task<void>): in this case it is assumed that the task will be executed by an unknown thread, and that if it is rescheduled, there is no guarantee it will always be the same thread. In practice, once the Task is moved into my task scheduler abstraction, I create a custom TBB-derived task instance, called AsyncTask_TBB, which holds the Task and knows how to execute it correctly (taking into account its info: whether it should be rescheduled, until when, whether it has an end condition, etc.).
- TaskChain<T>: basically a container of Task<T>. It's thread-safe (at least I designed it this way, but I don't have another pair of eyes to check... so far it seems to be correct). It has an execute( const T& ) function that goes through all the contained tasks in order, executes them (if enough time has passed since the last execute() call), and removes the tasks that don't need to be rescheduled. I almost used a pipeline for this, but a pipeline supposes that the tasks are interdependent and don't have the special behaviour I just described, so I had to set up my own system.
Now, the strategy I've set up is meant to be used this way:
- I have several unrelated systems that need to be constantly updated, so each has one Task<void> which is moved into the task scheduler. This Task<void> body will just call the execute() of a TaskChain<WhateverInputIsNeeded>. This is like setting up different parallel pipelines that loop.
- I have other systems that rely on the previously described systems. They need to be updated in sync with their parent system. To do this, the parent system exposes a way to insert other tasks into its TaskChain, which means all the tasks related to this parent system will be executed in sync, on the same thread per update, in a specific order. It is important to understand that this TaskChain, the more-or-less pipeline, will have a variety of tasks inserted and removed quite frequently, though not on every update, so it becomes the synchronization bottleneck.
- There is also a messaging system where needed, but most messaging happens through the network. Anyway, I use some message queues. Intuitively, the Flow Graph was the most obvious choice for setting up this system, but it didn't fill all my needs.
- The case of the task that is not executed immediately, but delayed, is an exception, and I really think one of the parent systems should keep a queue of these to be executed later, as you described. I tried rescheduling because I don't have such a system in place yet and wanted to set up a delayed task to try.
Part of this strategy is to avoid having to expose a thread-safe interface for all my systems. Instead, interdependent systems just sync their tasks by using the same TaskChain, which means there is only a mutex lock when inserting a new task (and I believe it can be removed, but I'm not sure how yet). The important thing I want to achieve is indeed that the workload between parent systems is automatically distributed by TBB.
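For reference, the TaskChain idea described above could be sketched roughly like this. This is a single-threaded guess at the shape of it, not the poster's actual code: the real one additionally guards insertion with a mutex and checks elapsed time before running, both omitted here; each task returns true if it wants to stay in the chain for the next update.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Minimal sketch of a TaskChain: run every contained task in order, then
// drop the tasks that no longer ask to be rescheduled.
template <typename T>
class TaskChain {
public:
    // A task takes the chain's input and returns true to stay scheduled.
    using Task = std::function<bool(const T&)>;

    void insert(Task t) { m_tasks.push_back(std::move(t)); }

    // Execute all tasks in insertion order; remove the finished ones.
    void execute(const T& input) {
        m_tasks.erase(
            std::remove_if(m_tasks.begin(), m_tasks.end(),
                           [&](Task& task) { return !task(input); }),
            m_tasks.end());
    }

    std::size_t size() const { return m_tasks.size(); }

private:
    std::vector<Task> m_tasks;
};
```

One Task<void> per parent system would then call execute() on such a chain each update, so all the child tasks of that system run in order on one thread per update, as the post describes.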
Which apparently fails sometimes because of the way I'm rescheduling? Scheduling a clone instead of rescheduling a task might be a better solution; semantically, it is. What I fear is the cost of instantiating this clone, which is why I did it this way. I wish I could open-source this system to show you and get more expert advice, but it's not possible at the moment. I'm not sure this description is clearer than looking at the code, unfortunately.

This is not a pipeline I'm setting up. Several different systems have updating tasks that need to be executed repeatedly.

It all depends on what you call a task: TBB tasks are more of an underlying mechanism for implementing higher-level algorithms, and recycling is more of an optimisation than a semantics thing, so you could definitely consider another mapping than one-to-one between concepts that merely happen to share the name "task". I initially thought of parallel_do/while(), but managing the order in a separate input stage made me consider a pipeline; if the fixed number of tokens in flight is no obstacle, why not (rhetorical)? But I don't know enough about the situation to say more.
It is not the grand total: there are more tasks, many of them executed only once, some of them rescheduling until the end of the application (which is a game, but not architectured in a classical way).

OK… I thought you might have isolated a minimal reproducer with just 3 tasks. That's another relief! :-)
I know I will have to port the system to other platforms that TBB can't compile on.

Any hint which one? Maybe it's just "not yet"...
Scheduling a clone instead of rescheduling a task might be a better solution. Semantically, it is. What I fear is the cost of instantiating this clone, which is why I did this.

It doesn't have to be a full clone, though. It's like the new C++ move (vs. copy): the old task's only remaining job is to get deleted, and it's cheap enough to move a pointer. With the scalable allocator, the overhead is certainly very manageable, and negligible compared to polling. It would be nicer, though, to have more ways of recycling, because right now you can't re-enqueue without polling for the end of the current execute() (as far as I know?).
I'm not sure this description is clearer than looking at the code, unfortunately.

Thanks for trying. Perhaps it will get clearer when I reread it tomorrow morning. :-)
Raf Schietekat wrote:Indeed. ^__^;
OK… I thought you might have isolated a minimal reproducer with just 3 tasks. That's another relief! :-)
Well, Android, and potentially some console platforms. Frankly it's the least important of the reasons, so you can ignore it.
I know I will have to port the system in other platforms that TBB can't compile inAny hint which one? Maybe it's just "not yet"...
It looks like it's not a big change in my code, so I will first try this and see whether I get good results whatever the order. Fortunately, I isolated the code enough to try different TBB usages without changing the semantics of the program (the order of task execution is not important in the case of asynchronous tasks).
Scheduling a clone instead of rescheduling a task might be a better solution. Semantically, it is. What I fear is the cost of instantiating this clone, which is why I did this.

It doesn't have to be a full clone, though. It's like the new C++ move (vs. copy), because the old task's only remaining job is to get deleted, and it's cheap enough to move a pointer. With the scalable allocator, overhead is certainly very manageable, and negligible compared to polling. It would be nicer, though, to have more ways of recycling, because now you can't re-enqueue without polling for the end of the current execute (as far as I know?).
Sorry about that. ;____;
I'm not sure this description is clearer than looking at the code unfortunately.Thanks for trying. Perhaps it will get clearer when I reread it tomorrow morning. :-)
Let me try to clarify then.

I hope I picked up a little more on a second reading. It's true that a pipeline always runs items through to the end. I don't know how many applications would benefit from being able to have items dropped somewhere in the middle; perhaps quite a few. Now you have to simulate that (using a special pointer value, or perhaps something like "bool m_zombie"), which is not very elegant and also suboptimal. More seriously, pipelines are also quite unfair to each other, so you would have to run each from a separate application thread (giving them their own arenas to avoid starvation-causing entanglement), but this could lead to oversubscription instead. That limitation definitely warrants an overhaul to allow several pipelines to be run together as a group. Meanwhile, you should use your own solution (especially if you also want to insert additional "stages" while the "pipeline" is running).

TBB was originally conceived to optimise throughput, where it didn't matter that sometimes the scheduling was very unfair. In your system, take care to avoid spawn() where it could lead to high-level unfairness; use enqueue() instead in the right places. Note that running an enqueued task takes priority over stealing a spawned task (last time I looked, anyway), in an effort to improve locality (good for performance), but that also means the scheduler still does not entirely prioritise minimising backlog. So, while starvation is now avoided, there's still some lingering unfairness that may be relevant in a real-time application, although I haven't seen any complaints about this yet, so either it's not very serious or I'm just seeing ghosts. Recycling as a continuation internally does the same as spawn()'ing the recycled task, and you may have to work around that, as you have already experienced. Perhaps others can suggest more general patterns for architecting this kind of application?
Well, Android, and potentially some console platforms. Frankly it's the least important of the reasons, so you can ignore it.

Android is currently being targeted: check out (as in have a look at) the latest development release.
m_task_scheduler.schedule( std::move( m_task ) );

I only used move/copy as an analogy. Maybe there are situations where locality considerations make move the winner, but don't forget about pointers.
This apparently works, whatever the order of the different tasks I'm trying. :D

Great!
Raf Schietekat wrote:

It is actually certainly the most important feature in my case, to be able to add and remove work from a "pipeline", as it changes depending on the context from second to second.
Meanwhile, you should use your own solution (especially if you also want to insert additional "stages" while the "pipeline" is running).
Android is currently being targeted: check out (as in have a look at) the latest development release.

Good news!
I didn't move it because of your analogy, but because the object has move-only semantics. I agree pointers are supposed to be faster (this object is bigger than a pointer, as it holds all the extra information about the task), but it makes the code simpler to understand so far and wasn't a bottleneck, so I'll keep it that way for now. It also saves me from having to use new, which makes things more predictable. I'll switch to unique pointers if it becomes a performance bottleneck.
m_task_scheduler.schedule( std::move( m_task ) );I only used move/copy as an analogy. Maybe there are situations where perhaps locality considerations make move the winner, but don't forget about pointers.
If you want to use 'recycle_as_safe_continuation' to reuse a task and reschedule it at a later time, you should set the ref count to at least 2; when the time comes, you need to explicitly decrement the count, and if the ref count is then 0, spawn it again.

This was already considered, but the problem is that scheduling (re-)spawned tasks is done in the "wrong" order relative to "time" (by the same thread at least), while reliably re-enqueuing requires a polling loop instead of a simple conditional enqueue() (which would otherwise make the task re-spawned or re-enqueued based on execution timing issues): the reference count must be seen to have dropped to 1 already (to avoid a re-spawn at the end of execution) before it is set to 0 (just to avoid any possible complaints during debugging, because it is not needed here), and the task then unconditionally enqueued. Because polling can easily lead to problems, I suggested abandoning the idea of recycling the TBB tasks altogether. On the other hand, if a central thread does the timed spawning, the tasks will always be stolen by the workers, so they don't need to be enqueued for scheduling in the right order, and the problem described above goes away. But routine stealing sounds like a bad lifestyle, so then I would wonder how costly it is relative to taking from the enqueue() queue (is it more expensive or not?), and also whether it wouldn't be better to consider affinity (and what that would mean for relative timing?).
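The "central thread does the timed spawning" idea discussed throughout this thread can be sketched as below: a dispatcher thread keeps a time-sorted heap of future events and releases each one only when its due time arrives, so the tasks themselves never poll. All names are invented for illustration; in the real system, the stored callback would enqueue() or spawn() the actual TBB task instead of doing the work directly.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Sketch of a timed dispatcher: one thread sleeps until the earliest due
// time, then releases that event (which, in the real system, would
// enqueue()/spawn() a TBB task at the right moment).
class TimedDispatcher {
    using Clock = std::chrono::steady_clock;
    struct Event {
        Clock::time_point due;
        std::function<void()> run;
        bool operator>(const Event& o) const { return due > o.due; }
    };

public:
    TimedDispatcher() : m_worker([this] { loop(); }) {}
    ~TimedDispatcher() {
        { std::lock_guard<std::mutex> lk(m_mutex); m_stop = true; }
        m_cv.notify_one();
        m_worker.join();
    }

    void schedule_at(Clock::time_point due, std::function<void()> fn) {
        { std::lock_guard<std::mutex> lk(m_mutex);
          m_heap.push(Event{due, std::move(fn)}); }
        m_cv.notify_one();  // wake the dispatcher to re-check the earliest due time
    }

private:
    void loop() {
        std::unique_lock<std::mutex> lk(m_mutex);
        while (!m_stop) {
            if (m_heap.empty()) { m_cv.wait(lk); continue; }
            auto due = m_heap.top().due;
            // Sleep until the earliest event is due, or until woken by a
            // new (possibly earlier) event or shutdown.
            if (m_cv.wait_until(lk, due) == std::cv_status::timeout &&
                !m_heap.empty() && m_heap.top().due <= Clock::now()) {
                auto fn = m_heap.top().run;
                m_heap.pop();
                lk.unlock();
                fn();        // here: enqueue()/spawn() the real task instead
                lk.lock();
            }
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> m_heap;
    bool m_stop = false;
    std::thread m_worker;  // declared last so other members exist when it starts
};
```

With this in place, delayed tasks cost one heap entry and one sleeping thread instead of a worker busy-waiting inside execute().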