Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

serial_in_order task_group

nagy
New Contributor I
326 Views
I have some work that I would like to offload from a thread. This work must be done "serial_in_order".
One solution would be to simply create a thread and have it poll a concurrent_queue. However, this way I can't take advantage of tasks.
Another solution would be to create a pipeline, but that would require a lot of code relative to what I want to do. In this case I would also have to create another thread and have the pipeline input filter poll from a concurrent_queue?
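A minimal, untested sketch of that pipeline variant might look roughly like this (using the classic tbb::pipeline filter API; Frame, frameQueue_ and SendFrame are placeholder names):

#include <tbb/pipeline.h>
#include <tbb/concurrent_queue.h>

struct Frame;                                        // placeholder for the real frame type
void SendFrame(Frame* frame);                        // placeholder
tbb::concurrent_bounded_queue<Frame*> frameQueue_;   // fed by the parent thread

class InputFilter : public tbb::filter
{
public:
    InputFilter() : tbb::filter(tbb::filter::serial_in_order) {}
    void* operator()(void*)                          // fetches the next item for the pipeline
    {
        Frame* frame = 0;
        frameQueue_.pop(frame);                      // blocks until the parent pushes a frame
        return frame;                                // returning NULL would end the pipeline
    }
};

class WorkFilter : public tbb::filter
{
public:
    WorkFilter() : tbb::filter(tbb::filter::serial_in_order) {}
    void* operator()(void* item)
    {
        /* Do work */                                // serial and in push order
        SendFrame(static_cast<Frame*>(item));
        return 0;
    }
};

void RunPipeline()                                   // run this on its own thread, since run() blocks
{
    InputFilter input;
    WorkFilter work;
    tbb::pipeline pipeline;
    pipeline.add_filter(input);
    pipeline.add_filter(work);
    pipeline.run(4 /* max frames in flight */);
}
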
Right now I'm going for the ordinary task_group but with some extra code to guarantee order.
Here is an example I quickly put together, untested, more to demonstrate what I want.
template<typename T>
class ordered_task_result // could use std::future
{
public:
    ordered_task_result()
    {
        completed = false;
    }
    tbb::atomic<bool> completed;
    T result;
};

// NOTE: several template arguments below were stripped by the forum software;
// "frame_type" is a placeholder for the actual result type.
typedef std::shared_ptr<ordered_task_result<frame_type>> ordered_task_result_ptr;
tbb::concurrent_bounded_queue<ordered_task_result_ptr> taskResults_; // doesn't have to be concurrent
ordered_task_result_ptr front_;

/* Parent thread */
void Step()
{
    ordered_task_result_ptr pFrameResult = std::make_shared<ordered_task_result<frame_type>>();
    taskResults_.push(pFrameResult);
    taskGroup_.run([=]
    {
        /* Do work */
        pFrameResult->completed = true;
        SendCompletedFrames();
    });
}

void SendCompletedFrames()
{
    /* Invoke on parent thread */
    DoSendCompletedFrames();
}

void DoSendCompletedFrames()
{
    // Send completed results strictly in submission order.
    for (;;)
    {
        if (front_ == nullptr && !taskResults_.try_pop(front_))
            return; // nothing pending
        if (!front_->completed)
            return; // the oldest pending result is not ready yet
        SendFrame(front_->result);
        front_.reset();
    }
}
What I'd like to do, however, is something like:
task_group taskGroup_(serial_in_order);
Suggestions?
EDIT: Fixed an error
0 Kudos
1 Solution
5 Replies
Andrey_Marochko
New Contributor III
326 Views
Since you need to execute the offloaded work strictly in order, I do not see how you can benefit from TBB tasks. Tasks (as well as task_groups) are a means to enable concurrent execution, which is evidently not your case. Thus your first option, a tbb_thread communicating with the main one via a concurrent_queue, looks good enough for your scenario.
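For illustration, a minimal untested sketch of that arrangement (WorkItem, Process and workQueue_ are placeholder names):

#include <tbb/concurrent_queue.h>
#include <tbb/tbb_thread.h>

struct WorkItem;                              // placeholder for the real work type
void Process(WorkItem* item);                 // placeholder for the actual processing
tbb::concurrent_bounded_queue<WorkItem*> workQueue_;

// Dedicated consumer thread: pops and processes items strictly in push order.
void ConsumerLoop()
{
    for (;;)
    {
        WorkItem* item = 0;
        workQueue_.pop(item);                 // blocks until the producer pushes something
        if (item == 0)                        // NULL is used here as a stop signal
            return;
        Process(item);                        // (ownership/cleanup of items omitted)
    }
}

// Main thread:
//     tbb::tbb_thread consumer(ConsumerLoop);
//     workQueue_.push(new WorkItem(...));    // items are processed in the order pushed
//     ...
//     workQueue_.push(NULL);                 // tell the consumer to stop
//     consumer.join();
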
0 Kudos
nagy
New Contributor I
326 Views
I might be misinformed. But even though the tasks must complete in order, they could still be executed on different threads in the TBB task pool, which would give load balancing and fewer context switches?

I will probably have 6-7 of these objects running. Creating 6-7 extra threads just for this, compared to running everything on a task pool, is not very good as far as I know...? It doesn't sound very cache friendly to me either.
Please correct me if I am wrong.
Thank you
0 Kudos
Andrey_Marochko
New Contributor III
326 Views
Spawned TBB tasks are not ordered in any way. This is why I said that using them does not suit your case.

> I will have probably 6-7 of these objects running

If this means that you have 6-7 streams of in-order work chunks, then using tasks can even be harmful if you are thinking of 6-7 tasks, each polling a concurrent_queue of its own. The problem is that TBB does not guarantee a particular level of concurrency: if the TBB task pool was initialized with, say, 3 worker threads, only 3 of your work streams will make progress, while the others will be stalled indefinitely. Besides, waiting inside TBB tasks is harmful by itself, as it prevents TBB workers from doing other work that may be available at the moment.

Thus you either need a separate thread per work stream (if work chunks from different streams can be processed concurrently), or a single thread that polls the several concurrent queues in a round-robin fashion using the non-blocking try_pop method.
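A minimal untested sketch of the round-robin variant (WorkItem, Process and the stream count are placeholders):

#include <tbb/concurrent_queue.h>

struct WorkItem;                                             // placeholder
void Process(int stream, WorkItem* item);                    // placeholder
const int kNumStreams = 7;                                   // e.g. the 6-7 streams mentioned above
tbb::concurrent_queue<WorkItem*> streamQueues_[kNumStreams]; // one FIFO per work stream

// Single consumer thread: visits every stream in turn; try_pop never blocks,
// so an empty stream is simply skipped on this round.
void RoundRobinConsumer()
{
    for (;;)                                                 // (stop condition omitted)
    {
        for (int s = 0; s < kNumStreams; ++s)
        {
            WorkItem* item = 0;
            if (streamQueues_[s].try_pop(item))
                Process(s, item);                            // items within a stream stay in order
        }
        // back off briefly here when all queues were empty, to avoid busy-waiting
    }
}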

Lastly, if your work chunks are large enough, you can use TBB parallel algorithms to process them.
0 Kudos
Alexey-Kukanov
Employee
327 Views
Well, I think there might be a reasonable TBB solution, though not as easy to use as you wish.

Let's start with what seems the most natural solution for the problem. The main thread should produce some work, some other thread(s) should process the work in serial order, and there are several independent streams of work. So the solution seems simple: each work stream is processed by its own thread, and data are passed through a queue to guarantee in-order processing. It could be either a tbb::concurrent_queue, or a serial queue plus a lock that protects operations on the queue. We have a classical producer-consumer pattern, with a limited degree of parallelism achieved by multiple independent streams.
What is the problem, however? I think it's oversubscription: if the number of work streams exceeds the number of cores, the threads processing the streams will compete for the CPU. And what is the solution? Yes, create a pool of threads, as many as there are cores.

Each thread within the pool should be able to process any work stream. When a thread becomes idle, it chooses any non-empty work stream, takes an item from its queue, and starts processing. While in-order semantics is kept by using queues as before, serial semantics has to be re-ensured. Blocking the whole queue during processing would be bad, because the work producer would get blocked as well. A better solution is a binary semaphore that allows only a single thread into the processing stage for a given work stream. The semaphore should have non-blocking try-acquire semantics, however, to avoid workers getting blocked while work exists in another stream. Combining all the pieces together, we have: a pool of worker threads, where each worker polls all work streams in round-robin fashion; if it fails to acquire the semaphore for a stream, it immediately moves to the next one; otherwise it pops a work item (if any) from the queue, processes it, releases the semaphore, and goes to check the next stream.
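Putting those pieces into code, a rough untested sketch could look as follows (WorkItem, WorkStream, Process and kNumStreams are placeholder names; any TBB mutex with try_acquire can play the role of the binary semaphore, as noted below):

#include <tbb/concurrent_queue.h>
#include <tbb/spin_mutex.h>

struct WorkItem;                                 // placeholder
void Process(WorkItem* item);                    // placeholder

struct WorkStream
{
    tbb::concurrent_queue<WorkItem*> queue;      // keeps the items of this stream in order
    tbb::spin_mutex busy;                        // the non-blocking "binary semaphore"
};

const int kNumStreams = 7;                       // e.g. the 6-7 streams mentioned earlier
WorkStream streams_[kNumStreams];

// Body of each thread in the worker pool.
void WorkerLoop()
{
    for (;;)                                     // (stop condition omitted)
    {
        for (int s = 0; s < kNumStreams; ++s)
        {
            tbb::spin_mutex::scoped_lock lock;
            if (!lock.try_acquire(streams_[s].busy))
                continue;                        // another worker owns this stream; move on
            WorkItem* item = 0;
            if (streams_[s].queue.try_pop(item))
                Process(item);                   // at most one worker per stream => serial
        }                                        // lock released here; the FIFO queue gives in-order
    }
}
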
What is the problem in this case? Idle spinning. Try-acquire semantics is required, but then the workers need to know when to stop spinning because there is no work anywhere. There could be different solutions for that problem, but since we are on the TBB forum, let me choose tasks :)

Indeed, if exactly as many tasks are created as there are work items, then no tasks means no work, and worker threads can stop spinning and go to sleep until a newly supplied piece of work wakes them up. The mechanisms for that already exist in TBB. Additionally, a task can carry information about which stream has the work, so there is no need for round robin. So the producer thread puts an item into a work queue (which is still necessary to ensure in-order semantics) and spawns a task to process that item. A worker thread takes the task and executes it. The first thing the task should do is still check the semaphore, however, to preserve serial semantics (because there may be many tasks in flight that process the same work stream). But now, what does it do if the semaphore is closed at the moment? Probably this task should be re-spawned to execute later, and another task should be taken, ideally for another work stream.
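As a rough, untested sketch of this step, using the task_group interface from the original post and the placeholder WorkStream/streams_ declarations from the previous snippet (here the "re-spawn" is simply another run() call):

#include <tbb/task_group.h>

tbb::task_group taskGroup_;
void ProcessOneFrom(int s);

// Producer side: enqueue the item first (the per-stream FIFO preserves order),
// then spawn one task per item.
void Submit(int s, WorkItem* item)
{
    streams_[s].queue.push(item);
    taskGroup_.run([=] { ProcessOneFrom(s); });
}

// Task body: at most one task at a time may enter a stream's processing stage.
void ProcessOneFrom(int s)
{
    tbb::spin_mutex::scoped_lock lock;
    if (!lock.try_acquire(streams_[s].busy))
    {
        // Stream is busy right now: postpone by re-submitting the task.
        // (See the caveat below about where such a re-spawned task ends up.)
        taskGroup_.run([=] { ProcessOneFrom(s); });
        return;
    }
    WorkItem* item = 0;
    if (streams_[s].queue.try_pop(item))     // one task per item, so this normally succeeds
        Process(item);
}
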
Here lies the next issue. A re-spawned TBB task is put into the local task pool of the current worker, and the very first thing an idle worker does is take a task from its own pool - i.e. the very task it just postponed for later.

With TBB 2.2, the only solution I see to this last problem is to step back a bit and make tasks agnostic of which stream they should process. Instead, each task would poll all work streams until a piece of work is found (and there should be one, because there are as many work items as tasks). With the next TBB version (for which we have a pre-release in the open-source downloads), there is another solution: a task can be enqueue()'d instead of spawn()'d. The difference is that enqueued tasks are put into a global container instead of the local pool, and that container preserves some fairness (though it has no FIFO guarantees), so older tasks tend to be taken earlier than newly added ones. Exactly what is required!
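A rough sketch of how that might look with the low-level tbb::task interface of that generation (again untested, reusing the placeholder WorkStream/streams_ declarations from above; task::enqueue is the pre-release interface just mentioned):

#include <tbb/task.h>

class StreamTask : public tbb::task
{
    int stream_;
public:
    StreamTask(int s) : stream_(s) {}
    tbb::task* execute()
    {
        tbb::spin_mutex::scoped_lock lock;
        if (!lock.try_acquire(streams_[stream_].busy))
        {
            // Stream busy: put a retry at the back of the shared queue of enqueued
            // tasks, so that older tasks (possibly for other streams) go first.
            tbb::task::enqueue(*new(tbb::task::allocate_root()) StreamTask(stream_));
            return NULL;
        }
        WorkItem* item = 0;
        if (streams_[stream_].queue.try_pop(item))   // one task per item
            Process(item);
        return NULL;
    }
};

// Producer side: push the item into the per-stream FIFO, then enqueue one task for it.
void SubmitViaEnqueue(int s, WorkItem* item)
{
    streams_[s].queue.push(item);
    tbb::task::enqueue(*new(tbb::task::allocate_root()) StreamTask(s));
}
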
Remember that the non-blocking semaphores and work queues are still there, to preserve the serial-in-order semantics you need. Tasks don't give you that, as Andrey mentioned. Oh, and I should say that any TBB mutex would work just fine as the non-blocking binary semaphore, because they all have a try_acquire method.

What's the next problem? Well, I think it would be idle spinning again, this time due to the limited degree of parallelism. Imagine there are idle workers and there are tasks, but all non-empty work streams are already acquired. Due to the serial semantics, free workers would shuffle the available tasks around until some previous work is completed and the next item can be taken from that queue. It would be good to have threads sleep in this case too; that should be possible, but I will stop here for now :) It's probably worth a blog entry already; maybe I will publish it as such later.
0 Kudos
nagy
New Contributor I
326 Views
> Spawned TBB tasks are not ordered in any way. This is why I said that using them does not suit your case.
>
> > I will have probably 6-7 of these objects running
>
> If this means that you have 6-7 streams of in-order work chunks, then using tasks can even be harmful if you are thinking of 6-7 tasks, each polling a concurrent_queue of its own. The problem is that TBB does not guarantee a particular level of concurrency: if the TBB task pool was initialized with, say, 3 worker threads, only 3 of your work streams will make progress, while the others will be stalled indefinitely. Besides, waiting inside TBB tasks is harmful by itself, as it prevents TBB workers from doing other work that may be available at the moment.
>
> Thus you either need a separate thread per work stream (if work chunks from different streams can be processed concurrently), or a single thread that polls the several concurrent queues in a round-robin fashion using the non-blocking try_pop method.
>
> Lastly, if your work chunks are large enough, you can use TBB parallel algorithms to process them.
None of the "tasks" are polling or blocking. They perform work they were assigned when spawned and then put their result in a queue. But the result must come in the same order as the tasks were spawned.
0 Kudos