Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Need help with pipeline deadlock

benz__alexander
Beginner
1,786 Views

Hi everyone,

I have a problem concerning proper termination of pipelines and I'm running out of ideas on where the problem might be.

The setup is as follows. I have one class that handles loading and decompressing of video streams. This process is realised via the TBB pipeline mechanism. Once the start() method of my class is called, it begins loading and decompressing frames and delivers them to a follow-up component.

Calling start on the class spawns a thread which initializes the scheduler and calls pipeline.run() (which blocks until the pipeline is done). Calling stop on the class tells the first tbb::filter to stop loading frames and to return NULL (to terminate the pipeline). There are multiple instances of this class running (with different input streams).

The problem I have is that once in a while when calling stop, the first filter returns with NULL but the pipeline is not stopped, which results in the main thread method (the one that called pipeline::run()) not returning.

Inspecting the threads, there are a couple of them waiting in:

tbb::internal::GenericScheduler::wait_while_pool_is_empty ()

but none of the threads is executing inside the instance that blocks.
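For reference, the stop mechanism described above can be sketched with a minimal stand-alone model (plain C++, no TBB; FrameSource, next_frame and the dummy frame are hypothetical stand-ins for the real input filter and its operator()):

```cpp
#include <atomic>

// Plain-C++ model of the first pipeline stage's stop logic (no TBB here;
// FrameSource and next_frame are hypothetical stand-ins for the real
// tbb::filter and its operator()).
struct FrameSource {
    std::atomic<bool> stop_requested{false};
    int frames_left = 3;   // pretend the stream has a few frames remaining

    // Models the input filter: returning nullptr (NULL) is the token that
    // tells the pipeline to terminate.
    void* next_frame() {
        if (stop_requested.load(std::memory_order_acquire) || frames_left == 0)
            return nullptr;
        --frames_left;
        static int dummy_frame = 0;
        return &dummy_frame;   // placeholder for a decoded frame
    }

    // Called from another thread by the owning class's stop() method.
    void stop() { stop_requested.store(true, std::memory_order_release); }
};
```

The flag itself behaves as expected; the hang discussed below is in the scheduler, not in this filter logic.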

Any help is appreciated,

- ALex

0 Kudos
61 Replies
Dmitry_Vyukov
Valued Contributor I
395 Views
We won't know the answer to that until we try it out. If the stall issue turns out to be significant, we may have to create >P software threads on a system with P hardware threads, and somehow block/unblock the software threads so that there are only P running at a time. Or maybe it will turn out that a little occasional oversubscription is not a big problem. There's much to be learned by implementing and trying it out.

Context switches are indeed cheap now, so limited oversubscription can be the answer. But there are other problems - most notably per-thread memory consumption in the scalable allocator (it seems you would nevertheless have to add something like scalable_alloc_flush()), and other per-thread resources (stack, maybe some user per-thread resources).

0 Kudos
Dmitry_Vyukov
Valued Contributor I
395 Views
Quoting - Dmitriy Vyukov

A way to implement the policy is to prevent threads from simultaneously having two employers ("moonlighting"). The formal rules are:

  1. A thread is either employed by a thread (possibly itself) or unemployed.
  2. A master thread is always employed by itself.
  3. A worker thread is unemployed if it is not running any tasks and its task pool is empty. Otherwise a worker is employed.
  4. A worker can change from unemployed to employed by stealing work. The employer of the thief is the employer of the victim. This rule also covers tasks obtained via advertisements in a thread's mailbox.
  5. For two distinct employers X and Y, a thief employed by X cannot steal a task from a victim employed by Y.

And why not just add an additional check into the inner-most master-thread loop? Now it looks like:

void master_thread_loop()
{
    for (;;)
    {
        while (task* t = pop_from_my_stack())
        {
            process(t);
        }

        if (parent_task->reference_counter == 0)
            break;

        steal_and_block_and_etc();
    }
}

The loop must be modified this way:

void master_thread_loop()
{
    for (;;)
    {
        while (task* t = pop_from_my_stack())
        {
            process(t);

            if (parent_task->reference_counter == 0)
                break;
        }

        if (parent_task->reference_counter == 0)
            break;

        steal_and_block_and_etc();
    }

    if (my_stack_is_not_empty)
        transfer_my_stack_to_somewhere();
}

I believe this will also fix the problem. But... hmmm... this looks simpler, and the additional overhead in the inner-most loop is negligible. And there will be no induced stalls, no oversubscription, and no additional memory consumption in the scalable allocator.

The master thread is just free to exit the game as soon as it wants.

What do you think?

IIRC I've seen something similar in Doug Lea's Java Fork/Join Framework (which is also a Cilk clone).
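The effect of the added inner check can be illustrated with a small single-threaded toy model (all names here are made up; where a real scheduler would steal or block, the model simply returns, and transferring the leftover stack is modeled by counting it):

```cpp
#include <functional>
#include <vector>

// Toy model of the patched loop: the "master" drains its local task pool but
// re-checks the root's reference counter after every task, so it can leave
// as soon as its own root task has completed, even with work still queued.
struct ToyScheduler {
    std::vector<std::function<void()>> stack;   // local task pool (LIFO)
    int root_refcount = 1;                      // 0 => this master's root task finished
    int leftover = 0;                           // tasks handed off on early exit

    void run_master_loop() {
        while (!stack.empty()) {
            auto t = stack.back();
            stack.pop_back();
            t();                                // process(t)
            if (root_refcount == 0)             // the check added inside the inner loop
                break;
        }
        leftover = (int)stack.size();           // transfer_my_stack_to_somewhere()
        stack.clear();
    }
};
```

With two queued tasks where the later-pushed one completes the root, the loop exits immediately after it, leaving the other task to be transferred instead of executed by this master.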

0 Kudos
Dmitry_Vyukov
Valued Contributor I
395 Views

Well, worker threads can (must?) also check the parent task's counter more frequently, but only in recursive scheduler loops. And master threads must transfer their uncompleted tasks only in non-recursive scheduler loops. So here is the patched version:

void scheduler_loop()
{
    for (;;)
    {
        while (task* t = pop_from_my_stack())
        {
            process(t);

            if ((recursive || master) && parent_task->reference_counter == 0)
                break;
        }

        if (parent_task->reference_counter == 0)
            break;

        steal_and_block_and_etc();
    }

    if (recursive && my_stack_is_not_empty)
        transfer_my_stack_to_somewhere();
}

I've not worked out all the details, but I believe the idea works.

0 Kudos
RafSchietekat
Valued Contributor III
395 Views

"What do you think?" I think that I don't understand what you're trying to do, but maybe I'm the only one? (I am a bit distracted by very strange problems on the way to floating-point atomics, where I only recently managed to solve a crash in optimised builds by using unions instead of reinterpret_cast(), even before any floating-point type was in sight, so that g++ version I'm using is looking mighty suspicious to me now.)

0 Kudos
Dmitry_Vyukov
Valued Contributor I
395 Views
Quoting - Raf Schietekat

"What do you think?" I think that I don't understand what you're trying to do, but maybe I'm the only one?

I am trying to fix the error described in the first post in this topic.

The master thread doesn't return from spawn_root_and_wait() because it is processing tasks from another pipeline, and is not even checking whether its root task (pipeline) has already finished. If the thread will be checking whether its root task has already finished, then the deadlock is resolved and the problem is solved.

0 Kudos
RafSchietekat
Valued Contributor III
395 Views

"If the thread will be checking whether its root task has already finished, then the deadlock is resolved and the problem is solved." I can see how that might sometimes help, but not how this would actually prevent the thread from still sometimes/often getting caught up in a never-ending story. You would need to guarantee that task closures (a task and its descendants) execute in a bounded time, which happens to be the basic requirement to make global progress possible, but then why run all these additional master threads instead of an arguably more efficient "FIFO"-capable scheduler? How will they be reined in from potentially massive oversubscription (how would you serve 10000 pipelines?), making suboptimal context-switch choices, littering the process with stacks, etc.? Isn't this just retracing the earlier history of multithreading?

I think it's a shame that TBB started with a promising approach (task-based optional concurrency), started with a mistake (stealing LIFO instead of FIFO, thus only supporting finite workloads), and now, in a panic, bolts off in the opposite direction.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
395 Views
Quoting - Raf Schietekat

"If the thread will be checking whether its root task has already finished, then the deadlock is resolved and the problem is solved." I can see how that might sometimes help, but not how this would actually prevent the thread from still sometimes/often getting caught up in a never-ending story.

Please, describe how "never-ending" story can happen if we will apply my proposed patch? (I don't see)

Quoting - Raf Schietekat

You would need to guarantee that...

NO! Master thread is free to exit the game as soon as he wants (even during "never-ending" story).

Quoting - Raf Schietekat

... instead of an arguably more efficient "FIFO"-capable scheduler?

FIFO-capable scheduler won't help in this situation (IIRC we already was here). Master thread can be executing infinite sequence of tasks in any order, but if it will be not checking root task's reference counter it will be in infinite loop!

0 Kudos
Dmitry_Vyukov
Valued Contributor I
395 Views
Quoting - Raf Schietekat

How will they be reined in from potentially massive oversubscription (how would you serve 10000 pipelines?), making suboptimal context-switch choices, littering the process with stacks, etc.? Isn't this just retracing the earlier history of multithreading?

I think it's a shame that TBB started with a promising approach (task-based optional concurrency), started with a mistake (stealing LIFO instead of FIFO, thus only supporting finite workloads), and now, in a panic, bolts off in the opposite direction.

This is a different problem, and I am NOT trying to solve it now.

If you have 10000 threads with independent work, then why you need TBB at all?! You already have sufficient parallelism. Just execute every work in its thread.

0 Kudos
RafSchietekat
Valued Contributor III
395 Views

"Please, describe how "never-ending" story can happen if we will apply my proposed patch? (I don't see)" New master steals task, task spawns children and waits_for_all(), child spawns grandchildren and waits_for_all(), some descendant gets the brilliant idea to run an unbounded client-server operation, ...

"NO! Master thread is free to exit the game as soon as he wants (even during "never-ending" story)." How would you abandon a stolen task's execute() before it returns? Are we still talking about 1:1 schedulers?

"FIFO-capable scheduler won't help in this situation (IIRC we already was here). Master thread can be executing infinite sequence of tasks in any order, but if it will be not checking root task's reference counter it will be in infinite loop!" We already were here, but it seems that some additional effort is required to convince you. :-) With a FIFO-capable scheduler, we wouldn't be using a master thread for running the pipeline in the first place (you can do it all in one thread if need be). It is probably worthwhile repeating that: running 10000 pipelines at the same time, i.e., with overlapping activation periods, does not mean mandatory concurrency in the sense of requiring multithreading (whether they be kernel threads or user threads or a combination) and/or parallelism (multiple cores and/or hyperthreading), and all "context switches" are where the user chooses them to be.

"This is different problem, and I an NOT trying to solve it now." It was meant as a general remark, but you're also thinking threads instead of tasks here.

"If you have 10000 threads with independent work, then why you need TBB at all?! You already have sufficient parallelism. Just execute every work in its thread." The thing is that I don't want 10000 threads, only as many worker threads as will keep the hardware busy.

0 Kudos
Alexey-Kukanov
Employee
395 Views
Quoting - Raf Schietekat
I think it's a shame that TBB started with a promising approach (task-based optional concurrency), started with a mistake (stealing LIFO instead of FIFO, thus only supporting finite workloads), and now, in a panic, bolts off in the opposite direction.

We do steal FIFO; who said the opposite? The problem of only supporting finite workloads is different than that.

And we are not in a panic; nor do we bolt off "in the opposite direction". Support for multiple master threads has been in TBB since the very first version; it is not being invented to solve the problem that initiated this thread, as you might think. We do not see a reason why customers should not start TBB algorithms from different threads if they wish so, and so this mode is supported. The plan Arch proposed is intended to improve this support, because it's not the first time our customers get surprised that two independent jobs (I will call a set of tasks to solve some problem a job) become "tightly coupled" inside TBB so that they only finish together. The algorithm Dmitry proposed is also intended to "decouple" what should not be coupled; and we consider something like that as well.

The ability to start a few independent jobs from a single thread is also a known and popular feature request for us. The very simple and natural thing would be to convert it into nested parallelism, by spawning a task for each job, let the task be stolen, and an algorithm started by the thief; and this will likely be the solution we will implement first of all. However, it does not suit the additional requirement of making simultaneous progress in each of the jobs, because in general it does not provide such a guarantee if there are more jobs than available HW threads.

As I already said, the requirement of making simultaneous progress means that either the jobs are not really independent, or for some reason they require mandatory concurrency (one may argue that it is essentially the same in this case - the jobs are not really independent, and thus require mandatory concurrency, as e.g. in a producer-consumer pattern). The mandatory concurrency does not scale to thousands of jobs. I can't imagine the need to start 100000 pipelines simultaneously and require that every single one should constantly make progress. If those are truly independent, it should not matter in which order the execution happens (and then this thing would be easy to convert into nested parallelism). Otherwise, the whole thing should better be reworked to group dependent jobs together.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
395 Views
Quoting - Raf Schietekat

"Please, describe how "never-ending" story can happen if we will apply my proposed patch? (I don't see)" New master steals task, task spawns children and waits_for_all(), child spawns grandchildren and waits_for_all(), some descendant gets the brilliant idea to run an unbounded client-server operation, ...


Damn! You are 200% right!

Blocking wait for children is so awful that I even forgot about it. In my library I just don't support blocking waits, which solves many problems. And IIRC original Cilk automatically transforms blocking waits to continuations...

Hmmm... But how will FIFO help here? If a task uses a blocking wait, then you can't return from it until all child tasks complete. You can't force it anyhow. Your FIFO trick will work only if tasks don't use blocking waits too. No?

Quoting - Raf Schietekat
"NO! Master thread is free to exit the game as soon as he wants (even during "never-ending" story)." How would you abandon a stolen task's execute() before it returns? Are we still talking about 1:1 schedulers?

I can't.

So the only solution I see for now is Arch's proposal about "Mini Concurrency Runtime"...

Quoting - Raf Schietekat
"FIFO-capable scheduler won't help in this situation (IIRC we already was here). Master thread can be executing infinite sequence of tasks in any order, but if it will be not checking root task's reference counter it will be in infinite loop!" We already were here, but it seems that some additional effort is required to convince you. :-) With a FIFO-capable scheduler, we wouldn't be using a master thread for running the pipeline in the first place

I think it's orthogonal to scheduling order. A LIFO scheduler can gain from continuation style the same way as a FIFO scheduler.


Quoting - Raf Schietekat
(you can do it all in one thread if need be). It is probably worthwhile repeating that: running 10000 pipelines at the same time, i.e., with overlapping activation periods, does not mean mandatory concurrency in the sense of requiring multithreading (whether they be kernel threads or user threads or a combination) and/or parallelism (multiple cores and/or hyperthreading), and all "context switches" are where the user chooses them to be.

Hmmm... I have to think more on this... but if tasks will be using blocking waits... I think they can blow up the thread's stack... or be subject to induced deadlocks...

Quoting - Raf Schietekat

"This is different problem, and I an NOT trying to solve it now." It was meant as a general remark, but you're also thinking threads instead of tasks here.

"If you have 10000 threads with independent work, then why you need TBB at all?! You already have sufficient parallelism. Just execute every work in its thread." The thing is that I don't want 10000 threads, only as many worker threads as will keep the hardware busy.

It's you who mentioned threads first. If you don't want 10000 threads then don't start them. Start pipelines one-by-one (or N-by-N). I think it is not a very good idea to start 10000 root tasks simultaneously anyway.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
395 Views

The algorithm Dmitry proposed is also intended to "decouple" what should not be coupled; and we consider something like that as well.

As Raf noted, there is an inherent problem with blocking waits in my proposal. Basically, if a task executes wait_for_all() in the middle of its execute() method, then only the Almighty can force it to return from execute().

I will think more on this, but initially I don't see any potential fixes for my proposal.

The ability to start a few independent jobs from a single thread is also a known and popular feature request for us. The very simple and natural thing would be to convert it into nested parallelism, by spawning a task for each job, let the task be stolen, and an algorithm started by the thief; and this will likely be the solution we will implement first of all.

Doesn't spawn_root_and_wait( task_list& root_list ) do the thing?

0 Kudos
Dmitry_Vyukov
Valued Contributor I
395 Views

As I already said, the requirement of making simultaneous progress means that either the jobs are not really independent, or for some reason they require mandatory concurrency (one may argue that it is essentially the same in this case - the jobs are not really independent, and thus require mandatory concurrency, as e.g. in a producer-consumer pattern). The mandatory concurrency does not scale to thousands of jobs. I can't imagine the need to start 100000 pipelines simultaneously and require that every single one should constantly make progress. If those are truly independent, it should not matter in which order the execution happens (and then this thing would be easy to convert into nested parallelism). Otherwise, the whole thing should better be reworked to group dependent jobs together.

I also think that the number of simultaneously running root tasks (or jobs) must be determined by the library, because in some sense it's the same as the number of threads.

The interesting area for research in this context is:

1. If there are too many master threads (or just too many executing root tasks, if asynchronous submission of root tasks will be supported), temporarily hold up newly arriving master threads - just block at the beginning of spawn_root_and_wait(). The obvious problem here is that mandatory concurrency (which the user can be relying on) is not respected, and thus this can lead to deadlocks - special care is required to prevent them.

2. ... Damn! I've already forgotten it while writing the first one...

0 Kudos
Dmitry_Vyukov
Valued Contributor I
395 Views
Quoting - Dmitriy Vyukov

I also think that the number of simultaneously running root tasks (or jobs) must be determined by the library, because in some sense it's the same as the number of threads.

The interesting area for research in this context is:

1. [...]

2. ... Damn! I've already forgotten it while writing the first one...

Oh OK, I've remembered.

2. Assume we have N logical processors, and we already have N simultaneously working master threads. Well, we can just turn off all worker threads and let each master thread do its own root task from beginning to end. No stealing, no decrease in locality, no oversubscription, no ..., only perfect utilization of hardware.

But the process is still under control. I.e. if the number of master threads decreases, or the application changes, or the hardware platform changes, the run-time is ready to take everything into its own hands.

0 Kudos
Alexey-Kukanov
Employee
395 Views
Quoting - Dmitriy Vyukov

The ability to start a few independent jobs from a single thread is also a known and popular feature request for us. The very simple and natural thing would be to convert it into nested parallelism, by spawning a task for each job, let the task be stolen, and an algorithm started by the thief; and this will likely be the solution we will implement first of all.

Doesn't spawn_root_and_wait( task_list& root_list ) do the thing?

It probably does most of it, but for the sake of ease of use it should be wrapped into something with a nice name that accepts lambdas :)
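Such a wrapper might look like the following minimal model (plain std::thread stands in here for TBB task spawning; run_jobs is a hypothetical name, not a TBB API):

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical convenience wrapper in the spirit of the post above: submit a
// set of independent jobs and wait for all of them to finish. Plain
// std::thread stands in for a TBB implementation, which would instead spawn
// one root task per job and call spawn_root_and_wait().
void run_jobs(std::vector<std::function<void()>> jobs) {
    std::vector<std::thread> workers;
    workers.reserve(jobs.size());
    for (auto& job : jobs)
        workers.emplace_back(std::move(job));
    for (auto& w : workers)
        w.join();   // the collective wait over the whole job set
}
```

A task-based version would of course reuse worker threads rather than creating one per job; the point is only the shape of the interface.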

0 Kudos
RafSchietekat
Valued Contributor III
395 Views

Just a few reactions at this time (it's Friday night for me):

Alexey: "We do steal FIFO; who said the opposite?" Maybe having another look at the code would convince you?

Alexey: "The problem of only supporting finite workloads is different than that." I must apologise here. I have been fretting ever since I wrote this what exactly was the problem with stealing LIFO. Maybe it's about fairness, maybe it's in conjunction with using a queue, but now I wish I had removed it again because I don't remember exactly and I don't want to be thinking about it now. If I don't come up with a rationale at some later point, feel free to rub it in, though. :-)

Alexey: "Support for multiple master threads is in TBB since the very first version" Sure, but I'm referring to it as a weakness here (because TBB simply has no other way to run multiple pipelines), fraught with problems that now need a hurried fix.

Dmitriy: "It's you who mention threads first. If you don't want 10000 threads then don't start them. Start pipelines one-by-one (or N-by-N). I think that is not very good idea to start 10000 root tasks simultaneously anyway." But I didn't start 10000 threads, the scheduler takes care of running those 10000 pipelines for me, whether it has 1 core and 1 thread or 16 cores and 32 threads to play with. Note that this is not something TBB can do at this time.

Dmitriy: "Doesn't spawn_root_and_wait( task_list& root_list ) do the thing?" If you don't mind spending a lot of time waiting...

0 Kudos
Alexey-Kukanov
Employee
395 Views
Quoting - Raf Schietekat
Just a few reactions at this time (it's Friday night for me):

It's Friday night for me too. Actually, already Saturday :)

Quoting - Raf Schietekat
Alexey: "We do steal FIFO; who said the opposite?" Maybe having another look at the code would convince you?

Oh, yeah. At the same level of depth, we steal LIFO, you are right. This is going to be fixed.

Quoting - Raf Schietekat
Alexey: "Support for multiple master threads is in TBB since the very first version" Sure, but I'm referring to it as a weakness here (because TBB simply has no other way to run multiple pipelines), fraught with problems that now need a hurried fix.

As I said, some customers want to run TBB algorithms from separate threads, for whatever reasons. One of the reasons might be e.g. that they already applied functional thread-level parallelism and are not going to change it for the moment, while they could exploit inner-level data parallelism in those threads. There could be plenty of other reasons. And TBB was designed to coexist with other threading approaches. You might think of supporting multiple masters as a weakness; I prefer to think of it as support of customer needs.

And I do not understand who is "fraught with problems that now need a hurried fix" - you, or you think us? For support of multiple pipelines running, see previous posts and below.

Quoting - Raf Schietekat
But I didn't start 10000 threads, the scheduler takes care of running those 10000 pipelines for me, whether it has 1 core and 1 thread or 16 cores and 32 threads to play with. Note that this is not something TBB can do at this time.

Why not? Run parallel_for(blocked_range(0,10000), MyPipelineStarter()); this won't provide simultaneous progress with each one, but I already said enough on this.

Quoting - Raf Schietekat
Dmitriy: "Doesn't spawn_root_and_wait( task_list& root_list ) do the thing?" If you don't mind spending a lot of time waiting...

Not waiting, but executing the job that needs to be executed. That's what relaxed sequential execution is about: you need to do some job now, and you do, but if spare resources are available, they are used as well.

If you want to offload the jobs and do something else, and then come back and check if the job is done, usually it means that there are some other threads that do the job asynchronously. TBB supports this model as well, though currently on the task level only, if you don't mind waiting (i.e. calling wait_for_all) at the check point, which is the usual case with asynchronous execution. At this point, if the job was not done, the originating thread will do it itself.

But if you do mind waiting at any given moment, now or later, it's called mandatory concurrency. Surprise, TBB supports this as well - via starting a separate tbb_thread :)

What TBB does not support is "let me offload the job and do something else, then maybe do a piece of the job expected to be done by workers, then do something else again, and so on". There is no way back from wait_for_all until all existing job is done. Are you looking for that?

0 Kudos
RafSchietekat
Valued Contributor III
395 Views

"I prefer to think of it as support of customer needs." TBB supports both task-based concurrency and thread-based concurrency... as long as the customer chooses the latter to run multiple pipelines even if he doesn't want to... but it doesn't actually work anyway.

'And I do not understand who is "fraught with problems that now need a hurried fix" - you, or you think us?' The support for multiple master threads is fraught with problems.

"Why not? Run parallel_for(blocked_range(0,10000), MyPipelineStarter()); this won't provide simultaneous progress with each one, but I already said enough on this." Well, that's what I'm saying: TBB doesn't support simultaneous progress yet (as in overlapping activation periods of optional concurrency as I defined it earlier).

"But if you do mind to wait at any given moment, now or later, it's called mandatory concurrency." No, it's called wasting performance by mandatory idle waiting if the only way to provide simultaneous progress is to loop over spawn_root_and_wait_for_all() in whatever guise. Even if I do the work of retrofitting pipeline with a perform_work() method (like ::CORBA::ORB's plus its invisible ramifications), and register my 10000 pipelines with a manager, looping over "parallel_for(blocked_range(0,10000), myManager);" may spend much of its time waiting at the gate, unless I pay the price associated with a low-enough grain size (and the more processors I have, the lower the grain size needs to be). And I have to do it that way, because my only alternative is to run each pipeline in its own thread... but then I can't use TBB's pipeline anyway, because 10000 of those would only trip over each others' shoe laces.

It doesn't have to be like that. Please take the next logical step. Proposing thread partitioning is embarrassing.

0 Kudos
Alexey-Kukanov
Employee
392 Views
Quoting - Raf Schietekat
"But if you do mind to wait at any given moment, now or later, it's called mandatory concurrency." No, it's called wasting performance by mandatory idle waiting if the only way to provide simultaneous progress is to loop over spawn_root_and_wait_for_all() in whatever guise.

I did not mean the current TBB state when I wrote the above quoted sentence. I meant the general situation when some job was offloaded and (since no desire to wait) expected to be performed by someone else (another thread, usually).

Also at the moment I wrote that, I did not fully realize that the key point is the potentially infinite nature of the data source for pipelines, which you probably had in mind from the beginning; thus the misunderstanding. Everything I said before was applicable to pipelines with a priori known finiteness of data.

Quoting - Raf Schietekat
TBB supports both task-based concurrency and thread-based concurrency... as long as the customer chooses the latter to run multiple pipelines even if he doesn't want to... but it doesn't actually work anyway.


Correction: multiple pipelines that never end unless stopped externally, and so are desired to provide simultaneous progress. I agree we do not yet do it right. Still, I think these infinite-work pipelines are rather for a design with asynchronous agents than for a (relaxed) sequential program. And I agree that TBB does not have support for (a big number of) asynchronous agents either.

Quoting - Raf Schietekat
The support for multiple master threads is fraught with problems.

If we had tried to make TBB ideal from the very beginning, either there still would be no TBB, or it would have appeared somewhat later and not been ideal anyway, because we simply cannot foresee all the ways customers try to use it. Though the need to isolate masters from each other might have come to our mind.

We might as well adjust the number of active workers depending on the number of active masters.

By the way, the topic starter did not explain why he starts pipelines from different threads; it might be because of no alternative, but might as well be because his program is designed this way for whatever other reasons.

0 Kudos
RafSchietekat
Valued Contributor III
392 Views

My point is that asynchronous agents, or at least a useful subset of them, including pipelines with overlapping activations, can and should be served by a relaxed sequential scheduler.

0 Kudos
Reply