Intel® oneAPI Threading Building Blocks

Shutting down task arena while enqueuing tasks

Lucian_T_
Beginner

Hello

I'm having problems shutting down a task arena while there are tasks running in it. The specific problem is that I hit a TBB assert while I'm trying to shut down and an already-running task is trying to enqueue other work into the same arena.

The program has the following requirements with respect to the task arena:

  • the task arena can be shut down while we still have tasks enqueued to it
  • while shutting down the task arena, we cancel all the tasks from it
  • while shutting down the task arena, we must wait for all the in-flight tasks to complete
  • the tasks in the arena can be arbitrarily complex (using all of TBB's mechanisms for creating concurrent work)
  • no asserts, no crashes

To illustrate the problem, I wrote a small snippet:

#include <tbb/task_scheduler_init.h>
#include <tbb/task_arena.h>
#include <tbb/task_group.h>
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>

tbb::task_scheduler_init defInit;
tbb::task_arena* myTaskArena = new tbb::task_arena(4);
tbb::task_group* myTaskGroup = new tbb::task_group;

static const int numSmallTasks = 10000;
bool executed[numSmallTasks] = { false };

// Enqueue a big-task in the arena
myTaskArena->execute([&] {
    myTaskGroup->run([&] {

        // This task will use a parallel_for to spawn a lot of other small tasks
        tbb::parallel_for(tbb::blocked_range<int>(0, numSmallTasks, 1), [&] (const tbb::blocked_range<int>& r) {
            for (int i = r.begin(); i < r.end(); i++)
            {
                // enqueue a small task into the arena
                myTaskArena->execute([&] {
                    myTaskGroup->run([&] {
                        // really small task
                        executed[i] = true;
                    });
                });
            }
        });

        // end big-task
    });
});
sleep(10); // make sure the big-task started to execute

// other tasks are continuously enqueued into my arena

// At some point, we want to shutdown the arena

// First, cancel all the in-flight tasks
myTaskGroup->cancel();
// Now, wait for the existing tasks to complete (we do the wait inside the arena)
myTaskArena->execute([&] {
    myTaskGroup->wait();
});

// We are done
delete myTaskArena;
delete myTaskGroup;

// expected behavior: not all the small tasks are executed
int countExecuted = 0;
for ( int i=0; i<numSmallTasks; i++ )
{
    if ( executed[i] )
        countExecuted++;
}
printf("%d < %d\n", countExecuted, numSmallTasks);

After I run the above code, I hit a TBB assert:

File: d:\myrepo\tbb\src\tbb\custom_scheduler.h
Line: 706
Expression: !is_worker() || !CancellationInfoPresent(*my_dummy_task)

(I'm using TBB 2017 update 1, interface version 9101)

Looking at the TBB code, I don't see how this problem can be avoided: whenever I try to cancel tasks, I hit this assert. And I do need to cancel, to ensure that the shutdown process is as fast as possible.

Am I missing something? Is there another way to make this work?

Or is this a TBB bug?

Thank you very much

jimdempseyatthecove
Honored Contributor III

At lines 11 and 16 of your snippet (the start of the big-task body, and right before each small task is enqueued), try adding:

    if(myTaskGroup->is_canceling()) return;
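
That is, with the checks placed like this (a sketch based on your snippet; note that the innermost run lambda captures i by value, to avoid a dangling reference):

    myTaskArena->execute([&] {
        myTaskGroup->run([&] {
            if (myTaskGroup->is_canceling()) return;  // bail out once cancelled
            tbb::parallel_for(tbb::blocked_range<int>(0, numSmallTasks, 1),
            [&](const tbb::blocked_range<int>& r) {
                for (int i = r.begin(); i < r.end(); i++)
                {
                    if (myTaskGroup->is_canceling()) return;  // stop enqueuing small tasks
                    myTaskArena->execute([&] {
                        myTaskGroup->run([&, i] {
                            executed[i] = true;  // really small task
                        });
                    });
                }
            });
        });
    });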

Jim Dempsey

Lucian_T_
Beginner

Tried this. Still doesn't work.

The assert is occurring less often, but it still occurs. The reason is that parallel_for can spawn tasks "under the hood", and there is no way for me to inject the cancellation check inside parallel_for.

(Another reason this can't fully work: it would be a race condition. The check could pass, and cancellation could then occur just before the actual enqueue.)

Thank you very much

Alexey-Kukanov
Employee

Hi Lucian,

I have tested a slightly modified version of your sample with recent TBB and two compilers: VS2015 and gcc 6.3. In both cases the sample worked just as expected: I ran it 1000 times in a row and got no failures.

The changes I made are not related to task_arena, task_group or cancellation:

    tbb::task_scheduler_init defInit;
    tbb::task_arena* myTaskArena = new tbb::task_arena(4);
    tbb::task_group* myTaskGroup = new tbb::task_group;

    static const int numSmallTasks = 10000;
    static const int repeat = 1;
    bool* executed = new bool[numSmallTasks*repeat];
    memset(executed,0,numSmallTasks*repeat);

    for (int k = 0; k < repeat; ++k) {
        // Enqueue a big-task in the arena
        myTaskArena->execute([&,k] {
            myTaskGroup->run([&,k] {
                // This task will use a parallel_for to spawn a lot of other small tasks
                tbb::parallel_for(tbb::blocked_range<int>(0, numSmallTasks, 1),
                [&,k](const tbb::blocked_range<int>& r) {
                    for (int i = r.begin(); i < r.end(); i++)
                    {
                        // if (myTaskGroup->is_canceling()) return;
                        myTaskGroup->run([&,k,i] {
                            // really small task
                            executed[k*numSmallTasks +i] = true;
                        });
                    }
                });
                // end big-task
            });
        });
    }
    Sleep(10); // make sure the big-task started to execute

    // At some point, we want to shutdown the arena
    // First, cancel all the in-flight tasks
    myTaskGroup->cancel();
    // Now, wait for the existing tasks to complete (we do the wait inside the arena)
    myTaskArena->execute([&] {
        myTaskGroup->wait();
    });

    // We are done
    delete myTaskArena;
    delete myTaskGroup;

    // expected behavior: not all the small tasks are executed
    int countExecuted = 0;
    for (int i = 0; i < numSmallTasks*repeat; i++)
    {
        if (executed[i])
            countExecuted++;
    }
    printf("%d < %d\n", countExecuted, numSmallTasks*repeat);
    delete[] executed;

Alexey-Kukanov
Employee

Do you build TBB on your own, or use a pre-built version? If you build it yourself, what is the platform, and what command(s) do you use to build TBB?

Lucian_T_
Beginner

Hi Alexey

I have a custom TBB build (we added support for WinCE, plus some profiling code), with asserts enabled and task group context enabled.

I cannot see why your code would work and mine would not. Do you have asserts enabled in your build? (I'll make sure I also play around with your code.)

Thank you very much!

Alexey-Kukanov
Employee

Yes, I used TBB debug builds with assertions enabled.

Lucian_T_
Beginner

Hi Alexey. Your answer made me clarify the problem I was running into. My code above contained a memory error that led me to believe the simplified test actually reproduced the "original" problem.

In my original (non-simplified) code, I was trying to pass the cancel directly to the context of the task_arena (yes, the one that's protected). That context behaves differently from user-defined contexts. The assert in custom_scheduler.h was checking (indirectly) whether cancellation was set on the context of the entire task_arena (it actually checks that the context of the root my_dummy_task hasn't been cancelled, and that context is set to the my_context of the task_arena).

I know now that I shouldn't try to cancel the top-level task arena context, but I can cancel any child contexts. That means I have to run every operation through a custom task_group (as in the example above).

Thank you very much.

Lucian_T_
Beginner

Hi again

I now realize some downsides of coupling a task_arena with a task_group for proper cancellation:

  • I can't enqueue into my arena anymore. This is because I always need to go through the task_group, and task_group doesn't offer any enqueuing functionality (only spawning).
  • I can't associate priorities with my tasks anymore; again, a consequence of always going through the task_group.
  • Each time I need to add a new task to the arena, I have to go through this task_group; this means I have to change all the client code that uses only a task_arena to also use a task_group.

Is there a way for me to cancel all tasks without going through a task_group?

Thank you very much

Alexey-Kukanov
Employee

Hi Lucian,

Let me first understand what you do or want to be able to do (and please expand the list if I miss important things):

  • Use an explicit task_arena and enqueue some "big" tasks to it;
  • These tasks might use TBB algorithms, and might also create independent "small" tasks (as in the reproducer);
  • You want to cancel, at once, everything that was submitted to the arena;
  • You also want to be able to set priorities.

I also have some questions to that:

  1. Why do you prefer to use task_arena::enqueue()? Is it solely because of asynchronous execution, or are there other considerations?
  2. Is there a benefit from submitting the innermost small tasks, as opposed to executing them directly on the current thread? And do you go through the task_arena interface for these tasks because of certain design limitations, or just because you think of it as an easy way to submit an independent task?
  3. With regard to priorities: do you want to set/change the same priority for everything submitted into the arena, or do you want to prioritize some tasks over the rest?

Overall, I tend to think that the right solution for you may be to inherit from task_arena and customize it to add some of the task_group properties/behavior. In TBB these two classes, though somewhat similar semantically, were designed to serve different purposes and have separate responsibilities. But in your use case you appear to treat the task_arena as one big task_group, so an interface that gives the benefits of both, and possibly also addresses some shortcomings, could be most appropriate.
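
For illustration only, a minimal sketch of what such a hybrid could look like (the class and method names here are made up, not TBB API):

    #include <tbb/task_arena.h>
    #include <tbb/task_group.h>

    // A task_arena that also behaves like one big task_group, so everything
    // submitted through it can be cancelled and awaited on shutdown.
    class cancellable_arena : public tbb::task_arena {
        tbb::task_group my_group;  // every job is tracked by this group
    public:
        explicit cancellable_arena(int max_concurrency)
            : tbb::task_arena(max_concurrency) {}

        // Submit work: it runs inside the arena, tracked by the group.
        template <typename F>
        void run(const F& f) {
            execute([this, f] { my_group.run(f); });
        }

        // Cancel everything submitted so far, then wait for in-flight tasks.
        void shutdown() {
            my_group.cancel();
            execute([this] { my_group.wait(); });
        }
    };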

Lucian_T_
Beginner

Hi Alexey

A little bit of background: we have a large application that we want to migrate from a threads-and-locks model to a task-based system. It's more important for us to have a good grip on latency than on throughput. Therefore, we need to partition the worker threads in our application: we want to ensure that multiple components make progress at the same time, and that one component doesn't block another simply because it creates more tasks. For that reason, using a task_arena is a must for us. Then, we want to enqueue tasks with different priorities, to ensure that some tasks are (almost) always executed ahead of others. A final must-have is the ability to cleanly shut down the different task_arenas as we shut down different components (cancel all the enqueued tasks, wait for the in-flight ones to finish, and only then destroy the arena object).

The example above was just trying to exercise the creation of a lot of tasks. We do have "big tasks" that contain a parallel_for, but I agree with you: enqueuing other tasks from inside a parallel_for doesn't necessarily make sense (though I wouldn't exclude the possibility of doing it, even accidentally).

Responses on the above points:

  1. When I wrote about losing the possibility of enqueuing, I had in mind the big distinction between enqueuing and spawning tasks (one adds tasks to the end of a shared FIFO queue, the other to the front of a worker's local queue). I now believe this shouldn't matter; I still need to check, but I suspect that even if we spawn tasks with a task_group, we would still tend to execute them in the order we spawn them.
  2. As explained above, I need to partition my worker threads. That's why I would always want to go through a task_arena.
  3. The priorities would be assigned mostly per type of task. We would "add" some high-prio tasks and some low-prio tasks, and we need the high-prio tasks executed (as much as possible) before the low-prio ones. I would also like to benefit from the inheritance of priorities through task_group_context: if a high-prio task spawns a lot of child tasks (for example in a parallel_for), I want all of them to execute with high priority, as in the sketch after this list.
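
To illustrate the kind of inheritance I mean, a small sketch (this assumes the parallel_for overload that takes a task_group_context; child task contexts are bound to it by default, which is what propagates the priority):

    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range.h>
    #include <tbb/task.h>

    tbb::task_group_context high_ctx;
    high_ctx.set_priority(tbb::priority_high);

    // Every task created by this parallel_for runs in a context bound to
    // high_ctx, so it inherits the high priority.
    tbb::parallel_for(tbb::blocked_range<int>(0, 1000),
                      [](const tbb::blocked_range<int>& r) { /* work */ },
                      tbb::auto_partitioner(), high_ctx);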

It's funny that you say to inherit from task_arena; that's exactly what I initially did (see also the code I posted at https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/721135). If I directly cancel the task_group_context inside the task_arena, I get the assertion failure mentioned above whenever I shut down my task_arena while it runs a task that spawns new tasks (for example through parallel_for).

The approach I'm currently taking is to implement a class similar to task_group, in which I expose the inner task_group_context. After all, a task_group class can be implemented with the low-level primitives that TBB exposes. Managing my own task_group_context means I can set different priorities for my task_group-like class. Then, in the class that wraps the task_arena, I create three of these custom-made task_groups, one for each priority. I can then write the old "enqueue(task, prio)" interface in terms of these new task_arena + task_group abstractions.
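
For illustration, a minimal sketch of such a task_group-like class, modelled on what tbb::task_group does internally, but owning its context (the names are mine, and error handling is omitted):

    #include <tbb/task.h>

    // A tbb::task that runs a stored functor.
    template <typename F>
    class function_task : public tbb::task {
        F my_func;
        tbb::task* execute() override { my_func(); return nullptr; }
    public:
        explicit function_task(const F& f) : my_func(f) {}
    };

    // task_group-like class whose task_group_context is owned and exposed by
    // us, so we can set a priority on it and cancel it directly.
    class priority_task_group {
        tbb::task_group_context my_context;
        tbb::empty_task* my_root;
    public:
        explicit priority_task_group(tbb::priority_t prio)
            : my_context(tbb::task_group_context::isolated,
                         tbb::task_group_context::default_traits |
                         tbb::task_group_context::concurrent_wait) {
            my_context.set_priority(prio);
            my_root = new (tbb::task::allocate_root(my_context)) tbb::empty_task;
            my_root->set_ref_count(1);  // the extra reference belongs to wait()
        }
        ~priority_task_group() { tbb::task::destroy(*my_root); }

        template <typename F>
        void run(const F& f) {
            // allocate_additional_child_of bumps the root's ref count atomically
            tbb::task::spawn(*new (tbb::task::allocate_additional_child_of(*my_root))
                                 function_task<F>(f));
        }
        void cancel() { my_context.cancel_group_execution(); }
        bool is_canceling() { return my_context.is_group_execution_cancelled(); }
        tbb::task_group_context& context() { return my_context; }
        void wait() {
            my_root->wait_for_all();  // concurrent_wait keeps the ref count at 1
            if (my_context.is_group_execution_cancelled())
                my_context.reset();   // allow reuse after a cancellation
        }
    };

The arena wrapper then holds three of these (high/normal/low priority) and dispatches the old enqueue(task, prio) calls to the matching group from inside task_arena::execute().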

This new approach doesn't use any protected interface from TBB, so it should work. I still feel that TBB should expose a little more functionality around task_arena. Maybe there is another way to do this, and I just don't know it.

Thank you very much

Alexey-Kukanov
Employee

What you plan to do makes sense to me. I have a few more comments, just in case those can be useful.

 we need a partitioning the worker threads in our application: we want to ensure multiple components make progress at the same time, and not that one component is blocking another component because it creates more tasks

Enqueued tasks (including jobs submitted via task_arena::enqueue) were designed for forward progress/lack of starvation. So maybe enqueueing jobs into a single arena is sufficient; but if you want control over the distribution of cores between components, arenas are the way to go.

I still need to check, but I guess that even if we spawn tasks with task_group, we would still tend to execute them in order that we spawn them.

For worker threads, yes - they steal task_group tasks in FIFO order. For the thread that calls task_group.run() and then task_group.wait(), processing is done in the reverse order of run() calls (LIFO).

I need to partition my worker threads. That's why I would always want to go through a task_arena.

In your sample code, the inner execute() targets the same arena the task_group already runs in; this is redundant, and submitting to the task_group directly is enough. If, however, you want to submit tasks from one arena to another, then you do need execute().

we need to have the high-prio tasks executed (as much as possible) before the low-prio tasks.

Once a single high-priority task is detected, all tasks of lower priority that have not yet started are postponed until all higher-priority tasks are done. So you should definitely get what you want; but priority changes are pretty expensive, and if you use them a lot, performance might suffer.

I still feel like TBB should expose a little bit more functionality with regards to task_arena.

Or maybe task_group. Exposing and allowing users to set the context for task_group is on the feature list, though not yet in plans. We can perhaps consider extending task_arena::enqueue() with an explicit context parameter, or think of other ways to make enqueued jobs cancelable. Waiting for work completion in the arena was considered, and we even tried to add it, but found it semantically ambiguous and dangerous: waiting just for the task pools to be empty is not enough, as some tasks might still be executing and potentially producing more work, while waiting for all threads to leave the arena could deadlock if called from inside the arena. So using a task_arena jointly with a task_group is the best way to ensure work completion. We are open to other suggestions for task_arena improvements; merging it with task_group is unlikely, but if there is something that could make your hybrid implementation easier, please let us know.

Lucian_T_
Beginner

Hi Alexey

Thank you very much for your response. It further clarifies things for me.

Alexey Kukanov (Intel) wrote:

Or maybe task_group. Exposing and allowing users to set the context for task_group is on the feature list, though not yet in plans. We can perhaps consider extending task_arena::enqueue() with an explicit context parameter, or think of other ways to make enqueued jobs cancelable. [...]

I don't have a lot of experience with TBB, but the way I see it, a task context is an essential piece of controlling how tasks get executed. So adding it as a parameter to task_arena::enqueue, and exposing it for task_group (and, why not, for task_arena too) makes a lot of sense to me. It would give users a little more control.

Thank you very much

Lucian_T_
Beginner

Some further problems related to task_arena/task_group combination are posted at https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/747250.

For the time being I'll revert to the proposed solution of deriving from task_arena and calling cancel on the main context of the arena.

Alexey-Kukanov
Employee

For the time being I'll revert to the proposed solution of deriving from task_arena and calling cancel on the main context of the arena.

Cancelling through the arena context will, for now, have problems at least in the case of nested parallelism within the enqueued tasks. I recommend creating a separate context in the derived class and reimplementing the enqueue method to use that separate context (see enqueue_impl in the TBB sources for an example).
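
To make the idea concrete, a rough sketch of what that could look like at user level (an illustration only: the class and helper names are made up, and it relies on execute() making the calling thread act inside this arena, so that task::enqueue targets this arena's FIFO):

    #include <tbb/task.h>
    #include <tbb/task_arena.h>

    // A task that runs a stored functor (made-up helper, not TBB API).
    template <typename F>
    struct enqueued_task : tbb::task {
        F func;
        explicit enqueued_task(const F& f) : func(f) {}
        tbb::task* execute() override { func(); return nullptr; }
    };

    class my_arena : public tbb::task_arena {
        tbb::task_group_context my_ctx;  // separate, cancellable context
    public:
        explicit my_arena(int max_concurrency)
            : tbb::task_arena(max_concurrency),
              my_ctx(tbb::task_group_context::isolated) {}

        template <typename F>
        void enqueue(const F& f) {
            // The task is allocated in my_ctx, not in the arena's own
            // context, so cancelling my_ctx does not touch the arena.
            execute([this, f] {
                tbb::task::enqueue(*new (tbb::task::allocate_root(my_ctx))
                                       enqueued_task<F>(f));
            });
        }
        void cancel_all() { my_ctx.cancel_group_execution(); }
    };

Waiting for completion would still need a task_group-like mechanism on top, as discussed above.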

Lucian_T_
Beginner

Why would cancelling the main arena context have problems with nested parallelism in the enqueued tasks? At least in my example, the big-task -> parallel_for -> small-task chain seems to work OK (apart from the TBB asserts).

Is there something fundamentally different about cancelling a task_arena's context compared to cancelling a regular context?

Thank you very much

Alexei_K_Intel
Employee

Cancelling the main task_arena context leads to the following issues:

  • The task_arena is left in a possibly unexpected state: any other tasks enqueued to the task_arena will be cancelled automatically, and there is no way to reset this state.
  • Even if we introduced a way to reset this state, it is unclear how an enqueued task could report that cancellation was requested (e.g. that an exception was thrown), because no one waits for its completion. If a task_group is used to wait for enqueued tasks, then the cancellation/exception is propagated to the waiting thread through the context of the task_group.

Regards,
Alex

Lucian_T_
Beginner

Hi Alex

Neither of these is a problem in my case. I just want to cancel everything before the component shuts down and then delete the task_arena object itself. By the time I need to cancel the top-level context of the arena, I'm no longer interested in the completion status of the tasks, and having any enqueued tasks automatically cancelled is perfect.

What do you mean by the task_arena being "left in a possibly unexpected state"? Is there anything more than the two points above that I should worry about (apart from the two asserts that occasionally fail in TBB)?

Thank you very much
LucTeo
