Intel® oneAPI Threading Building Blocks

Detaching a task from a tbb::parallel_for_each

Uraza
Novice

Hi!

I am facing an issue when trying to "detach" a TBB task from a tbb::parallel_for_each.

Here is a simplified example showing this behavior: it runs a tbb::parallel_for_each, and each object A inside the loop spawns a task that sleeps for 10 seconds.

  class A {
   public:
    ~A() {
      _task.wait();
    }

    void Run() {
      _task.run(
          []() { std::this_thread::sleep_for(std::chrono::seconds(10)); });
    }

   private:
    tbb::task_group _task;
  };

  std::vector<A> a(5);

  std::cout << "Starting parallel_for_each at " << GetDate() << std::endl;

  tbb::parallel_for_each(a.begin(), a.end(), [](A& a_) {
    a_.Run();
  });

  std::cout << "parallel_for_each done at " << GetDate() << std::endl;

 

The output looks like this (the second cout is printed 10 seconds after the first one):

Starting parallel_for_each at Thu Jun 24 16:50:53 2021
parallel_for_each done at Thu Jun 24 16:51:03 2021

The tbb::parallel_for_each seems to be waiting for the TBB tasks to finish before giving back control to the main thread.

Is there a way to "detach" the TBB task from the parallel_for_each so that the program can keep running while the tasks from class A keep executing in parallel?

Thanks.

 

Edit: I am using TBB 2019 U3 on CentOS 6.6

SantoshY_Intel
Moderator

Hi,


Thanks for reaching out to us.


We are working on your issue and we will get back to you soon!


Thanks & Regards,

Santosh


Mark_L_Intel
Moderator

Hello,


Unfortunately, no. Please see:

https://docs.oneapi.com/versions/latest/onetbb/tbb_userguide/Cook_Until_Done_parallel_do.html


The instance of parallel_for_each does not terminate until all items have been processed.


Uraza
Novice

Hi.
That is what I do not understand.
The sleep is launched through a separate task_group (line 8 of the snippet, the _task.run call).
Isn't that supposed to be non-blocking, and thus let the parallel_for_each continue without waiting?
Or is there another way to achieve that behavior?

In the real code, the intent is to have some general processing happen in a parallel_for (that will be run several times throughout the program execution), with non-blocking tasks being triggered when some conditions are met.
Thanks for the feedback.

Mark_L_Intel
Moderator

Please also look at async Flow Graph nodes


https://link.springer.com/chapter/10.1007/978-1-4842-4398-5_18


Uraza
Novice

Yes, exactly, I am trying to avoid the barrier at the end of the parallel_for since the task that is spawned is independent from the rest of the processing done in the parallel_for.

Uraza
Novice

I found out that, for some reason, spawning the task from a task_arena gives the behavior that I expect.

The parallel_for gives back control to the main thread immediately (I see both prints happen without a 10-second gap in between), and then the program waits for the task_group to complete.

The documentation did not help me much in understanding why the task_arena worked in that case.

 

 

  class A {
   public:
    ~A() {
      _task.wait();
    }

    void Run() {
      _arena.execute([&]() {
        _task.run(
            []() { std::this_thread::sleep_for(std::chrono::seconds(10)); });
      });
    }

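   private:
    static tbb::task_arena _arena;  // explicit arena shared by all A objects
    tbb::task_group _task;
  };

  // A static data member needs an out-of-class definition.
  tbb::task_arena A::_arena;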

  std::vector<A> a(5);

  std::cout << "Starting parallel_for_each at " << GetDate() << std::endl;

  tbb::parallel_for_each(a.begin(), a.end(), [](A& a_) {
    a_.Run();
  });

  std::cout << "parallel_for_each done at " << GetDate() << std::endl;

 

Pavel_K_Intel1
Employee

Hello,

I will start with the original question.
1) Why does parallel_for_each block until all tasks are completed, even though the tasks run under a task_group?

This happens because when a user thread calls a blocking API that waits until work is completed (e.g. parallel_for_each, task_group::wait, etc.), there is no guarantee that the thread will execute only tasks related to the current context (unless you use isolation). In your example, the user thread that called parallel_for_each picks up tasks from the task_group while it waits, and that is why you observe this behavior.

2) Why does task_arena help in this case?

When the user thread calls parallel_for_each, the work is submitted to the implicit arena of the user thread. When you run the task_group tasks inside task_arena::execute, the work is submitted to a different, explicit arena.
So, in your example there are two arenas: the implicit arena of the user thread and the explicit task_arena.
The implicit arena of the user thread contains the tasks of parallel_for_each.
The explicit task_arena contains the tasks of task_group.

In oneTBB, a user thread can't execute tasks from an arena other than the one it is currently working in, which is why the last code sample works as you expected.

Uraza
Novice

Thanks for the detailed explanation.

Uraza
Novice

Thanks everyone for the support!

Pavel_K_Intel1
Employee

Another way to solve this problem is to use tbb::this_task_arena::enqueue.
Please check the documentation for it.

 

#include <thread>
#include <vector>
#include <iostream>
#include <chrono>
#include <atomic>

#define TBB_PREVIEW_TASK_GROUP_EXTENSIONS 1
#include <oneapi/tbb/task_group.h>
#include <oneapi/tbb/task_arena.h>
#include <oneapi/tbb/parallel_for_each.h>

class A {
public:
    ~A() {
      _task.wait();
    }

    void Run() {
      _task.run(
          []() { std::this_thread::sleep_for(std::chrono::seconds(10)); });
    }

private:
    tbb::task_group _task;
};

int main() {
    std::vector<A> a(5);
    std::atomic<bool> all_task_submitted{false};

    auto t1 = std::chrono::high_resolution_clock::now();

    tbb::this_task_arena::enqueue([&] {
        tbb::parallel_for_each(a.begin(), a.end(), [](A& a_) {
            a_.Run();
        });
        all_task_submitted = true;
    });

    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout << "Spawn tasks duration " << std::chrono::duration_cast<std::chrono::seconds>(t2 - t1).count() << " sec" << std::endl;

    // Wait until all tasks are submitted
    while (!all_task_submitted) { std::this_thread::yield(); }

    // A::~A() will wait for all tasks completion
}