Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Detaching a task from a tbb::parallel_for_each

Uraza
Novice

Hi!

I am facing an issue when trying to "detach" a TBB task from a tbb::parallel_for_each.

Here is a simplified example showing this behavior: it runs a tbb::parallel_for_each in which each object A spawns a task that sleeps for 10 seconds.

  class A {
   public:
    ~A() {
      _task.wait();
    }

    void Run() {
      _task.run(
          []() { std::this_thread::sleep_for(std::chrono::seconds(10)); });
    }

   private:
    tbb::task_group _task;
  };

  std::vector<A> a(5);

  std::cout << "Starting parallel_for_each at " << GetDate() << std::endl;

  tbb::parallel_for_each(a.begin(), a.end(), [](A& a_) {
    a_.Run();
  });

  std::cout << "parallel_for_each done at " << GetDate() << std::endl;

 

The output looks like this (the second line is printed 10 seconds after the first one):

Starting parallel_for_each at Thu Jun 24 16:50:53 2021
parallel_for_each done at Thu Jun 24 16:51:03 2021

The tbb::parallel_for_each seems to be waiting for the TBB tasks to finish before giving back control to the main thread.

Is there a way to "detach" the TBB task from the parallel_for_each so that the program can keep running while the tasks from class A keep executing in parallel?

Thanks.

Edit: I am using TBB 2019 Update 3 on CentOS 6.6.

SantoshY_Intel
Moderator

Hi,


Thanks for reaching out to us.


We are working on your issue and we will get back to you soon!


Thanks & Regards,

Santosh


Mark_L_Intel
Employee

Hello,


Unfortunately, no. Please see:

https://docs.oneapi.com/versions/latest/onetbb/tbb_userguide/Cook_Until_Done_parallel_do.html


The instance of parallel_for_each does not terminate until all items have been processed.


Uraza
Novice

Hi.
That is what I do not understand.
The sleep is launched through a separate task_group (line 8, the _task.run call).
Isn't that supposed to be non-blocking, and thus let the parallel_for_each continue without waiting?
Or is there another way to achieve that behavior?

In the real code, the intent is to have some general processing happen in a parallel_for (which will be run several times during program execution), and non-blocking tasks triggered when certain conditions are met.
Thanks for the feedback.

Mark_L_Intel
Employee

Hello,

There is a barrier at the end of parallel_for_each. Are you trying to avoid it?

Also, have you looked at TBB Flow Graph, e.g.,

https://software.intel.com/content/www/us/en/develop/documentation/tbb-documentation/top/intel-threa...

However, even with TBB Flow Graph, it is recommended to call wait_for_all():

https://software.intel.com/content/www/us/en/develop/documentation/tbb-documentation/top/intel-threa...

 

 

 

 

Mark_L_Intel
Employee

Please also look at async Flow Graph nodes


https://link.springer.com/chapter/10.1007/978-1-4842-4398-5_18


Uraza
Novice

Yes, exactly: I am trying to avoid the barrier at the end of the parallel_for_each, since the spawned task is independent of the rest of the processing done in the loop.

Uraza
Novice

I found out that, for some reason, spawning the task inside a task_arena gives the behavior that I expect.

parallel_for_each returns control to main immediately (both lines are printed without the 10-second gap), and then the program waits for the task_group to complete.

The documentation did not help me much in understanding why the task_arena works in that case.

  #include <chrono>
  #include <iostream>
  #include <thread>
  #include <vector>

  #include <tbb/parallel_for_each.h>
  #include <tbb/task_arena.h>
  #include <tbb/task_group.h>

  class A {
   public:
    ~A() {
      _task.wait();
    }

    void Run() {
      // Spawn the task inside the shared explicit arena, not the caller's arena.
      _arena.execute([&]() {
        _task.run(
            []() { std::this_thread::sleep_for(std::chrono::seconds(10)); });
      });
    }

   private:
    static tbb::task_arena _arena;
    tbb::task_group _task;
  };

  // A static data member needs one out-of-class definition.
  tbb::task_arena A::_arena;

  std::vector<A> a(5);

  std::cout << "Starting parallel_for_each at " << GetDate() << std::endl;

  tbb::parallel_for_each(a.begin(), a.end(), [](A& a_) {
    a_.Run();
  });

  std::cout << "parallel_for_each done at " << GetDate() << std::endl;

Pavel_K_Intel1
Employee
Accepted solution

Hello,

I will start by answering the original question.
1) Why does parallel_for_each block until all tasks are completed, even though the tasks run in a task_group?

This happens because when a user thread calls a blocking API that waits until work is completed (e.g. parallel_for_each, task_group::wait, etc.), there is no guarantee that the thread will execute only tasks related to the current context (unless you use isolation). In your example, while the user thread that called parallel_for_each waits for the loop to finish, it takes tasks from the task_group, so it ends up running one of the 10-second sleeps before parallel_for_each can return. That is why you observe this behavior.

2) Why does task_arena help in this case?

When the user thread calls parallel_for_each, the work is submitted to the implicit arena of the user thread. When you run the task_group tasks explicitly through task_arena::execute, that work is submitted to a different, explicit arena.
So, in your example there are two arenas: the implicit arena of the user thread, which contains the tasks of parallel_for_each, and the explicit task_arena, which contains the tasks of the task_group.

In oneTBB, a user thread can't execute tasks from a different arena, which is why the last code sample works as you expected.

Uraza
Novice

Thanks for the detailed explanation.

Uraza
Novice

Thanks everyone for the support!

Pavel_K_Intel1
Employee

Another way to solve this problem is tbb::this_task_arena::enqueue; please check its documentation.

#include <thread>
#include <vector>
#include <iostream>
#include <chrono>
#include <atomic>

#define TBB_PREVIEW_TASK_GROUP_EXTENSIONS 1
#include <oneapi/tbb/task_group.h>
#include <oneapi/tbb/task_arena.h>
#include <oneapi/tbb/parallel_for_each.h>

class A {
public:
    // Waits for the task started by Run() before the object goes away.
    ~A() {
        _task.wait();
    }

    // Starts a 10-second task in this object's task_group and returns at once.
    void Run() {
        _task.run(
            []() { std::this_thread::sleep_for(std::chrono::seconds(10)); });
    }

private:
    tbb::task_group _task;
};

int main() {
    std::vector<A> a(5);
    std::atomic<bool> all_task_submitted{false};

    auto t1 = std::chrono::high_resolution_clock::now();

    // Enqueue the whole loop as a task in the current arena; enqueue returns
    // immediately, so the main thread does not wait for parallel_for_each.
    tbb::this_task_arena::enqueue([&] {
        tbb::parallel_for_each(a.begin(), a.end(), [](A& a_) {
            a_.Run();
        });
        all_task_submitted = true;
    });

    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout << "Spawn tasks duration "
              << std::chrono::duration_cast<std::chrono::seconds>(t2 - t1).count()
              << " sec" << std::endl;

    // Wait until all tasks are submitted (a is still referenced by the enqueued task)
    while (!all_task_submitted) { std::this_thread::yield(); }

    // A::~A() waits for each task to complete when a goes out of scope
}