Intel® oneAPI Threading Building Blocks

task_group stuck on wait

Uraza
Novice

Hello.

I have an issue where two tbb::task_group objects are run, but the first one hangs when wait() is called on it.

I am using TBB 2019 Update 3 and have to stay on that version for now.

A simplified version of the code is like this. There are two Pool objects. When Pool::Start is called, it runs a functor inside a tbb::task_group until Pool::Stop gets called.

There are more than 2 threads available in my case (tbb::task_scheduler_init::default_num_threads() > 2).

 

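  #include <atomic>
  #include <chrono>
  #include <iostream>
  #include <thread>
  #include <vector>

  #include <tbb/task_group.h>

  class Pool {
   public:
    Pool() : _is_stopped(false) {}

    // Spawns a task that loops until Stop() sets the flag.
    void Start() {
      _task_group.run([&]() {
        while (!_is_stopped) {
          std::this_thread::sleep_for(std::chrono::microseconds(100));
        }
        std::cout << "Loop for pool " << this << " is finished" << std::endl;
      });
    }

    // Sets the flag, then blocks until this pool's task has completed.
    void Stop() {
      _is_stopped = true;
      std::cout << "Waiting for task in pool " << this << std::endl;
      _task_group.wait();
      std::cout << "Pool " << this << " stopped" << std::endl;
    }

   private:
    tbb::task_group _task_group;
    std::atomic<bool> _is_stopped;
  };

  int main() {
    // vector(n) default-constructs each Pool in place, so no copy is needed.
    std::vector<Pool> pools(2);

    for (auto& pool : pools) {
      pool.Start();
    }

    // In the reported run, Stop() on the first pool never returns.
    for (auto& pool : pools) {
      pool.Stop();
    }
  }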

 

This is what this sample program prints on my side:

Waiting for task in pool 0x605690
Loop for pool 0x605690 is finished

This shows that, although the first task_group's task has finished, the Pool::Stop call is stuck in wait() for some reason (the "Pool 0x605690 stopped" message is never printed).

pstack shows that only one thread is still running its task (the second pool):

Thread 1 (Thread 0x2aaaaac44ac0 (LWP 216444)):
#0 0x00002aaab54e4e9d in nanosleep () from /lib64/libpthread.so.0
#1 0x000000000044290b in tbb::internal::function_task<TC1::test_method()::Pool::Start()::{lambda()#1}>::execute() ()
#2 0x00002aaaaab50a65 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x607000, parent=..., child=<optimized out>) at ../../include/tbb/machine/gcc_ia32_common.h:100

All other threads are idle like this:

Thread 2 (Thread 0x2aaad6007700 (LWP 216703)):
#0 0x00002aaab60e3bf9 in syscall () from /lib64/libc.so.6
#1 0x00002aaaaab45aed in futex_wait (comparand=2, futex=0x6117ac) at ../../include/tbb/machine/linux_common.h:85
#2 P (this=0x6117ac) at ../../src/tbb/semaphore.h:209
#3 commit_wait (c=..., this=0x6117a0) at ../../src/tbb/../rml/server/thread_monitor.h:258
#4 tbb::internal::rml::private_worker::run (this=0x611780) at ../../src/tbb/private_server.cpp:277
#5 0x00002aaaaab45b29 in tbb::internal::rml::private_worker::thread_routine (arg=<optimized out>) at ../../src/tbb/private_server.cpp:223
#6 0x00002aaab54ddea5 in start_thread () from /lib64/libpthread.so.0
#7 0x00002aaab60e98dd in clone () from /lib64/libc.so.6
Thread 3...
...

Any idea why this program hangs?

My assumption is that the main thread gets assigned to the execution of the second Pool's task and thus ends up in an infinite loop, but I am not sure about it. I noticed that setting both _is_stopped booleans to true before calling wait solves the issue, e.g. by splitting Pool::Stop into Pool::Stop, which sets the boolean to true, and Pool::Wait, which calls task_group::wait.

Thanks.

1 Solution
Uraza
Novice

I think I understand what is happening.

I tracked the sequence of operations by printing tbb::this_task_arena::current_thread_index() inside the various functions:

1) Thread 0 calls Pool::Start and adds a task inside pool 0's task_group.

2) Thread 0 calls Pool::Start and adds a task inside pool 1's task_group.

3) Thread 0 calls Pool::Stop and starts waiting on pool 0's task_group: since the task_group is not finished, the waiting thread picks up some work and starts executing pool 1's task. At this point, thread 0 is stuck in an infinite loop, since pool 1's _is_stopped flag will never be set to true.

4) Thread 1 starts executing pool 0's task: it finishes immediately, since pool 0's _is_stopped flag was already set to true by thread 0 at the beginning of step 3.

To make the initial code correct, the _is_stopped flag has to be set for every Pool first, and only then should the waiting start. This ensures that none of the loops is infinite.

  class Pool {
   public:
    Pool() : _is_stopped(false) {}

    void Start() {
      _task_group.run([&]() {
        while (!_is_stopped) {
          std::this_thread::sleep_for(std::chrono::microseconds(100));
        }
      });
    }

    void Stop() {
      _is_stopped = true;
    }

    void Wait() {
      _task_group.wait();
    }

   private:
    tbb::task_group _task_group;
    std::atomic<bool> _is_stopped;
  };

  std::vector<Pool> pools(2);

  for (auto& pool : pools) {
    pool.Start();
  }

  for (auto& pool : pools) {
    pool.Stop();
  }

  for (auto& pool : pools) {
    pool.Wait();
  }
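
As a way to see why the call order matters, here is a minimal single-threaded toy model. It is not TBB's actual scheduler: it assumes, for illustration only, that a thread blocked in wait() helps out by running the most recently queued task, whichever pool that task belongs to. ToyScheduler, Outcome, stop_then_wait_per_pool and stop_all_then_wait are hypothetical names that do not exist in TBB.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

enum class Outcome { Finished, Stuck };

// Toy model: one queue of pending tasks, each identified by its pool index.
struct ToyScheduler {
  std::deque<int> queue;      // pending tasks, oldest at the front
  std::vector<bool> stopped;  // per-pool stop flag

  // Models calling Start() on every pool: one looping task per pool.
  explicit ToyScheduler(int pools) : stopped(pools, false) {
    for (int p = 0; p < pools; ++p) queue.push_back(p);
  }

  void stop(int pool) {
    assert(pool >= 0 && pool < static_cast<int>(stopped.size()));
    stopped[pool] = true;
  }

  // Models a wait on `pool` by a thread that runs queued tasks inline.
  // Running a task whose flag is still false would never return, because
  // the thread that could set the flag is the one running the task.
  Outcome wait(int pool) {
    while (std::find(queue.begin(), queue.end(), pool) != queue.end()) {
      int task = queue.back();  // take the most recently queued task
      queue.pop_back();
      if (!stopped[task]) return Outcome::Stuck;
    }
    return Outcome::Finished;
  }
};

// Original order: set one flag, then wait on that pool right away.
Outcome stop_then_wait_per_pool(int pools) {
  ToyScheduler s(pools);
  for (int p = 0; p < pools; ++p) {
    s.stop(p);
    if (s.wait(p) == Outcome::Stuck) return Outcome::Stuck;
  }
  return Outcome::Finished;
}

// Fixed order: set every flag first, then wait on every pool.
Outcome stop_all_then_wait(int pools) {
  ToyScheduler s(pools);
  for (int p = 0; p < pools; ++p) s.stop(p);
  for (int p = 0; p < pools; ++p) {
    if (s.wait(p) == Outcome::Stuck) return Outcome::Stuck;
  }
  return Outcome::Finished;
}
```

Under these assumptions, stop_then_wait_per_pool(2) reports Outcome::Stuck, which mirrors step 3 of the analysis, while stop_all_then_wait(2) reports Outcome::Finished.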

There may still be a requirement to have enough worker threads to avoid hanging, but that can be checked up front.

NoorjahanSk_Intel
Moderator

Hi,

Thanks for reaching out to us.

Glad to know that your issue is resolved, and thanks for sharing the solution.

As this issue has been resolved, we will no longer respond to this thread.

If you require any additional assistance from Intel, please start a new thread.

Thanks & Regards,
Noorjahan.
