Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Multiple tasks running concurrently in a single OS thread?

Xu_W_
Beginner

Hi,

I wrote the following code to process a series of files in parallel:

std::vector<std::string> filenames;
// -> fill filenames
tbb::parallel_for_each(filenames.begin(), filenames.end(),
    [&](const std::string& filename)
    {
        log << "Start processing " << filename << " @Thread" << GetCurrentThreadId();
        // -> process file
        log << "End processing " << filename << " @Thread" << GetCurrentThreadId();
    });

Sometimes, after running the program, I get log output like the following:

Start processing a.bmp @Thread5228
...
Start processing b.bmp @Thread5228
...
End processing b.bmp @Thread5228
...
End processing a.bmp @Thread5228

It seems the processing of b.bmp was injected into the thread with id=5228 before the processing of a.bmp had finished.

Is this normal behavior for the TBB task scheduler? If it is, how can I keep multiple tasks that run on the same OS thread from interleaving with each other? My processing code uses some thread-local variables that can only be used by one task at a time.

(The code was compiled with Visual C++ 2015 update 3 x64 native compiler and run on Windows 10)

Xu_W_
Beginner

The problem is caused by the use of MKL with TBB threading inside my processing code.

It only happens when I link my program against mkl_tbb_thread.lib. Replacing mkl_tbb_thread.lib with either mkl_sequential.lib or mkl_intel_thread.lib+libiomp5md.lib fixes the problem.

jimdempseyatthecove
Honored Contributor III

On a system with, as an example, 4 hardware threads and a TBB thread pool of 4 processing threads, a parallel_for_each can run with a concurrency of up to 4. In this example you could therefore see four "Start processing ?.bmp" lines before seeing any of the corresponding "End processing ?.bmp" lines. Additionally, the "End processing ?.bmp" lines could (and generally will) appear in a different order from the "Start processing ?.bmp" lines.

If you wish to process the files concurrently (with thread-safe code) and produce the outputs in order, then consider using tbb::parallel_pipeline. Alternatively, you could insert your own collating method (e.g. a ring buffer whose size is a multiple of the number of threads in the TBB thread pool) to put the outputs in order.

If you wish to process the files one at a time rather than concurrently, then use a standard for statement.

Jim Dempsey

Xu_W_
Beginner

I don't mind the processing order across different threads, but I can't allow two bmp files to be processed at the same time in the same thread, as shown in the log output above. TBB is a threading library, why do I get this fiber-like behavior?

 

Alexey-Kukanov
Employee

Xu W. wrote:
TBB is a threading library, why do I get this fiber-like behavior?

Don't get confused by the name. TBB is a parallel library for multicore. Its high-level API produces tasks that are then executed by the task scheduler. The scheduler happens to use threads as execution agents, but it could have used fibers or something else suitable for the job. In other words, you get this behavior because in certain circumstances, namely when there is nested parallelism, tasks may behave like fibers.

There are two levels of parallelism in your code: the outer is the parallel_for_each loop over files, and the inner is within MKL. When you use the TBB-based version of MKL, the tasks from both levels are processed by the same instance of TBB, i.e. the same set of threads. It might happen that a certain thread that started a nested parallel job in MKL has to wait while some tasks related to this job are executed by other threads. In that case, by default TBB allows this thread to take any task available for execution, and it might happen that the task it takes is from the outer level. You observe it as a second file being processed by the same thread.
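
To make the mechanism concrete, here is a toy model in plain C++. It is only a sketch of the idea and not how TBB is implemented: ToyScheduler, wait_for_nested_job and simulate_interleaving are hypothetical names, everything runs on one thread, and the "nested job" is assumed to be finishing on some other thread, so the waiting thread simply picks up whatever outer task is still in the queue.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy single-threaded scheduler (hypothetical, NOT the TBB implementation).
struct ToyScheduler {
    std::deque<std::function<void()>> ready;  // tasks available for execution
    std::vector<std::string> log;             // stands in for the poster's log

    void spawn(std::function<void()> task) { ready.push_back(std::move(task)); }

    // Models a thread that started a nested parallel job and must wait for
    // pieces of it that run elsewhere. Instead of idling, it takes one task
    // that is ready, which may be an outer-level task.
    void wait_for_nested_job() {
        if (!ready.empty()) {
            auto task = std::move(ready.front());
            ready.pop_front();
            task();
        }
    }

    // Runs queued tasks until none are left.
    void run_all() {
        while (!ready.empty()) {
            auto task = std::move(ready.front());
            ready.pop_front();
            task();
        }
    }
};

// Queues one outer task per file and returns the resulting log lines.
std::vector<std::string> simulate_interleaving() {
    ToyScheduler s;
    const std::vector<std::string> files{"a.bmp", "b.bmp"};
    for (const std::string& name : files) {
        s.spawn([&s, name] {
            s.log.push_back("Start " + name);
            s.wait_for_nested_job();  // stands in for a TBB-threaded MKL call
            s.log.push_back("End " + name);
        });
    }
    s.run_all();
    return s.log;
}
```

Calling simulate_interleaving() returns the four entries "Start a.bmp", "Start b.bmp", "End b.bmp", "End a.bmp", which is the same nesting seen in the original log: the second file starts and ends inside the first file's wait.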

There are recent discussions on this forum about essentially the same issue, including possible solutions. I recommend reading these: https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/706167 and https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/703652. And of course please ask more questions as needed.

Xu_W_
Beginner

Alexey Kukanov (Intel) wrote:

There are recent discussions on this forum about essentially the same issue, and solutions are discussed. I recommend you to read these: https://software.intel.com/en-us/forums/intel-threading-building-blocks/... and https://software.intel.com/en-us/forums/intel-threading-building-blocks/.... And of course please ask more questions as needed.

Thank you. Wrapping my image processing code inside a task_arena works:

std::vector<std::string> filenames;
// -> fill filenames
tbb::task_arena nested;
tbb::parallel_for_each(filenames.begin(), filenames.end(),
    [&](const std::string& filename)
    {
        log << "Start processing " << filename << " @Thread" << GetCurrentThreadId();
        nested.execute([&]() {
            // -> process file
        });
        log << "End processing " << filename << " @Thread" << GetCurrentThreadId();
    });

 
