Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Very disappointing behavior (I would say a bug). See example.

Popov__Maxim
#include <cstdio>
#include <functional>
#include <tbb/tbb.h>

bool func()
{
	bool dummy = false;

	tbb::combinable<int> res( 0 );

	tbb::enumerable_thread_specific<int> tls( 0 );

	tbb::parallel_for(0, 1000, [&](int i) {
		int & local_i = tls.local();

		// Mark this thread's slot for the current outer iteration.
		local_i = 1;

		// Nested parallel loop.
		tbb::parallel_for(0, 1000, [&](int k) {
			dummy = true;
		} );

		// Expected: local_i is still 1 here.
		res.local() += local_i;

		local_i = 0;
	} );

	const int iRes = res.combine( std::plus<int>() );

	printf("iRes == %d\n", iRes);

	return (iRes == 1000);
}

Anybody would say that iRes will be equal to 1000, but that is wrong. I spent several days trying to find this bug in very complex software. I finally understood why it happens, but I would still consider it a design bug in TBB.

This is completely disappointing, because we can't rely on enumerable_thread_specific consistency anymore.

PS: MSVS 2012 update 5. TBB 2017.2

Alexei_K_Intel

Hi Maxim,

This is known behavior of the Intel TBB task scheduler. A thread executing an inner (nested) parallel loop is allowed to process tasks from the outer-level parallel loop while it waits for the inner loop to finish. Therefore, the TLS value can be overwritten by "local_i = 0" from another outer iteration while the thread is executing the nested parallel loop. You may want to read the documentation article about task isolation.
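
To make the mechanism concrete, here is a minimal single-threaded model written in standard C++ only. It is a sketch, not TBB's real scheduler: ToyWorker, wait_for_nested_loop and run_outer_loop are hypothetical names, and the "isolated" flag only imitates the effect of task isolation (a waiting thread does not pick up outer tasks).

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy model of ONE worker thread. This is an illustration only.
struct ToyWorker {
    int tls_slot = 0;                           // stands in for tls.local()
    std::deque<std::function<void()>> pending;  // outer iterations not yet started

    // Models "waiting for a nested parallel loop". Without isolation the
    // worker runs another pending outer iteration while it waits.
    void wait_for_nested_loop(bool isolated) {
        if (isolated || pending.empty()) return;
        std::function<void()> task = std::move(pending.front());
        pending.pop_front();
        task();
    }
};

// Runs `iterations` outer iterations on the toy worker and returns the sum
// that the original example accumulates in `res`.
int run_outer_loop(int iterations, bool isolated) {
    assert(iterations >= 0);
    ToyWorker w;
    int sum = 0;
    for (int i = 0; i < iterations; ++i) {
        w.pending.push_back([&w, &sum, isolated] {
            w.tls_slot = 1;
            w.wait_for_nested_loop(isolated);  // another outer iteration may run here
            sum += w.tls_slot;                 // adds 0 if the slot was reset meanwhile
            w.tls_slot = 0;
        });
    }
    while (!w.pending.empty()) {
        std::function<void()> task = std::move(w.pending.front());
        w.pending.pop_front();
        task();
    }
    return sum;
}
```

In this model run_outer_loop(1000, false) loses almost every increment because each interrupted iteration finds its slot already reset, while run_outer_loop(1000, true) returns 1000. With real TBB the unisolated result varies from run to run.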

By the way, why do you need two TLS structures in your application? Is it some sort of reduction? Couldn't it be solved with tbb::parallel_reduce?

Regards, Alex

Popov__Maxim

Hi Alexei,

This is just an example to show and reproduce the bug. In the real application the code is much more complex and completely different.

We often use TLS for memory buffers to optimize allocation. This behavior of the TBB task scheduler makes that very dangerous.

Popov__Maxim

Alexei, I read your link. Thank you!

I see, it is well-known behavior. The problem is that nested parallelism could be inside functions or even inside third-party libraries. Sometimes we don't know about it, or it could appear in a new version of the code or library without notice.

As I understand it, the only way in such a case is to wrap the entire outer loop in tbb::this_task_arena::isolate. Am I right?

Popov__Maxim

Should it work?

#define TBB_PREVIEW_TASK_ISOLATION 1
#include <cstdio>
#include <functional>
#include <tbb/tbb.h>

bool func()
{
	bool dummy = false;

	tbb::combinable<int> res( 0 );

	tbb::enumerable_thread_specific<int> tls( 0 );

	tbb::parallel_for(0, 1000, [&](int i) {
		tbb::this_task_arena::isolate( [&]{
			int & local_i = tls.local();

			local_i = 1;

			tbb::parallel_for(0, 1000, [&](int k) {
				dummy = true;
			} );

			res.local() += local_i;

			local_i = 0;
		} );
	} );

	const int iRes = res.combine( std::plus<int>() );

	printf("iRes == %d\n", iRes);

	return (iRes == 1000);
}

Unfortunately, I can't test it: I don't have the preview library to link against.

Alexei_K_Intel

Yes, it should work. If you want, you can reduce the scope of isolation to guard only the inner tbb::parallel_for (but there is no difference).

By the way, which Intel TBB package do you use? Usually, the preview library is shipped together with the main library.

Regards,
Alex

Alexei_K_Intel

Additional information can be found in the blog article about work isolation.
