Very disappointing behavior (I would say bug). See example

#include <cstdio>
#include <functional>
#include <tbb/tbb.h>

bool func()
{
	bool dummy = false;

	tbb::combinable<int> res( 0 );

	tbb::enumerable_thread_specific<int> tls( 0 );

	tbb::parallel_for(0, 1000, [&](int i) {
		int & local_i = tls.local();

		local_i = 1;

		tbb::parallel_for(0, 1000, [&](int k) {
			dummy = true;
		} );

		res.local() += local_i;

		local_i = 0;
	} );

	const int iRes = res.combine( std::plus<int>() );

	printf("iRes == %d\n", iRes);

	return (iRes == 1000);
}

Anyone would expect iRes to equal 1000, but it does not. I spent several days trying to find this bug in very complex software. I finally understood why it happens, but I would consider it a design bug in TBB.

This is very disappointing, because we can no longer rely on the consistency of enumerable_thread_specific.

PS: MSVS 2012 update 5. TBB 2017.2

Hi Maxim,

This is known behavior of the Intel TBB task scheduler. A thread executing an inner (nested) parallel loop is allowed to process tasks from the outer-level parallel loop. Therefore, the TLS local value can be overwritten with "local_i = 0" while the thread is executing the nested parallel loop. You may want to read the documentation article about task isolation.
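
To make the mechanism concrete, here is a toy, single-threaded model of it. It is not TBB code: ToyScheduler, wait_for_nested, run_all and run_outer_iterations are hypothetical names invented for this sketch, and the "scheduler" is just a queue drained on one thread. The only assumption it encodes is the one described above: while a task waits for nested work, the same thread may pick up another outer task. The isolated flag roughly mimics what task isolation is meant to prevent.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy stand-in for a task scheduler (NOT TBB; all names are hypothetical).
struct ToyScheduler {
    std::deque<std::function<void()>> pending;

    // Models "waiting for a nested loop": unless isolated, the waiting
    // thread picks up one pending outer task and runs it in place.
    void wait_for_nested(bool isolated) {
        if (isolated || pending.empty()) return;
        std::function<void()> task = std::move(pending.front());
        pending.pop_front();
        task();
    }

    // Drains the queue on the calling thread.
    void run_all() {
        while (!pending.empty()) {
            std::function<void()> task = std::move(pending.front());
            pending.pop_front();
            task();
        }
    }
};

thread_local int local_i = 0;  // plays the role of tls.local()

// Queues n outer "iterations" and returns the accumulated sum.
int run_outer_iterations(int n, bool isolated) {
    assert(n >= 0);
    ToyScheduler sched;
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sched.pending.push_back([&sched, &sum, isolated] {
            local_i = 1;
            sched.wait_for_nested(isolated);  // the nested loop would wait here
            sum += local_i;                   // may now read 0 instead of 1
            local_i = 0;
        });
    }
    sched.run_all();
    return sum;
}
```

In this model, run_outer_iterations(4, false) returns 1 rather than 4, because each re-entered task resets local_i before the interrupted one reads it; run_outer_iterations(4, true) returns 4.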

By the way, why do you need two TLS structures in your application? Is it some sort of reduction? Could it not be solved with tbb::parallel_reduce?

Regards, Alex

Hi Alexei,

This is just an example to show and reproduce the bug. In the application it is much more complex and completely different.

We often use TLS for memory buffers to optimize allocation. This behavior of the TBB task scheduler makes that very dangerous.
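
For the buffer case, one defensive pattern can be sketched as follows. It is not taken from this thread or from TBB, and BufferLease and outer_buffer_survives_reentry are hypothetical names: instead of using one TLS buffer in place, a task leases a buffer from a per-thread pool for as long as it needs it, so another task that re-enters on the same thread gets a different buffer. The sketch assumes a lease is created and destroyed on the same thread, which holds when it is a local variable in the task body.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-thread pool of scratch buffers (not a TBB facility).
class BufferLease {
public:
    explicit BufferLease(std::size_t size) {
        std::vector<std::vector<char>>& pool = free_list();
        if (!pool.empty()) {
            buf_ = std::move(pool.back());  // reuse a previously returned buffer
            pool.pop_back();
        }
        buf_.assign(size, 0);  // keeps existing capacity when it is large enough
    }

    ~BufferLease() {
        // Return the buffer to this thread's pool; on failure just drop it.
        try {
            free_list().push_back(std::move(buf_));
        } catch (...) {
        }
    }

    BufferLease(const BufferLease&) = delete;
    BufferLease& operator=(const BufferLease&) = delete;

    std::vector<char>& get() { return buf_; }

private:
    static std::vector<std::vector<char>>& free_list() {
        thread_local std::vector<std::vector<char>> pool;
        return pool;
    }

    std::vector<char> buf_;
};

// Simulates a task being re-entered on the same thread while its buffer is
// still in use; returns true if the outer buffer's contents are intact.
bool outer_buffer_survives_reentry() {
    BufferLease outer(16);
    outer.get()[0] = 'A';
    {
        BufferLease inner(16);  // what a re-entrant task would do
        inner.get()[0] = 'B';
    }
    return outer.get()[0] == 'A';
}
```

The trade-off is memory: each thread's pool grows to the deepest re-entrancy it has seen, so this only makes sense when buffers are modest in size.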

Alexei, I read your link. Thank you!

I see, it is well-known behavior. The problem is that nested parallelism can be hidden inside functions or even inside third-party libraries. Sometimes we don't know about it, or it can appear in a new version of the code or library without notice.

As I understand it, the only way in such a case is to wrap the entire outer loop in tbb::this_task_arena::isolate. Am I right?

Should it work?

#define TBB_PREVIEW_TASK_ISOLATION 1
#include <cstdio>
#include <functional>
#include <tbb/tbb.h>

bool func()
{
	bool dummy = false;

	tbb::combinable<int> res( 0 );

	tbb::enumerable_thread_specific<int> tls( 0 );

	tbb::parallel_for(0, 1000, [&](int i) {
		tbb::this_task_arena::isolate( [&]{
			int & local_i = tls.local();

			local_i = 1;

			tbb::parallel_for(0, 1000, [&](int k) {
				dummy = true;
			} );

			res.local() += local_i;

			local_i = 0;
		} );
	} );

	const int iRes = res.combine( std::plus<int>() );

	printf("iRes == %d\n", iRes);

	return (iRes == 1000);
}

Unfortunately, I can't test it. I don't have the "Preview library" to link with.

Yes, it should work. If you want, you can reduce the scope of the isolation to guard only the tbb::parallel_for (but it makes no difference).

By the way, which Intel TBB package do you use? Usually, the preview library is shipped together with the main library.

Regards,
Alex
