Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

tbb_thread - using an instance method as the code to execute

superfox_il_volpone
Hi,
I am struggling to create a thread that executes an instance method. This is the code I arrived at:
[cpp]
#include <cassert>
#include <cstdlib>
#include "tbb/tbb_thread.h"
using tbb::tbb_thread;

void ParallelAlgorithmThreadsImpl::execute(){
	// init threads
	assert(num_threads>=1);
	assert(num_threads<=32); // seems kind of error?

	threads = (tbb_thread**) calloc(num_threads, sizeof(tbb_thread*)); // threads is an instance member
	ThreadParams params(num_threads);
	for(size_t i = 0; i < num_threads; i++){
		// this is the line that triggers the compile error below
		threads[i] = new tbb_thread(&ParallelAlgorithmThreadsImpl::thread_execute, this, params);
	}

	// wait for termination
	for(size_t i = 0; i < num_threads; i++){
		tbb_thread* t = threads[i];
		if(t->joinable()) t->join();
		// delete once the thread has terminated
		delete t; threads[i] = NULL;
	}
	free(threads); threads = NULL;
}


void ParallelAlgorithmThreadsImpl::thread_execute(const ThreadParams& params){
// ...
}[/cpp]
and I receive the following error:

C:\dev\tbb\include/tbb/tbb_thread.h: In static member function 'static unsigned int tbb::internal::thread_closure_2::start_routine(void*) [with F = void (ParallelAlgorithmThreadsImpl::*)(const ParallelAlgorithmThreadsImpl::ThreadParams&), X = ParallelAlgorithmThreadsImpl*, Y = ParallelAlgorithmThreadsImpl::ThreadParams]':
C:\dev\tbb\include/tbb/tbb_thread.h:149:13: instantiated from 'tbb::internal::tbb_thread_v3::tbb_thread_v3(F, X, Y) [with F = void (ParallelAlgorithmThreadsImpl::*)(const ParallelAlgorithmThreadsImpl::ThreadParams&), X = ParallelAlgorithmThreadsImpl*, Y = ParallelAlgorithmThreadsImpl::ThreadParams]'
../ParallelAlgorithmThreadsImpl.cpp:50:94: instantiated from here
C:\dev\tbb\include/tbb/tbb_thread.h:111:13: error: must use '.*' or '->*' to call pointer-to-member function in 'self->tbb::internal::thread_closure_2::function (...)', e.g. '(... ->* self->tbb::internal::thread_closure_2::function) (...)'
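The last line of the error says that tbb_thread's three-argument constructor ends up calling its first argument as an ordinary function, `function(arg1, arg2)`, and a pointer-to-member function cannot be called that way: it has to be applied to an object with `->*` or `.*`. A minimal, TBB-independent sketch of the two legal call forms follows; `Worker`, `WorkerFn` and the two helper functions are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the OP's class; only the member-function shape matters
struct Worker {
    int calls = 0;
    void thread_execute(const int& param) { calls += param; }
};

// Type of a pointer to Worker::thread_execute
using WorkerFn = void (Worker::*)(const int&);

// A pointer-to-member needs an object: fn(obj, param) would be ill-formed,
// which is exactly what the tbb_thread error complains about.
void call_through_pointer(WorkerFn fn, Worker* obj, int param) {
    assert(fn != nullptr && obj != nullptr);
    (obj->*fn)(param);   // object pointer: use ->*
}

void call_through_reference(WorkerFn fn, Worker& obj, int param) {
    (obj.*fn)(param);    // object or reference: use .*
}
```

Any thread wrapper that wants to run a member function must therefore either apply one of these operators itself or receive a callable that does so on its behalf.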

thanks for any consideration
regards
- s.fox
RafSchietekat
Valued Contributor III
"I am striving how to create a thread that executes an instance method."
Why would you even consider launching threads to execute a TBB task, I wonder?
jimdempseyatthecove
Honored Contributor III
Look at the parallel Fibonacci example for recursive programming.

Consider something along the lines of:

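[cpp]#include <tbb/parallel_invoke.h>

// Runs self->thread_execute(params) n times: one call here, n-1 more via recursion.
// thread_execute is a non-static member, so it must be called through an object.
void invokeInstance(ParallelAlgorithmThreadsImpl* self, int n,
                    const ParallelAlgorithmThreadsImpl::ThreadParams& params){
  if(n > 1) {
     tbb::parallel_invoke(
       [&](){ invokeInstance(self, n-1, params); },
       [&](){ self->thread_execute(params); });
  }
  else {
     self->thread_execute(params);
  }
}[/cpp]

Jim Dempsey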
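The same recursion shape can be sketched without TBB by swapping `std::async(std::launch::async, ...)` in for `parallel_invoke`. This is a different mechanism (each level occupies its own OS thread instead of being a TBB task), so it only illustrates how the member function is reached through an object pointer at every level. `Impl` and `thread_execute(int)` are hypothetical stand-ins for the OP's class.

```cpp
#include <atomic>
#include <cassert>
#include <future>

// Hypothetical stand-in for ParallelAlgorithmThreadsImpl
struct Impl {
    std::atomic<int> runs{0};
    void thread_execute(int param) { runs += param; }
};

// Runs self->thread_execute(param) n times: one call here, n-1 via the recursive fork
void invokeInstance(Impl* self, int n, int param) {
    assert(self != nullptr && n >= 1);
    if (n > 1) {
        // fork the remaining n-1 calls onto another thread (std::async, not TBB)
        std::future<void> rest =
            std::async(std::launch::async, invokeInstance, self, n - 1, param);
        self->thread_execute(param);   // this thread does one call meanwhile
        rest.get();                    // join; rethrows any exception from the fork
    } else {
        self->thread_execute(param);
    }
}
```

Calling `invokeInstance(&impl, 4, 1)` leaves `impl.runs` at 4. With TBB, `parallel_invoke` would run the two branches as tasks on the existing worker pool rather than creating a thread per level.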
RafSchietekat
Valued Contributor III
This should work if there is no required concurrency between the threads (the essence of my blunt question above). But why sacrifice scalability just for the convenience of lambdas, when even parallel_invoke itself, with its fixed number of arguments, uses balanced recursion for multiple arguments instead?
jimdempseyatthecove
Honored Contributor III
Raf,

Using the recursive parallel_invoke is scalable.

Consider the original method (modeled on the pthread programming style):
The instigator thread suffers the overhead of enqueueing all the participating tasks before it can take part in executing zero or more of them. In other words, if n threads are available (including itself), n tasks are enqueued, and each enqueued task runs longer than n-1 (or n) enqueue operations take, then the instigator thread has spent n (or n-1) enqueue overheads before it starts work on its own work object.

The results will vary with the number of distinct tasks and with the ratio of enqueue overhead to per-task work. In the OP's first post, the same object appears to be passed to different tasks, and the execution times of these tasks will likely differ from one another.

If this user has many such objects to pass through this gauntlet of tasks, then this would be an ideal situation for using a parallel_pipeline (as opposed to this pseudo-pthread style).

Jim Dempsey
RafSchietekat
Valued Contributor III
"Using the recursive parallel_invoke is scalable."
It takes O(tasks) latency for the "instigator thread" (with perhaps a constant-factor slowdown compared to spawning a task_list?), and also O(tasks) stealing overhead, compared to O(log tasks) for both with balanced recursiveness, doesn't it?
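The difference between the two recursion shapes can be made concrete by counting fork depth, i.e. how many nested forks precede the last piece of work. A minimal sketch under the simplifying assumption that every fork costs the same; it does not model TBB's scheduler, stealing or thread wakeup.

```cpp
#include <cassert>

// Linear chain (as in invokeInstance): each level peels off one call and
// recurses on n-1, so the last call starts after n-1 nested forks.
int linearForkDepth(int n) {
    assert(n >= 1);
    return n == 1 ? 0 : 1 + linearForkDepth(n - 1);
}

// Balanced splitting: each level halves the range, so the depth is ceil(log2(n)).
int balancedForkDepth(int n) {
    assert(n >= 1);
    if (n == 1) return 0;
    int left = n / 2;
    int right = n - left;              // halves differ in size by at most one
    int dl = balancedForkDepth(left);
    int dr = balancedForkDepth(right);
    return 1 + (dl > dr ? dl : dr);
}
```

For 8 tasks the chain is 7 forks deep while the balanced tree is 3 deep; for 1024 tasks the figures are 1023 and 10.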
jimdempseyatthecove
Honored Contributor III
Raf,

parallel_invoke need not perform a two-way split:
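[cpp]#include <cassert>
#include <tbb/parallel_invoke.h>

void doWork(int i); // per-item work, defined elsewhere

// Calls doWork(i) .. doWork(j) (inclusive) in parallel, splitting large ranges in two
void invokeWork(int i, int j)
{
   int k = j-i+1; // number of items in [i, j]
   switch(k)
   {
   case 1:
       doWork(i);
       break;
   case 2:
       tbb::parallel_invoke(
          [=](){ doWork(i); },
          [=](){ doWork(i+1); });
       break;
   case 3:
       tbb::parallel_invoke(
          [=](){ doWork(i); },
          [=](){ doWork(i+1); },
          [=](){ doWork(i+2); });
       break;
   case 4:
       tbb::parallel_invoke(
          [=](){ doWork(i); },
          [=](){ doWork(i+1); },
          [=](){ doWork(i+2); },
          [=](){ doWork(i+3); });
       break;
   default: // k > 4: split at the midpoint into [i, m] and [m+1, j]
       assert(k > 4); // debugging aid
       int m = i + (j-i)/2;
       tbb::parallel_invoke(
          [=](){ invokeWork(i, m); },
          [=](){ invokeWork(m+1, j); });
   }
}[/cpp]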
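The splitting logic can be checked without TBB by replacing the parallel_invoke calls with plain sequential calls (an assumption made for testing only) and recording which indices doWork receives. A minimal sketch; `visits`, `invokeWorkSeq` and `coversExactlyOnce` are hypothetical names, and this `doWork` is a stand-in that only counts calls.

```cpp
#include <cassert>
#include <vector>

static std::vector<int> visits;   // visits[i] counts the calls to doWork(i)

// Stand-in for the real per-item work: just records the index
void doWork(int i) { ++visits[i]; }

// Same split as invokeWork, but the branches run one after the other
void invokeWorkSeq(int i, int j) {
    int k = j - i + 1;               // number of items in the inclusive range [i, j]
    assert(k >= 1);
    if (k <= 4) {                    // small ranges: one call per item
        for (int x = i; x <= j; ++x) doWork(x);
        return;
    }
    int m = i + (j - i) / 2;         // midpoint; halves are [i, m] and [m+1, j]
    invokeWorkSeq(i, m);
    invokeWorkSeq(m + 1, j);
}

// True if every index in [0, n) is visited exactly once
bool coversExactlyOnce(int n) {
    visits.assign(n, 0);
    invokeWorkSeq(0, n - 1);
    for (int c : visits)
        if (c != 1) return false;
    return true;
}
```

Using `i/2` as the split point instead of the midpoint would make both halves overlap and, for ranges not starting at 0, visit indices outside [i, j]; a check like this catches that immediately.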

I have not looked at the internal implementation of parallel_invoke in TBB, so I cannot attest to O(tasks) latency for TBB. Regardless of the enqueuing latency, if the additional threads require a wakeup (SetEvent or signaling a condition variable), there will be additional overhead.

I have implemented parallel_invoke in my QuickThread library. In the QuickThread implementation, parallel_invoke has less overhead than parallel_task. The compiler generates the lambda functor list at compile time. The overhead of executing parallel_invoke depends on the state of the other threads (those requested to run the functors in the list): there is one library-entry overhead for the parallel_invoke call, plus an enqueue overhead of less than one parallel_task per functor, plus, if the additional threads require a wakeup, the overhead of SetEvent or of signaling a condition variable (as is the case in TBB with suspended threads). When the system is busy, parallel_invoke has very little overhead. The QuickThread parallel_invoke is synchronous (implied join/wait), whereas the QuickThread parallel_task is more flexible (with more overhead) and can run synchronously, asynchronously, as a completion routine, as FIFO, etc. parallel_invoke was stripped down to perform fast fork/join.

Jim
RafSchietekat
Valued Contributor III
I think we may be off on a tangent for lack of information/reaction about the original question: is concurrency required, how wide is the fan-out, does ParallelAlgorithmThreadsImpl have member variables that prevent thread_execute from being a static member function, what version of C++ is being used (std::mem_fn in C++11 might be relevant), etc.
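On the C++11 point: std::thread's constructor uses INVOKE semantics, so it accepts a pointer-to-member function followed by an object pointer, and std::mem_fn or a lambda work as well. A minimal sketch assuming a C++11 compiler and std::thread in place of tbb_thread; the class is a hypothetical reduced version of the OP's, with an extra `index` parameter so each thread writes only its own slot.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

struct ThreadParams { int num_threads; };   // hypothetical stand-in

class ParallelAlgorithmThreadsImpl {        // hypothetical stand-in for the OP's class
public:
    explicit ParallelAlgorithmThreadsImpl(int n) : results(n, 0) {}
    void thread_execute(int index, const ThreadParams& params) {
        assert(index >= 0 && index < static_cast<int>(results.size()));
        results[index] = params.num_threads;   // each thread writes only its own slot
    }
    std::vector<int> results;
};

// Three C++11 ways to start a thread on a member function of an object
void run_all(ParallelAlgorithmThreadsImpl& impl, const ThreadParams& params) {
    std::vector<std::thread> threads;
    // 1. pointer-to-member + object pointer, passed straight to std::thread
    threads.emplace_back(&ParallelAlgorithmThreadsImpl::thread_execute,
                         &impl, 0, std::cref(params));
    // 2. std::mem_fn turns the member pointer into an ordinary callable
    threads.emplace_back(std::mem_fn(&ParallelAlgorithmThreadsImpl::thread_execute),
                         &impl, 1, std::cref(params));
    // 3. a lambda capturing the object pointer
    ParallelAlgorithmThreadsImpl* self = &impl;
    threads.emplace_back([self, &params]() { self->thread_execute(2, params); });
    for (std::thread& t : threads) t.join();   // wait for termination
}
```

`std::cref` avoids copying the parameters into each thread; that is safe here only because `run_all` joins every thread before `params` can go out of scope.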
SergeyKostrov
Valued Contributor II

Could you attach a Test-Case (as a .cpp file) with all the declarations that reproduces the problem? I'll try to look at it.

Best regards,
Sergey

SergeyKostrov
Valued Contributor II

I reproduced your compilation problem with TBB version 4.

In TBB version 4 the class 'tbb_thread' has 5 constructors with different numbers of arguments.

Either one or more of the arguments are not declared properly, or there is another problem related to the declaration:

...
m_pptThreads = new tbb::tbb_thread( &CParallelAlgorithmThreadsImpl::ThreadExecute, &(*this), Params );
...

Since you didn't provide a complete Test-Case, I created one that reproduces yours to some degree. It is a very basic Test-Case, and it demonstrates different ways of initializing threads.

The Test-Case is provided AS IS and some modifications are needed:

[cpp]// ...
void ExecuteThread( RTint iParam );

void ExecuteThread( RTint iParam )
{
    while( RTtrue )
    {
        CrtPrintf( RTU("ExecuteThread function - Param value: %ld\n"), iParam );
        SysSleep( 500 );
    }
}

class CThreadAction
{
public:
    CThreadAction()
    {
        m_iParam = 0;
    };

    CThreadAction( RTint iParam )
    {
        m_iParam = iParam;
    };

    RTvoid operator()()
    {
        while( RTtrue )
        {
            CrtPrintf( RTU("CThreadAction::operator() - Param Value: %ld\n"), m_iParam );
            SysSleep( 500 );
        }
    };

    RTint m_iParam;
};

class CThreadParams
{
public:
    CThreadParams()
    {
        m_iNumThreads = 0;
    };

    CThreadParams( RTint iNumThreads )
    {
        m_iNumThreads = iNumThreads;
    };

    RTvoid operator()()
    {
        while( RTtrue )
        {
            CrtPrintf( RTU("CThreadParams::operator() - Param Value: %ld\n"), m_iNumThreads );
            SysSleep( 500 );
        }
    };

    RTint m_iNumThreads;
};

class CParallelAlgorithmThreadsImpl
{
public:
    CParallelAlgorithmThreadsImpl( RTint iNumThreads )
    {
        m_pptThreads = RTnull;
        m_iNumThreads = iNumThreads;
    };

    virtual ~CParallelAlgorithmThreadsImpl()
    {
        if( m_pptThreads == RTnull )
            return;

        if( m_iNumThreads == 0 )
            return;

        for( RTint i = 0; i < m_iNumThreads; i++ )
        {
            CrtDelete( m_pptThreads[i] );
            m_pptThreads[i] = RTnull;
        }

        CrtFree( m_pptThreads );
        m_pptThreads = RTnull;
    };

    void Execute();
    void ThreadExecute( const CThreadParams &Params );

    tbb_thread **m_pptThreads;
    RTint m_iNumThreads;
};

RTvoid CParallelAlgorithmThreadsImpl::Execute()
{
    if( m_iNumThreads == 0 )
        return;

    RTint i;
    // Init Threads
    m_pptThreads = ( tbb_thread ** )CrtCalloc( m_iNumThreads, sizeof( tbb_thread * ) );

    CThreadParams Params1;
    CThreadParams Params2( m_iNumThreads );
    CThreadAction Action1;

    for( i = 0; i < m_iNumThreads; i++ )
    {
        // Test 01 - Status: Compilation Error
        //m_pptThreads[i] = CrtNew tbb::tbb_thread( &CParallelAlgorithmThreadsImpl::ThreadExecute, &(*this), Params );
        m_pptThreads[i] = CrtNew tbb::tbb_thread(); // Test 02 - Status: Compiled \ Tested \ Works
        //m_pptThreads[i] = CrtNew tbb::tbb_thread( CThreadParams() ); // Test 03 - Status: Compiled \ Tested \ Works
        //m_pptThreads[i] = CrtNew tbb::tbb_thread( CThreadParams( 777+i ) ); // Test 04 - Status: Compiled \ Tested \ Works
        //m_pptThreads[i] = CrtNew tbb::tbb_thread( Params1 ); // Test 05 - Status: Compiled \ Tested \ Works
        //m_pptThreads[i] = CrtNew tbb::tbb_thread( Params2 ); // Test 06 - Status: Compiled \ Tested \ Works

        //CThreadParams Params3( 777+i );
        //m_pptThreads[i] = CrtNew tbb::tbb_thread( Params3 ); // Test 07 - Status: Compiled \ Tested \ Works

        //m_pptThreads[i] = CrtNew tbb::tbb_thread( CThreadAction() ); // Test 08 - Status: Compiled \ Tested \ Works
        //m_pptThreads[i] = CrtNew tbb::tbb_thread( CThreadAction( 777+i ) ); // Test 09 - Status: Compiled \ Tested \ Works
        //m_pptThreads[i] = CrtNew tbb::tbb_thread( Action1 ); // Test 10 - Status: Compiled \ Tested \ Works

        //CThreadAction Action2( 777+i );
        //m_pptThreads[i] = CrtNew tbb::tbb_thread( Action2 ); // Test 11 - Status: Compiled \ Tested \ Works

        //m_pptThreads[i] = CrtNew tbb::tbb_thread( ExecuteThread, 777+i ); // Test 12 - Status: Compiled \ Tested \ Works
    }

    for( i = 0; i < m_iNumThreads; i++ )
    {
        tbb_thread *pT = m_pptThreads[i];
        if( pT == RTnull )
            continue;

        if( pT->joinable() )
            pT->join();
    }
}

RTvoid CParallelAlgorithmThreadsImpl::ThreadExecute( const CThreadParams &Params )
{
}

// ...
RTvoid CrtMain( RTvoid )
{
    CParallelAlgorithmThreadsImpl pat( 4 );
    pat.Execute();
}
// ...[/cpp]

RafSchietekat
Valued Contributor III
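Note that tbb_thread is meant to be equivalent to std::thread, but this version of tbb_thread does not accept a pointer-to-member function followed by an object pointer (C++11 std::thread does, through its INVOKE semantics), so a wrapper of some kind is needed.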

Several workarounds have already been mentioned above.
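Sergey's Tests 08 to 11 show that tbb_thread accepts a functor object such as CThreadAction, so one workaround is a small functor that stores the object pointer and the argument and calls the member function from its operator(). A minimal sketch, using std::thread so that it compiles without TBB; passing the same kind of functor to tbb_thread should behave the same way, but that is an assumption based on those tests rather than something verified here. `Impl`, `MemberCall` and `run_with_functor` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical reduced version of the OP's class
class Impl {
public:
    std::atomic<int> total{0};
    void thread_execute(int param) { total += param; }
};

// Functor binding an object pointer and an argument to the member call, so the
// thread class only ever sees a plain callable, never a pointer-to-member.
class MemberCall {
public:
    MemberCall(Impl* self, int param) : self_(self), param_(param) {
        assert(self != nullptr);
    }
    void operator()() const { self_->thread_execute(param_); }
private:
    Impl* self_;
    int param_;
};

// Starts n threads, each running impl.thread_execute(param) via the functor, then joins them
void run_with_functor(Impl& impl, int n, int param) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back(MemberCall(&impl, param));
    for (std::thread& t : threads) t.join();
}
```

This is essentially a hand-written version of what std::mem_fn or a capturing lambda produces, which is why it also works with pre-C++11 compilers.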