Intel® oneAPI Threading Building Blocks

Invoking parallel_for while holding a lock



The following code usually does not finish on my system. Am I doing something I shouldn't (i.e., invoking parallel_for while holding a lock)? I would expect that if parallel_for is invoked while all the other workers are blocked, it would simply be executed single-threaded, but that does not seem to be what's happening...

#include <iostream>

#include <tbb/tbb.h>

using namespace std;

int main( int argc, char* ap_args[] ) {
        cout << "test start." << endl;

        tbb::task_scheduler_init init( 12 );

        tbb::mutex* p_mutex;

        p_mutex = new tbb::mutex();

        for( int step = 0 ; step < 1000 ; step++ ) {
                cout << "step " << step << " start." << endl;
                tbb::parallel_for( tbb::blocked_range<int> ( 0, 10, 1 ), [&]( const tbb::blocked_range<int>& r ) {
                        for( int i = r.begin() ; i < r.end() ; i++ ) {
                                cout << "i=" << i << " before lock." << endl;
                                p_mutex->lock();
                                cout << "i=" << i << " after lock." << endl;

                                // inner parallel loop, invoked while the lock is held
                                tbb::parallel_for( tbb::blocked_range<int> ( 0, 100, 1 ), [&]( const tbb::blocked_range<int>& r2 ) {
                                        int localSum = 0;
                                        for( int j = r2.begin() ; j < r2.end() ; j++ ) {
                                                localSum += j;
                                        }
                                        cout << "localSum=" << localSum << endl;
                                } );

                                cout << "i=" << i << " before unlock." << endl;
                                p_mutex->unlock();
                                cout << "i=" << i << " after unlock." << endl;
                        }
                } );
                cout << "step " << step << " end." << endl;
        }

        delete p_mutex;

        cout << "test end." << endl;

        return 0;
}


Intel TBB uses a task-based approach to distribute work across threads. Each tbb::parallel_for is split into a number of tasks that are picked up by worker threads. The worker threads do not know whether a task belongs to the outer or the inner loop; they can take any task of any running parallel loop. So if a thread acquires the lock and starts processing the inner parallel loop, it generates tasks that can be processed by other threads. The inner loop can therefore still run in parallel (even though it is guarded by a lock).

The thread that starts a parallel loop also participates in the work distribution mechanism, which means it can take a task that was not generated by the current parallel loop. Therefore, a thread can acquire the lock, start processing the inner parallel loop and, while waiting for that loop to finish, take a task generated by the outer parallel loop. That task then tries to acquire the lock a second time on the same thread, which leads to the observed deadlock.

You need to either reconsider your approach so that the inner parallel loop is not guarded by a lock, or consider the task isolation functionality.

Regards, Alex


Additional information can be found in the blog article about work isolation.
