Intel® oneAPI Threading Building Blocks

concurrent_unordered_map doesn't iterate in parallel

AJ13
New Contributor I
300 Views

Hi,

I am tuning my application with VTune, and I have found that in an area of my code that does parallel traversal of a concurrent_unordered_map, I get zero parallelism.  After this part of the code completes, I get a high amount of concurrency.

In particular, I use a parallel_for with tbb::concurrent_unordered_map::range() to do some work for each (key, value) entry in the map. I suspect that the lack of concurrency is due to internal waits or some other form of synchronization inside the implementation. It is also possible that range() does not split into enough subranges to give the worker threads work to do.

Any suggestions (or experience) with slow traversal via range()?

3 Replies
AJ13
New Contributor I
Here is a VTune-friendly example of the problem. I compile this with:

g++ -g -O3 -DTBB_USE_THREADING_TOOLS=1 -I /opt/intel/vtune_amplifier_xe_2013/include/ --std=c++0x play.cpp -ltbb -L /opt/intel/vtune_amplifier_xe_2013/lib64/ -littnotify

[cpp]
// NOTE: the header names and template arguments were stripped when this post
// was rendered. The ones below are inferred from how the code uses them and
// are an assumption, not the original text.
#include <tbb/concurrent_unordered_map.h>
#include <tbb/concurrent_vector.h>
#include <tbb/parallel_for.h>
#include "ittnotify.h"
#include <iostream>
#include <string>

int main()
{
    // Key/value types assumed to be unsigned int (see note above).
    tbb::concurrent_unordered_map<unsigned int, unsigned int> testmap;
    const int worksize = 10000000;

    // VTune instrumentation: one event per phase so they show up separately.
    __itt_domain* vt_domain = __itt_domain_create("sample");
    (void)vt_domain; // created but not used further in this example

    const std::string event_name_build_table = "build_table";
    __itt_event event_build_table = __itt_event_create(
        event_name_build_table.c_str(), event_name_build_table.size() );

    const std::string event_name_traverse_table = "traverse_table";
    __itt_event event_traverse_table = __itt_event_create(
        event_name_traverse_table.c_str(), event_name_traverse_table.size() );

    // BUILD THE TABLE
    __itt_event_start(event_build_table);
    tbb::parallel_for(0, worksize, [&](unsigned int x) { testmap[ x ] = x; } );
    __itt_event_end(event_build_table);

    // TRAVERSE THE TABLE
    __itt_event_start(event_traverse_table);
    tbb::concurrent_vector<unsigned int> interesting_numbers;
    tbb::parallel_for( testmap.range(), [&]( decltype( testmap)::range_type& r)
    {
        for ( auto curr_entry = r.begin(); curr_entry != r.end(); ++curr_entry)
        {
            // We are going to do something a bit tricky / expensive
            auto my_num = curr_entry->second;
            auto counter = my_num;
            for ( int i = 2; i < 100; ++i )
            {
                if ( i % 2 == 0 ) { counter = counter * i; }
                else { counter = 3 * counter - i; }
            }
            if ( counter < my_num )
            {
                // Shared container: every worker thread appends here.
                interesting_numbers.push_back(curr_entry->second);
            }
        }
    } );
    __itt_event_end(event_traverse_table);

    // Now, just do something silly to prevent optimizer from messing with us
    std::cout << "Junk: " << interesting_numbers.size() << std::endl;
}
[/cpp]
AJ13
New Contributor I
Sigh. Looks like posting C++ code doesn't work as well as you would want. I'm attaching a text file instead.
RafSchietekat
Black Belt
Can you rule out contention on interesting_numbers by omitting it or by combining it from TLS? Maybe it's just a red herring, but...
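A minimal sketch of the "combine from TLS" idea, written with plain std::thread so it stands on its own: each thread appends to its own buffer, and the buffers are merged once at the end, so no shared container is touched inside the hot loop. The names is_interesting and collect_interesting are hypothetical, the input is assumed to be a flat vector of values rather than the map from the example, and unsigned int is an assumed value type. In TBB itself, tbb::enumerable_thread_specific or tbb::combinable would typically hold the per-thread buffers.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Same arithmetic as the traversal loop in the posted example
// (unsigned int is an assumption; unsigned overflow wraps, which is well defined).
inline bool is_interesting(unsigned int my_num) {
    unsigned int counter = my_num;
    for (int i = 2; i < 100; ++i) {
        if (i % 2 == 0) { counter = counter * i; }
        else            { counter = 3 * counter - i; }
    }
    return counter < my_num;
}

// Hypothetical helper: every thread writes only to its own buffer,
// and the buffers are concatenated after all threads have joined.
std::vector<unsigned int> collect_interesting(const std::vector<unsigned int>& values,
                                              unsigned int num_threads) {
    if (num_threads == 0) { num_threads = 1; }

    std::vector<std::vector<unsigned int>> local(num_threads); // one buffer per thread
    std::vector<std::thread> workers;
    const std::size_t chunk = (values.size() + num_threads - 1) / num_threads;

    for (unsigned int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Contiguous slice [begin, end) owned by thread t.
            const std::size_t begin = std::min(values.size(), t * chunk);
            const std::size_t end   = std::min(values.size(), begin + chunk);
            for (std::size_t i = begin; i < end; ++i) {
                if (is_interesting(values[i])) {
                    local[t].push_back(values[i]); // no sharing between threads here
                }
            }
        });
    }
    for (auto& w : workers) { w.join(); }

    // Single-threaded merge, in thread order.
    std::vector<unsigned int> merged;
    for (const auto& buf : local) {
        merged.insert(merged.end(), buf.begin(), buf.end());
    }
    return merged;
}
```

If the traversal phase in VTune looks the same with thread-local buffers as with the shared concurrent_vector, contention on interesting_numbers is probably not the cause, and the splitting behaviour of range() becomes the more likely suspect.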