concurrent_unordered_map doesn't iterate in parallel

AJ13 · ‎10-09-2012

Hi,

I am tuning my application with VTune, and I have found that in an area of my code that does parallel traversal of a concurrent_unordered_map, I get zero parallelism. After this part of the code completes, I get a high amount of concurrency.

In particular, I use a parallel_for with tbb::concurrent_unordered_map::range() to do some work for each (key, value) entry in the map. I suspect that the lack of concurrency is due to internal waits, or some other form of synchronization inside the implementation. It could also be that range() does not split into enough pieces to give the worker threads something to do.

Any suggestions (or experience) with slow traversal via range()?

AJ13 · ‎10-09-2012

Here is a VTune-friendly example of the problem. I compile it with:

g++ -g -O3 -DTBB_USE_THREADING_TOOLS=1 -I /opt/intel/vtune_amplifier_xe_2013/include/ --std=c++0x play.cpp -ltbb -L /opt/intel/vtune_amplifier_xe_2013/lib64/ -littnotify

[cpp]
// NOTE: the forum stripped everything inside angle brackets. The header names
// and the template arguments (unsigned int) below are reconstructed from how
// the code uses them; they are assumptions, not the original text.
#include <iostream>
#include <string>
#include <tbb/concurrent_unordered_map.h>
#include <tbb/concurrent_vector.h>
#include "ittnotify.h"
#include <tbb/parallel_for.h>

int main()
{
    tbb::concurrent_unordered_map<unsigned int, unsigned int> testmap;
    const int worksize = 10000000;

    // VTune instrumentation: one event per phase so they show up separately.
    __itt_domain* vt_domain = __itt_domain_create("sample");
    const std::string event_name_build_table = "build_table";
    __itt_event event_build_table = __itt_event_create(
        event_name_build_table.c_str(), event_name_build_table.size() );
    const std::string event_name_traverse_table = "traverse_table";
    __itt_event event_traverse_table = __itt_event_create(
        event_name_traverse_table.c_str(), event_name_traverse_table.size() );

    // BUILD THE TABLE
    __itt_event_start(event_build_table);
    tbb::parallel_for(0, worksize, [&](unsigned int x) {
        testmap[ x ] = x;
    } );
    __itt_event_end(event_build_table);

    // TRAVERSE THE TABLE
    __itt_event_start(event_traverse_table);
    tbb::concurrent_vector<unsigned int> interesting_numbers;
    tbb::parallel_for( testmap.range(), [&]( decltype( testmap)::range_type& r) {
        for ( auto curr_entry = r.begin(); curr_entry != r.end(); ++curr_entry) {
            // We are going to do something a bit tricky / expensive
            auto my_num = curr_entry->second;
            auto counter = my_num;
            for ( int i = 2; i < 100; ++i ) {
                if ( i % 2 == 0 ) {
                    counter = counter * i;
                } else {
                    counter = 3 * counter - i;
                }
            }
            if ( counter < my_num ) {
                interesting_numbers.push_back(curr_entry->second);
            }
        }
    } );
    __itt_event_end(event_traverse_table);

    // Now, just do something silly to prevent optimizer from messing with us
    std::cout << "Junk: " << interesting_numbers.size() << std::endl;
}
[/cpp]

AJ13 · ‎10-09-2012

Sigh. Looks like posting C++ code doesn't work as well as you would want. I'm attaching a text file instead.

RafSchietekat · ‎10-11-2012

Can you rule out contention on interesting_numbers by omitting it or by combining it from TLS? Maybe it's just a red herring, but...
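To make the "combine it from TLS" idea concrete, here is a rough sketch of the pattern. It deliberately uses plain std::thread rather than TBB so that it has no dependencies; in the real program the per-thread buffers would more likely be a tbb::enumerable_thread_specific or tbb::combinable. The function names (is_interesting, collect_with_local_buffers) and the use of unsigned int are assumptions made for illustration, not taken from the attached file. Each worker appends only to its own buffer, and the buffers are merged once all workers have finished, so there is no shared container to contend on during the traversal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Same arithmetic as the inner loop of the posted example.
// Assumes the values are unsigned int, so overflow wraps around.
bool is_interesting(unsigned int my_num) {
    unsigned int counter = my_num;
    for (int i = 2; i < 100; ++i) {
        if (i % 2 == 0) {
            counter = counter * i;
        } else {
            counter = 3 * counter - i;
        }
    }
    return counter < my_num;
}

// Hypothetical helper: each thread writes to its own buffer, and the
// buffers are concatenated only after every thread has been joined.
std::vector<unsigned int> collect_with_local_buffers(
        const std::vector<unsigned int>& values, unsigned int num_threads) {
    assert(num_threads > 0);
    std::vector<std::vector<unsigned int>> local(num_threads);
    std::vector<std::thread> workers;
    // Contiguous chunks, rounded up so every element is covered.
    const std::size_t chunk = (values.size() + num_threads - 1) / num_threads;

    for (unsigned int t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(values.size(), t * chunk);
        const std::size_t end = std::min(values.size(), begin + chunk);
        workers.emplace_back([&values, &local, t, begin, end] {
            for (std::size_t k = begin; k < end; ++k) {
                if (is_interesting(values[k])) {
                    local[t].push_back(values[k]);  // no shared state touched
                }
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }

    // Single-threaded merge step, done once at the end.
    std::vector<unsigned int> merged;
    for (const auto& buf : local) {
        merged.insert(merged.end(), buf.begin(), buf.end());
    }
    return merged;
}
```

If the traversal phase still shows no concurrency in VTune with the shared concurrent_vector replaced by something like this, then interesting_numbers was indeed a red herring and the problem lies in how range() is split.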