Hi,
I am tuning my application with VTune, and I have found that in an area of my code that does parallel traversal of a concurrent_unordered_map, I get zero parallelism. After this part of the code completes, I get a high amount of concurrency.
In particular, I use a parallel_for with tbb::concurrent_unordered_map::range() to do some work for each (key, value) entry in the map. I suspect that the lack of concurrency is due to internal waits or some other form of synchronization within the implementation. It could also be that range() does not provide enough work for the worker threads to participate.
Any suggestions (or experience) with slow traversal via range()?
3 Replies
Here is a VTune-friendly example of the problem. I compile it with:
g++ -g -O3 -DTBB_USE_THREADING_TOOLS=1 -I /opt/intel/vtune_amplifier_xe_2013/include/ --std=c++0x play.cpp -ltbb -L /opt/intel/vtune_amplifier_xe_2013/lib64/ -littnotify
[cpp]
#include <iostream>
#include <string>
#include <tbb/concurrent_unordered_map.h>
#include <tbb/concurrent_vector.h>
#include <tbb/parallel_for.h>
#include "ittnotify.h"

int main()
{
    // unsigned int keys and values, matching the lambda parameter used to fill the map
    tbb::concurrent_unordered_map<unsigned int, unsigned int> testmap;
    const int worksize = 10000000;

    // VTune (ITT) events that mark the two phases on the timeline
    __itt_domain* vt_domain = __itt_domain_create("sample");
    const std::string event_name_build_table = "build_table";
    __itt_event event_build_table = __itt_event_create( event_name_build_table.c_str(), event_name_build_table.size() );
    const std::string event_name_traverse_table = "traverse_table";
    __itt_event event_traverse_table = __itt_event_create( event_name_traverse_table.c_str(), event_name_traverse_table.size() );

    // BUILD THE TABLE
    __itt_event_start(event_build_table);
    tbb::parallel_for(0, worksize, [&](unsigned int x) { testmap[ x ] = x; } );
    __itt_event_end(event_build_table);

    // TRAVERSE THE TABLE
    __itt_event_start(event_traverse_table);
    tbb::concurrent_vector<unsigned int> interesting_numbers;
    tbb::parallel_for( testmap.range(),
        [&]( decltype( testmap)::range_type& r)
        {
            for ( auto curr_entry = r.begin(); curr_entry != r.end(); ++curr_entry)
            {
                // We are going to do something a bit tricky / expensive
                auto my_num = curr_entry->second;
                auto counter = my_num;
                for ( int i = 2; i < 100; ++i )
                {
                    if ( i % 2 == 0 )
                    {
                        counter = counter * i;
                    }
                    else
                    {
                        counter = 3 * counter - i;
                    }
                }
                if ( counter < my_num )
                {
                    interesting_numbers.push_back(curr_entry->second);
                }
            }
        }
    );
    __itt_event_end(event_traverse_table);

    // Now, just do something silly to prevent optimizer from messing with us
    std::cout << "Junk: " << interesting_numbers.size() << std::endl;
}
[/cpp]
Can you rule out contention on interesting_numbers by omitting it or by combining it from TLS? Maybe it's just a red herring, but...
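To make the "combine it from TLS" idea concrete, here is a minimal sketch in plain standard C++ rather than TBB. Each worker appends to its own buffer, and the buffers are merged once after all workers finish, so nothing is shared while the loop runs. The names `is_interesting` and `collect_interesting` are hypothetical, and the input is assumed to be a plain vector of the map's values, not the map itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Same per-entry arithmetic as in the example above (unsigned values wrap around).
static bool is_interesting(unsigned int my_num)
{
    unsigned int counter = my_num;
    for (int i = 2; i < 100; ++i)
        counter = (i % 2 == 0) ? counter * i : 3 * counter - i;
    return counter < my_num;
}

// Hypothetical helper: every worker fills a private buffer, merged after the join.
std::vector<unsigned int> collect_interesting(const std::vector<unsigned int>& values,
                                              unsigned int num_threads)
{
    if (num_threads == 0) num_threads = 1;
    std::vector<std::vector<unsigned int>> local(num_threads);  // one buffer per worker
    std::vector<std::thread> workers;
    const std::size_t chunk = (values.size() + num_threads - 1) / num_threads;

    for (unsigned int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(values.size(), t * chunk);
            const std::size_t end = std::min(values.size(), begin + chunk);
            for (std::size_t i = begin; i < end; ++i)
                if (is_interesting(values[i]))
                    local[t].push_back(values[i]);  // no shared container touched here
        });
    }
    for (auto& w : workers) w.join();

    // Single-threaded merge, done once after all workers have finished.
    std::vector<unsigned int> merged;
    for (const auto& buf : local)
        merged.insert(merged.end(), buf.begin(), buf.end());
    return merged;
}
```

In the TBB version, tbb::enumerable_thread_specific would play the role of `local`: call local() inside the parallel_for body to get the calling thread's buffer, then iterate over the container afterwards to merge. If the traversal is still serial with this change, interesting_numbers was not the cause.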