
I have some code which uses an unordered_map. The code is broken into two phases:

- Phase 1: Building the map

- Phase 2: Further computation which uses the map in a read-only way

I was previously using boost::unordered_map, but it does not support concurrent insertion. I switched to the concurrent_unordered_map provided by TBB 4.0 (open source edition), and I have been able to get parallel speedup in Phase 1, but Phase 2 has slowed down dramatically.

In Phase 2 I have multiple threads reading concurrently from this object (this worked fine with the boost object); with the new object this is apparently not so fine.

The way Phase 2 works is that it maintains a vector of pointers into the map, divided into equal-sized pieces, one per thread; each thread iterates through its portion of the vector, and at every step it performs some computation.

The computation is basically of this form: a thread looks up the value for the given key in the map, then performs some find()'s back into the map based on that value, stores the results of those finds, and keeps going.

Important: This code never modifies the underlying structure.

Is there some sort of locking going on in a call to find (or maybe, begin(), end()) that is causing this slowdown?

Can someone (preferably, if possible, the implementor(s) of this structure) give me some idea of what is going on under the hood with this structure: where does synchronization need to occur, and where might I _look_ for what is causing this?

Otherwise I will have to revert to the non-TBB container and take the serial performance hit with that container.


13 Replies



-- Bucket Data for Concurrent map --

Bucket count: 524,288

Load factor: 4

-- Bucket Data for Serial map --

Bucket count: 3,145,739

Load factor: 0.666664

just for sanity here are the typedef lines for each of the objects:

typedef tbb::concurrent_unordered_map< Cell, Data, Hash> Map;

typedef boost::unordered_map< Cell, Data, Hash> Map;

It looks like the two hash map implementations differ dramatically.

Anyone have a clue what is going on here?


for example, why is it that in internal_find() we have this:

    // If bucket is empty, initialize it first
    if (!is_initialized(bucket))
        init_bucket(bucket);

also, if the bucket is empty and we are going to initialize it, the key is clearly not in the list.

yet the init_bucket() code adds a dummy iterator to the bucket so that the next giant for() loop executes, doing a bunch of seemingly useless things since the bucket is supposedly empty.

then it returns end().

if we are going to init_bucket(), why not:

    // If bucket is empty, initialize it first
    if (!is_initialized(bucket)) {
        init_bucket(bucket);
        return end();
    }

???

*hopes developers actually read this forum*


*"yet the init_bucket() code adds a dummy iterator to the bucket so that the next giant for() loop executes, doing a bunch of seemingly useless things since the bucket is supposedly empty."*

On a concurrent system, the moment after init_bucket(), a different thread may insert an item before your thread gets its opportunity to complete an insert (into what you think is a "supposedly" empty bucket).

Jim Dempsey


Nevertheless, the load factor seems unnecessarily high, and the bucket count is off by a HUGE factor compared to the boost unordered_map.

The load factor and bucket count appear to be the same no matter how many threads I am using.

Is this a limitation of a concurrent container, or simply an inferior implementation?

(Or am I just doing it wrong? I hope this is the answer.)


The implementation of concurrent_unordered_map is based on the split-ordered-list algorithm, which ensures no locking while resizing and rehashing the table for growth. Of course, it involves a lot of in-memory atomic synchronization to provide thread safety and thus is definitely slower than a serial algorithm (on one thread).

Unfortunately, there is no serial counterpart method for find() similar to concurrent_hash_map::equal_range().

BTW, I would add serial interfaces for all the concurrent containers, but our architect is against it. He doubts it has more advantages than potential problems.

As for the load factor, please use the max_load_factor(0.6) method to set the value you want.

Hint 1: please pay attention to your hash function, which may not be well suited to (our concurrent) hash tables whose size is a power of two (please see the Reference). The hash function must provide good randomness in the lower bits, e.g. via multiplication by a prime number.

Hint 2: reserve enough initial buckets before growing the table using the rehash(1000) method. This can further improve your Phase 1.


thanks for the reply!

In fact I construct the table with as many buckets as I have keys. Again, this seems not to help. I can try calling rehash() when doing this.

The slowness seems to be agnostic to the number of threads. However, it does seem to be due to the atomics plus this strange implementation. I actually set the max load factor to 0.6, but it doesn't seem to help (if I do this, the built map ends up with a load_factor of 0.5). Is the load_factor in this implementation comparable to load factors for traditional hash tables?

My hash function is the one used by Aho in the dragon book (the compilers book), pg 436:

    h = 0;
    for (typename Container::const_iterator i = c.begin(); i != c.end(); ++i) {
        h = (h << 4) + *i;
        if ((g = h & 0xf0000000)) {
            h = h ^ (g >> 24);
            h = h ^ g;
        }
    }
    return h;

to be honest, I don't really understand this function, but it seems to be doing a good job for me.

I see your answer to my post on your blog, so what you seem to be suggesting is that using concurrent_hash_map should solve my problem? Does the concurrent_hash_map find() do synchronization? I see the Reference suggests that concurrent finds are fine in this structure, as long as you don't call insert/erase. Good :)

I will try implementing this in an hour or so and see if it improves things.


But the hash function looks weak. The reason is that the lower 4 bits are always mapped directly from the last value in a string (I suppose "typename Container" is a string). One simple solution is a final multiplication by a prime.

Also, the highest 4 bits seem to be always zeroed. That is bad for big tables (I suppose 'h' is of size_t type? So on 32-bit the table can hold up to ~268M elements, which is a fairly realistic db size; for smaller types it is a catastrophe).

Yes, concurrent_hash_map should fit your needs. Sure, find() is a concurrent operation which is thread-safe and can be executed concurrently with insert()/erase(). So your wording above is not precise: what the Reference says is that find(), like the other concurrent operations, invalidates (serial) iterators. But it can still be used together with iterators if rehash() was called after insert/erase.


and yes h is of size_t.

the hash works very well with boost::unordered_map.

What I meant by synchronization is: if I call multiple finds at once, will the find() calls trample each other, i.e. will some form of synchronization (locks, atomics, etc.) prevent each find() from doing its job? (Note: two find() calls will never be made for the same key by two separate threads.)


*"What I meant by synchronization is: if I call multiple finds at once, will the find() calls trample each other (i.e. some form of synchronization, like locks, atomics, etc), prevent each find() from doing it's job? (Note: two find() calls will never be made for the same key by two separate threads)"*

find() operations on different keys do not block each other, and that includes the resulting accessor instances' lifetimes.


Nevertheless, for our hash tables your hash function is definitely weak, because they have a size that is a power of 2 and so are vulnerable to poor randomness in the lower bits of the hash values.


*#8 "The hash function must provide good randomness in lower bits, e.g. via multiplication by a prime number."*

Such multiplication cannot transfer randomness from higher to lower bits, only the other way around.

How about multiplying with internal::hash_multiplier inside the hashing containers (allowing for a simplification of at least one tbb_hasher overload), and then using mainly the most significant bits of the product? Just an idea, I don't know what impact it would have on performance.


Thanks, Raf! You are right; we are looking at what can be done for our tbb_hasher and the Reference.
