Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Why is TBB Concurrent HashMap So Slow?

mr_stevenfeldman
Beginner
2,746 Views
I am implementing tbb's concurrent hash map to compare the performance of it against a suite of other concurrent hash tables.
However the performance I get out of it is horrendous, I just can't believe it is that slow compared to other concurrent hash tables
Here is my implementation of it:
class TBB: public TestDs{
typedef tbb::concurrent_hash_map > hash_t;
private:
hash_t _ds;
public:
TBB(const Configuration& config) : _ds(config.initial_count) {
}
bool containsKey(int key) {
hash_t::accessor a;
if(_ds.find(a,key)){
return true;
}
else
return false;
}
int get(int key) {
hash_t::accessor a;
if(_ds.find(a,key)){
return (int)(a->second);
}
else
return 0;
}
int put(int key, int value) {
return _ds.insert( std::make_pair(key, value) );
}
int remove(int key) {
return _ds.erase(key);
}
int size() {
return _ds.size();
}
const char* name() {
return "TBB";
}
void print() {}
void shutdown() {}
};
Does anyone see any issue with my implementation or know of any reason why it may perform slow?
It takes over 30 mins for it to insert 200,000 elements in a single thread environment. To put that in perspective, nearly all the other tables perform this test in less than 5 mins.
Here is my build code:
-w -DNDEBUG -g -msse2 -m32 -DINTEL -D_REENTRANT -lrt -pthread -fno-strict-aliasing -l cds -l tbb -lllalloc
UPDATE: I have adjusted my testing code to prefill the hash tables to 1000, instead of 100,000.
When running it again, tbb performs 92 op/sec while, another implementation performs 89431 op/sec. (64 thread environment)...Just saying something doesn't seem right....
Additional Info: Computer is a HP Z600 Workstation with 6gb of ram and 6 cores.
0 Kudos
4 Replies
Anton_M_Intel
Employee
2,746 Views
0! Specify the initial size of the table as an appropriate known level. It scales bad with default initial size.
1. Check the hash function. It should take into account the fact that the table size is power of two and it requires randomness in the lowest bits of the hash value, e.g. multiply it by a prime number (see the Reference).
2. Do not use accessors where possible - they are effectively locks, e.g. rewritecontainsKey() as { return _ds.count(key); }
3. Use tbbmalloc
0 Kudos
mr_stevenfeldman
Beginner
2,746 Views
0!: I do specifiy an inital size
1: Is there a built in tbb hash function for ints?
Hash Compare is defined as:
struct HashCompare {
static size_t hash( const K& key ) {
return sizeof(int);
}
static bool equal( const K& key1, const K& key2 ) {
return ( key1 == key2 );
}
};
2: THe contains key is legacy code, it is not called.
3: tbbmalloc may work, can you give me an implementation example?
0 Kudos
mr_stevenfeldman
Beginner
2,746 Views
It was an issue with my hash compare. it is an o'duh moment,....I can't believe I made that mistake.
I am using boost's hash function, because I don't know too much about tbb, which do u recommend?
0 Kudos
Anton_M_Intel
Employee
2,746 Views
0. Make sure it is several hundreds at least. The lesser size for tbb::concurrent_hash_map does not make much sense (serial map with spin_mutex around it may work faster on small sizes due to cache locality).
1. Yes, there is a default hash function in TBB doing said multiplication. Hash function is always a subject for application-specific tuning, default should be fine but it is not guaranteed to be best match for your application.
2.. ok, anyway, for those, who want to benchmark pure hashing algorithm without burden of accessors, it still makes sense.
3. It is used by default if tbbmalloc.dll is placed close to the application. If you want enforce its usage, please specify scalable_allocator as the template argument for the container.
0 Kudos
Reply