topic concurrent_unordered_map: good hash functionsj for lists of 32-bit integers, character strings, etc in IntelĀ® oneAPI Threading Building Blocks & IntelĀ® Threading Building Blocks
https://community.intel.com/t5/Intel-oneAPI-Threading-Building/concurrent-unordered-map-good-hash-functionsj-for-lists-of-32/m-p/785741#M1403
Hi,<BR /><BR />I am disappointed with the lack of documentation regarding the hashing for concurrent_unordered_map. In particular, as a drop in replacement for boost's concurrent_unordered_map, they have completely different hash behavior.<BR /><BR />In my application, I hash sorted vectors of 32-bit integers (std::size_t or unsigned int), With Boost I was using the 'PJW Hash Function'<BR />From page 436 of the Aho Compiler Book (The one with the dragon on the cover)<BR />Also seen here: <A href="http://www.cs.hmc.edu/~geoff/classes/hmc.cs070.200101/homework10/hashfuncs.html" target="_blank">http://www.cs.hmc.edu/~geoff/classes/hmc.cs070.200101/homework10/hashfuncs.html</A><BR /><BR />The boost::unordered_map uses chained addressing and implements the canonical buckets with linked lists, and uses a prime number for the bucket sizes. You can see the large amount of explanation on it here: <A href="http://www.boost.org/doc/libs/1_49_0/doc/html/unordered/buckets.html" target="_blank">http://www.boost.org/doc/libs/1_49_0/doc/html/unordered/buckets.html</A><BR /><BR />the concurrent_unordered_map uses a skip-order list. Essentially the data structure is a linked list of all your data, and the "buckets" are pointers into specific entries of this list. To be honest, even after reading this paper late last night, I'm still a bit confused as to the particular invariants and implementations, but, the biggest important difference, is that the paper seems to use a table size of size 2^k.<BR />Consider (by example), taking any key (unsigned int) and reducing it mod 2^k. For example, if k = 1, then only the least significant bit of the key matters for determining a bucket. i.e. the lowest k significant bits determine the bucket. So a good hash here should enforce randomness on those k bits.<BR /><BR />I'll be posting more to this soon with more explanation as I have it. I hope that those particularly well informed, could post more on this matter.<BR /><BR />One question I have, is, if I know in advance the total size of this hash table, how should I choose the buckets as to minimize space and search time?Fri, 23 Mar 2012 13:54:05 GMTrhl_2012-03-23T13:54:05Zconcurrent_unordered_map: good hash functionsj for lists of 32-bit integers, character strings, etc
https://community.intel.com/t5/Intel-oneAPI-Threading-Building/concurrent-unordered-map-good-hash-functionsj-for-lists-of-32/m-p/785741#M1403
Hi,<BR /><BR />I am disappointed with the lack of documentation regarding the hashing for concurrent_unordered_map. In particular, as a drop in replacement for boost's concurrent_unordered_map, they have completely different hash behavior.<BR /><BR />In my application, I hash sorted vectors of 32-bit integers (std::size_t or unsigned int), With Boost I was using the 'PJW Hash Function'<BR />From page 436 of the Aho Compiler Book (The one with the dragon on the cover)<BR />Also seen here: <A href="http://www.cs.hmc.edu/~geoff/classes/hmc.cs070.200101/homework10/hashfuncs.html" target="_blank">http://www.cs.hmc.edu/~geoff/classes/hmc.cs070.200101/homework10/hashfuncs.html</A><BR /><BR />The boost::unordered_map uses chained addressing and implements the canonical buckets with linked lists, and uses a prime number for the bucket sizes. You can see the large amount of explanation on it here: <A href="http://www.boost.org/doc/libs/1_49_0/doc/html/unordered/buckets.html" target="_blank">http://www.boost.org/doc/libs/1_49_0/doc/html/unordered/buckets.html</A><BR /><BR />the concurrent_unordered_map uses a skip-order list. Essentially the data structure is a linked list of all your data, and the "buckets" are pointers into specific entries of this list. To be honest, even after reading this paper late last night, I'm still a bit confused as to the particular invariants and implementations, but, the biggest important difference, is that the paper seems to use a table size of size 2^k.<BR />Consider (by example), taking any key (unsigned int) and reducing it mod 2^k. For example, if k = 1, then only the least significant bit of the key matters for determining a bucket. i.e. the lowest k significant bits determine the bucket. So a good hash here should enforce randomness on those k bits.<BR /><BR />I'll be posting more to this soon with more explanation as I have it. I hope that those particularly well informed, could post more on this matter.<BR /><BR />One question I have, is, if I know in advance the total size of this hash table, how should I choose the buckets as to minimize space and search time?Fri, 23 Mar 2012 13:54:05 GMThttps://community.intel.com/t5/Intel-oneAPI-Threading-Building/concurrent-unordered-map-good-hash-functionsj-for-lists-of-32/m-p/785741#M1403rhl_2012-03-23T13:54:05Zconcurrent_unordered_map: good hash functionsj for lists of 32-
https://community.intel.com/t5/Intel-oneAPI-Threading-Building/concurrent-unordered-map-good-hash-functionsj-for-lists-of-32/m-p/785742#M1404
With concurrent_hash_map you need good randomness in the low-order bits, so you could try if that also helps with concurrent_unordered_map.Sat, 24 Mar 2012 06:06:17 GMThttps://community.intel.com/t5/Intel-oneAPI-Threading-Building/concurrent-unordered-map-good-hash-functionsj-for-lists-of-32/m-p/785742#M1404RafSchietekat2012-03-24T06:06:17Z