Community
cancel
Showing results for 
Search instead for 
Did you mean: 
Markus_Dreseler
Beginner
114 Views

Bug in tbb::concurrent_unordered_multimap? Entries get lost even if single-threaded

My understanding is that tbb::concurrent_unordered_multimap should behave like std::unordered_multimap if I am using only one thread. However, in this example, it does not:

#include "tbb/concurrent_unordered_map.h"
#include <iostream>
#include <unordered_map>

struct myhash {
    size_t operator()(const int& a) const {
        return 1;
    }
};

int main() {
    tbb::concurrent_unordered_multimap<int, int, myhash> tbbidx;
    std::unordered_multimap<int, int, myhash> stdidx;

    for(int i = 0; i < 100; ++i) {
        tbbidx.insert(std::make_pair(i % 10, i));
        stdidx.insert(std::make_pair(i % 10, i));
    }

    std::cout << "tbb size " << tbbidx.size() << std::endl;
    std::cout << "tbb count " << tbbidx.count(0) << std::endl;
    std::cout << "std size " << stdidx.size() << std::endl;
    std::cout << "std count " << stdidx.count(0) << std::endl;
}

The result is this:

tbb size 100
tbb count 1
std size 100
std count 10

If I remove myhash, I get the correct result. The way I understand hashmaps, however, is that a horrible hash function should only affect the performance, not the correctness as long as the function returns the same value if x==y.

Also, if I do an ordered insert (i / 10, i instead of i % 10, i), everything is fine.

0 Kudos
14 Replies
Markus_Dreseler
Beginner
114 Views

This appears to be a bug in internal_equal_range. Finding the end iterator for the range works like this:

                iterator last = first;
                do ++last; while( allow_multimapping && last != end() && !my_hash_compare(get_key(*last), key) );

If a hash collision occurs, the bucket might look like this (without values): [5, 3, 5, 7, 5, 5, 2]

The above code, when searching for 5, would get !my_hash_compare(3, 5) for the second element and stop there. Thus, all following occurrences of 5 are ignored. If I always return end() for the last iterator, I get all elements that I am looking for. However, I will have to manually check if the keys are really equal to 5.

In my opinion, there should be a wrapper around the solist_t::iterator if allow_multimapping is set which does exactly this. I believe this would also solve this:

  • Insertion does not invalidate or update the iterators returned by equal_range, so insertion may cause non-equal items to be inserted at the end of the range. However, the first iterator will nonethless point to the equal item even after an insertion operation.

Could someone please confirm this?

RafSchietekat
Black Belt
114 Views

Markus D. wrote:

This appears to be a bug in internal_equal_range.

At least in the comment "// This operation makes sense only if mapping is many-to-one.": a map can implement one-to-one and many-to-one, and a multimap can additionally implement one-to-many and many-to-many. Without usage constraints, a map is many-to-one and a multimap is many-to-many. Right? :-)

Markus D. wrote:

If a hash collision occurs, the bucket might look like this (without values): [5, 3, 5, 7, 5, 5, 2]

Have you verified this? I haven't studied this data structure yet, but it seems obvious that elements with identical keys are supposed to be adjacent. It would be more than just a bug for internal_equal_range() to be wrong about that.

Markus D. wrote:

If I always return end() for the last iterator, I get all elements that I am looking for. However, I will have to manually check if the keys are really equal to 5.

In my opinion, there should be a wrapper around the solist_t::iterator if allow_multimapping is set which does exactly this.

That's obviously not appropriate.

Markus D. wrote:

I believe this would also solve this:

Insertion does not invalidate or update the iterators returned by equal_range, so insertion may cause non-equal items to be inserted at the end of the range. However, the first iterator will nonethless point to the equal item even after an insertion operation.

Could someone please confirm this?

It seems easy enough to check the keys while iterating through such a range (you should be able to just stop when a different key is seen).

Markus_Dreseler
Beginner
114 Views

The bug was confirmed by chris at my StackOverflow post (https://stackoverflow.com/questions/23866474/bug-in-tbbconcurrent-unordered-multimap-entries-get-los...) and will hopefully get fixed soon.

RafSchietekat
Black Belt
114 Views

Good to hear! (I didn't doubt that there was a problem, just the analysis.)

114 Views

Hi Markus, Raf,

Thanks again, Markus, for the bug report; it made finding the error much easier.

The split-ordered list keeps the records in ascending order of the bit-reversal of the hash of the keys, but until the fix for this bug, within a particular hash each new item was inserted at the end of the list having the same hash value.  The hope of the designer was that the hash methods used were "good", but the hash isn't guaranteed to be unique.  So now (after the fix) within the same hash value the records are inserted in ascending key order, which ordering is implied by the code in internal_equal_range (well, it would work with either ascending or descending order.  :) )

Definitely a bug.

Regards,
Chris

RafSchietekat
Black Belt
114 Views

Christopher Huson (Intel) wrote:

So now (after the fix) within the same hash value the records are inserted in ascending key order, which ordering is implied by the code in internal_equal_range (well, it would work with either ascending or descending order.  :) )

There is no such thing as "ascending key order" in a concurrent_unordered_multimap. Could you elaborate?

(2014-05-28 Replaced)

114 Views

Hi, Raf,

The elements themselves are not ordered by key.  But you have to be able to find them, so the ordering of the items in the list are based on the hash.  The bits of the hash are reversed to make It possible to divide buckets when necessary, but you use the value and the pointer table to find the element if it exists.

The bug reported by Markus was the degenerate hash case (no possible division by hash.)  To keep items together you keep track of whether you saw the key value, and when you no longer see the key value you add the item to the end.  This enforces an "equality ordering" in the list, but not an inter-key-value ordering. (All the items with the same key are together.  This is implied by internal_find_equal.)

Shalev & Shavit do not deal with the "ultimately-bad" hash (they only filter a few bits out for their performance tests), and they do not talk about multisets and multimaps, so the original split-ordered list paper does not describe this degenerate case.  But given the implementation of the code, and the requirement that the multiset/multimap case give correct counts, this is a required fix.

Best Regards,
Chris

114 Views

So sorry; I misspoke when I said there was an "ascending" order (this occurred because Markus's code inserted the items in ascending key order, but this is not a requirement to make the table work.)

RafSchietekat
Black Belt
114 Views

Ah, from the order imposed by the particular example, that makes sense... although I still don't see what "implied by the code in internal_equal_range" was supposed to mean then. I definitely lost some sleep over that! :-)

BTW, why is the boolean value of the equality predicate in hash_compare inverted?

114 Views

Hi Raf,

In the internal_equal_range() method is the code:

                iterator first = my_solist.get_iterator(it);
                iterator last = first;
                do ++last; while( allow_multimapping && last != end() && !my_hash_compare(get_key(*last), key) );
                return pairii_t(first, last);

So the loop walks the split-ordered list starting at the first occurrence of key, stopping when it hits the end or finds a key which is not equal to key.  I infer from this that internal_equal_range expects all the entries which have this key to be adjacent to each other.

Is there another way it could work?

Regards,
Chris

114 Views

Hi Raf,

I forgot to mention, the value returned by my_hash_compare is a sokey_t, which is typedefed to size_t.  It was unclear why this is, but in any case someone here pointed out to me that my_hash_compare returns a bool.  Confusing, isn't it?

Regards,
Chris

RafSchietekat
Black Belt
114 Views

It is indeed this code in internal_equal_range() that I interpreted as a strong indication that adjacent storage for equal keys is a basic assumption, rather than "a bug" (Markus), and you've since confirmed that the problem was situated elsewhere. But what confused me was the apparent conjunction "So now (after the fix) within the same hash value the records are inserted in ascending key order, which ordering is implied by the code in internal_equal_range (well, it would work with either ascending or descending order.", whereas internal_equal_range() does not imply such order (only the example does), even though the inverted boolean value makes the compare part look like an order predicate (but with the arguments in the wrong order, which I initially missed in a comment that I have since removed), only adjacency of all elements with the same key. Let's leave it at a miscommunication.

I think that my_hash_compare used with one argument (a key) returns a size_t that is typedef'ed to a sokey_t, i.e., the opposite of what you wrote here. But hash_compare is a bundle of two functions, and that's only the hash part. The compare part takes two arguments (two keys), and returns a boolean that indicates whether the keys are unequal, i.e., the inverted boolean value I mentioned above.

Anyway, what counts is that Markus found a problem with a reproducer, and you the solution.

Francis_T_
Beginner
114 Views

Hi Chris

Has this bug been fixed now? If yes, which version has it? Thank you very much.

Regards,

-Francis

114 Views

I guess this was fixed in Intel TBB 4.3 Update 6 according to CHANGES file. I can't reproduce it in version 4.4

--Vladimir

Reply