Solved: Concurrent concurrent_hash_map::erase while iterating it

pacha_shevaev · ‎02-19-2009

Folks, am I right in my guessing that concurrent_hash_map::erase is not thread safe while traversing the map in other thread? What are the best practices to concurrently erase items from concurrent_hash_map except using explicit mutexes?

P.S. I'm new to this forum, tbb rocks!

Alexey-Kukanov · ‎02-20-2009

Quoting - pacha.shevaev

Please forgive my stubborness, I'll repeat my question again, is usage of explicit locking of the whole map the only way to go?

If you want to traverse the map concurrently with calling find(), insert(), and erase() on it - yes.

The primary use of maps is to store and access elements by keys, and this is what you could do in parallel with concurrent_hash_map. Traversing a container which at the same undergoes concurrent changes both makes little sense to me (because you can't guarantee you have traversed every element, or even every element within certain subset of keys) and cannot be implemented without significant performance penalties on both traversal and key-based operations.

In addition to the O'Reilly book, I recommend you to use official TBB Reference manual to learn more about what can and cannot be done. Though I agree it can be improved, the book does not pretend to be a complete manual, neither it should.

View solution in original post

Dmitry_Vyukov · ‎02-19-2009

Quoting - pacha.shevaev

Folks, am I right in my guessing that concurrent_hash_map::erase is not thread safe while traversing the map in other thread? What are the best practices to concurrently erase items from concurrent_hash_map except using explicit mutexes?

I believe that such usage is 100% thread-safe. tbb::concurrent_hash_map internally locks elements/buckets while you are iterating, so thread that deletes an element will wait for readers.

robert-reed · ‎02-19-2009

Quoting - Dmitriy Vyukov

I believe that such usage is 100% thread-safe. tbb::concurrent_hash_map internally locks elements/buckets while you are iterating, so thread that deletes an element will wait for readers.

Pacha, if you want to read all the gory detail about this, see http://software.intel.com/en-us/forums/showthread.php?t=57902

pacha_shevaev · ‎02-19-2009

Quoting - Dmitriy Vyukov

I believe that such usage is 100% thread-safe. tbb::concurrent_hash_map internally locks elements/buckets while you are iterating, so thread that deletes an element will wait for readers.

Well, here's a sample unit test which, I believe, demonstrates the problem:

 SUITE(ConcurrentHashMapTest) { struct Foo {   Foo(int i) : dummy(i) {}   int dummy; }; typedef boost::shared_ptr FooPtr;  template struct shared_ptr_hash_compare  {   static size_t hash(const boost::shared_ptr& ptr)    {     return (size_t)ptr.get();   }   static bool equal(const boost::shared_ptr& a, const boost::shared_ptr& b)    {     return a == b;   } };  typedef tbb::concurrent_hash_map > Map;  struct Accessor {   Accessor(Map& m) : map(m) {}    void run()   {     while(map.size() > 0)     {       BOOST_FOREACH(Map::value_type& el, map)       {         boost::this_thread::yield();         el.second->dummy += 1;       }     }   }    Map& map; };  struct Deleter {   Deleter(Map& m) : map(m) {}    void run()   {      while(map.size() > 0)     {       Map::iterator it = map.begin();       if(it != map.end())         map.erase(it->first);        boost::this_thread::yield();     }   }    Map& map; };  TEST(ConcurrentAccessToSharedPtrs) {   Map map;    for(size_t i=0; i<10000; ++i)   {     FooPtr foo(new Foo(i));     map.insert(std::make_pair(foo, foo));   }    Accessor accessor(map);   Deleter deleter(map);    boost::thread_group thg;   thg.create_thread(boost::bind(&Accessor::run, boost::ref(accessor)));   thg.create_thread(boost::bind(&Deleter::run, boost::ref(deleter)));    thg.join_all(); } };

In this test the accessor thread changes existing values while the deleter thread removes items from the map one by one. This test segfaults almost in 90% cases(if you are interested I can show dbg's backtrace for coredump)

Maybe, it's just my improper usage of the library, please guide me :)

P.S. I'm usingtbb21_20080605oss_src

pacha_shevaev · ‎02-19-2009

This forum seems to mangle tags, here's a pastebin link:

http://pastebin.com/f357e7725

pacha_shevaev · ‎02-19-2009

Quoting - Robert Reed (Intel)

Pacha, if you want to read all the gory detail about this, see http://software.intel.com/en-us/forums/showthread.php?t=57902

Now if only I could understand what those folks are talking about :) I'm kinda kidding but, seriously, that topic is way too technical for me and I'm just a TBB beginner at the moment. As I could understand reported bugs should have been fixed intbb21_20080605oss, please correct me if I'm wrong.

RafSchietekat · ‎02-20-2009

It's not a bug, it's a feature! - anonymous

As I understand it, in an attempt to improve the performance of concurrent_hash_map, it was made to behave like a Unix file system instead of a Microsoft file system: process 1 opens file, process 2 "erases" file (unlinks it, a thing with i-nodes), process 3 doesn't see file anymore, process 1 is still using it... or thinks it is, because meanwhile process 4 has come along and only what it writes into a new file by that name will be preserved, while the changes of process 1 will be lost. Hmm, I don't feel comfortable using Unix as the bad guy here, but there it is: sometimes it makes you have to be more careful than you would like.

I submitted an alternative implementation that hopefully provides similar performance without sacrificing predictability (erasers do wait their turn), and Dmitriy Vyukov submitted one that did sacrifice predictability (IMHO), but also did provide otherwise unattainable performance (or that was my impression, I have not measured it).

Perhaps the current implementation could be replaced with both of these implementations, so that the user can choose between predictability and speed?

(Added) Or an equivalent change, of course.

pacha_shevaev · ‎02-20-2009

Quoting - Raf Schietekat

It's not a bug, it's a feature! - anonymous

Thanks for explanations!

So, does it actually mean the only way to go for thenon-patched version of tbbis to use explicit locks?

Alexey-Kukanov · ‎02-20-2009

Quoting - pacha.shevaev

In this test the accessor thread changes existing values while the deleter thread removes items from the map one by one. This test segfaults almost in 90% cases(if you are interested I can show dbg's backtrace for coredump)
Maybe, it's just my improper usage of the library, please guide me :)

You use iterators, but tbb::concurrent_hash_map iterators are not thread safe.
You might use find, insert, and erase methods concurrently without problems.

RafSchietekat · ‎02-20-2009

It seems I should have looked at the actual problem to verify the relevance of the reference before reacting to the request for explanation.

"You use iterators, but tbb::concurrent_hash_map iterators are not thread safe." I agree.

"You might use find, insert, and erase methods concurrently without problems." I disagree. :-)

pacha_shevaev · ‎02-20-2009

Quoting - Alexey Kukanov (Intel)

Quoting - pacha.shevaev
In this test the accessor thread changes existing values while the deleter thread removes items from the map one by one. This test segfaults almost in 90% cases(if you are interested I can show dbg's backtrace for coredump)
Maybe, it's just my improper usage of the library, please guide me :)

You use iterators, but tbb::concurrent_hash_map iterators are not thread safe.
You might use find, insert, and erase methods concurrently without problems.

I was suspecting that, unfortunately O'Reilly Intel Thread Building Blocks book doesn't mention this in the section on concurrent_hash_map iterators.

Please forgive my stubborness, I'll repeat my question again, is usage of explicit locking of the whole map the only way to go?

Alexey-Kukanov · ‎02-20-2009

Quoting - pacha.shevaev

Please forgive my stubborness, I'll repeat my question again, is usage of explicit locking of the whole map the only way to go?

If you want to traverse the map concurrently with calling find(), insert(), and erase() on it - yes.

The primary use of maps is to store and access elements by keys, and this is what you could do in parallel with concurrent_hash_map. Traversing a container which at the same undergoes concurrent changes both makes little sense to me (because you can't guarantee you have traversed every element, or even every element within certain subset of keys) and cannot be implemented without significant performance penalties on both traversal and key-based operations.

In addition to the O'Reilly book, I recommend you to use official TBB Reference manual to learn more about what can and cannot be done. Though I agree it can be improved, the book does not pretend to be a complete manual, neither it should.

Alexey-Kukanov · ‎02-20-2009

Quoting - Raf Schietekat

"You might use find, insert, and erase methods concurrently without problems." I disagree. :-)

Yeah I know :) I recall that in our discussions we tried to convince you that there is no difference in visible behavior and guarantees for every participating thread, without success of course. Pardon my obliviousness, I can't recall if there was a realistic meaningful test giventhat would fail due to the "hole" in concurrent_hash_map's erase behaviour.

pacha_shevaev · ‎02-20-2009

Quoting - Alexey Kukanov (Intel)

If you want to traverse the map concurrently with calling find(), insert(), and erase() on it - yes.

Thank you all guys for helpful comments and suggestions

RafSchietekat · ‎02-20-2009

"is usage of explicit locking of the whole map the only way to go?" - Pacha Shevaev (sic?)

Perhaps if the map is used predominantly for reading, you might take the calculated risk of (and assume full responsibility for) locking the map only for iterating through it and for operations modifying its membership (insert/erase), which should work with the current implementation (unless it already changed too much since I last had a look at it). Otherwise you'll probably need an additional data structure that is maintained synchronously with the concurrent_hash_map and that is suitably resilient to concurrent insert/erase (adapting or throwing an exception according to your program's requirements). I myself wouldn't necessarily shy away from adding a concurrent internal iterator (visiting each element with a user-provided function object), but, as Alexey indicates, you would need to be very clear about the semantics, and it isn't likely to be added unless you first build a strong case; a concurrent external iterator seems very unlikely.

"I can't recall if there was a realistic meaningful test given that would fail due to the "hole" in concurrent_hash_map's erase behaviour." - Alexey Kukanov

For clarity, that would be erase(key) or erase(const_accessor), not erase(accessor), which is inherently safe, but I don't think the version I changed had that one yet. What would be your definition of "realistic" and "meaningful"? I think my friend Murphy would concur if I say that a lot of problems arise precisely because of a difference in opinion on what constitutes meaningful use. Compound that with the fragility of software systems, and you have a potential disaster on your hands. Many people never have problems working with Unix file systems without ever being aware that directories are really like maps of smart pointers, until something happens ("No, the program can't still be running, it's designed to refuse to run if the log file was not created yet, and it would have stopped by itself when the log file was removed."); I should point out that this behaviour may also be very useful, e.g., to update a program while its current version is still running. How do I know, if I give an example, that you wouldn't simply say that it is unrealistic and meaningless (which would be very difficult for me to deny because I invented it intending it to be flawed)? :-)

RafSchietekat · ‎02-20-2009

Well, I probably shouldn't be writing about this anymore without studying the current implementation. On the one hand having erase(accessor) seems all that is required (except clear documentation of what the overloads do differently from each other and an admonishment to revise any code that assumed the previous behaviour of erase(key)), no need for two implementations after all, and on the other hand I don't understand the discrepancy between returning "void" in Reference Manual (Open Source), rev. 1.9, and returning "bool" in tbb21_20081109oss, let alone why erase(accessor) could possibly have something to return. So consider me confused (I'm even having a deja vu about mentioning these details).

Alexey-Kukanov · ‎02-23-2009

Quoting - Raf Schietekat

What would be your definition of "realistic" and "meaningful"? ... How do I know, if I give an example, that you wouldn't simply say that it is unrealistic and meaningless (which would be very difficult for me to deny because I invented it intending it to be flawed)? :-)

I meant that it should not be merely a test application designed to show how the current implementation of tbb::concurrent_hash_map behaves not the same way as you would like it to, but something that pretends to be a use case, possibly based on some real problem to solve, or at least models such a use case. E.g. something more similar to TBB examples than to TBB unit tests.

Alexey-Kukanov · ‎02-23-2009

Quoting - Raf Schietekat

On the one hand having erase(accessor) seems all that is required (except clear documentation of what the overloads do differently from each other and an admonishment to revise any code that assumed the previous behaviour of erase(key)), no need for two implementations after all, and on the other hand I don't understand the discrepancy between returning "void" in Reference Manual (Open Source), rev. 1.9, and returning "bool" in tbb21_20081109oss, let alone why erase(accessor) could possibly have something to return. So consider me confused (I'm even having a deja vu about mentioning these details).

Thank you for reporting the discrepancies between implementation and the docs. I referred Anton to your post, to look at the problem and fix as appropriate. Most likely, we changed the implementation for some reason but forgot to update the docs.

I think that erase by accessor returns value only for consistency with other erase methods, and that it should always return true. But the actual implementation is shared with erase by const_accessor, and I am not the implementer; after quick glance I can't say whether my guess is correct.

RafSchietekat · ‎02-23-2009

My friend Murphy told me not to worry about that example: he'll get some people's programs to fail so they can then blame you. He says all it takes is unnecessary opportunity for misuse, and time. He told me I was actually too pessimistic about the opportunity for failure (his way of telling that it's actually easier to fail): even erase(accessor), the one that does require exclusive ownership, allows a new node with the same key to be inserted before the old node is destroyed (I think that I only temporarily forgot about that, myself), so that even supposedly smart people are likely to get lulled into a false sense of security, and allow the destructor of a mapped value to destroy an external resource or something. He also asked me not to turn this around and challenge you to, e.g., motivate the need for erase(key) and erase(const_accessor), because you might consider to take remedial action (like deprecating them or even just warning users), and that might spoil the fun... oops! :-)

Alexey-Kukanov · ‎02-24-2009

Quoting - Raf Schietekat

My friend Murphy told me not to worry about that example [...]

I like your sense of humor, sometimes :)

Anton_M_Intel · ‎03-02-2009

Quoting - Raf Schietekat

and on the other hand I don't understand the discrepancy between returning "void" in Reference Manual (Open Source), rev. 1.9, and returning "bool" in tbb21_20081109oss, let alone why erase(accessor) could possibly have something to return. So consider me confused (I'm even having a deja vu about mentioning these details).

The last revision of the Reference Manual (1.13) represents right and actual function prototype:
4.2.3.12 bool erase( accessor& item_accessor )
I believe it was fixed since revision 1.10.