I am looking for a concurrent map with the ability to insert, erase, and traverse concurrently.
In the concurrent_unordered_map, the erase methods are prefixed with unsafe_ to indicate that they are not concurrency safe.
What exactly happens if I use the unsafe_erase concurrenly with iterator or insertion? How dangerous is that?
I agree with Raf. Insertion and traversal are supported concurrently because the pointer updates are done atomically, and if a thread is traversing a list while another is inserting a new item, the new item may or may not be visited, but the traversal is still correct.
If an item is deleted while the list is being traversed, a thread examining the item being deleted may not find the next field pointing to the next item in the list, but possibly a free list or nothing in particular (with possible resulting pyrotechnics.)
Wow, Chris, did you clear that with legal and marketing? Let me amend my own statement by clarifying that today's hardware can probably take it, but also that maybe it will be the programmer who deliberately wrote such code who might get... fired.
More seriously, I would consider using concurrent_unordered_map with a mapped_type containing an atomic<bool> to indicate whether the item is still alive, depending on how many dead items may accumulate that way, and/or whether the program will have the opportunity to occasionally prune them.
Alternatively, maybe concurrent_hash_map could be used if iteration and deletion are made mutually exclusive with the help of an external mutex? (Note the question mark: it's just an idea, I haven't checked it.) You couldn't use concurrent_unordered_map for that because, in addition to what Chris wrote, there's no synchronisation between deletion and use, while in a concurrent_hash_map the user at least knows the item is still valid, even if it may have been concurrently removed from the map (like a file on a Unix-like O.S.).
(Added) Insertion would also require a lock on the external mutex, which should probably be reader-writer: non-exclusive for traversal, and exclusive for insertion and deletion, or vice versa (for lack of a mutex with two categories that each can be locked multiple times). Would that do (if it works)?
You might look at proxy_vector as used in QuickThread (www.quickthreadprogramming.com). This code is now GPL. With some work, and attribrution, you may be able to port it into TBB. Essentially it is a concurrent unordered map of a vector of pointers to objects that is Lock-Free and Wait-Free (with individual cells being lockable).
A push_back(&obj) as performed by one thread when vector is at capacity will internally cause that thread to attempt to expand the vector. The vector is not locked during this process. An, or any number of, additional threads may also attempt a push_back(&otherObj) prior to the first thread succeeding in expanding the vector (possibly due to preemption). In this event multiple threads would compete (without locks) as to who will win the competition to expand the vector. The proxy_vector is live during expansion, meaning other threads performing scans or cell lock/cell free are free to do so unimpeded (other than for attempts to cell lock the same cell). Concurrent locks of same cell do not block the loser, rather a failure indication is returned.
Object creation (ctor) deletion (dtor) or copy (operator=()) is not the responsibility of the proxy_vector, the application threads (or shell object) is responsible for this. Because of this there is no requirement for locks and/or critical sections. Interlocked instructions are used, but these are not pre-emptible and thus will not interfere with other threads use of the vector during preemption.
Thanks Chris, Raf, Jim
I simply wrapped the concurrent_unordered_map with read/write locks:
read/shared locks on insert, find, and iterate
and write/exclusive locks on erase.
Here is some numbers on i7, 8 threads, VS 2012
inserting: avg: 0.003553 ms, max: 0.372243 ms
finding: avg: 0.000789 ms, max: 0.367506 ms
erasing: avg: 0.006316 ms, max: 3.970330 ms
I use Windows Slim reader/writer (SRW) locks, they work a little bit better for me than tbb::spin_rw_mutex or tbb::queuing_rw_mutex
Without the locks:
inserting: avg: 0.001579 ms, max: 0.121581 ms
finding: avg: 0.001579 ms, max: 0.080528 ms
So it is not so bad.
That also works, of course, but you should at least weigh those microbenchmark results to reflect the mix of accesses in your application, and with a preponderance of find accesses that's about twice as slow. Well, it's the same factor for insertion, but that may not be as important, which leads me to say:
Why don't you give my suggestion a try? Assuming that it is valid, it doesn't use an external mutex for find accesses, and so its performance should be closer to that of unprotected use of a concurrent_hash_map, with the described mix of operations.