Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.
2466 Discussions

concurrent_hash_map snap shot (copy of contents)

ele123456
Beginner
420 Views
I need to have "some kind of" snap shot or copy of the contents of a concurrent_hash_map while there are multiple threads inserting and erasing elements. I have browsed through this forum and kind of concluded from some of the answers that this may not be possible but didn't really see it directly confirmed anywhere.

I made a small test application that takes snap shot copieswith 1) copy constructor, 2) constructor that takes begin/end arguments, 3) assignment operator. None of these seem to work (they crash) while multiple threads insert and erase elements.

2) seems to work if there are no new elements added or elements erased, i.e. there are only lookups or changes to existing elements. 1) and 3) crash even in this case.

I can well understand 1)and 3)not working because the signature actually requires the source to be const. The signature of the constructor with begin/end iterators is not so clear, are these const iterators?

My questions are:

A. Is there any practical way of taking "some kind of" snap shot copy of a concurrent_hash_map?
B. Isit correctthat 2) works if the size of the concurrent_hash_map does not change, i.e. only lookups or changes to existing elements. I f I could rely on this fact then I might be able to take a snap shot copy by locking inserts and erases when snap shots need to be taken.
C. If answer to B is no, thenshould I lock all access while making a snap shot?
D. Is there a way of taking a snap shot of any of the other concurrent containers? If yes, I could keep my keys in such a container and use them to make a snap shot of the concurrent_hash_map.
E. I needa snap shot fairly seldom so it's OK if it's "heavy stuff" anddegrades other performance while doing it. This would certainly be the result if I need to lock all access while doing a snap shot.

Any clever ideas about this would be greatly appreciated!
0 Kudos
5 Replies
RafSchietekat
Valued Contributor III
420 Views
"I can well understand 1) and 3) not working because the signature actually requires the source to be const."
That's not what const means. Instead, it is a promise that the callee will not modify the argument. I don't immediately see a reason for the differences you've found, though.

You might store new keys in a concurrent_vector?
0 Kudos
Anton_M_Intel
Employee
420 Views
Quoting - ele123456
A. Is there any practical way of taking "some kind of" snap shot copy of a concurrent_hash_map?
B. Isit correctthat 2) works if the size of the concurrent_hash_map does not change, i.e. only lookups or changes to existing elements. I f I could rely on this fact then I might be able to take a snap shot copy by locking inserts and erases when snap shots need to be taken.
C. If answer to B is no, thenshould I lock all access while making a snap shot?
D. Is there a way of taking a snap shot of any of the other concurrent containers? If yes, I could keep my keys in such a container and use them to make a snap shot of the concurrent_hash_map.
E. I needa snap shot fairly seldom so it's OK if it's "heavy stuff" anddegrades other performance while doing it. This would certainly be the result if I need to lock all access while doing a snap shot.

Any clever ideas about this would be greatly appreciated!

Thanks for the questions. Basically, iterators of concurrent_hash_map are not concurrent, so they are not supposed to work in parallel to any concurrent method. But in fact, only concurrent erase can cause a crash with iterators. Other concurrent methods like insert(), find(), and count() can only reorder the items in the table - as result, the same item may be faced more than once. So, I suggest the following aproach to get a snapshot: 1) Use a lock to synchronize snapshot and erase() only - other methods will be out of this overhead. 2) If the dupcliation of items is an isue for you, use a local hash table as a destination for the snapshot (not necessary concurrent if taking a snapshot is not parallel).

In the next (major) version we will provide alternative hash table implementation with support of _concurrent_ iterators (but without concurrent erase).

[Added later] Also, some items can be missed. See my blog about concurrent traversal.
0 Kudos
Dmitry_Vyukov
Valued Contributor I
420 Views
Do you need exactly consistent snapshot or it's Ok just to iterate over items and copy what we observe?
Consider, first thread inserts item1, then inserts item 2. Concurrently thread 2 takes a snapshot of the map. Is it Ok if snapshot will contain item2 but not item1?

0 Kudos
Anton_M_Intel
Employee
420 Views
Quoting - Dmitriy Vyukov
Do you need exactly consistent snapshot or it's Ok just to iterate over items and copy what we observe?
Consider, first thread inserts item1, then inserts item 2. Concurrently thread 2 takes a snapshot of the map. Is it Ok if snapshot will contain item2 but not item1?

Right, there is no such a guarantee for hash tables even with "concurrent" iterators. The only way to provide this behaviour seems to synchronize transactions using top-level locks.
0 Kudos
ele123456
Beginner
420 Views
Quoting - Dmitriy Vyukov
Do you need exactly consistent snapshot or it's Ok just to iterate over items and copy what we observe?
Consider, first thread inserts item1, then inserts item 2. Concurrently thread 2 takes a snapshot of the map. Is it Ok if snapshot will contain item2 but not item1?


The snap shot doesn't need to be "consistent" (whatever that means in this context). I can live with duplicate items etc (I just drop duplicates). But I can't live with crashes while I take a snap shot.

A for_each method that applies a functor with an accessor argument to concurrent_hash_map elements even in some "inconsistent way" but without crashing would be a much appreciated new feature if it can be done. The method could have full internal control over iteration and and locking.Such a methodwould make it possible to make snap shot copies or other kind of processing on "all" elements. Attribute "all" might notinclude all elements orsome elements might be duplicated.

Refererring to your previous answer, a for_each method could maybe internally handle synchronization between iteration and concurrent erase.
0 Kudos
Reply