Looking for suggestions on the best way to efficiently time entries out of a concurrent hash map. My scenario:
1.) I have N threads that spend > 90% of their time reading from the hash map and ~10% of their time writing to it
2.) Ideally, the readers need to update each entry with a last accessed time
3.) After a specified timeout, the old entries in the map should be removed.
4.) One last biggie: there are around 1 to 2 million entries in the map.
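For concreteness, here is a rough sketch of the entry shape I have in mind (names are hypothetical; storing the last-accessed time as a relaxed atomic is one idea for letting readers refresh it without exclusive access to the entry):

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <string>

// Hypothetical per-flow entry: the payload plus a last-accessed timestamp.
// Making the timestamp a relaxed atomic would let readers refresh it
// without taking a write lock on the whole entry.
struct FlowEntry {
    std::string payload;                       // whatever per-flow data is kept
    std::atomic<std::int64_t> last_access_ns;  // refreshed on every read
};

// Monotonic clock reading, suitable for timeout comparisons.
inline std::int64_t now_ns() {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               std::chrono::steady_clock::now().time_since_epoch())
        .count();
}
```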
I have some thoughts on how to proceed with this design; however, I'm interested to hear whether the community can think of any slick ways to pull this off without requiring too much exclusive access to the elements.
Thanks for your input.
Erasing from the hash map requires locking the element anyway (since there are threads which read and write concurrently). So, you need to get a write accessor to the element, [double-]check its time-stamp, and call erase(/*by accessor*/), which avoids additional locking.
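The double-check-before-erase idea can be sketched like this, using a plain mutex-guarded std::unordered_map instead of TBB accessors (all names here are hypothetical; with tbb::concurrent_hash_map you would hold a write accessor to the element rather than a map-wide lock):

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

struct TimedValue {
    std::string data;
    std::int64_t last_access_ns = 0;
};

class TimedMap {
public:
    // Record an access at time `now`.
    void touch(int key, std::int64_t now) {
        std::lock_guard<std::mutex> g(mu_);
        entries_[key].last_access_ns = now;
    }

    // Erase only if the entry is still stale once we hold the lock:
    // a reader may have refreshed the timestamp since we first looked.
    bool erase_if_stale(int key, std::int64_t now, std::int64_t timeout_ns) {
        std::lock_guard<std::mutex> g(mu_);
        auto it = entries_.find(key);
        if (it == entries_.end()) return false;
        if (now - it->second.last_access_ns < timeout_ns) return false;  // refreshed, keep it
        entries_.erase(it);
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> g(mu_);
        return entries_.size();
    }

private:
    mutable std::mutex mu_;
    std::unordered_map<int, TimedValue> entries_;
};
```

The point is the re-check under the lock: deciding an entry is stale and then erasing it must not be two separate critical sections.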
However, you might also want to avoid locking each element in the map while traversing. That is possible if only one thread actually erases elements during the scan. Traversing the map is actually safe if no deletion can occur concurrently. I plan to write a blog about parallel traversal of concurrent_hash_map, but I think all the ideas can already be found on this forum.
And just for our information, could you please describe what kind of application you are developing?
I was planning on having a reaper thread that traverses the map periodically to delete the "old" flows. The problem, as I see it, is that the readers would need to update the timestamps of elements on each read, therefore requiring write access and killing my performance. That's the heart of the issue I would appreciate feedback on.
Thanks in advance.
Humm... well, you must have some other useful information associated with a flow that you read/write while processing each packet anyway. No?
If not, then what's the point of maintaining a timestamp just to clean them up later?..
Or perhaps, you do have some other useful information, but it's only read during processing most of the time. And timestamps need to be written, and that causes problems?
Please, provide more detailed info (sizes of objects, workload characteristics, etc).
Btw, can you provide more info on this? Can you describe how much it kills performance? It's just for my personal collection.
I searched the forum, but I can't find anything stating that traversal of the map is "safe" as long as deletes do not occur concurrently. Can you point me to a reference? Thanks.
concurrent_hash_map is currently not documented as safe for concurrent insertion and traversal, though as Anton says, there may be a way to do it.
The new concurrent_unordered_map in the latest open source release is specifically designed to allow concurrent insertion and traversal (but not concurrent erasure). Its interface is very similar to the C++0x unordered_map.