I'm looking for a concurrent trie implementation to do prefix matching on the ids that identify messages sent between the components of my application.
The message ids begin in string form and look like a path, for example "system/component/feature/action", but I have been hashing them using one of Bob Jenkins's hashing algorithms and storing them in a concurrent_hash_map.
This has served me well for exact matching, but the requirements of the messenger component have changed and it now needs to do prefix matching as well.
I figured I would make a trie and store the hash of each directory, starting from the root, as a node in the trie, but I'm not very adept at high-performance scalable concurrency, so I was hoping someone had already solved a similar problem.
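For readers following along, the idea of a trie keyed by per-directory hashes might look roughly like the single-threaded sketch below. It is only an illustration under assumptions: the names `SegmentTrie`, `insert`, `hasPrefix` and `contains` are hypothetical, `std::hash` stands in for the Jenkins hash, and matching is done on whole path segments. Because only hashes are stored, two different segments that collide would be treated as the same node.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical single-threaded trie whose edges are hashes of path segments.
// std::hash is a stand-in for the Jenkins hash mentioned in the post.
class SegmentTrie {
public:
    // Register a full id such as "system/component/feature/action".
    void insert(std::string_view id) {
        Node* node = &root_;
        for (std::size_t h : hashSegments(id)) {
            std::unique_ptr<Node>& child = node->children[h];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->terminal = true;
    }

    // True if some registered id equals `prefix` or extends it by whole segments.
    bool hasPrefix(std::string_view prefix) const { return find(prefix) != nullptr; }

    // True only if exactly this id was registered.
    bool contains(std::string_view id) const {
        const Node* node = find(id);
        return node != nullptr && node->terminal;
    }

private:
    struct Node {
        std::unordered_map<std::size_t, std::unique_ptr<Node>> children;
        bool terminal = false;
    };

    // Walk the trie along the hashed segments; nullptr if the path is absent.
    const Node* find(std::string_view path) const {
        const Node* node = &root_;
        for (std::size_t h : hashSegments(path)) {
            auto it = node->children.find(h);
            if (it == node->children.end()) return nullptr;
            node = it->second.get();
        }
        return node;
    }

    // Split on '/' and hash each segment separately.
    static std::vector<std::size_t> hashSegments(std::string_view id) {
        std::vector<std::size_t> out;
        while (!id.empty()) {
            std::size_t slash = id.find('/');
            out.push_back(std::hash<std::string_view>{}(id.substr(0, slash)));
            if (slash == std::string_view::npos) break;
            id.remove_prefix(slash + 1);
        }
        return out;
    }

    Node root_;
};

// Usage example.
inline void segmentTrieDemo() {
    SegmentTrie trie;
    trie.insert("system/component/feature/action");
    assert(trie.hasPrefix("system/component"));   // whole-segment prefix
    assert(!trie.contains("system/component"));   // not registered as a full id
}
```

This sketch has no synchronization at all; it only fixes the data layout that the concurrency questions in this thread are about.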
MichaelMarcin:
I'm looking for a concurrent trie implementation to do prefix matching for ids used to identify messages being sent across the components in my application.
Is the trie modified while the program is running? If the strings represent *types* of messages, then this information could be constant, I think.
How frequently does the trie get modified?
How many threads modify the trie?
What is the size of the trie? Number of strings, length of strings, size of keys?
Message ids can be generated at runtime. Objects can subscribe to and unsubscribe from the messenger dynamically, which would mutate the trie. In general, after initialization, modifying the trie is expected to be much less frequent than reading it. The messenger can be accessed and mutated concurrently from as many threads as the hardware can run concurrently. I would expect the trie to contain around a thousand keys, but I have no evidence to back that up. Ids are not limited in size or complexity, but currently we have ids from 3 to 40 characters long with 1 to 4 parts for prefix matching.
MichaelMarcin:
Message ids can be generated at runtime. Objects can subscribe to and unsubscribe from the messenger dynamically, which would mutate the trie. In general, after initialization, modifying the trie is expected to be much less frequent than reading it. The messenger can be accessed and mutated concurrently from as many threads as the hardware can run concurrently. I would expect the trie to contain around a thousand keys, but I have no evidence to back that up. Ids are not limited in size or complexity, but currently we have ids from 3 to 40 characters long with 1 to 4 parts for prefix matching.
I don't have an off-the-shelf concurrent trie implementation. I can only make some suggestions about the implementation.
First of all, a very good implementation can work as follows. When a thread has to modify the trie, it makes a copy of the current trie state, applies the modifications to the copy, and then atomically replaces the whole current trie with the new version. This way readers are wait-free and don't have to acquire any mutexes. Ideally, readers do not execute any atomic RMW (read-modify-write) operations at all and don't modify any shared data.
The obvious tradeoff is that writers have to copy the whole structure, so the suitability of this approach depends on the size of the structure and the frequency of modifications. From your description it seems that this approach is not very suitable.
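A minimal sketch of this whole-structure copy-on-write scheme might look as follows. All names (`Node`, `CowTrie`, `split`, `snapshot`) are hypothetical, and two simplifications are assumptions of the sketch rather than part of the suggestion above: writers are serialized with a mutex, and the root is published through `std::atomic_load`/`std::atomic_store` on a `std::shared_ptr`. Those free functions are usually not lock-free and are deprecated in C++20 in favour of `std::atomic<std::shared_ptr<T>>`, so this shows the shape of the technique, not the wait-free, RMW-free reader path described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <string_view>
#include <vector>

// Value-semantic trie node: copying a Node deep-copies its whole subtree.
struct Node {
    std::string segment;
    bool terminal = false;
    std::vector<Node> children;
};

// Split "a/b/c" into {"a", "b", "c"}.
std::vector<std::string_view> split(std::string_view id) {
    std::vector<std::string_view> out;
    while (!id.empty()) {
        std::size_t slash = id.find('/');
        out.push_back(id.substr(0, slash));
        if (slash == std::string_view::npos) break;
        id.remove_prefix(slash + 1);
    }
    return out;
}

class CowTrie {
public:
    void insert(std::string_view id) {
        std::lock_guard<std::mutex> lock(writers_);              // one writer at a time
        auto next = std::make_shared<Node>(*std::atomic_load(&root_));  // full copy
        Node* node = next.get();
        for (std::string_view seg : split(id)) {
            Node* found = nullptr;
            for (Node& c : node->children)
                if (c.segment == seg) { found = &c; break; }
            if (!found) {
                found = &node->children.emplace_back();
                found->segment = std::string(seg);
            }
            node = found;
        }
        node->terminal = true;
        // Publish the new version; old versions die with their last reader.
        std::atomic_store(&root_, std::shared_ptr<const Node>(std::move(next)));
    }

    // Readers work on an immutable snapshot and never see a half-applied update.
    std::shared_ptr<const Node> snapshot() const { return std::atomic_load(&root_); }

    bool hasPrefix(std::string_view prefix) const {
        std::shared_ptr<const Node> snap = snapshot();
        const Node* node = snap.get();
        for (std::string_view seg : split(prefix)) {
            const Node* found = nullptr;
            for (const Node& c : node->children)
                if (c.segment == seg) { found = &c; break; }
            if (!found) return false;
            node = found;
        }
        return true;
    }

private:
    std::shared_ptr<const Node> root_ = std::make_shared<Node>();
    std::mutex writers_;
};

// Usage example.
inline void cowTrieDemo() {
    CowTrie trie;
    trie.insert("system/component/feature");
    assert(trie.hasPrefix("system/component"));
}
```

The cost named above is visible in `insert`: every modification copies every node, whatever the size of the change.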
Next, a good approach is to simply wrap a single-threaded trie with a reader-writer mutex.
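As a rough illustration of the reader-writer mutex approach, the sketch below guards a plain trie with `std::shared_mutex` (C++17). The class and function names (`LockedTrie`, `Node`, `split`) are hypothetical, and the trie itself is deliberately naive.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <string_view>
#include <vector>

// Plain single-threaded trie node, one node per path segment.
struct Node {
    std::string segment;
    bool terminal = false;
    std::vector<Node> children;
};

// Split "a/b/c" into {"a", "b", "c"}.
std::vector<std::string_view> split(std::string_view id) {
    std::vector<std::string_view> out;
    while (!id.empty()) {
        std::size_t slash = id.find('/');
        out.push_back(id.substr(0, slash));
        if (slash == std::string_view::npos) break;
        id.remove_prefix(slash + 1);
    }
    return out;
}

class LockedTrie {
public:
    // Writers take the mutex exclusively.
    void insert(std::string_view id) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        Node* node = &root_;
        for (std::string_view seg : split(id)) {
            Node* found = nullptr;
            for (Node& c : node->children)
                if (c.segment == seg) { found = &c; break; }
            if (!found) {
                found = &node->children.emplace_back();
                found->segment = std::string(seg);
            }
            node = found;
        }
        node->terminal = true;
    }

    // Readers share the mutex, so lookups can run in parallel with each other.
    bool hasPrefix(std::string_view prefix) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        const Node* node = &root_;
        for (std::string_view seg : split(prefix)) {
            const Node* found = nullptr;
            for (const Node& c : node->children)
                if (c.segment == seg) { found = &c; break; }
            if (!found) return false;
            node = found;
        }
        return true;
    }

private:
    mutable std::shared_mutex mutex_;
    Node root_;
};

// Usage example.
inline void lockedTrieDemo() {
    LockedTrie trie;
    trie.insert("system/component/feature/action");
    assert(trie.hasPrefix("system/component"));
}
```

Note that even readers write to the mutex's internal state when they lock it, which is the shared-data traffic the copy-on-write variants try to avoid.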
If you want really low overhead and linear scalability on a high number of cores, then it's better to employ the first variant in a substantially finer-grained form, i.e. writers atomically replace *parts* of the trie.
Here is a description of the basic Partial Copy On Write technique:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/7d202e726424c0bf
For GC (PDR) you can use SMR (hazard pointers), RCU, ROP, VZOOM, Proxy-Collector or any other algorithm.
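One way to picture the finer-grained variant is path copying: a writer copies only the nodes on the path it changes and shares every other subtree with the previous version. The sketch below is an assumption-laden illustration and not necessarily the exact scheme from the linked post. The names (`Node`, `NodePtr`, `insertPath`, `PcowTrie`) are hypothetical, `std::shared_ptr` reference counting stands in for the PDR schemes listed above, writers are serialized with a mutex, and for simplicity the new root is swapped rather than an interior pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <string_view>
#include <vector>

struct Node;
using NodePtr = std::shared_ptr<const Node>;

// Immutable once published; children are shared between trie versions.
struct Node {
    bool terminal = false;
    std::map<std::string, NodePtr, std::less<>> children;
};

// Split "a/b/c" into {"a", "b", "c"}.
std::vector<std::string_view> split(std::string_view id) {
    std::vector<std::string_view> out;
    while (!id.empty()) {
        std::size_t slash = id.find('/');
        out.push_back(id.substr(0, slash));
        if (slash == std::string_view::npos) break;
        id.remove_prefix(slash + 1);
    }
    return out;
}

// Returns a new subtree containing segs[i..]. Only nodes on the path are
// copied, and each copy is shallow: untouched children stay shared.
NodePtr insertPath(const NodePtr& node, const std::vector<std::string_view>& segs,
                   std::size_t i) {
    std::shared_ptr<Node> copy =
        node ? std::make_shared<Node>(*node) : std::make_shared<Node>();
    if (i == segs.size()) {
        copy->terminal = true;
        return copy;
    }
    NodePtr oldChild;
    auto it = copy->children.find(segs[i]);
    if (it != copy->children.end()) oldChild = it->second;
    copy->children[std::string(segs[i])] = insertPath(oldChild, segs, i + 1);
    return copy;
}

class PcowTrie {
public:
    void insert(std::string_view id) {
        std::lock_guard<std::mutex> lock(writers_);  // one writer at a time
        NodePtr next = insertPath(std::atomic_load(&root_), split(id), 0);
        std::atomic_store(&root_, next);             // publish the new version
    }

    NodePtr snapshot() const { return std::atomic_load(&root_); }

    bool hasPrefix(std::string_view prefix) const {
        NodePtr node = snapshot();
        for (std::string_view seg : split(prefix)) {
            auto it = node->children.find(seg);
            if (it == node->children.end()) return false;
            node = it->second;
        }
        return true;
    }

private:
    NodePtr root_ = std::make_shared<Node>();
    std::mutex writers_;
};

// Usage example.
inline void pcowTrieDemo() {
    PcowTrie trie;
    trie.insert("system/audio/play");
    assert(trie.hasPrefix("system/audio"));
}
```

With ids of 1 to 4 parts, an update in this sketch copies at most five nodes (the root plus one per part). Reference counting makes readers perform atomic RMW operations, which is exactly what schemes such as hazard pointers or RCU are meant to avoid.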
Thanks for the reference material and suggestions. I'll read through them and consider an implementation.
A shared mutex over a single-threaded trie might be enough for now. The project is going to take over a year to complete, so perhaps someone will release a high-quality free implementation of a scalable trie by then.