Community
cancel
Showing results for 
Search instead for 
Did you mean: 
michaelmarcin
Beginner
40 Views

concurrent_trie

I'm looking for a concurrent trie implementation to do prefix matching for ids used to identify messages being sent across the components in my application.

The message ids begin in string form and look like a path for example "system/component/feature/action" but I have been hashing them using one of Bob Jenkin's hashing algorithms and storing them in a concurrent_hash_map.

This has served me well for exact matching but the requirements of the messenger component have changed and now it needs to do prefix matching as well.

I figured I would make a trie and store hashes of each directory from the root as nodes in the trie but I'm not very adept and high performance scalable concurrency so I was hoping someone had a similar problem and solved it already.




0 Kudos
4 Replies
Dmitry_Vyukov
Valued Contributor I
40 Views

MichaelMarcin:

I'm looking for a concurrent trie implementation to do prefix matching for ids used to identify messages being sent across the components in my application.


Is trie modified while program running? If strings represent *types* of messages, than this information can be constant, I think.
How frequently trie gets modified?
How many threads modify trie?
What is the size of trie? Number of strings, length of strings, size of keys?

michaelmarcin
Beginner
40 Views

Message ids can be generated at runtime. Objects can subscribe/unsubscribe from the messenger dynamically which would mutate the trie. In general after initialization modifying the trie is expected to be much less frequent than reading it. The messenger can be accessed mutated concurrently from as many threads as the hardware can run concurrently. I would expect the trie to contain around a thousand keys but I have no evidence to back that up. Current ids are not limited in size or complexity but currently we have ids from 3 to 40 characters with 1 to 4 parts for prefix matching.
Dmitry_Vyukov
Valued Contributor I
40 Views

MichaelMarcin:
Message ids can be generated at runtime. Objects can subscribe/unsubscribe from the messenger dynamically which would mutate the trie. In general after initialization modifying the trie is expected to be much less frequent than reading it. The messenger can be accessed mutated concurrently from as many threads as the hardware can run concurrently. I would expect the trie to contain around a thousand keys but I have no evidence to back that up. Current ids are not limited in size or complexity but currently we have ids from 3 to 40 characters with 1 to 4 parts for prefix matching.



I don't have off-the-shelf concurrent trie implementation. I can only made some suggestions about implementation.

First of all, very good implementation can be as follows. When thread have to modify trie, it make a copy of current trie state, then apply modifications to copy, and then atomically replace current whole trie with new version. This way readers are wait-free, and don't have to acquire any mutexes. Ideally readers will not execute any atomic RMW operations at all, and don't modify any shared data.
Obvious tradeoff is that writers have to make the copy of whole structure. So suitability of this approach depends on size of the structure and frequency of modifications. From your description it seems that this approach is not very suitable.

Then, good approach is to just wrap single-threaded trie with reader-writer mutex.

If you want really low overhead and linear scalability on high number of cores, then it's better to employ first variant but substantially finer-grained version. I.e. writeres make atomic replacement of *parts* of trie.
Here is description of basic Partial Copy On Write technique:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/7d202e726424c0bf
As GC (PDR) you can use SMR (hazard pointers), RCU, ROP, VZOOM, Proxy-Collector or any other algorithms.

michaelmarcin
Beginner
40 Views

Thanks for the reference material and suggestions I'll read through them and consider an implementation.

A shared mutex over a single threaded trie might be enough for now. The project is going to take over a year to complete so perhaps someone will release a high quality free implementation of a scalable trie by then.
Reply