MichaelMarcin:Message ids can be generated at runtime. Objects can subscribe/unsubscribe from the messenger dynamically which would mutate the trie. In general after initialization modifying the trie is expected to be much less frequent than reading it. The messenger can be accessed mutated concurrently from as many threads as the hardware can run concurrently. I would expect the trie to contain around a thousand keys but I have no evidence to back that up. Current ids are not limited in size or complexity but currently we have ids from 3 to 40 characters with 1 to 4 parts for prefix matching.
I don't have off-the-shelf concurrent trie implementation. I can only made some suggestions about implementation.
First of all, very good implementation can be as follows. When thread have to modify trie, it make a copy of current trie state, then apply modifications to copy, and then atomically replace current whole trie with new version. This way readers are wait-free, and don't have to acquire any mutexes. Ideally readers will not execute any atomic RMW operations at all, and don't modify any shared data.
Obvious tradeoff is that writers have to make the copy of whole structure. So suitability of this approach depends on size of the structure and frequency of modifications. From your description it seems that this approach is not very suitable.
Then, good approach is to just wrap single-threaded trie with reader-writer mutex.
If you want really low overhead and linear scalability on high number of cores, then it's better to employ first variant but substantially finer-grained version. I.e. writeres make atomic replacement of *parts* of trie.
Here is description of basic Partial Copy On Write technique:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/7d202e726424c0bfAs GC (PDR) you can use SMR (hazard pointers), RCU, ROP, VZOOM, Proxy-Collector or any other algorithms.