Parallel operations in a B-Tree using TBB

juliusmenb · ‎03-25-2009

Hi, I have been trying for some parallel operations in B-Tree, which I use to construct a database engine, and I have many doubts especially with the recursion, so I vant to ask advice on how I should parallel the basic operations(recursive) of B-Tree (search, split, merge, ...)
Thank in advance.

robert_jay_gould · ‎03-25-2009

Quoting - juliusmenb

Hi, I have been trying for some parallel operations in B-Tree, which I use to construct a database engine, and I have many doubts especially with the recursion, so I vant to ask advice on how I should parallel the basic operations(recursive) of B-Tree (search, split, merge, ...)
Thank in advance.

Did TBB have a B-Tree?

Anyways for an on-memory repository I've used TBB's concurrent hash with great results, but I was working with about 1.000.000 entries. So it depends on your use-cases. If its similar to mine just try out the hash and that's all you need to do.

Otherwise B-Tree's can be spilt-up into N subtrees, and processed in parallel that way. For searching you can have several threads try different subsegments, and have the one that finds the result first return. Splittingis trivial depending on the split, and merge is complicated if you need to reorder the tree.

Do you have more specific information about your project and goals?

RafSchietekat · ‎03-25-2009

An in-memory B-tree? Whatever will they dream up next... a Judy array perhaps? :-)

jimdempseyatthecove · ‎03-26-2009

When your database is disk based then the majority of time is waiting for I/O to read in the next bucket. Once read in the number of keys in the bucket is generally small (few) perhaps 100-500 or so depending on database design. Due to the relatively small numbers of keys to search, parallelizing the bucket search may be counter productive.

What is, or may be, productive will depend upon your platform. When the database is managed by a single system with several processors, each with several cores, then the near-root level buckets can be distributed into the cache systems of specific cores. Then if you have affinity based scheduling, and know which cores are likely to have the bucket(s) or branch of interest in there cache, then you would schedule the search to be run on that thread (or groups of threads sharing the particular cache system). Unfortunately TBB does not provide this level of thread scheduling.

This said, you could (may be able to)use the TBB concurrent_hash_map to locate the leaf (or nearby branch) within your B-Tree and thus reduce the tree search time (by bypassing the first several bucket searches).

Jim Dempsey