MADadrobiso:Googling "parallel particle simulation" yields a lot of prior work. My impression is that space-based partitions are popular (i.e. exploiting the "light cone" property). Can you explain the tree structure in more detail? For example,if Y is a child of X in the tree, what does that mean in your tree?
Thanks for the detailed explanation. With real-time graphics, would make a flashy demo. (Aside: This reminds me of the time Martin Gardner wrote about how the devil plays pool and starts with a triangle that's 36 balls on a side, for a total of 666 balls. Nevada is the 36th state and there are 36 combinations of two dice.)
By the way, the CBT is providing functionality that is often known as a "priority queue". Wikipedia has an article with that title which seems to cover the common implementations.
You might try to apply time warp ideas or some other form of speculative execution. Here's a link to one article on time warp. (I've never used time warp, but studied it a little bit back in the late 80's as background for my thesis.) E.g., serially grab P events from the top of the tree, see if they are sufficiently separated ("space-like" in relativity parlance) and process them in parallel.
We're tried coding various concurrent priority queues/heaps in the past, and found them to be slower than the equivalent heap with a big fat lock around it. The overhead of atomic operations, and contention at the root, have been the problems. Perhaps what you need is not a tree, but a "shrub" with multiple roots, and somehow percolate stuff through the shrub so that "space-like" events end up at different roots?
Your colliding particle (sphere) application sounds like an interesting project. It is kind of the inverse effect of what I am working on. The research work that I am involved with uses a Finite Element simulation of tethers and objects (Space Elevator work). The tethers are composed of beads (mass, position, velocity, acceleration and a few other properties), connected by segments (material properties of the tether elastic area, length, length rate, Youngs Modulus, spring damping constant, and many other attributes). The objects are additional bodies (planets, moons, satellites, spacecraft/equipment attached to the tether).
The simulation has to handle the situation where a tether goes slack. In which case tension of the slack segment goes to zero. The simulation used to have a fixed time integration step size. This proved to be untenable when a segment transitioned from slack to taunt as well as taunt to slack. This transition is quite similar to your collision of spheres in that if your integration time is fixed and you step to far into the collision, the recoil from the collision is akin to a Flubber effect whereby more energy comes out of the collision than when into the collision.
The solution to both problems (my tether and your spheres) is to vary the integration step size such that if a collision were to occur that the step size coincided with the approximate minimal intrusion of the interface within the precision of the calculation. In my case I couldnt go below 1.0E-15 second but this is dependent on other factors in the model. (i.e. yours may be different).
In your simulation it was not disclosed if the interaction was only collision or if the interaction was due to additional properties (heat, electrostatic, gravitational, viscosity, etc). If collision only then it is minimumtime-of-flight. If the other interaction properties are involved then it would be the minimum of time of flight with whatever the desired fidelity of the other interactions was required.
The approach I would like to suggest, for the lack of a better term, is a collection of bean bags type of approach. Beans being particles and bean bag being a collection of beans to work with. The number of bean bags could potentially be dynamic but will likely gravitate towards a particular number for the given problem. The number of beans per bag is not static but will vary from time to time dependent on computational loads. The problem solving portion of the application would tend to have as many threads as there are available cores.
The complete algorithm is a bit too complex to list here but the general idea is to place the beans in a hierarchy dependent on minimum time-of-flight to collision. In the same bean bag, different bags owned by the same thread, bag owned by other thread and finally bag in free queue. The simulation advances through a process collisions, shuffle beans, shuffle bean bags, determine step size.
We could discuss this off thread if you desire
I don't remember how much slower the concurrent priority queues were. I just remember they were slower than using a plain priority queue with a spin lock around it. So you might implement the atomic priority queue by using a a plain priority queue with a spin lock around it.
If N is relatively small, it might be possible to continuously process N events in flight instead of taking them out in chunks of N. The logic for each thread would be:
Both the priority queue and the "in flight" set would have to be atomic or protected by a lock. There's nothing "off the shelf" in TBB to deal with this kind of scheduling, though I suspect that it can be done in TBB with a little effort (e.g. the logic resembles class tbb::pipeline somewhat, but not closely enough to use tbb::pipeline). It may be simplest to prototype it in pthreads, using condition variables to implement step 2.
If it works out well, then you could consider reworking it TBB-style, which would involve implementing the "wait" as parking a continuation task that would be spawned by the thread that removes the blocking "time-like" event in step 5. That typically achieves much lower overhead than condition variables, but is muchmore work, so I'd prototype using pthreads and condition variables first.