This sounds like an interesting example of parallel_scan. I'd really like to see the final algorithm.
I don't expect the work imbalance between the first and second passes to be a problem. For 1 processor, the current implementation does indeed do only the final scan. For 2 processors, it does the pre_scan pass for approximately the second half of the data, though the "approximately" may vary widely. That is, on average it requires 1.5 passes with 2 processors.
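The 1.5-pass figure can be checked with a small sequential model. This is a sketch under stated assumptions, not TBB's scheduler or Body API. The input is split once at a hypothetical index `mid`. The left part starts from the identity and goes straight to the final scan. The right part first runs a pre-scan that only sums, then a final scan once the left sum is known. `two_worker_scan` and `ScanStats` are illustrative names, and the code just counts element visits across both passes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a two-pass prefix sum split into [0, mid) and
// [mid, n) for two workers. NOT TBB's scheduler; it only counts how many
// element visits the pre-scan and final-scan passes perform in total.
struct ScanStats {
    std::vector<long long> out;  // inclusive prefix sums
    long long total;             // sum of all elements
    std::size_t element_visits;  // elements touched, summed over all passes
};

ScanStats two_worker_scan(const std::vector<long long>& in, std::size_t mid) {
    assert(mid <= in.size());
    ScanStats s{std::vector<long long>(in.size()), 0, 0};

    // Left subrange starts from the identity (0), so it can run the
    // final scan directly and write output immediately.
    long long left_sum = 0;
    for (std::size_t i = 0; i < mid; ++i) {
        left_sum += in[i];
        s.out[i] = left_sum;
        ++s.element_visits;
    }

    // Right subrange does not yet know its incoming prefix, so (in a real
    // parallel run, concurrently with the loop above) it does a pre-scan
    // that only accumulates its local sum.
    long long right_sum = 0;
    for (std::size_t i = mid; i < in.size(); ++i) {
        right_sum += in[i];
        ++s.element_visits;
    }
    s.total = left_sum + right_sum;

    // Once the left sum is available, the right subrange is scanned a
    // second time, now as a final scan that writes output.
    long long running = left_sum;
    for (std::size_t i = mid; i < in.size(); ++i) {
        running += in[i];
        s.out[i] = running;
        ++s.element_visits;
    }
    return s;
}
```

With `mid = n / 2`, the model touches `n + n/2` elements, which is 1.5 passes. With `mid = n`, there is effectively no split, as in the 1-processor case, and it touches exactly `n` elements. In practice the split point is not fixed, which is why the "approximately" can vary.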