The application would be to

rnickb · ‎04-25-2015

Would guys be open to extending TBB with a general-purpose blocked_range_nd class that uses c++11 variadics and works for any dimensionality?

RafSchietekat · ‎04-26-2015

What application do you have in mind for that? The main purpose of higher-dimensionality blocked ranges, it seems to me, is to minimise chunk boundaries, but most applications for that seem to be in 3-dimensional space or lower. Would there be a use for this in space-time perhaps, or something else? Or would you be able to abstract over the number of dimensions currently in use?

(2015-04-29 Added) Don't misunderstand, I'm all for creativity, so please go ahead, but also often less is more, so it wouldn't be useful to add this to TBB if it weren't... useful. To clarify that further, if you just have a bunch of dimensions in a collection, then you may still be able to linearise it and use plain old blocked_range without any performance penalty, as long as you have no need to exchange information on boundaries, in contrast to, e.g., computing how heat spreads through a volume where you do want to maximise chunk volume against chunk surface area, but I don't see how that would be needed in 4 dimensions or higher. It might even be counterproductive in situations where nesting would be more appropriate, e.g., a nested loop with the outer level over situations and/or specimens and the inner level over volume would be far better than a single range over 4 dimensions. If it's just about convenience, then I would still argue that it would be better to use an existing or create a new iterator-based solution and use that with a blocked_range, because you're going to have to iterate in the base case anyway; unfortunately the Index-only parallel_for overloads assume that Index is an integral type, otherwise you wouldn't need a Range at all.

rnickb · ‎04-29-2015

The application would be to support parallel iteration over an arbitrary dimensioned space. It's not at all uncommon to have data that you want to break down and compute over many dimensions. For example, suppose I have a collection of sales data. I might want to compute statistics broken down by quarter, region, product category, time-of-day, etc. I should be able to write something like this

execute(make_shape(NumQuarters, NumRegions, NumProductCategories,                  
                   NumTimeBuckets),                                                
        [](int quarter_index, int region_index, int product_category_index,        
           int time_bucket_index) {                                                
          // compute statistics                                                    
        });

Not sure I understood all of your points, but I'll try to address.

often less is more, so it wouldn't be useful to add this to TBB if it weren't... useful.

I agree. And in this case, an arbitrarily dimensioned blocked range would be both "less" in the sense that the 2 and 3 dimensional cases would be handled by the same code instead of a special cases; and "more" in the sense that it could easily support iteration over higher dimensioned spaces.

if you just have a bunch of dimensions in a collection, then you may still be able to linearise it and use plain old blocked_range without any performance penalty, as long as you have no need to exchange information on boundaries

Do you mean to use a blocked range over 0, to Dim1*Dim2*...*Dimk, then use % and / to reconstruct the k-dimensional index? You could, but why would that be a better solution than providing a library solution that both simplifies the existing API as well as makes it easy to iterate over higher dimensioned space without requiring users to write code that messes about with index conversions.

RafSchietekat · ‎04-30-2015

Your example uses type "int" in all dimensions. If you really want to have something generally useful, how about supporting different types in different dimensions, if possible? Enum types immediately come to mind.

What I'm proposing to linearise the space is not to let the user mess about with computations, but to create a new iterator type that hides all that, rather than a new range type. It should be a random-access iterator, of course. In the base case you could then simply use a range-based for statement to execute the current chunk, whereas with a new range type you would still have to find a way to iterate over all the cells. You could do that with an iterator of an internal type to be able to use the range-based for statement (this is C++11 after all), unless you want to force the user to use nested loops there, but why not just concentrate on the new iterator type and reuse the existing blocked_range, instead?

You would also have a new iterator type that could be applied separately from TBB, and "more general" seems like a good thing.

I'm not saying that it is clearly superior or anything, but there do appear to be some reasons to at least consider this alternative, unless you obviously need recursive subdivision of hyperboxes anyway, perhaps because you prefer nested loops in the base case. Maybe the computations are just simpler if you only have to subdivide a single dimension each time, that's also quite possible.

RafSchietekat · ‎05-08-2015

"unfortunately the Index-only parallel_for overloads assume that Index is an integral type, otherwise you wouldn't need a Range at all"

It's easy enough to adapt those parallel_for overloads, but you could already use parallel_for_each() for that. I'm not sure how well it performs against parallel_for() when using random-access iterators, c/o parallel_do()'s implementation, but it should be fairly close. Has anybody compared the two yet?

Of course, it may still be the case that a range type is more efficient, and perhaps both could then be provided so that the user might choose between simplicity and efficiency.

More importantly: are you making any progress with this, using either approach?

(Edited) Removed proof of concept for modified parallel_for() because parallel_for_each() also seems to do the job.

rnickb · ‎05-08-2015

Yes, I wrote a KBlockedRange that extents the 2 and 3-dimensional blocked ranges to the general case.

https://github.com/echo-ml/execution_context/blob/master/include/echo/execution_context/tbb/blocked_range.h

Also, wrote functions that let you do execution and reductions of arbitrary arity functions over such ranges

https://github.com/echo-ml/execution_context/blob/master/include/echo/execution_context/tbb/expression_executer.h

Though, in my case, this is specific for evaluating expression-templates, so it's a little different than what a more general-purpose framework would be. Example usage:

https://github.com/echo-ml/execution_context/blob/master/benchmark/tbb_benchmark.cpp

Making a multiple-dimensional iterator efficient would require something like Austern's segmented iterator concept (http://lafstern.org/matt/segmented.pdf)

While I think that's a useful abstraction, the case when you're dealing with integral indexes is sufficiently common that I think it warrants special treatment.

general purpose blocked_range_nd