Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

blocked_range

RafSchietekat
Valued Contributor III

Why is there no documented requirement that Value support assignment? Even private method blocked_range::do_split, which explicitly comments that it doesn't want to impose assignment, assigns one Value to another ("r.my_end = middle;"). (BTW, "auxiliary" is misspelt on tbb22_20091101oss/include/tbb/blocked_range.h:111.)

Conversely, why the need to be able to add a size_t?

Are non-integral template arguments allowed?

How relevant is the support for heterogeneous dimensions for blocked_range2d and blocked_range3d? Would it be missed if not available?

Why are sizes in the different dimensions compared relative to the respective grainsizes to decide which dimension to split next? Is it for code simplicity? Or is there another reason to not disregard grainsize ratios or have another set to decide about splitting?

4 Replies
ARCH_R_Intel
Employee

The specification should indicate that operator= is required. The size_t requirement held for an old version of TBB but no longer applies. The real requirement is that the statement "Value middle = r.my_begin + (r.my_end-r.my_begin)/2u;" work, where all the my_ fields are of type Value. I'll fix these issues in the Reference.
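To make the quoted requirement concrete, here is a minimal sketch. simple_range and check_value_requirements are hypothetical names for illustration only; this is not tbb::blocked_range. It uses the quoted statement plus an assignment to split a range, once with Value = int and once with a pointer type. It only shows that such types compile against that one statement, not what the Reference documents or guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a one-dimensional range; NOT tbb::blocked_range.
template <typename Value>
struct simple_range {
    Value my_begin;
    Value my_end;

    // Splits r in place: r keeps [begin, middle), the result holds [middle, end).
    static simple_range split(simple_range& r) {
        // The statement quoted in the post above.
        Value middle = r.my_begin + (r.my_end - r.my_begin) / 2u;
        simple_range right{middle, r.my_end};
        r.my_end = middle;  // this line is why Value must support assignment
        return right;
    }
};

// Exercises the split with an integral Value and with a pointer Value.
inline bool check_value_requirements() {
    simple_range<int> a{0, 10};
    simple_range<int> b = simple_range<int>::split(a);

    std::vector<double> data(8);
    simple_range<double*> p{data.data(), data.data() + 8};
    simple_range<double*> q = simple_range<double*>::split(p);

    return a.my_begin == 0 && a.my_end == 5 &&
           b.my_begin == 5 && b.my_end == 10 &&
           p.my_end == data.data() + 4 &&
           q.my_begin == data.data() + 4 && q.my_end == data.data() + 8;
}
```

Nothing in the sketch adds a size_t to a Value directly; the only arithmetic is the difference of two Values, its division by 2u, and adding that result back to a Value.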

Spelling of "auxiliary" now fixed.

Thanks for pointing out these issues.

Every time I have used blocked_range2d, it has been in a heterogeneous context. I cannot speak for others.

Decomposing into non-square subregions is important sometimes because of anisotropic communication. In particular, cache lines typically run along one axis, and not the other. In such cases, it may pay to decompose into regions that are squarish in units of the number of cache lines along each dimension, not squarish in units of the number of logical elements.
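A rough sketch of that idea, under stated assumptions: dim, range2d, collect_leaves and example_leaves are hypothetical names (not the TBB classes), and 16 elements per cache line along a row is an assumed figure. The split rule compares each dimension's size relative to its own grainsize and halves the dimension with the larger ratio.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical one-dimensional extent with its own grainsize.
struct dim {
    std::size_t begin, end, grain;
    std::size_t size() const { return end - begin; }
    bool is_divisible() const { return size() > grain; }
    dim split() {  // keeps the left half, returns the right half
        std::size_t middle = begin + size() / 2u;
        dim right{middle, end, grain};
        end = middle;
        return right;
    }
};

// Hypothetical two-dimensional range; NOT tbb::blocked_range2d.
struct range2d {
    dim rows, cols;
    bool is_divisible() const {
        return rows.is_divisible() || cols.is_divisible();
    }
    range2d split() {
        range2d right = *this;
        // Compare rows.size()/rows.grain with cols.size()/cols.grain
        // by cross-multiplying, which avoids integer division.
        if (rows.size() * cols.grain < cols.size() * rows.grain) {
            right.cols = cols.split();
        } else {
            right.rows = rows.split();
        }
        return right;
    }
};

// Recursively splits r and records the pieces that are no longer divisible.
inline void collect_leaves(range2d r, std::vector<range2d>& leaves) {
    if (!r.is_divisible()) {
        leaves.push_back(r);
        return;
    }
    range2d right = r.split();
    collect_leaves(r, leaves);
    collect_leaves(right, leaves);
}

// 8 rows with grainsize 1, 128 columns with grainsize 16 (one assumed cache line).
inline std::vector<range2d> example_leaves() {
    range2d whole{{0, 8, 1}, {0, 128, 16}};
    std::vector<range2d> leaves;
    collect_leaves(whole, leaves);
    return leaves;
}
```

With these power-of-two sizes, every leaf ends up 1 row by 16 columns: one assumed cache line each, so not square in elements but square in cache-line units.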

RafSchietekat
Valued Contributor III
"Decomposing into non-square subregions is important sometimes because of anisotropic communication. In particular, cache lines typically run along one axis, and not the other. In such cases, it may pay to decompose into regions that are squarish in units of the number of cache lines along each dimension, not squarish in units of the number of logical elements."
Hmm, that would only work if the critical dimension is a power of two times the size of a cache line, wouldn't it? Otherwise you would have to bring your own range type, it seems. And any multiple of the cache line would do, plus you'd probably want to get a bigger size to reduce parallelism overhead anyway, preferably within cache size limits. I guess if you want to keep circumference or surface area to a minimum (isn't that a concern in several physics problems?), you just have to choose identical grainsizes and make sure that the critical dimension is that power of two times a cache line? So it still seems to be mainly about code simplicity to me.
RafSchietekat
Valued Contributor III
java -jar Mandelbrot.jar
RafSchietekat
Valued Contributor III

Why doesn't parallel_for just split off tasks in a loop until it gets to something it doesn't want to subdivide, instead of making a tree?

@Override public void run() {
    setReferenceCount(0/*child tasks*/ + 1/*waitForAll()*/);
    while(this.range.isDivisible()) {
        incrementReferenceCount();
        new ParallelForTask(this, range.split(), body).spawn();
    }
    body.run(range);
    waitForAll();
}

(Requires Java 6 for some layout issue:) java -jar Mandelbrot.jar
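For readers following along in C++, the shape of that loop can be simulated sequentially. This is only a sketch under assumptions: index_range, loop_split_for and sum_indices are hypothetical names, and a plain std::vector stands in for the task pool, so nothing actually runs in parallel and no TBB scheduler behaviour is modelled. Each "task" keeps splitting off right halves until its own piece is no longer divisible, then runs the body on what is left.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical index range with a grainsize; NOT tbb::blocked_range.
struct index_range {
    std::size_t begin, end, grain;
    bool is_divisible() const { return end - begin > grain; }
    index_range split() {  // keeps the left half, returns the right half
        std::size_t middle = begin + (end - begin) / 2u;
        index_range right{middle, end, grain};
        end = middle;
        return right;
    }
};

// Sequential simulation of the loop-splitting task from the post above.
// Returns the number of simulated tasks that ran.
template <typename Body>
std::size_t loop_split_for(index_range whole, Body body) {
    std::vector<index_range> spawned{whole};  // stand-in for a task pool
    std::size_t tasks = 0;
    while (!spawned.empty()) {
        index_range r = spawned.back();
        spawned.pop_back();
        ++tasks;
        // Split off pieces in a loop instead of creating a tree node per split.
        while (r.is_divisible()) {
            spawned.push_back(r.split());  // corresponds to spawn() in the Java code
        }
        body(r);  // run the body on the undivided remainder
    }
    return tasks;
}

// Sums the indices in [0, n) and reports how many simulated tasks were used.
inline std::size_t sum_indices(std::size_t n, std::size_t grain, std::size_t& tasks) {
    std::size_t sum = 0;
    tasks = loop_split_for(index_range{0, n, grain},
                           [&sum](const index_range& r) {
                               for (std::size_t i = r.begin; i < r.end; ++i) sum += i;
                           });
    return sum;
}
```

In this simulation, every task runs the body exactly once, so the task count equals the number of leaf ranges; for 1024 indices with grainsize 64 that is 16 tasks of 64 indices each.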
