Intel® oneAPI Threading Building Blocks

Parameter (Grainsize) doesn't take effect in Template <parallel_for>

Rex_X_

When I set the grainsize to 100, 200, 500, or 1000, the results are the same.

...

parallel_for(blocked_range<size_t>(0, to_scan.size(), 200), // I changed this grainsize to 100/200/500/1000
             SubStringFinder(to_scan, max, pos));

...

The output ranges are as follows. It seems that the effective grainsize is more than 1000 and that the parameter doesn't take effect at all.

...

[0,1106)
[8855,9962)
[4427,5534)
[2213,2766)

...

So what's wrong with it? Thank you for your help.

TBB version: 3.0; processor: Intel i5 2450M; OS: Windows 7

RafSchietekat
Be glad, that's a good thing!

A range is splittable when its size is at least grainsize (so that's a bit of a misnomer if you ask me) and larger than 1 (of course). By default, grainsize is 1, and if you use that with simple_partitioner (which will always split every splittable range), that's also the chunk size you'll get (each element executed by itself).

In earlier versions, simple_partitioner was the default (or even the only possibility), and you would have had to tune grainsize to avoid overhead from too many execute() function calls (maybe even each in a separate task for lack of recycling opportunities?) and get a more efficient execution during each call through automatic loop unrolling or even vectorisation. Now the default is auto_partitioner, and you'll automatically get more appropriately chosen chunk sizes without having to care about grainsize at all (just leave it at the default value), although the same splittability rule still applies as an additional constraint.

There is an important assumption for auto_partitioner: execution time should not be distributed too unevenly, because then the division algorithm may not be able to provide enough parallel slack for acceptable latency. In such a case, do use an explicit simple_partitioner, and choose the right grainsize. But most of the time you don't have to consider that at all.
Rex_X_
Thank you, Raf.
RafSchietekat
Er, an off-by-1 error on my part: a range is splittable not when its size is "at least" grainsize but when it is "bigger than" grainsize (which also means that 1 is not a special case). I occasionally mix this up, but I don't know what to make of the fact that nobody else noticed this first.

What still makes it a misnomer in my opinion is that a range smaller than 2*grainsize would be split into subranges at least one of which will be smaller than grainsize, and a typical chunk is smaller than grainsize (with simple_partitioner anyway, and often also with auto_partitioner).
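
To make the "bigger than" rule and its effect on chunk sizes concrete, here is a small sketch that does not use TBB at all. It only models the behaviour described above: a range counts as splittable when its size is strictly bigger than grainsize, and the partitioner splits every splittable range in half. ModelRange, split_all, and chunk_sizes_for are hypothetical names made up for this illustration, and the midpoint split is an assumption of the model, not a copy of the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a half-open index range [begin, end) with a grainsize.
// This is NOT tbb::blocked_range, only a model of the splitting rule discussed above.
struct ModelRange {
    std::size_t begin;
    std::size_t end;
    std::size_t grainsize;

    std::size_t size() const { return end - begin; }

    // Corrected rule: splittable only when strictly bigger than grainsize.
    bool is_divisible() const { return size() > grainsize; }
};

// Model of a partitioner that splits every splittable range at its midpoint
// (an assumption of this sketch) and records the size of each leaf chunk.
void split_all(const ModelRange& r, std::vector<std::size_t>& chunk_sizes) {
    assert(r.grainsize > 0 && r.begin <= r.end);
    if (!r.is_divisible()) {
        chunk_sizes.push_back(r.size());  // leaf: this is what one body call would receive
        return;
    }
    const std::size_t middle = r.begin + r.size() / 2;
    split_all(ModelRange{r.begin, middle, r.grainsize}, chunk_sizes);
    split_all(ModelRange{middle, r.end, r.grainsize}, chunk_sizes);
}

// Convenience wrapper: leaf chunk sizes for the range [0, n) with the given grainsize.
std::vector<std::size_t> chunk_sizes_for(std::size_t n, std::size_t grainsize) {
    std::vector<std::size_t> sizes;
    split_all(ModelRange{0, n, grainsize}, sizes);
    return sizes;
}
```

In this model, chunk_sizes_for(200, 200) yields a single chunk of 200 because the range is not bigger than grainsize, while chunk_sizes_for(300, 200) yields two chunks of 150, both smaller than grainsize. With grainsize 1, every element ends up in a chunk of its own.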
