Hi,
Maybe a naive question, but here it is:
Would it be safe to use parallel_invoke to call functions that call parallel_for?
TIA,
Petros
PS: the called functions contain no memory allocations, in case this helps.
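For concreteness, the pattern I have in mind is roughly the following; the data and the work inside the loops are made up just for illustration:

```cpp
#include <tbb/parallel_invoke.h>
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <vector>

// Two independent functions, each containing its own parallel_for.
static void scale_all(std::vector<float>& v) {
    tbb::parallel_for(tbb::blocked_range<size_t>(0, v.size()),
        [&](const tbb::blocked_range<size_t>& r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                v[i] *= 2.0f;
        });
}

static void shift_all(std::vector<float>& v) {
    tbb::parallel_for(tbb::blocked_range<size_t>(0, v.size()),
        [&](const tbb::blocked_range<size_t>& r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                v[i] += 1.0f;
        });
}

int main() {
    std::vector<float> a(1 << 20, 1.0f), b(1 << 20, 2.0f);
    // parallel_invoke calling functions that themselves call parallel_for.
    tbb::parallel_invoke([&] { scale_all(a); },
                         [&] { shift_all(b); });
    return 0;
}
```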
Yes, it would.
Thank you very much!
Petros.
I was going for the shortest answer possible above ("Post description must be a minimum of 14 characters"), but it seems appropriate to also say that recursive parallelism is one of the design principles of TBB. The caveat is that you shouldn't subdivide the work to the point where parallel overhead becomes significant. So you might want to experiment with grainsize even if you use the auto_partitioner, which seems to assume that it has all the threads for its own use and might subdivide too much when running on a highly parallel machine with lots of other instances making the same assumption.
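For example, one way to experiment is to give the blocked_range an explicit grainsize and compare partitioners; the grainsize value below is just a placeholder to tune, not a recommendation:

```cpp
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <tbb/partitioner.h>
#include <vector>

// Hypothetical inner loop with an explicit grainsize to experiment with.
void process(std::vector<double>& v) {
    const size_t grainsize = 1024;  // tune: larger values mean fewer, bigger chunks
    tbb::parallel_for(
        tbb::blocked_range<size_t>(0, v.size(), grainsize),
        [&](const tbb::blocked_range<size_t>& r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                v[i] = v[i] * v[i];
        },
        tbb::simple_partitioner());  // splits down to the grainsize exactly;
                                     // with auto_partitioner the grainsize is
                                     // only a lower bound on chunk size
}
```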
(Added 2012-07-24) To clarify: as far as I understand, auto_partitioner would not, e.g., make different initial cuts in one parallel_for() input range just because other parallel_for() calls are running concurrently. But if a hypothetical big input range is first divided among several parallel_for() calls, each of those will make smaller initial cuts than one parallel_for() over the original input range would. That is not necessarily a bad thing, because the hypothetical case is just that, but it provides less clearance from parallel overhead. If you reasoned that there is less use for parallel slack in the individual inner loops when there are more than just a few of them, you could increase their effective grainsize yourself; that would be done transparently had they been presented together to one big parallel_for(). Just something to think about...
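As a made-up illustration of that trade-off, compare expressing the same work as two concurrent parallel_for() calls over half-ranges (as in the original question) with one parallel_for() over the whole range:

```cpp
#include <tbb/parallel_invoke.h>
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <vector>

// Two concurrent parallel_for calls, each partitioning only half the range,
// so each partitioner makes smaller initial cuts.
void half_and_half(std::vector<float>& data) {
    size_t mid = data.size() / 2;
    auto body = [&](const tbb::blocked_range<size_t>& r) {
        for (size_t i = r.begin(); i != r.end(); ++i)
            data[i] += 1.0f;
    };
    tbb::parallel_invoke(
        [&] { tbb::parallel_for(tbb::blocked_range<size_t>(0, mid), body); },
        [&] { tbb::parallel_for(tbb::blocked_range<size_t>(mid, data.size()), body); });
}

// One parallel_for over the whole range: a single partitioner divides it all.
void all_at_once(std::vector<float>& data) {
    tbb::parallel_for(tbb::blocked_range<size_t>(0, data.size()),
        [&](const tbb::blocked_range<size_t>& r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                data[i] += 1.0f;
        });
}
```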
