
[cpp]
tbb::affinity_partitioner ap;

std::vector<int> input;
std::generate_n(std::back_inserter(input), 4096, []{ return rand(); });

std::vector<int> result(input.size());
std::vector<int> result2(input.size());

// Do some calculations
tbb::parallel_for(tbb::blocked_range<size_t>(0, input.size()),
    [&](const tbb::blocked_range<size_t>& r)
    {
        for (size_t n = r.begin(); n != r.end(); ++n)
            result[n] = do_some_calc(input[n]); // temporal store
    }, ap);

// Do some more calculations
tbb::parallel_for(tbb::blocked_range<size_t>(0, input.size()),
    [&](const tbb::blocked_range<size_t>& r)
    {
        for (size_t n = r.begin(); n != r.end(); ++n)
            result2[n] = do_some_calc(result[n]);
    }, ap);
[/cpp]

Basically, what I want is for the second parallel_for to map its ranges to threads in the same way as the first parallel_for, in order to fully utilize the local caches.


8 Replies


So I don't see a problem here, except that you might want to hoist the r.end() out of the loop to allow possibly dramatic optimisation:

[cpp]for(size_t n = r.begin(), n_end = r.end(); n != n_end; ++n)[/cpp]
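To make the two loop headers concrete, here is a small sketch that does not depend on TBB. `Range` is a hypothetical stand-in for `tbb::blocked_range<size_t>` that exposes only `begin()` and `end()`, and the summing loop body is just an illustration. Both functions compute the same result; the hoisted form simply does not rely on the optimizer proving that `r.end()` is loop-invariant.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for tbb::blocked_range<size_t> (assumption: only
// begin() and end() matter for this comparison).
struct Range {
    std::size_t first, last;
    std::size_t begin() const { return first; }
    std::size_t end() const { return last; }
};

// end() appears in the loop condition, so it is evaluated on every
// iteration unless the compiler hoists it.
inline long sum_unhoisted(const Range& r, const std::vector<int>& v) {
    long s = 0;
    for (std::size_t n = r.begin(); n != r.end(); ++n)
        s += v[n];
    return s;
}

// end() is read once into a local before the loop starts.
inline long sum_hoisted(const Range& r, const std::vector<int>& v) {
    long s = 0;
    for (std::size_t n = r.begin(), n_end = r.end(); n != n_end; ++n)
        s += v[n];
    return s;
}
```

Whether the two compile to different machine code depends on the compiler and on the loop body, so comparing the disassembly is the reliable way to check.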


Why not consider one parallel_for with two enclosed for loops? That will assure the same core completes each loop.

Jim Dempsey
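A minimal sketch of the fused approach suggested above, written without TBB so it stands alone. `process_chunk` is a hypothetical helper representing the body that would be passed to a single `tbb::parallel_for`, and `do_some_calc` here is a placeholder, since the thread never shows the real kernel. It assumes nothing between the two passes needs results from other chunks, i.e. there is no barrier between them.

```cpp
#include <cstddef>
#include <vector>

// Placeholder kernel (assumption): stands in for the real do_some_calc.
inline int do_some_calc(int x) { return 2 * x + 1; }

// Hypothetical body for ONE parallel loop: both passes run over the same
// sub-range [begin, end) on the same thread, so the part of `result`
// written by the first loop is likely still in that core's cache when
// the second loop reads it.
inline void process_chunk(std::size_t begin, std::size_t end,
                          const std::vector<int>& input,
                          std::vector<int>& result,
                          std::vector<int>& result2)
{
    for (std::size_t n = begin; n != end; ++n)
        result[n] = do_some_calc(input[n]);    // first pass

    for (std::size_t n = begin; n != end; ++n)
        result2[n] = do_some_calc(result[n]);  // second pass, same chunk
}
```

With TBB, the lambda given to `parallel_for` would call `process_chunk(r.begin(), r.end(), input, result, result2)`, and no partitioner affinity is needed because each chunk never leaves its thread between the two passes.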


*"Why not consider one parallel_for with two enclosed for loops? That will assure the same core completes each loop."*

The parallel_for and the serial for would be combined together. But can the loop kernels really be taken at face value here, or are they merely standing in for something that does involve a barrier...


Yes, it is a simplified sample. I have some stuff between the loops in my real code.


Some feedback is always nice: how did the for loop header rewrite work out for you?


n_end = r.end()

It didn't make any difference. I think my compiler optimizes it to the same code; at least it seems that way from the disassembly.
