Intel® oneAPI Threading Building Blocks

affinity_partitioner shared between for loops?

nagy
New Contributor I
Is the following a valid use, given that the first parallel_for produces data that the second parallel_for will consume? I haven't seen any example of an affinity_partitioner shared between loops, which is why I was a bit unsure.
[cpp]
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <tbb/partitioner.h>

#include <algorithm>
#include <cstdlib>
#include <iterator>
#include <vector>

int do_some_calc(int x); // user-supplied per-element computation

tbb::affinity_partitioner ap;

std::vector<int> input;
std::generate_n(std::back_inserter(input), 4096, []{ return rand(); });

std::vector<int> result(input.size());
std::vector<int> result2(input.size());

// Do some calculations
tbb::parallel_for(tbb::blocked_range<size_t>(0, input.size()),
    [&](const tbb::blocked_range<size_t>& r)
{
    for(size_t n = r.begin(); n != r.end(); ++n)
        result[n] = do_some_calc(input[n]); // temporary store
}, ap);

// Do some more calculations
tbb::parallel_for(tbb::blocked_range<size_t>(0, input.size()),
    [&](const tbb::blocked_range<size_t>& r)
{
    for(size_t n = r.begin(); n != r.end(); ++n)
        result2[n] = do_some_calc(result[n]);
}, ap);
[/cpp]

Basically, what I want is for the second parallel_for to map its subranges to threads the same way as the first parallel_for, in order to fully utilize the local caches.
RafSchietekat
Valued Contributor III
Two parallel_for loops over identical ranges that share a common affinity_partitioner instance will behave as expected even if the loop kernels differ. Any failure to improve performance would therefore be attributable to causes that would also occur with identical kernels, such as the data not fitting in cache.

So I don't see a problem here, except that you might want to hoist r.end() out of the loop header to allow possibly dramatic optimisation:
[cpp]for(size_t n = r.begin(), n_end = r.end(); n != n_end; ++n)[/cpp]
Alexey-Kukanov
Employee
An example of using the same affinity_partitioner object across consecutive loops exists in our Seismic sample; see ParallelUpdateUniverse in universe.cpp.
jimdempseyatthecove
Honored Contributor III
>>Basicly what I want is that second parallel_for will map its ranges in the same way as the first parallel_for

Why not consider one parallel_for with two enclosed for loops? That will ensure the same core executes both loops over any given subrange.
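
Something along these lines (a minimal sketch reusing input, result, result2, do_some_calc, and ap from your post, and assuming element n of the second pass depends only on element n of the first):
[cpp]tbb::parallel_for(tbb::blocked_range<size_t>(0, input.size()),
    [&](const tbb::blocked_range<size_t>& r)
{
    // First pass over this subrange.
    for(size_t n = r.begin(); n != r.end(); ++n)
        result[n] = do_some_calc(input[n]);
    // Second pass over the same subrange, while result is still hot
    // in this core's cache.
    for(size_t n = r.begin(); n != r.end(); ++n)
        result2[n] = do_some_calc(result[n]);
}, ap);[/cpp]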

Jim Dempsey
RafSchietekat
Valued Contributor III
"Why not consider one parallel_for with two enclosed for loops? That will assure the same core completes each loop."
That would effectively fuse the two parallel loops into one. But can the loop kernels really be taken at face value here, or are they merely stand-ins for something that does involve a barrier between the two passes...
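
For instance (a hypothetical kernel, purely for illustration), anything in the second pass that reaches outside the current subrange would break the fused version, because only separate parallel_for calls guarantee that the whole first pass has finished:
[cpp]// Hypothetical second-pass kernel: element n also reads a neighbour,
// which may lie in a subrange another thread has not computed yet.
result2[n] = do_some_calc(result[n] + result[(n + 1) % result.size()]);[/cpp]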
nagy
New Contributor I
Yes, it is a simplified sample. I have some stuff between the loops in my real code.
RafSchietekat
Valued Contributor III
That was my assumption...

Some feedback is always nice: how did the for loop header rewrite work out for you?
nagy
New Contributor I
Hi Raf,
The n_end = r.end() hoist didn't make any difference; I think my compiler optimizes both versions to the same code. At least it seems that way from the disassembly.
RafSchietekat
Valued Contributor III
That's nice, of course, but it doesn't always work out like that. Please have a real problem next time. :-)