Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

need something like a sorted parallel_do

Peter_F_1
Beginner
421 Views

I need something like a sorted parallel_do. It looks like concurrent_priority_queue fits, but then I would have to deal with thread pools myself, wouldn't I?

0 Kudos
8 Replies
Peter_F_1
Beginner
421 Views
The example uses task_group, but with that I would have to count the running tasks myself and would have to know the number of CPUs.
0 Kudos
RafSchietekat
Valued Contributor III
421 Views

parallel_do() is an algorithm, and concurrent_priority_queue is a data structure.

You can run parallel_do() from a concurrent_priority_queue.

0 Kudos
Peter_F_1
Beginner
421 Views

I did not see any parallel_do interface where I can pass the container to be used by the parallel_do_feeder<item_type>::add implementation.

0 Kudos
Peter_F_1
Beginner
421 Views

OK, I think I found a solution:

I pass a dummy container to parallel_do -- its contents don't matter, so one could use boost::integer_range. The range must have the same size as the number of elements contained in the concurrent_priority_queue before parallel_do() is invoked. At the start of the body function I pop one argument from the concurrent_priority_queue -- try_pop() must succeed -- and ignore the argument that parallel_do actually passed to the body. Whenever the body adds a real argument to the concurrent_priority_queue, it also calls tbb::parallel_do_feeder::add(dummy_argument) so that parallel_do keeps iterating.
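
In code, the idea would look roughly like this (a sketch only; Item and process() are placeholder names for my real item type and work):

#include <cassert>
#include <vector>
#include "tbb/parallel_do.h"
#include "tbb/concurrent_priority_queue.h"

typedef int Item;                                     // placeholder for the real item type

std::vector<Item> process(Item) {                     // placeholder: does the real work,
    return std::vector<Item>();                       // possibly producing follow-up items
}

void run(tbb::concurrent_priority_queue<Item>& pq) {
    // Dummy range with the same size as the queue before parallel_do() starts.
    std::vector<char> dummies(pq.size(), 0);
    tbb::parallel_do(dummies.begin(), dummies.end(),
        [&](char /*ignored*/, tbb::parallel_do_feeder<char>& feeder) {
            Item item;
            bool popped = pq.try_pop(item);           // must succeed: one dummy per queued item
            assert(popped);
            for (Item child : process(item)) {
                pq.push(child);                       // the real argument goes into the priority queue...
                feeder.add(0);                        // ...and a matching dummy keeps parallel_do running
            }
        });
}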

Understandable?

Can anybody come up with a solution that does not require the unused container hidden inside parallel_do?

0 Kudos
RafSchietekat
Valued Contributor III
421 Views

The easiest solution is a small adapter for the unfortunately "deprecated" parallel_while(), translating pop_if_present() or some such to the container's try_pop().
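
Something along these lines, perhaps (untested sketch with the classic TBB headers; Item and do_work() are placeholders):

#include "tbb/parallel_while.h"
#include "tbb/concurrent_priority_queue.h"

typedef int Item;                                     // placeholder

void do_work(Item) { /* the real processing goes here */ }   // placeholder

// Adapter: gives concurrent_priority_queue the stream interface parallel_while() expects.
struct QueueStream {
    tbb::concurrent_priority_queue<Item>& pq;
    QueueStream(tbb::concurrent_priority_queue<Item>& q) : pq(q) {}
    bool pop_if_present(Item& item) { return pq.try_pop(item); }
};

struct Body {
    typedef Item argument_type;                       // required by parallel_while's Body concept
    void operator()(Item item) const { do_work(item); }
};

void run(tbb::concurrent_priority_queue<Item>& pq) {
    QueueStream stream(pq);
    tbb::parallel_while<Body> w;
    w.run(stream, Body());                            // runs until pop_if_present() reports empty
}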

I would not recommend relying on any reported current length of the container: unless I am mistaken, there is no assurance that this length does not overestimate the actual number of elements present, so try_pop() might conceivably fail. It could be that I just didn't get the point, but there should also be no need to resort to the feeder.

I recommend the easier solution for now. You can always come up with something better if and when parallel_while() is removed, and by that time the development team may have provided, or "somebody" may have contributed, an easier solution.

0 Kudos
Alexey-Kukanov
Employee
421 Views

Raf Schietekat wrote:

The easiest solution is with a small adapter to the unfortunately "deprecated" parallel_while(), translating pop_if_present() or somesuch to the container's try_pop().

Raf, I'm not sure why you think that deprecating parallel_while is unfortunate. Its functionality is covered by either parallel_do or a simple two-stage parallel_pipeline, unless someone wants both an input stream without iterators and the possibility to add work on the fly. And even that can probably be worked around if on-the-fly work can be added to the stream directly, or e.g. to a concurrent_queue shared between the input filter and the processing filter.

My recommendation for "something like a sorted parallel_do" is a combination of parallel_pipeline with concurrent_priority_queue. The queue defines the order; the first pipeline stage pops items out of the queue in that order, and the second pipeline stage processes the items. If some work needs to be added on the fly, it can be pushed into the same queue, with a little extra care because the input filter might already have drained the queue and signaled the pipeline to stop; e.g. a loop around the pipeline that re-runs it while the queue is not empty would be sufficient.
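
A sketch of what I mean (Item and process() are placeholders, and the token count is arbitrary):

#include "tbb/pipeline.h"
#include "tbb/concurrent_priority_queue.h"

typedef int Item;                                     // placeholder

void process(Item) { /* real work; may push new items into the queue */ }   // placeholder

void run(tbb::concurrent_priority_queue<Item>& pq) {
    do {
        tbb::parallel_pipeline(/*max_number_of_live_tokens=*/16,
            // First stage, serial: pops items in priority order.
            tbb::make_filter<void, Item>(tbb::filter::serial_in_order,
                [&](tbb::flow_control& fc) -> Item {
                    Item item;
                    if (!pq.try_pop(item)) {
                        fc.stop();                    // queue looked empty: end this pipeline run
                        return Item();
                    }
                    return item;
                }) &
            // Second stage, parallel: processes the items.
            tbb::make_filter<Item, void>(tbb::filter::parallel,
                [&](Item item) { process(item); }));
    } while (!pq.empty());                            // re-run if work arrived after the input stage stopped
}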

0 Kudos
Anton_M_Intel
Employee
421 Views

Another alternative is a combination of parallel_for with concurrent_priority_queue, if the latter is not filled concurrently with execution. In that case you know the number of items in the container and can run parallel_for(0, pq.size(), body);
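
For example (a sketch; Item and process() are placeholders):

#include <cstddef>
#include "tbb/parallel_for.h"
#include "tbb/concurrent_priority_queue.h"

typedef int Item;                                     // placeholder

void process(Item) { /* real work */ }                // placeholder

void run(tbb::concurrent_priority_queue<Item>& pq) {
    // The queue size is reliable here because nothing is pushed concurrently.
    tbb::parallel_for(std::size_t(0), pq.size(), [&](std::size_t) {
        Item item;
        if (pq.try_pop(item))                         // each iteration consumes exactly one item
            process(item);
    });
}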

0 Kudos
RafSchietekat
Valued Contributor III
421 Views

I think that it is easier to work with parallel_while() than with parallel_do(). For some, inventing an ad-hoc iterator is trivial, but it makes at least me scratch my head, and for others it will be a big or insurmountable hurdle. This is a prime example: the simplest glue is a fairly trivial class sitting between concurrent_priority_queue and parallel_while().

A pipeline would be enough here (no feeder needed), but not in the general case. Also, in the general case, feeding back into a concurrent_queue would be bad for performance because new items should probably be processed while they are still hot in cache (curiously, there's no concurrent_stack).

Maybe you could ship an input_iterator with concurrent_priority_queue, but that still leaves other uses for parallel_do() unaddressed. There should at least be example code in the Reference Manual to construct an ad-hoc input iterator (copy&paste plus modify), but it won't be minimal. Maybe it's possible to provide an input iterator that is a class template that takes a user-provided function that would call try_pop() or whatever has to be done?
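
Something like this, maybe (rough, untested sketch; Pop stands for any user-provided callable with the signature bool(T&), e.g. one that forwards to concurrent_priority_queue::try_pop(), and T is assumed default-constructible):

#include <cstddef>
#include <iterator>

template<typename T, typename Pop>
class pop_iterator {
    Pop* pop;                                         // 0 means "end"
    T item;
    void advance() { if (pop && !(*pop)(item)) pop = 0; }
public:
    typedef std::input_iterator_tag iterator_category;
    typedef T value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const T* pointer;
    typedef const T& reference;

    pop_iterator() : pop(0) {}                        // end iterator
    explicit pop_iterator(Pop& p) : pop(&p) { advance(); }
    const T& operator*() const { return item; }
    pop_iterator& operator++() { advance(); return *this; }
    void operator++(int) { advance(); }               // single-pass; result intentionally discarded
    bool operator==(const pop_iterator& other) const { return pop == other.pop; }
    bool operator!=(const pop_iterator& other) const { return pop != other.pop; }
};

It would be used roughly as tbb::parallel_do(pop_iterator<Item, Popper>(popper), pop_iterator<Item, Popper>(), body), with the usual input-iterator caveat that copies share the underlying stream.
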
0 Kudos