Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.
2477 Discussions

Pipeline - number of instances running in parallel

Nav
New Contributor I
702 Views
I was running the pipeline text filter from the example programs that are provided with TBB. I noticed that two instances of MyTransformFilter, two instances of MyInputFilter and two instances of MyOutputFilter are created.
1. How can that happen, when MyInputFilter and MyOutputFilter are serial? Shouldn't these two classes have only one instantiation?
2. Is it the pipeline which decides to instantiate two instances? If so, then what is the significance of the parameter passed to pipeline.run(aNumber)? I don't see aNumber assisting in the parellelism. aNumber just seems to be acting as the size of the buffer for MyInputFilter, which doesn't make sense because MyInputFilter is supposed to be serial.
3. When I created my own pipeline program using a similar logic of reading strings from a text file, only one instance of all three classes were created.

Am I missing something here? I don't understand when the void* operator()(void*) function gets called. Is it called whenever one filter finishes a task and returns a token? How then, does operator get parallelised?. (been through the reference manual, pipeline blogs and getting started tutorial, but it didn't provide the info I needed). Help?
0 Kudos
11 Replies
Nav
New Contributor I
702 Views
Erm........help?
0 Kudos
RafSchietekat
Valued Contributor III
702 Views
1. run_pipeline() is called twice if !is_number_of_threads_set, but each time there is only a single instantiation, as expected. Is that what you saw?
2. The run() parameter is the maximum number of data items in flight through the pipeline at any one time, and it is independent of the number of filters. The input filter, serial or parallel, will not accept any more data items when the maximum is reached, until at least one data item leaves the pipeline.
3. As expected.

A filter's operator gets called for each data item that passes through it. A filter doesn't do something to a task, it's a task that executes a filter on a data item by calling the filter's operator. The task guides a data item through the pipeline (at least last time I looked at the implementation it did).
0 Kudos
Nav
New Contributor I
702 Views
Had kept a static int instanceCount; to see the number of instances created. You're right; it was because run_pipeline() was called twice.
When you say that the filter doesn't do something to a task, that's something I can understand what you meant, wrt how data moves from filter to filter. But for example in MyTransformFilter's filter which is a parallel filter, there would obviously have to be a logic which decides to create n instances of the filter which would process data in parallel. How else could data be processed parallelly if operator is not invoked parallelly? Does the filter have this logic within it? If so, how can I see how many instances of data/tokens are running parallelly at a given point of time?

[ p.s: I apologise for the naive questions. Am new to C++ syntax and multithreading. Am finding it a bit difficult to follow TBB concepts coz of only having the time to intermettently learn TBB ]

0 Kudos
RafSchietekat
Valued Contributor III
702 Views

A parallel filter still only has one instance, which is invoked in parallel.

0 Kudos
Nav
New Contributor I
702 Views
Okay, so what is it that runs in parallel? Is it 'operator'? How does the decision take place to run n tokens in parallel? I'm assuming that when a serial filter passes on a token to a parallel filter, and another token is given before the first one is processed, the parallel filter runs another 'instance' of operator() to process the new token. Is that how it works? But running multiple instances of operator() would require multiple instances of the filter to be created, right?
0 Kudos
ilelann
Beginner
702 Views
I think you're wrong when you assume that running your filter twice in parallel requires two filter objects.
See :
struct filter
{
double operator()(double) const
{
// 2/ (see below)
}
private:
// 1/ (see below)
};

void simulates_tbb_parallel_run_with_two_tokens()
{
filter one_filter;
std::thread t1 (one_filter.operator());
std::thread t2 (one_filter.operator());
}

operator() is executed twice in parallel on the very same filter object.
For data accessed in operator(), that means :
1/ member data held by filter structure are shared among two calls and should be thread safe.
2/ local variables declared in operator() are not shared and require no thread safety.

Regards,
Ivan


0 Kudos
RafSchietekat
Valued Contributor III
702 Views
operator() is not instantiated, it's just like a member function (with an implicit "this" parameter) but with special syntax, and so can be called concurrently from multiple threads on the same "this" instance (provided its implementationis thread-safe of course).
0 Kudos
Nav
New Contributor I
702 Views
Thanks Ivan. Thanks Raf.
So different threads handle operator(). The last question to be clarified is whether the code in operator() is thread safe or not. Both of you seem to be saying the opposite of each other. To me it seems that the code is not thread safe by default.
0 Kudos
RafSchietekat
Valued Contributor III
702 Views

It's your responsibility to make operator() thread-safe, and I didn't see any discrepancies, just a different emphasis. Well, I wouldn't have written "require no thread safety"...

0 Kudos
jimdempseyatthecove
Honored Contributor III
702 Views
Nav,

A filter is code, there is one instance of the code.
A pipeline token is a context which generally contains a data buffer (e.g. what you read into, process, then write out).
A task is the pipeline token thrown into the filter code.
The degree of parallelization you attain depends on:
1) number of available threads (inclusive of threads running this pipeline)
2) number of tokens

The parallelization is the min of the above. (assuming filter has significant work and will not complete prior to starting next task).

Simplified example:

Assume you have 8 HW threads in TBB thread pool
Assume3 threads are running something and5 are idle when...
Assume one of those threads instantiates a parallel_pipeline.

The parallel_pipeline (default) establishes itselfwith 8 tokens (under assumption 8 threads could be available).
The pipeline runs using the6 threads (othertwo are still running other tasks).
During execution of pipeline should those two threads finish up their task(s) then they join in with the pipeline.

I hope this explains the situation for you.

Jim Dempsey
0 Kudos
Nav
New Contributor I
702 Views
Thanks Raf and Jim. The explanation was helpful :)
0 Kudos
Reply