Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Simple pipeline question

tbbnovice
Beginner
366 Views

If all the filters in my pipeline are serial, would I still see an improvement in performance vs. running them sequentially?

Specifically, if I had serial filters A->B->C->D, would B be running and generating the next item at the same time C is processing the last item generated by B?

Thanks a lot.

7 Replies
ARCH_R_Intel
Employee
366 Views

In theory, you could see a performance improvement, since different filters can operate in parallel on different items. Whether you get this improvement in practice depends on whether there is enough work per item to amortize scheduling overheads, and on how balanced the work is across the stages.

Amdahl's law for a tbb::pipeline is that the throughput of the pipeline is limited by the throughput of the slowest serial stage.
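
To put rough numbers on this bound, here is a minimal sketch in plain C++. It rests on idealized assumptions that are not from the post: constant per-stage times, no scheduling overhead, at least one thread per stage, and no limit on items in flight. The helper names are hypothetical and are not part of the TBB API.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper (not TBB API): time to run every stage for every item
// back to back on a single thread.
double sequential_time(const std::vector<double>& stage_times, int items) {
    assert(!stage_times.empty() && items > 0);
    double per_item = std::accumulate(stage_times.begin(), stage_times.end(), 0.0);
    return per_item * items;
}

// Hypothetical helper (not TBB API): idealized time for the same work when each
// stage is serial (one item at a time) but different stages overlap.
// The first item pays for every stage; each later item finishes one
// "slowest stage" interval after the previous one.
double pipelined_time(const std::vector<double>& stage_times, int items) {
    assert(!stage_times.empty() && items > 0);
    double per_item = std::accumulate(stage_times.begin(), stage_times.end(), 0.0);
    double slowest = *std::max_element(stage_times.begin(), stage_times.end());
    return per_item + slowest * (items - 1);
}
```

Under these assumptions, four balanced stages of 1 time unit each and 100 items give 400 units sequentially versus 103 pipelined. With unbalanced stage times {2, 9, 5, 3} and 4 items the figures are 76 versus 46. For many items the speedup approaches the sum of the stage times divided by the slowest stage time.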

jimdempseyatthecove
Honored Contributor III
366 Views

>>I had serial filters A->B->C->D...

Also consider the implications of

A1->B1->C1->D1
A2->B2->C2->D2
A3->B3->C3->D3

Here the functions/tasks A, B, C, and D, each otherwise capable of running on only a single thread, can now work in parallel on separate work items.

Jim Dempsey

robert-reed
Valued Contributor II
366 Views

Quoting - jimdempseyatthecove
Also consider the implications of

A1->B1->C1->D1
A2->B2->C2->D2
A3->B3->C3->D3

You'll find a diagram similar to this in my blog post (three parts) using TBB pipeline to overlap streaming file I/O and processing. One misleading aspect of Jim's diagram above is that it accidentally meets Arch's expression of the Amdahl limit for pipelines: if all the stages are the same "length" the pipeline reaches maximal concurrency. If each stage is truly serial, i.e., cannot support concurrent processing, throwing in a little variance in length might show another picture:

A1->B00000001->C0001->D01
A2->         B00000002->C0002->D02
A3->                  B00000003->C0003->D03
A4->                           B00000004->C0004->D04

If only a single copy of B can execute at a time in this distended variant of Jim's example, you can see a growing separation between when A finishes and B begins, but note that the Ds finish at a regular interval, though not nearly as quickly as the As.
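
The timing in this diagram can be reproduced with a small schedule calculation. The sketch below is plain C++, not TBB, and it makes an assumption the post does not state: each stage's duration is taken to be the character length of its label in the diagram (A = 2, B = 9, C = 5, D = 3, in arbitrary time units). The function and type names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// When one item occupies one stage.
struct Slot {
    int start;
    int finish;
};

// Hypothetical helper (not TBB API): schedule[i][s] is the slot of item i in
// stage s, assuming every stage is serial and handles items in order.
std::vector<std::vector<Slot>> serial_schedule(const std::vector<int>& stage_times,
                                               int items) {
    assert(!stage_times.empty() && items > 0);
    std::vector<std::vector<Slot>> schedule(items,
                                            std::vector<Slot>(stage_times.size()));
    for (int i = 0; i < items; ++i) {
        for (std::size_t s = 0; s < stage_times.size(); ++s) {
            // The item must have left the previous stage...
            int item_ready = (s == 0) ? 0 : schedule[i][s - 1].finish;
            // ...and this stage must have finished the previous item.
            int stage_free = (i == 0) ? 0 : schedule[i - 1][s].finish;
            int start = std::max(item_ready, stage_free);
            schedule[i][s] = Slot{start, start + stage_times[s]};
        }
    }
    return schedule;
}
```

Calling serial_schedule({2, 9, 5, 3}, 4) gives A finish times of 2, 4, 6, 8 and D finish times of 19, 28, 37, 46. The Ds complete one every 9 units (the length of B), and the wait between an A finishing and its B starting grows as 0, 7, 14, 21.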

jimdempseyatthecove
Honored Contributor III
366 Views

Right - so Robert,

I should have pointed this out; thanks for your additional comments.

Now that readers have had a chance to digest what you have illustrated, they should appreciate that the little bit of extra effort in making each stage of the pipeline thread-safe (i.e., making it so A1 can run concurrently with A2, etc.) is well worth it.

Jim
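
As a rough illustration of why this pays off, the sketch below estimates the steady-state interval between finished items when a stage may process several items at once (in TBB terms, a parallel rather than a serial filter). It is plain C++ under idealized assumptions that are not from the thread: constant stage times, enough threads and items in flight, and no overhead. The stage times reuse the character lengths from Robert's diagram, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Stage {
    double time;  // time one copy of the stage spends on one item
    int copies;   // items the stage may process at once (1 = serial)
};

// Hypothetical helper (not TBB API): in steady state the pipeline emits one
// item per "bottleneck interval", the largest time/copies over all stages.
double steady_state_interval(const std::vector<Stage>& stages) {
    assert(!stages.empty());
    double interval = 0.0;
    for (const Stage& st : stages) {
        assert(st.time > 0.0 && st.copies >= 1);
        interval = std::max(interval, st.time / st.copies);
    }
    return interval;
}
```

With every stage serial and times {2, 9, 5, 3}, the interval is 9. If B is made thread-safe and allowed 3 concurrent copies, the interval drops to 5, and C becomes the new bottleneck.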

jimdempseyatthecove
Honored Contributor III
366 Views


Also

A1->B00000001->C0001->D01
A2->         B00000002->C0002->D02
A3->                  B00000003->C0003->D03
A4->                           B00000004->C0004->D04

As your article shows.

One of the other things this illustrates is that when your application has additional work to perform, the task-stealing nature of TBB will fill in the blanks, so to speak.

Jim

tbbnovice
Beginner
366 Views


Thanks for all the help - the support I get on this forum is incredible! This is one of the reasons why I am pushing for tbb adoption in our projects.

If I have multiple pipelines running concurrently, I believe the above will not be much of an issue (or are there any caveats?). I don't think I can use parallel_for to run multiple pipelines, because they are not short-running tasks and some of the filters could block (e.g., the file I/O example in the book/blog). Would you suggest using real threads to run multiple pipelines concurrently?
robert-reed
Valued Contributor II
366 Views

Concurrent pipelines? There has been some discussion on this forum about running multiple pipelines: "Multiple Concurrent Pipelines", "Multiple Pipelines", and, most importantly, "Need Help with Pipeline Deadlock". I refer you to those discussions for some background.