Simple pipeline question

tbbnovice · ‎01-05-2009

If all the filters in my pipeline are serial, would I still see an improvement in performance vs. running them sequentially?

Specifically, if I had serial filters A->B->C->D, would B be running and generating the next item at the same time C is processing the last item generated by B?

Thanks a lot.

ARCH_R_Intel · ‎01-05-2009

In theory, you could see a performance improvement, since the filters will be operating in parallel. Whether you get this performance improvement in practice depends upon whether there is enough work per item to amortize scheduling overheads, and how balanced the work is across the stages.

Amdahl's law for a tbb::pipeline is that the throughput of the pipeline is limited by the throughput of the slowest serial stage.
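To put rough numbers on that limit, here is a back-of-the-envelope model. It is only a sketch under stated assumptions: every stage is serial, each stage $s$ costs a fixed time $t_s$ per item, a thread is always free when a stage can start, and scheduling overhead is ignored. The symbols $t_s$, $N$ (number of items), and $S$ (number of stages) are introduced here for illustration and do not come from TBB.

```latex
\begin{align*}
T_{\text{sequential}} &= N \sum_{s=1}^{S} t_s \\
T_{\text{pipeline}}   &= \sum_{s=1}^{S} t_s + (N-1)\,\max_{s} t_s \\
\lim_{N\to\infty} \frac{T_{\text{sequential}}}{T_{\text{pipeline}}}
                      &= \frac{\sum_{s=1}^{S} t_s}{\max_{s} t_s} \;\le\; S
\end{align*}
```

Under these assumptions the speedup reaches the number of stages only when all stages cost the same, and one item leaves the pipeline every $\max_s t_s$ time units in steady state.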

jimdempseyatthecove · ‎01-05-2009

>>I had serial filters A->B->C->D...

Also consider the implications of

A1->B1->C1->D1
A2->B2->C2->D2
A3->B3->C3->D3

Here the functions/tasks A, B, C, and D, each of which would otherwise be limited to a single thread, can work in parallel on separate work items.

Jim Dempsey

robert-reed · ‎01-05-2009

You'll find a diagram similar to this in my blog post (three parts) using TBB pipeline to overlap streaming file I/O and processing. One misleading aspect of Jim's diagram above is that it accidentally meets Arch's expression of the Amdahl limit for pipelines: if all the stages are the same "length" the pipeline reaches maximal concurrency. If each stage is truly serial, i.e., cannot support concurrent processing, throwing in a little variance in length might show another picture:

A1->B00000001->C0001->D01
A2-> B00000002->C0002->D02
A3-> B00000003->C0003->D03
A4-> B00000004->C0004->D04

If only a single copy of B can execute at a time in this distended variant of Jim's example, you can see a growing separation between when A finishes and B begins, but note that the Ds finish at a regular interval, though not nearly as quickly as the As.
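The timing in the diagram can be reproduced with a small model. This is a minimal sketch in standard C++, not TBB code: the function name `serial_pipeline_finish_times` is made up for illustration, and it assumes every stage is serial and in order, a thread is free whenever a stage can start, the number of items in flight is unlimited, and scheduling costs nothing. (In a real tbb::pipeline the token limit passed to `run()` caps how far A can run ahead.)

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// finish[i][s] is the time at which item i leaves stage s.
// Assumptions (hypothetical model, not TBB): all stages are serial and
// process items in order, a thread is always available, no overhead.
std::vector<std::vector<long>> serial_pipeline_finish_times(
    const std::vector<long>& stage_cost, std::size_t num_items) {
    const std::size_t num_stages = stage_cost.size();
    std::vector<std::vector<long>> finish(
        num_items, std::vector<long>(num_stages, 0));
    for (std::size_t i = 0; i < num_items; ++i) {
        for (std::size_t s = 0; s < num_stages; ++s) {
            // The item must have left the previous stage...
            const long item_ready = (s == 0) ? 0 : finish[i][s - 1];
            // ...and the stage must have finished the previous item.
            const long stage_free = (i == 0) ? 0 : finish[i - 1][s];
            finish[i][s] = std::max(item_ready, stage_free) + stage_cost[s];
        }
    }
    return finish;
}
```

Calling `serial_pipeline_finish_times({2, 9, 5, 3}, 4)`, with costs chosen to mirror the label widths in the diagram (an assumption), gives D finish times of 19, 28, 37, and 46: one item every 9 units, which is B's cost. The gap between an item leaving A and entering B grows by 7 units with each item.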

jimdempseyatthecove · ‎01-05-2009

Right, Robert.

I should have pointed this out; thanks for your additional comments.

Now that readers have had a chance to digest what you have illustrated, they should appreciate that the little bit of extra work needed to make each stage of the pipeline thread-safe is well worth it (i.e., making it so that A1 can run concurrently with A2, etc.).
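A rough way to see the payoff is to model what happens when one stage is allowed to run on several items at once. The sketch below is plain C++, not TBB code. The `Stage` struct, the function name, and the stage costs are all hypothetical, and it assumes a thread is always available and that there is no scheduling overhead.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Stage {
    long cost;    // time units per item (hypothetical figure)
    bool serial;  // true: one item at a time; false: any number at once
};

// Returns the time at which each item leaves the last stage.
std::vector<long> last_stage_finish_times(const std::vector<Stage>& stages,
                                          std::size_t num_items) {
    std::vector<long> stage_free(stages.size(), 0);  // next free time per serial stage
    std::vector<long> done;
    done.reserve(num_items);
    for (std::size_t i = 0; i < num_items; ++i) {
        long t = 0;  // time at which item i is ready for the next stage
        for (std::size_t s = 0; s < stages.size(); ++s) {
            // A serial stage must also wait until it has finished the previous item.
            const long start = stages[s].serial ? std::max(t, stage_free[s]) : t;
            t = start + stages[s].cost;
            if (stages[s].serial) stage_free[s] = t;
        }
        done.push_back(t);
    }
    return done;
}
```

With costs 2, 9, 5, 3 for A through D and every stage serial, four items finish at 19, 28, 37, and 46. Marking only B as parallel gives 19, 24, 29, and 34: the interval drops from B's cost (9) to that of the next slowest serial stage, C (5).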

Jim

jimdempseyatthecove · ‎01-05-2009

Also

A1->B00000001->C0001->D01
A2-> B00000002->C0002->D02
A3-> B00000003->C0003->D03
A4-> B00000004->C0004->D04

As your article shows.

One of the other things this illustrates is that when your application has additional work to perform, the task-stealing nature of TBB will fill in the blanks, so to speak.

Jim

tbbnovice · ‎01-06-2009

Thanks for all the help. The support I get on this forum is incredible! This is one of the reasons why I am pushing for TBB adoption in our projects.

If I have multiple pipelines running concurrently, I believe the above will not be much of an issue (unless there are caveats I should know about?). I don't think I can use a parallel_for to run multiple pipelines, because they are not short-running tasks and some of the filters could block (e.g., the file I/O example in the book/blog). Would you suggest using real threads to run multiple pipelines concurrently?

robert-reed · ‎01-06-2009

Concurrent pipelines? There has been some discussion on this forum about running multiple pipelines: "Multiple Concurrent Pipelines", "Multiple Pipelines", and, most importantly, "Need Help with Pipeline Deadlock". I refer you to those discussions for some background.