Is there any advantage to pipeline vs. raw threads?
I.e., I have divided my job into several raw threads, and they communicate with each other via queues (lock-free, concurrent, etc.).
Will there be any benefit from pipeline mode?
BTW, I think the TBB::task pattern is easy to understand, but can anyone tell me its advantages vs. raw threads?
Benchmark/test result comparisons are appreciated.
tbb::pipeline will mostly avoid dragging data across caches (cache locality is a big concern for performance): the filters will tend to make house calls instead, and since the code doesn't change, it can live in several caches at the same time.
"BTW, I think the TBB::task pattern is easy to understand, but can anyone tell me its advantages vs. raw threads?"
TBB will take care of several things related to getting the most out of the machine that would otherwise distract you, if you can accept that it sacrifices fairness and currently sometimes even global progress for throughput-oriented performance. Instead of wasting time reinventing the wheel (and getting the bumps out), you can let TBB's scheduler provide better cache locality, scalability, etc.
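As a rough illustration of what handing work to the scheduler looks like (a hypothetical divide-and-conquer sum with an invented cutoff of 1000, using C++11 lambdas for brevity), a tbb::task_group expresses the parallelism and lets idle workers steal the pieces:

```cpp
#include <cstddef>
#include "tbb/task_group.h"

// Hypothetical recursive sum: the code never mentions a thread count;
// TBB's scheduler decides which worker runs which half.
long parallel_sum(const long* a, std::size_t n) {
    if (n <= 1000) {                 // small enough: just run serially
        long s = 0;
        for (std::size_t i = 0; i < n; ++i) s += a[i];
        return s;
    }
    long left = 0;
    tbb::task_group g;
    g.run([&] { left = parallel_sum(a, n / 2); });     // may be stolen by an idle worker
    long right = parallel_sum(a + n / 2, n - n / 2);   // this thread keeps working meanwhile
    g.wait();                                          // join the (possibly stolen) half
    return left + right;
}

int main() {
    static long data[5000];
    for (std::size_t i = 0; i < 5000; ++i) data[i] = (long)i;
    return parallel_sum(data, 5000) == 5000L * 4999 / 2 ? 0 : 1;
}
```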
"Benchmark/test result comparisons are appreciated."
Anyone?
I would argue that the advantages of task-based parallelism over raw-thread implementations are twofold.
The first advantage is portability across different numbers of cores. Given 2-, 4-, and 8-core machines, you have to either over-provision the number of threads (i.e., run 8 threads on the 2-core machine) or write three different versions of the code to avoid running too many threads. Tasking handles this naturally, dividing the work among however many worker threads are in use.
The second advantage is automatic load balancing. If your parallel workload includes jobs that are dynamic in the amount of computation they do, it is impossible to statically load-balance these jobs across threads. Task schedulers deal with this by asking the programmer to express all available parallelism in the program and by keeping unassigned tasks around for "bored" worker threads to grab and execute.
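A minimal sketch of that, assuming an invented per-item cost function: tbb::parallel_for recursively splits the range, and an idle worker steals an unprocessed subrange instead of sitting behind a static partition.

```cpp
#include <cmath>
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"

// Hypothetical job whose cost grows with i: equal-sized static chunks
// would let the workers holding low indices finish far too early.
double expensive(int i) {
    double x = 0;
    for (int k = 0; k < i; ++k) x += std::sin((double)k);
    return x;
}

static double results[100000];

// Body functor applied to whatever subrange the scheduler hands a worker.
struct ApplyExpensive {
    void operator()(const tbb::blocked_range<int>& r) const {
        for (int i = r.begin(); i != r.end(); ++i)
            results[i] = expensive(i);
    }
};

int main() {
    // The grainsize (1000) bounds how small a stolen piece can get.
    tbb::parallel_for(tbb::blocked_range<int>(0, 100000, 1000), ApplyExpensive());
    return 0;
}
```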
Can the pipeline guarantee that data goes through it in its original order?
This is not easy to do with raw threads.
Please consult http://www.threadingbuildingblocks.org/, Documentation, Reference Manual (Open Source or Commercial), and look for "serial_in_order". Note that this is only guaranteed to be the order of the input filter if that input filter is also serial_in_order (which might be made more explicit in the documentation).
(Correction) Only "Reference (Open Source).pdf" currently mentions "serial_in_order", perhaps because it is revision 1.13 vs. 1.10 for the commercial documentation, but maybe that just lags the actual software (I have not checked).
"This is not easy to do with raw threads."
All ordered serial filters in the pipeline process data in the same order, as established by the first such filter.
So you can make your first filter ordered, as well as every other filter where the order is important.
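A minimal sketch of that arrangement, following the classic tbb::filter API that the text_filter example uses (the stage names and the squaring work are invented for illustration): the mode passed to each filter's constructor is all that is needed.

```cpp
#include <cstdio>
#include "tbb/pipeline.h"

// Ordered input stage: establishes the order that later ordered stages follow.
class Source : public tbb::filter {
    int next_;
public:
    Source() : tbb::filter(serial_in_order), next_(0) {}
    void* operator()(void*) {
        if (next_ >= 10) return NULL;    // returning NULL ends the input
        return new int(next_++);
    }
};

// Parallel middle stage: items may be processed concurrently, in any order.
class Transform : public tbb::filter {
public:
    Transform() : tbb::filter(parallel) {}
    void* operator()(void* item) {
        int* p = static_cast<int*>(item);
        *p *= *p;                        // some per-item work
        return p;
    }
};

// Ordered output stage: receives items in the order Source emitted them.
class Sink : public tbb::filter {
public:
    Sink() : tbb::filter(serial_in_order) {}
    void* operator()(void* item) {
        int* p = static_cast<int*>(item);
        std::printf("%d\n", *p);         // prints 0 1 4 9 ... 81, in order
        delete p;
        return NULL;
    }
};

int main() {
    Source in; Transform mid; Sink out;
    tbb::pipeline p;
    p.add_filter(in);
    p.add_filter(mid);
    p.add_filter(out);
    p.run(4);                            // at most 4 items in flight
    return 0;
}
```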
Thanks for pointing out yet another place to improve our docs.
After internal discussion, we would like to add the following tip right after the introduction of stage types:
"Tip: Use a serial_in_order input filter if there are any subsequent serial_in_order stages that should process items in their input order."
The question to everyone: would such a tip be sufficient to make it more obvious how to process items in the order of input?
I also thought that "in no particular order" looks rather scary (although I wouldn't promise FIFO either), but then I'm an excellent nitpicker if nothing else. :-)
What about the parallel filter? I.e., for class MyTransformFilter in the text_filter example: how does the pipeline guarantee that each MyBuffer enters MyOutputFilter in the correct order, given that MyTransformFilter is not a serial filter?
Conceptually, the first serial_in_order filter gives each data item a ticket from a pipeline-global dispenser, which the data item has to present to each subsequent serial_in_order filter (other kinds of filters just ignore it). When I was playing around with the code (although I haven't yet completed what I set out to do), I thought it might be useful to make the metaphor more explicit by substituting names like ticket_dispenser, now_serving, next_please, although I didn't go as far as renaming input_buffer to waiting_room (yet). :-)
It does not matter in which order items are put into the buffer of an ordered filter. The proper ordering is established when the items are taken from the buffer to be processed by the filter. If the item that should be processed next has not yet arrived, all other items will wait in the buffer.
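A conceptual sketch of that buffer, in the ticket metaphor from above (an illustration of the idea only, not TBB's actual code; a real version would also need a lock protecting both operations):

```cpp
#include <cstddef>
#include <map>

// Items arrive carrying the ticket issued by the first ordered filter;
// they are released strictly in ticket order.
class OrderedBuffer {
    std::map<std::size_t, void*> waiting_;  // early arrivals, keyed by ticket
    std::size_t now_serving_;               // next ticket to be released
public:
    OrderedBuffer() : now_serving_(0) {}

    // An item arrives from the previous (possibly parallel) stage.
    void put(std::size_t ticket, void* item) {
        waiting_[ticket] = item;
    }

    // Hand out the next in-order item, or NULL if it has not arrived yet;
    // everything else keeps waiting, whatever order it arrived in.
    void* try_get() {
        std::map<std::size_t, void*>::iterator it = waiting_.find(now_serving_);
        if (it == waiting_.end()) return NULL;
        void* item = it->second;
        waiting_.erase(it);
        ++now_serving_;
        return item;
    }
};
```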
"I also thought that 'in no particular order' looks rather scary (although I wouldn't promise FIFO either), but then I'm an excellent nitpicker if nothing else. :-)"
I mean that there is no promise to prevent pipeline-local starvation of any particular data item, which, as far as I can tell, can in fact happen in a parallel filter (but see below), because, even though a pipeline's filters are given different priorities in an effort to promote pipeline-local progress, such a data item might be held captive in a thread that is now busy executing tasks not related to the pipeline.
The maximum number of tokens in flight will ensure that this will not cause an unbounded backlog in the second of two serial_in_order filters that occur somewhere before and after the parallel filter (and I have to go check what this means for my pipeline reimplementation), but this will instead cause the entire pipeline to stall.
Does this seem plausible?
"The maximum number of tokens in flight will ensure that this will not cause an unbounded backlog in the second of two serial_in_order filters that occur somewhere before and after the parallel filter (and I have to go check what this means for my pipeline reimplementation), but this will instead cause the entire pipeline to stall. Does this seem plausible?"
I think you are correct, though I probably do not understand why you stressed the parallel filter.
However, I am not sure this level of detail makes sense in the documentation; it is too implementation-specific for my taste.
In a parallel filter, a data item can suffer an unbounded delay relative to other data items passing through the same pipeline. This does not happen in a serial filter, incidentally because of their current implementation (either type), or inherently because of their specification (all but the first serial_in_order filters).
"However, I am not sure this level of detail makes sense in the documentation; it is too implementation-specific for my taste."
I would tend to agree if we were talking about a possible warning not to use a microwave oven to dry pets, but in this case I would probably be somewhat cross if my application malfunctioned because of something somebody knew but didn't care to tell me.
"In a parallel filter, a data item can suffer an unbounded delay relative to other data items passing through the same pipeline. [...] I would probably be somewhat cross if my application malfunctioned because of something somebody knew but didn't care to tell me."
Thanks for the clarification; now I think I really understand what you meant. So this is about the "no particular order" note for unordered filters, how much reordering there could be, and the progress of some items relative to their peers.
A caution could be added, though I am not sure what it should say beyond what is already said. Also, I do not think the "unbounded delay" problem will ever arise in practice. As far as I remember, there are just two places in the implementation where tasks get spawned: at the start of the pipeline, and in note_done after processing by a serial filter. The former creates input tasks, and those do not carry any item yet, so there is no "unbounded delay" problem. The latter spawns a task to process the serial filter and advances the current task to the next filter, which might be parallel. So in the current implementation there are no parallel-stage tasks sitting in the pools.
I still have trouble devising a caution that would be specific enough not to just scare the reader (such as "in the worst case, a token can be completed after N-1 tokens that it would precede in sequential execution"), and yet high-level enough not to bury the reader under the amount of implementation-specific detail we are discussing.
Stealing without nested blocking-style parallelism? Anyway, this is not the current situation.
"If considering nested blocking style parallelism, then the executing task could be blocked for longer time than necessary to complete the nested computation."
You and I know that (now), but we're not the intended audience of such a statement...
"And with the recent task pool changes that also make stealing not limited by depth, I agree it could cause relative delay, bounded by the allowed number of simultaneously active tokens as you mentioned before."
This is about unbounded relative delays, a.k.a. starvation.
"I still have trouble devising a caution that would be specific enough not to just scare the reader (such as "in the worst case, a token can be completed after N-1 tokens that it would precede in sequential execution"), and yet high-level enough not to bury the reader under the amount of implementation-specific detail we are discussing."
I'm sure you'll find a way to sugarcoat what the user should know. :-)