Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Execution order of items in pipelines

Minh-Nhut_Hong
Beginner
697 Views

I have question regarding execution order of items in pipelines.

I have a pipeline with 3 stages s1, s2 and s3. Filter of these stages are set to 'serial_in_order'.

The pipeline is invoked within a parallel_while. Focus on two seperate invocation of s1 (these are within the parallel_while). Call them s1X and s1Y. Items of s1X (i.e s1X_1 s12X_ ... s1Xn_) are executed in serial order since its filter is set to 'serial_in_order'. The same is for s1Y (i.e item s1Y_1 s1Y_2 ... s1Y_m are executed in serial order). Is there execution order between s1X items and s1Y items? If '<' denotes 'executed before' order, can the following execution order happen   s1X_1 < s1X_2 < s1Y_1 < .... < s1X_n < s1Y_2 < ...?

 

0 Kudos
13 Replies
RafSchietekat
Valued Contributor III
697 Views

I don't think you're supposed to call pipeline::run() concurrently, but maybe that should be documented.

This is also one case where you may directly use threads to run separate pipeline instances.

Otherwise, do tell what you're trying to do here!

0 Kudos
jimdempseyatthecove
Honored Contributor III
698 Views

I think what Hong is looking for is:

[cpp]

X pipeline without parallel while
s1X_1
s1X_2, s2X_1
s1X_3, s2X_2, s3X_1
s1X_4, s2X_3, s3X_2
...
s1X_(n), s2X_(n-1), s3X_(n-2)
s2X_(n), s3X_(n-1)
s3X_n

-----------------
parallel while with concurrent X and Y
(~ == may be skewed with respect to time)

s1X_1                         ~ s1Y_1
s1X_2, s2X_1                  ~ s1Y_2, s2Y_1
s1X_3, s2X_2, s3X_1           ~ s1Y_3, s2Y_2, s3Y_1
s1X_4, s2X_3, s3X_2           ~ s1Y_4, s2Y_3, s3Y_2
...
s1X_(n), s2X_(n-1), s3X_(n-2) ~ s1Y_(n), s2Y_(n-1), s3Y_(n-2)
s2X_(n), s3X_(n-1)            ~ s2Y_(n), s3Y_(n-1)
s3X_n                         ~ s3Y_n
[/cpp]

time sequence is top to bottom

The above has no assurances as to time of completion amongst s3X_n and s3Y_n (and s3Z_n, ...)

each iteration of the parallel while will require a seperate instance of the parallel pipeline object.

Jim Dempsey

0 Kudos
Minh-Nhut_Hong
Beginner
698 Views

I recently completed the implementation for 'tiling  rectangle' problem using TBB this time

 http://software.intel.com/en-us/forums/p1-m2-tiling-rectangles

I got the correct results and very good execution time. However I could not reason about execution order of tasks mentioned in the question I posted the other day.

For a given 'area', there are many pairs (width, height) such that 'width' x 'height' = 'area'. For each pair of (width, height), I need to find out if it is possible to do the tiling. There are two pipelines in the program. There is a call to 'tiling_rectanges' in the last stage of the first pipeline. This function is defined in multi_input_set.h file. It is the one responsible for doing the tiling.

I use parallel_while in tiling_rectangel function to loop through all the (width, height) pairs. Note that these pairs are discovered sequentilally from the given input.

The second pipeline is defined in Tiling class. It tries to find out how to do the tiling and write the results into a file. The writing of results is done by the third stage of the second pipeline.

I attached part of the source code for your reference.

The question is the execution order of items related to the third stage of the second pipeline. For a given (width, height) pair,  the execution of all items associated to the third stage is in order. What about execution order of items across (width, height) paris?

 

 

0 Kudos
Christophe_H_Intel
698 Views

Hello Minh,

I am sorry I haven't had the time to examine the code you have posted.  As to your original question, Raf is correct.  The pipeline consists of filter objects and buffers (between those stages that require them; there is no need for a buffer between two parallel stages, for example.)  The tokens in the pipeline have sequence numbers to support serial in-order filters.  If two runs occur in parallel the token IDs won't be unique, and the pipeline will not keep them separate.  As Raf noted, this should be part of the documentation.

I don't entirely understand how one could differentiate two runs of the same pipeline without changing the the filters.  I will take a look at your code.

Best Regards,
Chris

0 Kudos
Minh-Nhut_Hong
Beginner
698 Views

Thank you all for your comments.

You all told me that there is no constraint execution order between items of different runs. I can understand that.

In fact that was the constraint I was thinking to implement; all items of one run to be executed either after or before all the items of a different run complete their execution. Then I realized that the code has been behave that way after running the program thousand times using different data set and on machines with different number of cores. Either I got the correct results by chances or something that made all these executed in the order expected.

0 Kudos
Minh-Nhut_Hong
Beginner
698 Views

I may have an answer why I got the correct results despite of having two runs of the same pipeline in parallel.

The first stage of the pipeline separates input into chunks which shall be written to an output buffer by the second stage. Since filter of the second stage is set to 'serial-in-order', chunks shall be written into output buffer in order. These chunks also seperated by run since each run has its own buffer.

Chris, you mentioned that "If two runs occur in parallel the token IDs won't be unique, and the pipeline will not keep them separate". If a stage's filter set to 'serial-in-order', are its items per run still executed in order?

0 Kudos
Christophe_H_Intel
698 Views

Hi, Minh,

The pipeline has a token_counter which is atomic.  When a token is injected to the pipeline (in an ordered or thread-bound filter) the token is assigned a value.  The result of running the pipeline twice in parallel would be that the stream(s) would be ordered, but the token assignment would be arbitrary (the only guarantee would be that the token number would be monotonically-increaing for each input stream.)

The serial in-order filters wait for the "next" token (the last processed token value + 1), and once the buffer slot corresponding to that token is filled the filter will execute.  This means the streams of tokens are mixed, but will each be processed in-order.

If the object being passed through the filter also has some sort of identifier telling you which "run" the object is part of, then the final filter can somehow differentiate the two streams.  If not, the output would be strictly ordered within each stream, but shuffled arbitrarily between streams.

The down-side of this is your serial filter is a bottleneck for all iterations of the parallel construct running the pipeline.

Does that sound like what is happening?

Regards,
Chris

0 Kudos
Christophe_H_Intel
698 Views

One thing I'm not sure of.  Do you build the pipeline once, and invoke its run() method in parallel?  If so, how does the input filter know what each different invocations wants it to do?  The filter object is built only once in your scenario, so it can't, for instance, read two different files and forwards data from them to other filters.  The run() method only specifies the number of tokens and optionally the context.

Best Regards,
Chris

0 Kudos
Minh-Nhut_Hong
Beginner
698 Views

Hi Chris,

Thanks for the explanation.

To see the execution order in action, I recorded start time and end time of an executed item.

Here is part of the results. All entries are sorted by (width x heigth).

     width x  height       - start time (ms)                 end time (ms)

         24 x   32500000 - 1377476075246.4418945, 1377476075566.2089844
         24 x   32500000 - 1377476075566.2189941, 1377476075566.2189941
         20 x   39000000 - 1377476075461.7309570, 1377476075811.4050293
         20 x   39000000 - 1377476075811.4331055, 1377476075811.4331055
         16 x   48750000 - 1377476075481.8979492, 1377476075885.8798828
         16 x   48750000 - 1377476075885.8898926, 1377476075885.8911133
      30000 x      26000 - 1377476074433.8569336, 1377476074433.8649902
      30000 x      26000 - 1377476074433.9160156, 1377476074433.9160156
      30000 x      26000 - 1377476074434.0649414, 1377476074434.0659180
      30000 x      26000 - 1377476074434.0729980, 1377476074434.0739746

Here is part of the same results. All entries are sorted by start time.

      30000 x      26000 - 1377476074447.8969727, 1377476074448.0258789
       3840 x     203125 - 1377476074448.0390625, 1377476074448.0390625
      30000 x      26000 - 1377476074448.0480957, 1377476074448.0500488
      30000 x      26000 - 1377476074448.0620117, 1377476074448.0639648
      30000 x      26000 - 1377476074448.0749512, 1377476074448.0769043
       2600 x     300000 - 1377476074448.0849609, 1377476074452.8701172
      30000 x      26000 - 1377476074448.0849609, 1377476074448.0878906
Items associted to a run are executed in order. And items associted to different runs can be executed in parallel.

The observed behaviors agree with my intention.

0 Kudos
Minh-Nhut_Hong
Beginner
698 Views

Hi Chris,

Do you build the pipeline once, and invoke its run() method in parallel?

No, I build the pipeline for each parallel_while's item invocation. Each pipeline runs with different input paramenters.

0 Kudos
Christophe_H_Intel
698 Views

Hello, Minh,

Thanks for the information.  Different instances of a pipeline can be run in parallel, and they should not interfere (beyond using the tasking resources of the scheduler.)  I misunderstood your initial description.

Regards,
Chris

0 Kudos
RafSchietekat
Valued Contributor III
698 Views

I still don't quite get it, but never mind.

Maybe a correction to Chris' reply: Different pipeline instances may be invoked concurrently, but it at least used to be the case that their execution could get entangled, possibly causing starvation of one pipeline by another. If that is still the case, a workaround would be to use different application threads, one for each pipeline. Of course, if the pipelines run for a finite amount of time, and if there's no assumption of concurrency, that wouldn't matter. But I'm not sure that I can conclude that this is the case here?

0 Kudos
Christophe_H_Intel
698 Views

Hi Raf,

Yes, that is what I meant by interacting at the task level.  When a filter is finished, and if the next stage of a pipeline can be executed (which would not necessarily be the case for a serial filter), the next filter is executed in preference to finding some other available work to do (the scheduler is bypassed.)

Regards,
Chris

0 Kudos
Reply