pipeline vs graph question

Customer__Intel6 · 05-17-2011

Hi,
I need to process multiple file streams in parallel like below.

input filter(s) -> parallel process filter -> output filter(s)

Should I launch multiple pipelines in separate threads (one per input file)? Can this be done in the new graph preview feature? The parallel process filter is the same for all the input files, which are saved separately by other processes and cannot be concatenated. Also, I would like the improved I/O throughput of accessing the files in parallel. I am on 64-bit CentOS 5.5 with 16 cores.

Thanks,
--Kannan.

Kirill_R_Intel · 05-17-2011

Hi Kannan,

Making several pipelines can solve the problem, but will not give enough load balancing. Using tbb::graph can give you more flexibility. You can create a graph with several nodes reading from files. They can then pass data to buffer nodes (a queue if needed), from which functional nodes will extract data and process it. You can create any number of nodes of each type and connect them the way you like.
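To make the topology concrete, here is a minimal sketch of "several readers -> shared queue -> several processing nodes". It deliberately uses only standard C++ threads rather than the tbb::graph API, so it shows the shape of the data flow and not TBB's scheduling or load balancing. The names ChunkQueue and run_topology are hypothetical, each inner vector stands in for the chunks read from one file, and the "processing" is a placeholder that just counts bytes.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffering node: an unbounded thread-safe queue.
class ChunkQueue {
public:
    void push(std::string chunk) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(chunk));
        }
        cv_.notify_one();
    }
    // Called once, after every reader has finished.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until a chunk is available; returns false once closed and drained.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool closed_ = false;
};

// One reader thread per "file", num_workers processing threads.
// Returns the total number of bytes seen by the processing stage.
std::size_t run_topology(const std::vector<std::vector<std::string>>& files,
                         unsigned num_workers) {
    assert(num_workers > 0);
    ChunkQueue queue;
    std::mutex total_mutex;
    std::size_t total = 0;

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < num_workers; ++i) {
        workers.emplace_back([&queue, &total_mutex, &total] {
            std::string chunk;
            std::size_t local = 0;
            while (queue.pop(chunk)) local += chunk.size();  // placeholder processing
            std::lock_guard<std::mutex> lock(total_mutex);
            total += local;
        });
    }

    std::vector<std::thread> readers;
    for (const auto& file : files) {
        const std::vector<std::string>* f = &file;
        readers.emplace_back([&queue, f] {
            for (const auto& chunk : *f) queue.push(chunk);  // reader node
        });
    }

    for (auto& r : readers) r.join();
    queue.close();
    for (auto& w : workers) w.join();
    return total;
}
```

Because any worker can take any chunk from the shared queue, a slow file does not leave processing threads idle, which is the load-balancing advantage over one independent pipeline per file.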

There is also the possibility of changing the graph structure dynamically, e.g. you can create an extra reading node when more files appear.

For more on using the graph, take a look at these articles:

http://software.intel.com/en-us/blogs/tag/graph/

Regards,

Kirill Rogozhin

jimdempseyatthecove · 05-18-2011

Kannan,

In a file processing situation you have thread stalls for Open, Close, Read, and Write. The TBB architecture is an equal-priority tasking system, meaning that during a thread stall you may be undersubscribed for parallel processing in your process filter. Try adding 2 threads (run with 18, or 34 if you have HT). Also, run with more tokens (I/O buffers) than the number of processing threads. The number of additional buffers will depend on the worst-case latency of the I/O.
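A rough sketch of that sizing rule follows. The struct and function names are hypothetical, and the 2x token multiplier is an assumption to be tuned against measured I/O latency, not a recommended constant. In the TBB of this era the two results would typically be passed to tbb::task_scheduler_init and to the max_number_of_live_tokens argument of pipeline::run.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: holds the two knobs discussed above.
struct PipelineSizing {
    unsigned threads;    // threads to request from the scheduler
    std::size_t tokens;  // maximum items (I/O buffers) in flight
};

// hw_threads:        hardware threads (16 cores, or 32 with HT, in this thread)
// extra_io_threads:  oversubscription to cover threads stalled in I/O (+2 above)
// tokens_per_thread: assumed multiplier; raise it if worst-case I/O latency is high
PipelineSizing size_pipeline(unsigned hw_threads,
                             unsigned extra_io_threads = 2,
                             unsigned tokens_per_thread = 2) {
    assert(hw_threads > 0 && tokens_per_thread > 0);
    PipelineSizing s;
    s.threads = hw_threads + extra_io_threads;
    s.tokens = static_cast<std::size_t>(s.threads) * tokens_per_thread;
    return s;
}
```

With 16 hardware threads this gives 18 threads, and with 32 it gives 34, matching the numbers above; the token count is only a starting point for experimentation.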

In QuickThread (my product) I have two classes of threads: a compute class (like TBB) plus an I/O class. You use the I/O class for I/O or other stalling tasks (e.g. waiting for events).

Using QuickThread for this pipeline you would typically specify two I/O class threads and the number of hardware threads for the compute class. When programmed this way, and with 2x the number of compute class threads for buffers, the simple TBB sample program to upcase words in a file, run on a Dell R710, converts at ~3.2 GB/sec. This speed saturates the I/O and memory bandwidth, and in this case adding an additional pipeline (for, say, a second conversion stream) would be counterproductive. This sample runs with one very large input file and writes one large output file.

When processing multiple smaller files, I/O latency will tend to be longer due to the many Open/Close operations. In this situation you "might" find some benefit in running multiple pipelines, but this is something you would have to test on a case-by-case basis. Note that running multiple I/O streams tends to increase average I/O latency due to the potential for increased seek latency as well as eviction of data from the disk's internal buffer. The following may work on TBB and it definitely works for QuickThread.

For a many-files situation consider a two-dimensional pipeline:

file-by-file input filter ->
parallel second-dimension process filter ->
file-by-file output filter

combined with the second-dimension process filter, which is itself a pipeline:

intra-file input filter ->
parallel process filter ->
intra-file output filter

In this case you might want +4 threads; experiment.
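The nesting can be sketched as below, again in plain standard C++ rather than TBB or QuickThread, so it illustrates the two dimensions but not token limits or thread classes. The function names are hypothetical, each string stands in for one file's contents, and upcasing is borrowed from the sample mentioned earlier as the processing step. Note that std::async with std::launch::async starts a thread per task, which a real implementation would replace with a bounded scheduler.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Second dimension: one file is split into chunks, the chunks are upcased
// in parallel, and the results are reassembled in their original order.
std::string process_file(const std::string& contents, std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<std::future<std::string>> chunks;
    for (std::size_t pos = 0; pos < contents.size(); pos += chunk_size) {
        std::string chunk = contents.substr(pos, chunk_size);  // intra-file input filter
        chunks.push_back(std::async(std::launch::async, [chunk]() mutable {
            // parallel process filter
            std::transform(chunk.begin(), chunk.end(), chunk.begin(),
                           [](unsigned char c) {
                               return static_cast<char>(std::toupper(c));
                           });
            return chunk;
        }));
    }
    std::string out;
    for (auto& c : chunks) out += c.get();  // intra-file output filter (serial, ordered)
    return out;
}

// First dimension: each file is handed to its own inner pipeline, and the
// outputs are collected file by file in input order.
std::vector<std::string> process_files(const std::vector<std::string>& files,
                                       std::size_t chunk_size) {
    std::vector<std::future<std::string>> pending;
    for (const auto& file : files) {  // file-by-file input filter
        const std::string* f = &file;
        pending.push_back(std::async(std::launch::async, [f, chunk_size] {
            return process_file(*f, chunk_size);
        }));
    }
    std::vector<std::string> results;
    for (auto& p : pending) results.push_back(p.get());  // file-by-file output filter
    return results;
}
```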

Jim Dempsey