Divide & Conquer maps well to parallel_for/while
Pipes - pipeline/filters
so for languages like Erlang, and packages like MPI, message passing is the basis of the parallel architecture which implies that there is a message pump between the "threads"
SO my question:
For patterns like message-passing, Layers, and especially master-worker (common for me), what best TBB tool can I use easily? Should I be looking at anyting in particular? I never really understood "task" classes - is there a way I can spawn a task that waits for commands like a worker? I could easily implement message pumps no problem, but how would I implement the worker/master threads heterogeneously. Thanks a bunch for any help or direction!
There is no support for message-passing in TBB. During the design of TBB, we investigated many different approaches. What we settled on, largely inspired by Cilk, is an approach that yield good cache locality, good space bounds, load balancing, full support for nested parallelism, and could be implemented efficiently as a pure C++ library. Message passing, though definitely useful in some contexts, did not have all the properties we were looking for.
That's not meant to disparage message-passing. A language like Erlang that has compiler support for message passing can do it very well. And MPI's message passing is certainly appropriate for writing programs that run on distributed-memory machines, albeit at some cost in programmer labor.
Master-worker is generally not scalable, because eventually the number of workers outstrips the ability of the master to direct/feed them. I usualy recommend hierarchical structures in such situations. But sometimes master-worker suffices and is easy to program. There's no one true way to do parallel programming.
Since TBB does not support all possible parallel programming paradigms, we did go to some effort to make it interoperate reasonably with native threads on each platform. So you can certainly mix TBB and native threads. Appendix B of the Tutorial describes how to do. Basically, the requirement is that the native threadhas to have an activetask_scheduler_init object while it is running a TBB algorithm.
If there is some higher-level way to express your pattern than "master-worker", and it seems like a generally useful pattern, perhaps there is a way we could add the pattern to TBB. What's the general nature of your algorithm?
- Arch Robison
Though the master-worker problem is more indicative of the nature of the algorithms we write, we actually use pipelines more for the heterogeneous tasks. But they become somewhat of a problem when the tasks become sufficiently small and we must split them again and again until we have a gazillion (no that's not a word ;-) ) filters, or what feels like a gazillion... Sometimes the filters have very little shared data but could spend a lot of time in a loop but these loops are in files of their own, because we try to be good software engineers..
I guess from TBB's point of view, it seems like an issue that is not easy to solve - more of a language barrier.OpenMP has a nice solution, "#pragma omp section", which beautifies the situation a bit. Problem is that we also compile with a custom gcc which doesn't support OpenMP.
I guess the question I have for you boils down to whether or not you have any suggestions for small hetereogenous tasks and what patterns or language nuances I/we haven't though of that may lend themselves to this type of environment.
It might help to stud the sequential version of the program. Another nice property of TBB is that there is that a TBB program can be run sequentially (e.g. consumer-producer programs require parallelism, which can be a bummer to debug). How would the tasks be sequenced in a sequential version of your program? If you can split the sequential execution, or a part thereof, into two independent parts, that might reveal a divide and conquer approach. Part of the advantage of divide and conquer is, when a Cilk/TBB style scheduler is used, that all the potential parallelism is not turned loose all at once. That tends to swamp a machine. Instead, the parallel depth-first execution tends to create just enough parallelism to keep the machine busy.
As far as loops being in different files, maybe there's a way to build some kind of registery of them at startup time? One way to approach it is to ask how does the sequential version know which loops to invoke when?