I am new to parallelism and I managed to read the TBB book that was just recently published. However, my lack of familiarity with some of the main concepts in parallelism prompts me to read more to understand the subject better.
My understanding is that OpenMP is geared toward shared-memory architectures, whereas MPI is geared toward distributed-memory computing. So the next question is: where does TBB fit? On which hardware configurations does TBB fulfill its purpose? Is TBB portable? Is the scheduler static, dynamic, or hybrid? Furthermore, is concurrency exploited using data, task, or data-flow decomposition patterns?
The reason I ask all these questions is that, throughout my reading of the materials in the TBB book and on the TBB website, I haven't found detailed information that unveils the power of TBB.
Also, the last book I was skimming through, "Patterns for Parallel Programming", did not mention TBB; it focused mostly on OpenMP and MPI.
TBB targets shared-memory architectures. It works even on single-core, single-processor machines, though for a performance benefit there should be two or more hardware threads (i.e. full-fledged cores or at least hyper-threads) available.
TBB provides a high-level, system-independent interface; in this sense, it's "portable". Internally, TBB contains machine-, OS-, and compiler-specific low-level primitives, which have to be provided for (ported to) a platform of interest.
The TBB scheduler is a work-stealing scheduler. It's built on a different principle than the OpenMP schedulers you listed. The scheduler provides natural support for load balancing and for nested and recursive parallelism. TBB provides high-level algorithm templates such as parallel_for to support "flat" loop parallelism; internally, it recursively divides the flat iteration space.
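To make "recursively divides the flat iteration space" more concrete, here is a minimal sketch of the idea in standard C++ only. It is not TBB's implementation: recursive_for, its grain parameter, and squares are hypothetical names made up for this illustration, and std::async stands in for TBB tasks. A real work-stealing scheduler would not create a thread per split; it would queue the halves as tasks and let idle worker threads steal them.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper (not a TBB API): applies body(i) to every i in
// [begin, end) by repeatedly halving the range until a piece is no
// larger than `grain`, then running that piece as a plain serial loop.
template <typename Body>
void recursive_for(std::size_t begin, std::size_t end, std::size_t grain,
                   const Body& body) {
    if (begin >= end) return;      // empty range: nothing to do
    if (grain == 0) grain = 1;     // guard against endless splitting
    if (end - begin <= grain) {
        for (std::size_t i = begin; i < end; ++i) body(i);
        return;
    }
    const std::size_t mid = begin + (end - begin) / 2;
    // Left half runs concurrently; std::async stands in for spawning a task.
    auto left = std::async(std::launch::async,
                           [&] { recursive_for(begin, mid, grain, body); });
    recursive_for(mid, end, grain, body);  // right half on the current thread
    left.get();                            // wait for the left half
}

// Example use: fill a vector with squares. Each index is written by
// exactly one piece, so no synchronization on `out` is needed.
std::vector<int> squares(std::size_t n) {
    std::vector<int> out(n);
    recursive_for(0, n, 1024,
                  [&](std::size_t i) { out[i] = static_cast<int>(i * i); });
    return out;
}
```

The caller still sees a flat loop over 0..n, while the work underneath forms a tree of sub-ranges. That tree shape is what lets a work-stealing scheduler balance load: a thread that finishes early can take an unprocessed sub-range from a busier one.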
The exploitation of concurrency cannot be pre-implemented in TBB; it's something you should do in your application with the help of TBB (or other means). For this purpose, TBB provides algorithms to express both data and task decomposition, and if these are not enough, one can implement a custom algorithm using the low-level interface to the task scheduler provided by TBB.