I'm developing real-time audio processing software. There may be many processors (even 100, for example) active at any moment, arranged in several parallel chains. I cannot make the processors cooperate and must assume any possible processing sequence. Each of them receives a block of data, usually 256-1024 values, and needs to process it as quickly as possible so that the results can be passed to the next item in the chain. If the data is not delivered in time, bad things happen... But in many cases only a few processors are used, and then the goal is to keep overall CPU usage minimal. The algorithms in each processor vary a lot, so it is hard to predict anything.
The "host" for all these processors is unknown and usually implements some kind of parallelization as well. However, in my huge test project the host was reporting "near trouble" CPU usage while the system task manager reported only about 14% CPU usage on my 8-core Xeon E5, so evidently there is a lot of spare processing power.
As far as I know, these are the choices:
1) TBB - this one looks harder to use.
3) OpenMP - I actually tested this one with MSVC, and sadly its worker threads seemed to wait actively (spinning), which means that the CPU was at 100% despite a pretty small improvement in performance.
I'd prefer if the solution could be linked statically. All of the processor implementations will be present in a single DLL (Windows) / dylib (OSX).
What about the TBB flow graph and Flow Graph Designer?
I'm not sure why you found TBB hard to use. For simple loops it is not harder than simple loops in OpenMP or Cilk Plus.
For example, see http://www.parallel-school.ru/wp-content/uploads/2014/09/Intel_TBB.pdf. There are 100 slides of advantages :)
Basically, you can start with simple loops and then improve the efficiency of your code, provided it is C++. If you use C, then OpenMP or Cilk Plus is better.
Some summary (Slide 93):
Use OpenMP if...
• Code is C, Fortran, (or C++ that looks like C)
• Parallelism is primarily for bounded loops over built-in types
• Minimal syntactic changes are desired
Use Intel® Threading Building Blocks if...
• Must use a compiler without OpenMP support
• Have highly object-oriented or templated C++ code
• Need concurrent data structures
• Need to go beyond loop-based parallelism
• Make heavy use of C++ user-defined types
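To make the "bounded loops over built-in types" and "minimal syntactic changes" points concrete, here is a minimal sketch of an OpenMP loop over one audio block. The function name `apply_gain` and the gain operation are hypothetical and only for illustration; they are not taken from the slides or from the question.

```cpp
#include <vector>

// Hypothetical example: scale one block of audio samples in place.
// The pragma takes effect only when OpenMP is enabled (/openmp on MSVC,
// -fopenmp on GCC/Clang); otherwise it is ignored and the loop runs serially.
void apply_gain(std::vector<float>& block, float gain)
{
    // A signed loop index keeps the loop valid for OpenMP 2.0 compilers.
    const int n = static_cast<int>(block.size());

    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        block[i] *= gain;  // iterations are independent, so they can be split across threads
    }
}
```

This shows the syntax only. For blocks of 256-1024 values, the cost of waking and synchronizing threads may well exceed the work in the loop, so whether it helps has to be measured.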
Threading runtime differences (Slide 92):