Intel® oneAPI Threading Building Blocks

Intel TBB flow graph speedup


I am trying to benchmark the performance of the Intel TBB flow graph. The setup is:

- One broadcast node (broadcast_node<continue_msg>) sends a continue_msg to N successor nodes.

- Each successor node performs a computation that takes t seconds.

- The total computation time when performed serially is Tserial = N * t

- The ideal computation time if all cores are used is Tpar(ideal) = N * t / C, where C is the number of cores.

- The speedup is defined as Tserial / Tpar(actual)

- I tested the code with GCC 5 on a 16-core PC.

Here are the results, showing the speedup as a function of the processing time t of an individual task (i.e. the node body):

t = 100 microseconds, speedup = 14

t = 10 microseconds, speedup = 7

t = 1 microsecond, speedup = 1
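
One rough way to read these numbers is to convert each speedup into a parallel efficiency and an implied per-task overhead. The sketch below assumes a simple cost model that is not taken from TBB documentation: Tpar(actual) = N * t / C + N * s, where s is a fixed cost per task that does not parallelize. The function names and the model itself are hypothetical and serve only to illustrate the arithmetic.

```cpp
#include <cassert>

// Parallel efficiency: the fraction of the ideal C-fold speedup
// that was actually achieved.
double efficiency(double speedup, int cores) {
    assert(speedup > 0.0 && cores > 0);
    return speedup / cores;
}

// Assumed cost model (hypothetical, for illustration only):
//   Tpar = N * t / C + N * s
// With speedup = Tserial / Tpar = (N * t) / Tpar, solving for s gives
//   s = t / speedup - t / C
// Inputs and result are in microseconds; N cancels out.
double per_task_overhead_us(double t_us, double speedup, int cores) {
    assert(t_us > 0.0 && speedup > 0.0 && cores > 0);
    return t_us / speedup - t_us / cores;
}
```

Called with C = 16 and the three measurements above, efficiency comes out at 0.875, 0.4375 and 0.0625, and per_task_overhead_us comes out at roughly 0.89, 0.80 and 0.94 microseconds. Under this assumed model the three data points are consistent with a serialized cost of a little under 1 microsecond per task, which would explain why the speedup vanishes once t itself approaches 1 microsecond.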

As can be seen, the speedup drops to 1 at t = 1 microsecond, so for lightweight tasks (whose computation takes less than 1 microsecond) the parallel code is actually slower than the serial code. Here are my questions:

  1. Are these results in line with Intel TBB benchmarks?

  2. Is there a better paradigm than the flow graph for the case where there are thousands of tasks, each taking less than 1 microsecond?
