I'm using the flow::graph API to build a pipeline that processes live video. A fixed part of this pipeline is built at application startup, schematically as follows:
camera_source_node => limiter_node(6) => .... => display_node => recycle_node => limiter_node.decrement
The part shown as dots is made up of different graph segments that I switch at run time (after stopping the graph, of course).
Now, consider the following segment:
camera_source_node => limiter_node(6) => light_node => very_busy_node => display_node => recycle_node => limiter_node.decrement
I'm using 6 as the maximal number of messages / video_frames allowed within the graph, as a compromise between various graph segment needs, but in the example above, most of those messages are accumulated in the input queue of the very_busy_node, which is the bottleneck.
If I halved the threshold of limiter_node from 6 to 3, the time a video_frame takes to get from camera to display would be halved too, because each message waiting in the input queue of very_busy_node contributes to this latency. But other graph segments would perform worse with such a small number of allowed messages.
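To make the "halved" claim concrete, here is a minimal sketch based on Little's law (items in system = throughput × time in system). It assumes the limiter is saturated and that one node sets the throughput. The function name and the 50 ms per-frame figure are hypothetical, and the result is the time from limiter_node to the decrement, which is slightly more than camera to display.

```cpp
#include <cassert>

// Little's law for a saturated pipeline: L = lambda * W, so W = L / lambda.
// frames_in_flight:             the limiter_node threshold
// bottleneck_seconds_per_frame: effective time per frame of the slowest node
//                               (its throughput is 1 / this value)
// Returns the approximate time a frame spends between the limiter and the
// decrement signal.
double time_in_graph_seconds(int frames_in_flight,
                             double bottleneck_seconds_per_frame) {
    assert(frames_in_flight > 0);
    assert(bottleneck_seconds_per_frame > 0.0);
    return frames_in_flight * bottleneck_seconds_per_frame;
}
```

With a hypothetical bottleneck of 50 ms per frame, a threshold of 6 gives about 0.3 s in the graph and a threshold of 3 gives about 0.15 s.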
What I want to do is to somehow move the extra messages from the input queue of very_busy_node to the input queue of recycle_node, which is outside the path between camera and display. For that to happen, recycle_node should delay messages a little more than very_busy_node.
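As a rough sketch of "a little more", the hold time for recycle_node could be derived from the bottleneck's effective per-frame time plus a small margin. Everything here is an assumption for illustration: the function name, the 5% default margin, and the premise that recycle_node runs with concurrency 1.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using Seconds = std::chrono::duration<double>;

// Hypothetical helper, not part of TBB.
// bottleneck_per_frame:   effective time per frame of the slowest node
// recycle_work_per_frame: time recycle_node already spends on its own work
// margin:                 fraction by which recycle_node should be slower
// Returns how long recycle_node should additionally hold each frame.
Seconds recycle_hold_time(Seconds bottleneck_per_frame,
                          Seconds recycle_work_per_frame,
                          double margin = 0.05) {
    assert(margin >= 0.0);
    const Seconds target = bottleneck_per_frame * (1.0 + margin);
    // Never return a negative delay if recycle_node is already slow enough.
    return std::max(target - recycle_work_per_frame, Seconds::zero());
}
```

Note the trade-off: once recycle_node is the slowest stage it also sets the throughput, so the frame rate drops by roughly the chosen margin.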
The problem is that I don't know which node is the bottleneck in the various graph segments, so I have to measure the average time each node takes to process a message. Some nodes have a concurrency higher than 1, which complicates things: they may report the largest per-message time and still not be the bottleneck, thanks to parallelism. Dividing their time by their concurrency doesn't give the right answer either, because if I've set their concurrency to 4, there's no guarantee that 4 messages are actually being processed simultaneously.
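One possible way around the concurrency problem is to measure, per node, the wall-clock time during which at least one invocation is running, and divide it by the number of completed messages. The sketch below is a hypothetical helper, not part of TBB. It assumes enter() and leave() are called at the start and end of each node body, and that the result is read while the node is idle (for example after the graph is stopped).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>

// Hypothetical helper, not part of TBB: tracks how long a node is busy
// (one or more bodies running) and how many messages it has completed.
class BusyTracker {
public:
    using Clock = std::chrono::steady_clock;

    // Call at the start of the node body.
    void enter(Clock::time_point now = Clock::now()) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (active_ == 0) busy_since_ = now;  // node goes from idle to busy
        ++active_;
    }

    // Call at the end of the node body.
    void leave(Clock::time_point now = Clock::now()) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(active_ > 0);
        --active_;
        ++completed_;
        if (active_ == 0) busy_ += now - busy_since_;  // node goes idle
    }

    // Busy wall-clock seconds per completed message; 0 if none completed.
    // Overlapping invocations are counted once, so parallelism that is
    // actually achieved lowers this value.
    double seconds_per_message() const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (completed_ == 0) return 0.0;
        return std::chrono::duration<double>(busy_).count() / completed_;
    }

private:
    mutable std::mutex mutex_;
    std::size_t active_ = 0;
    std::size_t completed_ = 0;
    Clock::time_point busy_since_{};
    Clock::duration busy_{};
};
```

Under these assumptions, the node with the largest seconds_per_message() is the likely bottleneck. Time a node spends starved for input is not counted, and the concurrency that was actually achieved is reflected without dividing by the configured limit.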
This is my first time using TBB and I don't know all the tools the API provides, so my question to people who understand this platform better than I do is: are there any tools available to tackle this problem?
Regarding your question about investigating performance and bottlenecks, you could start with Flow Graph Analyzer, which lets you capture performance data from running Intel® TBB Flow Graph applications.
Moreover, you could look at this usage scenario, which seems similar to yours.