Intel® oneAPI Threading Building Blocks

Flow in the Flow graph

Bouvier__Alexandre

Hello everyone,

I'm really new to TBB Flow Graph and I'm facing an issue in a real-time detection application.

I use several sources (8 custom threads that read pixels from cameras) that send the data with try_put to a broadcast_node. The broadcast_node sends to a function_node for preprocessing (with unlimited concurrency), and the preprocessing function sends the result to a queue_node. Finally, the queue is connected to the detection block, which is another function_node. Because the computation (on the GPU) is expensive, I need to restrict the number of detectors to 4, so I set the maximum concurrency to 4.

These detectors are slow (15 fps), but the final frame rate has to match the frame rate of the cameras. So when a detector is busy, I simply skip the detector's function_node and send the data to another buffer (the previous queue_node is connected to it). Finally, I reorder the frames with a sequencer_node.

In this configuration I get weird behavior: only 4 frames pass through the detector block and the rest are skipped. I decided to replace the detector function_node with a pair of limiter_node and multifunction_node (output 0 for the data and output 1 for the continue_msg) but got the same result.

 

It would be very helpful if someone could give me some clues.

Thank you

Alex

Aleksei_F_Intel
Employee

Hi Alex,

From your description, it seems that you can remove the broadcast_node at the beginning and call try_put() directly on the unlimited function_node that does the preprocessing. Then I suggest replacing the single detection node whose max_concurrency equals four with four function nodes with unlimited max_concurrency, and deciding which of them to try_put the data from the queue_node to, e.g. in round-robin fashion. You might want to use a multifunction_node for this decider logic. Perhaps it even makes sense to combine this logic with the frame-skipping logic in that single node.

If that does not help, then I would remove the queue_node in the middle and send the output of the preprocessing step directly to the detection step.

One more thing to try after all that is to use the lightweight policy on all four detection nodes with unlimited concurrency.

However, looking at your description from a high level, it makes sense to me to check the frame-skipping logic. It sounds like the logic simply decides to skip all the frames after the fourth one. Could you also please share a few more details on how the frame-skipping logic is attached to the graph?
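Since the skipping code was not shared, the following is only a guess at one way this symptom can arise: a "busy detectors" counter that is taken for each frame but never given back, so exactly four frames get through. The sketch uses plain C++ with hypothetical names (DetectorSlots, SlotGuard, detect_or_skip) and releases the slot automatically when detection finishes.

```cpp
#include <atomic>

// Hypothetical helper: counts how many detector slots are still free.
class DetectorSlots {
public:
    explicit DetectorSlots(int capacity) : free_(capacity) {}

    // Reserves a slot; returns false when every detector is busy.
    bool try_acquire() {
        int cur = free_.load();
        while (cur > 0) {
            // On failure cur is refreshed with the current value.
            if (free_.compare_exchange_weak(cur, cur - 1)) return true;
        }
        return false;
    }

    void release() { free_.fetch_add(1); }
    int available() const { return free_.load(); }

private:
    std::atomic<int> free_;
};

// RAII guard: gives the slot back when it goes out of scope.
class SlotGuard {
public:
    explicit SlotGuard(DetectorSlots& s) : slots_(s), held_(s.try_acquire()) {}
    ~SlotGuard() { if (held_) slots_.release(); }
    SlotGuard(const SlotGuard&) = delete;
    SlotGuard& operator=(const SlotGuard&) = delete;
    bool held() const { return held_; }

private:
    DetectorSlots& slots_;
    bool held_;
};

// Runs detect() if a slot is free and returns true; returns false when
// the frame should be skipped and forwarded unchanged by the caller.
template <class Detect>
bool detect_or_skip(DetectorSlots& slots, Detect&& detect) {
    SlotGuard guard(slots);
    if (!guard.held()) return false;
    detect();
    return true;  // guard releases the slot on return
}
```

If try_acquire() is called without a matching release(), a pool of 4 slots lets exactly 4 frames through and skips everything after them, which matches the behavior described.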

Regards,

Aleksei
