Solved: Meaning of "request" on the NCS2

Ivanov__Matvey · ‎04-09-2020

Hi,

I have been working with the NCS2 and the OpenVINO Python API for almost a year. I have always used sync mode for measurements, because it delivers the most consistent latency results across different hosts (laptops, desktops, RPi). Now I have come across this repo:

https://github.com/decemberpei/openvino-ncs2-python-samples

I tried running the scripts and saw considerable performance gains in async mode compared to sync. The scripts use multiple NCS devices as well as multithreading and multiprocessing, which obviously should give performance benefits, but I am still unclear about what exactly an inference request is.

This video describes the async mode somewhat: https://www.youtube.com/watch?v=fzYe_E5sARA&t=6s

But I'm still lacking the understanding of the exact implementation of the request system.

Could someone please explain how it works or point to a good reference resource?

Many thanks,

Matvey

SIRIGIRI_V_Intel · ‎04-13-2020

Hi Matvey,

Please go through the Async documentation, which explains how Async mode works.
An inference request is identified by an ID and is submitted to the Inference Engine to perform the inference task. For Async mode, the number of requests is set to 2 or more. The application then swaps between the requests, so that one can be executing while another is being prepared or read back.
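
To make the swapping idea concrete, here is a minimal sketch that runs without any device attached. It does not use OpenVINO: `FakeExecutableNetwork`, `run_pipeline`, and the `latency_s` parameter are hypothetical names, and a thread that sleeps stands in for the stick. In the Inference Engine Python API the corresponding pieces are, as far as I understand it, `load_network(..., num_requests=N)`, `start_async(request_id=...)`, and `requests[i].wait()`.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class FakeExecutableNetwork:
    """Hypothetical stand-in for a loaded network with N request slots."""

    def __init__(self, num_requests, latency_s=0.02):
        self._pool = ThreadPoolExecutor(max_workers=num_requests)
        self._futures = [None] * num_requests
        self._latency_s = latency_s

    def _infer(self, frame):
        time.sleep(self._latency_s)  # pretend the device is busy
        return frame * 2             # dummy "inference result"

    def start_async(self, request_id, frame):
        # Returns immediately; the work runs in the background.
        self._futures[request_id] = self._pool.submit(self._infer, frame)

    def wait(self, request_id):
        # Blocks until the given request slot has finished.
        return self._futures[request_id].result()

    def close(self):
        self._pool.shutdown()


def run_pipeline(frames, num_requests=2, latency_s=0.02):
    """Keep up to num_requests inferences in flight; return results in input order."""
    net = FakeExecutableNetwork(num_requests, latency_s)
    free = deque(range(num_requests))  # request ids that are idle
    in_flight = deque()                # request ids in submission order
    results = []
    for frame in frames:
        if not free:
            # All slots busy: collect the oldest result, then reuse its slot.
            oldest = in_flight.popleft()
            results.append(net.wait(oldest))
            free.append(oldest)
        slot = free.popleft()
        net.start_async(slot, frame)
        in_flight.append(slot)
    while in_flight:                   # drain whatever is still running
        results.append(net.wait(in_flight.popleft()))
    net.close()
    return results


if __name__ == "__main__":
    frames = list(range(16))
    for n in (1, 2, 4):
        start = time.perf_counter()
        run_pipeline(frames, num_requests=n)
        print(f"num_requests={n}: {time.perf_counter() - start:.3f} s")
```

With `num_requests=1` the loop degenerates to sync behaviour (submit, wait, submit, wait). With two or more slots, the host can submit the next frame while earlier ones are still being processed, which is where the throughput gain comes from in this simulation.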

Refer to the object_detection_demo_ssd_async demo to implement the Async API in your application.
Let us know if this helps.

Regards,

Ram prasad

Ivanov__Matvey · ‎05-06-2020

Thank you for the reply!

I have been able to implement threaded, multi-request asynchronous inference across multiple NCS2 devices for my application.
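
For anyone landing here later, one possible arrangement (a rough sketch, not necessarily identical to what I ended up with) is one worker thread per stick, all pulling frames from a shared queue. Everything below is simulated so it runs without hardware: `fake_infer`, `run_on_devices`, and the device labels `stick-0` / `stick-1` are hypothetical placeholders, and the blocking call would be replaced by real inference on the corresponding device.

```python
import queue
import threading
import time


def fake_infer(device_name, frame, latency_s=0.01):
    """Hypothetical stand-in for a blocking inference call on one stick."""
    time.sleep(latency_s)
    return frame * frame


def worker(device_name, jobs, results, lock):
    """Pull (index, frame) jobs until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:  # sentinel: no more work for this worker
            break
        index, frame = item
        value = fake_infer(device_name, frame)
        with lock:
            results[index] = (device_name, value)


def run_on_devices(frames, device_names):
    """Spread frames over one thread per device; return results in input order."""
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(name, jobs, results, lock))
        for name in device_names
    ]
    for t in threads:
        t.start()
    for index, frame in enumerate(frames):
        jobs.put((index, frame))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return [results[i] for i in range(len(frames))]


if __name__ == "__main__":
    for device, value in run_on_devices(list(range(6)), ["stick-0", "stick-1"]):
        print(device, value)
```

The shared queue balances load automatically: a faster or less busy device simply takes more frames. Results are keyed by frame index, so output order does not depend on which device finished first.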

Kind regards,

Matvey

Meaning of "request" on the NCS2