Hi,
I have been working with the NCS2 and the OpenVINO Python API for almost a year. I have always used sync mode for measurements, because it delivers the most consistent latency results across different hosts (laptops, desktops, RPi). Now I have come across this repo:
https://github.com/decemberpei/openvino-ncs2-python-samples
I tried running the scripts and saw considerable performance gains from async mode compared to sync. The scripts use multiple NCS sticks, as well as multithreading and multiprocessing, which obviously should give performance benefits, but I am still unclear about the exact meaning of an inference request.
This video describes the async mode somewhat: https://www.youtube.com/watch?v=fzYe_E5sARA&t=6s
But I still lack an understanding of how the request system is actually implemented.
Could someone please explain how it works, or point me to a good reference resource?
Many thanks,
Matvey
Hi Matvey,
Could you please go through the Async documentation, which explains how Async mode works?
An inference request is identified by an id and is submitted to the Inference Engine to run the network once. For Async, the number of requests is set to 2 or more, and the requests are swapped and reused by the application.
Refer to the object_detection_demo_ssd_async demo to see how to implement the Async API in your application.
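To make the "swapped and reused" part concrete, here is a minimal sketch of the two-request ping-pong pattern (not from the original thread). The device is replaced by a stub so the scheduling logic is runnable anywhere; the comments note where the real `openvino.inference_engine` calls (`exec_net.start_async(...)`, `request.wait(-1)`) would go.

```python
from collections import deque

class StubRequest:
    """Stand-in for an OpenVINO InferRequest (exec_net.requests[i]).
    A real request would submit work to the device asynchronously."""
    def __init__(self, rid):
        self.rid = rid
        self.pending = None

    def start_async(self, frame):
        # real code: exec_net.start_async(request_id=self.rid, inputs={...})
        self.pending = frame * 2  # pretend "inference" result

    def wait(self):
        # real code: exec_net.requests[self.rid].wait(-1)
        result, self.pending = self.pending, None
        return result

def run_pipeline(frames, num_requests=2):
    """While one request runs on the device, the host starts the next
    frame on a free request, then swaps: the oldest in-flight request
    is waited on only when no free request remains."""
    requests = [StubRequest(i) for i in range(num_requests)]
    free, in_flight, results = deque(requests), deque(), []
    for frame in frames:
        if not free:                       # all requests busy
            done = in_flight.popleft()     # collect the oldest result
            results.append(done.wait())
            free.append(done)              # request is reused, not recreated
        req = free.popleft()
        req.start_async(frame)
        in_flight.append(req)
    while in_flight:                       # drain remaining requests
        results.append(in_flight.popleft().wait())
    return results
```

With `num_requests=2` the host is never idle waiting for the device while it still has a free request, which is where the async speed-up over sync mode comes from.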
Let us know if this helps.
Regards,
Ram prasad
Ram prasad (Intel) wrote:
Hi Matvey,
Could you please go through the Async documentation, which explains how Async mode works?
An inference request is identified by an id and is submitted to the Inference Engine to run the network once. For Async, the number of requests is set to 2 or more, and the requests are swapped and reused by the application. Refer to the object_detection_demo_ssd_async demo to see how to implement the Async API in your application.
Let us know if this helps.
Regards,
Ram prasad
Thank you for the reply!
I have been able to implement threaded, multi-request asynchronous inference across multiple NCS2 sticks for my application.
Kind regards,
Matvey
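For readers landing here later, the multi-stick threading pattern Matvey describes can be sketched roughly as follows (this is not his code). One worker thread is created per stick, and all workers pull frames from a shared queue, so a faster stick naturally takes more work. The inference call is a stub; in real code each worker would hold its own `ExecutableNetwork` loaded onto a specific device (the `"MYRIAD.x"` names here are illustrative placeholders).

```python
import queue
import threading

def stub_infer(device_name, frame):
    # real code: each worker owns its own executable network, e.g.
    #   exec_net = ie.load_network(net, device_name=device_name, num_requests=2)
    # and runs the ping-pong async pattern on it
    return (device_name, frame * 2)

def run_multi_device(frames, devices):
    """One worker thread per stick; a shared queue balances the load."""
    jobs = queue.Queue()
    for f in frames:
        jobs.put(f)
    results, lock = [], threading.Lock()

    def worker(dev):
        while True:
            try:
                frame = jobs.get_nowait()
            except queue.Empty:
                return                      # queue drained: worker exits
            out = stub_infer(dev, frame)
            with lock:                      # results list is shared
                results.append(out)

    threads = [threading.Thread(target=worker, args=(d,)) for d in devices]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Note that results arrive in completion order, not submission order; an application that needs ordered output would tag each frame with an index and reorder at the end.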