Hi everyone,
I successfully created a network that I can run on the NCS2. Now I want to speed up inference by using 5 networks in parallel on a single NCS2, where each network should use 4 inference requests.
I do this the following way:
- I spawn 5 threads, where each creates an InferenceEngine::ExecutableNetwork for the MYRIAD device. This should load 5 networks onto the NCS2.
- In each of those threads I spawn 4 additional threads that create a synchronous inference request.
Everything runs without errors and the output image is correct, but I don't see any speedup. When I measure the computation time, it is almost the same (140 seconds ± 1 s) as when I run only one network. It looks like a single network with a single inference request is doing all the work.
Does anyone have similar issues? Or is there perhaps something wrong with my architecture?
Best Regards
Dominik
Hi Dominik,
Based on your explanation, there seems to be no problem with your architecture. The only requirement is to parallelize the workload as much as possible.
However, there is only a single Movidius Myriad X chip on the NCS2, so inference calls from all networks are queued and executed on that single chip.
You could compare your parallel implementation against the crossroad camera demo and the action recognition demo.
To get an increase in performance, you would need multiple NCS2 sticks (or an Intel® Vision Accelerator Design). As a reference, there is an example created by Victor Li for utilising multiple NCS2 sticks; it is based on the concept of MYRIAD device allocation.
Regards,
Rizal
Thank you for your feedback. The time difference was so small that I didn't notice it at first, but now it works fine.
BR
Dominik
Hi Dominik,
Do you need any other additional information?
Regards,
Rizal
Hi Dominik,
Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question.
Regards,
Rizal
