Intel® Distribution of OpenVINO™ Toolkit
Community assistance for the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

NCS2 Parallel Networks on one Device are Serialized?

Dominik
Beginner

Hi everyone,

I successfully created a network that I can run inference with on the NCS2. Now I want to speed up inference by running 5 networks in parallel on one NCS2, with each network using 4 inference requests.

I do this the following way:

  1. I spawn 5 threads, each of which creates an InferenceEngine::ExecutableNetwork for the MYRIAD device. This should load 5 networks onto the NCS2.
  2. In each of those threads, I spawn 4 additional threads that each create a synchronous inference request.

Everything runs without errors and the output image is correct, but I don't see any speedup. The measured computation time is almost the same (140 s ± 1 s) as when I run only one network. It looks like only one network and one inference request are doing all the work.

Does anyone have similar issues? Maybe there is something wrong with my architecture?

Best Regards

Dominik

Rizal_Intel
Moderator

Hi Dominik,

Based on your explanation, there seems to be no problem with your architecture.

The only requirement is to parallelize the workload as much as possible.

However, there is only a single Intel® Movidius™ Myriad™ X VPU on the NCS2. Therefore, inference calls are queued and executed on that single chip.
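
To illustrate the queuing described above, here is a minimal C++ sketch that models the stick as a resource that runs one inference at a time. It is a deliberately simplified model, not the MYRIAD plugin's actual behaviour: `SingleChipDevice`, `infer`, and `run_jobs` are hypothetical names, the sleep stands in for on-chip compute, and the model ignores any overlap of data transfer and compute that the real device may provide.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a single accelerator chip:
// only one inference can execute at any moment.
struct SingleChipDevice {
    std::mutex busy;

    // Blocks until the chip is free, then "computes" for `cost`.
    void infer(std::chrono::milliseconds cost) {
        std::lock_guard<std::mutex> lock(busy);
        std::this_thread::sleep_for(cost);  // assumed fixed compute time
    }
};

// Submits `total_jobs` inferences from `num_threads` host threads
// and returns the elapsed wall-clock time in milliseconds.
long long run_jobs(SingleChipDevice& dev, int num_threads, int total_jobs,
                   std::chrono::milliseconds cost) {
    assert(num_threads > 0 && total_jobs % num_threads == 0);
    const int per_thread = total_jobs / num_threads;

    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&dev, cost, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                dev.infer(cost);  // requests queue up on the mutex
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}
```

Under this model, `run_jobs(dev, 1, 20, std::chrono::milliseconds(5))` and `run_jobs(dev, 20, 20, std::chrono::milliseconds(5))` both take at least about 100 ms, because the 20 jobs are serialized no matter how many host threads submit them.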

You could check your parallel implementation against the crossroad camera demo and the action recognition demo.

To get an increase in performance, you would need multiple NCS2 sticks (or an Intel® Vision Accelerator Design). There is an example created by Victor Li that you can use as a reference for utilising multiple NCS2 sticks. It is based on the concept of MYRIAD device allocation.

Regards,

Rizal


Dominik
Beginner

Thank you for your feedback. The time difference was so small that I didn't notice it at first, but now it works fine.

BR
Dominik

Rizal_Intel
Moderator

Hi Dominik,


Do you need any additional information?


Regards,

Rizal


Rizal_Intel
Moderator

Hi Dominik,


Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question.


Regards,

Rizal

