Dominik
Beginner

NCS2 Parallel Networks on one Device are Serialized?


Hi everyone,

I successfully created a network that I can infer on the NCS2. Now I want to speed up the inference by using 5 networks in parallel on 1 NCS2, where each network should use 4 inference requests.

I do this the following way:

  1. I spawn 5 threads where each creates an InferenceEngine::ExecutableNetwork for the MYRIAD device. This should load 5 networks onto the NCS2.
  2. In each of those threads I spawn 4 additional threads that each create a synchronous inference request (see the sketch after this list).
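
In case it helps, here is a stripped-down sketch of what I do. The model path, blob handling and error handling are just placeholders, not my actual code:

#include <inference_engine.hpp>
#include <thread>
#include <vector>

int main() {
    InferenceEngine::Core core;
    // Placeholder model path.
    auto network = core.ReadNetwork("model.xml");

    std::vector<std::thread> netThreads;
    for (int n = 0; n < 5; ++n) {
        netThreads.emplace_back([&core, &network]() {
            // Step 1: one ExecutableNetwork per thread, all targeting the same stick.
            InferenceEngine::ExecutableNetwork exec =
                core.LoadNetwork(network, "MYRIAD");

            // Step 2: four threads per network, each with its own synchronous request.
            std::vector<std::thread> reqThreads;
            for (int r = 0; r < 4; ++r) {
                reqThreads.emplace_back([&exec]() {
                    InferenceEngine::InferRequest request = exec.CreateInferRequest();
                    // ... fill input blobs here ...
                    request.Infer();  // blocking (synchronous) call
                    // ... read output blobs here ...
                });
            }
            for (auto &t : reqThreads) t.join();
        });
    }
    for (auto &t : netThreads) t.join();
    return 0;
}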

Everything runs without errors and the output image is also correct, but I don't see any speedup. When I measure the computation time it is almost the same (140 seconds ± 1 s) as when I run only one network. It looks like only one network and one inference request is doing all the work.

Does anyone have similar issues? Or is there maybe something wrong with my architecture?

Best Regards

Dominik


4 Replies
Rizal_Intel
Moderator

Hi Dominik,

 

There seems to be no problem with your architecture based on your explanation.

The only requirement is to parallelize the workload as much as possible.

 

Actually, there is only a single Movidius Myriad X chip on the NCS2, so inference calls are queued and executed on that single chip.

 

You could compare your parallel implementation with the crossroad camera demo and the action recognition demo.

 

To get an increase in performance you would need multiple NCS2 sticks (or an Intel® Vision Accelerator Design). There is an example created by Victor Li that you can use as a reference for utilising multiple NCS2 sticks; it is based on the concept of MYRIAD device allocation.
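
As a rough illustration of that device-allocation concept (this is only a sketch, not Victor Li's example; the model path is a placeholder), each plugged-in stick is exposed as its own MYRIAD device that a network can be loaded onto explicitly:

#include <inference_engine.hpp>
#include <iostream>
#include <string>
#include <vector>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");  // placeholder model

    // Each plugged-in stick shows up as its own device, e.g. "MYRIAD.1.2-ma2480".
    std::vector<std::string> myriads;
    for (const auto &dev : core.GetAvailableDevices()) {
        if (dev.find("MYRIAD") == 0) myriads.push_back(dev);
    }

    // Load one ExecutableNetwork per physical stick so the work is spread
    // across chips instead of being queued on a single one.
    std::vector<InferenceEngine::ExecutableNetwork> execs;
    for (const auto &dev : myriads) {
        std::cout << "Loading network on " << dev << std::endl;
        execs.push_back(core.LoadNetwork(network, dev));
    }

    // Each ExecutableNetwork can then serve its own inference requests.
    for (auto &exec : execs) {
        auto request = exec.CreateInferRequest();
        // ... fill inputs, request.Infer(), read outputs ...
    }
    return 0;
}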

 

Regards,

Rizal



Dominik
Beginner

Thank you for your feedback. The time difference was so small that I didn't notice it at first, but now it works fine.

BR
Dominik

Rizal_Intel
Moderator

Hi Dominik,


Do you need any additional information?


Regards,

Rizal


Rizal_Intel
Moderator

Hi Dominik,


Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question.


Regards,

Rizal

