Hi everyone,
I successfully created a network that I can run inference on with the NCS2. Now I want to speed up inference by running 5 networks in parallel on one NCS2, with each network using 4 inference requests.
I do this in the following way:
- I spawn 5 threads, each of which creates an InferenceEngine::ExecutableNetwork for the MYRIAD device. This should load 5 networks onto the NCS2.
- In each of those threads I spawn 4 additional threads that create a synchronous inference request.
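The thread layout described in the two steps above can be sketched without the Inference Engine itself. This is only an illustration of the structure, not the original code: `run_layout` and `InferFn` are hypothetical names, and the callback stands in for one synchronous inference call on a request created from an ExecutableNetwork.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one synchronous inference call. In the real
// program this would run a request created from an ExecutableNetwork.
using InferFn = std::function<void(int network_id, int request_id)>;

// Spawns `num_networks` outer threads; each one spawns
// `requests_per_network` inner threads that call `infer` once.
// Returns the total number of completed calls.
int run_layout(int num_networks, int requests_per_network, const InferFn& infer) {
    assert(num_networks > 0 && requests_per_network > 0);
    std::atomic<int> calls{0};
    std::vector<std::thread> networks;
    for (int n = 0; n < num_networks; ++n) {
        networks.emplace_back([&, n] {
            // The real code would create the ExecutableNetwork here.
            std::vector<std::thread> requests;
            for (int r = 0; r < requests_per_network; ++r) {
                requests.emplace_back([&, n, r] {
                    infer(n, r);
                    ++calls;
                });
            }
            for (auto& t : requests) t.join();  // wait for this network's requests
        });
    }
    for (auto& t : networks) t.join();  // wait for all networks
    return calls.load();
}
```

Calling `run_layout(5, 4, ...)` gives the 5 x 4 = 20 concurrent host-side callers from the description. Whether they actually run concurrently on the stick is a separate question that the host-side layout alone does not answer.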
Everything runs without errors and the output image is correct, but I don't see any speedup. The measured computation time is almost the same (140 s ± 1 s) as when I run only one network. It looks like a single network with a single inference request is doing all the work.
Does anyone have similar issues? Maybe there is something wrong with my architecture?
Best Regards
Dominik
Hi Dominik,
Based on your explanation, there seems to be no problem with your architecture.
The only requirement is to parallelize the workload as much as possible.
However, there is only a single Movidius Myriad X chip on the NCS2, so inference calls are queued and executed on that one chip.
You could check your parallel implementation against the crossroad camera demo and the action recognition demo.
To get an increase in performance, you would need multiple NCS2 sticks (or an Intel® Vision Accelerator Design). There is an example created by Victor Li that you can use as a reference for utilising multiple NCS2 sticks. It is based on the concept of MYRIAD device allocation.
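As a rough illustration of the reasoning above, here is a toy model, not a measurement. It assumes each device executes one inference at a time, every inference costs the same, and jobs are handed out round-robin. `makespan_ms` and its parameters are hypothetical names, and real hardware may overlap data transfer with compute, so measured numbers can differ.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified model (an assumption, not measured behaviour): each device runs
// one inference at a time and every inference costs `cost_ms`.
// Jobs are assigned round-robin; returns the time at which the last job ends.
long makespan_ms(int num_jobs, long cost_ms, int num_devices) {
    assert(num_jobs >= 0 && cost_ms >= 0 && num_devices > 0);
    std::vector<long> busy_until(num_devices, 0);
    for (int j = 0; j < num_jobs; ++j) {
        // Each job queues behind the jobs already assigned to its device.
        busy_until[j % num_devices] += cost_ms;
    }
    return *std::max_element(busy_until.begin(), busy_until.end());
}
```

The number of host threads does not appear in the model at all. With one device the total is `num_jobs * cost_ms` however the requests are issued, and only adding devices shortens it. For example, `makespan_ms(20, 7, 1)` is 140 while `makespan_ms(20, 7, 2)` is 70.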
Regards,
Rizal
Thank you for your feedback. The time difference was so small that I didn't notice it at first, but now it works fine.
BR
Dominik
Hi Dominik,
Do you need any additional information?
Regards,
Rizal
Hi Dominik,
Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question.
Regards,
Rizal