Hyodo__Katsuya
Innovator
175 Views

NCS2 x4 + MultiProcess + Core i7 + YoloV3, Boosted about 13 FPS (A little slow)

Hello, everyone. I tried implementing NCS2 + MultiProcess + YoloV3, and I hope it is helpful to everyone. YoloV3 (asynchronous), NCS2 x4 ---> 4 FPS
- Github: https://github.com/PINTO0309/OpenVINO-YoloV3.git (openvino_yolov3_MultiStick_test.py)
- Youtube: https://youtu.be/3mKkPXpIc_U
14 Replies
Yuanyuan_L_Intel
Employee

Hi, Hyodo, Katsuya

I took a look at your multiple-stick sample, openvino_yolov3_MultiStick_test.py. The app does not utilize all of the NCS2 sticks. It uses the async API with multiple infer requests, which is good, but those infer requests are not scheduled across multiple sticks; the performance gain from multiple infer requests comes from hiding the data-transfer cost. Only one ExecutableNetwork instance is created, so only one NCS2 device is used. You can confirm this by monitoring the FPS with only one NCS2 plugged in. If you want to make use of multiple NCS2 devices, you need to create multiple ExecutableNetwork instances.
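The fix described here (one ExecutableNetwork per stick, with infer requests rotated across them) amounts to round-robin dispatch. A minimal sketch of that scheduling idea in plain Python, where `FakeExecutableNetwork` is a hypothetical stand-in for the object OpenVINO returns per device (the real API calls are not shown, since they require the hardware):

```python
from itertools import cycle

class FakeExecutableNetwork:
    """Hypothetical stand-in for one loaded network, i.e. one NCS2 stick."""
    def __init__(self, device_id):
        self.device_id = device_id
        self.handled = 0  # how many frames this stick has been given

    def start_async(self, frame):
        self.handled += 1
        return (self.device_id, frame)

def make_dispatcher(num_sticks):
    # One ExecutableNetwork per physical stick. Creating only one instance
    # would pin every infer request to a single device, which was the
    # original bug in the sample.
    nets = [FakeExecutableNetwork(i) for i in range(num_sticks)]
    rr = cycle(nets)

    def dispatch(frame):
        # Round-robin: each new frame goes to the next stick in turn.
        return next(rr).start_async(frame)

    return nets, dispatch

nets, dispatch = make_dispatcher(4)
results = [dispatch(f) for f in range(8)]
# with 4 sticks and 8 frames, each stick handles 2 frames
```

With real OpenVINO the `FakeExecutableNetwork` objects would be replaced by the networks loaded onto each MYRIAD device; only the dispatch pattern is the point here.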

 

Hyodo__Katsuya
Innovator

@Yuanyuan L. (Intel) Thank you for always giving me precise advice. It seems I made a big mistake. I will immediately create multiple ExecutableNetwork instances and rework the code so they share a Queue.
Hyodo__Katsuya
Innovator

@Yuanyuan L. (Intel) Thanks to you, I got about 4x better performance. However, the timing of the displayed inference results is uneven, so it does not look smooth. I will try to tune it a little more.
Hyodo__Katsuya
Innovator

Hello. The following performance was the limit of what I could achieve... Full-size YoloV3, NCS2 x4, boosted to about 13 FPS
- Github: https://github.com/PINTO0309/OpenVINO-YoloV3/blob/master/openvino_yolov3_MultiStick_test.py
- Youtube: https://youtu.be/AT75LBIOAck
RTasa
New Contributor I

I have a question for you. Are you trying to get an inference for every video frame coming in at 25 or 30 FPS? What would happen if you evaluated 1/2 or 1/3 of the frames? In the application you could display every frame but only run inference on every other or every third frame coming in. Would playback then run at full speed?
Hyodo__Katsuya
Innovator

@Bob T.
>Are you trying to get an inference for every video frame coming in at 25 or 30fps?
No. However, the first video I posted was doing exactly that.
>What would happen if you evaluated 1/2 the frames or 1/3.
The last post skips a certain number of frames between inferences.
>In the application you could display every frame but only get the inference for every other or every third frame coming in.
I have improved it in exactly that way. The dance called Capoeira involves very slow movement, so the movie appears to play slowly, but it is at normal speed. Video playback and inference run asynchronously, and the playback situation is as follows:
Movie playback = every frame (30 FPS)
Inference = a subset of frames (13 FPS)
Inference time per frame = about 600 ms - 800 ms
RTasa
New Contributor I

Inference time per frame = about 600 ms - 800 ms? Something doesn't seem right. That is less than 2 FPS per stick; even with 4 sticks, that would be less than 8 FPS of total inference. Hey Intel, why is inference so slow on these Movidius hardware devices?
Hyodo__Katsuya
Innovator

@Bob T.
>That is less than 2 fps per stick. Even with 4 sticks that would be less than 8 fps of total inference.
Your idea is correct. To compensate for the inference latency, I pushed parallel inference to the limit: 4 sticks (4 threads) x 4 requests = 16 parallel inferences. Python multithreading is difficult to make fully asynchronous because of the Global Interpreter Lock (GIL), so performance does not simply scale 16x; the thread-switching overhead is serious. Ideally I would implement it entirely with multiprocessing, but the OpenVINO API does not seem to support multiprocessing.
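The 4-stick x 4-request layout above can be sketched as a pool of worker threads pulling frames from a shared queue. This is a simplified sketch with a placeholder in place of the actual infer call; the GIL limits speed-up for Python-level work, but device I/O releases it while waiting, which is why threading still helps here:

```python
import queue
import threading

def run_workers(frames, num_sticks=4, requests_per_stick=4):
    """One thread per (stick, request) slot -- 16 slots total in the
    configuration discussed above -- draining a shared frame queue."""
    tasks = queue.Queue()
    for f in frames:
        tasks.put(f)

    results, lock = [], threading.Lock()

    def worker(slot_id):
        while True:
            try:
                f = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained, slot exits
            # Placeholder for an asynchronous infer request on one stick.
            with lock:
                results.append((slot_id, f))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_sticks * requests_per_stick)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

res = run_workers(range(32))
# all 32 frames are processed exactly once across the 16 worker slots
```

With `multiprocessing` instead of `threading` the GIL would not be an issue, but as noted in the post, sharing OpenVINO device handles across processes is the obstacle.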
RTasa
New Contributor I

This is where C++ works so much better: multithreading, messaging, and controlling input and output queues. If I can get OpenVINO installed on my Atom boards, I will see what they can do. So far all I get is an error.
Hyodo__Katsuya
Innovator

@Bob T. Regrettably, I can hardly write C++ programs...
>So far all I get is an error.
What kind of error is displayed, and at what point?
RTasa
New Contributor I

It happens during the check to see whether everything is installed. It looks for http://packages.ros.org/ubuntu xenial, I think, and it fails with a 404 looking for amd64. I am not at the machine and have not looked closely.
Peniak__Martin
Beginner

It does seem too slow. I am getting around 40 FPS (MobileNet-SSD) on a mini-PCIe Myriad X card plugged into an Up board:

https://www.timeless.ninja/blog/the-world-s-first-ai-edge-camera-powered-by-up-squared-and-three-int...

and around 20 FPS (MobileNet-SSD) on an RPi with NCS2... I run two models, and one is slower, so the total FPS right now running both models in parallel is around 12 FPS, I think... that's two models on two sticks. If I update the slower model, the speed should be near 20 FPS for both models, but right now the inference results are delayed by the slower model.

https://timeless.ninja/blog/the-world-s-first-ai-edge-camera-powered-by-two-intel-myriad-x-vpus

 

Hope this helps

RTasa
New Contributor I

The Pi is extremely bottlenecked unless you have a fix for that. Using the Movidius SDK with 2 NCS sticks delivers maybe 8 to 12 FPS. I am not sure how you are getting 20 on a Pi. On the UpBoard (not Up2) with a single NCS2 stick, I am getting better-than-Pi performance. I will post later.