New Contributor I

Raspberry + NCS2 : performance comparison



I've tested the configurations below with a MobileNet-SSD object detection model and got the following results:

    # OpenCV 4 (Caffe model)
    net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

    # OpenCV-OpenVINO (IR); for the NCS2 run the target is additionally set to MYRIAD
    net = cv2.dnn.readNet(args["xml"], args["bin"])

║                         ║ OpenCV 4 ║ OpenCV-OpenVINO ║ OpenCV-OpenVINO  ║
║                         ║          ║    (IR FP32)    ║ + NCS2 (IR FP16) ║
║ Ubuntu 18 on VirtualBox ║  11 FPS  ║      26 FPS     ║         ?        ║
║ Raspberry Pi 3 B+       ║  0.6 FPS ║        ?        ║       8 FPS      ║
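For reference, this is roughly how the three configurations are selected in code. This is a minimal sketch: `build_net` is my own placeholder name, the paths are placeholders, and I'm assuming the standard cv2.dnn backend/target constants.

```python
def build_net(config, prototxt=None, caffemodel=None, xml=None, bin_path=None):
    """Build a cv2.dnn net for one of the three table columns."""
    if config not in ("opencv", "openvino-cpu", "openvino-myriad"):
        raise ValueError(f"unknown config: {config}")
    import cv2  # imported here so the sketch can be read without OpenCV installed
    if config == "opencv":
        # Stock OpenCV 4: Caffe model, default CPU backend
        net = cv2.dnn.readNetFromCaffe(prototxt, caffemodel)
    else:
        # OpenCV built against OpenVINO: load the Model Optimizer IR
        net = cv2.dnn.readNet(xml, bin_path)
        net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
        if config == "openvino-myriad":
            # Offload to the NCS2 (the IR should be FP16 for MYRIAD)
            net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)
    return net
```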

According to the figures reported on the official NCS2 homepage, I expected better performance from the NCS2, but other people have reported numbers similar to mine. I have the following questions:

Q.1) Is it possible that the communication between the Raspberry Pi and the NCS2 is the bottleneck of the system? And if I move to a board with a USB 3 port, will it get better?

Q.2) My NCS2 is properly detected by VirtualBox and I can run the demo from the Get Started page, but when running programs in Python, I get the error below:

E: [xLink] [    782564] dispatcherEventSend:908	Write failed event -1
E: [xLink] [    794413] dispatcherEventReceive:308	dispatcherEventReceive() Read failed -1 | event 0x7fd96affce80 
E: [xLink] [    794413] eventReader:256	eventReader stopped
E: [ncAPI] [    794413] ncGraphAllocate:1409	Can't read input tensor descriptors of the graph, rc: X_LINK_ERROR

Q.3) On Ubuntu, I could run FP32 models on the CPU target, but running the same program on the Raspberry Pi generates "failed to initialize Inference Engine backend: Cannot find plugin to use".


22 Replies
New Contributor I

With my Raspberry Pi 2 Model B I can achieve the following frame rates:

NCS1: 9.78 FPS

NCS2: 19.8 FPS

I am able to reproduce the result (19.8 FPS) with the NCS2. However, if I reload the blob in every loop iteration (because I am processing frames from a video), the FPS drops to 6.7. Any suggestions?
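That drop is consistent with the network (and its Myriad blob) being recompiled on every load. A toy timing sketch makes the pattern visible; here `load_network` and `infer` are stand-ins (just sleeps) for the real `cv2.dnn.readNet` and `net.forward` calls, and the fix is to load once before the frame loop and only rebuild the input blob per frame:

```python
import time

def load_network():
    """Stand-in for cv2.dnn.readNet(xml, bin): model load + Myriad blob compile."""
    time.sleep(0.05)
    return object()

def infer(net, frame):
    """Stand-in for blobFromImage + net.setInput + net.forward()."""
    time.sleep(0.01)
    return frame

def measure_fps(num_frames, reload_each_frame):
    net = None if reload_each_frame else load_network()  # load ONCE, outside the loop
    start = time.time()
    for frame in range(num_frames):
        if reload_each_frame:
            net = load_network()  # the costly mistake: reloading per frame
        infer(net, frame)
    return num_frames / (time.time() - start)

fps_load_once = measure_fps(20, reload_each_frame=False)
fps_reload = measure_fps(20, reload_each_frame=True)
print(f"load once: {fps_load_once:.1f} FPS, reload per frame: {fps_reload:.1f} FPS")
```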

There are frame rates for bragging rights, and then there are real frame rates that include all the overhead needed to actually do something useful.

With multi-threaded code I'm able to get ~8.3 FPS on a Pi 3 B+ with an NCS2 and OpenVINO, sampling 5 Onvif netcams with "real-time" monitoring on the attached monitor.

Basically one thread per camera, and each camera writes to its own queue. Another thread reads each queue in sequence and does the inference, writing the output to a sixth queue. The main program (thread) reads this output queue and takes whatever action is required.
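The fan-in described above can be sketched with the standard threading/queue modules. This is a toy version under stated assumptions: tuples stand in for decoded frames, and the inference step (the single NCS2 user) is reduced to a pass-through:

```python
import queue
import threading

NUM_CAMERAS = 5
FRAMES_PER_CAM = 10
SENTINEL = None  # marks end-of-stream on a queue

def camera_thread(cam_id, out_q):
    # Stand-in for pulling frames from one Onvif netcam
    for i in range(FRAMES_PER_CAM):
        out_q.put((cam_id, i))  # a real version would put the decoded frame
    out_q.put(SENTINEL)

def inference_thread(cam_queues, result_q):
    # Round-robin over the per-camera queues; a real version would run
    # net.setInput(blob); net.forward() here before forwarding the result
    live = set(range(len(cam_queues)))
    while live:
        for cam_id in list(live):
            try:
                item = cam_queues[cam_id].get(timeout=0.05)
            except queue.Empty:
                continue
            if item is SENTINEL:
                live.discard(cam_id)
            else:
                result_q.put(item)  # "detection result" for the main thread
    result_q.put(SENTINEL)

cam_queues = [queue.Queue(maxsize=4) for _ in range(NUM_CAMERAS)]
result_q = queue.Queue()
threads = [threading.Thread(target=camera_thread, args=(i, cam_queues[i]))
           for i in range(NUM_CAMERAS)]
threads.append(threading.Thread(target=inference_thread, args=(cam_queues, result_q)))
for t in threads:
    t.start()

# Main thread: consume the sixth (output) queue and act on each result
results = []
while True:
    item = result_q.get()
    if item is SENTINEL:
        break
    results.append(item)
for t in threads:
    t.join()
print(f"processed {len(results)} frames from {NUM_CAMERAS} cameras")
```

Bounding the per-camera queues (`maxsize=4`) keeps a slow inference stage from buffering stale frames indefinitely.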

The same code on a faster Odroid XU-4 (I hacked setupvars.sh to get it installed) gets about ~15 FPS.

OTOH, the same code with the CPU target and no NCS on an i5-4200U gets ~21 FPS; using the NCS2 and target MYRIAD gets ~22 FPS.

This suggests that in real usage the main bottleneck is not the actual inference itself but all the overhead of getting the data in and out and acting on the results.



New Contributor I

@fu, cfu    @Kulecz, Walter

I think the main problem with the Raspberry Pi board is the small RAM, which badly affects blob generation performance; blob generation is even slower than inference! In the case of YOLOv3 it is far worse than with MobileNet-SSD.

As also suggested by @Kulecz, Walter, using an alternative ARM board with 2 GB of memory seems to be the only solution for me.

New Contributor I
And this is my update for the Raspberry Pi 4 B + NCS2 (all numbers in FPS):

║ 4-CPU ║    NCS2 on USB2   ║   NCS2 on USB2   ║    NCS2 on USB3   ║   NCS2 on USB3   ║
║       ║ blob.depth=CV_32F ║ blob.depth=CV_8U ║ blob.depth=CV_32F ║ blob.depth=CV_8U ║
║  3.8  ║         14        ║       21.5       ║        25.8       ║        29        ║

