Solved: and this is my update for - Page 2

hamze60 · ‎01-24-2019

Hello,

I tested the configurations below with the MobileNet+SSD object detection model and got the following results:

"OPENCV":
   net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
   net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
   net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

"OPENVINO_CPU":
   net = cv2.dnn.readNet(args["xml"], args["bin"])
   net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
   net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

"OPENVINO_NCS":
   net = cv2.dnn.readNet(args["xml"], args["bin"])
   net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
   net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)
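
The three configurations differ only in the backend/target pair, so they can be selected by name. A minimal sketch, assuming hypothetical helper names `CONFIGS` and `configure` (they are not part of OpenCV); the `dnn` argument is expected to be the `cv2.dnn` module:

```python
# Hypothetical helper, not an OpenCV API: maps the three configuration names
# used in this post to attribute names that exist on cv2.dnn.
CONFIGS = {
    "OPENCV": ("DNN_BACKEND_OPENCV", "DNN_TARGET_CPU"),
    "OPENVINO_CPU": ("DNN_BACKEND_INFERENCE_ENGINE", "DNN_TARGET_CPU"),
    "OPENVINO_NCS": ("DNN_BACKEND_INFERENCE_ENGINE", "DNN_TARGET_MYRIAD"),
}


def configure(net, dnn, name):
    """Apply the backend/target pair called `name` to `net`.

    `dnn` is the cv2.dnn module; it is passed in so that this sketch
    does not import OpenCV itself.
    """
    backend_attr, target_attr = CONFIGS[name]
    net.setPreferableBackend(getattr(dnn, backend_attr))
    net.setPreferableTarget(getattr(dnn, target_attr))
    return net
```

With OpenCV installed, this would be called as `configure(net, cv2.dnn, "OPENVINO_NCS")` after loading the network.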

╔═════════════════════════╦══════════╦══════════════════╦══════════════════╗
║                         ║ OpenCV 4 ║ OpenCV-OpenVino  ║ OpenCV-OpenVino  ║
║                         ║          ║     (IR FP32)    ║  + NCS2(IR FP16) ║
╠═════════════════════════╬══════════╬══════════════════╬══════════════════╣
║ Ubuntu 18 on VirtualBox ║  11 FPS  ║      26 FPS      ║         ?        ║
╠═════════════════════════╬══════════╬══════════════════╬══════════════════╣
║ Raspberry Pi 3 B+       ║  0.6 FPS ║         ?        ║       8 FPS      ║
╚═════════════════════════╩══════════╩══════════════════╩══════════════════╝

Based on the figures on the official NCS2 homepage, I expected better performance from the NCS2, although other people have reported numbers similar to mine. I have the following questions:

Q.1) Could the communication between the Raspberry Pi and the NCS2 be the bottleneck of the system? Would moving to a board with a USB 3 port improve it?

Q.2) My NCS2 is properly detected by VirtualBox and I can run the demo from the get-started page, but when I run programs in Python I get the following error:

E: [xLink] [    782564] dispatcherEventSend:908	Write failed event -1
E: [xLink] [    794413] dispatcherEventReceive:308	dispatcherEventReceive() Read failed -1 | event 0x7fd96affce80 
E: [xLink] [    794413] eventReader:256	eventReader stopped
E: [ncAPI] [    794413] ncGraphAllocate:1409	Can't read input tensor descriptors of the graph, rc: X_LINK_ERROR

Q.3) On Ubuntu I can run FP32 models on the CPU target, but running the same program on the Raspberry Pi fails with "failed to initialize Inference Engine backend: Cannot find plugin to use".

Thanks

Dmitry_K_Intel3 · ‎01-27-2019

The Raspberry Pi has only USB 2.0, so to reduce the data transfer delay you can pass uint8 data instead of float32. With an IR you can include the preprocessing (scaling and mean subtraction) in the model. With the original model you can specify it through setInput.

Please try the following code.

import cv2 as cv
import numpy as np
import time

# Load the model
net = cv.dnn.readNet('MobileNetSSD/models/MobileNetSSD_deploy.caffemodel',
                     'MobileNetSSD/models/MobileNetSSD_deploy.prototxt')

# Specify target device
net.setPreferableBackend(cv.dnn.DNN_BACKEND_INFERENCE_ENGINE)
net.setPreferableTarget(cv.dnn.DNN_TARGET_MYRIAD)

# Read an image
img = cv.imread('/home/pi/004545.jpg')

# Prepare input blob and perform an inference
blob = cv.dnn.blobFromImage(img, size=(300, 300), ddepth=cv.CV_8U)
net.setInput(blob, scalefactor=1.0/127.5, mean=[127.5, 127.5, 127.5])

# Warmup
out = net.forward()

start = time.time()

numRuns = 100
for _ in range(numRuns):
    net.forward()

print('FPS: ', numRuns / (time.time() - start))

On my Raspberry Pi 2 Model B I get the following throughput:

NCS1: 9.78 FPS

NCS2: 19.8 FPS
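
To put rough numbers on the uint8 suggestion, the per-frame payload can be worked out by hand. This is only a back-of-the-envelope sketch: it assumes one 300x300 three-channel blob per inference and the nominal USB 2.0 High Speed rate of 480 Mbit/s, which real transfers do not reach, so the times are lower bounds. The function names are made up for illustration.

```python
# Assumptions: one 300x300, 3-channel input blob per inference, and the
# nominal USB 2.0 High Speed signalling rate. Real throughput is lower.
WIDTH, HEIGHT, CHANNELS = 300, 300, 3
USB2_BYTES_PER_SEC = 480e6 / 8  # 480 Mbit/s expressed in bytes per second


def blob_bytes(bytes_per_element):
    """Size in bytes of one input blob for the given element size."""
    return WIDTH * HEIGHT * CHANNELS * bytes_per_element


def min_transfer_ms(num_bytes, bytes_per_sec=USB2_BYTES_PER_SEC):
    """Lower bound on the transfer time in milliseconds."""
    return 1000.0 * num_bytes / bytes_per_sec


for name, size in (("uint8", 1), ("float32", 4)):
    n = blob_bytes(size)
    print(f"{name}: {n} bytes, at least {min_transfer_ms(n):.1f} ms over USB 2.0")
```

Under these assumptions the uint8 blob is 270000 bytes (at least 4.5 ms) and the float32 blob is 1080000 bytes (at least 18.0 ms), so the input is four times smaller with ddepth=cv.CV_8U.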


hamze60 · ‎04-21-2019

@fu, cfu @Kulecz, Walter

I think the main problem with the Raspberry Pi board is its small RAM, which badly affects blob generation performance: blob generation is even slower than inference! With YOLOv3 it is much worse than with MobileNet+SSD.

As @Kulecz, Walter also suggested, using an alternative ARM board with 2 GB of memory seems to be the only solution for me.
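
One way to check where the time goes is to time blob generation and inference separately. A minimal sketch, assuming a hypothetical helper `mean_seconds`; the two workloads here are stand-ins so that it runs without OpenCV, and on the Pi they would be replaced by the real `cv.dnn.blobFromImage(...)` and `net.forward()` calls:

```python
import time


def mean_seconds(fn, runs=20, warmup=1):
    """Average wall-clock time of fn() over `runs` calls, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


# Stand-in workloads (assumptions, not the real pipeline). On the Pi, use e.g.
#   lambda: cv.dnn.blobFromImage(img, size=(300, 300), ddepth=cv.CV_8U)
# and net.forward (after net.setInput(blob)) instead.
def make_blob():
    return bytes(300 * 300 * 3)


def infer():
    return sum(range(10000))


t_blob = mean_seconds(make_blob)
t_infer = mean_seconds(infer)
print(f"blob: {t_blob * 1e3:.3f} ms, inference: {t_infer * 1e3:.3f} ms")
```

If the blob stage dominates on the real workloads, the bottleneck is on the host side rather than on the NCS2.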

hamze60 · ‎09-05-2019

And this is my update for the Raspberry Pi 4 B + NCS2:

╔═══════╦═══════════════════╦══════════════════╦═══════════════════╦══════════════════╗
║ 4-CPU ║    NCS2 on USB2   ║   NCS2 on USB2   ║    NCS2 on USB3   ║   NCS2 on USB3   ║
║       ║ blob.depth=CV_32F ║ blob.depth=CV_8U ║ blob.depth=CV_32F ║ blob.depth=CV_8U ║
╠═══════╬═══════════════════╬══════════════════╬═══════════════════╬══════════════════╣
║  3.8  ║         14        ║       21.5       ║        25.8       ║        29        ║
╚═══════╩═══════════════════╩══════════════════╩═══════════════════╩══════════════════╝

Raspberry Pi 4 B + NCS2: performance comparison
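
The relative gains in this table can be computed directly. A small sketch, assuming the four NCS2 figures are throughput values where higher is better (the earlier tables in this thread are in FPS); the dictionary layout and the `speedup` helper are only illustrative:

```python
# Values copied from the NCS2 columns of the table above.
results = {
    ("USB2", "CV_32F"): 14.0,
    ("USB2", "CV_8U"): 21.5,
    ("USB3", "CV_32F"): 25.8,
    ("USB3", "CV_8U"): 29.0,
}


def speedup(new, old):
    """Ratio of two table entries, each keyed by (port, blob depth)."""
    return results[new] / results[old]


# Effect of the blob depth on each port
for port in ("USB2", "USB3"):
    ratio = speedup((port, "CV_8U"), (port, "CV_32F"))
    print(f"{port}: CV_8U vs CV_32F = {ratio:.2f}x")

# Effect of the port for each blob depth
for depth in ("CV_32F", "CV_8U"):
    ratio = speedup(("USB3", depth), ("USB2", depth))
    print(f"{depth}: USB3 vs USB2 = {ratio:.2f}x")
```

This gives about 1.54x from CV_8U on USB2 but only about 1.12x on USB3, and about 1.84x (CV_32F) or 1.35x (CV_8U) from moving to USB3, which is consistent with the transfer being the main bottleneck on USB 2.0.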