Hello everyone.
The UNet model did not work with the NCSDK, but it worked with OpenVINO.
UNet is a semantic segmentation model:
https://github.com/PINTO0309/TensorflowLite-UNet/raw/master/model/semanticsegmentation_frozen_person_32.pb
Interestingly, the CPU outperformed both the Neural Compute Stick and the Neural Compute Stick 2.
For the moment, I don't see much practical value in the NCS2.
◆Japanese Article
Introducing Ubuntu 16.04 + OpenVINO to Latte Panda Alpha 864 (without OS included) and enjoying Semantic Segmentation with Neural Compute Stick and Neural Compute Stick 2
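For anyone reproducing these CPU-vs-stick comparisons, a minimal timing harness like the one below can help keep the measurements like-for-like. `infer` is a stand-in for whatever inference call you are benchmarking (OpenVINO CPU plugin, MYRIAD plugin, NCSDK, etc.); this is a sketch, not code from the repositories linked in this thread.

```python
import time

def measure_fps(infer, frames, warmup=5):
    """Measure throughput of an arbitrary inference callable.

    infer  -- stand-in for the real inference call (e.g. exec_net.infer)
    frames -- iterable of pre-loaded input frames
    """
    frames = list(frames)
    for f in frames[:warmup]:        # warm up caches / lazy initialization
        infer(f)
    start = time.perf_counter()
    for f in frames:
        infer(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```

Running the same harness against the CPU and MYRIAD devices on identical, pre-loaded frames removes camera and decode time from the comparison.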
I tried implementing semantic segmentation with OpenVINO + DeepLabV3 + Core m3.
I got about 4-5 FPS.
However, the implementation is not beautiful...
https://github.com/PINTO0309/OpenVINO-DeeplabV3.git
I referred to the following article:
https://medium.com/@oleksandrsavsunenko/optimizing-neural-networks-for-production-with-intels-openvino-a7ee3a6883d
Hi, thank you all for the very detailed information. I am working on an image segmentation project and tried Mask R-CNN first, but the inference time is quite long: I get about 0.5 FPS.
Today I came across @PINTO's post using DeepLab and was very impressed by its speed; I now get 10 times faster (5 FPS). However, you said that the implementation is not beautiful. Does that mean the performance drops a lot when we optimize it with OpenVINO? How about other models like UNet? You said it runs at about 1 FPS, but do you have any updates on it?
Thank you in advance,
@dhoa

> However, I saw you said that the implementation is not beautiful. Is that mean the performance reduce a lot when we optimize it with Openvino ?

No. By "not beautiful" I mean the following:
- I replaced the layers OpenVINO does not support with pure TensorFlow calls.
- I call TensorFlow a total of two times, for pre-processing and post-processing.
- Since the OpenVINO tutorial "Offloading Computations to TensorFlow" does not work properly, some layers run inference on the CPU without using OpenVINO.
I would simply like to process all the layers with OpenVINO; I only meant that the program is cumbersome and not concise.
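The "two TensorFlow calls" structure described above can be sketched roughly as follows. `tf_preprocess`, `openvino_core`, and `tf_postprocess` are hypothetical stand-ins for the three stages (the real repository wires these to TensorFlow sessions and an OpenVINO executable network); plain NumPy is used here only so the shape of the data flow is visible.

```python
import numpy as np

def tf_preprocess(frame):
    # Stand-in for the first TensorFlow call: the head layers the
    # Model Optimizer could not convert (here, just normalization).
    return frame.astype(np.float32) / 255.0

def openvino_core(tensor):
    # Stand-in for the OpenVINO part: in the real code this would be
    # something like exec_net.infer({input_blob: tensor}).
    return tensor * 2.0

def tf_postprocess(tensor):
    # Stand-in for the second TensorFlow call: e.g. an ArgMax over the
    # class dimension to produce the segmentation map.
    return tensor.argmax(axis=-1)

def run(frame):
    # The hybrid pipeline: TF head -> OpenVINO core -> TF tail.
    return tf_postprocess(openvino_core(tf_preprocess(frame)))
```

The two framework hand-offs (NumPy array out of TensorFlow, into OpenVINO, and back) are exactly what makes the real implementation feel cumbersome.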
> How about others models like Unet ?

Because UNet runs entirely with OpenVINO functions, it is very simple.

> I saw you said it is about 1FPS but do you have some update about it ?

No. Since Intel does not publish the caffemodel and solver, I would like to customize it but cannot.
Thanks for your response @PINTO. Anyway, at the moment DeepLab is the best choice for speed, right? Is its performance advantage over other models like UNet or Mask R-CNN due to the lighter architecture, or to the OpenVINO functions? In my case with Mask R-CNN, I find that OpenVINO doesn't improve the speed much.
Regards,
Hoa
@dhoa
It is the difference in the performance of the models themselves.
For an accuracy comparison, see this Japanese article on DeepLab vs. Mask R-CNN:
https://jyuko49.hatenablog.com/entry/2018/11/17/145904
That said, the CPU-inference speedup from OpenVINO (with MKL-DNN) is spectacular. It appears to improve as the number of CPU cores and threads increases.
I think performance comparisons between different models have little meaning: accuracy and speed are always a trade-off.
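Since the MKL-DNN speedup scales with cores and threads, two common tuning knobs are the OpenMP thread-count and affinity environment variables. The variable names are standard OpenMP/Intel-OpenMP settings, but their exact effect depends on the OpenVINO/MKL-DNN build, so treat this as a hedged sketch; they must be set before the inference library creates its thread pool.

```python
import os

# Set OpenMP threading knobs BEFORE importing/initializing the inference
# library; MKL-DNN reads these when it creates its thread pool.
os.environ["OMP_NUM_THREADS"] = str(os.cpu_count() or 1)  # one thread per core
os.environ["KMP_AFFINITY"] = "granularity=fine,compact"   # pin threads to cores

# ... import and initialize OpenVINO / TensorFlow after this point ...
```

Whether pinning helps depends on whether anything else (camera capture, drawing) is competing for the same cores.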
@PINTO thank you so much for your explanation. I have just one more question: how about using a framework other than TensorFlow with OpenVINO at this stage? I am more familiar with PyTorch (fastai), and I see that OpenVINO does not yet support PyTorch directly; to use it, one needs to convert to ONNX. I tried once with SSD and could not run the Model Optimizer. What is your opinion on using PyTorch with OpenVINO? Sorry if this is not closely related to this thread, but I'm quite new to this, and advice from an expert is really valuable to me.
@dhoa

> Actually I am more familiar with Pytorch (Fastai) and see that OpenVino not supported yet Pytorch and to use it, one need to convert to ONNX.
> What is your idea about using Pytorch and Openvino ?

You do not have to stick to ONNX.
You can try converting to TensorFlow or Caffe with reference to the link below.
Since the set of layers OpenVINO supports varies with the source framework, I recommend trying multiple conversion paths:
PyTorch -> Tensorflow -> OpenVINO
PyTorch -> Caffe -> OpenVINO
https://github.com/PINTO0309/Keras-OneClassAnomalyDetection#13-model-convert
Thank you so much @PINTO
Thank you for sharing your performance results. I wanted to follow up with another question about performance. From the results above and my own tests, I understand that an i7-7700 CPU is expected to run much faster than a single NCS. Has anyone tried running multiple NCS devices alongside an i7 CPU and observed improved inference speed? If not, what are some possible ways to further improve the inference speeds observed on an i7 CPU with OpenVINO installed?
I am trying to run inference with MobileNet/Inception-SSD.
@sri
Due to thread-switching overhead, using two NCS devices will not double performance; however, you will get roughly 1.5 times the performance.
Although I did not record videos of it, I have actually used two or more NCS2 devices and confirmed the performance improvement.
MobileNet-SSD
Core i7 + NCS2 x1 (48 FPS) at 320x240:
https://github.com/PINTO0309/MobileNet-SSD-RealSense#usb-camera-mode-ncs2-x-1-stick--core-i7asynchronous-screen-drawing--multistickssdwithrealsense_openvino_ncs2py
https://www.youtube.com/watch?v=Nx_rVDgT8uY
The following is a video of full-size YOLO v3 boosted with four NCS2 devices:
Core i7 + NCS2 x1 (4 FPS) -> Core i7 + NCS2 x4 (13 FPS)
https://www.youtube.com/watch?v=AT75LBIOAck
I do not particularly recommend the NCS2 if the model size exceeds 100 MB.
When using multiple sticks, inference itself becomes very fast, but the image transfer rate may be slow and become the bottleneck. Depending on the resolution, for example over a USB 2.0 interface, you will not get more than a certain FPS.
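That USB-bandwidth ceiling can be estimated with simple arithmetic. The ~35 MB/s effective USB 2.0 throughput used below is an assumption for illustration (the 480 Mbit/s spec figure is never reached in practice); only the frame sizes come from the thread above.

```python
# Rough upper bound on FPS imposed by raw frame transfer alone,
# ignoring inference time entirely. Figures are illustrative assumptions.
EFFECTIVE_USB2_BYTES_PER_S = 35 * 1024 * 1024  # ~35 MB/s, well under the 60 MB/s spec

def transfer_fps_ceiling(width, height, channels=3, bytes_per_channel=1):
    """Max FPS if the USB link were the only constraint."""
    frame_bytes = width * height * channels * bytes_per_channel
    return EFFECTIVE_USB2_BYTES_PER_S / frame_bytes
```

For a raw 320x240 BGR frame the ceiling is well above the 48 FPS quoted above, so USB 2.0 is not the limit there; at 1920x1080 the same arithmetic drops the ceiling to single digits, which is when the transfer rate becomes the bottleneck.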
Thanks for replying @PINTO. Have you tried using the NCS (first generation) instead of the NCS2? I tried, and I saw a very low FPS when comparing the CPU alone vs. the CPU with the NCS. It seems better not to use the NCS at all: the i7-7700 CPU without the NCS gives me a much higher inference rate than running with MYRIAD. I don't see any performance improvement from running the NCS alongside the CPU. Is that something you have observed as well?
@sri

> It just seems that its better to not use NCS at all.

For the first-generation NCS, you are right.

> I dont seem to see any performance improvement by running NCS with CPU. Is that something you have observed as well?

Yes, that's right. The Myriad 2 (NCS) is too low in performance. Instead, CPU optimization with OpenVINO delivers very good performance.
@chicagobob123 @PINTO Correct me if I'm wrong, but I thought you'd get no performance gain on the Z8350 because it uses Gen8-LP. Am I right?