LattePanda Alpha + OpenVINO + "CPU (Core m3) vs NCS1 vs NCS2", Performance comparison

idata · 11-24-2018

Hello everyone.

The "UNet" model did not work in NCSDK, but it worked in OpenVINO.

"UNet" is a semantic segmentation model.

https://github.com/PINTO0309/TensorflowLite-UNet/raw/master/model/semanticsegmentation_frozen_person_32.pb

Interestingly, the CPU performed better than both the "Neural Compute Stick" and the "Neural Compute Stick 2".

For the moment, I don't see much practical value in the NCS2.

◆Japanese Article

Installing Ubuntu 16.04 + OpenVINO on a LattePanda Alpha 864 (without OS) and enjoying semantic segmentation with the Neural Compute Stick and Neural Compute Stick 2

idata · 11-25-2018

I bought four NCS2 sticks. At a later date, I will verify how useful the "Multiple NCS Devices" feature below is.

Multiple NCS Devices

https://software.intel.com/en-us/articles/transitioning-from-intel-movidius-neural-compute-sdk-to-openvino-toolkit

idata · 11-26-2018

Did you compare the performance of the NCS on NCSDK and OpenVINO? I just ran a customized DenseNet on NCS@NCSDK, NCS@OpenVINO, and NCS2@OpenVINO. The network contains only Conv, Concat, ReLU, and BatchNorm layers, and the results are so wildly different that I'm wondering if I did something wrong… NCS@NCSDK takes 0.45 s per inference, NCS@OpenVINO takes 0.65 s, and NCS2@OpenVINO takes 0.001 s!?
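When latencies differ by orders of magnitude like this, it is worth checking the timing method itself. Below is a minimal Python sketch of a latency benchmark that uses warm-up runs and averaging. It assumes `infer` is a zero-argument callable that blocks until the result is ready, for example a wrapper around a synchronous `exec_net.infer(...)` call from the OpenVINO Python API. `benchmark_ms` is a hypothetical helper, not part of OpenVINO.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_ms(infer: Callable[[], object], warmup: int = 5, runs: int = 50) -> Dict[str, float]:
    """Time a blocking inference callable and return latency stats in milliseconds.

    Hypothetical helper: `infer` must not return before the result is ready,
    otherwise only the submission time is measured.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    # Warm-up runs absorb one-off costs (first device transfer, allocations)
    # and are excluded from the statistics.
    for _ in range(warmup):
        infer()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    mean = statistics.mean(samples)
    return {
        "mean_ms": mean,
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
        "fps": 1000.0 / mean if mean > 0 else float("inf"),
    }
```

For example: `benchmark_ms(lambda: exec_net.infer({input_blob: image}))`, where `exec_net`, `input_blob`, and `image` come from your own loading code. If a figure like 0.001 s shows up, one possible cause worth ruling out is an asynchronous request that was started but never waited on.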

idata · 11-26-2018

@Gemini91

Did you compare the performance of NCS on NCSDK and OpenVINO?

No. My "UNet" model did not work in NCSDK.

Therefore, unfortunately, I cannot verify it with NCSDK.

NCS, NCS2 = FP16

CPU = FP32

I used the conversion script below.

For FP16 (For NCS/NCS2)

$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb \
--output_dir 10_lrmodels/UNet/FP16 \
--input input \
--output output/BiasAdd \
--data_type FP16 \
--batch 1

For FP32 (For CPU)

$ sudo python3 mo_tf.py \
--input_model 01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb \
--output_dir 10_lrmodels/UNet/FP32 \
--input input \
--output output/BiasAdd \
--data_type FP32 \
--batch 1
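The two commands differ only in `--data_type` and `--output_dir`. The Python sketch below builds both argument lists from one function, using only the flags shown above. The helper names `mo_tf_command` and `convert_all` are hypothetical. `sudo` is left out, so prepend it if your install directory requires it, as in the commands above.

```python
import subprocess
from typing import List

MODEL_PATH = "01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb"
PRECISIONS = ("FP16", "FP32")  # FP16 for NCS/NCS2, FP32 for CPU


def mo_tf_command(data_type: str, model_path: str = MODEL_PATH) -> List[str]:
    """Return the mo_tf.py argument list for one precision (hypothetical helper)."""
    if data_type not in PRECISIONS:
        raise ValueError("unsupported data type: " + data_type)
    return [
        "python3", "mo_tf.py",
        "--input_model", model_path,
        "--output_dir", "10_lrmodels/UNet/" + data_type,
        "--input", "input",
        "--output", "output/BiasAdd",
        "--data_type", data_type,
        "--batch", "1",
    ]


def convert_all() -> None:
    """Run Model Optimizer once per precision; stop on the first failure."""
    for data_type in PRECISIONS:
        subprocess.run(mo_tf_command(data_type), check=True)
```

Call `convert_all()` from the Model Optimizer directory (where `mo_tf.py` lives). Passing a list instead of a shell string avoids quoting problems with paths.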

Because your model and mine are different types, I cannot compare performance directly.

If you can share your model, I may be able to test it.

Same issue reported here:

https://ncsforum.movidius.com/discussion/1320/slow-fps-on-neural-compute-stick-2

idata · 11-26-2018

I put my model file on Dropbox. Would you mind running a test with it on your hardware? The input node is named "input" and has a shape of (1, 32, 840, 3). The output node is named "output" and has a shape of (1, 1, 794745).

https://www.dropbox.com/s/snbgwzj9p2xkwpm/densenet_frozen.pb?dl=0

idata · 11-26-2018

@Gemini91

OK.

However, it is already late at night in Japan, so I can't work on it now. Please wait a few days.

idata · 11-26-2018

@Gemini91

I found some free time, so I measured it.

Unfortunately, the NCS2 failed with the following error, so I could not measure it.

E: [xLink] [         0] dispatcherEventReceive:308    dispatcherEventReceive() Read failed -4 | event 0x7fa9fb7fdef0 USB_READ_REL_RESP
E: [xLink] [         0] eventReader:254    eventReader stopped
E: [xLink] [         0] dispatcherWaitEventComplete:694    waiting is timeout, sending reset remote event
E: [ncAPI] [         0] ncFifoReadElem:2853    Packet reading is failed.
E: [ncAPI] [         0] ncFifoDestroy:2672    Failed to write to fifo before deleting it!

Again, the CPU is overwhelmingly faster.

All measurements are in milliseconds.

idata · 11-28-2018

My latest test results are largely consistent with yours. I think OpenVINO internally applies some very aggressive optimizations for Intel CPU architectures, so Intel's own CPUs benefit the most, and the size of the speedup also depends on the network structure.

In fact, the demo programs bundled with OpenVINO should be an easy and fair test for the NCS2. I ran demo_squeezenet_download_convert_run.sh on one very old CPU, one modern CPU, the NCS, and the NCS2. The results are as follows:

| Hardware | Inference time | Command |
|:---------------------------------:|:------------------:|:---------------------------------------------------:|
| Intel® Celeron® Processor J1900 | 42.52 ms | demo_squeezenet_download_convert_run.sh -d CPU |
| Intel® Xeon® CPU E5-1603 v4 | 3.61 ms | demo_squeezenet_download_convert_run.sh -d CPU |
| NCS | 28.67 ms | demo_squeezenet_download_convert_run.sh -d MYRIAD |
| NCS2 | 9.34 ms | demo_squeezenet_download_convert_run.sh -d MYRIAD |

It seems that a modern CPU with OpenVINO is indeed much faster than the NCS2.
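To put the SqueezeNet table in relative terms, here is a small Python calculation that uses only the four latencies above. Each is a single measurement, so the ratios are rough.

```python
LATENCY_MS = {
    "Celeron J1900 (CPU)": 42.52,
    "Xeon E5-1603 v4 (CPU)": 3.61,
    "NCS (MYRIAD)": 28.67,
    "NCS2 (MYRIAD)": 9.34,
}


def speedup(slower: str, faster: str) -> float:
    """How many times faster `faster` is than `slower` (ratio of latencies)."""
    return LATENCY_MS[slower] / LATENCY_MS[faster]


def fps(device: str) -> float:
    """Throughput implied by one inference every LATENCY_MS[device] milliseconds."""
    return 1000.0 / LATENCY_MS[device]


for slower, faster in [
    ("NCS (MYRIAD)", "NCS2 (MYRIAD)"),
    ("NCS2 (MYRIAD)", "Xeon E5-1603 v4 (CPU)"),
    ("Celeron J1900 (CPU)", "NCS2 (MYRIAD)"),
]:
    print("{} is {:.2f}x faster than {}".format(faster, speedup(slower, faster), slower))
```

That works out to about 3.07x for the NCS2 over the NCS, 2.59x for the Xeon over the NCS2, and 4.55x for the NCS2 over the Celeron.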

idata · 11-28-2018

@Gemini91

Thank you for the detailed information.

It was very helpful.

It seems the NCS2 is mainly worthwhile when combined with a low-performance CPU.

idata · 11-29-2018

@PINTO Thanks for providing information about NCS and NCS2 performance. But power consumption is also important. By the way, may I ask whether the NCS2 can run MTCNN with OpenVINO?

idata · 11-29-2018

@curry_best

310 mA to 370 mA on a USB 2.0 port.

Unfortunately, my measuring device does not support USB 3.0.
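The currents above can be roughly converted to power by assuming the nominal 5 V USB bus voltage. The voltage was not measured, so this is only an estimate.

```python
NOMINAL_USB_VBUS_V = 5.0  # assumption: nominal USB 2.0 bus voltage, not a measured value


def power_watts(current_ma: float, voltage_v: float = NOMINAL_USB_VBUS_V) -> float:
    """Estimate power draw with P = V * I, converting milliamps to amps."""
    return voltage_v * current_ma / 1000.0


for current_ma in (310, 370):
    print("{} mA -> {:.2f} W".format(current_ma, power_watts(current_ma)))
```

Under that assumption, the stick draws about 1.55 W to 1.85 W. A real USB port may sag below 5 V under load, which would lower the actual figure.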

may I ask NCS2 can achieve MTCNN with OpenVINO?

Since OpenVINO only accepts inputs with a fixed size and a fixed batch, I don't think it will run as-is, although I haven't tried it.

Unless you devise a workaround, the programs in the standard repositories will probably not work.
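The fixed input size matters because MTCNN's first stage (P-Net) usually scans an image pyramid, so a single frame produces many different input shapes. The sketch below computes those shapes. It uses a scale factor of 0.709, a minimum face size of 20 px, and a 12 px P-Net window, which are common MTCNN defaults but should be treated as assumptions; check the repository you use. Each distinct shape would presumably need its own fixed-shape IR, for example converted with Model Optimizer's `--input_shape` option.

```python
import math
from typing import List, Tuple


def pyramid_scales(height: int, width: int, min_face: int = 20,
                   factor: float = 0.709, net_size: int = 12) -> List[float]:
    """Scales at which P-Net would scan the image (common MTCNN scheme, assumed)."""
    base = net_size / min_face  # maps a min_face-sized face onto the net_size window
    min_side = min(height, width) * base
    scales = []
    n = 0
    # Keep shrinking until the image is smaller than the P-Net input window.
    while min_side >= net_size:
        scales.append(base * factor ** n)
        min_side *= factor
        n += 1
    return scales


def pyramid_shapes(height: int, width: int, **kwargs) -> List[Tuple[int, int]]:
    """(height, width) of each resized image; each one is a separate input shape."""
    return [(int(math.ceil(height * s)), int(math.ceil(width * s)))
            for s in pyramid_scales(height, width, **kwargs)]


shapes = pyramid_shapes(480, 640)
print(len(shapes), "input shapes, starting with", shapes[:3])
```

For a 640x480 frame, this gives ten distinct P-Net input shapes, which is exactly the kind of variable input that a single fixed-shape IR does not cover.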

https://github.com/ipazc/mtcnn.git

https://github.com/AITTSMD/MTCNN-Tensorflow.git

https://github.com/CongWeilin/mtcnn-caffe.git

If possible, I would like you to try it.

idata · 11-29-2018

@PINTO Thanks, I would like to try it if I get an NCS2.

idata · 11-29-2018

@curry_best

Here's a demo sample that includes facial landmark detection:

1. Face detection
2. Gender
3. Head pose
4. Emotions
5. Facial landmarks

https://software.intel.com/en-us/articles/OpenVINO-InferEngine#inpage-nav-7-12

idata · 11-29-2018

Hello.

I implemented real-time semantic segmentation with OpenVINO on the CPU only (LattePanda Alpha).

0.9 FPS to 1.0 FPS

OpenVINO + ADAS (Semantic Segmentation) + Python3.5

https://github.com/PINTO0309/OpenVINO-ADAS.git

https://youtu.be/R0dtm30qazM

idata · 11-29-2018

So skip the stick and spend more money on an i7?

idata · 11-29-2018

@chicagobob123

By the way, my CPU is a Core m3, which is fairly low-end.

Based on these results, if you want maximum speed without a GPU, I think you should use an i7 or better.

However, I don't recommend buying both an NCS2 and an i7, because the combined cost is high.

Also, power consumption increases along with CPU performance.

I think it is better to wait until OpenVINO supports ARM and then buy an NCS2.

idata · 11-30-2018

Did you get an UP Board when Intel had it on sale for $40? I got two, since they are way more powerful than a Pi. I'm going to see how they work with the NCS.

idata · 11-30-2018

@chicagobob123

Did you get an up board when it was sale for $40 from intel?

$40!? Isn't that a mistake for $170?

How affordable!!

It seems I missed the opportunity…

Going to see how that works with the ncs.

If possible, please tell us the result.

Is the CPU an Intel Atom?

idata · 11-30-2018

Yes, it's an Atom processor with connectors similar to the Pi's.

4GB DDR3L-1600

Intel® Atom™ x5-Z8350

Sadly, they are $89 again.

https://click.intel.com/aaeon-up-board.html

Bob

idata · 11-30-2018

@chicagobob123

Thank you for the information, Bob.

I am very interested in how much performance it gets.

OpenVINO appears to achieve its high CPU speed by using "MKL-DNN" to run inference in parallel across multiple threads.

This mechanism seems to scale performance with the number of CPU cores, the performance of each core, and the total number of threads.

idata · 12-01-2018

The boards have shipped, but now they won't get here until Tuesday, so I should have them by the end of the week.

After I posted here, I suddenly got my download link and grabbed as much as I could while at work. I will try to install the Linux version on my old laptop.