Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

LattePanda Alpha + OpenVINO + "CPU (Core m3) vs NCS1 vs NCS2", Performance comparison

idata
Employee

Hello everyone.

 

The "UNet" model did not work in NCSDK, but it worked in OpenVINO.

 

"UNet" is Semantic Segmentation model.

 

https://github.com/PINTO0309/TensorflowLite-UNet/raw/master/model/semanticsegmentation_frozen_person_32.pb

 

Interestingly, the CPU had better performance than both the Neural Compute Stick and the Neural Compute Stick 2.

 

For the moment, I do not see much practical value in the NCS2.

 

 

◆Japanese Article

 

Introducing Ubuntu 16.04 + OpenVINO to the LattePanda Alpha 864 (the version without a bundled OS) and enjoying Semantic Segmentation with the Neural Compute Stick and Neural Compute Stick 2
idata
Employee

I bought four NCS2 devices; I will verify at a later date how useful the "Multiple NCS Devices" feature below is.

 

Multiple NCS Devices

 

https://software.intel.com/en-us/articles/transitioning-from-intel-movidius-neural-compute-sdk-to-openvino-toolkit
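For reference, below is a minimal sketch of how multiple sticks might be driven with the 2018-era OpenVINO Python API (IEPlugin / IENetwork). Whether each plugin.load() call really binds to a separate stick is exactly what I intend to verify; the IR file names and dummy frames are illustrative.

import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

NUM_STICKS = 4
plugin = IEPlugin(device="MYRIAD")
net = IENetwork(model="model.xml", weights="model.bin")   # illustrative IR names
input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))
n, c, h, w = net.inputs[input_blob].shape

# One ExecutableNetwork per stick; the assumption under test is that the
# MYRIAD plugin serves each load() from a free device.
exec_nets = [plugin.load(network=net, num_requests=1) for _ in range(NUM_STICKS)]

# Dummy inputs standing in for real preprocessed NCHW frames.
frames = [np.zeros((n, c, h, w), dtype=np.float32) for _ in range(8)]

# Feed frames round-robin; drain all sticks once each one is busy.
for i, frame in enumerate(frames):
    exec_nets[i % NUM_STICKS].start_async(request_id=0, inputs={input_blob: frame})
    if i % NUM_STICKS == NUM_STICKS - 1:
        for en in exec_nets:
            if en.requests[0].wait(-1) == 0:          # 0 == StatusCode.OK
                result = en.requests[0].outputs[out_blob]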
idata
Employee

Did you compare the performance of the NCS on NCSDK and OpenVINO? I just ran a customized DenseNet on NCS@NCSDK, NCS@OpenVINO, and NCS2@OpenVINO. Only Conv, Concat, ReLU, and BatchNorm layers exist in this network, and the results are so incredibly different that I'm wondering if I did something wrong… NCS@NCSDK takes 0.45 s for one inference, NCS@OpenVINO takes 0.65 s, and NCS2@OpenVINO takes 0.001 s!?

idata
Employee

@Gemini91

 

 

Did you compare the performance of NCS on NCSDK and OpenVINO?

 

No. My "UNet" model did not work in NCSDK.

 

Therefore, unfortunately, I cannot verify it with NCSDK.

 

 

NCS, NCS2 = FP16

 

CPU = FP32

 

I used the conversion commands below.

 

For FP16 (For NCS/NCS2)

 

$ sudo python3 mo_tf.py \
    --input_model 01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb \
    --output_dir 10_lrmodels/UNet/FP16 \
    --input input \
    --output output/BiasAdd \
    --data_type FP16 \
    --batch 1

 

For FP32 (For CPU)

 

$ sudo python3 mo_tf.py \
    --input_model 01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb \
    --output_dir 10_lrmodels/UNet/FP32 \
    --input input \
    --output output/BiasAdd \
    --data_type FP32 \
    --batch 1
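For reference, the measurements can be taken with a simple script along these lines. This is a sketch using the 2018-era OpenVINO Python API; the test image and the single-image timing loop are illustrative, not the exact script I used.

import time
import cv2
from openvino.inference_engine import IENetwork, IEPlugin

DEVICE = "MYRIAD"   # or "CPU" for the FP32 IR
plugin = IEPlugin(device=DEVICE)
net = IENetwork(model="semanticsegmentation_frozen_person_32.xml",
                weights="semanticsegmentation_frozen_person_32.bin")
input_blob = next(iter(net.inputs))
n, c, h, w = net.inputs[input_blob].shape
exec_net = plugin.load(network=net)

frame = cv2.imread("person.jpg")    # illustrative test image
blob = cv2.resize(frame, (w, h)).transpose((2, 0, 1)).reshape((n, c, h, w))

start = time.time()
res = exec_net.infer(inputs={input_blob: blob})
print("inference: %.1f ms" % ((time.time() - start) * 1000.0))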

 

Because your model and mine are different types of model, I cannot simply compare performance.

 

If you can provide your model, I may be able to verify it.

 

Same issue

 

https://ncsforum.movidius.com/discussion/1320/slow-fps-on-neural-compute-stick-2
idata
Employee

I put my model file in Dropbox. Do you mind running a test with it on your hardware? The input node is named "input" and has a shape of (1, 32, 840, 3). The output node is named "output" and has a shape of (1, 1, 794745).

 

https://www.dropbox.com/s/snbgwzj9p2xkwpm/densenet_frozen.pb?dl=0
idata
Employee

@Gemini91

 

OK.

 

However, it is already late at night in Japan, so I am out of working hours. Please wait a few days.
idata
Employee

@Gemini91

 

I found some free time, so I measured it.

 

Unfortunately, the NCS2 produced the following error, so it was impossible to measure.

 

E: [xLink] [ 0] dispatcherEventReceive:308 dispatcherEventReceive() Read failed -4 | event 0x7fa9fb7fdef0 USB_READ_REL_RESP
E: [xLink] [ 0] eventReader:254 eventReader stopped
E: [xLink] [ 0] dispatcherWaitEventComplete:694 waiting is timeout, sending reset remote event
E: [ncAPI] [ 0] ncFifoReadElem:2853 Packet reading is failed.
E: [ncAPI] [ 0] ncFifoDestroy:2672 Failed to write to fifo before deleting it!

 

Again, the CPU is overwhelmingly faster.

 

All measurement units are milliseconds.

 

idata
Employee

My latest test results are pretty much consistent with yours. I think OpenVINO does some very aggressive CPU-architecture-specific optimization internally, so it benefits Intel CPUs the most, and the performance boost also depends on the network structure.

 

In fact, the demo programs inside OpenVINO should be an easy and fair test for the NCS2. I ran demo_squeezenet_download_convert_run.sh on one very old CPU, one modern CPU, the NCS, and the NCS2, and the results are as follows:

 

| Hardware | Time Consumption | Command |
|:---------------------------------:|:------------------:|:---------------------------------------------------:|
| Intel® Celeron® Processor J1900 | 42.52 ms | demo_squeezenet_download_convert_run.sh -d CPU |
| Intel® Xeon® CPU E5-1603 v4 | 3.61 ms | demo_squeezenet_download_convert_run.sh -d CPU |
| NCS | 28.67 ms | demo_squeezenet_download_convert_run.sh -d MYRIAD |
| NCS2 | 9.34 ms | demo_squeezenet_download_convert_run.sh -d MYRIAD |

 

It seems that a modern CPU with OpenVINO is indeed much faster than the NCS2.

idata
Employee

@Gemini91

 

Thank you for providing detailed information.

 

It was very helpful.

 

It seems the NCS2 is most meaningful in combination with a low-performance CPU.
idata
Employee

@PINTO Thanks for providing info about NCS and NCS2 performance. But power consumption is also important. By the way, may I ask whether the NCS2 can run MTCNN with OpenVINO?

idata
Employee

@curry_best

 

310 mA - 370 mA on a USB 2.0 port.

 

Unfortunately, my measuring device does not support USB 3.0.

 

 

 

may I ask whether the NCS2 can run MTCNN with OpenVINO?

 

 

Since OpenVINO accepts only fixed input shapes and a fixed batch size, I cannot say whether it will work without trying it.

 

Probably, unless you devise a workaround, the standard repository programs below will not work as-is (see the conversion sketch after the links).

 

https://github.com/ipazc/mtcnn.git

 

https://github.com/AITTSMD/MTCNN-Tensorflow.git

 

https://github.com/CongWeilin/mtcnn-caffe.git

 

If possible, I would like you to try it.
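As a starting point, the Model Optimizer can pin one pyramid scale to a fixed shape at conversion time. A hypothetical example (the frozen-graph file name and shape below are illustrative; mo_tf.py and --input_shape are real):

# Hypothetical: convert one fixed pyramid scale of the P-Net to FP16 IR
$ python3 mo_tf.py \
    --input_model mtcnn_pnet_frozen.pb \
    --input_shape [1,216,384,3] \
    --data_type FP16

You would then need one IR per pyramid scale, which is the kind of workaround I had in mind.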

idata
Employee

@PINTO thanks, I would like to try it if I get an NCS2.

idata
Employee

@curry_best

 

Here's a demo sample covering face detection and related analyses:

 

1. face detection
2. gender
3. head pose
4. emotions
5. facial landmarks

 

https://software.intel.com/en-us/articles/OpenVINO-InferEngine#inpage-nav-7-12

 

idata
Employee

Hello.

 

I implemented real-time semantic segmentation with OpenVINO and CPU only (LattePanda Alpha).

 

0.9 FPS - 1.0 FPS

 

OpenVINO + ADAS (Semantic Segmentation) + Python 3.5

 

https://github.com/PINTO0309/OpenVINO-ADAS.git

 

https://youtu.be/R0dtm30qazM
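The main loop is essentially the following. This is a simplified sketch of the approach in the repository above, using the 2018-era Python API; the IR file names and post-processing are illustrative.

import cv2
import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

plugin = IEPlugin(device="CPU")
# plugin.add_cpu_extension(...) may be needed for some layers
net = IENetwork(model="semantic-segmentation-adas-0001.xml",
                weights="semantic-segmentation-adas-0001.bin")
input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))
n, c, h, w = net.inputs[input_blob].shape
exec_net = plugin.load(network=net)

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    blob = cv2.resize(frame, (w, h)).transpose((2, 0, 1)).reshape((n, c, h, w))
    res = exec_net.infer(inputs={input_blob: blob})[out_blob]
    classmap = np.squeeze(res)          # per-pixel class IDs (model-dependent)
    # colorize 'classmap' and overlay it on 'frame' here
    cv2.imshow("segmentation", frame)
    if cv2.waitKey(1) & 0xFF == 27:     # ESC to quit
        break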
idata
Employee

So skip the stick and spend more money on an i7?

idata
Employee

@chicagobob123

 

By the way, my CPU is a Core m3, so it is on the inexpensive side.

 

Based on these results, if you want maximum speed without a GPU, I think you should use an i7 or higher.

 

However, I do not recommend buying both an NCS2 and an i7, because the combined cost is high.

 

Also, power consumption increases with CPU performance.

 

I think it is better to wait until OpenVINO supports ARM before purchasing an NCS2.
idata
Employee

Did you get an UP Board when it was on sale for $40 from Intel? I got two, since they are way more powerful than a Pi. Going to see how that works with the NCS.

idata
Employee

@chicagobob123

 

 

Did you get an UP Board when it was on sale for $40 from Intel?

 

$40!? Isn't that a mistake for $170?

 

How affordable!!

 

It seems I missed the opportunity…

 

Going to see how that works with the NCS.

 

If possible, please tell us the result.

 

 

Is the CPU an Intel Atom?

idata
Employee

Yes, it's an Atom processor with connections similar to the Pi's.

 

4GB DDR3L-1600

 

Intel® Atom™ x5-Z8350

 

Sadly, they are 89 dollars again.

 

https://click.intel.com/aaeon-up-board.html

 

Bob

idata
Employee

@chicagobob123

 

Thank you for providing the information, Bob.

 

I am very interested in how much performance it gets.

 

OpenVINO uses "MKL-DNN" to parallelize inference across CPU threads, which seems to be how it achieves its speed.

 

This seems to be a mechanism whose performance scales with the number of CPU cores, the performance of each core, and the total number of threads.
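For example, the CPU plugin exposes a thread-count knob. A minimal sketch, assuming the 2018-era Python API; the config key "CPU_THREADS_NUM" is the CPU plugin's, while the IR file names are illustrative:

from openvino.inference_engine import IENetwork, IEPlugin

plugin = IEPlugin(device="CPU")
# Cap the MKL-DNN worker pool at 4 threads; by default it uses all cores.
plugin.set_config({"CPU_THREADS_NUM": "4"})
net = IENetwork(model="model.xml", weights="model.bin")
exec_net = plugin.load(network=net)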
idata
Employee

The boards shipped, but now they won't get here until Tuesday. So I should have them by the end of the week.

 

I posted here and suddenly got my download link, and grabbed as much as I could while at work. I will try to install the Linux version on my old laptop.