about 10 FPS
It was 15 FPS in Core i7.
Next, I will add a MultiProcess + MultiStick implementation.
I have tried to run vehicle-detection-adas-0002 on a Raspberry Pi with NCS2 and its performance was pretty poor (~4 FPS). Then I tried to run the Python script on an i7 machine and the performance was also poor (6.46 FPS). Then I tried to use NCS with OpenVINO and the speed was 7.44 FPS, which I don't understand. How is that possible?
I tested this on a video captured with a GoPro. I haven't tested with a webcam yet, but either way I don't see how NCS can perform better than NCS2.
Did you encounter any specific issues that you had to solve in order to achieve such performance? I would appreciate any suggestions for achieving a better FPS rate.
Then I tried to run the Python script on an i7 machine and the performance was also poor (6.46 FPS).
Our benchmarks show that NCS2 is slower than Atom / Core i7 / Core i5 / Core m3.
On a CPU, OpenVINO seems to greatly optimize internal processing through the MKLDNN plugin.
NCS2 only outperforms Celeron CPUs.
Then I tried to use NCS with OpenVINO and the speed was 7.44 FPS, which I don't understand. How is that possible?
Here are the configurations in order of performance, fastest first.
- OpenVINO + GPU (FP16)
- OpenVINO + Intel's CPU (Core i7 or Core i5 or Atom) (FP32)
- OpenVINO + Intel's CPU (Core i7 or Core i5 or Atom) + NCS2 (FP16)
- OpenVINO + armv7l + NCS2 (FP16)
- OpenVINO + armv7l + NCS (FP16)
- OpenVINO + armv7l
The key points are as follows.
- NCS2 is slower than Intel's CPU.
- NCS / NCS2 shows its value when combined with a low-performance CPU such as the RaspberryPi.
- When used in combination with a high-performance CPU, the performance of NCS / NCS2 is very bad.
- When an Intel CPU is used, inference seems to be parallelized within the CPU across the number of cores times the number of threads.
NCS and NCS2 are pointless unless the environment they are used in is chosen carefully.
No, I have not tried it yet.
I only started implementing MultiStick yesterday.
However, it is not a difficult task, so I intend to commit it to GitHub within a few days.
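For readers unfamiliar with the multi-stick idea, here is a minimal sketch of how frames might be spread across several sticks through a shared work queue. It is not the implementation mentioned above and does not use the OpenVINO API: `fake_infer`, `worker`, and `run_multistick` are hypothetical names, the inference call is a stub, and threads are swapped in for processes to keep the example short.

```python
import queue
import threading


def fake_infer(stick_id, frame_id):
    """Stub standing in for a real per-device inference call (hypothetical)."""
    return (stick_id, frame_id)


def worker(stick_id, frames, results):
    """Pull frames from the shared queue until the sentinel None arrives."""
    while True:
        frame_id = frames.get()
        if frame_id is None:
            break
        results.put(fake_infer(stick_id, frame_id))


def run_multistick(num_sticks, frame_ids):
    """Feed frames to one worker per stick and return results in frame order."""
    frames = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(i, frames, results))
        for i in range(num_sticks)
    ]
    for t in threads:
        t.start()
    for frame_id in frame_ids:
        frames.put(frame_id)
    for _ in threads:
        frames.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    # Workers finish out of order, so sort by frame id before returning.
    return sorted(collected, key=lambda item: item[1])


output = run_multistick(4, range(12))
print([frame_id for _, frame_id in output])
```

Because each worker takes the next available frame, a slow stick does not hold up the others; the cost is that results have to be re-ordered before display.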
Pinto, have you tried 3 Movidius NCS2 sticks?
RaspberryPi3 + NCS2.
NCS2 x2 ---> 15 FPS
NCS2 x3 ---> 20 FPS
NCS2 x4 ---> 24 FPS
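To make the scaling easier to read, the figures above can be divided by the number of sticks. This is only arithmetic on the reported numbers; `reported_fps` and `per_stick_fps` are names chosen for this illustration.

```python
# Aggregate FPS reported above for RaspberryPi3 + NCS2, keyed by stick count.
reported_fps = {2: 15, 3: 20, 4: 24}


def per_stick_fps(results):
    """Return aggregate FPS divided by stick count for each configuration."""
    return {sticks: fps / sticks for sticks, fps in results.items()}


per_stick = per_stick_fps(reported_fps)
for sticks, fps in sorted(per_stick.items()):
    print(f"NCS2 x{sticks}: {fps:.2f} FPS per stick")
```

Each added stick contributes less than the previous one (7.50, 6.67, then 6.00 FPS per stick), so throughput grows sub-linearly.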
The OpenVINO API is inconvenient.
MultiProcess cannot be used efficiently.
With about $300 of NCS2 sticks, there is something very wrong. The ROI is gone and the results are subpar. I don't think the sticks are useful, do you? You can buy a LattePanda at $299 and probably do better.
I deliberately ran a verification that I knew was pointless.
Actually, I knew that an ARM processor could not get the most out of NCS.
And I realized that the ROI was the worst shortly after purchasing NCS2.
As you say, I understand that it is better to use LattePanda Delta / Alpha.
I deliberately showed the worst benchmark so that engineers around the world will not make the wrong choice.
I borrowed 3 of the 4 NCS2 sticks I used for this verification, so my loss is small.
I am not making products; I am just a stupid hobby programmer.
You're anything but stupid. The results you have shown have made me move in a different direction. I am not seeing the hardware Movidius provides as useful when you want optimal performance. The sticks are low power, which may be useful, and if you are not concerned with FPS, as in a doorbell video sensor, I think you're OK. But if you want to use it on a drone or anything that moves quickly, I am not sure it's useful.
Thinking about this, do you think Movidius can speed up their chip to make it a useful coprocessor? I guess I just don't understand what the bottleneck is. I like the idea of a low-cost coprocessor that can handle the inference.
as in a doorbell video sensor, I think you're OK. But if you want to use it on a drone or anything that moves quickly, I am not sure it's useful.
I think the same thing.
do you think Movidius can speed up their chip to make it a useful coprocessor? I guess I just don't understand what the bottleneck is. I like the idea of a low-cost coprocessor that can handle the inference.
I believe that proper performance will not be obtained unless MyriadX is integrated into an SoC.
btw, I am interested in the following devices now.
The first TPU at least has specs. It works with:
224x224 max input size; 1.0 max depth multiplier
MobileNet SSD V1/V2
320x320 max input size; 1.0 max depth multiplier
224x224 fixed input size
299x299 fixed input size
You can get those input sizes from any 640x480 camera. HD is not needed.
The third item is kind of pricey; $429 seems like a lot.
Thank you, bob.
I think I will try the following. It is affordable and offers high performance.
- 16.8 TOPs @ 700mW
- 24 TOPs/Watt
- 16.8 TOPs @ 300MHz
- There is a USB type development kit
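As a quick sanity check, the quoted figures are consistent with each other: 16.8 TOPs at 700 mW works out to the stated 24 TOPs/Watt. The short calculation below also derives the operations per clock cycle implied by 16.8 TOPs at 300 MHz; that last number is derived here and is not something the spec list states.

```python
tops = 16.8            # quoted throughput in tera-operations per second
power_watts = 0.7      # 700 mW
clock_hz = 300e6       # 300 MHz

# Efficiency implied by the throughput and power figures.
efficiency = tops / power_watts

# Operations per clock cycle implied by the throughput and clock figures.
ops_per_cycle = tops * 1e12 / clock_hz

print(f"{efficiency:.1f} TOPs/Watt")
print(f"{ops_per_cycle:,.0f} operations per clock cycle")
```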
Hi @PINTO,
A few days ago, I got an NCS2, and when I ran a sample image-classification demo on RaspberryPi + NCS2, I got unexpectedly bad performance. Then I found your GitHub and forum discussions.
Do you know how it is possible that the official NCS2 page reports a much larger number? Can you run any project to confirm the Movidius benchmark results?
Do you know how it is possible that the official NCS2 page reports a much larger number?
Yes. I know.
Intel's benchmark results were obviously not measured on an ARM processor + USB 2.0.
As long as a RaspberryPi3 is used, the 8x performance will absolutely not be achieved.
This is because the load of preprocessing and post-processing is high.
To maximize performance, it is better to use an SBC with an Intel processor than an SBC with an ARM processor.
OpenVINO + NCS2 is optimized for Intel processors.
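A rough way to see why the preprocessing and post-processing load matters is a simple stage-time model. The sketch below is illustrative only: the per-frame timings are made-up placeholders, not measurements from this thread, and it assumes the stages can genuinely run concurrently when pipelined.

```python
def sequential_fps(stage_seconds):
    """FPS when every stage runs back to back for each frame."""
    return 1.0 / sum(stage_seconds)


def pipelined_fps(stage_seconds):
    """FPS when stages overlap, so the slowest stage sets the pace."""
    return 1.0 / max(stage_seconds)


# HYPOTHETICAL per-frame timings in seconds (placeholders, not measured):
# preprocessing on the host CPU, inference on the stick, post-processing.
stages = [0.040, 0.030, 0.030]

print(f"sequential: {sequential_fps(stages):.1f} FPS")
print(f"pipelined:  {pipelined_fps(stages):.1f} FPS")
```

Under this model, a faster accelerator only shrinks the inference term. If the host-side stages dominate, the frame rate stays bounded by the CPU no matter how fast the stick is.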
Can you run any project to confirm the Movidius benchmark results?
I have never seen such a benchmark.
However, with carefully designed logic, 24 FPS can be obtained with a single NCS2 even with MobileNet-SSD + RaspberryPi3.