nnain1 · ‎05-01-2019

I have an UP Squared board with an Intel® Atom™ E3940 (up to 1.8 GHz) and an Intel® Gen 9 HD GPU, which supports 4K decode and encode for HEVC, H.264, and VP8.

Running OpenVino's sample code as shown in the link

gave me 10 fps running on CPU with FP32.

But I got only 13 fps running on GPU with FP16.

I was expecting the GPU to be 2 times faster than the CPU.

Is 13 fps all I can expect on the GPU, or can I optimize to get a higher frame rate?

The images I am running on are 2-megapixel resolution.

Shubha_R_Intel · ‎05-01-2019

Dearest naing, nyan,

Perhaps you can experiment with the OpenVino benchmark_app. With this app, you can try things like the Async API, different numbers of iterations, different numbers of infer requests, different batch sizes, etc.
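The experiments listed above can be organized as a small sweep. The sketch below only builds the command lines and does not run them. It assumes the benchmark_app flags `-m`, `-i`, `-d`, `-api`, `-nireq`, `-b`, and `-niter` as I understand them from the sample's help text, so please check `benchmark_app -h` on your install. The model and input paths are hypothetical placeholders.

```python
from itertools import product

# Hypothetical paths: substitute your own IR files and test image.
MODEL_FP32 = "model_fp32.xml"
MODEL_FP16 = "model_fp16.xml"
INPUT = "input.jpg"


def build_commands(devices=("CPU", "GPU"), apis=("sync", "async"),
                   nireqs=(1, 2, 4), batches=(1,), niter=100):
    """Return one benchmark_app argument list per setting in the sweep."""
    commands = []
    for device, api, nireq, batch in product(devices, apis, nireqs, batches):
        # In this sketch sync mode is only tried with one infer request.
        if api == "sync" and nireq != 1:
            continue
        # Assumption: FP32 IR for CPU, FP16 IR for every other device.
        model = MODEL_FP32 if device == "CPU" else MODEL_FP16
        commands.append([
            "benchmark_app", "-m", model, "-i", INPUT, "-d", device,
            "-api", api, "-nireq", str(nireq), "-b", str(batch),
            "-niter", str(niter),
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_commands():
        print(" ".join(cmd))
```

Each printed line can be pasted into a shell, and the reported throughput compared across devices and request counts.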

You are right, normally GPU/FP16 should perform much better relative to CPU/FP32 than what you are seeing. My suggestion is to perform the same experiments first with the OpenVino samples. Do you still see that GPU performance is not significantly better than CPU? If the OpenVino samples produce more reasonable results, then please compare your code to the OpenVino samples.

Thanks for using OpenVino,

Shubha

nnain1 · ‎05-05-2019

Hi Shubha,

As you suggested, I tested a few models.

(1) Testing person-vehicle-bike-detection-crossroad-0078 with FP32 and FP16

CPU(Async) 3 fps
GPU(Async) 5 fps
MYRIAD2(Async) 13 fps
MYRIAD1(Async) 3 fps

So the GPU is roughly 2 times faster than the CPU. Is that acceptable?

(2) MobileNetV2 with SSDLite, with FP32 and FP16

GPU(Async) 14 fps
CPU(Async) 13 fps
Neural Compute Stick 2(Async) 20 fps
Neural Compute Stick 2(Async) 12 fps

Why is there no obvious difference between GPU and CPU?

(3) The application uses two models (MobileNetV2 SSDLite + MobileNetV1 FRCNN)

GPU(Async) 5 fps
CPU(Async) 3 fps

So is that acceptable?

I can't run it on the Neural Compute Stick 2. It takes a very long time to load and gets stuck at:

API version ............ 1.6
   Build .................. 22443
   Description ....... myriadPlugin
[ INFO ] Loading network files:
   /home/upsquared/NumberPlate/recognition/frcnn_mobilenet_v1_0.5/OpenvinoModel_2019/fp16/openvino_frcnn_mobilenetv1.xml
   /home/upsquared/NumberPlate/recognition/frcnn_mobilenet_v1_0.5/OpenvinoModel_2019/fp16/openvino_frcnn_mobilenetv1.bin
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs

Why does it get stuck there?

Why is running on the GPU not significantly faster than running on the CPU?
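
To put the numbers reported in this thread side by side, the GPU-to-CPU ratio for each test can be computed directly. This is a minimal sketch: the dictionary labels are shorthand introduced here, and the fps values are copied from the posts above.

```python
# (cpu_fps, gpu_fps) pairs copied from the posts in this thread.
results = {
    "sample code (first post)": (10, 13),
    "person-vehicle-bike-detection-crossroad-0078": (3, 5),
    "MobileNetV2 SSDLite": (13, 14),
    "two-model application": (3, 5),
}


def speedups(table):
    """Return the GPU fps divided by the CPU fps, rounded to 2 decimals."""
    return {name: round(gpu / cpu, 2) for name, (cpu, gpu) in table.items()}


ratios = speedups(results)
for name, ratio in ratios.items():
    print(f"{name}: GPU is {ratio}x the CPU frame rate")
```

None of the four cases reaches the 2x expected in the first post. The closest is about 1.67x.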