Intel® Distribution of OpenVINO™ Toolkit

Why is running on GPU not significantly faster than running on CPU?

New Contributor I

I have an UP Squared board with an Intel® Atom™ E3940 (up to 1.8 GHz) and an Intel® Gen 9 HD GPU, which supports 4K decode and encode for HEVC, H.264, and VP8.

Running OpenVINO's sample code as shown in the link

gave me 10 fps on CPU with FP32.

But I got only 13 fps on GPU with FP16.

I was expecting the GPU to be about 2 times faster than the CPU.

Is 13 fps all I can expect on the GPU, or can I optimize something to get a higher frame rate?

The images I am running on are 2 MP (2-megapixel) resolution.

2 Replies

Dearest naing, nyan,

Perhaps you can experiment with the OpenVINO benchmark_app. With this app, you can try things like the Async API, different numbers of iterations, different numbers of infer requests, different batch sizes, etc.
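A sweep like this is easier to keep consistent if the command lines are generated rather than typed by hand. Below is a minimal sketch that only builds the commands and does not run them. The model paths are hypothetical placeholders, and the executable name and exact flags (`-m`, `-d`, `-api`, `-niter`, `-nireq`, `-b`) should be checked against `benchmark_app -h` for your release; some releases also require an input with `-i`.

```python
import itertools
import shlex


def benchmark_commands(models, nireqs=(1, 2, 4), batches=(1,), niter=100):
    """Build one benchmark_app command per (device, nireq, batch) combination.

    `models` maps a device name to the IR file to use on that device,
    for example an FP32 IR for CPU and an FP16 IR for GPU.
    """
    commands = []
    for device, nireq, batch in itertools.product(models, nireqs, batches):
        args = [
            "benchmark_app",            # assumed to be on PATH
            "-m", models[device],       # path to the IR .xml file
            "-d", device,               # target device
            "-api", "async",            # use the Async API
            "-niter", str(niter),       # number of iterations
            "-nireq", str(nireq),       # number of infer requests
            "-b", str(batch),           # batch size
        ]
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


if __name__ == "__main__":
    # Hypothetical file names; substitute your own IR files.
    example_models = {"CPU": "model_fp32.xml", "GPU": "model_fp16.xml"}
    for cmd in benchmark_commands(example_models):
        print(cmd)
```

Running each printed command and recording the reported throughput gives a table of device, infer-request count, and batch size that can be compared directly.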

You are right: normally GPU/FP16 should perform much better relative to CPU/FP32 than what you are seeing. My suggestion is to first run the same experiments with the OpenVINO samples. Do you still see that GPU performance is not significantly better than CPU? If the OpenVINO samples produce more reasonable results, then please compare your code to the OpenVINO samples.
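One way to make that comparison concrete is to time your own per-frame function and the inference call alone with the same helper, so the two numbers are measured the same way. A minimal sketch, assuming you wrap your code in a zero-argument callable; `fake_step` below is a hypothetical stand-in that only sleeps:

```python
import time


def measure_fps(step, warmup=2, iterations=20):
    """Call step() repeatedly and return frames per second for the timed runs."""
    # Warm-up calls are excluded so one-time setup cost does not skew the result.
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(iterations):
        step()
    elapsed = time.perf_counter() - start
    return iterations / elapsed


def fake_step():
    # Hypothetical stand-in workload; replace with your own per-frame function.
    time.sleep(0.005)


if __name__ == "__main__":
    print(f"{measure_fps(fake_step):.1f} fps")
```

If the full per-frame loop is much slower than inference alone, the gap is being spent outside the device plugin, which would limit how much a faster device can help.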

Thanks for using OpenVINO,


New Contributor I

Hi Shubha,

As you suggested I tested a few models.

(1) Testing person-vehicle-bike-detection-crossroad-0078 with FP32 and FP16 precision

CPU(Async) 3 fps

GPU(Async) 5 fps

MYRIAD2(Async) 13 fps

MYRIAD1(Async) 3 fps

So the GPU is a little under 2 times faster than the CPU. Is that acceptable?

(2) MobileNetV2 with SSDLite, with FP32 and FP16 precision

GPU(Async) 14 fps
CPU(Async) 13 fps
Neural Compute Stick 2(Async) 20 fps
Neural Compute Stick 2(Async) 12 fps

Why is there no obvious difference between GPU and CPU?

(3) The application uses two models (MobileNetV2 SSDLite + MobileNetV1 FRCNN)

GPU(Async) 5 fps
CPU(Async) 3 fps

So is that acceptable?
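For reference, the GPU-to-CPU ratios implied by the frame rates quoted in this thread can be worked out directly. A small sketch using only those reported numbers; the dictionary labels are shorthand introduced here, not names from the toolkit:

```python
# (CPU fps, GPU fps) as reported in this thread.
reported = {
    "original sample (CPU FP32, GPU FP16)": (10, 13),
    "person-vehicle-bike-detection-crossroad-0078": (3, 5),
    "MobileNetV2 SSDLite": (13, 14),
    "two-model application": (3, 5),
}

# Speedup is GPU throughput divided by CPU throughput.
ratios = {name: gpu / cpu for name, (cpu, gpu) in reported.items()}

for name, ratio in ratios.items():
    print(f"{name}: GPU is {ratio:.2f}x the CPU frame rate")
```

The ratios come out between roughly 1.1x and 1.7x, so none of the four measurements reaches the 2x that was expected.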

I can't run it on the Neural Compute Stick 2. It takes a very long time to load and gets stuck at:

API version ............ 1.6
    Build .................. 22443
    Description ....... myriadPlugin
[ INFO ] Loading network files:
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs

Why does it get stuck there?
