Intel® Distribution of OpenVINO™ Toolkit

Demo script in GPU mode takes much more time per iteration than in CPU mode

Yang__Edward

OpenVINO version is 2018.2.300

OS Ubuntu 16.04.03

Platform Core i5-8400

=======================================================================================

[Summary]

I compared the performance of GPU and CPU by running the demo script "demo_squeezenet_download_convert_run.sh".

- CPU mode, Average running time of one iteration: 2.27222 ms

- GPU mode, Average running time of one iteration: 5.77997 ms

It seems unreasonable that the GPU takes so much longer per iteration. Could anyone advise what is happening and how to improve GPU performance? Thanks!

Edward

=======================================================================================

[Output Message: CPU Mode]

Run Inference Engine classification sample
 
Run ./classification_sample -d CPU -i /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png -m /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/ir/squeezenet1.1/squeezenet1.1.xml
 
[ INFO ] InferenceEngine: 
API version ............ 1.1
Build .................. 11653
[ INFO ] Parsing input parameters
[ INFO ] Loading plugin
 
API version ............ 1.1
Build .................. lnx_20180510
Description ....... MKLDNNPlugin
[ INFO ] Loading network files:
/opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/ir/squeezenet1.1/squeezenet1.1.xml
/opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/ir/squeezenet1.1/squeezenet1.1.bin
[ INFO ] Preparing input blobs
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference (1 iterations)
[ INFO ] Average running time of one iteration: 2.27222 ms
[ INFO ] Processing output blobs
 
Top 10 results:
 
Image /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png
 
817 0.8363345 label sports car, sport car
511 0.0946488 label convertible
479 0.0419131 label car wheel
751 0.0091071 label racer, race car, racing car
436 0.0068161 label beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon
656 0.0037564 label minivan
586 0.0025741 label half track
717 0.0016069 label pickup, pickup truck
864 0.0012027 label tow truck, tow car, wrecker
581 0.0005882 label grille, radiator grille
 
[ INFO ] Execution successful

 

=======================================================================================

[Output Message: GPU Mode]

Run Inference Engine classification sample
 
Run ./classification_sample -d GPU -i /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png -m /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/ir/squeezenet1.1/squeezenet1.1.xml
 
[ INFO ] InferenceEngine: 
API version ............ 1.1
Build .................. 11653
[ INFO ] Parsing input parameters
[ INFO ] Loading plugin
 
API version ............ 1.1
Build .................. ci-main-03703
Description ....... clDNNPlugin
[ INFO ] Loading network files:
/opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/ir/squeezenet1.1/squeezenet1.1.xml
/opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/ir/squeezenet1.1/squeezenet1.1.bin
[ INFO ] Preparing input blobs
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference (1 iterations)
[ INFO ] Average running time of one iteration: 5.77997 ms
[ INFO ] Processing output blobs
 
Top 10 results:
 
Image /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png
 
817 0.8363329 label sports car, sport car
511 0.0946493 label convertible
479 0.0419136 label car wheel
751 0.0091072 label racer, race car, racing car
436 0.0068162 label beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon
656 0.0037564 label minivan
586 0.0025741 label half track
717 0.0016069 label pickup, pickup truck
864 0.0012027 label tow truck, tow car, wrecker
581 0.0005882 label grille, radiator grille
 
[ INFO ] Execution successful

 

Seunghyuk_P_Intel

Hi Edward,

Sorry for the confusion.

Actually, the sample you chose is not a good one for comparing performance.

"classification_sample" is meant to show how to load the Inference Engine, read a network, run inference, and read the output.

It also runs just a single execution on a still image.

There is one thing to understand when you run inference on the GPU plugin (clDNN).

It is based on OpenCL and compiles its kernels at run time, once per execution.

So a single run on a still image, where that one-time cost cannot be spread over many inferences, is not a good way to check the performance difference.
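
To illustrate why a single iteration can mislead, here is a minimal Python sketch that times the first call separately from the average of the following calls. It does not use the Inference Engine API at all: `benchmark` and `FakeDevice` are hypothetical names invented for this illustration, and the 50 ms setup and 1 ms per-call costs are made-up values, not measurements from clDNN.

```python
import time


def benchmark(infer, iterations=100, clock=time.perf_counter):
    """Time the first call separately, then average the following calls.

    `infer` is any zero-argument callable; returns (first_ms, steady_avg_ms).
    """
    start = clock()
    infer()  # first call: includes any one-time setup cost
    first_ms = (clock() - start) * 1000.0

    start = clock()
    for _ in range(iterations):
        infer()  # later calls: setup has already been paid
    steady_avg_ms = (clock() - start) * 1000.0 / iterations
    return first_ms, steady_avg_ms


class FakeDevice:
    """Hypothetical device: pays `setup_s` once, then `per_call_s` per call."""

    def __init__(self, setup_s, per_call_s):
        self.setup_s = setup_s
        self.per_call_s = per_call_s
        self.ready = False

    def __call__(self):
        if not self.ready:
            time.sleep(self.setup_s)  # simulated one-time cost (assumed value)
            self.ready = True
        time.sleep(self.per_call_s)  # simulated per-inference cost (assumed value)


if __name__ == "__main__":
    first_ms, steady_avg_ms = benchmark(FakeDevice(0.05, 0.001), iterations=50)
    print(f"first call: {first_ms:.2f} ms, steady state: {steady_avg_ms:.2f} ms")
```

To time a real device the same way, you could pass your own inference call as `infer`. With only one iteration, as in the sample output above, the first-call figure is the only number you get.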

You would do better to choose other samples if you want to see the performance difference.

Please try the script I attached in this thread.

You should modify the input file and component paths according to your installation environment.

It is an example of face, age, gender, and head pose detection with a USB camera.

One uses the CPU for all detection and the other uses the GPU for all detection.

You will definitely see the performance difference.

Please check the "fps" numbers on screen and the CPU usage in "System Monitor".

Regards,

Peter.

Yang__Edward

Hi Peter,

Thanks a lot for sharing the script. I tried it, and it showed a significant performance difference between CPU and GPU modes.

Regards,

Edward
