Intel® Distribution of OpenVINO™ Toolkit
Community assistance for the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Running the demo script in GPU mode takes much more time per iteration than in CPU mode

Yang__Edward
Beginner

OpenVINO version is 2018.2.300

OS Ubuntu 16.04.03

Platform Core i5-8400

=======================================================================================

[Summary]

I compared the performance of GPU and CPU by running the demo script "demo_squeezenet_download_convert_run.sh":

- CPU mode, Average running time of one iteration: 2.27222 ms

- GPU mode, Average running time of one iteration: 5.77997 ms

It seems unreasonable that the GPU took so much longer per iteration. Could anyone advise what happened and how to improve GPU performance? Thanks!

Edward

=======================================================================================

[Output Message: CPU Mode]

Run Inference Engine classification sample
 
Run ./classification_sample -d CPU -i /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png -m /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/ir/squeezenet1.1/squeezenet1.1.xml
 
[ INFO ] InferenceEngine: 
API version ............ 1.1
Build .................. 11653
[ INFO ] Parsing input parameters
[ INFO ] Loading plugin
 
API version ............ 1.1
Build .................. lnx_20180510
Description ....... MKLDNNPlugin
[ INFO ] Loading network files:
/opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/ir/squeezenet1.1/squeezenet1.1.xml
/opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/ir/squeezenet1.1/squeezenet1.1.bin
[ INFO ] Preparing input blobs
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference (1 iterations)
[ INFO ] Average running time of one iteration: 2.27222 ms
[ INFO ] Processing output blobs
 
Top 10 results:
 
Image /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png
 
817 0.8363345 label sports car, sport car
511 0.0946488 label convertible
479 0.0419131 label car wheel
751 0.0091071 label racer, race car, racing car
436 0.0068161 label beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon
656 0.0037564 label minivan
586 0.0025741 label half track
717 0.0016069 label pickup, pickup truck
864 0.0012027 label tow truck, tow car, wrecker
581 0.0005882 label grille, radiator grille
 
[ INFO ] Execution successful

 

=======================================================================================

[Output Message: GPU Mode]

Run Inference Engine classification sample
 
Run ./classification_sample -d GPU -i /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png -m /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/ir/squeezenet1.1/squeezenet1.1.xml
 
[ INFO ] InferenceEngine: 
API version ............ 1.1
Build .................. 11653
[ INFO ] Parsing input parameters
[ INFO ] Loading plugin
 
API version ............ 1.1
Build .................. ci-main-03703
Description ....... clDNNPlugin
[ INFO ] Loading network files:
/opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/ir/squeezenet1.1/squeezenet1.1.xml
/opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/ir/squeezenet1.1/squeezenet1.1.bin
[ INFO ] Preparing input blobs
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference (1 iterations)
[ INFO ] Average running time of one iteration: 5.77997 ms
[ INFO ] Processing output blobs
 
Top 10 results:
 
Image /opt/intel/computer_vision_sdk/deployment_tools/demo/../demo/car.png
 
817 0.8363329 label sports car, sport car
511 0.0946493 label convertible
479 0.0419136 label car wheel
751 0.0091072 label racer, race car, racing car
436 0.0068162 label beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon
656 0.0037564 label minivan
586 0.0025741 label half track
717 0.0016069 label pickup, pickup truck
864 0.0012027 label tow truck, tow car, wrecker
581 0.0005882 label grille, radiator grille
 
[ INFO ] Execution successful

 

2 Replies
Seunghyuk_P_Intel

Hi Edward,

Sorry for the confusion.

Actually, the sample you chose is not a good one for comparing performance.

"classification_sample" just demonstrates how to load the Inference Engine, read a network, run inference, and read the output.

It executes only once, on a still image.

There is one thing to understand when you run inference on the GPU engine (clDNN): it is based on OpenCL, and it compiles its kernels at run time, once per execution.

So a single run on a still image is not a good sample for checking the performance difference.

You'd be better off choosing other samples if you want to see the performance difference.
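The warm-up effect described here can be sketched with a small Python timing harness. This is illustrative only: `FakeGpuInfer` is a hypothetical stand-in for a plugin that pays a one-time kernel-compilation cost on its first call, not the OpenVINO API.

```python
import time

def timed_average_ms(infer, iterations, warmup=0):
    """Average per-iteration time in ms, optionally discarding warm-up runs."""
    for _ in range(warmup):
        infer()  # absorb one-time costs (e.g. runtime kernel compilation)
    start = time.perf_counter()
    for _ in range(iterations):
        infer()
    return (time.perf_counter() - start) / iterations * 1000.0

class FakeGpuInfer:
    """Hypothetical stand-in: the first call pays a one-time 'compile' cost."""
    def __init__(self):
        self.compiled = False
    def __call__(self):
        if not self.compiled:
            time.sleep(0.004)   # pretend kernel compilation takes ~4 ms
            self.compiled = True
        time.sleep(0.001)       # steady-state inference ~1 ms

cold = timed_average_ms(FakeGpuInfer(), iterations=1)               # includes compile cost
warm = timed_average_ms(FakeGpuInfer(), iterations=100, warmup=1)   # compile cost excluded
print(f"cold: {cold:.2f} ms, warm: {warm:.2f} ms")
```

With one iteration and no warm-up, the measured "average" is dominated by the one-time compilation, which is exactly what makes a single-run sample misleading for GPU benchmarking.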

Please try the script I attached in this thread.

You should modify the input file and component paths according to your installation environment.

It is an example of face, age, gender, and head-pose detection with a USB camera.

One version uses the CPU for all detections and the other uses the GPU for all detections.

You will definitely see the performance differences.

Please check the "fps" numbers on screen and check CPU usage in "System Monitor".

Regards,

Peter.

Yang__Edward
Beginner

Hi Peter,

Thanks a lot for sharing the script. I tried it, and it showed a significant performance difference between CPU and GPU modes.

Regards,

Edward
