Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

OpenVINO + Intel GPU performance for larger images

Hamaji__Shinichiro

While evaluating OpenVINO on Intel GPUs, we found that efficiency gets worse for high-resolution input images than for low-resolution ones. Some GFMA/sec numbers we have obtained so far (columns are square input resolutions in pixels):

ResNet50 (GFMA/sec)    224   384   448   672   896
IrisPlus650_fp16     266.1 265.4 242.4 215.3 155.8
IrisPlus650_fp32     183.6 145.0 151.7 121.5 129.0
UHD630_fp16          193.4 205.5 190.5 170.6 124.2
UHD630_fp32          130.1 124.6 119.7 106.5 102.8
UHD600_fp16           47.0  45.3  33.9  27.6  23.1
UHD600_fp32           30.7  35.2  16.1  19.4  18.2

We ran inference 100 times for each image size (so the GPU was throttled due to heat, IIUC), took the median of the elapsed times, and calculated GFMA/sec. The estimated FMA counts are ~4 GFMA, ~12 GFMA, ~16 GFMA, ~36 GFMA, and ~64 GFMA for resolutions 224, 384, 448, 672, and 896, respectively.
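
For reference, this is how we turn the measured times into the throughput metric; a minimal Python sketch (the elapsed times below are illustrative, not our actual measurements):

import statistics

# Estimated FMA counts per inference, in GFMA (from the table above).
GFMA = {224: 4.0, 384: 12.0, 448: 16.0, 672: 36.0, 896: 64.0}

def gfma_per_sec(resolution, elapsed_seconds):
    # Median over the runs, to reduce the effect of throttling outliers.
    median = statistics.median(elapsed_seconds)
    return GFMA[resolution] / median

# e.g. a median of ~15 ms per 224x224 inference gives ~266 GFMA/sec.
print(gfma_per_sec(224, [0.0150, 0.0151, 0.0149]))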

 

I think this is a bit unorthodox, since accelerators (e.g., NVIDIA GPUs) usually perform better on larger inputs, so I'm guessing this is not the best performance of the GPU and the framework is failing to fully utilize it. I observed similar tendencies for other CV models, too. As nGraph performs similarly, I'm guessing the culprit is libclDNN, but I'm not sure at all.

 

Here's a shell script to reproduce my experiment:

# Please change this path to your dldt (OpenVINO) checkout.
DLDT=/path/to/dldt

# Fetch the benchmark driver scripts.
curl -O https://raw.githubusercontent.com/pfnet-research/chainer-compiler/eccf127c7bfdd4f16f4e34f4b625d6f5810e7fe0/utils/run_onnx_dldt.py
curl -O https://raw.githubusercontent.com/pfnet-research/chainer-compiler/eccf127c7bfdd4f16f4e34f4b625d6f5810e7fe0/utils/run_onnx_util.py

# Fetch the ResNet50 ONNX models exported for each input resolution.
curl -O http://shinh.skr.jp/t/resnet50.tgz
tar -xvzf resnet50.tgz

# Run FP16 inference on the GPU for each resolution.
for i in resnet50/resnet50_*; do
  PYTHONPATH=$DLDT/model-optimizer:$DLDT/inference-engine/bin/intel64/Release/lib/python_api/python3.6 \
  LD_LIBRARY_PATH=$DLDT/inference-engine/bin/intel64/Release/lib \
  python3 run_onnx_dldt.py "$i" -I 6 --device=GPU --data_type=FP16
done

Any comments/suggestions would be really appreciated. Thanks!

 

4 Replies
Shubha_R_Intel
Employee

Dear Hamaji, Shinichiro,

I have escalated this issue to an OpenVINO developer. In the meantime, could you kindly download OpenVINO 2019 R2 and redo your measurements? Many performance issues have been fixed in this release.

Thanks. Please report your findings here.

Sincerely,

Shubha

Shubha_R_Intel
Employee
Solution

Dear Hamaji, Shinichiro,

Upon further researching this issue, we have found that a performance improvement is not expected with bigger images in this case. The reason is that for your topology, ResNet50, the GPU plugin chooses fully optimized kernels. These kernels are implemented specifically for image_size=224x224, and they already utilize the GPU very well. With such specialized kernels, inferencing bigger images leads to worse performance, so your observations are expected behavior, not a bug.
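
If you would like to see which kernels the plugin selected, you can enable per-layer performance counters and inspect each layer's exec_type. A minimal sketch with the Inference Engine Python API of that era (the IR file names are placeholders; generate them with the Model Optimizer first):

import numpy as np
from openvino.inference_engine import IECore, IENetwork

# Placeholder IR files produced by the Model Optimizer.
net = IENetwork(model="resnet50.xml", weights="resnet50.bin")
ie = IECore()
exec_net = ie.load_network(network=net, device_name="GPU",
                           config={"PERF_COUNT": "YES"})

# Run one inference on dummy data of the network's input shape.
input_blob = next(iter(net.inputs))
shape = net.inputs[input_blob].shape
exec_net.infer({input_blob: np.zeros(shape, dtype=np.float32)})

# exec_type names the kernel implementation chosen for each layer,
# e.g. a convolution kernel specialized for the 224x224 case.
for layer, stats in exec_net.requests[0].get_perf_counts().items():
    print(layer, stats["layer_type"], stats["exec_type"],
          stats["real_time"], "us")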

Hope it helps.

Thanks,

Shubha

Hamaji__Shinichiro

Got it. Thanks for your answer!

Shubha_R_Intel
Employee

Dear Hamaji, Shinichiro,

Of course. Thanks for using OpenVINO!

Shubha
