Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision-related on Intel® platforms.

Concat extremely slow on HD 600 GPU

Gerald
Beginner
1,514 Views

Hello,

I am playing with the VoVNet architecture, which uses a concatenation layer in every module:

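    # Assumes Concatenate has been imported from keras.layers (or tensorflow.keras.layers).
    def OSAModule(self, input_tensor, channel, bottleneck, aggr_times=5):
        x = input_tensor
        aggr = []
        # Chain aggr_times conv blocks and keep every intermediate output.
        for _ in range(aggr_times):
            x = self.conv_bn_relu(x, channel)
            aggr.append(x)

        # Concatenate all intermediate outputs (last axis by default),
        # then project to `bottleneck` channels with a 1x1 convolution.
        x = Concatenate()(aggr)
        x = self.conv_bn_relu(x, bottleneck, kernel=1)
        return x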
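
For reference, the channel bookkeeping of this module can be sketched without Keras. This is only an illustration: the helper name `osa_channels` and the example sizes are made up, and it assumes that every `conv_bn_relu` call outputs exactly `channel` feature maps.

```python
def osa_channels(channel, bottleneck, aggr_times=5):
    """Return (channels entering the 1x1 projection, channels leaving it)."""
    # Each of the aggr_times convolutions contributes `channel` feature maps,
    # and the concatenation stacks them along the channel axis.
    concat_channels = channel * aggr_times
    return concat_channels, bottleneck


# Hypothetical sizes, not taken from the actual model.
print(osa_channels(channel=128, bottleneck=256))
```

The width of the concatenated tensor grows linearly with `aggr_times`, so the 1x1 projection always sees a much wider input than any single convolution in the module.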

Executing this architecture on the CPU works fine, but on an HD 600 GPU device it becomes horribly slow. I created a performance report, which shows that the concatenation is "optimized out", but apparently in a way that makes things worse.

Name              | execStatus        | layerType   | execType                       | realTime (ms)   | cpuTime (ms)
OSA_10-0_Conv     | EXECUTED          | Convolution | convolution_gpu_bfyx_f1        | 69.320000       | 0.023000
OSA_10-1_Conv     | EXECUTED          | Convolution | fused_conv_eltwise_gpu_ref     | 293.476000      | 0.024000
OSA_10-2_Conv     | EXECUTED          | Convolution | fused_conv_eltwise_gpu_ref     | 293.353000      | 0.022000
OSA_10_Concat     | OPTIMIZED_OUT     | Concat      | undef                          | 0.000000        | 0.000000
OSA_10_Projection | EXECUTED          | Convolution | convolution_gpu_bfyx_f16_1x1   | 5.434000        | 0.024000
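
To make the imbalance in the report easier to see, the rows can be grouped by execType. The snippet below is a small sketch that parses the rows exactly as posted above; the function name `real_time_by_exec_type` is just an illustrative choice.

```python
from collections import defaultdict

# Rows copied from the performance report above (header omitted).
REPORT = """\
OSA_10-0_Conv     | EXECUTED          | Convolution | convolution_gpu_bfyx_f1        | 69.320000       | 0.023000
OSA_10-1_Conv     | EXECUTED          | Convolution | fused_conv_eltwise_gpu_ref     | 293.476000      | 0.024000
OSA_10-2_Conv     | EXECUTED          | Convolution | fused_conv_eltwise_gpu_ref     | 293.353000      | 0.022000
OSA_10_Concat     | OPTIMIZED_OUT     | Concat      | undef                          | 0.000000        | 0.000000
OSA_10_Projection | EXECUTED          | Convolution | convolution_gpu_bfyx_f16_1x1   | 5.434000        | 0.024000
"""


def real_time_by_exec_type(report):
    """Sum the realTime column (ms) for each execType in the report."""
    totals = defaultdict(float)
    for line in report.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        exec_type, real_ms = cells[3], float(cells[4])
        totals[exec_type] += real_ms
    return dict(totals)


totals = real_time_by_exec_type(REPORT)
for exec_type, ms in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{exec_type:32s} {ms:10.3f} ms")
```

In these rows, the two convolutions that run as `fused_conv_eltwise_gpu_ref` account for almost all of the measured time, while the concatenation itself costs nothing.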

 

Any ideas on how to "unoptimize" this (avoid this optimization)?

 

System: Intel Celeron N4000 - GPU Intel® UHD Graphics 600

OS: Ubuntu 18.04

OpenVINO: 2020.2

 

Best,
Gerald

3 Replies
Munesh_Intel
Moderator
1,514 Views

Hi Gerald,

Greetings to you.

The VoVNet architecture is not currently a supported topology in OpenVINO.

That said, to keep the Concatenate layer from being optimized out, try adding the general (framework-agnostic) parameter --finegrain_fusing to your Model Optimizer launch script.

You can obtain more information at the following pages:

https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_Model_Optimization_Techniques.html#disable_fusing

https://docs.openvinotoolkit.org/2020.2/_docs_MO_DG_prepare_model_convert_model_Converting_Model_General.html
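
As a rough illustration of the suggestion above, a Model Optimizer command line with --finegrain_fusing could be assembled as follows. This is a sketch only: the model file name is a placeholder, the layer names are examples, and the helper `build_mo_command` is hypothetical. The command is printed, not executed.

```python
import shlex


def build_mo_command(input_model, keep_unfused):
    """Build a Model Optimizer command that excludes some layers from fusing."""
    return [
        "python3", "mo.py",
        "--input_model", input_model,
        # Comma-separated names (or regular expressions) of layers to leave unfused.
        "--finegrain_fusing", ",".join(keep_unfused),
    ]


# Placeholder model file and example layer names.
cmd = build_mo_command("model_file", ["OSA_10_Concat", "OSA_20_Concat"])
print(" ".join(shlex.quote(part) for part in cmd))
```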

Please share more information about your model: whether it is an object detection or classification model, the layers used if it is a custom model, the command given to Model Optimizer to convert the trained model to Intermediate Representation (IR), sample code to run the model, and environment details (versions of Python, CMake, etc.). If possible, please share the trained model files so that we can reproduce your issue (files can be shared via Private Message).

Also, please tell us how you created the performance report that you posted.

 

Regards,

Munesh

Gerald
Beginner
1,514 Views

Hi Munesh,

Thanks for your answer. I'm not allowed to share the model, but I hope the following details will give you a better picture.

Source model format : Caffe 1.0 (training is done using TF 1.14, and the Keras model is then converted to Caffe)

Programs used/tried : benchmark demo to create the performance report with --detailed_counters

Model code: VoV-Net for keras

I've tried to disable the optimization by using --finegrain_fusing OSA_10_Concat,OSA_20_Concat etc., but the resulting model .xml and .bin files look exactly the same as the ones generated without the flag. The performance report also still shows the concatenation layer as optimized out.
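
One way to check whether two IR files really describe the same layers is to compare the `<layer>` elements of the .xml files. The snippet below is a minimal sketch: the two inline XML strings are made-up stand-ins for real IR files, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def ir_layers(xml_text):
    """Return (name, type) for every <layer> element of an IR .xml document."""
    root = ET.fromstring(xml_text)
    return [(layer.get("name"), layer.get("type")) for layer in root.iter("layer")]


def ir_diff(xml_a, xml_b):
    """Return the layers found only in the first and only in the second IR."""
    a, b = set(ir_layers(xml_a)), set(ir_layers(xml_b))
    return sorted(a - b), sorted(b - a)


# Made-up miniature IR documents; real files would be read from disk instead.
WITHOUT_FLAG = """<net name="m" version="10"><layers>
<layer id="0" name="OSA_10-0_Conv" type="Convolution"/>
<layer id="1" name="OSA_10_Concat" type="Concat"/>
</layers></net>"""
WITH_FLAG = WITHOUT_FLAG

only_without, only_with = ir_diff(WITHOUT_FLAG, WITH_FLAG)
print(only_without, only_with)
```

If both lists are empty while the Concat layer is still present in the IR, the layer is not being removed at conversion time.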

Additionally, I tried adding the --disable_fusing flag, but that doesn't change anything either.

If I execute the same model on a MYRIAD (NCS2) device I can see that the concatenation is being executed.

To me, this looks like some kind of "on-the-fly" optimization inside the GPU plugin. Could that be the case?

Munesh_Intel
Moderator
1,515 Views

Hi Gerald,

Thank you for sharing information about your model and providing the updates.

In terms of optimization, the GPU plugin supports algorithms that fuse several operations into one optimized operation. One of them is ‘Optimizing Layers Out’, where the Concatenate layer is optimized out under certain conditions.

More information is available at the following link:

https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_CL_DNN.html#optimizing_layers_out
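
To see which layers the plugin optimized out at run time, the per-layer performance counters can be filtered by status. The sketch below works on a plain dictionary. It assumes the layout returned by `get_perf_counts()` on an infer request in the Inference Engine Python API (2020.x), with a `status` entry per layer; the sample data is hand-written from the report earlier in this thread.

```python
def optimized_out_layers(perf_counts):
    """Return the names of all layers whose status is OPTIMIZED_OUT."""
    return sorted(
        name
        for name, stats in perf_counts.items()
        if stats.get("status") == "OPTIMIZED_OUT"
    )


# Hand-written sample that mirrors the posted report; with the real API the
# dictionary would come from an infer request after running inference with
# performance counting enabled (assumption, not verified on this setup).
sample = {
    "OSA_10-1_Conv": {"status": "EXECUTED", "exec_type": "fused_conv_eltwise_gpu_ref"},
    "OSA_10_Concat": {"status": "OPTIMIZED_OUT", "exec_type": "undef"},
    "OSA_10_Projection": {"status": "EXECUTED", "exec_type": "convolution_gpu_bfyx_f16_1x1"},
}
print(optimized_out_layers(sample))
```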

 

Apart from that, the GPU plugin uses the Intel® Compute Library for Deep Neural Networks (clDNN) to run inference on deep neural networks, and it is important to note that clDNN is not optimized for Intel® UHD Graphics 600.

The list of integrated graphics processors that clDNN is optimized for is available at the following link under the section ‘System Requirements’.

https://github.com/intel/clDNN

 

The following paper, ‘Accelerate Deep Learning Inference with Integrated Intel® Processor Graphics Rev 2.0’ contains more relevant information, and is available at the following link:

https://software.intel.com/content/www/us/en/develop/articles/accelerate-deep-learning-inference-with-integrated-intel-processor-graphics-rev-2-0.html

 

Regards,

Munesh

 
