Hello,
I am playing with the VoVNet-Architecture which uses concatenation layers in every module:
def OSAModule(self, input_tensor, channel, bottleneck, aggr_times=5):
    x = input_tensor
    aggr = []
    for i in range(aggr_times):
        x = self.conv_bn_relu(x, channel)
        aggr.append(x)
    x = Concatenate()(aggr)
    x = self.conv_bn_relu(x, bottleneck, kernel=1)
    return x
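For context, conv_bn_relu is just a plain Conv -> BatchNorm -> ReLU helper. A minimal sketch of what it roughly looks like (the kernel size, padding and tf.keras imports here are placeholders, not my exact code):

from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU

def conv_bn_relu(self, x, filters, kernel=3):
    # standard Conv2D -> BatchNorm -> ReLU stack used inside each OSA module
    x = Conv2D(filters, kernel, padding='same', use_bias=False)(x)
    x = BatchNormalization()(x)
    return ReLU()(x)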
Executing this architecture on the CPU works fine, but running it on the UHD 600 GPU device it gets horribly slow. So I created a performance report, which shows that the concatenation is "optimized out", but obviously in a way that hurts performance.
Name              | execStatus    | layerType   | execType                     | realTime (ms) | cpuTime (ms)
OSA_10-0_Conv     | EXECUTED      | Convolution | convolution_gpu_bfyx_f1      | 69.320000     | 0.023000
OSA_10-1_Conv     | EXECUTED      | Convolution | fused_conv_eltwise_gpu_ref   | 293.476000    | 0.024000
OSA_10-2_Conv     | EXECUTED      | Convolution | fused_conv_eltwise_gpu_ref   | 293.353000    | 0.022000
OSA_10_Concat     | OPTIMIZED_OUT | Concat      | undef                        | 0.000000      | 0.000000
OSA_10_Projection | EXECUTED      | Convolution | convolution_gpu_bfyx_f16_1x1 | 5.434000      | 0.024000
Any ideas on how to "unoptimize" this (avoid this optimization)?
System: Intel Celeron N4000 - GPU Intel® UHD Graphics 600
OS: Ubuntu 18.04
OpenVino: 2020.2
Best Gerald
Hi Gerald,
Greetings to you.
The VoVNet architecture is not currently a supported topology in OpenVINO.
Having said that, regarding your question on how to keep the Concatenate layer from being optimized out, do try adding the general (framework-agnostic) parameter --finegrain_fusing to your Model Optimizer launch command.
You can obtain more information at the following pages:
Please share more information about your model: whether it is an object detection or classification model, the layers used if it is a custom model, the command given to Model Optimizer to convert the trained model to Intermediate Representation (IR), sample code to run the model, and your environment details (versions of Python, CMake, etc.). If possible, please share the trained model files so that we can reproduce your issue (files can be shared via Private Message).
Also, please share how you created the performance report that you posted.
Regards,
Munesh
Hi Munesh,
thanks for your answer. I'm not allowed to share the model, but I hope the following details will give you a better picture.
Source model format: Caffe 1.0 (training is done with TF 1.14/Keras, and the Keras model is then converted to Caffe)
Programs used/tried: benchmark demo to create the performance report with --detailed_counters
Model code: VoVNet for Keras
I've tried to disable the optimization by using --finegrain_fusing OSA_10_Concat,OSA_20_Concat etc., but the resulting model xml and bin files look exactly the same as the ones generated without the flag. The performance report also still shows the concatenation layer as optimized out.
Additionally, I tried adding the --disable_fusing flag, but that doesn't change anything either.
If I execute the same model on a MYRIAD (NCS2) device, I can see that the concatenation layer is actually executed.
To me, it seems to be some kind of on-the-fly optimization in the GPU plugin. Could that be the case?
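In case it helps to reproduce the numbers without the benchmark demo, this is roughly how the same per-layer counters can be pulled from the Python API (a minimal sketch assuming the 2020.2 Inference Engine Python API; the file names are placeholders):

from openvino.inference_engine import IECore
import numpy as np

ie = IECore()
net = ie.read_network(model="vovnet.xml", weights="vovnet.bin")  # placeholder paths
exec_net = ie.load_network(network=net, device_name="GPU", config={"PERF_COUNT": "YES"})

input_blob = next(iter(net.inputs))
shape = net.inputs[input_blob].shape
exec_net.infer({input_blob: np.zeros(shape, dtype=np.float32)})

# per-layer counters: execution status, exec_type and timings as reported by the plugin
for name, stats in exec_net.requests[0].get_perf_counts().items():
    print(name, stats["status"], stats["exec_type"], stats["real_time"])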
Hi Gerald,
Thank you for sharing information about your model and providing the updates.
Optimization-wise, the GPU plugin supports algorithms that fuse several operations into one optimized operation. Among them is ‘Optimizing Layers Out’, where the Concatenate layer is optimized out under certain conditions.
More information is available at the following link:
Apart from that, for your information, the GPU plugin uses the Intel® Compute Library for Deep Neural Networks (clDNN) to infer deep neural networks, and it is important to note that clDNN is not optimized for the Intel® UHD Graphics 600 processor.
The list of integrated graphics processors that clDNN is optimized for is available at the following link under the section ‘System Requirements’.
https://github.com/intel/clDNN
The following paper, ‘Accelerate Deep Learning Inference with Integrated Intel® Processor Graphics Rev 2.0’ contains more relevant information, and is available at the following link:
Regards,
Munesh