Hello,
I recently purchased an Intel Movidius Neural Compute Stick 2 and I've managed to install OpenVINO on my Raspberry Pi following the instructions provided on the forum (https://software.intel.com/en-us/articles/OpenVINO-Install-RaspberryPI). What I'm trying to do now is convert my Keras model to a supported format so I can run it on the Movidius stick. First of all, is it possible to run a neural model that doesn't take an image as input?
Thank you in advance.
---
Hello Fotis,
> First of all, is it possible to run a neural model that doesn't take an image as an input?
OpenVINO supports this. "Other-than-image" input has worked fine in my products on both CPU and GPU devices, but I'm not sure if I've tried it on NCS2. I'll try tomorrow and update (I just have a build issue I need to resolve first before I can test on NCS2).
Cheers,
Nikos
---
Hi Nikos,
First of all thanks for the prompt reply.
The thing is that I have a Keras model for audio signal processing and I want to run it on my NCS2, connected to a Raspberry Pi. I have successfully installed OpenVINO on the Raspberry Pi (according to the instructions provided on the forum), so what I'm trying to do now is convert the Keras model so it can run on the NCS2. From what I understand, the model conversion is not possible on the Raspberry Pi itself, but even on an Ubuntu machine I'm still not sure how to convert an "other-than-image input" model.
Cheers,
Fotis
UPDATE: I managed to convert the model with the mo_tf.py script, but now I'm not sure how to run it on the NCS2. After converting the model to the format needed for the NCS2 (.bin and .xml files) I tried to load it on the Raspberry Pi, but when I run net = IENetwork(model="tf_model.bin", weights="tf_model.xml") I get the following error:
RuntimeError: Error reading network: input must have dimensions
---
Have you used the --input_shape parameter of mo_tf.py?
BTW, computer_vision_sdk_2018.5.445/deployment_tools/documentation/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html has some examples of how to convert networks for speech.
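For a 1-D signal it could be something like this (the shape here is just an illustration, use your model's real input dimensions):

python3 ./mo_tf.py --input_model tf_model.pb --input_shape [1,1024] --data_type FP16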
---
Also please make sure you set the input/output precision correctly, for example:

input_data->setPrecision(Precision::U8);
input_data->setLayout(Layout::NCHW);  // ?
There are a few options:

/**
 * @enum Layout
 * @brief Layouts that the inference engine supports
 */
enum Layout : uint8_t {
    ANY = 0,        // "any" layout
    // I/O data layouts
    NCHW = 1,
    NHWC = 2,
    NCDHW = 3,
    NDHWC = 4,
    // weight layouts
    OIHW = 64,
    // bias layouts
    C = 96,
    // Single image layout (for mean image)
    CHW = 128,
    // 2D
    HW = 192,
    NC = 193,
    CN = 194,
    BLOCKED = 200,
};
---
First of all, yes: I wasn't able to convert the model at all without setting the input shape. Thanks, I'll check the examples for speech models. Regarding the precision parameters, where do I define these?
---
> Regarding the precision parameters, where do I define these?
In the inference application. For example, in the case of C++, see line 619 of
computer_vision_sdk_2018.5.445/deployment_tools/inference_engine/samples/speech_sample/main.cpp
where inputPrecision and layout are set:
/** configure input precision if model loaded from IR **/
for (auto &item : inputInfo) {
    Precision inputPrecision = Precision::FP32;  // specify Precision::I16 to provide quantized inputs
    item.second->setPrecision(inputPrecision);
    item.second->getInputData()->layout = NC;    // row major layout
}
nikos
---
Hi Niko,
The thing is that I can't even load the model to specify these parameters or change anything. I'm also using Python, so I'm trying to figure out what's going on, since there are no examples for speech. I tried many different things during the model conversion, but I'm still getting the "input must have dimensions" error when I try to load the model.
Thanks
EDIT: I finally got it to work after specifying certain parameters for the conversion. I just need to figure out how to run the model now; I'll post if I have any more issues.
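My plan for running it, pieced together from the Python samples, is roughly the following (untested yet, and the file names are just placeholders):

from openvino.inference_engine import IENetwork, IEPlugin
import numpy as np

# note: model= takes the .xml and weights= takes the .bin
net = IENetwork(model="tf_model.xml", weights="tf_model.bin")
plugin = IEPlugin(device="MYRIAD")
exec_net = plugin.load(network=net)

input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))

# dummy input with the network's input shape (assuming the inputs dict
# exposes .shape in this API version); the real audio frame goes here
data = np.zeros(net.inputs[input_blob].shape, dtype=np.float32)
res = exec_net.infer(inputs={input_blob: data})
output = res[out_blob]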
---
Hello again,
So, I was able to convert the model and run it on the NCS2 on the Raspberry Pi, but so far I'm getting noisy outputs and I don't know the cause. First of all, I used data type FP16 to convert the model and run it on the NCS2 (it wasn't possible with FP32), but I've noticed that the output has 'float32' dtype whatever the input data type is. How can I check the input/output precision of the converted model in Python?
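The only thing I've found to poke at so far is something like this, but I'm not sure it's the right way (and the attributes may differ between API versions):

from openvino.inference_engine import IENetwork

net = IENetwork(model="tf_model.xml", weights="tf_model.bin")
for name, info in net.inputs.items():
    print("input: ", name, info.precision, info.shape)
for name, info in net.outputs.items():
    print("output:", name, info.precision, info.shape)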
---
Hello Foti,
It may be better to validate FP32 on the CPU device first and then move to NCS2 FP16 (-d MYRIAD); there would be fewer deltas and it would be easier to track discrepancies.
Sorry, I'm not sure about the Python API and input/output precision or validation options. My end-to-end workflow uses C++ and offers the flexibility to adjust precision and also to validate and compare results against my reference implementation. I'm sure the Python API allows all this, but I've never used it :-)
Cheers,
nikos
---
Thanks again Niko!
I'm not able to test FP32 on the Raspberry Pi because there is no CPU plugin for it right now (only MYRIAD). I've checked the input/output layers of the FP16 model and they are FP16 precision; however, the output that I'm getting from the NCS2 is in 'float32' format. I'm also still getting a kind of periodic noise in the output...
However, I found that there is a number of unsupported layers (Const) reported by the MYRIAD plugin. What should I do in this case?
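(For reference, I listed the unsupported layers with roughly the following, borrowed from the samples; I'm not sure it's the intended way for MYRIAD:)

from openvino.inference_engine import IENetwork, IEPlugin

net = IENetwork(model="tf_model.xml", weights="tf_model.bin")
plugin = IEPlugin(device="MYRIAD")
supported = plugin.get_supported_layers(net)
unsupported = [l for l in net.layers.keys() if l not in supported]
print(unsupported)  # this is where the Const layers show up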
---
Hi Foti,
> not able to test the FP32 on a raspberry pi because there is no plugin for CPU right now (only for MYRIAD).
Perhaps you could try on an x86 Linux or even Windows platform. Validating FP32 on CPU is essential in your case before moving to the Pi and NCS, for a number of reasons.
> they are FP16 precision, however the output that I am getting from the NCS2 is in 'float32'
That's not an issue. I also get FP32 out from FP16 inference. Again, this becomes irrelevant in the case of FP32 validation.
> However, I found that there is a number of unsupported layers (Const) by the plugin for MYRIAD. What should I do in this case?
I think this is the most important issue. Then again, CPU FP32 may have no such issues; you need to test CPU FP32 and see if the layers are supported there.
FWIW, in my experience a smoother validation and dev workflow is:
Native -> CPU FP32 -> run validation app
CPU FP32 -> GPU FP16 -> validate FP16
GPU FP16 -> NCS FP16 -> validate on NCS
It is slower but makes it easier to track issues.
If all that fails, then compare results layer by layer as done in another post (https://software.intel.com/en-us/forums/computer-vision/topic/801760 by Nikolaev, Viktor).
Cheers,
Nikos
---
Hi Niko,
Yesterday I tried running the FP32 model on an Ubuntu machine using the CPU, but I got a "buffer overrun" error (when I tried to load the network to the plugin). I looked around for a solution but didn't find anything. I guess I'll try the layer-by-layer comparison to see what happens. Thanks!
Cheers,
Fotis
---
Sorry, it's hard to see what the issue is without more information on the Model Optimizer parameters or the workflow in general. Are you converting frozen or non-frozen TensorFlow models, or using Caffe or something else?
If Caffe, supported layers are listed in https://software.intel.com/en-us/articles/OpenVINO-Using-Caffe#caffe-supported-layers
TensorFlow supported layers are in https://software.intel.com/en-us/articles/OpenVINO-Using-TensorFlow#tensorflow-supported-layers
> Keras model
I can see Keras, so I'm assuming you are on the TensorFlow backend and freeze to a .pb.
For TF custom layers, if needed, there is good documentation on how to offload them, but I'm not sure it would make sense in terms of performance in the case of Pi+NCS. Some info in https://software.intel.com/en-us/articles/OpenVINO-ModelOptimizer#Tensorflow-models-with-custom-layers
Some ideas from DeepSpeech may help in case more mo_tf.py parameters are needed. Sometimes it is not straightforward to convert TF to IR, and it could be the case here that you just need one more parameter and the problem will be solved.
https://software.intel.com/en-us/articles/OpenVINO-Using-tensorflow (also see section: Supported Layers and the Mapping to Intermediate Representation Layers)
To generate the DeepSpeech Intermediate Representation (IR), provide the TensorFlow DeepSpeech model to the Model Optimizer with these parameters:
python3 ./mo_tf.py --input_model path_to_model/output_graph.pb \
    --freeze_placeholder_with_value input_lengths->[16] \
    --input input_node,previous_state_h/read,previous_state_c/read \
    --input_shape [1,16,19,26],[1,2048],[1,2048] \
    --output raw_logits,lstm_fused_cell/Gather,lstm_fused_cell/Gather_1
nikos
---
Do you think the "buffer overrun" error that I got for the FP32 model could be caused by an incorrect conversion?
To go into more detail, I'm converting a Keras model to an IR representation and I tried it both with a frozen and a non-frozen model. I'm specifying the input layer name and size and the output layer name in the conversion command (as shown in the example), but I'll experiment a bit with the parameters tomorrow to see if that makes a difference.
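In case it matters, the freezing step looks more or less like this (file names are placeholders):

import tensorflow as tf
from keras import backend as K
from keras.models import load_model

model = load_model("model.h5")  # placeholder file name
sess = K.get_session()
# bake the variables into constants so mo_tf.py gets a frozen graph
frozen = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(),
    [out.op.name for out in model.outputs])
tf.train.write_graph(frozen, ".", "tf_model.pb", as_text=False)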
Fotis
---
> could be caused because of an incorrect conversion?
Yes, assuming you have no unsupported layers, I think it is possible that a conversion parameter issue is causing the inference engine buffer issue when loading weights. Coincidentally, I also got the same error two weeks ago and fixed it, but I don't remember the exact cause, poor short-term memory :-) I think it was related to input shape or NCHW vs. NHWC, but that was with 2D images, not the 1D case.
nikos
---
Hello again Niko,
I tried changing all the different parameters during the conversion, but I still get a buffer overrun error when I try to run the FP32 model on the CPU.
Additionally, I changed the 'ReLU' layers in my Keras model and now most of the unsupported layers in the FP16 model for MYRIAD are gone, but I still get the input layer reported as unsupported, and the same noisy output. I was wondering what the correct representation of the input layer for a model on MYRIAD is, because it's weird that the input layer is unsupported.
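The swap in the model definition was essentially this (layer sizes here are placeholders, not my real ones):

from keras.layers import Input, Conv1D, LeakyReLU

x = Input(shape=(1024, 1))  # placeholder input shape
# before: activation fused into the conv layer
# x = Conv1D(64, 31, padding='same', activation='relu')(x)
# after: a separate LeakyReLU layer, which converted cleanly for me
x = Conv1D(64, 31, padding='same')(x)
x = LeakyReLU(alpha=0.3)(x)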
I also tried to convert and try out the DeepSpeech model mentioned above, but when I do the conversion I get the following error:
[ ERROR ] -------------------------------------------------
[ ERROR ] ----------------- INTERNAL ERROR ----------------
[ ERROR ] Unexpected exception happened.
[ ERROR ] Please contact Model Optimizer developers and forward the following information:
[ ERROR ] Exception occurred during running replacer "None (<class 'extensions.front.tf.BlockLSTM.BlockLSTM'>)": 7
[ ERROR ] Traceback (most recent call last):
File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/utils/class_registration.py", line 114, in apply_replacements
replacer.find_and_replace_pattern(graph)
File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/front/common/replacement.py", line 125, in find_and_replace_pattern
apply_pattern(graph, action=self.replace_sub_graph, **self.pattern())
File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/middle/pattern_match.py", line 95, in apply_pattern
action(graph, match)
File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/front/common/replacement.py", line 189, in replace_sub_graph
self.replace_output_edges(graph, self.gen_output_edges_match(node, self.replace_op(graph, node)))
File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/extensions/front/tf/BlockLSTM.py", line 84, in replace_op
[graph.remove_edge(node.in_node(p).id, node.id) for p, input_data in node.in_nodes().items() if p in [5, 6, 7]]
File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/extensions/front/tf/BlockLSTM.py", line 84, in <listcomp>
[graph.remove_edge(node.in_node(p).id, node.id) for p, input_data in node.in_nodes().items() if p in [5, 6, 7]]
File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/graph/graph.py", line 329, in in_node
return self.in_nodes(control_flow=control_flow)[key]
KeyError: 7
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/main.py", line 325, in main
return driver(argv)
File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/main.py", line 267, in driver
mean_scale_values=mean_scale)
File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/pipeline/tf.py", line 248, in tf2nx
class_registration.apply_replacements(graph, class_registration.ClassType.FRONT_REPLACER)
File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/utils/class_registration.py", line 127, in apply_replacements
)) from err
Exception: Exception occurred during running replacer "None (<class 'extensions.front.tf.BlockLSTM.BlockLSTM'>)": 7
[ ERROR ] ---------------- END OF BUG REPORT --------------
[ ERROR ] -------------------------------------------------
EDIT: I finally managed to (maybe) get a correct output from the converted model using the MYRIAD plugin. I used the "--disable_nhwc_to_nchw" parameter in the conversion and now I don't see the noisy output. However, I now get a new list of unsupported layers, and most importantly the IR model suddenly got really slow (around 190 ms per iteration). What could be the cause? Also, if I compare the two xml files (before and after adding "--disable_nhwc_to_nchw") I see different dimensions for each layer.
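The conversion command now looks roughly like this (names and shape are placeholders):

python3 ./mo_tf.py --input_model tf_model.pb --input_shape [1,1024,1] --data_type FP16 --disable_nhwc_to_nchw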
---
Hello Foti,
Good find with the --disable_nhwc_to_nchw parameter. Just for the record, were you now able to run on CPU FP32 and get valid results?
> However, I now get a new list of unsupported layers
Are you using LSTM? I'm not sure if it is supported or validated yet for MYRIAD. I will ask this question in my old post (https://software.intel.com/en-us/forums/computer-vision/topic/755432).
Based on 2018 R5 release notes:
New Features in the 2018 R5 include:
Extends neural network support to include LSTM (long short-term memory) from ONNX*, TensorFlow* & MXNet* frameworks, & 3D convolutional-based networks in preview mode (CPU-only) to support additional, new use cases beyond computer vision.
> and the most important part is that the IR model suddenly got really slow (it takes around 190 ms for 1 iteration).
For that you may want to use the profiler that reports ms per layer, to get a better idea of what slows down the execution. Of course, functionality first; that's a much higher priority.
nikos
---
Hi Niko,
No, even with the --disable_nhwc_to_nchw parameter the model doesn't work on the CPU. I tried every possible parameter, but I'm still getting the "cannot create internal buffer. buffer can be overrun" error, so I don't know how to proceed with this.
The unsupported layers are again of type "Const" but originate from the conv layers of the original model. What I did before to remove the unsupported layers was to train the model with a LeakyReLU instead, but now I don't really know how to substitute the convolution layers.
The thing is that the model now works on MYRIAD (I'll verify the output tomorrow, but at first glance I think it produces a correct output), but it is really slow. How could I at least find the cause of this?
---
Try to get performance counts (us per layer) using get_perf_counts():
perf_counts = infer_request_handle.get_perf_counts()
log.info("Performance counters:")
print("{:<70} {:<15} {:<15} {:<15} {:<10}".format('name', 'layer_type', 'exec_type', 'status', 'real_time, us'))
for layer, stats in perf_counts.items():
    print("{:<70} {:<15} {:<15} {:<15} {:<10}".format(layer, stats['layer_type'], stats['exec_type'], stats['status'], stats['real_time']))
Some examples in
grep perf ./computer_vision_sdk/deployment_tools/inference_engine/samples/python_samples/*
or check the Python API docs if more information is needed on performance counters.
cheers
nikos
---
nikos wrote:
> try to get performance counts (us per layer) using get_perf_counts.
Hi Niko,
This is what I got with the performance counters:
[ INFO ] Performance counters:

name  layer_type  exec_type  status  real_time, us
LeakyReLU_  ReLU  LeakyRelu  EXECUTED  80
LeakyReLU_1178  ReLU  LeakyRelu  EXECUTED  60
LeakyReLU_1179  ReLU  LeakyRelu  EXECUTED  51
LeakyReLU_1180  ReLU  LeakyRelu  EXECUTED  71
LeakyReLU_1181  ReLU  LeakyRelu  EXECUTED  31
LeakyReLU_1182  ReLU  LeakyRelu  EXECUTED  60
LeakyReLU_1183  ReLU  LeakyRelu  EXECUTED  31
LeakyReLU_1184  ReLU  LeakyRelu  EXECUTED  50
LeakyReLU_1185  ReLU  LeakyRelu  EXECUTED  35
LeakyReLU_1186  ReLU  LeakyRelu  EXECUTED  36
LeakyReLU_1187  ReLU  LeakyRelu  EXECUTED  48
LeakyReLU_1188  ReLU  LeakyRelu  EXECUTED  51
LeakyReLU_1189  ReLU  LeakyRelu  EXECUTED  46
Receive-Tensor  Receive-Tensor  Receive-Tensor  EXECUTED  0
main_input_noisy@FP16  <Extra>  Convert_f32f16  EXECUTED  54
model_1/G_gtlayer/add  Eltwise  Sum  EXECUTED  53
model_1/G_gtlayer/add/Broadcast/  Tile  Tile  EXECUTED  41465
model_1/G_gtlayer/add/Broadcast/Reshape/After  Reshape  Reshape  EXECUTED  2647
model_1/G_gtlayer/add/Broadcast/Reshape/Before  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/G_gtlayer/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  369
model_1/G_gtlayer/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  23
model_1/G_gtlayer/convolution/Conv2D/Permute_1124  Permute  Permute  EXECUTED  56
model_1/G_gtlayer/convolution/ExpandDims  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/G_gtlayer/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/concatenate_1/concat@0@compact  Concat  Copy  EXECUTED  23
model_1/concatenate_1/concat@1@compact  Concat  Copy  EXECUTED  10
model_1/concatenate_2/concat@0@compact  Concat  Copy  EXECUTED  22
model_1/concatenate_2/concat@1@compact  Concat  Copy  EXECUTED  10
model_1/concatenate_3/concat@0@compact  Concat  Copy  EXECUTED  22
model_1/concatenate_3/concat@1@compact  Concat  Copy  EXECUTED  11
model_1/concatenate_4/concat@0@compact  Concat  Copy  EXECUTED  23
model_1/concatenate_4/concat@1@compact  Concat  Copy  EXECUTED  11
model_1/concatenate_5/concat@0@compact  Concat  Copy  EXECUTED  29
model_1/concatenate_5/concat@1@compact  Concat  Copy  EXECUTED  16
model_1/concatenate_6/concat@0@compact  Concat  Copy  EXECUTED  35
model_1/concatenate_6/concat@1@compact  Concat  Copy  EXECUTED  26
model_1/conv1d_1/add  Eltwise  Sum  EXECUTED  39
model_1/conv1d_1/add/Broadcast/  Tile  Tile  EXECUTED  20753
model_1/conv1d_1/add/Broadcast/Reshape/After  Reshape  Reshape  EXECUTED  1325
model_1/conv1d_1/add/Broadcast/Reshape/Before  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv1d_1/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  200
model_1/conv1d_1/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  50
model_1/conv1d_1/convolution/Conv2D/Permute_1128  Permute  Permute  EXECUTED  44
model_1/conv1d_1/convolution/ExpandDims  Reshape  Reshape  EXECUTED  25
model_1/conv1d_1/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv1d_2/add  Eltwise  Sum  EXECUTED  37
model_1/conv1d_2/add/Broadcast/  Tile  Tile  EXECUTED  20769
model_1/conv1d_2/add/Broadcast/Reshape/After  Reshape  Reshape  EXECUTED  683
model_1/conv1d_2/add/Broadcast/Reshape/Before  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv1d_2/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  179
model_1/conv1d_2/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  49
model_1/conv1d_2/convolution/Conv2D/Permute_1132  Permute  Permute  EXECUTED  54
model_1/conv1d_2/convolution/ExpandDims  Reshape  Reshape  EXECUTED  24
model_1/conv1d_2/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv1d_3/add  Eltwise  Sum  EXECUTED  36
model_1/conv1d_3/add/Broadcast/  Tile  Tile  EXECUTED  10383
model_1/conv1d_3/add/Broadcast/Reshape/After  Reshape  Reshape  EXECUTED  348
model_1/conv1d_3/add/Broadcast/Reshape/Before  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv1d_3/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  191
model_1/conv1d_3/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  45
model_1/conv1d_3/convolution/Conv2D/Permute_1136  Permute  Permute  EXECUTED  38
model_1/conv1d_3/convolution/ExpandDims  Reshape  Reshape  EXECUTED  23
model_1/conv1d_3/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv1d_4/add  Eltwise  Sum  EXECUTED  37
model_1/conv1d_4/add/Broadcast/  Tile  Tile  EXECUTED  10384
model_1/conv1d_4/add/Broadcast/Reshape/After  Reshape  Reshape  EXECUTED  185
model_1/conv1d_4/add/Broadcast/Reshape/Before  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv1d_4/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  218
model_1/conv1d_4/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  42
model_1/conv1d_4/convolution/Conv2D/Permute_1140  Permute  Permute  EXECUTED  38
model_1/conv1d_4/convolution/ExpandDims  Reshape  Reshape  EXECUTED  22
model_1/conv1d_4/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv1d_5/add  Eltwise  Sum  EXECUTED  38
model_1/conv1d_5/add/Broadcast/  Tile  Tile  EXECUTED  5202
model_1/conv1d_5/add/Broadcast/Reshape/After  Reshape  Reshape  EXECUTED  99
model_1/conv1d_5/add/Broadcast/Reshape/Before  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv1d_5/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  209
model_1/conv1d_5/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  42
model_1/conv1d_5/convolution/Conv2D/Permute_1144  Permute  Permute  EXECUTED  38
model_1/conv1d_5/convolution/ExpandDims  Reshape  Reshape  EXECUTED  23
model_1/conv1d_5/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv1d_6/add  Eltwise  Sum  EXECUTED  37
model_1/conv1d_6/add/Broadcast/  Tile  Tile  EXECUTED  5213
model_1/conv1d_6/add/Broadcast/Reshape/After  Reshape  Reshape  EXECUTED  56
model_1/conv1d_6/add/Broadcast/Reshape/Before  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv1d_6/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  189
model_1/conv1d_6/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  41
model_1/conv1d_6/convolution/Conv2D/Permute_1148  Permute  Permute  EXECUTED  38
model_1/conv1d_6/convolution/ExpandDims  Reshape  Reshape  EXECUTED  24
model_1/conv1d_6/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv2d_transpose_1/BiasAdd  ScaleShift  ScaleShift  EXECUTED  37
model_1/conv2d_transpose_1/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  610
model_1/conv2d_transpose_1/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  57
model_1/conv2d_transpose_1/conv2d_transpose/Permute_1152  Permute  Permute  EXECUTED  53
model_1/conv2d_transpose_2/BiasAdd  ScaleShift  ScaleShift  EXECUTED  35
model_1/conv2d_transpose_2/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  1251
model_1/conv2d_transpose_2/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  70
model_1/conv2d_transpose_2/conv2d_transpose/Permute_1156  Permute  Permute  EXECUTED  61
model_1/conv2d_transpose_3/BiasAdd  ScaleShift  ScaleShift  EXECUTED  35
model_1/conv2d_transpose_3/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  1392
model_1/conv2d_transpose_3/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  97
model_1/conv2d_transpose_3/conv2d_transpose/Permute_1160  Permute  Permute  EXECUTED  62
model_1/conv2d_transpose_4/BiasAdd  ScaleShift  ScaleShift  EXECUTED  38
model_1/conv2d_transpose_4/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  1373
model_1/conv2d_transpose_4/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  97
model_1/conv2d_transpose_4/conv2d_transpose/Permute_1164  Permute  Permute  EXECUTED  89
model_1/conv2d_transpose_5/BiasAdd  ScaleShift  ScaleShift  EXECUTED  37
model_1/conv2d_transpose_5/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  2680
model_1/conv2d_transpose_5/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  151
model_1/conv2d_transpose_5/conv2d_transpose/Permute_1168  Permute  Permute  EXECUTED  69
model_1/conv2d_transpose_6/BiasAdd  ScaleShift  ScaleShift  EXECUTED  40
model_1/conv2d_transpose_6/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  5247
model_1/conv2d_transpose_6/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  103
model_1/conv2d_transpose_6/conv2d_transpose/Permute_1172  Permute  Permute  EXECUTED  98
model_1/conv2d_transpose_7/BiasAdd  Power  Power  EXECUTED  40
model_1/conv2d_transpose_7/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  10310
model_1/conv2d_transpose_7/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  159
model_1/conv2d_transpose_7/conv2d_transpose/Permute_1176  Permute  Permute  EXECUTED  22
model_1/g_output/Reshape  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/g_output/Reshape@FP16  <Extra>  Convert_f16f32  EXECUTED  32
model_1/reshape_1/Reshape  Reshape  Reshape  EXECUTED  62
model_1/reshape_10/Reshape  Reshape  Reshape  EXECUTED  1336
model_1/reshape_11/Reshape  Reshape  Reshape  EXECUTED  232
model_1/reshape_12/Reshape  Reshape  Reshape  EXECUTED  2659
model_1/reshape_13/Reshape  Reshape  Reshape  EXECUTED  293
model_1/reshape_2/Reshape  Reshape  Reshape  EXECUTED  107
model_1/reshape_3/Reshape  Reshape  Reshape  EXECUTED  127
model_1/reshape_4/Reshape  Reshape  Reshape  EXECUTED  189
model_1/reshape_5/Reshape  Reshape  Reshape  EXECUTED  241
model_1/reshape_6/Reshape  Reshape  Reshape  EXECUTED  351
model_1/reshape_7/Reshape  Reshape  Reshape  EXECUTED  373
model_1/reshape_8/Reshape  Reshape  Reshape  EXECUTED  693
model_1/reshape_9/Reshape  Reshape  Reshape  EXECUTED  388
And this is what I get from the converted model if I don't use the --disable_nhwc_to_nchw flag:
[ INFO ] Performance counters:

name  layer_type  exec_type  status  real_time, us
LeakyReLU_  ReLU  LeakyRelu  EXECUTED  64
LeakyReLU_1129  ReLU  LeakyRelu  EXECUTED  44
LeakyReLU_1130  ReLU  LeakyRelu  EXECUTED  28
LeakyReLU_1131  ReLU  LeakyRelu  EXECUTED  80
LeakyReLU_1132  ReLU  LeakyRelu  EXECUTED  60
LeakyReLU_1133  ReLU  LeakyRelu  EXECUTED  67
LeakyReLU_1134  ReLU  LeakyRelu  EXECUTED  33
LeakyReLU_1135  ReLU  LeakyRelu  EXECUTED  28
LeakyReLU_1136  ReLU  LeakyRelu  EXECUTED  50
LeakyReLU_1137  ReLU  LeakyRelu  EXECUTED  44
LeakyReLU_1138  ReLU  LeakyRelu  EXECUTED  44
LeakyReLU_1139  ReLU  LeakyRelu  EXECUTED  51
LeakyReLU_1140  ReLU  LeakyRelu  EXECUTED  33
Receive-Tensor  Receive-Tensor  Receive-Tensor  EXECUTED  0
main_input_noisy@FP16  <Extra>  Convert_f32f16  EXECUTED  56
model_1/G_gtlayer/add  ScaleShift  ScaleShift  EXECUTED  83
model_1/G_gtlayer/convolution/Conv2D  Convolution  Conv  EXECUTED  576
model_1/G_gtlayer/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  106
model_1/G_gtlayer/convolution/ExpandDims  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/G_gtlayer/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/concatenate_1/concat@0@compact  Concat  Copy  EXECUTED  24
model_1/concatenate_1/concat@1@compact  Concat  Copy  EXECUTED  10
model_1/concatenate_2/concat@0@compact  Concat  Copy  EXECUTED  22
model_1/concatenate_2/concat@1@compact  Concat  Copy  EXECUTED  11
model_1/concatenate_3/concat@0@compact  Concat  Copy  EXECUTED  23
model_1/concatenate_3/concat@1@compact  Concat  Copy  EXECUTED  10
model_1/concatenate_4/concat@0@compact  Concat  Copy  EXECUTED  24
model_1/concatenate_4/concat@1@compact  Concat  Copy  EXECUTED  12
model_1/concatenate_5/concat@0@compact  Concat  Copy  EXECUTED  28
model_1/concatenate_5/concat@1@compact  Concat  Copy  EXECUTED  20
model_1/concatenate_6/concat@0@compact  Concat  Copy  EXECUTED  35
model_1/concatenate_6/concat@1@compact  Concat  Copy  EXECUTED  26
model_1/conv1d_1/add  ScaleShift  ScaleShift  EXECUTED  57
model_1/conv1d_1/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  209
model_1/conv1d_1/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  43
model_1/conv1d_1/convolution/ExpandDims  Reshape  Reshape  EXECUTED  322
model_1/conv1d_1/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv1d_2/add  ScaleShift  ScaleShift  EXECUTED  54
model_1/conv1d_2/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  181
model_1/conv1d_2/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  38
model_1/conv1d_2/convolution/ExpandDims  Reshape  Reshape  EXECUTED  182
model_1/conv1d_2/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv1d_3/add  ScaleShift  ScaleShift  EXECUTED  46
model_1/conv1d_3/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  198
model_1/conv1d_3/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  37
model_1/conv1d_3/convolution/ExpandDims  Reshape  Reshape  EXECUTED  249
model_1/conv1d_3/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv1d_4/add  ScaleShift  ScaleShift  EXECUTED  46
model_1/conv1d_4/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  213
model_1/conv1d_4/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  36
model_1/conv1d_4/convolution/ExpandDims  Reshape  Reshape  EXECUTED  209
model_1/conv1d_4/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv1d_5/add  ScaleShift  ScaleShift  EXECUTED  42
model_1/conv1d_5/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  214
model_1/conv1d_5/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  36
model_1/conv1d_5/convolution/ExpandDims  Reshape  Reshape  EXECUTED  195
model_1/conv1d_5/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv1d_6/add  ScaleShift  ScaleShift  EXECUTED  40
model_1/conv1d_6/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  192
model_1/conv1d_6/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  36
model_1/conv1d_6/convolution/ExpandDims  Reshape  Reshape  EXECUTED  106
model_1/conv1d_6/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/conv2d_transpose_1/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  612
model_1/conv2d_transpose_1/conv2d_transpose@biases  Deconvolution  Bias  EXECUTED  41
model_1/conv2d_transpose_2/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  1258
model_1/conv2d_transpose_2/conv2d_transpose@biases  Deconvolution  Bias  EXECUTED  41
model_1/conv2d_transpose_3/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  1400
model_1/conv2d_transpose_3/conv2d_transpose@biases  Deconvolution  Bias  EXECUTED  42
model_1/conv2d_transpose_4/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  1381
model_1/conv2d_transpose_4/conv2d_transpose@biases  Deconvolution  Bias  EXECUTED  42
model_1/conv2d_transpose_5/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  2683
model_1/conv2d_transpose_5/conv2d_transpose@biases  Deconvolution  Bias  EXECUTED  42
model_1/conv2d_transpose_6/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  5253
model_1/conv2d_transpose_6/conv2d_transpose@biases  Deconvolution  Bias  EXECUTED  44
model_1/conv2d_transpose_7/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  10313
model_1/conv2d_transpose_7/conv2d_transpose@biases  Deconvolution  Bias  EXECUTED  46
model_1/g_output/Reshape  Reshape  Reshape  OPTIMIZED_OUT  0
model_1/g_output/Reshape@FP16  <Extra>  Convert_f16f32  EXECUTED  27
model_1/reshape_1/Reshape  Reshape  Reshape  EXECUTED  60
model_1/reshape_10/Reshape  Reshape  Reshape  EXECUTED  1340
model_1/reshape_10/Reshape/Permute_  Permute  Permute  EXECUTED  63
model_1/reshape_11/Reshape  Reshape  Reshape  EXECUTED  396
model_1/reshape_12/Reshape  Reshape  Reshape  EXECUTED  2644
model_1/reshape_12/Reshape/Permute_  Permute  Permute  EXECUTED  93
model_1/reshape_13/Reshape  Reshape  Reshape  EXECUTED  605
model_1/reshape_2/Reshape  Reshape  Reshape  EXECUTED  103
model_1/reshape_2/Reshape/Permute_  Permute  Permute  EXECUTED  41
model_1/reshape_3/Reshape  Reshape  Reshape  EXECUTED  108
model_1/reshape_4/Reshape  Reshape  Reshape  EXECUTED  191
model_1/reshape_4/Reshape/Permute_  Permute  Permute  EXECUTED  55
model_1/reshape_5/Reshape  Reshape  Reshape  EXECUTED  201
model_1/reshape_6/Reshape  Reshape  Reshape  EXECUTED  352
model_1/reshape_6/Reshape/Permute_  Permute  Permute  EXECUTED  55
model_1/reshape_7/Reshape  Reshape  Reshape  EXECUTED  378
model_1/reshape_8/Reshape  Reshape  Reshape  EXECUTED  708
model_1/reshape_8/Reshape/Permute_  Permute  Permute  EXECUTED  83
model_1/reshape_9/Reshape  Reshape  Reshape  EXECUTED  485
Sorry for the long post... What I observe is that, apart from all the times being higher in the first case, there are also these Broadcast (Tile) layers added that are really slow to execute.
EDIT: I re-trained the model without a bias in the weights and now it works like a charm, with a reasonable time per iteration... It seems the bias was the constant operation that was producing all the unsupported layers. However, the input layer still remains in the list of unsupported layers for some reason, although it doesn't seem to affect the output. I'm not sure whether it affects performance, though: right now each iteration takes about 50 ms for an input/output feature vector of size 1024, which is reasonable but not especially fast.
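In Keras terms the change was essentially this (again with placeholder layer sizes):

from keras.layers import Input, Conv1D

x = Input(shape=(1024, 1))  # placeholder input shape
# dropping the bias removed the Const layers in my case
x = Conv1D(64, 31, padding='same', use_bias=False)(x)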
