Hello,
I recently purchased an Intel Movidius Neural Compute Stick 2 and I've managed to install OpenVINO on my Raspberry Pi following the instructions provided on the forum (https://software.intel.com/en-us/articles/OpenVINO-Install-RaspberryPI). What I'm trying to do now is convert my Keras model to a supported format in order to run it on the Movidius stick. First of all, is it possible to run a neural model that doesn't take an image as input?
Thank you in advance.
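For what it's worth, IR inputs are just tensors, so a non-image input should be feasible in principle; the main constraint is matching the input shape declared in the IR. A minimal NumPy-only sketch (the sample count and N,C,H,W layout here are assumptions for illustration, not taken from any particular model):

```python
import numpy as np

# Sketch: a 1-D, non-image input (e.g. 8000 audio samples) shaped into
# the 4-D N, C, H, W blob that OpenVINO plugins typically expect.
# The sample count and layout are assumptions for illustration.
samples = np.random.rand(8000).astype(np.float32)

blob = samples.reshape(1, 1, 1, 8000)  # batch=1, channels=1, H=1, W=8000

print(blob.shape)
```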
Right, disable_nhwc_to_nchw seems to cause overhead by reshaping on the device; it adds the following layers and gets really slow...
model_1/G_gtlayer/add/Broadcast/  Tile  Tile  EXECUTED  41465
model_1/G_gtlayer/add/Broadcast/Reshape/After  Reshape  Reshape  EXECUTED  2647
model_1/G_gtlayer/add/Broadcast/Reshape/Before  Reshape  Reshape  OPTIMIZED_OUT  0
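For intuition, the reorder that ends up running on the device is essentially the following layout transpose (plain NumPy with made-up shapes; this is not the plug-in's actual code):

```python
import numpy as np

# Illustration of the NHWC -> NCHW reorder the device has to perform at
# runtime when the conversion is disabled in the IR. Shapes are made up.
x_nhwc = np.arange(2 * 4 * 4 * 3, dtype=np.float32).reshape(2, 4, 4, 3)

# NHWC (batch, height, width, channels) -> NCHW (batch, channels, height, width)
x_nchw = x_nhwc.transpose(0, 3, 1, 2)

print(x_nchw.shape)
```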
Hello again,
> without a bias in the weights and now it works like a charm
Just saw the edit, good progress!
> but it's not significantly fast.
I think batching may be worth trying if latency allows it; it could speed up execution. The MYRIAD plug-in supports batching, right?
nikos wrote: I think batching may be worth trying if latency allows it; it could speed up execution. The MYRIAD plug-in supports batching, right?
I think it does, but since I am trying out a real-time processing application, I need the output of each (batch-size-one) input as soon as possible, so I don't think I can use batch processing.
If this NHWC-to-NCHW conversion is the cause of the slow performance, do you think I could somehow train the model from scratch to avoid using this parameter during conversion and get a performance increase? I can post the performance counters for the model trained without the bias tomorrow, to check what is causing this delay.
Fotis
> real-time processing application
Right, it depends on how much latency you can afford. Unless, of course, you have multiple channels to process, in which case you could batch frames from your channels with no added latency.
> nhwc to nchw conversion is the cause of the slow performance, do you think that I could somehow train the model from scratch to avoid using this parameter in the conversion and get a performance increase?
Yes, I believe that should be possible.
nikos
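A small sketch of the multi-channel idea above (NumPy only; the channel count and frame shape are hypothetical):

```python
import numpy as np

# One frame per channel, stacked into a single batch: batching raises
# throughput without adding per-stream latency, since no stream waits
# for extra frames of its own. Channel count and shape are hypothetical.
num_channels = 4
frames = [np.zeros((3, 64, 64), dtype=np.float32) for _ in range(num_channels)]

batch = np.stack(frames)  # one inference call covers all channels

print(batch.shape)
```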
nikos wrote: Right, it depends on how much latency you can afford. Unless, of course, you have multiple channels to process, in which case you could batch frames from your channels with no added latency.
The lower the better, so I can't really increase the batch size. Also, it is a single-channel processing algorithm...
Below you can find the performance counters for the latest model:
[ INFO ] Performance counters:
name  layer_type  exec_type  status  real_time, us
G_gtlayer/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  362
G_gtlayer/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  34
G_gtlayer/convolution/Conv2D/Permute_1012  Permute  Permute  EXECUTED  50
G_gtlayer/convolution/ExpandDims  Reshape  Reshape  OPTIMIZED_OUT  0
G_gtlayer/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
LeakyReLU_  ReLU  LeakyRelu  EXECUTED  52
LeakyReLU_1066  ReLU  LeakyRelu  EXECUTED  72
LeakyReLU_1067  ReLU  LeakyRelu  EXECUTED  61
LeakyReLU_1068  ReLU  LeakyRelu  EXECUTED  38
LeakyReLU_1069  ReLU  LeakyRelu  EXECUTED  51
LeakyReLU_1070  ReLU  LeakyRelu  EXECUTED  79
LeakyReLU_1071  ReLU  LeakyRelu  EXECUTED  35
LeakyReLU_1072  ReLU  LeakyRelu  EXECUTED  63
LeakyReLU_1073  ReLU  LeakyRelu  EXECUTED  51
LeakyReLU_1074  ReLU  LeakyRelu  EXECUTED  56
LeakyReLU_1075  ReLU  LeakyRelu  EXECUTED  76
LeakyReLU_1076  ReLU  LeakyRelu  EXECUTED  79
LeakyReLU_1077  ReLU  LeakyRelu  EXECUTED  40
Receive-Tensor  Receive-Tensor  Receive-Tensor  EXECUTED  0
concatenate_1/concat@0@compact  Concat  Copy  EXECUTED  25
concatenate_1/concat@1@compact  Concat  Copy  EXECUTED  11
concatenate_2/concat@0@compact  Concat  Copy  EXECUTED  22
concatenate_2/concat@1@compact  Concat  Copy  EXECUTED  10
concatenate_3/concat@0@compact  Concat  Copy  EXECUTED  23
concatenate_3/concat@1@compact  Concat  Copy  EXECUTED  11
concatenate_4/concat@0@compact  Concat  Copy  EXECUTED  23
concatenate_4/concat@1@compact  Concat  Copy  EXECUTED  12
concatenate_5/concat@0@compact  Concat  Copy  EXECUTED  25
concatenate_5/concat@1@compact  Concat  Copy  EXECUTED  13
concatenate_6/concat@0@compact  Concat  Copy  EXECUTED  35
concatenate_6/concat@1@compact  Concat  Copy  EXECUTED  25
conv1d_1/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  454
conv1d_1/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  51
conv1d_1/convolution/Conv2D/Permute_1016  Permute  Permute  EXECUTED  47
conv1d_1/convolution/ExpandDims  Reshape  Reshape  EXECUTED  31
conv1d_1/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
conv1d_2/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  474
conv1d_2/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  51
conv1d_2/convolution/Conv2D/Permute_1020  Permute  Permute  EXECUTED  44
conv1d_2/convolution/ExpandDims  Reshape  Reshape  EXECUTED  34
conv1d_2/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
conv1d_3/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  517
conv1d_3/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  44
conv1d_3/convolution/Conv2D/Permute_1024  Permute  Permute  EXECUTED  40
conv1d_3/convolution/ExpandDims  Reshape  Reshape  EXECUTED  26
conv1d_3/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
conv1d_4/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  535
conv1d_4/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  43
conv1d_4/convolution/Conv2D/Permute_1028  Permute  Permute  EXECUTED  36
conv1d_4/convolution/ExpandDims  Reshape  Reshape  EXECUTED  26
conv1d_4/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
conv1d_5/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  463
conv1d_5/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  42
conv1d_5/convolution/Conv2D/Permute_1032  Permute  Permute  EXECUTED  35
conv1d_5/convolution/ExpandDims  Reshape  Reshape  EXECUTED  21
conv1d_5/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
conv1d_6/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  513
conv1d_6/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  41
conv1d_6/convolution/Conv2D/Permute_1036  Permute  Permute  EXECUTED  36
conv1d_6/convolution/ExpandDims  Reshape  Reshape  EXECUTED  21
conv1d_6/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
conv2d_transpose_1/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  7526
conv2d_transpose_1/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  55
conv2d_transpose_1/conv2d_transpose/Permute_1040  Permute  Permute  EXECUTED  64
conv2d_transpose_2/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  10421
conv2d_transpose_2/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  76
conv2d_transpose_2/conv2d_transpose/Permute_1044  Permute  Permute  EXECUTED  62
conv2d_transpose_3/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  21130
conv2d_transpose_3/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  97
conv2d_transpose_3/conv2d_transpose/Permute_1048  Permute  Permute  EXECUTED  89
conv2d_transpose_4/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  11079
conv2d_transpose_4/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  151
conv2d_transpose_4/conv2d_transpose/Permute_1052  Permute  Permute  EXECUTED  90
conv2d_transpose_5/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  21736
conv2d_transpose_5/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  151
conv2d_transpose_5/conv2d_transpose/Permute_1056  Permute  Permute  EXECUTED  98
conv2d_transpose_6/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  43171
conv2d_transpose_6/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  159
conv2d_transpose_6/conv2d_transpose/Permute_1060  Permute  Permute  EXECUTED  99
conv2d_transpose_7/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  20233
conv2d_transpose_7/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  157
conv2d_transpose_7/conv2d_transpose/Permute_1064  Permute  Permute  EXECUTED  23
g_output/Reshape  Reshape  Reshape  OPTIMIZED_OUT  0
g_output/Reshape@FP16  <Extra>  Convert_f16f32  EXECUTED  38
main_input_noisy@FP16  <Extra>  Convert_f32f16  EXECUTED  55
reshape_1/Reshape  Reshape  Reshape  EXECUTED  59
reshape_10/Reshape  Reshape  Reshape  EXECUTED  1372
reshape_11/Reshape  Reshape  Reshape  EXECUTED  454
reshape_12/Reshape  Reshape  Reshape  EXECUTED  2664
reshape_13/Reshape  Reshape  Reshape  EXECUTED  288
reshape_2/Reshape  Reshape  Reshape  EXECUTED  111
reshape_3/Reshape  Reshape  Reshape  EXECUTED  192
reshape_4/Reshape  Reshape  Reshape  EXECUTED  193
reshape_5/Reshape  Reshape  Reshape  EXECUTED  240
reshape_6/Reshape  Reshape  Reshape  EXECUTED  371
reshape_7/Reshape  Reshape  Reshape  EXECUTED  466
reshape_8/Reshape  Reshape  Reshape  EXECUTED  691
reshape_9/Reshape  Reshape  Reshape  EXECUTED  387
As you can see, the deconvolutions are the slowest operations right now; is there a way to fix this?
As I said before, I am trying right now to find a way to train the model without needing to use the disable_nhwc_to_nchw parameter, but I haven't found a way to do this properly yet.
> As you can see, the deconvolutions are the slowest operations right now, so is there maybe a way to fix this?
Sorry, I'm not aware of a way to fix this. I believe the MYRIAD plug-in is not open-sourced yet, so we cannot profile it or brainstorm optimization opportunities. The only option, both for this and for disable_nhwc_to_nchw, would be to re-architect your network and iterate based on the new perf data.
Cheers,
nikos
nikos wrote: Sorry, I'm not aware of a way to fix this. I believe the MYRIAD plug-in is not open-sourced yet, so we cannot profile it or brainstorm optimization opportunities. The only option, both for this and for disable_nhwc_to_nchw, would be to re-architect your network and iterate based on the new perf data.
Cheers,
nikos
Hi Niko,
You have helped already a lot, I'll try to experiment a bit with the model's architecture and if anything comes up I'll post here again.
Thank you so much once again.
Cheers,
Fotis
EDIT: I changed the channel order of the model and re-trained it in order to avoid the disable_nhwc_to_nchw parameter, and it now works on the CPU (really fast, too). Unfortunately, I now get a long list of unsupported layers on the MYRIAD plugin (ReLU, Concat, Reshape, Permute, Deconvolution) and the error "RuntimeError: [VPU] Permute has to provide order dimension 4. Layer name is: conv1d_1/transpose"...
Right... I am a bit confused too about what is and isn't supported when it comes to NCS.
BTW, have you seen this list of supported NCS layers from the other SDK, V1.12.01 2018-10-05:
https://movidius.github.io/ncsdk/release_notes.html
nikos wrote: Right... I am a bit confused too about what is and isn't supported when it comes to NCS.
BTW, have you seen this list of supported NCS layers from the other SDK, V1.12.01 2018-10-05:
Yes, I have, and the layers that I use in my model are supposed to be supported; in fact, they were supported before I changed the NCHW order and the dimensions, and now they have become unsupported. It is really confusing...
Yesterday I tried to convert the Keras model to float16 precision (since MYRIAD uses FP16), but I get this error when running the mo_tf script:
Unexpected exception happened during extracting attributes for node main_input_noisy.
Original exception message: Data type is unsupported: 19.
Have you tried using a 'float16' model, and would creating the IR from a 'float16' model help?
> Have you tried using a 'float16' model
Interesting case! I have not seen any documentation on this; it may be unsupported. FWIW, I always push FP32 to the optimizer and trust the Model Optimizer and calibrator to generate FP32, FP16, or INT8 IR as needed.
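To illustrate why pushing FP32 and letting the Model Optimizer emit the FP16 IR is the safer path: casting weights to float16 up front silently loses range and precision. A NumPy sketch with made-up values:

```python
import numpy as np

# float16 has roughly 3 decimal digits of precision, a max value of
# 65504, and flushes very small values to zero, so a blanket cast of a
# trained model's weights can be lossy.
w32 = np.array([1.0001, 65504.0, 1e-8, 1e5], dtype=np.float32)
w16 = w32.astype(np.float16)

print(w16)  # the tiny value flushes to 0.0 and 1e5 overflows to inf
```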
Drakopoulos, Fotis wrote:
nikos wrote:
Sorry, I'm not aware of a way to fix this. I believe the MYRIAD plug-in is not open-sourced yet, so we cannot profile it or brainstorm optimization opportunities. The only option, both for this and for disable_nhwc_to_nchw, would be to re-architect your network and iterate based on the new perf data.
Cheers,
nikos
Hi Niko,
You have helped already a lot, I'll try to experiment a bit with the model's architecture and if anything comes up I'll post here again.
Thank you so much once again.
Cheers,
Fotis
EDIT: I changed the channel order of the model and re-trained it in order to avoid the disable_nhwc_to_nchw parameter, and it now works on the CPU (really fast, too). Unfortunately, I now get a long list of unsupported layers on the MYRIAD plugin (ReLU, Concat, Reshape, Permute, Deconvolution) and the error "RuntimeError: [VPU] Permute has to provide order dimension 4. Layer name is: conv1d_1/transpose"...
Dear Fotis,
I converted my TensorFlow model into IR format. However, I got the same error as you: RuntimeError: [VPU] Permute has to provide order dimension 4 (I use the Python version).
I use MYRIAD for inference: python inference.py -m model/ir_format/asr_horovod_ir_test.xml -d MYRIAD -i model/27407_004.wav
How can I fix the error? All the layers in my model that use Permute have 3 dimensions instead of 4.
Thank you so much
Best regards,
Ha
Dear all,
I converted my TensorFlow model into IR format. However, I got the same error as Fotis: RuntimeError: [VPU] Permute has to provide order dimension 4 (I use the Python version).
I use MYRIAD for inference: python inference.py -m model/ir_format/asr_horovod_ir_test.xml -d MYRIAD -i model/27407_004.wav
How can I fix the error? All the layers in my model that use Permute have 3 dimensions instead of 4.
Thank you so much
Best regards,
HA
ha, minh quyet wrote: Dear all,
I converted my TensorFlow model into IR format. However, I got the same error as Fotis: RuntimeError: [VPU] Permute has to provide order dimension 4 (I use the Python version).
I use MYRIAD for inference: python inference.py -m model/ir_format/asr_horovod_ir_test.xml -d MYRIAD -i model/27407_004.wav
How can I fix the error? All the layers in my model that use Permute have 3 dimensions instead of 4.
Thank you so much
Best regards,
HA
Unfortunately, I still haven't found a solution for this. However, I haven't dug into it much, because my main problem is the really slow performance I get from the NCS2.
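One possible direction for the Permute error (untested; just a sketch of the shape change): give the 3-D tensor a dummy height axis so everything downstream sees 4-D data, which mirrors the ExpandDims/Squeeze pair that Keras already inserts around Conv1D:

```python
import numpy as np

# Hypothetical workaround sketch for "Permute has to provide order
# dimension 4": insert a dummy H axis so a 3-D (batch, channels, width)
# tensor becomes 4-D NCHW with H=1, the way Keras wraps Conv1D.
x3 = np.zeros((1, 16, 8000), dtype=np.float32)

x4 = x3[:, :, np.newaxis, :]   # (1, 16, 1, 8000)
back = np.squeeze(x4, axis=2)  # drop the dummy axis again afterwards

print(x4.shape, back.shape)
```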
