Hello,
I recently purchased an Intel Movidius Neural Compute Stick 2 and I've managed to install OpenVINO on my Raspberry Pi following the instructions provided on the forum (https://software.intel.com/en-us/articles/OpenVINO-Install-RaspberryPI). What I'm trying to do now is convert my Keras model to a supported format in order to run it on the Movidius stick. First of all, is it possible to run a neural model that doesn't take an image as input?
Thank you in advance.
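For what it's worth, IR inputs are just tensors, so a non-image input should be feasible in principle; the main constraint is matching the input shape declared in the IR. A minimal NumPy-only sketch (the sample count and N,C,H,W layout here are assumptions for illustration, not taken from any particular model):

```python
import numpy as np

# Sketch: a 1-D, non-image input (e.g. 8000 audio samples) shaped into
# the 4-D N, C, H, W blob that OpenVINO plugins typically expect.
# The sample count and layout are assumptions for illustration.
samples = np.random.rand(8000).astype(np.float32)

blob = samples.reshape(1, 1, 1, 8000)  # batch=1, channels=1, H=1, W=8000

print(blob.shape)
```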
Right, disable_nhwc_to_nchw seems to cause overhead by reshaping on the device; it adds the following layers and gets really slow...
model_1/G_gtlayer/add/Broadcast/  Tile  Tile  EXECUTED  41465
model_1/G_gtlayer/add/Broadcast/Reshape/After  Reshape  Reshape  EXECUTED  2647
model_1/G_gtlayer/add/Broadcast/Reshape/Before  Reshape  Reshape  OPTIMIZED_OUT  0
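For intuition, the reorder that ends up running on the device is essentially the following layout transpose (plain NumPy with made-up shapes; this is not the plug-in's actual code):

```python
import numpy as np

# Illustration of the NHWC -> NCHW reorder the device has to perform at
# runtime when the conversion is disabled in the IR. Shapes are made up.
x_nhwc = np.arange(2 * 4 * 4 * 3, dtype=np.float32).reshape(2, 4, 4, 3)

# NHWC (batch, height, width, channels) -> NCHW (batch, channels, height, width)
x_nchw = x_nhwc.transpose(0, 3, 1, 2)

print(x_nchw.shape)
```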
Hello again,
> without a bias in the weights and now it works like a charm
Just saw the edit, good progress!
> but it's not significantly fast.
I think batching may be worth trying if latency allows it; it could speed up execution. The MYRIAD plug-in supports batching, right?
nikos wrote: I think batching may be worth trying if latency allows it; it could speed up execution. The MYRIAD plug-in supports batching, right?
I think it does, but since I am trying out a real-time processing application, I need the output of each (batch-size-one) input as soon as possible, so I don't think I can use batch processing.
If this NHWC-to-NCHW conversion is the cause of the slow performance, do you think I could somehow train the model from scratch to avoid using this parameter during conversion and get a performance increase? I can post the performance counters for the model trained without the bias tomorrow, to check what is causing this delay.
Fotis
> real-time processing application
Right, it depends on how much latency you can afford. Unless, of course, you have multiple channels to process, in which case you could batch frames from your channels with no added latency.
> nhwc to nchw conversion is the cause of the slow performance, do you think that I could somehow train the model from scratch to avoid using this parameter in the conversion and get a performance increase?
Yes, I believe that should be possible.
nikos
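A small sketch of the multi-channel idea above (NumPy only; the channel count and frame shape are hypothetical):

```python
import numpy as np

# One frame per channel, stacked into a single batch: batching raises
# throughput without adding per-stream latency, since no stream waits
# for extra frames of its own. Channel count and shape are hypothetical.
num_channels = 4
frames = [np.zeros((3, 64, 64), dtype=np.float32) for _ in range(num_channels)]

batch = np.stack(frames)  # one inference call covers all channels

print(batch.shape)
```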
nikos wrote: Right, it depends on how much latency you can afford. Unless, of course, you have multiple channels to process, in which case you could batch frames from your channels with no added latency.
The lower the better, so I can't really increase the batch size. Also, it is a single-channel processing algorithm...
Below you can find the performance counters for the latest model:
[ INFO ] Performance counters:
name  layer_type  exec_type  status  real_time, us
G_gtlayer/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  362
G_gtlayer/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  34
G_gtlayer/convolution/Conv2D/Permute_1012  Permute  Permute  EXECUTED  50
G_gtlayer/convolution/ExpandDims  Reshape  Reshape  OPTIMIZED_OUT  0
G_gtlayer/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
LeakyReLU_  ReLU  LeakyRelu  EXECUTED  52
LeakyReLU_1066  ReLU  LeakyRelu  EXECUTED  72
LeakyReLU_1067  ReLU  LeakyRelu  EXECUTED  61
LeakyReLU_1068  ReLU  LeakyRelu  EXECUTED  38
LeakyReLU_1069  ReLU  LeakyRelu  EXECUTED  51
LeakyReLU_1070  ReLU  LeakyRelu  EXECUTED  79
LeakyReLU_1071  ReLU  LeakyRelu  EXECUTED  35
LeakyReLU_1072  ReLU  LeakyRelu  EXECUTED  63
LeakyReLU_1073  ReLU  LeakyRelu  EXECUTED  51
LeakyReLU_1074  ReLU  LeakyRelu  EXECUTED  56
LeakyReLU_1075  ReLU  LeakyRelu  EXECUTED  76
LeakyReLU_1076  ReLU  LeakyRelu  EXECUTED  79
LeakyReLU_1077  ReLU  LeakyRelu  EXECUTED  40
Receive-Tensor  Receive-Tensor  Receive-Tensor  EXECUTED  0
concatenate_1/concat@0@compact  Concat  Copy  EXECUTED  25
concatenate_1/concat@1@compact  Concat  Copy  EXECUTED  11
concatenate_2/concat@0@compact  Concat  Copy  EXECUTED  22
concatenate_2/concat@1@compact  Concat  Copy  EXECUTED  10
concatenate_3/concat@0@compact  Concat  Copy  EXECUTED  23
concatenate_3/concat@1@compact  Concat  Copy  EXECUTED  11
concatenate_4/concat@0@compact  Concat  Copy  EXECUTED  23
concatenate_4/concat@1@compact  Concat  Copy  EXECUTED  12
concatenate_5/concat@0@compact  Concat  Copy  EXECUTED  25
concatenate_5/concat@1@compact  Concat  Copy  EXECUTED  13
concatenate_6/concat@0@compact  Concat  Copy  EXECUTED  35
concatenate_6/concat@1@compact  Concat  Copy  EXECUTED  25
conv1d_1/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  454
conv1d_1/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  51
conv1d_1/convolution/Conv2D/Permute_1016  Permute  Permute  EXECUTED  47
conv1d_1/convolution/ExpandDims  Reshape  Reshape  EXECUTED  31
conv1d_1/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
conv1d_2/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  474
conv1d_2/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  51
conv1d_2/convolution/Conv2D/Permute_1020  Permute  Permute  EXECUTED  44
conv1d_2/convolution/ExpandDims  Reshape  Reshape  EXECUTED  34
conv1d_2/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
conv1d_3/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  517
conv1d_3/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  44
conv1d_3/convolution/Conv2D/Permute_1024  Permute  Permute  EXECUTED  40
conv1d_3/convolution/ExpandDims  Reshape  Reshape  EXECUTED  26
conv1d_3/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
conv1d_4/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  535
conv1d_4/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  43
conv1d_4/convolution/Conv2D/Permute_1028  Permute  Permute  EXECUTED  36
conv1d_4/convolution/ExpandDims  Reshape  Reshape  EXECUTED  26
conv1d_4/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
conv1d_5/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  463
conv1d_5/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  42
conv1d_5/convolution/Conv2D/Permute_1032  Permute  Permute  EXECUTED  35
conv1d_5/convolution/ExpandDims  Reshape  Reshape  EXECUTED  21
conv1d_5/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
conv1d_6/convolution/Conv2D  Convolution  Im2ColConvolution  EXECUTED  513
conv1d_6/convolution/Conv2D/Permute_  Permute  Permute  EXECUTED  41
conv1d_6/convolution/Conv2D/Permute_1036  Permute  Permute  EXECUTED  36
conv1d_6/convolution/ExpandDims  Reshape  Reshape  EXECUTED  21
conv1d_6/convolution/Squeeze  Reshape  Reshape  OPTIMIZED_OUT  0
conv2d_transpose_1/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  7526
conv2d_transpose_1/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  55
conv2d_transpose_1/conv2d_transpose/Permute_1040  Permute  Permute  EXECUTED  64
conv2d_transpose_2/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  10421
conv2d_transpose_2/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  76
conv2d_transpose_2/conv2d_transpose/Permute_1044  Permute  Permute  EXECUTED  62
conv2d_transpose_3/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  21130
conv2d_transpose_3/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  97
conv2d_transpose_3/conv2d_transpose/Permute_1048  Permute  Permute  EXECUTED  89
conv2d_transpose_4/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  11079
conv2d_transpose_4/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  151
conv2d_transpose_4/conv2d_transpose/Permute_1052  Permute  Permute  EXECUTED  90
conv2d_transpose_5/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  21736
conv2d_transpose_5/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  151
conv2d_transpose_5/conv2d_transpose/Permute_1056  Permute  Permute  EXECUTED  98
conv2d_transpose_6/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  43171
conv2d_transpose_6/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  159
conv2d_transpose_6/conv2d_transpose/Permute_1060  Permute  Permute  EXECUTED  99
conv2d_transpose_7/conv2d_transpose  Deconvolution  Deconvolution  EXECUTED  20233
conv2d_transpose_7/conv2d_transpose/Permute_  Permute  Permute  EXECUTED  157
conv2d_transpose_7/conv2d_transpose/Permute_1064  Permute  Permute  EXECUTED  23
g_output/Reshape  Reshape  Reshape  OPTIMIZED_OUT  0
g_output/Reshape@FP16  <Extra>  Convert_f16f32  EXECUTED  38
main_input_noisy@FP16  <Extra>  Convert_f32f16  EXECUTED  55
reshape_1/Reshape  Reshape  Reshape  EXECUTED  59
reshape_10/Reshape  Reshape  Reshape  EXECUTED  1372
reshape_11/Reshape  Reshape  Reshape  EXECUTED  454
reshape_12/Reshape  Reshape  Reshape  EXECUTED  2664
reshape_13/Reshape  Reshape  Reshape  EXECUTED  288
reshape_2/Reshape  Reshape  Reshape  EXECUTED  111
reshape_3/Reshape  Reshape  Reshape  EXECUTED  192
reshape_4/Reshape  Reshape  Reshape  EXECUTED  193
reshape_5/Reshape  Reshape  Reshape  EXECUTED  240
reshape_6/Reshape  Reshape  Reshape  EXECUTED  371
reshape_7/Reshape  Reshape  Reshape  EXECUTED  466
reshape_8/Reshape  Reshape  Reshape  EXECUTED  691
reshape_9/Reshape  Reshape  Reshape  EXECUTED  387
As you can see, the deconvolutions are the slowest operations right now; is there a way to fix this?
As I said before, I am trying right now to find a way to train the model without needing to use the disable_nhwc_to_nchw parameter, but I haven't found a way to do this properly yet.
> As you can see, the deconvolutions are the slowest operations right now, so is there maybe a way to fix this?
Sorry, I'm not aware of a way to fix this. I believe the MYRIAD plug-in is not open-sourced yet, so we cannot profile it or brainstorm optimization opportunities. The only option, both for this and for disable_nhwc_to_nchw, would be to re-architect your network and iterate based on the new perf data.
Cheers,
nikos
nikos wrote: Sorry, I'm not aware of a way to fix this. I believe the MYRIAD plug-in is not open-sourced yet, so we cannot profile it or brainstorm optimization opportunities. The only option, both for this and for disable_nhwc_to_nchw, would be to re-architect your network and iterate based on the new perf data.
Cheers,
nikos
Hi Niko,
You have helped already a lot, I'll try to experiment a bit with the model's architecture and if anything comes up I'll post here again.
Thank you so much once again.
Cheers,
Fotis
EDIT: I changed the channel order of the model and re-trained it in order to avoid the disable_nhwc_to_nchw parameter, and it now works on the CPU (really fast, too). Unfortunately, I now get a long list of unsupported layers on the MYRIAD plugin (ReLU, Concat, Reshape, Permute, Deconvolution) and the error "RuntimeError: [VPU] Permute has to provide order dimension 4. Layer name is: conv1d_1/transpose"...
Right... I am a bit confused too about what is and isn't supported when it comes to NCS.
BTW, have you seen this list of supported NCS layers from the other SDK, V1.12.01 2018-10-05:
https://movidius.github.io/ncsdk/release_notes.html
nikos wrote: Right... I am a bit confused too about what is and isn't supported when it comes to NCS.
BTW, have you seen this list of supported NCS layers from the other SDK, V1.12.01 2018-10-05:
Yes, I have, and the layers that I use in my model are supposed to be supported; in fact, they were supported before I changed the NCHW order and the dimensions, and now they have become unsupported. It is really confusing...
Yesterday I tried to convert the Keras model to float16 precision (since MYRIAD uses FP16), but I get this error when running the mo_tf script:
Unexpected exception happened during extracting attributes for node main_input_noisy.
Original exception message: Data type is unsupported: 19.
Have you tried using a 'float16' model, and would creating the IR from a 'float16' model help?
> Have you tried using a 'float16' model
Interesting case! I have not seen any documentation on this; it may be unsupported. FWIW, I always push FP32 to the optimizer and trust the Model Optimizer and calibrator to generate FP32, FP16, or INT8 IR as needed.
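To illustrate why pushing FP32 and letting the Model Optimizer emit the FP16 IR is the safer path: casting weights to float16 up front silently loses range and precision. A NumPy sketch with made-up values:

```python
import numpy as np

# float16 has roughly 3 decimal digits of precision, a max value of
# 65504, and flushes very small values to zero, so a blanket cast of a
# trained model's weights can be lossy.
w32 = np.array([1.0001, 65504.0, 1e-8, 1e5], dtype=np.float32)
w16 = w32.astype(np.float16)

print(w16)  # the tiny value flushes to 0.0 and 1e5 overflows to inf
```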
Drakopoulos, Fotis wrote:
nikos wrote:
Sorry, I'm not aware of a way to fix this. I believe the MYRIAD plug-in is not open-sourced yet, so we cannot profile it or brainstorm optimization opportunities. The only option, both for this and for disable_nhwc_to_nchw, would be to re-architect your network and iterate based on the new perf data.
Cheers,
nikos
Hi Niko,
You have helped already a lot, I'll try to experiment a bit with the model's architecture and if anything comes up I'll post here again.
Thank you so much once again.
Cheers,
Fotis
EDIT: I changed the channel order of the model and re-trained it in order to avoid the disable_nhwc_to_nchw parameter, and it now works on the CPU (really fast, too). Unfortunately, I now get a long list of unsupported layers on the MYRIAD plugin (ReLU, Concat, Reshape, Permute, Deconvolution) and the error "RuntimeError: [VPU] Permute has to provide order dimension 4. Layer name is: conv1d_1/transpose"...
Dear Fotis,
I converted my TensorFlow model into IR format. However, I got the same error as you: RuntimeError: [VPU] Permute has to provide order dimension 4 (I use the Python version).
I use MYRIAD for inference: python inference.py -m model/ir_format/asr_horovod_ir_test.xml -d MYRIAD -i model/27407_004.wav
How can I fix the error? All the layers in my model that use Permute have 3 dimensions instead of 4.
Thank you so much
Best regards,
Ha
Dear all,
I converted my TensorFlow model into IR format. However, I got the same error as Fotis: RuntimeError: [VPU] Permute has to provide order dimension 4 (I use the Python version).
I use MYRIAD for inference: python inference.py -m model/ir_format/asr_horovod_ir_test.xml -d MYRIAD -i model/27407_004.wav
How can I fix the error? All the layers in my model that use Permute have 3 dimensions instead of 4.
Thank you so much
Best regards,
HA
ha, minh quyet wrote: Dear all,
I converted my TensorFlow model into IR format. However, I got the same error as Fotis: RuntimeError: [VPU] Permute has to provide order dimension 4 (I use the Python version).
I use MYRIAD for inference: python inference.py -m model/ir_format/asr_horovod_ir_test.xml -d MYRIAD -i model/27407_004.wav
How can I fix the error? All the layers in my model that use Permute have 3 dimensions instead of 4.
Thank you so much
Best regards,
HA
Unfortunately, I still haven't found a solution for this. However, I haven't dug into it much, because my main problem is the really slow performance I get from the NCS2.
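One possible direction for the Permute error (untested; just a sketch of the shape change): give the 3-D tensor a dummy height axis so everything downstream sees 4-D data, which mirrors the ExpandDims/Squeeze pair that Keras already inserts around Conv1D:

```python
import numpy as np

# Hypothetical workaround sketch for "Permute has to provide order
# dimension 4": insert a dummy H axis so a 3-D (batch, channels, width)
# tensor becomes 4-D NCHW with H=1, the way Keras wraps Conv1D.
x3 = np.zeros((1, 16, 8000), dtype=np.float32)

x4 = x3[:, :, np.newaxis, :]   # (1, 16, 1, 8000)
back = np.squeeze(x4, axis=2)  # drop the dummy axis again afterwards

print(x4.shape, back.shape)
```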
