
Running neural models on a raspberry pi

Hello,

I recently purchased an Intel Movidius Neural Compute Stick 2 and I've managed to install OpenVINO on my Raspberry Pi following the instructions provided on the forum (https://software.intel.com/en-us/articles/OpenVINO-Install-RaspberryPI). What I'm trying to do now is convert my Keras model to a supported format in order to run it on the Movidius stick. First of all, is it possible to run a neural model that doesn't take an image as input?

Thank you in advance.

nikos1

Hello Fotis,

>  First of all, is it possible to run a neural model that doesn't take an image as an input?

OpenVINO supports this. "Other-than-image" input worked fine in my products on both CPU and GPU devices, but I'm not sure whether I also tried it on the NCS2. I will try tomorrow and update (I just have a build issue I need to resolve first before I can test on the NCS2).

Cheers,

Nikos


Hi Nikos,

First of all thanks for the prompt reply.

The thing is that I have a Keras model for audio signal processing and I want to run it on my NCS2, connected to a Raspberry Pi. I have successfully installed OpenVINO on the Raspberry Pi (following the instructions provided on the forum), so what I'm trying to do now is convert the Keras model in order to run it on the NCS2. From what I understood, the model conversion is not possible on the Raspberry Pi itself, but even on an Ubuntu machine I'm still not sure how to convert an "other-than-image input" model.

Cheers,

Fotis

UPDATE: I managed to convert the model with the mo_tf.py script, but now I'm not sure how to run it on the NCS2. After converting the model to the format needed for the NCS2 (.xml and .bin files) I tried to load it on the Raspberry Pi, but when I run net = IENetwork(model="tf_model.xml", weights="tf_model.bin") I get the following error:

RuntimeError: Error reading network: input must have dimensions

nikos1

Have you used the --input_shape parameter of mo_tf.py?

BTW, computer_vision_sdk_2018.5.445/deployment_tools/documentation/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html has some examples of how to convert networks for speech.
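For illustration, a conversion of a frozen TF graph with a 1-D (non-image) input might look like the command below. This is only a sketch: the file name, node name, and shape are placeholders, not values from this thread.

python3 mo_tf.py                   \
  --input_model frozen_model.pb    \
  --input input_1                  \
  --input_shape [1,16000]          \
  --data_type FP16

Specifying --input_shape pins down dimensions that are left undefined in the frozen graph (such as a None batch dimension), which is what otherwise triggers "input must have dimensions" style errors.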

 

nikos1

Also, please make sure you set the input/output precision correctly, for example:

    input_data->setPrecision(Precision::U8);
    input_data->setLayout(Layout::NCHW);  // ? 

There are a few options:

/**
 * @enum Layout
 * @brief Layouts that the inference engine supports
 */
enum Layout : uint8_t {
    ANY = 0,           // "any" layout

    // I/O data layouts
    NCHW = 1,
    NHWC = 2,
    NCDHW = 3,
    NDHWC = 4,

    // weight layouts
    OIHW = 64,

    // bias layouts
    C = 96,

    // Single image layout (for mean image)
    CHW = 128,

    // 2D
    HW = 192,
    NC = 193,
    CN = 194,

    BLOCKED = 200,
};

 


First of all, yes, I wasn't able to convert the model without setting the input shape. Thanks, I'll check the examples for speech models. Regarding the precision parameters, where do I define these?

nikos1

> Regarding the precision parameters, where do I define these?

In the inference application. For example, in the case of C++, see line 619 of

computer_vision_sdk_2018.5.445/deployment_tools/inference_engine/samples/speech_sample/main.cpp

where inputPrecision and layout are set:

        /** configure input precision if model loaded from IR **/
        for (auto &item : inputInfo) {
            Precision inputPrecision = Precision::FP32;  // specify Precision::I16 to provide quantized inputs
            item.second->setPrecision(inputPrecision);
            item.second->getInputData()->layout = NC;  // row major layout
        }

nikos
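In the Python API, the rough equivalent would be something like the sketch below. This assumes a version of openvino.inference_engine where the precision and layout attributes on the entries of net.inputs are writable; the file names are placeholders, not from this thread.

from openvino.inference_engine import IENetwork

# Load the IR produced by the Model Optimizer (placeholder paths)
net = IENetwork(model="tf_model.xml", weights="tf_model.bin")

# Configure precision and layout of every input before loading the
# network onto a device; NC is the row-major 2D layout that the
# speech sample uses.
for name in net.inputs:
    net.inputs[name].precision = "FP32"
    net.inputs[name].layout = "NC"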


Hi Niko,

The thing is that I can't even load the model to specify these parameters or change anything. I'm also using Python, so I'm trying to figure out what's going on, because there are no examples for speech. I tried many different things in the model conversion, but I'm still getting the "input must have dimensions" error when I try to load the model.

Thanks

EDIT: I finally got it to work after specifying certain parameters for the conversion. Now I just need to figure out how to run the model; I'll post if I have any more issues.


Hello again,

So, I was able to convert the model and run it on the NCS2 on the Raspberry Pi, but so far I'm getting noisy outputs and I don't know the cause. First of all, I used data type FP16 to convert the model and run it on the NCS2 (it wasn't possible with FP32), but I've noticed that the output has 'float32' dtype whatever the input data type is. How can I check the input/output precision of the converted model in Python?

nikos1

Hello Foti,

It may be better to validate FP32 on the CPU device first and then move to NCS2 FP16 (-d MYRIAD); there would be fewer deltas and it would be easier to track discrepancies.

Sorry, I'm not sure about the Python API's input/output precision or validation options. My end-to-end workflow uses C++ and offers the flexibility to adjust precision and also to validate and compare results against my reference implementation. I'm sure the Python API allows all this, but I've never used it :-)

Cheers,

nikos
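For reference, inspecting the precision of a converted model from Python can be done roughly as below. This is a sketch: it assumes net.inputs and net.outputs expose precision and shape attributes, as in the Python samples of that era, and the file names are placeholders.

from openvino.inference_engine import IENetwork

net = IENetwork(model="tf_model.xml", weights="tf_model.bin")

# Print the precision and shape of every input and output in the IR
for name, info in net.inputs.items():
    print("input: ", name, info.precision, info.shape)
for name, info in net.outputs.items():
    print("output:", name, info.precision, info.shape)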

 

 


Thanks again Niko!

I'm not able to test FP32 on the Raspberry Pi because there is no CPU plugin for it right now (only MYRIAD). I've checked the input/output layers of the FP16 model and they are FP16 precision; however, the output I'm getting from the NCS2 is in 'float32' format. I'm also still getting a kind of periodic noise in the output...

However, I found that a number of layers (Const) are unsupported by the MYRIAD plugin. What should I do in this case?

nikos1

Hi Foti,

> not able to test the FP32 on a raspberry pi because there is no plugin for CPU right now (only for MYRIAD).

Perhaps you could try on an x86 Linux or even Windows platform. Validating FP32 on the CPU is essential in your case before moving to the Pi and NCS, for a number of reasons.

> they are FP16 precision, however the output that I am getting from the NCS2 is in 'float32' 

That's not an issue. I also get FP32 out from FP16 inference. Again, this becomes irrelevant in the case of FP32 validation.

> However, I found that there is a number of unsupported layers (Const) by the plugin for MYRIAD. What should I do in this case?

I think this is the most important issue. Again, CPU FP32 may have no such issues; you need to test CPU FP32 and see whether those layers are supported there.
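One way to see exactly which layers a given plugin cannot handle, before loading the network, is a supported-layers query. The sketch below assumes the old IEPlugin API with get_supported_layers (as used in the bundled Python samples) and placeholder file names.

from openvino.inference_engine import IENetwork, IEPlugin

net = IENetwork(model="tf_model.xml", weights="tf_model.bin")
plugin = IEPlugin(device="CPU")  # or "MYRIAD"

# Layers of this network that the chosen plugin claims to support
supported = plugin.get_supported_layers(net)
unsupported = [l for l in net.layers if l not in supported]
print("Unsupported layers:", unsupported)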

FWIW, in my experience a smoother validation and dev workflow is:

Native -> CPU FP32 -> run validation app

CPU FP32 -> GPU FP16 -> validate FP16

GPU FP16 -> NCS FP16 -> validate on NCS

It is slower but makes it easier to track issues.

If all that fails, then compare results layer by layer, as done in another post (see https://software.intel.com/en-us/forums/computer-vision/topic/801760 by Nikolaev, Viktor).

Cheers,

Nikos
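To make the FP32-on-CPU step concrete, a minimal cross-check of the native model against the IR could look like the sketch below. It assumes a Keras .h5 model, a 1-D input, and placeholder file names and shapes; adjust them to the actual model.

import numpy as np
from keras.models import load_model
from openvino.inference_engine import IENetwork, IEPlugin

x = np.random.randn(1, 16000).astype(np.float32)  # placeholder input

# Reference output from the native Keras model
ref = load_model("model.h5").predict(x)

# Output from the converted IR on the CPU plugin (FP32)
net = IENetwork(model="tf_model.xml", weights="tf_model.bin")
input_blob = next(iter(net.inputs))
output_blob = next(iter(net.outputs))
exec_net = IEPlugin(device="CPU").load(network=net)
out = exec_net.infer({input_blob: x})[output_blob]

print("max abs diff:", np.max(np.abs(ref - out)))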


Hi Niko,

Yesterday I tried running the FP32 model on an Ubuntu machine using the CPU, but I got a "buffer overrun" error when I tried to load the network into the plugin. I looked around for a solution but didn't find anything. I guess I'll try the layer-by-layer comparison to see what happens. Thanks!

Cheers,

Fotis

nikos1

Sorry, it's hard to see what the issue is without more information on the Model Optimizer parameters or the workflow in general. Are you converting frozen or non-frozen TensorFlow models, or using Caffe or something else?

If Caffe, the supported layers are listed in https://software.intel.com/en-us/articles/OpenVINO-Using-Caffe#caffe-supported-layers

The TensorFlow supported layers are in https://software.intel.com/en-us/articles/OpenVINO-Using-TensorFlow#tensorflow-supported-layers

> Keras model

I can see Keras, so I'm assuming you are on the TensorFlow backend and freeze to a .pb.

For TF custom layers, if needed, there is good documentation on how to offload them, but I'm not sure it would make sense in terms of performance in the case of Pi+NCS. Some info is in https://software.intel.com/en-us/articles/OpenVINO-ModelOptimizer#Tensorflow-models-with-custom-layers

Some ideas from DeepSpeech may help in case more mo_tf.py parameters are needed. Sometimes it is not straightforward to convert TF to IR, and it could be the case here that you just need one more parameter and the problem will be solved.

https://software.intel.com/en-us/articles/OpenVINO-Using-tensorflow (also see the section "Supported Layers and the Mapping to Intermediate Representation Layers")

To generate the DeepSpeech Intermediate Representation (IR), provide the TensorFlow DeepSpeech model to the Model Optimizer with these parameters:

python3 ./mo_tf.py                                                   \
  --input_model path_to_model/output_graph.pb                       \
  --freeze_placeholder_with_value input_lengths->[16]               \
  --input input_node,previous_state_h/read,previous_state_c/read    \
  --input_shape [1,16,19,26],[1,2048],[1,2048]                      \
  --output raw_logits,lstm_fused_cell/Gather,lstm_fused_cell/Gather_1

nikos


Do you think the "buffer overrun" error I got for the FP32 model could be caused by an incorrect conversion?

To go into more detail, I'm converting a Keras model to an IR representation, and I've tried it both with a frozen and a non-frozen model. I'm specifying the input layer name and size and the output layer name in the conversion command (as shown in the example), but I'll experiment a bit more with the parameters tomorrow to see if that makes a difference.

Fotis

nikos1

>  could be caused because of an incorrect conversion?

Yes. Assuming you have no unsupported layers, I think it is possible that a conversion parameter issue is causing the inference engine buffer issue when loading the weights. Coincidentally, I got the same error two weeks ago and fixed it, but I don't remember the exact cause (poor short-term memory :-)). I think it was related to the input shape or NCHW vs. NHWC, but that was with 2D images, not the 1D case.

nikos


Hello again Niko,

I tried changing all the different parameters during the conversion, but I still get a buffer overrun error when I try to run the FP32 model on the CPU.

Additionally, I replaced the 'ReLU' activations in my Keras model, and now most of the unsupported layers in the FP16 model for MYRIAD are gone, but the input layer is still reported as unsupported, and I still get the same noisy output. I was wondering what the correct representation of the input layer is for a model on MYRIAD, because it's weird that the input layer is unsupported.

I also tried to convert and run the DeepSpeech model mentioned above, but when I do the conversion I get the following error:

[ ERROR ]  -------------------------------------------------
[ ERROR ]  ----------------- INTERNAL ERROR ----------------
[ ERROR ]  Unexpected exception happened.
[ ERROR ]  Please contact Model Optimizer developers and forward the following information:
[ ERROR ]  Exception occurred during running replacer "None (<class 'extensions.front.tf.BlockLSTM.BlockLSTM'>)": 7
[ ERROR ]  Traceback (most recent call last):
  File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/utils/class_registration.py", line 114, in apply_replacements
    replacer.find_and_replace_pattern(graph)
  File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/front/common/replacement.py", line 125, in find_and_replace_pattern
    apply_pattern(graph, action=self.replace_sub_graph, **self.pattern())
  File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/middle/pattern_match.py", line 95, in apply_pattern
    action(graph, match)
  File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/front/common/replacement.py", line 189, in replace_sub_graph
    self.replace_output_edges(graph, self.gen_output_edges_match(node, self.replace_op(graph, node)))
  File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/extensions/front/tf/BlockLSTM.py", line 84, in replace_op
    [graph.remove_edge(node.in_node(p).id, node.id) for p, input_data in node.in_nodes().items() if p in [5, 6, 7]]
  File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/extensions/front/tf/BlockLSTM.py", line 84, in <listcomp>
    [graph.remove_edge(node.in_node(p).id, node.id) for p, input_data in node.in_nodes().items() if p in [5, 6, 7]]
  File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/graph/graph.py", line 329, in in_node
    return self.in_nodes(control_flow=control_flow)[key]
KeyError: 7

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/main.py", line 325, in main
    return driver(argv)
  File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/main.py", line 267, in driver
    mean_scale_values=mean_scale)
  File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/pipeline/tf.py", line 248, in tf2nx
    class_registration.apply_replacements(graph, class_registration.ClassType.FRONT_REPLACER)
  File "/opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo/utils/class_registration.py", line 127, in apply_replacements
    )) from err
Exception: Exception occurred during running replacer "None (<class 'extensions.front.tf.BlockLSTM.BlockLSTM'>)": 7

[ ERROR ]  ---------------- END OF BUG REPORT --------------
[ ERROR ]  -------------------------------------------------

EDIT: I finally managed to (maybe) get a correct output from the converted model using the MYRIAD plugin. I used the --disable_nhwc_to_nchw parameter in the conversion and now I don't see the noisy output. However, I now get a new list of unsupported layers, and most importantly, the IR model suddenly got really slow (around 190 ms per iteration). What could be the cause? Also, if I compare the two .xml files (before and after adding --disable_nhwc_to_nchw) I see different dimensions for each layer.

nikos1

Hello Foti,

Good find with the --disable_nhwc_to_nchw parameter. Just for the record, were you now able to run on CPU FP32 and get valid results?

> However, I now get a new list of unsupported layers

Are you using LSTM? I'm not sure it is supported or validated yet for MYRIAD. I will ask this question in my old post (https://software.intel.com/en-us/forums/computer-vision/topic/755432).

Based on 2018 R5 release notes:

New Features in the 2018 R5 include:

Extends neural network support to include LSTM (long short-term memory) from ONNX*, TensorFlow*& MXNet* frameworks, & 3D convolutional-based networks in preview mode (CPU-only) to support additional, new use cases beyond computer vision.

> and the most important part is that the IR model suddenly got really slow (it takes around 190 ms for 1 iteration).

For that, you may want to use the profiler that reports ms per layer to get a better idea of what slows down the execution. Of course, functionality comes first; that is a much higher priority.

nikos


Hi Niko,

No, even with the --disable_nhwc_to_nchw parameter the model doesn't work on the CPU. I've tried every possible parameter, but I'm still getting "cannot create internal buffer. buffer can be overrun", so I don't know how to proceed with this.

The unsupported layers are again of type "Const", but they originate from conv layers of the original model. What I did before to remove the unsupported layers was to retrain the model with a LeakyReLU instead, but now I don't really know how to substitute the convolution layers.

The thing is that the model now works on MYRIAD (I'll verify the output tomorrow, but at first glance I think it produces a correct output), but it is really slow. How could I at least find the cause of this?

nikos1

Try to get performance counts (us per layer) using get_perf_counts:

        # infer_request_handle is the InferRequest used for inference,
        # e.g. exec_net.requests[0]
        perf_counts = infer_request_handle.get_perf_counts()
        log.info("Performance counters:")
        print("{:<70} {:<15} {:<15} {:<15} {:<10}".format('name', 'layer_type', 'exec_type', 'status', 'real_time, us'))
        for layer, stats in perf_counts.items():
            print("{:<70} {:<15} {:<15} {:<15} {:<10}".format(layer, stats['layer_type'], stats['exec_type'],
                                                              stats['status'], stats['real_time']))

Some examples in

 grep  perf ./computer_vision_sdk/deployment_tools/inference_engine/samples/python_samples/*

or check the Python API docs if more information on performance counters is needed.

cheers

nikos