Issue - Model Optimizer - Input shape

Jakub · ‎01-09-2019

I am writing to report an issue I found in Model Optimizer.

I created a model in Keras/TensorFlow. Its input shape is (1, 28, 28):

model = Sequential()
model.add(Conv2D(30, (5, 5), input_shape=(1, 28, 28), activation='relu'))

After exporting it to pb format, I ran Model Optimizer with the "--input_shape [1,1,28,28]" argument:

python3 /opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo_tf.py --input_model ~/Documents/MNIST_OpenVino/output_graph.pb --input_shape [1,1,28,28] --output_dir ~/Documents/MNIST_OpenVino/ --scale 255 --data_type FP32

Unexpectedly, the input shape in the generated XML file is different, [1,28,1,28]:

<?xml version="1.0" ?>
<net batch="1" name="output_graph" version="4">
	<layers>
		<layer id="0" name="conv2d_1_input" precision="FP32" type="Input">
			<output>
				<port id="0">
					<dim>1</dim>
					<dim>28</dim>
					<dim>1</dim>
					<dim>28</dim>
				</port>
			</output>
		</layer>
		<layer id="1" name="Mul1_" precision="FP32" type="Power">
			<data power="1" scale="0.00392156862745098" shift="0"/>
			<input>
				<port id="0">
					<dim>1</dim>
					<dim>28</dim>
					<dim>1</dim>
					<dim>28</dim>
				</port>
			</input>
			<output>
				<port id="1">
					<dim>1</dim>
					<dim>28</dim>
					<dim>1</dim>
					<dim>28</dim>
				</port>
			</output>
		</layer>
		<layer id="2" name="conv2d_1/transpose" precision="FP32" type="Permute">
			<data order="0,2,3,1"/>
			<input>
				<port id="0">
					<dim>1</dim>
					<dim>28</dim>
					<dim>1</dim>
					<dim>28</dim>
				</port>
			</input>
			<output>
				<port id="1">
					<dim>1</dim>
					<dim>1</dim>
					<dim>28</dim>
					<dim>28</dim>
				</port>
			</output>

I noticed the problem because after running the Model Optimizer, the model started giving wrong predictions. The workaround is to manually edit the dimensions of initial layers inside the XML file:

					<dim>1</dim>
					<dim>1</dim>
					<dim>28</dim>
					<dim>28</dim>

The problem exists in both the R4 and R5 releases, with both mo.py and mo_tf.py.
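
For readers trying to interpret these numbers: the shapes in the XML are consistent with the value of --input_shape being read as NHWC and converted to NCHW. This is an assumption inferred from the dimensions alone, not something confirmed in this thread. The sketch below only does the shape arithmetic; permute_shape is a hypothetical helper, not part of Model Optimizer.

```python
def permute_shape(shape, order):
    """Return the permuted shape, where out[i] = shape[order[i]]."""
    return [shape[i] for i in order]

requested = [1, 1, 28, 28]   # value passed to --input_shape

# Assumption: the requested shape is treated as NHWC and converted to NCHW.
nhwc_to_nchw = [0, 3, 1, 2]
ir_input = permute_shape(requested, nhwc_to_nchw)
print(ir_input)              # [1, 28, 1, 28], as seen on the Input layer

# The Permute layer in the XML uses order="0,2,3,1".
permute_out = permute_shape(ir_input, [0, 2, 3, 1])
print(permute_out)           # [1, 1, 28, 28], as seen on the Permute output
```

If this reading is right, the shape on the Input layer is what one would expect from an NHWC-to-NCHW conversion of a model whose input is actually channels-first.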

Jakub · ‎01-09-2019

One more finding, from running the same models on a Raspberry Pi with the Python API:

1) when I leave the XML file unmodified - the script runs without error, but the model makes bad predictions (the same result as on Linux)

2) when I modify the XML file (the workaround specified in the post above) - the script returns an error:

E: [xLink] [    300849] dispatcherEventReceive:308	dispatcherEventReceive() Read failed -1 | event 0x6b5fee20 USB_WRITE_RESP

E: [xLink] [    300849] eventReader:256	eventReader stopped
E: [watchdog] [    301747] sendPingMessage:164	Failed send ping message: X_LINK_ERROR
^CE: [xLink] [    301972] dispatcherWaitEventComplete:720	waiting is timeout, sending reset remote event
E: [ncAPI] [    301972] checkGraphMonitorResponse:1284	XLink error, rc: X_LINK_TIMEOUT
E: [ncAPI] [    301972] ncGraphQueueInference:3544	Can't get trigger response

nikos1 · ‎01-09-2019

Could you try

--input_shape [1,28,28,1]

Jakub · ‎01-10-2019

Hi @nikos,

when I try --input_shape [1,28,28,1] with the unchanged pb file, the command looks like this:

python3 /opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo_tf.py --input_model ~/Documents/MNIST_OpenVino/output_graph.pb --input_shape [1,28,28,1] --output_dir ~/Documents/MNIST_OpenVino-ForumIdea/ --scale 255 --data_type FP32

I receive the following error from mo_tf.py:

Model Optimizer version: 	1.5.12.49d067a0
[ ERROR ]  Shape [ 1 30 24 -1] is not fully defined for output 0 of "conv2d_1/add". Use --input_shape with positive integers to override model input shapes.
[ ERROR ]  Cannot infer shapes or values for node "conv2d_1/add".
[ ERROR ]  Not all output shapes were inferred or fully defined for node "conv2d_1/add". 
 For more information please refer to Model Optimizer FAQ (<INSTALL_DIR>/deployment_tools/documentation/docs/MO_FAQ.html), question #40. 
[ ERROR ]  
[ ERROR ]  It can happen due to bug in custom shape infer function <function tf_eltwise_ext.<locals>.<lambda> at 0x7f7bd1dc36a8>.
[ ERROR ]  Or because the node inputs have incorrect values/shapes.
[ ERROR ]  Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ ERROR ]  Run Model Optimizer with --log_level=DEBUG for more information.
[ ERROR ]  Stopped shape/value propagation at "conv2d_1/add" node. 
 For more information please refer to Model Optimizer FAQ (<INSTALL_DIR>/deployment_tools/documentation/docs/MO_FAQ.html), question #38.

I am attaching the pb file, so you can play with it directly.

mahinlma · ‎01-10-2019

Instead of --input_shape [1,28,28,1]

try passing

-b 1

I just tried this with your attached .pb and was able to generate the IR.

nikos1 · ‎01-10-2019

Hi Jakub,

--input_shape [1,1,28,28] seems to work with your attached pb

mo_tf.py --input_model output_graph.pb --input_shape [1,1,28,28] --scale 255 --data_type FP32

[ SUCCESS ] Generated IR model.

Could you test inference with the generated IR? Is it correct?

Cheers,

Nikos

Jakub · ‎01-11-2019

Hi Nikos,

Well, yes, MO successfully generates the bin/xml files with --input_shape [1,1,28,28], but the files are wrong:

1) in the XML file, the first few layers have shape 1,28,1,28

2) the model is giving wrong predictions

The workaround to fix the predictions is to manually edit the XML file so that the first few layers have shape 1,1,28,28. However, this workaround works only if the model runs on CPU; on MYRIAD it prints the errors mentioned in my post above.

To show the wrong predictions and the workaround, I'm attaching scripts that are supposed to recognize the number 5.

Please run:

1) myMNIST_v3.py (it uses XML file generated by MO) - the model predicts number 5 to be 8

2) myMNIST_v3-workaround.py (it uses the XML file with manually reordered dimensions: 1,1,28,28) - the model correctly predicts 5 to be 5

nikos1 · ‎01-12-2019

Hi Jakub,

Sorry, I am not seeing an issue with the Model Optimizer or with inference on CPU or NCS. I don't know how you train and freeze to pb, so I cannot test the end-to-end workflow. Let's forget OpenVino for now. Do you get the correct results if you run native TF? Could you possibly attach simple Python code that loads your output_graph.pb, runs inference using TensorFlow and gets the correct prediction on five.jpg?

If that works, make sure the correct values are used for mean subtraction and scaling of the input, i.e. what --mean_values and --scale are you using with mo_tf.py? Are the default values good, or do you need to specify custom ones? There may be something in the image processing pipeline causing the discrepancy.
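
As a concrete check on the scaling question: the Power layer in the XML posted earlier has scale="0.00392156862745098", which matches 1/255, so the IR already divides the input by the value given to --scale. The sketch below illustrates one thing that could go wrong, namely an inference script that also divides by 255. Whether the attached scripts do this is not known from the thread, and ir_input_scaling is a hypothetical stand-in for the Power layer.

```python
MO_SCALE = 255.0                          # value passed to --scale
POWER_LAYER_SCALE = 0.00392156862745098   # scale attribute from the posted XML

def ir_input_scaling(pixel, scale=MO_SCALE):
    """Mimic the Power layer: divide the incoming value by the MO scale."""
    return pixel / scale

# The Power layer's multiplier agrees with 1 / 255.
assert abs(1.0 / MO_SCALE - POWER_LAYER_SCALE) < 1e-15

raw_pixel = 255.0
once = ir_input_scaling(raw_pixel)            # script feeds raw 0..255 pixels
twice = ir_input_scaling(raw_pixel / 255.0)   # script already normalized to 0..1
print(once, twice)
```

With --scale 255 baked into the IR, the script would need to feed raw 0..255 pixel values for the network to see the same range it was trained on, assuming training used inputs in 0..1.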

Another thing I noticed is that you run on Mac OS. Could that be causing any issues? Any chance to try the same on Linux? Not sure how much of OpenVino was validated on OS X and if there are any hidden endian issues somewhere in the pipeline.

Note that the supported operating systems are:

Ubuntu 16.04.3 LTS (64 bit)

Windows 10 (64 bit)

CentOS 7.4 (64 bit)

nikos

Jakub · ‎01-16-2019

Hi Nikos,

I run OpenVino on Ubuntu 16.04.3 LTS (64 bit). I use the Mac only for accessing this forum.

I created scripts for the end-to-end workflow, starting with training a model, then exporting to 'pb' file, optimizing using OpenVino and running inference.

The attached ZIP file includes:

files:
- 1.py - Script used to train the model and save it as 'hdf5' file.
- 2.py - Script used to load the 'hdf5' file and verify it gives the same predictions as 1.py. It also exports the model to 'pb' file
- 3.py - Script used to load the 'pb' file and verify it gives the same predictions as 1.py.
- functions.py - contains definitions of common functions used by 1.py, 2.py and 3.py files
- keras_to_tensorflow.py - script to export a model as pb file. Executed by the 2.py script.
- myMNIST_v3.py - a script that loads the XML and BIN file generated by OpenVino and runs the inference.
- myMNIST_v3-workaround.py - a script that loads the BIN file generated by OpenVino and a modified version of the XML file
directories:
- Images - a directory with test images
- Output - a directory that will include files generated by the scripts. Initially empty

Steps to execute:

Extract the ZIP file and using a terminal enter the created 'Intel-Final' directory
In the directory execute the following commands:
- python3 1.py
- python3 2.py
- python3 3.py
- python3 /opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo_tf.py --input_model ./Output/model.pb --input_shape [1,1,28,28] --output_dir ./Output/ --scale 255 --data_type FP32
- python3 myMNIST_v3.py
In the 'Output' directory make a copy of the 'model.xml' file and save it as 'model-workaround.xml'
In the 'model-workaround.xml' file make the following changes:
1. Locate the first 4 dimension definitions. They are in the first 3 layers:
  1. conv2d_1_input / output
  2. Mul1_ / input
  3. Mul1_ / output
  4. conv2d_1/transpose / input
2. Change the definitions of dimensions:
  1. from:
    - <dim>1</dim>
    - <dim>28</dim>
    - <dim>1</dim>
    - <dim>28</dim>
  2. to:
    - <dim>1</dim>
    - <dim>1</dim>
    - <dim>28</dim>
    - <dim>28</dim>
Run the 'myMNIST_v3-workaround.py' script
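
The manual XML edit described in these steps could also be scripted. The following is a minimal sketch using only the Python standard library; patch_first_ports and patch_ir_file are hypothetical helper names, and the sketch assumes the IR layout shown earlier in this thread (port elements containing dim children).

```python
import xml.etree.ElementTree as ET

OLD_DIMS = ["1", "28", "1", "28"]   # shape written by Model Optimizer
NEW_DIMS = ["1", "1", "28", "28"]   # shape used in the manual workaround

def patch_first_ports(root, old=OLD_DIMS, new=NEW_DIMS, limit=4):
    """Rewrite up to `limit` ports, in document order, whose dims equal `old`.

    Returns the number of ports that were changed.
    """
    changed = 0
    for port in root.iter("port"):
        if changed == limit:
            break
        dims = port.findall("dim")
        if [(d.text or "").strip() for d in dims] == old:
            for dim, value in zip(dims, new):
                dim.text = value
            changed += 1
    return changed

def patch_ir_file(src, dst):
    """Read the IR at `src`, patch it, and write the result to `dst`."""
    tree = ET.parse(src)
    changed = patch_first_ports(tree.getroot())
    tree.write(dst, encoding="utf-8", xml_declaration=True)
    return changed

# Example call, using the file names from the steps above:
# patch_ir_file("./Output/model.xml", "./Output/model-workaround.xml")
```

The function only touches ports whose dimensions match the old shape exactly, so ports that already read 1,1,28,28 are left alone.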

The predictions of each script are as follows (due to an issue in Keras, I am not able to fix the seed, so when you run the scripts your prediction probabilities will be slightly different, but consistent across the '1.py', '2.py', '3.py', and 'myMNIST_v3-workaround.py' scripts):

1.py:
- Predicted digit: #5, Probability: 0.861584
2.py:
- Predicted digit: #5, Probability: 0.861584
3.py:
- Predicted digit: #5, Probability: 0.861584
myMNIST_v3.py:
- Predicted digit: #4, Probability: 0.660924
myMNIST_v3-workaround.py:
- Predicted digit: #5, Probability: 0.861584

As you can see, 'myMNIST_v3.py', which uses the files generated by 'mo_tf.py', gives wrong predictions that differ from all the other scripts. Moreover, a slight edit to the XML file fixes the problem. However, the fix works only if the model runs on CPU; on MYRIAD the model with the fix crashes. So there is a bug somewhere that causes the wrong predictions.

om77 · ‎01-16-2019

Hi Jakub,

just FYI, the model optimizer command line supports a parameter --disable_nhwc_to_nchw.

Did you play with it?

Thanks.

Jakub · ‎01-17-2019

Thanks @om77 for the interesting idea, but it doesn't work - the XML file has correct dimensions, but the prediction is still wrong.

Below you can see a summary of different tests run with different options. In all tests the following part of the command was the same:

python3 /opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo_tf.py --input_model ./Output/model.pb --input_shape [1,1,28,28] --output_dir ./Output/ --scale 255

In the following tests I used these additional parameters:

run on CPU:
1. --data_type FP32
  - Predicted digit: #4, Probability: 0.660924
2. --data_type FP32 + manually edited XML (as described in the post above)
  - Predicted digit: #5, Probability: 0.861584
3. --data_type FP32 --disable_nhwc_to_nchw
  - Predicted digit: #8, Probability: 0.932036
run on MYRIAD:
1. --data_type FP16
  - Predicted digit: #2, Probability: 0.336670
2. --data_type FP16 + manually edited XML (as described in the post above)
  - Model crashes
3. --data_type FP16 --disable_nhwc_to_nchw
  - Predicted digit: #8, Probability: 0.305176

As you can see, the predictions are inconsistent. The correct prediction is only for the case with "--data_type FP32 + manually edited XML"

Hill__Aaron · ‎02-20-2019

I am working on a similar example as the OP. A basic MNIST CNN from Keras to TF to IR and inferenced through the openVINO python API.

I was not experiencing this weird dimension mangling issue, and after closer inspection I noticed the OP is using data_format="channels_first" in their Keras model. I was using the default data_format="channels_last", so my initial layer has input_shape=(28, 28, 1).

When I changed my keras model to use data_format='channels_first' I reproduced the exact same behavior seen from the OP. However, I do not get bad classification because I reshaped the data to fit the weird [1,28,1,28] dimensions and did not touch the xml file.

I did not do any FP32 vs FP16 tests, I am only inferencing on the NCS2 with FP16.

I am however running into errors when trying to implement batch inferencing with this model, but I won't hijack this thread.

Hill__Aaron · ‎02-20-2019

Actually, I was wrong. I am getting bad classification results when the dimensions are forced to [1,28,1,28]. Sorry, I forgot to verify that before posting.
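
One possible reason reshaping does not help: a reshape from [1,1,28,28] to [1,28,1,28] only relabels the axes and leaves the elements in the same order in memory, so the flat input buffer is unchanged. A transpose, by contrast, moves elements. The sketch below shows the difference on a tiny 2 x 3 stand-in for the image; flat_index is a hypothetical helper, and whether this fully explains the bad predictions is not established in this thread.

```python
def flat_index(idx, shape):
    """Row-major (C-order) offset of a multi-index in a tensor of `shape`."""
    offset = 0
    for i, dim in zip(idx, shape):
        offset = offset * dim + i
    return offset

H, W = 2, 3                  # tiny stand-in for 28 x 28
original = (1, 1, H, W)      # shape the Keras model was built with
forced = (1, H, 1, W)        # shape written to the IR

# Reshape: element (0, 0, h, w) becomes (0, h, 0, w) and keeps its offset.
reshape_keeps_order = all(
    flat_index((0, 0, h, w), original) == flat_index((0, h, 0, w), forced)
    for h in range(H) for w in range(W)
)
print(reshape_keeps_order)    # True

# Transpose (swap the last two axes): elements land at different offsets.
swapped = (1, 1, W, H)
transpose_keeps_order = all(
    flat_index((0, 0, h, w), original) == flat_index((0, 0, w, h), swapped)
    for h in range(H) for w in range(W)
)
print(transpose_keeps_order)  # False
```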

Soni__Neha · ‎09-08-2019

Hi, I am getting this error:

net1 = IENetwork(model=model_xml1, weights=model_bin1)
File "ie_api.pyx", line 415, in openvino.inference_engine.ie_api.IENetwork.__cinit__
RuntimeError: Error reading network: dimension (0) in node dim must be a positive integer: at offset 14819

I have tried --input_shape [1,32,32,1] and -b 1. Please help me with this.
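
Since the error complains about a dimension that is not a positive integer, one way to narrow it down is to scan the generated XML for such dim values before loading it. This is a minimal sketch using the Python standard library; find_non_positive_dims is a hypothetical helper and the sketch assumes the IR layout shown earlier in this thread (layer elements containing port and dim elements).

```python
import xml.etree.ElementTree as ET

def find_non_positive_dims(root):
    """Return (layer name, port id, dims) for each port with a bad dim.

    A dim is reported if it is not a plain positive integer, which covers
    0, negative values such as -1, and empty or non-numeric text.
    """
    problems = []
    for layer in root.iter("layer"):
        for port in layer.iter("port"):
            dims = [(d.text or "").strip() for d in port.findall("dim")]
            if any(not v.isdigit() or int(v) <= 0 for v in dims):
                problems.append((layer.get("name"), port.get("id"), dims))
    return problems

# Example call on a generated IR (the path is a placeholder):
# for item in find_non_positive_dims(ET.parse("model.xml").getroot()):
#     print(item)
```

The first layer reported is usually a good place to start looking, since a bad shape tends to propagate to the layers after it.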

Shubha_R_Intel · ‎09-09-2019

Dear Soni, Neha,

As I've said before it's difficult to say what happened there without stepping through the debugger. Since you are uncomfortable sharing your model, you can build a DEBUG version of IE using dldt github openvino . Follow the Inference engine README carefully and build a Debug release of Inference Engine. You have full source code available to you. So you can step through and figure out exactly why Inference Engine is throwing that error.

It's definitely odd because Model Optimizer should not produce IR with negative dimensions.

Hope it helps,

Shubha

Ben__Hsu · ‎04-23-2020

@Shubha, I have the same issue. I have opened a new issue with the reproduction steps and artifacts here.

Please take a look, thanks.

Ben.