I am writing to report an issue I found in Model Optimizer.
I created a model in Keras/TensorFlow. Its input shape is (1, 28, 28):
model = Sequential()
model.add(Conv2D(30, (5, 5), input_shape=(1, 28, 28), activation='relu'))
After exporting it to pb format, I ran Model Optimizer with the "--input_shape [1,1,28,28]" argument:
python3 /opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo_tf.py --input_model ~/Documents/MNIST_OpenVino/output_graph.pb --input_shape [1,1,28,28] --output_dir ~/Documents/MNIST_OpenVino/ --scale 255 --data_type FP32
Unexpectedly, the input shape in the generated XML file is different, [1,28,1,28]:
<?xml version="1.0" ?>
<net batch="1" name="output_graph" version="4">
  <layers>
    <layer id="0" name="conv2d_1_input" precision="FP32" type="Input">
      <output>
        <port id="0"> <dim>1</dim> <dim>28</dim> <dim>1</dim> <dim>28</dim> </port>
      </output>
    </layer>
    <layer id="1" name="Mul1_" precision="FP32" type="Power">
      <data power="1" scale="0.00392156862745098" shift="0"/>
      <input>
        <port id="0"> <dim>1</dim> <dim>28</dim> <dim>1</dim> <dim>28</dim> </port>
      </input>
      <output>
        <port id="1"> <dim>1</dim> <dim>28</dim> <dim>1</dim> <dim>28</dim> </port>
      </output>
    </layer>
    <layer id="2" name="conv2d_1/transpose" precision="FP32" type="Permute">
      <data order="0,2,3,1"/>
      <input>
        <port id="0"> <dim>1</dim> <dim>28</dim> <dim>1</dim> <dim>28</dim> </port>
      </input>
      <output>
        <port id="1"> <dim>1</dim> <dim>1</dim> <dim>28</dim> <dim>28</dim> </port>
      </output>
I noticed the problem because the model started giving wrong predictions after I ran it through Model Optimizer. The workaround is to manually edit the dimensions of the first few layers inside the XML file:
<dim>1</dim> <dim>1</dim> <dim>28</dim> <dim>28</dim>
The problem exists in both the R4 and R5 releases, and with both mo.py and mo_tf.py.
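The manual edit described above could also be scripted. Below is a minimal sketch using only Python's standard library; `rewrite_port_dims` is a hypothetical helper (not part of OpenVINO), and the sample XML is a shortened stand-in for the generated IR. It assumes that every port whose dimensions are exactly [1,28,1,28] should become [1,1,28,28], which may not hold for other models.

```python
import xml.etree.ElementTree as ET


def rewrite_port_dims(ir_xml, old_dims, new_dims):
    """Rewrite every <port> whose <dim> list equals old_dims to new_dims.

    Hypothetical helper for illustration; returns (patched_xml, ports_changed).
    """
    root = ET.fromstring(ir_xml)
    changed = 0
    for port in root.iter("port"):
        dims = port.findall("dim")
        if [int(d.text) for d in dims] == list(old_dims):
            for dim, value in zip(dims, new_dims):
                dim.text = str(value)
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Shortened stand-in for the IR produced by mo_tf.py (assumption: same tag layout).
SAMPLE = (
    '<net batch="1" name="output_graph" version="4"><layers>'
    '<layer id="0" name="conv2d_1_input" precision="FP32" type="Input">'
    '<output><port id="0">'
    "<dim>1</dim><dim>28</dim><dim>1</dim><dim>28</dim>"
    "</port></output></layer></layers></net>"
)

patched, count = rewrite_port_dims(SAMPLE, (1, 28, 1, 28), (1, 1, 28, 28))
print(count)
print(patched)
```

This only changes the shape metadata in the XML; it does not touch the weights in the bin file.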
One more finding, from running the same models on a Raspberry Pi through the Python API:
1) when I leave the XML file unmodified, the script runs without error but the model makes bad predictions (the same result as on Linux)
2) when I modify the XML file (the workaround described in the post above), the script returns an error:
E: [xLink] [ 300849] dispatcherEventReceive:308 dispatcherEventReceive() Read failed -1 | event 0x6b5fee20 USB_WRITE_RESP
E: [xLink] [ 300849] eventReader:256 eventReader stopped
E: [watchdog] [ 301747] sendPingMessage:164 Failed send ping message: X_LINK_ERROR
^CE: [xLink] [ 301972] dispatcherWaitEventComplete:720 waiting is timeout, sending reset remote event
E: [ncAPI] [ 301972] checkGraphMonitorResponse:1284 XLink error, rc: X_LINK_TIMEOUT
E: [ncAPI] [ 301972] ncGraphQueueInference:3544 Can't get trigger response
When I try --input_shape [1,28,28,1] with the unchanged pb file, the command looks like this:
python3 /opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo_tf.py --input_model ~/Documents/MNIST_OpenVino/output_graph.pb --input_shape [1,28,28,1] --output_dir ~/Documents/MNIST_OpenVino-ForumIdea/ --scale 255 --data_type FP32
I receive the following error from mo_tf.py:
Model Optimizer version: 18.104.22.168d067a0
[ ERROR ] Shape [ 1 30 24 -1] is not fully defined for output 0 of "conv2d_1/add". Use --input_shape with positive integers to override model input shapes.
[ ERROR ] Cannot infer shapes or values for node "conv2d_1/add".
[ ERROR ] Not all output shapes were inferred or fully defined for node "conv2d_1/add". For more information please refer to Model Optimizer FAQ (<INSTALL_DIR>/deployment_tools/documentation/docs/MO_FAQ.html), question #40.
[ ERROR ]
[ ERROR ] It can happen due to bug in custom shape infer function <function tf_eltwise_ext.<locals>.<lambda> at 0x7f7bd1dc36a8>.
[ ERROR ] Or because the node inputs have incorrect values/shapes.
[ ERROR ] Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ ERROR ] Run Model Optimizer with --log_level=DEBUG for more information.
[ ERROR ] Stopped shape/value propagation at "conv2d_1/add" node. For more information please refer to Model Optimizer FAQ (<INSTALL_DIR>/deployment_tools/documentation/docs/MO_FAQ.html), question #38.
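The reported shape [1 30 24 -1] can be checked with some quick arithmetic. This is only a sketch: it assumes the graph still reads the four input dimensions as (batch, channels, height, width), and it relies on Keras' Conv2D defaults (stride 1, 'valid' padding), under which each spatial dimension shrinks to size - kernel + 1. `valid_conv_hw` is a hypothetical helper, not an MO function.

```python
def valid_conv_hw(shape_nchw, kernel_hw):
    """Spatial output size of a stride-1, unpadded ('valid') convolution."""
    _, _, height, width = shape_nchw
    kernel_h, kernel_w = kernel_hw
    return height - kernel_h + 1, width - kernel_w + 1


for shape in [(1, 1, 28, 28), (1, 28, 28, 1)]:
    out_h, out_w = valid_conv_hw(shape, (5, 5))
    status = "ok" if out_h > 0 and out_w > 0 else "not a valid size"
    print(shape, "->", (out_h, out_w), status)
```

Under that assumption, [1,28,28,1] leaves a width of 1 for a 5x5 kernel, so the height works out to 24 while the width has no valid size, which would be consistent with the 24 and the undefined last dimension in the error.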
I am attaching the pb file, so you can play with it directly.
--input_shape [1,1,28,28] seems to work with your attached pb
mo_tf.py --input_model output_graph.pb --input_shape [1,1,28,28] --scale 255 --data_type FP32
[ SUCCESS ] Generated IR model.
Could you test inference with the generated IR? Is it correct?
Well, yes, MO successfully generates the bin/xml files with --input_shape [1,1,28,28], but the files are wrong:
1) in XML file the first few layers have shape 1,28,1,28
2) the model is giving wrong predictions
The workaround to fix the predictions is to manually edit the XML file so that the first few layers have shape 1,1,28,28. However, this workaround works only if the model runs on CPU; when I tried it on MYRIAD, it printed the errors mentioned in the post above.
To show the wrong predictions and the workaround, I'm attaching scripts that are supposed to recognize the digit 5.
1) myMNIST_v3.py (it uses XML file generated by MO) - the model predicts number 5 to be 8
2) myMNIST_v3-workaround.py (it uses the XML file with manually reordered dimensions: 1,1,28,28) - the model correctly predicts 5 to be 5
Sorry, I am not seeing an issue with the Model Optimizer or with inference on CPU or NCS. I don't know how you train and freeze to pb, so I cannot test the end-to-end workflow. Let's forget OpenVINO for now. Do you get the correct results if you run native TF? Could you attach simple Python code that loads your output_graph.pb, runs inference using TensorFlow, and gets the correct prediction on five.jpg?
If that works, make sure the correct values are used for mean subtraction and scaling of the input, i.e. which --mean_values and --scale are you passing to mo_tf.py for model optimization? Are the default values good, or do you need to specify custom ones? There may be something in the image processing pipeline causing the discrepancy.
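To make the scaling question concrete: the IR posted earlier contains a Power layer with power="1", scale="0.00392156862745098" and shift="0", which is 1/255 and matches --scale 255. The sketch below assumes the usual Power layer formula, y = (shift + scale * x) ^ power; `power_layer` is a hypothetical stand-in, not an Inference Engine call.

```python
import math


def power_layer(x, power=1.0, scale=0.00392156862745098, shift=0.0):
    """Element-wise Power layer, y = (shift + scale * x) ** power (assumed formula)."""
    return (shift + scale * x) ** power


raw_pixel = 255.0
print(power_layer(raw_pixel))          # raw 0..255 input ends up in 0..1
print(power_layer(raw_pixel / 255.0))  # input already divided by 255 gets scaled twice
print(math.isclose(1 / 255, 0.00392156862745098))
```

So one thing worth checking is whether the inference script also divides the image by 255 before feeding it to the network; if it does, the input is scaled twice.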
Another thing I noticed is that you run on Mac OS. Could that be causing any issues? Any chance to try the same on Linux? Not sure how much of OpenVino was validated on OS X and if there are any hidden endian issues somewhere in the pipeline.
Note that the supported operating systems are:
Ubuntu 16.04.3 LTS (64 bit)
Windows 10 (64 bit)
CentOS 7.4 (64 bit)
I run OpenVINO on Ubuntu 16.04.3 LTS (64 bit). I use the Mac only for accessing this forum.
I created scripts for the end-to-end workflow, starting with training a model, then exporting to 'pb' file, optimizing using OpenVino and running inference.
The attached ZIP file includes:
Steps to execute:
The predictions of each script are as follows (due to a Keras issue I am not able to fix the random seed, so when you run the scripts your prediction probabilities will be slightly different, but consistent across the '1.py', '2.py', '3.py', and 'myMNIST_v3-workaround.py' scripts):
As you can see, 'myMNIST_v3.py', which uses the file generated by 'mo_tf.py', gives wrong predictions that differ from all the other scripts. Moreover, a slight edit of the XML file fixes the problem. However, the fix works only if the model runs on CPU; with the fix applied, the model crashes on MYRIAD. So there is a bug somewhere that causes the wrong predictions.
Thanks @om77 for the interesting idea, but it doesn't work - the XML file has correct dimensions, but the prediction is still wrong.
Below you can see a summary of the different tests run with different options. In all tests the following part of the command was the same:
python3 /opt/intel/computer_vision_sdk_2018.5.445/deployment_tools/model_optimizer/mo_tf.py --input_model ./Output/model.pb --input_shape [1,1,28,28] --output_dir ./Output/ --scale 255
In the following tests I used additional parameters:
As you can see, the predictions are inconsistent. The prediction is correct only in the case of "--data_type FP32 + manually edited XML".
I am working on a similar example to the OP's: a basic MNIST CNN converted from Keras to TF to IR and run through the OpenVINO Python API.
I was not experiencing this weird dimension mangling issue, and after closer inspection I noticed the OP is using data_format="channels_first" in their Keras model. I was using the default data_format="channels_last", so my initial layer has input_shape=(28, 28, 1).
When I changed my keras model to use data_format='channels_first' I reproduced the exact same behavior seen from the OP. However, I do not get bad classification because I reshaped the data to fit the weird [1,28,1,28] dimensions and did not touch the xml file.
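The shapes in the IR posted earlier give a hint about which rearrangement fits the [1,28,1,28] input: the "conv2d_1/transpose" Permute layer has order="0,2,3,1" and turns [1,28,1,28] back into [1,1,28,28]. The sketch below only does the shape bookkeeping and assumes Permute follows the same convention as numpy.transpose (output axis i takes input axis order[i]); `permute_shape` and `inverse_order` are hypothetical helpers.

```python
def permute_shape(shape, order):
    """Shape after a permute: output axis i takes input axis order[i]."""
    return [shape[axis] for axis in order]


def inverse_order(order):
    """Axis order that undoes a permute with the given order."""
    inverse = [0] * len(order)
    for out_axis, in_axis in enumerate(order):
        inverse[in_axis] = out_axis
    return inverse


IR_INPUT = [1, 28, 1, 28]   # input shape written by Model Optimizer
PERMUTE = [0, 2, 3, 1]      # order of the "conv2d_1/transpose" layer

print(permute_shape(IR_INPUT, PERMUTE))
print(inverse_order(PERMUTE))
print(permute_shape([1, 1, 28, 28], inverse_order(PERMUTE)))
```

Under that assumption, an image stored as (1, 1, 28, 28) would be brought into the IR layout with a transpose over axes (0, 3, 1, 2), for example np.transpose(image, (0, 3, 1, 2)), rather than a plain reshape, which keeps the memory order and would swap rows and columns once the Permute layer is applied.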
I did not do any FP32 vs FP16 tests, I am only inferencing on the NCS2 with FP16.
I am however running into errors when trying to implement batch inferencing with this model, but I won't hijack this thread.
Dear Soni, Neha,
As I've said before, it's difficult to say what happened there without stepping through the debugger. Since you are uncomfortable sharing your model, you can build a DEBUG version of the Inference Engine from the OpenVINO dldt repository on GitHub. Follow the Inference Engine README carefully and build a Debug release of the Inference Engine. You have the full source code available, so you can step through and figure out exactly why the Inference Engine is throwing that error.
It's definitely odd because Model Optimizer should not produce IR with negative dimensions.
Hope it helps,