I'm trying to convert my model with OpenVINO. I have succeeded in converting a simple custom model, and the inference results are almost identical to the TensorFlow ones (the difference is smaller than 1e-6).
However, when I try to do the same thing with an Inception-v3-based model, most inferred labels are correct, but the probabilities are quite different from the TensorFlow results.
The scale and mean_values parameters have an impact on the result, and I'm currently using the suggested default values (scale 127.5 and mean_values of 127.5 for all 6 channels).
I'm wondering what causes the differences in the inferred results. Should I tune these two parameters more carefully, or did I miss some other important parameters?
Hello Sijie,
In many cases such errors come from wrong scale_values, mean_values, or reverse_input_channels settings:
https://software.intel.com/en-us/articles/OpenVINO-Using-TensorFlow
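For reference, a typical Model Optimizer invocation with these options looks roughly like this (a generic 3-channel example; the model path and values are placeholders, adjust them to your own preprocessing):
> python3 mo_tf.py --input_model frozen_model.pb \
>     --input_shape [1,299,299,3] \
>     --mean_values [127.5,127.5,127.5] \
>     --scale 127.5 \
>     --reverse_input_channels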
There could be other issues too. Are you running FP32 on CPU?
Here is another thread where we found at least three issues causing minor or significant discrepancies:
https://software.intel.com/comment/1933099
Let us know what the issue was when you resolve this.
Cheers,
Nikos
@nikos
Thanks for the reply.
Yes, I'm running FP32 on a CPU, and I will run the model on an FPGA in the next stage.
It seems that the reverse_input_channels flag requires C equal to 3, and in my case it's 6.
I will share an update here if I make progress with the thread you posted.
Best regards,
Sijie
Hello Sijie,
Forgot to mention: another option would be to compare values by inspecting each layer, as described in other posts on this forum (see the sketch below). Good luck!
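On the TensorFlow side, something along these lines would dump one intermediate tensor for comparison against the corresponding OpenVINO layer (an untested sketch; the tensor names, input shape, and file names are just placeholders):
> import numpy as np
> import tensorflow as tf
>
> # Load the frozen graph that was fed to the Model Optimizer.
> with tf.gfile.GFile("frozen_model.pb", "rb") as f:
>     graph_def = tf.GraphDef()
>     graph_def.ParseFromString(f.read())
>
> with tf.Graph().as_default() as graph:
>     tf.import_graph_def(graph_def, name="")
>     inp = graph.get_tensor_by_name("input:0")                        # placeholder name
>     mid = graph.get_tensor_by_name("InceptionV3/Mixed_5b/concat:0")  # placeholder name
>     with tf.Session(graph=graph) as sess:
>         value = sess.run(mid, feed_dict={inp: np.zeros((1, 299, 299, 6), np.float32)})
>         np.save("tf_layer_dump.npy", value)  # compare against the OpenVINO output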
nikos
Sorry, one more: also check for layout issues like InferenceEngine::Layout::NCHW vs. InferenceEngine::Layout::NHWC, if applicable in your case.
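For example, if your preprocessed data is laid out as HWC, a transpose along these lines (a minimal sketch; the shape is a placeholder) is needed before feeding an NCHW network:
> import numpy as np
>
> image = np.zeros((299, 299, 6), dtype=np.float32)  # placeholder HWC data
> image_chw = image.transpose((2, 0, 1))             # HWC -> CHW
> batch = image_chw[np.newaxis, ...]                 # add the N dimension -> NCHW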
> requires C equal to 3, and in my case it's 6.
What layout are you using for 6 channels?
Cheers,
Nikos
Thanks for the reminder.
I am using NCHW now.
Does that make a difference? The simple custom model I mentioned has an H33*W4*C4 input shape, and I didn't disable the NHWC-to-NCHW conversion when I converted it.
I figured out I should use 128.0 for scale and mean_values, but the results are not affected very much.
@nikos
I'm also wondering how the Model Optimizer treats batch normalization.
> Found this
> https://github.com/tensorflow/tensorflow/issues/9724
Interesting! Yes, the way you freeze and treat dropout and batch normalization is very important too.
There was also another thread here with best practices on how to disable the TF training phase and freeze properly. Let me find it and post it here.
I think I have made some progress.
Batch normalization and dropout layers should be removed from the inference graph by setting is_training = False. I tried inference with TensorFlow with batch = 1, and the probabilities are almost unchanged (~1e-5). I didn't test dropout in this model, but in the other model, removing dropout was successful.
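To be concrete, what I mean at the layer level is roughly the following (a minimal illustration with tf.layers, not my actual model code):
> import tensorflow as tf
>
> def block(x, is_training):
>     # With is_training=False, batch norm uses its moving mean/variance
>     # and dropout becomes a pass-through, so both fold cleanly into the
>     # frozen inference graph.
>     x = tf.layers.conv2d(x, filters=32, kernel_size=3, padding="same")
>     x = tf.layers.batch_normalization(x, training=is_training)
>     x = tf.nn.relu(x)
>     x = tf.layers.dropout(x, rate=0.5, training=is_training)
>     return x
>
> inputs = tf.placeholder(tf.float32, [None, 299, 299, 6])  # placeholder shape
> outputs = block(inputs, is_training=False)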
By the way, when I use mo_tf.py with -b 100, the detected batch in the OpenVINO tool, classification_sample.py, is still 1. Do you have any ideas?
Another point is that the original model used an exponential moving average (EMA) during inference.
The probabilities look much better now, but in some hard-to-tell cases the labels can still be wrong.
Hi sijie,
Good progress! Yes, whether the training phase is set makes a difference in the frozen graph you generate; sorry I never sent you more details on this.
> By the way, when I use mo_tf.py with -b 100, the detected batch in the OpenVINO tool, classification_sample.py, is still 1. Do you have any ideas?
I have not tried the Python API with batch size > 1, but the C++ samples worked fine.
I am seeing
> parser.add_argument("-i", "--input", help="Path to a folder with images or path to an image files", required=True,
> . . .
> net.batch_size = len(args.input)
Do you have many (more than 100) images in your input folder?
Also maybe try classification_sample_async.py
Is the batch set properly if you examine the .xml file?
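If you want to force a fixed batch in the Python sample regardless of the number of input files, something along these lines should also work (a rough, untested sketch against the older Inference Engine Python API; class and attribute names can differ between releases):
> import numpy as np
> from openvino.inference_engine import IENetwork, IEPlugin
>
> net = IENetwork(model="model.xml", weights="model.bin")  # placeholder paths
> net.batch_size = 100                # set the batch here instead of len(args.input)
> exec_net = IEPlugin(device="CPU").load(network=net)
>
> input_blob = next(iter(net.inputs))
> data = np.zeros((100, 6, 299, 299), dtype=np.float32)    # fill with preprocessed images
> res = exec_net.infer(inputs={input_blob: data})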
> The probabilities look much better now, but in some hard-to-tell cases the labels can still be wrong.
Could you elaborate?
Cheers,
Nikos
Hi Nikos,
> net.batch_size = len(args.input)
> Do you have many (more than 100) images in your input folder?
You are right, I tried to put all inputs into one file, but I forgot to change this. Thanks.
> The probabilities look much better now, but in some hard-to-tell cases the labels can still be wrong.
> Could you elaborate?
These are cases where one model says the probabilities are [0.01, 0.49, 0.50] and the other says [0.02, 0.50, 0.48].
Best regards,
Sijie
I figured out the problem comes from the Exponential Moving Average.
I removed the EMA in inference from the original model, and the results are identical now (<1e-6).
But if I add EMA to the frozen graph using the write_pb_file.py from https://github.com/tensorflow/tensorflow/issues/9724, I don't get the values I expected. So the key problem is freezing an inference graph with EMA.
The code I'm using:
> MOVING_AVERAGE_DECAY = 0.9999
> variable_averages = tf.train.ExponentialMovingAverage(
>     MOVING_AVERAGE_DECAY)
> for var in variables_to_restore:
>     tf.add_to_collection(tf.GraphKeys.MOVING_AVERAGE_VARIABLES, var)
> variables_to_restore = variable_averages.variables_to_restore()
which is the same as in the original model.
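For reference, my understanding of the usual pattern for restoring the EMA shadow values before freezing is roughly the following (an untested sketch; the checkpoint path and output node name are placeholders):
> import tensorflow as tf
>
> MOVING_AVERAGE_DECAY = 0.9999
>
> with tf.Graph().as_default() as graph:
>     # ... build the inference graph here with is_training=False ...
>     variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY)
>     # Map each variable to its EMA shadow name so the Saver restores the
>     # averaged weights instead of the raw training weights.
>     variables_to_restore = variable_averages.variables_to_restore()
>     saver = tf.train.Saver(variables_to_restore)
>
>     with tf.Session() as sess:
>         saver.restore(sess, "/path/to/model.ckpt")              # placeholder path
>         frozen = tf.graph_util.convert_variables_to_constants(
>             sess, graph.as_graph_def(), ["output_node"])        # placeholder node name
>         with tf.gfile.GFile("frozen_ema.pb", "wb") as f:
>             f.write(frozen.SerializeToString())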
I will seek help on the TensorFlow forum about this.
Best regards,
Sijie