I'm trying to convert my model with OpenVINO. I have succeeded in converting a simple custom model, and the inference results are almost identical to the TensorFlow ones (the difference is smaller than 1e-6).
However, when I try to do the same thing with an Inception-v3-based model, most inferred labels are correct, but the probabilities are quite different from the TensorFlow results.
The scale and mean_values parameters have an impact on the result, and I'm currently using the suggested default values (scale 127.5 and mean_values of 127.5 for all 6 channels).
I'm wondering what causes the differences in the inferred results. Should I tune these two parameters more carefully, or did I miss some other important parameters?
Hello Sijie,
In many cases such errors come from wrong scale_values, mean_values, or reverse_input_channels settings:
https://software.intel.com/en-us/articles/OpenVINO-Using-TensorFlow
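For reference, a typical Model Optimizer invocation with these options looks roughly like this (a generic 3-channel example; the model path and values are placeholders, adjust them to your own preprocessing):
> python3 mo_tf.py --input_model frozen_model.pb \
>     --input_shape [1,299,299,3] \
>     --mean_values [127.5,127.5,127.5] \
>     --scale 127.5 \
>     --reverse_input_channels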
There could be other issues too. Are you running FP32 on CPU?
Here is another thread where we found at least three issues causing minor or significant discrepancies:
https://software.intel.com/comment/1933099
Let us know what the issue was when you resolve this.
Cheers,
Nikos
@nikos
Thanks for the reply.
Yes, I'm running FP32 on a CPU, and I will run the model on an FPGA in the next stage.
It seems that the reverse_input_channels flag requires C equal to 3, and in my case it's 6.
I will share an update here if I make progress with the thread you posted.
Best regards,
Sijie
Hello Sijie,
Forgot to mention: another option would be to compare values by inspecting each layer, as described in other posts on this forum (see the sketch below). Good luck!
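On the TensorFlow side, something along these lines would dump one intermediate tensor for comparison against the corresponding OpenVINO layer (an untested sketch; the tensor names, input shape, and file names are just placeholders):
> import numpy as np
> import tensorflow as tf
>
> # Load the frozen graph that was fed to the Model Optimizer.
> with tf.gfile.GFile("frozen_model.pb", "rb") as f:
>     graph_def = tf.GraphDef()
>     graph_def.ParseFromString(f.read())
>
> with tf.Graph().as_default() as graph:
>     tf.import_graph_def(graph_def, name="")
>     inp = graph.get_tensor_by_name("input:0")                        # placeholder name
>     mid = graph.get_tensor_by_name("InceptionV3/Mixed_5b/concat:0")  # placeholder name
>     with tf.Session(graph=graph) as sess:
>         value = sess.run(mid, feed_dict={inp: np.zeros((1, 299, 299, 6), np.float32)})
>         np.save("tf_layer_dump.npy", value)  # compare against the OpenVINO output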
nikos
Sorry, one more: also check for layout issues like InferenceEngine::Layout::NCHW vs. InferenceEngine::Layout::NHWC, if applicable in your case.
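For example, if your preprocessed data is laid out as HWC, a transpose along these lines (a minimal sketch; the shape is a placeholder) is needed before feeding an NCHW network:
> import numpy as np
>
> image = np.zeros((299, 299, 6), dtype=np.float32)  # placeholder HWC data
> image_chw = image.transpose((2, 0, 1))             # HWC -> CHW
> batch = image_chw[np.newaxis, ...]                 # add the N dimension -> NCHW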
> requires C equal to 3, and in my case it's 6.
What layout are you using for 6 channels?
Cheers,
Nikos
Thanks for the reminder.
I am using NCHW now.
Does that make a difference? The simple custom model I mentioned has an H33*W4*C4 input shape, and I didn't disable the NHWC-to-NCHW conversion when I converted it.
I figured out I should use 128.0 for scale and mean_values, but the results are not affected very much.
@nikos
I'm also wondering how the Model Optimizer treats batch normalization.
> Found this
> https://github.com/tensorflow/tensorflow/issues/9724
Interesting! Yes, the way you freeze and treat dropout and batch normalization is very important too.
There was also another thread here with best practices on how to disable the TF training phase and freeze properly. Let me find it and post it here.
I think I have made some progress.
Batch normalization and dropout layers should be removed from the inference graph by setting is_training = False. I tried inference with TensorFlow with batch = 1, and the probabilities are almost unchanged (~1e-5). I didn't test dropout in this model, but in the other model, removing dropout was successful.
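To be concrete, what I mean at the layer level is roughly the following (a minimal illustration with tf.layers, not my actual model code):
> import tensorflow as tf
>
> def block(x, is_training):
>     # With is_training=False, batch norm uses its moving mean/variance
>     # and dropout becomes a pass-through, so both fold cleanly into the
>     # frozen inference graph.
>     x = tf.layers.conv2d(x, filters=32, kernel_size=3, padding="same")
>     x = tf.layers.batch_normalization(x, training=is_training)
>     x = tf.nn.relu(x)
>     x = tf.layers.dropout(x, rate=0.5, training=is_training)
>     return x
>
> inputs = tf.placeholder(tf.float32, [None, 299, 299, 6])  # placeholder shape
> outputs = block(inputs, is_training=False)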
By the way, when I use mo_tf.py with -b 100, the detected batch in the OpenVINO tool, classification_sample.py, is still 1. Do you have any ideas?
Another point is that the original model used an exponential moving average (EMA) during inference.
The probabilities look much better now, but in some hard-to-tell cases the labels can still be wrong.
Hi sijie,
Good progress! Yes, whether the training phase is set makes a difference in the frozen graph you generate; sorry I never sent you more details on this.
> By the way, when I use mo_tf.py with -b 100, the detected batch in the OpenVINO tool, classification_sample.py, is still 1. Do you have any ideas?
I have not tried the Python API with batch size > 1, but the C++ samples worked fine.
I am seeing
> parser.add_argument("-i", "--input", help="Path to a folder with images or path to an image files", required=True,
> . . .
> net.batch_size = len(args.input)
Do you have many (more than 100) images in your input folder?
Also maybe try classification_sample_async.py
Is the batch set properly if you examine the .xml file?
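If you want to force a fixed batch in the Python sample regardless of the number of input files, something along these lines should also work (a rough, untested sketch against the older Inference Engine Python API; class and attribute names can differ between releases):
> import numpy as np
> from openvino.inference_engine import IENetwork, IEPlugin
>
> net = IENetwork(model="model.xml", weights="model.bin")  # placeholder paths
> net.batch_size = 100                # set the batch here instead of len(args.input)
> exec_net = IEPlugin(device="CPU").load(network=net)
>
> input_blob = next(iter(net.inputs))
> data = np.zeros((100, 6, 299, 299), dtype=np.float32)    # fill with preprocessed images
> res = exec_net.infer(inputs={input_blob: data})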
> The probabilities look much better now, but in some hard-to-tell cases the labels can still be wrong.
Could you elaborate?
Cheers,
Nikos
Hi Nikos,
> net.batch_size = len(args.input)
> Do you have many (more than 100) images in your input folder?
You are right, I tried to put all inputs into one file, but I forgot to change this. Thanks.
> The probabilities look much better now, but in some hard-to-tell cases the labels can still be wrong.
> Could you elaborate?
These are cases where one model says the probabilities are [0.01, 0.49, 0.50] and the other says [0.02, 0.50, 0.48].
Best regards,
Sijie
I figured out the problem comes from the Exponential Moving Average.
I removed the EMA in inference from the original model, and the results are identical now (<1e-6).
But if I add EMA to the frozen graph using the write_pb_file.py from https://github.com/tensorflow/tensorflow/issues/9724, I don't get the values I expected. So the key problem is freezing an inference graph with EMA.
The code I'm using:
> MOVING_AVERAGE_DECAY = 0.9999
> variable_averages = tf.train.ExponentialMovingAverage(
>     MOVING_AVERAGE_DECAY)
> for var in variables_to_restore:
>     tf.add_to_collection(tf.GraphKeys.MOVING_AVERAGE_VARIABLES, var)
> variables_to_restore = variable_averages.variables_to_restore()
which is the same as in the original model.
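For reference, my understanding of the usual pattern for restoring the EMA shadow values before freezing is roughly the following (an untested sketch; the checkpoint path and output node name are placeholders):
> import tensorflow as tf
>
> MOVING_AVERAGE_DECAY = 0.9999
>
> with tf.Graph().as_default() as graph:
>     # ... build the inference graph here with is_training=False ...
>     variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY)
>     # Map each variable to its EMA shadow name so the Saver restores the
>     # averaged weights instead of the raw training weights.
>     variables_to_restore = variable_averages.variables_to_restore()
>     saver = tf.train.Saver(variables_to_restore)
>
>     with tf.Session() as sess:
>         saver.restore(sess, "/path/to/model.ckpt")              # placeholder path
>         frozen = tf.graph_util.convert_variables_to_constants(
>             sess, graph.as_graph_def(), ["output_node"])        # placeholder node name
>         with tf.gfile.GFile("frozen_ema.pb", "wb") as f:
>             f.write(frozen.SerializeToString())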
I will seek help on the TensorFlow forum about this.
Best regards,
Sijie