Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

The inference result is totally different after converting ONNX to OpenVINO IR




I am trying to convert a pretrained PyTorch model into OpenVINO IR.


First, I download the pretrained model and export it to ONNX:


import torch
from torchvision.models.resnet import resnet50

net = resnet50(pretrained=True)
x = torch.randn(1, 3, 224, 224)  # dummy input; only used to trace the graph for export
torch.onnx.export(net, x, 'test_model.onnx', export_params=True)


Then I use the Model Optimizer to convert it to IR:

 python --input_model /home/tumh/pytorch-cifar/test_model.onnx  --output_dir /home/tumh/test_model_FP32 --framework onnx --data_type FP32

Then I test the IR. For simplicity, the input is randomly generated with a fixed seed.

Here is what I changed:


import numpy as np

# load net... success

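n, c, h, w = net.inputs[input_blob].shape
images = np.zeros(shape=(n, c, h, w), dtype=np.float32)
# fix the seed so the random input is reproducible (the value 0 is an assumption)
np.random.seed(0)
r = np.random.randn(c, h, w)
images[0] = r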

# process output

# show only 10 of 1000
#[-0.4592718  -0.12941386 -0.11573323 -0.75521946  0.5491318  -1.4393116
# -0.3861863  -0.40474018  0.401676   -1.4279357 ]


However, the result from PyTorch is quite different. Here is the PyTorch output:

[-0.9367, -0.3480, -0.4053, -0.9139, -0.5280, -0.2631, -0.5692,  0.4866, 0.3328, -0.3892]


By the way, if I convert the ONNX model to Caffe first and then convert the Caffe model to IR, the result is identical. So I suspect this is an ONNX-to-OpenVINO conversion issue.

OpenVINO claims to support ResNet-50. Any ideas?

Valued Contributor I

Hi Ming-Hsuan, 

Have you solved this issue? Please make sure you are using the correct mean/scale values and --reverse_input_channels. For example, what happens if you convert with these options: --input_model test_model.onnx --data_type FP32 --mean_values [104.0,117.0,123.0] --scale 255 --reverse_input_channels

I'm not sure what the actual values are; the above is just an example of using --mean_values, --reverse_input_channels and --scale.
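As I understand the Model Optimizer documentation, these flags bake a fixed preprocessing step into the IR: --reverse_input_channels flips the channel order (for example, BGR from OpenCV to the RGB order a PyTorch model was trained on), then the per-channel mean is subtracted and the result is divided by the per-channel scale, with --mean_values/--scale_values given in the original model's channel order. Below is a minimal NumPy sketch of that arithmetic, assuming an NCHW input; mo_style_preprocess is a hypothetical name, not Model Optimizer code, and --scale 255 in the example command corresponds to a scale of 255 on every channel.

```python
import numpy as np


def mo_style_preprocess(x_nchw, mean=None, scale=None, reverse_input_channels=False):
    """Hypothetical NumPy model of the preprocessing the Model Optimizer flags add.

    x_nchw: array of shape (N, C, H, W), e.g. raw 0..255 pixels.
    mean, scale: per-channel values in the channel order the original model expects.
    """
    x = np.asarray(x_nchw, dtype=np.float32)
    if reverse_input_channels:
        # Flip the channel axis, e.g. BGR (OpenCV) -> RGB (PyTorch-trained model).
        x = x[:, ::-1, :, :]
    if mean is not None:
        # Broadcast one mean value per channel over N, H and W.
        x = x - np.asarray(mean, dtype=np.float32).reshape(1, -1, 1, 1)
    if scale is not None:
        # Divide by one scale value per channel.
        x = x / np.asarray(scale, dtype=np.float32).reshape(1, -1, 1, 1)
    return x


# Example: a 1x3x2x2 "BGR" image where every pixel is (B=0, G=100, R=200).
bgr = np.zeros((1, 3, 2, 2), dtype=np.float32)
bgr[:, 0] = 0.0    # B
bgr[:, 1] = 100.0  # G
bgr[:, 2] = 200.0  # R

out = mo_style_preprocess(bgr, mean=[104.0, 117.0, 123.0], scale=[255.0] * 3,
                          reverse_input_channels=True)
print(out[0, :, 0, 0])  # approximately [0.376, -0.067, -0.482]
```

If the mean/scale or channel order used here does not match what the network saw during training, the IR will receive a differently normalized input and the logits will differ, even though the weights converted correctly.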





I have tried adding mean, scale and reverse input channels, but the result is still different from PyTorch.

Have you tried reproducing my steps and looking at the result? It's easy to reproduce.






It would be much better if OpenVINO gave some examples in the documentation of converting pretrained PyTorch models via ONNX.

Valued Contributor I

Hi Ming-Hsuan, 

Yes, I did try to reproduce your steps using the details you included. That's how I realized that a missing --mean_values and --scale could cause issues.

> I have tried to add mean, scale and reverse channel. 

What values did you use?

> Have you ever tried to reproduce my steps and see the result? it's easy to reproduce.

It's not so easy to test your end-to-end pipeline without your test code. Could you please attach the test code you use for inference? Since it was missing, I tested with my own C++ test code on my test data and found no issues when the appropriate parameters were used.






Here is my test code:

import numpy as np
from torchvision.models.resnet import resnet50
import torchvision.transforms as transforms
from PIL import Image

transform_test = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),  # HWC uint8 image -> CHW float tensor in [0, 1]
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

net = resnet50(pretrained=True)
im = Image.open('/home/tumh/dog.jpeg')
x = transform_test(im)
x = x.unsqueeze(dim=0)  # add the batch dimension: (1, 3, 224, 224)
print(net(x)[0][0:10])


And the conversion command:


python --input_model /home/tumh/pytorch-cifar/test_model.onnx --output_dir /home/tumh/test_model_FP32 --scale_values [51.5865,50.847,51.255] --mean_values [125.307,122.961,113.8575] --framework onnx --data_type FP32 --reverse_input_channels


I switched to a real image and tested with (see the attachment):


python -m /home/tumh/test_model_FP32/test_model.xml -i ~/dog.jpeg


The result from OpenVINO:


[ 0.93463564 -3.0111456  -2.6969056  -0.7203666  -3.5170152  -1.5799323
 -4.9336224  -0.02198645  0.77573025 -0.03216804]


The result from PyTorch:


[-0.9243, -0.3208, -0.4276, -0.9591, -0.6213, -0.2226, -0.7890,  0.6524, 0.4050, -0.5093]


Valued Contributor I

Excellent, thank you for the additional information. I will work on this over the weekend (too busy at work). I want to root-cause this too, because I have similar issues.

> it's much better if openvino can give some examples to convert some pretrained pytorch onnx models in the document.

I agree that would be nice, but on the other hand I would rather they spend time optimizing the SDK and working on new features than writing samples for every possible combination of framework conversions.

In the meantime, a few thoughts:

- This is a multi-stage pipeline using many frameworks, and I would NOT expect identical numbers. Maybe we have to compare the results statistically. On top of that, we expect minor discrepancies after a model optimization process, correct?

- Were the original test networks trained on ImageNet or CIFAR? Are the weights loaded properly?

- Let's ignore the output vectors for now. What is the classification result in PyTorch, what is it if you run ONNX inference (have you tried?), and what is the classification result of OpenVINO FP32?

- Have you tried the validation tool to get a better overall idea of accuracy?

- I'm still not convinced those are the right parameters; I will test --scale_values [51.5865,50.847,51.255] --mean_values [125.307,122.961,113.8575]
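Comparing ten raw logits by eye is hard. One way to compare "statistically" is to compute a few summary numbers over the full 1000-element outputs. The helper below is a hypothetical sketch (compare_outputs and the choice of metrics are mine, not part of OpenVINO or PyTorch) that reports the maximum absolute difference, the cosine similarity, and whether the top-k class indices agree. The example uses the first ten values reported for PyTorch (in eval mode) and OpenVINO later in this thread.

```python
import numpy as np


def compare_outputs(a, b, k=5):
    """Hypothetical helper: summarize how close two output vectors are."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError("outputs must have the same number of elements")
    max_abs_diff = float(np.max(np.abs(a - b)))
    # Cosine similarity: 1.0 means the vectors point in exactly the same direction.
    cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Indices of the k largest scores, highest first.
    top_a = np.argsort(a)[::-1][:k]
    top_b = np.argsort(b)[::-1][:k]
    return {
        "max_abs_diff": max_abs_diff,
        "cosine": cosine,
        "top_k_match": bool(np.array_equal(top_a, top_b)),
    }


pytorch_out = [0.89, -3.04, -2.70, -0.74, -3.56, -1.79, -4.84, 0.10, 0.86, 0.05]
openvino_out = [0.93, -3.01, -2.69, -0.72, -3.51, -1.57, -4.93, -0.02, 0.77, -0.03]
print(compare_outputs(pytorch_out, openvino_out, k=3))
```

In practice you would pass the complete output vectors; agreement of the top-k indices answers the "what is the classification result" question directly, while the cosine similarity is closer to what matters for embedding-style outputs.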






> this is a multi-stage pipeline using many frameworks and I would NOT expect same numbers. Maybe we have to statistically compare results. Even worse after a model optimization process we expect minor discrepancies, correct

Well, a minor difference (around 1e-6) is acceptable.


> Are the original test networks trained on Imagenet or CIFAR - are weights loaded properly?


> Let's ignore output vectors for now. What is the classification result of pytorch, what is if run onnx inference (have you tried?) what is the classification result of openvino fp32 ?

The original weights are for ImageNet; they come from the official PyTorch model zoo. There are indeed 1000 output values, but for simplicity I only print 10 of them. I have not verified the classification result (whether it's a dog or something else).

> Have you tried the validation tool to get a better overall idea of accuracy?

Not yet. It seems this issue should be solved first, before I calculate the overall accuracy.

> Still not convinced those are the right parameters - will test  --scale_values [51.5865,50.847,51.255] --mean_values [125.307,122.961,113.8575]

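It's simple math. In PyTorch, the input is first scaled to [0, 1] (by ToTensor), and I normalize it with mean (0.4914, 0.4822, 0.4465) and std (0.2023, 0.1994, 0.2010). (Strictly speaking, these particular values are the CIFAR-10 statistics; the standard ImageNet values are mean (0.485, 0.456, 0.406) and std (0.229, 0.224, 0.225), which are the values used at the end of this thread.)

So the overall preprocessing for the R channel is ((r/255) - 0.4914)/0.2023, and the equivalent step in OpenVINO is (r - 0.4914*255)/(0.2023*255).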
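The same arithmetic can be checked numerically. The sketch below (normalize_to_mo_flags is a hypothetical helper name) turns torchvision-style Normalize mean/std, which act on inputs in [0, 1], into per-channel --mean_values/--scale_values for raw 0..255 pixels, and verifies on random pixels that both preprocessing paths give the same tensor. It uses the mean/std quoted above.

```python
import numpy as np


def normalize_to_mo_flags(mean01, std01):
    """Hypothetical helper: convert Normalize(mean, std) on [0, 1] inputs into
    mean/scale values that act on raw 0..255 pixels."""
    mean_values = [255.0 * m for m in mean01]
    scale_values = [255.0 * s for s in std01]
    return mean_values, scale_values


mean01 = (0.4914, 0.4822, 0.4465)
std01 = (0.2023, 0.1994, 0.2010)
mean_values, scale_values = normalize_to_mo_flags(mean01, std01)
print("--mean_values", [round(v, 4) for v in mean_values])
print("--scale_values", [round(v, 4) for v in scale_values])

# Check ((p / 255) - m) / s == (p - 255 m) / (255 s) on random pixels of shape (C, H, W).
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(3, 4, 4)).astype(np.float64)
m = np.array(mean01).reshape(3, 1, 1)
s = np.array(std01).reshape(3, 1, 1)
pytorch_style = (pixels / 255.0 - m) / s
mo_style = (pixels - np.array(mean_values).reshape(3, 1, 1)) / np.array(scale_values).reshape(3, 1, 1)
print(np.allclose(pytorch_style, mo_style))
```

The printed flag values match the ones in the conversion command above, so the mean/scale arithmetic itself is consistent; any remaining difference has to come from somewhere else in the pipeline.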

Valued Contributor I

What do you get if you set the network to evaluation mode with net.eval(), like this:

net = resnet50(pretrained=True)

net.eval()  # switch to inference mode; BatchNorm then uses its running statistics

im = Image.open('/home/tumh/dog.jpeg')
x = transform_test(im)
x = x.unsqueeze(dim=0)
print(net(x)[0][0:10])

My output now seems closer to OpenVINO:

pytorch  [ 0.89 -3.04 -2.70 -0.74 -3.56 -1.79 -4.84  0.10 0.86  0.05]
openvino [ 0.93 -3.01 -2.69 -0.72 -3.51 -1.57 -4.93 -0.02 0.77 -0.03]

The classification result in OpenVINO is correct too:

python3 --labels test_model.labels  -m test_model.xml -i dog.jpeg

[ INFO ] Loading network files:
[ INFO ] Preparing input blobs
[ WARNING ] Image dog.jpeg is resized from (216, 233) to (224, 224)
[ INFO ] Batch size is 1
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference (1 iterations)
[ INFO ] Average running time of one iteration: 16.50834083557129 ms
[ INFO ] Processing output blob
[ 0.93463564 -3.0111456  -2.6969056  -0.7203666  -3.5170152  -1.5799323
 -4.9336224  -0.02198645  0.77573025 -0.03216804]
[ INFO ] Top 10 results: 

Image dog.jpeg

15.3578529 label German shepherd
11.2073421 label Leonberg
10.9584837 label malinois
9.9125881 label Norwegian elkhound, elkhound
8.9993887 label Irish wolfhound
8.9059830 label groenendael
8.5530519 label African hunting dog
8.4389133 label Afghan hound
7.9750319 label borzoi
7.9166555 label kelpie
Same labels as PyTorch:

15.4252 n02106662 German shepherd, German shepherd dog, German police dog, alsatian
11.2401 n02111129 Leonberg
11.0313 n02105162 malinois
9.7304  n02091467 Norwegian elkhound, elkhound
8.9736  n02090721 Irish wolfhound
8.8621  n02105056 groenendael
8.5262  n02116738 African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus
8.4578  n02088094 Afghan hound, Afghan
7.9833  n02090622 borzoi, Russian wolfhound
7.8347  n02105412 kelpie
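Why does net.eval() make such a difference? ResNet-50 contains BatchNorm layers, and a freshly constructed PyTorch module starts in training mode. In training mode, BatchNorm normalizes with the statistics of the current batch; in eval mode, it uses the running mean and variance stored with the pretrained weights, which is also what the exported inference graph uses. The NumPy sketch below is a simplified single-channel model of BatchNorm (not PyTorch's implementation; batch_norm and the running statistics are made up for illustration) showing that the two modes give different outputs for the same input.

```python
import numpy as np


def batch_norm(x, running_mean, running_var, gamma=1.0, beta=0.0, eps=1e-5, training=False):
    """Simplified single-channel BatchNorm over a batch of scalars (illustrative only)."""
    x = np.asarray(x, dtype=np.float64)
    if training:
        # Training mode: normalize with the statistics of this batch (biased variance).
        mean, var = x.mean(), x.var()
    else:
        # Eval mode: normalize with the stored running statistics.
        mean, var = running_mean, running_var
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


batch = np.array([1.0, 2.0, 3.0, 4.0])
# Pretend these running statistics were accumulated during training on the full dataset.
running_mean, running_var = 0.5, 4.0

print("train mode:", batch_norm(batch, running_mean, running_var, training=True))
print("eval mode: ", batch_norm(batch, running_mean, running_var, training=False))
```

With a single test image, the "batch statistics" are just that image's own statistics, so forgetting net.eval() changes every BatchNorm layer's output and the logits drift far from what the IR computes.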




It's good to see a better result after adding .eval(), and I get the same result as you.

However, don't you think the error is too large? For example, the difference between 0.86 and 0.77 is about 0.1, and the sign differs too (0.10 vs. -0.02).

I think this could become a problem when the output of the network is an embedding (for example, a face embedding typically has 128 dimensions). In face re-identification, we compare the distance between two embeddings (vectors) to decide whether they belong to the same person, so the true-positive verification rate may change if the embeddings change after conversion.

By the way, if I convert the ONNX model to Caffe and then convert the Caffe model to IR, the result is almost identical (only a 10e-5 difference). So don't you think there might be a numerical issue when converting ONNX to IR?


Valued Contributor I

Hi Ming-Hsuan,

> However, don't you think the error is too large? for example, the difference between 0.86 and 0.77 is about 0.1 and the sign (0.10 and -0.02 ) is different too

In my experience I expect this kind of error, but let's make sure. For that, we have to compare the outputs on the same input: the image processing is different in the two pipelines, so this is not a fair comparison. Even within OpenVINO, comparing cv2.resize with OpenVINO's auto resize will give different results.

Now that .eval() has fixed most of the issues, let's go back to your original idea of feeding the same fixed input vector:

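n, c, h, w = net.inputs[input_blob].shape
images = np.zeros(shape=(n, c, h, w), dtype=np.float32)
# fix the seed so the random input is reproducible (the value 0 is an assumption)
np.random.seed(0)
r = np.random.randn(c, h, w)
images[0] = r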

We also have to make sure the input uses the same layout (NCHW vs. NHWC).

I have the code running for OpenVINO based on your modification above. Could you attach the PyTorch test script you used for the random, seeded input so that we can compare with a fixed input?
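On the layout point: images loaded with OpenCV (or converted from PIL with np.asarray) are height x width x channels (HWC), while the PyTorch model and the IR both take NCHW input. A minimal NumPy sketch of the conversion; hwc_to_nchw is a hypothetical helper name.

```python
import numpy as np


def hwc_to_nchw(img_hwc):
    """Hypothetical helper: (H, W, C) image -> (1, C, H, W) batch."""
    img_hwc = np.asarray(img_hwc)
    if img_hwc.ndim != 3:
        raise ValueError("expected an (H, W, C) array")
    # Move channels to the front, then add a leading batch dimension.
    return np.ascontiguousarray(img_hwc.transpose(2, 0, 1))[np.newaxis, ...]


img = np.arange(2 * 3 * 3).reshape(2, 3, 3)  # H=2, W=3, C=3
nchw = hwc_to_nchw(img)
print(nchw.shape)  # (1, 3, 2, 3)
# Pixel (h=1, w=2) keeps its channel values after the transpose.
print(img[1, 2], nchw[0, :, 1, 2])
```

np.ascontiguousarray is used so the result is a plain contiguous buffer, which is a safe choice when handing it to an inference engine; a transposed view has the right shape but a different memory order.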




Valued Contributor I

The problem seems to be solved when you save the input vector from PyTorch and load it into OpenVINO, so it looks like the image preprocessing was causing the discrepancies. Try this:

net = resnet50(pretrained=True)
im = Image.open('dog.jpeg')
x = transform_test(im)
x = x.unsqueeze(dim=0)

print(x.shape)
np.save("test_in_vector", x.numpy())  # writes test_in_vector.npy

and then load it in the OpenVINO script to ensure the same input:

    r = np.load("test_in_vector.npy")

You have to change the way the IR is created for this experiment: the saved vector is already normalized, so leave out --mean_values, --scale_values and --reverse_input_channels. Also try disabling the optimizations for the first test: --input_model test_model.onnx --data_type FP32 --disable_resnet_optimization --disable_fusing --disable_gfusing
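The fixed-input comparison relies on both frameworks seeing bit-identical data. A small sketch of the save/load round trip, using a random float32 array as a stand-in for the PyTorch tensor x (for the real CPU tensor, x.numpy() gives the NumPy array); the file name matches the one used above.

```python
import os
import tempfile

import numpy as np

# Stand-in for the preprocessed PyTorch input x of shape (1, 3, 224, 224).
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 3, 224, 224)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_in_vector")
    np.save(path, x)            # np.save appends ".npy" to the file name
    r = np.load(path + ".npy")  # what the OpenVINO script would load

print(r.shape, r.dtype)
print(np.array_equal(r, x))  # True: the .npy round trip is lossless
```

Because the loaded array is already in NCHW float32 form, it can go straight into the input blob without any resizing or normalization in the OpenVINO script.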


OpenVINO now agrees with PyTorch:

[ INFO ] Average running time of one iteration: 18.108606338500977 ms
[ INFO ] Processing output blob
[ 0.8969836  -3.0496185  -2.7041526  -0.7479727  -3.562203   -1.7981005
 -4.8486257   0.10939903  0.86848104  0.05356242]
[ INFO ] Top 10 results: 
Image dog.jpeg

15.4252462 label German shepherd
11.2401333 label Leonberg
11.0313234 label malinois
9.7304478 label Norwegian elkhound
8.9735994 label Irish wolfhound
8.8621044 label groenendael
8.5262327 label African huntingdog
8.4578342 label Afghanhound
7.9833093 label borzoi
7.8347163 label kelpie

Same output as PyTorch:

15.4252 n02106662 German shepherd, German shepherd dog, German police dog, alsatian
11.2401 n02111129 Leonberg
11.0313 n02105162 malinois
9.7304  n02091467 Norwegian elkhound, elkhound
8.9736  n02090721 Irish wolfhound
8.8621  n02105056 groenendael
8.5262  n02116738 African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus
8.4578  n02088094 Afghan hound, Afghan
7.9833  n02090622 borzoi, Russian wolfhound
7.8347  n02105412 kelpie

Finally, I tried the Model Optimizer with its default optimizations enabled, and the output is very similar: --input_model test_model.onnx --data_type FP32

15.4252472 label German shepherd
11.2401333 label Leonberg
11.0313206 label malinois
9.7304420 label Norwegian elkhound
8.9735994 label Irish wolfhound
8.8621025 label groenendael
8.5262289 label African huntingdog
8.4578362 label Afghan hound
7.9833107 label borzoi
7.8347163 label kelpie

JFTR captured some of this in

Can you verify? Could you also check if there is a normalization issue ( ) ?

Any more issues?


My PyTorch version is 1.0.1 (the newest).

My OpenVINO version is 2018 R5.

The inference results in PyTorch and OpenVINO are totally different!

I use code like this:

-------- Convert the PyTorch model to ONNX

    import onnx
    import torch

    from torchvision.models.resnet import resnet50

    net = resnet50(pretrained=True)
    x = torch.randn(1, 3, 224, 224)  # dummy input; only used to trace the graph for export

    torch.onnx.export(net, x, 'test_model.onnx', export_params=True)


-------- Convert to OpenVINO IR

     python3 --input_model /home/forum-test/test_model.onnx  --output_dir /home/forum-test/mymodel --framework onnx --data_type FP32


-------- Test the model in PyTorch

    import numpy as np
    from torchvision.models.resnet import resnet50
    import torchvision.transforms as transforms
    from PIL import Image

    transform_test = transforms.Compose([transforms.Resize(( 224,224 )),transforms.ToTensor(),transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994,0.2010)),])

    net = resnet50(pretrained=True)
    im = Image.open('1.jpg')
    x = transform_test(im)
    x = x.unsqueeze(dim=0)
    print (net(x)[0][0:10])

Result in PyTorch:
    tensor([-1.1601, -0.5671, -0.4668, -1.2231, -0.6918, -0.3618, -0.7984,  0.3102, 0.1104, -0.7210], grad_fn=<SliceBackward>)


-------- Test the IR (.xml/.bin) in OpenVINO

python3 -m /home/forum-test/mymodel/test_model.xml -i 1.jpg


Result in OpenVINO:

[-3.3632674  -2.8450186  -1.418541   -3.3199158  -3.919244   -1.2973417
 -0.56975985 -0.22444369  1.0697088  -2.761873]



Hi guys, 

Actually, the main problem comes from the different implementations of the resize() function in PIL and OpenCV, plus differences in how .jpeg images are decoded.

Let's test on a 224x224 .png image, so that neither resizing nor .jpeg decoding is involved:
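To see why two resize implementations rarely agree pixel for pixel, here is a small NumPy illustration. It is not what PIL or OpenCV actually do internally; it simply compares two common downscaling strategies, nearest-neighbour sampling and 2x2 box averaging, when halving a tiny single-channel image.

```python
import numpy as np

# A tiny 4x4 single-channel "image".
img = np.array([[10, 20, 30, 40],
                [50, 60, 70, 80],
                [90, 100, 110, 120],
                [130, 140, 150, 160]], dtype=np.float32)

# Downscale to 2x2 in two different ways.
nearest = img[::2, ::2]                          # keep every other pixel
box = img.reshape(2, 2, 2, 2).mean(axis=(1, 3))  # average each 2x2 block

print(nearest)
print(box)
print(np.abs(nearest - box).max())
```

Different inputs to the network then naturally produce different logits, even with identical weights, which is why a fixed 224x224 input removes this source of discrepancy.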

----------- -------------

import onnx
import torch

from torchvision.models.resnet import resnet50
x = torch.randn((1, 3, 224, 224))

net = resnet50(pretrained=True)

torch.onnx.export(net, x, 'resnet_test.onnx', export_params=True)

Run the Model Optimizer on the exported .onnx model:

python /opt/intel/openvino/deployment_tools/model_optimizer/ --input_model ./resnet_test.onnx --data_type=FP32   --mean_values [123.675,116.28,103.53]  --scale_values [58.395,57.12,57.375] --reverse_input_channels

----------- -------------

import numpy as np
from torchvision.models.resnet import resnet50
import torchvision.transforms as transforms
from PIL import Image
import cv2 as cv

transforms_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
net = resnet50(pretrained=True)
img = Image.open('2.png')

x = transforms_test(img)
x = x.unsqueeze(dim=0)

net.eval()  # inference mode: BatchNorm uses the stored running statistics
print(net(x)[0][:10])

tensor([-2.6386,  1.3077, -2.2054, -2.5019, -1.6188, -2.0899, -1.2727, -0.5596,
        -1.7433, -3.0217], grad_fn=<SliceBackward>)

In the classification sample, I added a line that prints the first 10 elements:

# Processing output blob
log.info("Processing output blob")
res = res[out_blob]
log.info("Top {} results: ".format(args.number_top))
print(res[0][:10])  # added line: print the first 10 raw output values


python -m resnet_test.xml -i 2.png


[ INFO ] Creating Inference Engine
[ INFO ] Loading network files:
[ INFO ] Preparing input blobs
[ INFO ] Batch size is 1
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference in synchronous mode
[ INFO ] Processing output blob
[ INFO ] Top 10 results: 
[-2.638615   1.3077483 -2.2053905 -2.5018773 -1.6187974 -2.089907
 -1.2726829 -0.5595517 -1.7432679 -3.0216992]
Image 2.png

classid probability
------- -----------
  159     13.1708260
  168     11.3585939
  211     8.2475233
  167     7.9942780
  166     7.8777084
  162     7.6203313
  237     7.4518971
  165     7.2671924
  434     6.9654808
  171     6.7230196


Input image is attached.

Class 159 in ImageNet is 'Rhodesian ridgeback' and class 168 is 'redbone', so the classification result appears to be correct :)

Hope it helps!
