The inference result is totally different after converting ONNX to OpenVINO IR

Tu__Ming-Hsuan · ‎01-17-2019

Hi,

I am trying to convert a PyTorch pretrained model (https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py) to OpenVINO IR.

First, I download the pretrained model and export it to ONNX:

import torch
from torchvision.models.resnet import resnet50

net = resnet50(pretrained=True)
x=torch.randn((1,3,224,224))
torch.onnx._export(net, x, 'test_model.onnx', export_params=True)

Then I use mo.py to convert it to IR:

 python mo.py --input_model /home/tumh/pytorch-cifar/test_model.onnx  --output_dir /home/tumh/test_model_FP32 --framework onnx --data_type FP32

Then I test with classification_sample.py. For simplicity, the input is randomly generated with a fixed seed.

Here is what I changed:

import numpy as np

# load net... success

n, c, h, w = net.inputs[input_blob].shape
images = np.ndarray(shape=(n, c, h, w))
# fix seed
np.random.seed(133)
r = np.random.randn(3,224,224)
images[0] = r

# process output

# show only 10 of 1000
#[-0.4592718  -0.12941386 -0.11573323 -0.75521946  0.5491318  -1.4393116
# -0.3861863  -0.40474018  0.401676   -1.4279357 ]

However, the result from PyTorch is quite different. Here is the output from PyTorch:

[-0.9367, -0.3480, -0.4053, -0.9139, -0.5280, -0.2631, -0.5692,  0.4866, 0.3328, -0.3892]

By the way, if I convert the ONNX model to Caffe first and then convert the Caffe model to IR, the result is identical. So I suspect this is an ONNX-to-OpenVINO issue.

OpenVINO claims to support resnet50. Any idea?

nikos1 · ‎01-23-2019

Hi Ming-Hsuan,

Have you solved this issue? Please make sure you are using the correct mean / scale values / reverse_input_channels. What happens for example if you convert using

mo_onnx.py --input_model test_model.onnx --data_type FP32 --mean_values [104.0,117.0,123.0] --scale 255 --reverse_input_channels

Not sure what the actual values are - above is just an example of using --mean_values, reverse_input_channels and --scale .

nikos

Tu__Ming-Hsuan · ‎01-23-2019

@nikos

I have tried to add mean, scale and reverse channel.

However, the result is still different from PyTorch.

Have you ever tried to reproduce my steps and see the result? it's easy to reproduce.

Tu__Ming-Hsuan · ‎01-23-2019

@nikos

It would be much better if the OpenVINO documentation gave some examples of converting pretrained PyTorch ONNX models.

nikos1 · ‎01-23-2019

Hi Ming-Hsuan,

Yes, I did try to reproduce your steps using the details you included. That's how I realized that missing --mean_values and --scale could cause issues.

> I have tried to add mean, scale and reverse channel.

What values did you use?

> Have you ever tried to reproduce my steps and see the result? it's easy to reproduce.

Not so easy to test your end-to-end pipeline without your test code. Could you please attach test code you use for inference? Since this was missing I tried testing from my C++ test code on my test data and found no issues when appropriate parameters were used.

cheers,

nikos

Tu__Ming-Hsuan · ‎01-23-2019

@nikos

here is my testing code

import numpy as np
from torchvision.models.resnet import resnet50
import torchvision.transforms as transforms
from PIL import Image

transform_test = transforms.Compose([
    transforms.Resize(( 224,224 )),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

net = resnet50(pretrained=True)
im = Image.open('/home/tumh/dog.jpeg')
x = transform_test(im)
x = x.unsqueeze(dim=0)
print (net(x)[0][0:10])

and the conversion code

python mo.py --input_model /home/tumh/pytorch-cifar/test_model.onnx --output_dir /home/tumh/test_model_FP32 --scale_values [51.5865,50.847,51.255] --mean_values [125.307,122.961,113.8575] --framework onnx --data_type FP32 --reverse_input_channels

I switch to the real image and test with classification_sample.py (see the attachment)

python classification_sample.py -m /home/tumh/test_model_FP32/test_model.xml -i ~/dog.jpeg

the result from openvino

[ 0.93463564 -3.0111456  -2.6969056  -0.7203666  -3.5170152  -1.5799323
 -4.9336224  -0.02198645  0.77573025 -0.03216804]

the result from pytorch

[-0.9243, -0.3208, -0.4276, -0.9591, -0.6213, -0.2226, -0.7890,  0.6524, 0.4050, -0.5093]

nikos1 · ‎01-24-2019

Excellent - thank you for the additional information. Will work on this over the weekend - too busy at work.... I want to root-cause this too because I have similar issues.

> It would be much better if the OpenVINO documentation gave some examples of converting pretrained PyTorch ONNX models.

I agree that would be nice but on the other hand I prefer them spending time optimizing the SDK and working on new features too instead of writing samples for every possible combination of framework conversion.

In the meantime, a few thoughts:

- this is a multi-stage pipeline using many frameworks and I would NOT expect same numbers. Maybe we have to statistically compare results. Even worse after a model optimization process we expect minor discrepancies, correct?

- Are the original test networks trained on Imagenet or CIFAR - are weights loaded properly?

- Let's ignore output vectors for now. What is the classification result of PyTorch, what is it if you run ONNX inference (have you tried?), and what is the classification result of OpenVINO FP32?

- Have you tried the validation tool to get a better overall idea of accuracy?

- Still not convinced those are the right parameters - will test --scale_values [51.5865,50.847,51.255] --mean_values [125.307,122.961,113.8575]
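
One way to act on the first and third bullets is to look at the largest element-wise difference and at whether the class ranking agrees, instead of expecting identical numbers. Below is a minimal sketch in plain Python using the ten values per framework quoted earlier in this thread. The helper names are hypothetical, and a ranking over 10 of 1000 outputs is only illustrative.

```python
# Hypothetical helpers for comparing two output vectors;
# they are not part of OpenVINO or PyTorch.

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def top_k(values, k):
    """Indices of the k largest values, best first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


# First 10 of 1000 outputs, as posted earlier in this thread.
openvino = [0.93463564, -3.0111456, -2.6969056, -0.7203666, -3.5170152,
            -1.5799323, -4.9336224, -0.02198645, 0.77573025, -0.03216804]
pytorch = [-0.9243, -0.3208, -0.4276, -0.9591, -0.6213,
           -0.2226, -0.7890, 0.6524, 0.4050, -0.5093]

print("max abs diff:  ", max_abs_diff(openvino, pytorch))
print("top-3 openvino:", top_k(openvino, 3))
print("top-3 pytorch: ", top_k(pytorch, 3))
```

If the top-k indices over all 1000 outputs already disagree, the mismatch is more than rounding noise.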

Cheers,

nikos

Tu__Ming-Hsuan · ‎01-24-2019

> this is a multi-stage pipeline using many frameworks and I would NOT expect same numbers. Maybe we have to statistically compare results. Even worse after a model optimization process we expect minor discrepancies, correct?

Well, a minor difference (1e-6) is acceptable.

> Are the original test networks trained on Imagenet or CIFAR - are weights loaded properly?

> Let's ignore output vectors for now. What is the classification result of PyTorch, what is it if you run ONNX inference (have you tried?), and what is the classification result of OpenVINO FP32?

The original weights are for ImageNet; they come from the official PyTorch model zoo. Indeed there are 1000 output values, but for simplicity I just print 10 of the 1000 values. I have not verified the classification result (whether it's a dog or something else).

> Have you tried the validation tool to get a better overall idea of accuracy?

Not yet. But it seems this issue should be solved first, before I calculate the overall accuracy.

> Still not convinced those are the right parameters - will test --scale_values [51.5865,50.847,51.255] --mean_values [125.307,122.961,113.8575]

It's simple math. In PyTorch the input is always normalized to [0,1], and for ImageNet the mean is (0.4914, 0.4822, 0.4465) and the std is (0.2023, 0.1994, 0.2010).

So the overall preprocessing for the R channel is ((r/255)-0.4914)/0.2023. To get the equivalent steps in OpenVINO, we have (r-0.4914*255)/(0.2023*255).
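
The arithmetic above can be checked with a short sketch. It assumes the converted model computes (pixel - mean_value) / scale_value per channel on 0..255 input, which is how the flags are used in this thread; the helper name is hypothetical.

```python
def to_mo_values(mean, std, max_pixel=255.0):
    """Scale [0, 1]-range mean/std to 0..255-range mean and scale values."""
    mean_values = [m * max_pixel for m in mean]
    scale_values = [s * max_pixel for s in std]
    return mean_values, scale_values


# Mean and std quoted in the post above.
mean = (0.4914, 0.4822, 0.4465)
std = (0.2023, 0.1994, 0.2010)
mean_values, scale_values = to_mo_values(mean, std)

print("--mean_values ", [round(v, 4) for v in mean_values])
print("--scale_values", [round(v, 4) for v in scale_values])

# Both formulations give the same normalized value for a sample R pixel.
r = 200.0
pytorch_style = ((r / 255.0) - mean[0]) / std[0]
openvino_style = (r - mean_values[0]) / scale_values[0]
print(pytorch_style, openvino_style)
```

The same conversion applied to mean (0.485, 0.456, 0.406) and std (0.229, 0.224, 0.225) gives the values [123.675,116.28,103.53] and [58.395,57.12,57.375] that appear in a later post.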

nikos1 · ‎01-25-2019

What do you get if you set the network to testing mode using net.eval(), like this:

net = resnet50(pretrained=True)

net.eval()  # <-- added: switch to testing (evaluation) mode

im = Image.open('/home/tumh/dog.jpeg')
x = transform_test(im)
x = x.unsqueeze(dim=0)
print (net(x)[0][0:10])

My output now seems closer to OpenVino:

pytorch  [ 0.89 -3.04 -2.70 -0.74 -3.56 -1.79 -4.84  0.10 0.86  0.05]
openvino [ 0.93 -3.01 -2.69 -0.72 -3.51 -1.57 -4.93 -0.02 0.77 -0.03]

and the classification result in OpenVino is correct too.

python3 classification_sample.py --labels test_model.labels  -m test_model.xml -i dog.jpeg

[ INFO ] Loading network files:
	test_model.xml
	test_model.bin
[ INFO ] Preparing input blobs
[ WARNING ] Image dog.jpeg is resized from (216, 233) to (224, 224)
[ INFO ] Batch size is 1
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference (1 iterations)
[ INFO ] Average running time of one iteration: 16.50834083557129 ms
[ INFO ] Processing output blob
[ 0.93463564 -3.0111456  -2.6969056  -0.7203666  -3.5170152  -1.5799323
 -4.9336224  -0.02198645  0.77573025 -0.03216804]
[ INFO ] Top 10 results: 

Image dog.jpeg

15.3578529 label German shepherd
11.2073421 label Leonberg
10.9584837 label malinois
9.9125881 label Norwegian elkhound, elkhound
8.9993887 label Irish wolfhound
8.9059830 label groenendael
8.5530519 label African hunting dog
8.4389133 label Afghan hound
7.9750319 label borzoi
7.9166555 label kelpie

same labels as pytorch

15.4252 n02106662 German shepherd, German shepherd dog, German police dog, alsatian
11.2401 n02111129 Leonberg
11.0313 n02105162 malinois
9.7304  n02091467 Norwegian elkhound, elkhound
8.9736  n02090721 Irish wolfhound
8.8621  n02105056 groenendael
8.5262  n02116738 African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus
8.4578  n02088094 Afghan hound, Afghan
7.9833  n02090622 borzoi, Russian wolfhound
7.8347  n02105412 kelpie
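
A likely reason net.eval() changes the numbers: ResNet-50 contains batch-normalization layers, which in training mode normalize with the statistics of the current batch and in evaluation mode use the running statistics stored with the weights (presumably what the converted IR uses as well). The sketch below imitates that difference for a single channel in plain Python. The function and the numbers are illustrative only, not PyTorch's implementation.

```python
import math


def batch_norm(values, running_mean, running_var, training, eps=1e-5):
    """Normalize one channel, imitating batch norm in training or evaluation mode."""
    if training:
        # Training mode: use the mean and (biased) variance of the current batch.
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
    else:
        # Evaluation mode: use the statistics stored with the trained weights.
        mean, var = running_mean, running_var
    return [(v - mean) / math.sqrt(var + eps) for v in values]


activations = [1.0, 2.0, 3.0, 4.0]  # made-up activations for one channel
train_out = batch_norm(activations, running_mean=0.5, running_var=4.0, training=True)
eval_out = batch_norm(activations, running_mean=0.5, running_var=4.0, training=False)

print("training mode:  ", train_out)
print("evaluation mode:", eval_out)
```

The same input gives different outputs in the two modes, which matches the gap seen before net.eval() was added.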

Tu__Ming-Hsuan · ‎01-26-2019

@nikos

It's good to see a better result after adding .eval(), and I got the same result as you.

However, don't you think the error is too large? for example, the difference between 0.86 and 0.77 is about 0.1 and the sign (0.10 and -0.02 ) is different too

I think an issue may occur when the output of the network is an embedding (for example, the typical dimension of a face embedding is 128). In face re-identification, we always compare the distance between two embeddings (vectors) to decide whether they belong to the same person. So the true positive verification rate may change if the embedding changes after the conversion.
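
To make the embedding concern concrete: verification usually thresholds a distance between two vectors, so a small per-element drift can flip a decision that sits near the threshold. The sketch below uses made-up 4-dimensional vectors and a hypothetical threshold; real face embeddings and thresholds will differ.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


THRESHOLD = 0.2  # hypothetical "same person" threshold

reference = [0.5, 0.5, 0.5, 0.5]   # enrolled embedding (made up)
original = [0.5, 0.5, 0.5, -0.1]   # probe embedding from the original framework
converted = [0.5, 0.5, 0.4, -0.2]  # same probe after conversion, with small drift

for name, probe in (("original", original), ("converted", converted)):
    d = cosine_distance(reference, probe)
    print(name, round(d, 4), "same" if d < THRESHOLD else "different")
```

Here a drift of 0.1 in two elements is enough to move the pair from "same" to "different".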

By the way, if I convert the ONNX model to Caffe and then convert the Caffe model to IR, the result is almost identical (only a 10e-5 difference). So don't you think there might be a numerical issue when converting ONNX to IR?

nikos1 · ‎01-26-2019

Hi Ming-Hsuan,

> However, don't you think the error is too large? for example, the difference between 0.86 and 0.77 is about 0.1 and the sign (0.10 and -0.02 ) is different too

In my experience I expect this kind of error, but let's make sure. For that we would have to compare output on the same input. Image processing is different in the two pipelines, so it is not a fair comparison. Even within OpenVino, comparing cv2.resize to "auto resize in OpenVino" will give different results.

Now that .eval() fixed most of the issues let's go back to your original idea of pushing the same fixed input vector

n, c, h, w = net.inputs[input_blob].shape
images = np.ndarray(shape=(n, c, h, w))
# fix seed
np.random.seed(133)
r = np.random.randn(3,224,224)
images[0] = r

We also have to make sure the input is in the same layout too (NCHW vs NHWC).
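
On the layout point: an image decoded by OpenCV or PIL is normally height x width x channel (HWC), while the input blob above has shape (n, c, h, w). Here is a minimal sketch of the reordering on a tiny 2x2 image in plain Python; with NumPy the same thing is `image.transpose((2, 0, 1))`. The helper name is hypothetical.

```python
def hwc_to_chw(image):
    """Reorder a nested-list image from [height][width][channel] to [channel][height][width]."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# 2x2 image with 3 channels; each pixel is [c0, c1, c2].
image = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]

chw = hwc_to_chw(image)
for channel in chw:
    print(channel)
```

Feeding HWC data into a CHW input does not raise an error when the total size matches, so it is worth checking explicitly.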

~~I have the code running for OpenVino based on your modification above. Could you attach the pytorch test script you used for random / seed so that we can compare with fixed input?~~

Cheers,

Nikos

nikos1 · ‎01-26-2019

The problem seems to be solved when you save the input vector from PyTorch and load it in OpenVino, so it seems the image processing was causing the discrepancies. Try this:

net = resnet50(pretrained=True)
net.eval()
im = Image.open('dog.jpeg')
x = transform_test(im)
x = x.unsqueeze(dim=0)

print (x.shape)
np.save("test_in_vector", x)

and then load from OpenVino to ensure same input

    r = np.load("test_in_vector.npy")

You would have to change the way the IR is created for this experiment. Also try disabling optimizations for the first test:

mo_onnx.py --input_model test_model.onnx --data_type FP32 --disable_resnet_optimization --disable_fusing --disable_gfusing

OpenVino now agrees with pytorch

[ INFO ] Average running time of one iteration: 18.108606338500977 ms
[ INFO ] Processing output blob
[ 0.8969836  -3.0496185  -2.7041526  -0.7479727  -3.562203   -1.7981005
 -4.8486257   0.10939903  0.86848104  0.05356242]
[ INFO ] Top 10 results: 
Image dog.jpeg

15.4252462 label German shepherd
11.2401333 label Leonberg
11.0313234 label malinois
9.7304478 label Norwegian elkhound
8.9735994 label Irish wolfhound
8.8621044 label groenendael
8.5262327 label African huntingdog
8.4578342 label Afghanhound
7.9833093 label borzoi
7.8347163 label kelpie

same output as pytorch

15.4252 n02106662 German shepherd, German shepherd dog, German police dog, alsatian
11.2401 n02111129 Leonberg
11.0313 n02105162 malinois
9.7304  n02091467 Norwegian elkhound, elkhound
8.9736  n02090721 Irish wolfhound
8.8621  n02105056 groenendael
8.5262  n02116738 African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus
8.4578  n02088094 Afghan hound, Afghan
7.9833  n02090622 borzoi, Russian wolfhound
7.8347  n02105412 kelpie

Finally, try with the Model Optimizer optimizations enabled; the output is very similar:

mo_onnx.py --input_model test_model.onnx --data_type FP32

15.4252472 label German shepherd
11.2401333 label Leonberg
11.0313206 label malinois
9.7304420 label Norwegian elkhound
8.9735994 label Irish wolfhound
8.8621025 label groenendael
8.5262289 label African huntingdog
8.4578362 label Afghan hound
7.9833107 label borzoi
7.8347163 label kelpie

JFTR captured some of this in https://github.com/ngeorgis/pytorch_onnx_openvino

Can you verify? Could you also check if there is a normalization issue ( https://github.com/ngeorgis/pytorch_onnx_openvino/issues/1 ) ?

Any more issues?

x__d · ‎04-27-2019

PyTorch version is 1.01 (the newest).

OpenVINO version is 2018R5.

The inference result is totally different between PyTorch and OpenVINO!

I use code like this:

-------- pytorch model convert to onnx

import onnx
import torch

from torchvision.models.resnet import resnet50

net = resnet50(pretrained=True)

x=torch.randn((1,3,224,224))

torch.onnx._export(net, x, 'test_model.onnx', export_params=True)

--------convert to openvino

python3 mo.py --input_model /home/forum-test/test_model.onnx --output_dir /home/forum-test/mymodel --framework onnx --data_type FP32

----test model in pytorch

import numpy as np
from torchvision.models.resnet import resnet50
import torchvision.transforms as transforms
from PIL import Image

transform_test = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

net = resnet50(pretrained=True)
net.eval()

im = Image.open('1.jpg')
x = transform_test(im)
x = x.unsqueeze(dim=0)
print(net(x)[0][0:10])

result in pytorch:
tensor([-1.1601, -0.5671, -0.4668, -1.2231, -0.6918, -0.3618, -0.7984, 0.3102, 0.1104, -0.7210], grad_fn=<SliceBackward>)

----test xml bin in openvino

python3 classification_sample.py -m /home/forum-test/mymodel/test_model.xml -i 1.jpg

result in openvino:

[-3.3632674 -2.8450186 -1.418541 -3.3199158 -3.919244 -1.2973417
-0.56975985 -0.22444369 1.0697088 -2.761873]

Alexey_G_Intel · ‎11-01-2019

Hi guys,

Actually, the main problem comes from the different implementations of the resize() function in PIL and OpenCV, combined with the use of .jpeg images.

Let's test on a 224x224 .png image (so that neither resize nor .jpeg images are involved).
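
To illustrate why two resize implementations can disagree, the sketch below shrinks the same row of pixels with nearest-neighbor and with linear interpolation. These are simplified textbook versions, not the actual PIL or OpenCV code, and the function names are hypothetical.

```python
def resize_nearest(row, new_len):
    """Pick the source pixel closest to each output pixel center."""
    scale = len(row) / new_len
    return [row[min(int((i + 0.5) * scale), len(row) - 1)] for i in range(new_len)]


def resize_linear(row, new_len):
    """Blend the two source pixels around each output pixel center."""
    scale = len(row) / new_len
    out = []
    for i in range(new_len):
        # Map the output pixel center back to source coordinates, clamped to the row.
        pos = (i + 0.5) * scale - 0.5
        pos = min(max(pos, 0.0), len(row) - 1.0)
        left = int(pos)
        right = min(left + 1, len(row) - 1)
        frac = pos - left
        out.append(row[left] * (1 - frac) + row[right] * frac)
    return out


row = [0, 50, 100, 150, 200, 250]  # one row of made-up pixel values
print("nearest:", resize_nearest(row, 4))
print("linear: ", resize_linear(row, 4))
```

The two results differ in every pixel even on this small input, and such differences carry through to the network output.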

----------- resnet_export.py -------------

import onnx
import torch

from torchvision.models.resnet import resnet50
                                                               
x = torch.randn((1, 3, 224, 224))

net = resnet50(pretrained=True)
net.eval()

torch.onnx.export(net, x, 'resnet_test.onnx', export_params=True)

Run Model Optimizer on exported .onnx model.

python /opt/intel/openvino/deployment_tools/model_optimizer/mo_onnx.py --input_model ./resnet_test.onnx --data_type=FP32   --mean_values [123.675,116.28,103.53]  --scale_values [58.395,57.12,57.375] --reverse_input_channels

----------- resnet_sample.py -------------

import numpy as np
from torchvision.models.resnet import resnet50
import torchvision.transforms as transforms
from PIL import Image
import cv2 as cv

transforms_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
                                                               
net = resnet50(pretrained=True)
net.eval()
img = Image.open('2.png')

x = transforms_test(img)
x = x.unsqueeze(dim=0)

print(net(x)[0][0:10])

Output:

tensor([-2.6386, 1.3077, -2.2054, -2.5019, -1.6188, -2.0899, -1.2727, -0.5596,
-1.7433, -3.0217], grad_fn=<SliceBackward>)

In the classification sample I added a line that prints the first 10 elements:

# Processing output blob
log.info("Processing output blob")
res = res[out_blob]
log.info("Top {} results: ".format(args.number_top))

print(res[0][:10])

python classification_sample.py -m resnet_test.xml -i 2.png

[ INFO ] Creating Inference Engine
[ INFO ] Loading network files:
resnet_test.xml
resnet_test.bin
[ INFO ] Preparing input blobs
[ INFO ] Batch size is 1
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference in synchronous mode
[ INFO ] Processing output blob
[ INFO ] Top 10 results:
[-2.638615 1.3077483 -2.2053905 -2.5018773 -1.6187974 -2.089907
-1.2726829 -0.5595517 -1.7432679 -3.0216992]
Image 2.png

classid probability
------- -----------
159 13.1708260
168 11.3585939
211 8.2475233
167 7.9942780
166 7.8777084
162 7.6203313
237 7.4518971
165 7.2671924
434 6.9654808
171 6.7230196

Input image is attached.

Class 159 in ImageNet is 'Rhodesian ridgeback' and class 168 is 'redbone', so it seems that the classification result is correct :)

Hope it helps!