OpenVINO inference using an ONNX model

timosy · ‎02-28-2023

We are running inference with OpenVINO 2021 and 2022 and comparing the output features of the inference results. The output features differ between the two versions.

In OpenVINO 2022, the ONNX model seems to be optimized(?) during the inference process inside the OpenVINO library. Does this mean that the operation layers of the ONNX model are fused inside the OpenVINO library, in the same way as when the ONNX model is converted to an IR model with mo.py?

This is a simple question, thank you in advance.

Iffa_Intel · ‎03-01-2023

Hi,

Thank you for reaching out to us.

We'll get back to you with thorough details as soon as possible.

We appreciate your patience.

Cordially,

Iffa

Iffa_Intel · ‎03-02-2023

Generally, OpenVINO can read ONNX models directly, and the optimization is done by OpenVINO runtime. But this was already possible in OpenVINO 2021, and mo.py is still available in 2022 (with pip install openvino-dev you get an MO executable).

Model Optimizer now uses the ONNX Frontend, so you get the same graph optimizations when you load an ONNX model directly, or when you use MO to convert to IR and then load the model.

Actually, it is not expected that the output of ONNX models is different between 2021 and 2022.

It will be helpful if you could provide:

Models used or custom models (share if possible)
OpenVINO sample application used or it's a custom inferencing code (share if possible)
Methods that you use to evaluate

Cordially,

Iffa

timosy · ‎03-02-2023

>Actually, it is not expected that the output of ONNX models is different between 2021 and 2022.

Thanks for the kind comment, it is already helpful.

I'd like to ask another question relating to the above comment.

Can the outputs of both versions be exactly the same?

Or can they differ slightly due to, for instance, the data type (FP16 vs. FP32) or rounding error?

best regards,

Iffa_Intel · ‎03-02-2023

There are a few factors that influence inference performance (the results are closely related to performance), and as you mentioned, precision (FP32, FP16, etc.) is indeed one of them.

There are 4 key elements to measure a successful deep learning inference:

Throughput
Latency
Value
Efficiency

You may refer to this documentation since it has a thorough explanation of those 4 elements.

If you are comparing FP32 with FP16 precision, some differences in the results are expected, especially in FPS and accuracy: FP16 is expected to be less accurate than FP32, since each value is stored in half as many bits. However, the model with FP16 precision should infer faster than the FP32 one.

This slight loss in accuracy may be the reason you are seeing different results.
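To make the FP16 vs. FP32 rounding point concrete, here is a minimal sketch that uses only the Python standard library. It rounds a few values to half and single precision and prints the rounding error of each. The helper names (`to_fp16`, `to_fp32`) and the sample values are made up for illustration and are not taken from the models in this thread; the sketch only shows storage rounding, not how OpenVINO executes a model.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Hypothetical feature values, chosen only for illustration.
features = [0.1, 1.2345678, 123.456, 0.00012345]

for v in features:
    err16 = abs(to_fp16(v) - v)
    err32 = abs(to_fp32(v) - v)
    print(f"{v!r}: fp32 error = {err32:.3e}, fp16 error = {err16:.3e}")
```

Rounding errors of this size are small per value, but they can accumulate through the layers of a network, so bit-identical outputs between FP16 and FP32 runs should not be expected.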

Cordially,

Iffa

timosy · ‎03-03-2023

following your kind offer below,

Actually, it is not expected that the output of ONNX models is different between 2021 and 2022.

It will be helpful if you could provide:

Models used or custom models (share if possible)
OpenVINO sample application used or it's a custom inferencing code (share if possible)
Methods that you use to evaluate

Different responses (different outputs) between OpenVINO 2021 and 2022 seem to appear when I use QuantizeLinear/DequantizeLinear layers (though the reason is not clear, they do not appear when using a FakeQuantize layer).

To reproduce the problem, I have posted simple INT8 AlexNet models, which are quantized via NNCF (FakeQuantize layer) and ONNX Runtime (QuantizeLinear/DequantizeLinear layers). The models are cut near the beginning so that we can focus on differences in the calculation rather than in the structure. You can see the structure with Netron.

I suspect there is a bug in the calculation of the QuantizeLinear/DequantizeLinear layers.
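For readers unfamiliar with these layers, the following is a minimal sketch of the per-tensor QuantizeLinear/DequantizeLinear arithmetic as described in the ONNX operator specification (divide by the scale, round ties to even, add the zero point, saturate). The second quantizer, which rounds ties away from zero, is a hypothetical variant included only to show how a small implementation difference can move a value by one quantization step. It is not a claim about what either OpenVINO version actually does, and the scale and zero point are made-up values.

```python
import math


def quantize_linear(x, scale, zero_point, qmin=0, qmax=255):
    """ONNX-style QuantizeLinear for a single value (uint8 range by default)."""
    q = round(x / scale) + zero_point  # Python's round() rounds ties to even
    return max(qmin, min(qmax, q))     # saturate to the integer range


def dequantize_linear(q, scale, zero_point):
    """ONNX-style DequantizeLinear for a single value."""
    return (q - zero_point) * scale


def quantize_half_away(x, scale, zero_point, qmin=0, qmax=255):
    """Hypothetical variant: ties are rounded away from zero instead."""
    r = x / scale
    rounded = int(math.copysign(math.floor(abs(r) + 0.5), r))
    return max(qmin, min(qmax, rounded + zero_point))


# Made-up quantization parameters; x / scale is exactly 2.5, i.e. a tie.
scale, zero_point, x = 0.5, 128, 1.25

q_even = quantize_linear(x, scale, zero_point)
q_away = quantize_half_away(x, scale, zero_point)

print("ties-to-even:", q_even, "->", dequantize_linear(q_even, scale, zero_point))
print("ties-away   :", q_away, "->", dequantize_linear(q_away, scale, zero_point))
```

A single step like this is tiny, but after several quantized layers it can produce visibly different output features, which is one reason to compare outputs with a tolerance rather than for exact equality.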

Thanks in advance.

Iffa_Intel · ‎03-05-2023

Thanks for sharing the model.

I need your help to provide information on Question 2 as it would help me reproduce your result:

2. OpenVINO sample application used or it's a custom inferencing code (share if possible)

For example, Hello Classification Python Sample.

Cordially,

Iffa

timosy · ‎03-05-2023

Thanks for the reply.

It's possible to provide sample scripts and figures. Please wait a bit.

I would actually prefer to send my data via email rather than post it directly.

Is it possible? If not, I'll post it here.

Iffa_Intel · ‎03-08-2023

I did a few tests on the models that you shared through email.

I'm using 2 environments, OpenVINO 2022.1 and 2021.4.

These are the results within 2022.1 environment:

1. Benchmark_app on original ONNX model

2. Benchmark_app on model labeled 2022.1

3. Benchmark_app on model labeled 2021.4

4. Test model labeled 2022.1 on the official OpenVINO sample app (hello_classification.py)

5. Test model labeled 2021.4 on the official OpenVINO sample app (hello_classification.py)

6. Test original ONNX model on the official OpenVINO sample app (hello_classification.py)

7. Model labeled 2022.1 on customer's code

8. Model labeled 2021.4 on customer's code

These are the results within 2021.4 environment:

1. Benchmark_app on original ONNX model

2. Benchmark_app on model labeled 2022.1

3. Benchmark_app on model labeled 2021.4

4. Test model labeled 2022.1 on the official OpenVINO sample app (hello_classification.py)

5. Test model labeled 2021.4 on the official OpenVINO sample app (hello_classification.py)

6. Test original ONNX model on the official OpenVINO sample app (hello_classification.py)

7. Model labeled 2022.1 on customer's code

8. Model labeled 2021.4 on customer's code

My finding is that the models labeled 2022.1 and 2021.4 produce the same result when they are run within the same OpenVINO environment. However, if we compare the results between the 2 different OpenVINO environments, they are different:

1. Both models labeled as 2022.1 and 2021.4 in OpenVINO 2022.1 environment:

2. Both models labeled as 2022.1 and 2021.4 in OpenVINO 2021.4 environment:

We will further investigate this for a definite answer on the differences.

Cordially,

Iffa

timosy · ‎03-08-2023

Dear Iffa

Thank you for your help.

Just to confirm (and to avoid miscommunication), I have listed the models again below, along with the issue I encountered.

model.test-int8-Alex.onnx-rt-int8.cut.onnx
=> Quant/DeQuant Linear layers are used

model.test-int8-Alex.ovino-nncrf-int8.onnx
(=> I can't upload the original model above, which was used to generate the IR models below)

model.test-int8-Alex.ovino-nncrf-int8.ovino21.4_mo.cut.bin
model.test-int8-Alex.ovino-nncrf-int8.ovino21.4_mo.cut.mapping
model.test-int8-Alex.ovino-nncrf-int8.ovino21.4_mo.cut.xml
model.test-int8-Alex.ovino-nncrf-int8.ovino22.1_mo.cut.bin
model.test-int8-Alex.ovino-nncrf-int8.ovino22.1_mo.cut.mapping
model.test-int8-Alex.ovino-nncrf-int8.ovino22.1_mo.cut.xml
=> FakeQuant layer is used.

The labels "ovino21.4_mo" and "ovino22.1_mo" indicate which mo.py (or mo.exe) was used to convert the model to IR.

That is, I thought I had to use the MO from OpenVINO 21.4 to convert the ONNX model to IR in order to run it in the OpenVINO 21.4 framework, and similarly, the MO from OpenVINO 22.1 to convert the ONNX model to IR in order to run it in the OpenVINO 22.1 framework.

That's why I labeled them "ovino21.4_mo" and "ovino22.1_mo".

The issue appears when a model with "Quant/DeQuant Linear layers" is run in different OpenVINO versions: the output features differ depending on the OpenVINO version.

The issue does not appear when using a "FakeQuant layer".
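One way to quantify such a difference, as a rough sketch: compare the two output feature vectors by their maximum absolute difference and their cosine similarity, and check the result against a tolerance. The function name `compare_outputs`, the tolerance, and the sample values below are hypothetical and do not come from the models in this thread.

```python
import math


def compare_outputs(a, b, atol=1e-5):
    """Compare two equally sized, non-empty sequences of output features."""
    if len(a) != len(b) or not a:
        raise ValueError("outputs must be non-empty and of equal length")
    max_abs_diff = max(abs(x - y) for x, y in zip(a, b))
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Cosine similarity is undefined if either vector is all zeros.
    cosine = dot / (norm_a * norm_b) if norm_a and norm_b else float("nan")
    return {
        "max_abs_diff": max_abs_diff,
        "cosine": cosine,
        "within_tol": max_abs_diff <= atol,
    }


# Made-up output features standing in for results from two runtime versions.
out_2021 = [0.12, 0.50, 0.38]
out_2022 = [0.12, 0.75, 0.13]

print(compare_outputs(out_2021, out_2021))
print(compare_outputs(out_2021, out_2022))
```

Reporting numbers like these for each model and runtime version makes it easier to tell rounding-level noise from a real behavioral difference.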

Best regards

Iffa_Intel · ‎03-06-2023

Yes, you can send them to my Intel email if you prefer it.

siti.nur.nazhifahx.jais.meah@intel.com

Cordially,

Iffa

Iffa_Intel · ‎03-22-2023

Hi,

Our finding is that the differences you see are because accuracy is improved in the newer version, and this is expected.

This can be seen in the results that I attached before: the best performance is 4.73 FPS, obtained with the converted IR (which you shared) running within the 2022.1 environment. It is best to use a newer version, since it has upgrades that the previous version doesn't, and also to process the model consistently (conversion, etc.) within one environment.

Cordially,

Iffa

timosy · ‎03-23-2023

Thanks for the kind reply.

Does it mean the following?

Under OpenVINO 2022.1 environment:
the model quantized w/ onnx-runtime (Quant/DeQuant Linear layers)
the model converted to IR via mo.py (21.4 ver.) (FakeQuant layer)
the model converted to IR via mo.py (22.1 ver.) (FakeQuant layer)

The output features from each model above are the same.

Under OpenVINO 2021.4 environment:

the model quantized w/ onnx-runtime (Quant/DeQuant Linear layers)
the model converted to IR via mo.py (21.4 ver.) (FakeQuant layer)
the model converted to IR via mo.py (22.1 ver.) (FakeQuant layer)

The output features from each model above are the same (but performance is a bit worse compared to the 22.1 environment).

Is this what you mean?

Iffa_Intel · ‎03-23-2023

Accuracy influences your inferencing results. For example, an older version can detect apples, while the newer version (with better accuracy and performance) detects apples along with their colours.

Definitely, you would see differences in the result obtained whether in numerical or graphical representation.

Cordially,

Iffa

timosy · ‎03-23-2023

Thank you for the reply,

but, umm, I still don't understand well...

A model with a FakeQuant layer provides the same output features/values in both OpenVINO versions, 21.4 and 22.1, and
a model with Quant/DeQuant Linear layers provides different output features/values between OpenVINO versions 21.4 and 22.1.

This is my problem.

If I follow your comment, a model with a FakeQuant layer should also provide different features/values between the two versions.

If the situation were one of the following, I could understand it, but it's not:

Both models provide the "same" output features/values in both OpenVINO versions, or

Both models provide "different" output features/values between the two OpenVINO versions.

Maybe I should check some other parts/aspects...

I'd like to confirm your observation again. Is the following what you observed?

Under OpenVINO 2022.1 environment:
the model quantized w/ onnx-runtime (Quant/DeQuant Linear layers)
the model converted to IR via mo.py (21.4 ver.) (FakeQuant layer)
the model converted to IR via mo.py (22.1 ver.) (FakeQuant layer)
The output features from each model above are the same.

Best regards

Iffa_Intel · ‎03-23-2023

Only models that were converted into IR (Intermediate Representation) format produce the same result within one OpenVINO version.

That result was tested using the official OpenVINO sample app (hello_classification.py). I hope you have looked carefully at the results that I attached before.

Result 1:

The model converted to IR via mo.py (21.4 ver.) (FakeQuant layer) and the model converted to IR via mo.py (22.1 ver.) (FakeQuant layer), both inferred in OpenVINO 2022.1, produce the same result.

Result 2:

The model converted to IR via mo.py (21.4 ver.) (FakeQuant layer) and the model converted to IR via mo.py (22.1 ver.) (FakeQuant layer), both inferred in OpenVINO 2021.4, produce the same result.

If you compare results 1 and 2 (the same models inferred in different OpenVINO runtime versions), the results are different. Referring back to your issue, these differences are expected due to improvements made in the newer version of OpenVINO.

I hope this answers your questions.

Cordially,

Iffa

Iffa_Intel · ‎03-29-2023

Hi,

Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question.

Cordially,

Iffa