Intel® Distribution of OpenVINO™ Toolkit
Community assistance for the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Using Custom Quantization Parameters

karlo1
Novice

I'm trying to use the model optimizer to convert my float32 ONNX model to OpenVINO IR. My end goal is to get an INT8 quantized model.

The problem is that I don't want to obtain the quantization parameters from the OpenVINO QAT or PTQ. I want to set my own quantization parameters which I obtained from another framework.

Is it possible to do this and how can I accomplish it?

1 Solution
IntelSupport
Community Manager

Hi Karlo1,

Thank you for your patience.

For your information, OpenVINO doesn't currently offer a way to import externally obtained quantization parameters in either Model Optimizer or POT.

We would recommend two possible workarounds:

First:
  • Convert from ONNX to IR
  • Use POT or NNCF to quantize the model
  • Write a script that reads the IR, iterates over the FakeQuantize nodes, and replaces their parameters

Second:

  • Use NNCF to quantize the ONNX model
  • Write a script that reads the quantized ONNX model and changes the parameters of the Q/DQ ops
  • Convert the model to IR using MO or the convert_model API

Regards,

Aznie


4 Replies
IntelSupport
Community Manager

Hi Karlo1,

Thanks for reaching out.

OpenVINO offers two types of post-training quantization: Post-training Quantization with POT and Post-training Quantization with NNCF (new). The NNCF workflow supports a PTQ API, while POT supports the uniform integer quantization method. POT can quantize models from any framework, as long as the model is first converted to the OpenVINO Intermediate Representation (IR) format with Model Optimizer. You may refer to these examples to utilize the POT.

Regards,

Aznie


karlo1
Novice

Hi Aznie!

Thank you for your answer, but I don't think it addresses my question.

I am aware of the types of quantization OpenVINO offers. If I am not mistaken, it offers QAT as well.

As I said, I don't want my quantization parameters to be obtained from OpenVINO PTQ or QAT.

I already have a set of quantization parameters (scales and zero points for weights and activations), and I was wondering whether it is possible to tell OpenVINO to quantize the model using these custom parameters.

The ideal solution to my problem would be a command-line option that lets me specify the quantization parameters in some predefined format. One example would be the AIMET specification.

I am hoping to use one of the following workflows:

  1. Start with the ONNX model in fp32.
  2. Use Model Optimizer to get an OpenVINO IR in int8 using my quantization parameters.

or

  1. Start with the ONNX model in fp32.
  2. Use Model Optimizer to get an OpenVINO IR in fp32.
  3. Use some other tool (perhaps POT) to get an OpenVINO IR in int8 using my quantization parameters.

I am also open to other suggestions that involve using predefined quantization parameters.

Thank you for your help!

Aznie_Intel
Moderator

Hi Karlo1,


This thread will no longer be monitored since we have provided a solution. If you need any additional information from Intel, please submit a new question.



Regards,

Aznie


