Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Quantization with NNCF and quantization range from FP32 to INT8

timosy
New Contributor I

As for quantization of a trained model, I suppose we have to know the dynamic range of its FP32 values so that we can decide on the proper range when the model is quantized from FP32 to INT8.

 

I guess... if the FP32 range is extremely large, every feature (or feature map, if it's 2D) that we can extract collapses to a single value (or a flat image, if it's 2D).
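
A toy example of what I mean, assuming the standard asymmetric affine scheme q = round((x - min) / scale) with scale = (max - min) / 255 (the ranges and values here are made up):

import numpy as np

# Asymmetric 8-bit quantization: q = round((x - min) / scale),
# with scale = (max - min) / 255. A huge calibrated range means a huge
# scale, so nearby activations land on the same integer level.
def quantize(x, lo, hi, levels=256):
    scale = (hi - lo) / (levels - 1)
    return np.clip(np.round((x - lo) / scale), 0, levels - 1).astype(np.uint8)

x = np.array([0.10, 0.12, 0.50])     # activations that differ in FP32
print(quantize(x, -1.0, 1.0))        # narrow range: [140 143 191] -> still distinguishable
print(quantize(x, -1000.0, 1000.0))  # huge range:   [128 128 128] -> collapsed to one value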

 

I'm using the NNCF framework for quantizing the model. I'm curious...

1) Is it possible to know what range of FP32 values is mapped to INT8 when quantization is applied to the model?

2) Where does this range originate from? Pixel RGB values? A combination of pixel RGB values and the convolution filter (kernel)? Or something else?

3) Usually we use RGB images to train a model for a classification task; does RGB result in (or require) a wider FP32 range compared to, for instance, grayscale images?

4) If I want to shorten the original FP32 range (which leads to a wider effective INT8 range when quantization is applied), is there a nice way to do so?

 

 

https://docs.openvino.ai/latest/pot_compression_algorithms_quantization_default_README.html?highlight=minmaxquantization

On the above page, I found the "range_estimator" parameters under "weights" and "activations"; it seems I can change the quantization range with them? But I do not know how to change it... If I want to expand the INT8 range (meaning a short FP32 range => a wider INT8 range, i.e. the right-hand situation in the picture below, not the left one), how should I change the parameters from the default? Is the parametrization below enough?

 

https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/examples/quantization/optimization/mobilenetV2_pytorch_int8_rangeopt.json

{
  "name": "MinMaxQuantization",
  "params": {
    "preset": "mixed",
    "stat_subset_size": 1000,
    "weights": {
      "bits": 8,
      "mode": "asymmetric",
      "granularity": "perchannel"
    },
    "activations": {
      "bits": 8,
      "mode": "asymmetric",
      "granularity": "pertensor"
    }
  }
},
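
By the way, regarding question 1: I assume the ranges that were actually applied end up in the FakeQuantize nodes of the quantized IR, so something like this should at least locate them (a minimal sketch using the 2022.x Python API; the model path is a placeholder):

from openvino.runtime import Core

core = Core()
model = core.read_model("model_int8.xml")  # placeholder: path to the quantized IR

# Each FakeQuantize node carries the FP32 range mapped onto INT8
# as its input_low / input_high constant inputs.
for op in model.get_ops():
    if op.get_type_name() == "FakeQuantize":
        print(op.get_friendly_name())

Opening the IR in Netron and reading the input_low / input_high constants of those nodes should then give the per-layer FP32 range.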

2 Replies
Peh_Intel
Moderator

Hi timosy,

 

There are two main quantization methods:

· Default Quantization

· Accuracy-aware Quantization

 

Hence, the algorithm name should be either “DefaultQuantization” or “AccuracyAwareQuantization”. For the “range_estimator”, you have to add it inside the “weights” and “activations” sections (see the sketch after the examples below).

 

Please refer to these three examples:

· accuracy_aware_quantization_spec.json

· default_quantization_spec.json

· cascaded_model_default_quantizatoin_spec.json
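
As a rough sketch based on the structure in those spec files (the estimator settings here are only illustrative, not tuned recommendations), the “range_estimator” section goes inside “weights” and “activations” like this:

{
  "name": "DefaultQuantization",
  "params": {
    "preset": "mixed",
    "stat_subset_size": 1000,
    "weights": {
      "bits": 8,
      "mode": "asymmetric",
      "granularity": "perchannel",
      "range_estimator": {
        "max": {
          "type": "quantile",
          "outlier_prob": 0.0001
        }
      }
    },
    "activations": {
      "bits": 8,
      "mode": "asymmetric",
      "granularity": "pertensor",
      "range_estimator": {
        "preset": "quantile"
      }
    }
  }
}

A "quantile" estimator clips rare extreme values, which shortens the FP32 range that is mapped onto INT8; this is the usual way to reach the "short FP32 range => full INT8 range" situation you described.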

 

 

Regards,

Peh


Peh_Intel
Moderator

Hi timosy,


This thread will no longer be monitored since we have provided answers. If you need any additional information from Intel, please submit a new question. 



Regards,

Peh

