Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Quantization with NNCF and quantization range from FP32 to INT8

timosy
New Contributor I

As for quantization of a trained model, I suppose we have to know the dynamic range of its FP32 values so that we can decide on the proper range when the model is quantized from FP32 to INT8.

 

I guess... if the FP32 range is extremely large, every feature (or feature map, if it's 2D) that we can extract collapses to a single value (or a flat image, if it's 2D).
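
A toy example of what I mean, assuming the standard asymmetric affine scheme q = round((x - min) / scale) with scale = (max - min) / 255 (the ranges and values here are made up):

import numpy as np

# Asymmetric 8-bit quantization: q = round((x - min) / scale),
# with scale = (max - min) / 255. A huge calibrated range means a huge
# scale, so nearby activations land on the same integer level.
def quantize(x, lo, hi, levels=256):
    scale = (hi - lo) / (levels - 1)
    return np.clip(np.round((x - lo) / scale), 0, levels - 1).astype(np.uint8)

x = np.array([0.10, 0.12, 0.50])     # activations that differ in FP32
print(quantize(x, -1.0, 1.0))        # narrow range: [140 143 191] -> still distinguishable
print(quantize(x, -1000.0, 1000.0))  # huge range:   [128 128 128] -> collapsed to one value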

 

I'm using the NNCF framework for quantizing the model. I'm curious...

1) Is it possible to know what range of FP32 values is mapped to INT8 when quantization is applied to the model?

2) Where does this range originate from? Pixel RGB values? A combination of pixel RGB values and the convolution filter (kernel)? Or something else?

3) Usually we use RGB images to train a model for a classification task; does RGB result in (or require) a wider FP32 range compared to, for instance, grayscale images?

4) If I want to shorten the original FP32 range (which leads to a wider effective INT8 range when quantization is applied), is there a nice way to do so?

 

 

https://docs.openvino.ai/latest/pot_compression_algorithms_quantization_default_README.html?highlight=minmaxquantization

On the above page, I found the "range_estimator" parameters under "weights" and "activations"; it seems I can change the quantization range with them? But I do not know how to change it... If I want to expand the INT8 range (meaning a short FP32 range => a wider INT8 range, i.e. the right-hand situation in the picture below, not the left one), how should I change the parameters from the default? Is the parametrization below enough?

 

https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/examples/quantization/optimization/mobilenetV2_pytorch_int8_rangeopt.json

{
  "name": "MinMaxQuantization",
  "params": {
    "preset": "mixed",
    "stat_subset_size": 1000,
    "weights": {
      "bits": 8,
      "mode": "asymmetric",
      "granularity": "perchannel"
    },
    "activations": {
      "bits": 8,
      "mode": "asymmetric",
      "granularity": "pertensor"
    }
  }
},
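
By the way, regarding question 1: I assume the ranges that were actually applied end up in the FakeQuantize nodes of the quantized IR, so something like this should at least locate them (a minimal sketch using the 2022.x Python API; the model path is a placeholder):

from openvino.runtime import Core

core = Core()
model = core.read_model("model_int8.xml")  # placeholder: path to the quantized IR

# Each FakeQuantize node carries the FP32 range mapped onto INT8
# as its input_low / input_high constant inputs.
for op in model.get_ops():
    if op.get_type_name() == "FakeQuantize":
        print(op.get_friendly_name())

Opening the IR in Netron and reading the input_low / input_high constants of those nodes should then give the per-layer FP32 range.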

2 Replies
Peh_Intel
Moderator

Hi timosy,

 

There are two main quantization methods:

· Default Quantization

· Accuracy-aware Quantization

 

Hence, the algorithm name should be either “DefaultQuantization” or “AccuracyAwareQuantization”. For the “range_estimator”, you have to add it inside the “weights” and “activations” sections (see the sketch after the examples below).

 

Please refer to these three examples:

· accuracy_aware_quantization_spec.json

· default_quantization_spec.json

· cascaded_model_default_quantizatoin_spec.json
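
As a rough sketch based on the structure in those spec files (the estimator settings here are only illustrative, not tuned recommendations), the “range_estimator” section goes inside “weights” and “activations” like this:

{
  "name": "DefaultQuantization",
  "params": {
    "preset": "mixed",
    "stat_subset_size": 1000,
    "weights": {
      "bits": 8,
      "mode": "asymmetric",
      "granularity": "perchannel",
      "range_estimator": {
        "max": {
          "type": "quantile",
          "outlier_prob": 0.0001
        }
      }
    },
    "activations": {
      "bits": 8,
      "mode": "asymmetric",
      "granularity": "pertensor",
      "range_estimator": {
        "preset": "quantile"
      }
    }
  }
}

A "quantile" estimator clips rare extreme values, which shortens the FP32 range that is mapped onto INT8; this is the usual way to reach the "short FP32 range => full INT8 range" situation you described.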

 

 

Regards,

Peh


Peh_Intel
Moderator

Hi timosy,


This thread will no longer be monitored since we have provided answers. If you need any additional information from Intel, please submit a new question. 



Regards,

Peh

