Intel® Distribution of OpenVINO™ Toolkit
Community assistance for the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Quantization aware training

timosy
New Contributor I

I'm looking into quantization-aware training in OpenVINO, and I found two tutorials:
1). Post-Training Quantization of PyTorch models with NNCF
2). Quantization Aware Training with NNCF, using PyTorch framework

 

As for the 2nd one, I thought that training would be done by sandwiching layers with "Quantize" and "DeQuantize" layers, as PyTorch does.

But it seems that the QAT mentioned in 2) is actually fine-tuning (just tuning the parameters after the quantization).

So, does OpenVINO not have QAT with Quantize and DeQuantize layers inserted during training? For reference, a sketch of what I mean is below.
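This is roughly the flow I have in mind, using PyTorch's own eager-mode QAT API (the toy model here is only for illustration, not from either tutorial):

```python
import torch
import torch.quantization as tq

# Toy model: QuantStub/DeQuantStub mark where tensors enter and leave the
# quantized region -- the "sandwiching" of layers I describe above.
class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.fc = torch.nn.Linear(8, 2)
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = TinyNet()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(model, inplace=True)  # inserts fake-quantize observers

# ... train as usual here; the fake-quantize ops simulate INT8 arithmetic
# during training so the weights adapt to quantization noise ...

model.eval()
int8_model = tq.convert(model)  # produces the actual INT8 model
```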

3 Replies
Wan_Intel
Moderator

Hi Timosy,

Thanks for reaching out to us.

For your information, the goal of Quantization Aware Training with NNCF, using PyTorch framework is to demonstrate how to use Neural Network Compression Framework (NNCF) 8-bit quantization to optimize a PyTorch model for inference with the OpenVINO™ toolkit. The optimization process contains the following steps (see the sketch after the list):

- Transform the original FP32 model to INT8
- Use fine-tuning to restore the accuracy
- Export optimized and original models to ONNX and then to OpenVINO IR
- Measure and compare the performance of models
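For illustration, here is a minimal sketch of that flow using the config-driven NNCF API; the model and data below are stand-ins, not the tutorial's own:

```python
import torch
from torchvision.models import resnet18
from nncf import NNCFConfig
from nncf.torch import create_compressed_model, register_default_init_args

# Stand-in FP32 model and training data; replace with your own.
model = resnet18(num_classes=10)
dataset = torch.utils.data.TensorDataset(
    torch.randn(64, 3, 224, 224), torch.randint(0, 10, (64,)))
train_loader = torch.utils.data.DataLoader(dataset, batch_size=16)

nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},  # shape of one input
    "compression": {"algorithm": "quantization"},      # 8-bit quantization
})
# Initialize quantizer ranges from a few batches of real data.
nncf_config = register_default_init_args(nncf_config, train_loader)

# NNCF wraps the model and inserts quantize/dequantize (fake-quantize)
# operations into the graph before training continues.
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# Fine-tune as usual; gradients flow through the fake-quantize nodes.
optimizer = torch.optim.SGD(compressed_model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()
for images, labels in train_loader:
    optimizer.zero_grad()
    loss = loss_fn(compressed_model(images), labels)
    loss.backward()
    optimizer.step()

# Export the INT8 model to ONNX; convert to OpenVINO IR afterwards.
compression_ctrl.export_model("model_int8.onnx")
```

Note that the fake-quantize operations inserted here play the same role as PyTorch's Quantize/DeQuantize sandwiching: INT8 behavior is simulated during the fine-tuning so the weights adapt to it.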

 

On another note, Post-Training Quantization of PyTorch models with NNCF demonstrates how to use NNCF 8-bit quantization in post-training mode, i.e., without the fine-tuning pipeline, to optimize a PyTorch model for high-speed inference with the OpenVINO™ toolkit. The optimization process contains the following steps (a sketch follows this list as well):

- Evaluate the original model
- Transform the original model to a quantized one
- Export optimized and original models to ONNX and then to OpenVINO IR
- Compare performance of the obtained FP32 and INT8 models
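A sketch of that post-training path, assuming NNCF's one-call quantize API; the model and calibration data are again stand-ins:

```python
import nncf  # post-training quantization API
import torch
from torchvision.models import resnet18

# Stand-in trained FP32 model and calibration data; a few hundred samples
# are typically enough to calibrate activation ranges.
model = resnet18(num_classes=10).eval()
calib_data = torch.utils.data.TensorDataset(
    torch.randn(64, 3, 224, 224), torch.randint(0, 10, (64,)))
calib_loader = torch.utils.data.DataLoader(calib_data, batch_size=16)

def transform_fn(data_item):
    images, _ = data_item  # NNCF needs only the model inputs, not labels
    return images

calibration_dataset = nncf.Dataset(calib_loader, transform_fn)
quantized_model = nncf.quantize(model, calibration_dataset)

# Export to ONNX, then convert to OpenVINO IR (e.g. with the mo tool).
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(quantized_model, dummy_input, "model_int8.onnx")
```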


Regards,

Wan


timosy
New Contributor I

Thanks for the comments, and sorry for my late reply.

 

I understand there are three kinds of quantization:

- Post-training dynamic range quantization
- Post-training integer (static) quantization = your 2nd example
- Quantization-aware training = fine-tuning, your 1st example


Are there any methods to do the 1st one, post-training dynamic range quantization, in OpenVINO?

Wan_Intel
Moderator

Hi Timosy,

For your information, post-training dynamic range quantization is only available in TensorFlow; it is not available in the OpenVINO™ toolkit. The optimization options that the OpenVINO™ toolkit does provide are listed here. This thread will no longer be monitored since we have provided suggestions. If you need any additional information from Intel, please submit a new question.
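For reference, in TensorFlow this is done through the TFLite converter; a minimal sketch, assuming a trained SavedModel at a placeholder path:

```python
import tensorflow as tf

# "saved_model_dir" is a placeholder path to a trained TensorFlow SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")

# Optimize.DEFAULT enables dynamic range quantization: weights are stored as
# INT8, while activations remain FP32 and are quantized on the fly at runtime.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()

with open("model_dynamic_range.tflite", "wb") as f:
    f.write(tflite_quant_model)
```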


Regards,

Wan

 
