Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Quantization aware training

timosy
New Contributor I
1,043 Views

I'm checking Quantization aware training in OpenVINO, and I found two tutorials:
1) Post-Training Quantization of PyTorch models with NNCF
2) Quantization Aware Training with NNCF, using PyTorch framework

 

As for the 2nd one, I thought that training is done by sandwiching layers with a
"Quantize" layer and a "DeQuantize" layer, as PyTorch does.

But it seems that the QAT mentioned in 2) is actually fine-tuning
(just tuning the parameters after the quantization).

 

So, OpenVINO does not have QAT with Quantize and DeQuantize layers during training?
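
For reference, the PyTorch pattern I have in mind looks roughly like this (a minimal sketch; the model and its sizes are just illustrative):

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # fake-quantizes the input
        self.conv = nn.Conv2d(3, 16, kernel_size=3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # converts back to float

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model = TinyNet().train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
torch.quantization.prepare_qat(model, inplace=True)    # inserts fake-quant observers
# ... the usual training loop runs here, with quantization simulated ...
model_int8 = torch.quantization.convert(model.eval())  # swaps in real INT8 modules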

3 Replies
Wan_Intel
Moderator
1,023 Views

Hi Timosy,

Thanks for reaching out to us.

For your information, the goal of the Quantization Aware Training with NNCF, using PyTorch framework tutorial is to demonstrate how to use the Neural Network Compression Framework (NNCF) 8-bit quantization to optimize a PyTorch model for inference with the OpenVINO™ toolkit. The optimization process contains the following steps (a sketch follows the list):

- Transform the original FP32 model to INT8

- Use fine-tuning to restore the accuracy

- Export the optimized and original models to ONNX and then to OpenVINO IR

- Measure and compare the performance of the models
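
As a rough sketch of those steps with the NNCF PyTorch API (the input size, config values, and train_loader below are placeholders, not the tutorial's exact code):

import torch
from nncf import NNCFConfig
from nncf.torch import create_compressed_model, register_default_init_args

# Describe the model input and select the 8-bit quantization algorithm
nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {"algorithm": "quantization"},
})
# Initialize quantizer ranges from the training data
nncf_config = register_default_init_args(nncf_config, train_loader)

# Wrap the FP32 model; fake-quantize operations are inserted at this point
compression_ctrl, model = create_compressed_model(model, nncf_config)

# ... fine-tune `model` for a few epochs to restore accuracy ...

# Export to ONNX; the ONNX file can then be converted to OpenVINO IR
compression_ctrl.export_model("model_int8.onnx")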

 

On another note, Post-Training Quantization of PyTorch models with NNCF demonstrates how to use NNCF 8-bit quantization in post-training mode, without a fine-tuning pipeline, to optimize a PyTorch model for high-speed inference via the OpenVINO™ toolkit. The optimization process contains the following steps (a sketch follows the list):

- Evaluate the original model

- Transform the original model to a quantized one

- Export the optimized and original models to ONNX and then to OpenVINO IR

- Compare the performance of the obtained FP32 and INT8 models
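
A corresponding sketch of the post-training flow (again with placeholder names; calibration_loader and the transform function are assumptions):

import torch
import nncf

def transform_fn(data_item):
    images, _ = data_item  # drop the labels; calibration needs inputs only
    return images

# Wrap an existing DataLoader as an NNCF calibration dataset
calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)

# No training loop: quantization parameters are derived from calibration data
quantized_model = nncf.quantize(model, calibration_dataset)

# Export to ONNX, then convert to OpenVINO IR as usual
torch.onnx.export(quantized_model, torch.randn(1, 3, 224, 224), "model_int8.onnx")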

 

 

Regards,

Wan


timosy
New Contributor I
1,006 Views

Thanks for the comments, and sorry for my late reply.

 

I understand three kinds of quantization:

- Post-training dynamic range quantization

- Post-training integer (static) quantization
   (= your 2nd example)

- Quantization-aware training
   (= fine-tuning, your 1st example)

 

 

Are there any methods in OpenVINO to do the 1st one, meaning Post-training dynamic range quantization?

Wan_Intel
Moderator
972 Views

Hi Timosy,

For your information, Post-training dynamic range quantization is only available in TensorFlow, and it is not available in the OpenVINO™ toolkit. The optimization options that the OpenVINO™ toolkit does offer are listed here. This thread will no longer be monitored since we have provided suggestions. If you need any additional information from Intel, please submit a new question.
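
Before closing, for comparison, this is roughly what dynamic range quantization looks like in TensorFlow Lite (a minimal sketch; the SavedModel path is a placeholder):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
# With no representative dataset supplied, Optimize.DEFAULT applies dynamic
# range quantization: weights are stored as INT8, while activations are
# quantized dynamically at runtime
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_dynamic_range.tflite", "wb") as f:
    f.write(tflite_model)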

 

 

Regards,

Wan

 
