The second Intel® Liftoff Days hackathon continued to push the boundaries of AI innovation. Workshops provided a unique opportunity for participants to immerse themselves in the latest AI technologies and optimization techniques.
Each session was geared toward enhancing the technical toolkit of startups, offering practical guidance on improving model performance, efficiency, and scalability.
By bridging the gap between theory and application, these workshops aimed to foster innovation and support the development of robust AI solutions ready for deployment.
Here’s a breakdown of what happened.
The fourth workshop was led by Desmond Grealy, who delved into the intricacies of model quantization using OpenVINO. This session offered a comprehensive look at how quantization can drastically improve AI model performance and reduce size, making it ideal for deployment on edge devices.
Desmond kicked off the workshop by introducing the concept of quantization, explaining its benefits in optimizing AI models for edge environments. He outlined how quantization reduces model size and improves inference speed by converting models to lower-precision formats such as FP16, INT8, or INT4. “Quantization can significantly reduce the model size and improve inference speed, making it suitable for deployment on constrained devices,” Desmond explained. He also demonstrated the use of the Optimum CLI and OpenVINO tools for the quantization process, providing attendees with a practical walkthrough via a Jupyter notebook.
The workshop included a hands-on demonstration, where Desmond set up the environment to download and quantize the Llama 2 model. This step-by-step guide highlighted the practical aspects of using Optimum and OpenVINO to convert models to INT8 and INT4 formats.
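The exact notebook commands were not published with this recap, but the flow Desmond demonstrated can be sketched with the Optimum CLI; the model ID below is an assumption (Llama 2 is gated on Hugging Face, so an authenticated login is also required), and the output directory names are placeholders.

```shell
# Install Optimum with OpenVINO support (assumed environment setup).
pip install "optimum[openvino]"

# Export Llama 2 to OpenVINO IR with INT8 weight compression.
# Model ID is an assumption; adjust to the checkpoint you have access to.
optimum-cli export openvino \
  --model meta-llama/Llama-2-7b-chat-hf \
  --weight-format int8 \
  llama2-ov-int8

# Same export with INT4 weight compression for a smaller footprint.
optimum-cli export openvino \
  --model meta-llama/Llama-2-7b-chat-hf \
  --weight-format int4 \
  llama2-ov-int4
```

The resulting IR directories can then be loaded for inference with OpenVINO on CPU or other supported devices.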
Desmond emphasized the importance of balancing the trade-offs between performance gains and potential accuracy loss, particularly when quantizing to lower formats like INT4.
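The trade-off Desmond highlighted can be seen numerically with a small sketch: quantizing the same tensor with a symmetric per-tensor scheme at 8 and at 4 bits shows how the coarser INT4 grid produces a markedly larger round-trip error. Again, this is a toy illustration of the principle, not OpenVINO's quantization algorithm.

```python
import numpy as np

def quantize_error(weights, bits):
    """Mean absolute round-trip error of symmetric per-tensor quantization."""
    qmax = 2 ** (bits - 1) - 1          # 127 for INT8, 7 for INT4
    scale = np.abs(weights).max() / qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    return np.abs(weights - q * scale).mean()

rng = np.random.default_rng(42)
weights = rng.normal(size=100_000).astype(np.float32)

err_int8 = quantize_error(weights, 8)
err_int4 = quantize_error(weights, 4)
print(f"INT8 mean abs error: {err_int8:.5f}")
print(f"INT4 mean abs error: {err_int4:.5f}")  # coarser grid -> larger error
```

In practice, techniques such as group-wise scaling, calibration data, and quantization-aware training are used to claw back much of the accuracy lost at the lowest bit-widths.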
During the Q&A session, participants posed thought-provoking questions and shared insights on best practices for model quantization. Matthew Irvin inquired about managing the balance between performance gains and accuracy loss, while Rahul Nair suggested training-aware quantization as a method to mitigate these challenges. The discussion underscored the importance of carefully evaluating application requirements and choosing the right quantization strategy to achieve optimal results.
The workshop provided valuable insights into the practical aspects of model quantization and its potential to enhance AI model performance. Desmond Grealy’s presentation offered a deep dive into the tools and techniques available with OpenVINO, empowering participants to implement quantization effectively.
The session highlighted the power of experimentation and collaboration, reminding participants of the importance of continuous learning and adaptation in the ever-evolving field of AI.