We are excited to announce the release of OpenVINO™ 2025.3! This update brings expanded model coverage, CPU and GPU optimizations, and Gen AI enhancements, designed to maximize the efficiency and performance of your AI deployments, whether at the edge, in the cloud, or locally.
What’s new in this release:
More Gen AI coverage and framework integrations to minimize code changes
- New models supported: Phi-4-mini-reasoning, AFM-4.5B, Gemma-3-1B-it, Gemma-3-4B-it, and Gemma-3-12B.
- NPU support added for: Qwen3-1.7B, Qwen3-4B, and Qwen3-8B.
- LLMs optimized for NPU are now available in the OpenVINO Hugging Face collection (a minimal usage sketch follows this list).
- Preview: AI PCs based on Intel® Core™ Ultra processors and Windows can now leverage the OpenVINO™ Execution Provider for Windows* ML for a high-performance, off-the-shelf starting experience on Windows*.
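For context, the snippet below is a minimal sketch of running one of these NPU-optimized models with the OpenVINO GenAI Python API. The local model directory name is a placeholder and assumes the model has already been downloaded in OpenVINO IR format, for example from the collection above.

```python
import openvino_genai as ov_genai

# "Qwen3-4B-int4-ov" is a placeholder for a locally downloaded,
# NPU-optimized model in OpenVINO IR format.
pipe = ov_genai.LLMPipeline("Qwen3-4B-int4-ov", "NPU")

# Generate a short completion on the NPU.
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
```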
Broader LLM model support and more model compression optimization techniques
- The NPU plug-in adds support for longer contexts of up to 8K tokens, dynamic prompts, and dynamic LoRA for improved LLM performance.
- The NPU plug-in now supports dynamic batch sizes by reshaping the model to a batch size of 1 and concurrently managing multiple inference requests, enhancing performance and optimizing memory utilization.
- Accuracy improvements for GenAI models on both built-in and discrete graphics, achieved by compressing the key cache per channel in addition to the existing per-token KV cache compression method.
- OpenVINO™ GenAI introduces TextRerankPipeline for improved retrieval relevance and RAG pipeline accuracy (see the sketch below), plus Structured Output for enhanced response reliability and function calling while ensuring adherence to predefined formats.
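As a rough illustration of the new reranking API, the sketch below assumes a constructor that takes a local model directory and a device, and a rerank(query, documents) method; check the 2025.3 GenAI API reference for the exact signatures.

```python
import openvino_genai as ov_genai

# Sketch only: the constructor and return type are assumptions based on the
# release notes. "bge-reranker-ov" is a placeholder for a local reranking
# model converted to OpenVINO format.
pipeline = ov_genai.TextRerankPipeline("bge-reranker-ov", "CPU")

query = "How do I run inference on an Intel NPU?"
documents = [
    "OpenVINO supports inference on Intel CPUs, GPUs, and NPUs.",
    "The Eiffel Tower is located in Paris.",
]

# Assumed to return the documents scored by relevance to the query.
for result in pipeline.rerank(query, documents):
    print(result)
```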
More portability and performance to run AI at the edge, in the cloud or locally
- Announcing support for Intel® Arc™ Pro B-Series (B50 and B60).
- Preview: Hugging Face models that are GGUF-enabled for OpenVINO GenAI are now supported by the OpenVINO™ Model Server for popular LLM architectures such as DeepSeek Distill, Qwen2, Qwen2.5, and Llama 3. This functionality reduces memory footprint and simplifies integration for GenAI workloads.
- With improved reliability and tool call accuracy, the OpenVINO™ Model Server boosts support for agentic AI use cases on AI PCs, while enhancing performance on Intel CPUs, built-in GPUs, and NPUs.
- int4 data-aware weights compression, now supported in the Neural Network Compression Framework (NNCF) for ONNX models, reduces memory footprint while maintaining accuracy and enables efficient deployment in resource-constrained environments (a minimal sketch follows below).
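To make the ONNX path concrete, here is a minimal sketch of data-aware int4 weight compression with NNCF; the model file and the load_calibration_samples() helper are placeholders you would replace with your own model and calibration inputs.

```python
import nncf
import onnx

# Placeholder: any ONNX LLM with linear layers.
model = onnx.load("model.onnx")

# Placeholder helper: yields representative model inputs; wrapping them in
# nncf.Dataset is what makes the compression data-aware.
calibration_data = nncf.Dataset(load_calibration_samples())

compressed_model = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,  # 4-bit symmetric weight quantization
    ratio=0.8,        # fraction of eligible weights compressed to int4
    group_size=128,   # quantization group size along the weight channels
    dataset=calibration_data,
)

onnx.save(compressed_model, "model_int4.onnx")
```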
Download the 2025.3 Release
Download Latest Release Now
Get all the details
See 2025.3 release notes
NNCF RELEASE
Check out the new NNCF release
Helpful Links
NOTE: Links open in a new window.