
OpenVINO 2025.1 Available Now!

Luis_at_Intel
Moderator

We are excited to announce the release of OpenVINO™ 2025.1! This update brings expanded model coverage, GPU optimizations, and OpenVINO GenAI enhancements, designed to maximize the efficiency and performance of your AI deployments, whether at the edge, in the cloud, or locally.


What’s new in this release:

More GenAI coverage and framework integrations to minimize code changes.

  • New models supported: Phi-4 Mini, Jina CLIP v1, and BCE Embedding Base v1
  • OpenVINO™ Model Server now supports vision-language models (VLMs), including Qwen2-VL, Phi-3.5-Vision, and InternVL2.
  • OpenVINO GenAI now includes image-to-image and inpainting features for transformer-based pipelines, such as Flux.1 and Stable Diffusion 3 models, enhancing their ability to generate more realistic content (a minimal usage sketch follows this list).
  • Preview: AI Playground now utilizes the OpenVINO GenAI backend to enable highly optimized inferencing performance on AI PCs.
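
Curious what the new image-to-image support looks like in code? Below is a minimal sketch using the OpenVINO GenAI Python API, modeled on the shape of the official image2image sample; the model directory (a Stable Diffusion 3 model exported to OpenVINO IR, e.g. with optimum-cli), the device choice, file names, and prompt are all illustrative.

```python
# Minimal sketch: image-to-image generation with OpenVINO GenAI.
# Paths, device, and prompt are illustrative placeholders.
import numpy as np
from PIL import Image
import openvino as ov
import openvino_genai

# Load the source image and wrap it in an ov.Tensor (NHWC, uint8).
src = np.array(Image.open("input.png").convert("RGB"))[None]

# Assumes a Stable Diffusion 3 (or Flux.1) model already exported to
# OpenVINO IR in ./sd3_ov, e.g. via optimum-cli.
pipe = openvino_genai.Image2ImagePipeline("./sd3_ov", "GPU")

# strength controls how far the result may drift from the source image.
result = pipe.generate("a watercolor rendition of the same scene",
                       ov.Tensor(src), strength=0.7)
Image.fromarray(result.data[0]).save("output.png")
```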


Broader LLM support and more model compression techniques.

  • Reduced binary size through optimization of the CPU plugin and removal of the GEMM kernel.
  • Optimization of new kernels for the GPU plugin significantly boosts the performance of Long Short-Term Memory (LSTM) models, used in many applications, including speech recognition, language modeling, and time series forecasting.
  • Preview: Token Eviction is now implemented in OpenVINO GenAI to reduce the memory consumption of the KV cache by eliminating unimportant tokens. The current implementation is beneficial for tasks that generate long sequences, such as chatbots and code generation.
  • NPU acceleration for text generation is now enabled in OpenVINO™ Runtime and OpenVINO™ Model Server to support the power-efficient deployment of VLMs on NPUs for AI PC use cases with low concurrency (see the sketch after this list).
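
As a rough illustration of the new NPU text-generation path, the sketch below runs OpenVINO GenAI's LLMPipeline on the NPU device; the model directory and prompt are placeholders, and the model is assumed to be already exported to OpenVINO IR.

```python
# Minimal sketch: power-efficient text generation on an NPU with OpenVINO GenAI.
import openvino_genai

# Assumes an LLM already exported/compressed to OpenVINO IR in ./llm_ov
# (a placeholder path). "NPU" selects the NPU plugin; swap in "CPU" or
# "GPU" to compare devices.
pipe = openvino_genai.LLMPipeline("./llm_ov", "NPU")

print(pipe.generate("Summarize KV cache eviction in one sentence.",
                    max_new_tokens=64))
```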


More portability and performance to run AI at the edge, in the cloud, or locally.

  • Support for the latest Intel® Core™ processors (Series 2) (formerly codenamed Bartlett Lake), Intel® Core™ 3 Processor N-series and Intel® Processor N-series (formerly codenamed Twin Lake) on Windows.
  • Additional LLM performance optimizations on Intel® Core™ Ultra 200H series processors for improved 2nd token latency on Windows and Linux.
  • Enhanced LLM performance and efficient resource utilization with the implementation of Paged Attention and Continuous Batching by default in the GPU plugin.
  • Preview: The new OpenVINO backend for ExecuTorch will enable accelerated inference and improved performance on Intel hardware, including CPUs, GPUs, and NPUs (the sketch below shows how to list the devices OpenVINO can see on your machine).
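
To check which of these targets your installation can actually see, you can enumerate devices with the standard OpenVINO runtime API; the output naturally varies with your hardware and drivers.

```python
# Minimal sketch: list the devices (CPU/GPU/NPU) visible to OpenVINO.
import openvino as ov

core = ov.Core()
for device in core.available_devices:
    # FULL_DEVICE_NAME is a standard read-only property exposed by each plugin.
    print(device, "->", core.get_property(device, "FULL_DEVICE_NAME"))
```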


Download the 2025.1 Release 
Download Latest Release Now


Get all the details 
See 2025.1 release notes 


NNCF Release

Check out the new NNCF release


