We are excited to announce the release of OpenVINO™ 2025.0! This update brings expanded model coverage, new integrations, and GenAI API enhancements, designed to maximize the efficiency and performance of your AI deployments, whether at the edge, in the cloud, or locally.
What’s new in this release:
More GenAI coverage and framework integrations to minimize code changes.
- New models supported: Qwen 2.5, DeepSeek-R1-Distill-Llama-8B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-1.5B, FLUX.1 Schnell, and FLUX.1 Dev.
- Whisper model: improved performance on CPUs, built-in GPUs, and discrete GPUs with the GenAI API (see the transcription sketch after this list).
- Preview: NPU support for torch.compile gives developers the ability to use the OpenVINO backend to run PyTorch models on NPUs, with 300+ deep learning models enabled from the TorchVision, Timm, and TorchBench repositories (see the torch.compile sketch after this list).
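As an illustration of the Whisper improvements, here is a minimal transcription sketch using the GenAI API's WhisperPipeline. The model directory and audio file are placeholders, and the model is assumed to have been exported to OpenVINO IR beforehand (for example with `optimum-cli export openvino --model openai/whisper-base whisper-base-ov`):

```python
import librosa
import openvino_genai

# Placeholder path: a Whisper model already exported to OpenVINO IR.
pipe = openvino_genai.WhisperPipeline("whisper-base-ov", "CPU")  # "GPU" also works

# Whisper expects 16 kHz mono audio as raw float samples.
raw_speech, _ = librosa.load("sample.wav", sr=16000)

print(pipe.generate(raw_speech.tolist()))
```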
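And a minimal sketch of the torch.compile preview, assuming a machine with an Intel NPU and the openvino package installed; the ResNet-50 model is just an example:

```python
import torch
import torchvision.models as models

import openvino.torch  # noqa: F401  (registers the "openvino" backend for torch.compile)

model = models.resnet50(weights="DEFAULT").eval()

# Preview feature: compile through the OpenVINO backend and target the NPU.
compiled = torch.compile(model, backend="openvino", options={"device": "NPU"})

with torch.no_grad():
    out = compiled(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 1000])
```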
Broader LLM model support and more model compression techniques.
- Preview: Prompt Lookup decoding has been added to the GenAI API, improving second-token latency for LLMs by effectively utilizing predefined prompts that match the intended use case (see the sketch after this list).
- Preview: The GenAI API now offers image-to-image inpainting. This feature enables models to generate realistic content by inpainting specified modifications and seamlessly integrating them with the original image (sketched after this list).
- Asymmetric KV cache compression is now enabled by default for INT8 on CPUs, resulting in lower memory consumption and improved second-token latency, especially when dealing with long prompts that require significant memory.
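For reference, a sketch of Prompt Lookup decoding along the lines of the GenAI samples; the model directory and prompt are placeholders, and since this is a preview feature the exact options (prompt_lookup, max_ngram_size, num_assistant_tokens) may evolve:

```python
import openvino_genai

# Placeholder path: an LLM exported to OpenVINO IR.
pipe = openvino_genai.LLMPipeline("llm-model-ov", "CPU", prompt_lookup=True)

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 128
config.max_ngram_size = 3        # n-gram window matched against the prompt
config.num_assistant_tokens = 5  # candidate tokens proposed per step

print(pipe.generate("What is OpenVINO?", config))
```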
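And a rough inpainting sketch; the model directory, image files, and to_tensor helper are illustrative assumptions:

```python
import numpy as np
import openvino as ov
import openvino_genai
from PIL import Image

def to_tensor(path: str) -> ov.Tensor:
    """Load an RGB image into the [1, H, W, 3] uint8 layout the pipeline expects."""
    return ov.Tensor(np.array(Image.open(path).convert("RGB"))[None])

# Placeholder path: an inpainting-capable diffusion model exported to OpenVINO IR.
pipe = openvino_genai.InpaintingPipeline("inpainting-model-ov", "CPU")

image = to_tensor("original.png")  # source image
mask = to_tensor("mask.png")       # white pixels mark the region to repaint

result = pipe.generate("a red brick wall", image, mask)
Image.fromarray(result.data[0]).save("inpainted.png")
```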
More portability and performance to run AI at the edge, in the cloud, or locally.
- Support for the latest Intel® Core™ Ultra 200H series processors (formerly codenamed Arrow Lake-H).
- Integration of the OpenVINO™ backend with the Triton Inference Server allows developers to use the Triton server for enhanced model-serving performance when deploying on Intel CPUs.
- Preview: A new OpenVINO™ backend integration allows developers to leverage OpenVINO performance optimizations directly within Keras 3 workflows for faster AI inference on CPUs, built-in GPUs, discrete GPUs, and NPUs. This feature is available with the latest Keras 3.8 release (see the Keras sketch after this list).
- The OpenVINO Model Server now supports native Windows Server deployments, allowing developers to achieve better performance by eliminating container overhead and simplifying GPU deployment.
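A minimal sketch of the Keras 3 integration; the toy model is illustrative, and note that the OpenVINO backend currently supports inference only (training still requires one of the other Keras backends):

```python
import os

# The backend must be selected before Keras is imported (available since Keras 3.8).
os.environ["KERAS_BACKEND"] = "openvino"

import keras
import numpy as np

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(2, activation="softmax"),
])

# predict() now runs through OpenVINO's optimized inference path.
print(model.predict(np.random.rand(3, 4).astype("float32")).shape)  # (3, 2)
```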
Download the 2025.0 Release
Download Latest Release Now
Get all the details
See 2025.0 release notes
NNCF RELEASE
Check out the new NNCF release
