We're excited to unveil the latest release of the OpenVINO toolkit, 2024.0. This update brings enhancements in LLM performance to power your generative AI workloads.
What’s new in this release:
More Gen AI coverage and framework integrations to minimize code changes.
- Improved out-of-the-box experience for TensorFlow sentence encoding models through the installation of OpenVINO Tokenizers.
- New and noteworthy models validated: Mistral, StableLM-tuned-alpha-3b, and StableLM-Epoch-3B.
- OpenVINO now supports Mixture of Experts (MoE), an architecture that activates only a subset of expert sub-networks per token, helping generative models run more efficiently through the pipeline.
- JavaScript developers now have seamless access to the OpenVINO API: new Node.js bindings enable smooth integration of OpenVINO into JavaScript applications.
Broader LLM model support and more model compression techniques.
- Improved quality of INT4 weight compression for LLMs by adding the popular technique, Activation-aware Weight Quantization (AWQ), to the Neural Network Compression Framework (NNCF). This addition reduces memory requirements and helps speed up token generation.
- Enhanced LLM performance on Intel® CPUs through internal memory-state optimizations and INT8 precision for the KV cache, tailored specifically to multi-query LLMs such as ChatGLM.
- We're making it easier for developers by integrating more OpenVINO features with the Hugging Face ecosystem. Now, store quantization configurations for popular models directly in Hugging Face to compress models into INT4 format while preserving accuracy and performance.
More portability and performance to run AI at the edge, in the cloud, or locally.
- Improved performance on ARM by enabling the ARM threading library. In addition, we now support multi-core ARM platforms and enable FP16 precision by default on macOS.
- A preview of the plugin architecture for the integrated NPU in Intel® Core™ Ultra processors (codename Meteor Lake) is now included in the main OpenVINO package on PyPI.
New and improved LLM serving samples from OpenVINO Model Server for multi-batch inputs and Retrieval Augmented Generation (RAG).
Download the 2024.0 Release
Download Latest Release Now
Get all the details
See 2024.0 release notes