More Gen AI coverage and framework integrations to minimize code changes
- New models supported: Llama* 3.2 (1B & 3B), Gemma* 2 (2B & 9B), and YOLO11*.
- LLM support on NPU: Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct and Phi-3
- Noteworthy notebooks added: Sam2, Llama3.2, Llama3.2 - Vision*, Wav2Lip*, Whisper*, and LLaVA*
- Preview: support for Flax*, a high-performance Python* neural network library based on JAX*. Its modular design allows for easy customization and accelerated inference on GPUs.
Broader Large Language Model (LLM) support and more model compression techniques
- Optimizations for built-in GPUs on Intel® Core™ Ultra Processors (Series 1) and Intel® Arc™ Graphics include KV cache compression for memory reduction, improved usability, and model load time optimizations that reduce first token latency for LLMs.
- Dynamic quantization is enabled on the built-in GPUs of Intel® Core™ Ultra Processors (Series 1) to improve first token latency for LLMs without impacting accuracy. Second token latency also improves for large-batch inference.
- A new method for generating synthetic text data is implemented in the Neural Network Compression Framework (NNCF), allowing LLMs to be compressed more accurately with data-aware methods even when no dataset is available. This feature will soon be accessible via Optimum Intel on Hugging Face.
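A back-of-the-envelope calculation shows why the KV cache compression mentioned above matters for LLM memory use. The model shapes below are Llama-2-7B-style public figures, and the 2x saving shown is the generic fp16-to-int8 case, not a measurement of OpenVINO's implementation:

```python
# Rough KV cache sizing for a Llama-2-7B-like model.
# The cache stores one key and one value tensor per layer for every
# generated position, so it grows linearly with sequence length.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch, bytes_per_value):
    # 2x accounts for the separate key and value tensors in each layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * bytes_per_value

# Llama-2-7B-style shapes: 32 layers, 32 KV heads, head dim 128.
fp16 = kv_cache_bytes(32, 32, 128, 4096, 1, 2)
int8 = kv_cache_bytes(32, 32, 128, 4096, 1, 1)

print(f"fp16 KV cache at 4k context: {fp16 / 2**30:.1f} GiB")  # 2.0 GiB
print(f"int8 KV cache at 4k context: {int8 / 2**30:.1f} GiB")  # 1.0 GiB
```

At a 4k context the cache alone is on the order of gigabytes, so halving its footprint directly enables longer contexts and larger batches on memory-constrained built-in GPUs.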
More portability and performance to run AI at the edge, in the cloud, or locally
- Support for Intel® Xeon® 6 Processors with P-cores (formerly codenamed Granite Rapids) and Intel® Core™ Ultra 200S series processors (formerly codenamed Arrow Lake-S).
- Preview: The GenAI API enables multimodal AI deployment, with multimodal pipelines for improved contextual awareness, transcription pipelines for easy audio-to-text conversion, and image generation pipelines for streamlined text-to-visual conversion.
- A speculative decoding feature has been added to the GenAI API for faster, more efficient text generation: a small draft model proposes tokens that are periodically verified and corrected by the full-size model.
- Preview: LoRA adapters are now supported in the GenAI API for developers to quickly and efficiently customize image and text generation models for specialized tasks.
- The GenAI API now also supports LLMs on NPU, allowing developers to specify NPU as the target device, specifically for the Whisper pipeline (whisper-base, whisper-medium, and whisper-small) and the LLM pipeline (Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct, and Phi-3 Mini-instruct). Use driver version 32.0.100.3104 or later for best performance.
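The speculative decoding scheme above can be illustrated in isolation. In the toy sketch below, both "models" are deterministic stand-in functions (not the GenAI API): the cheap draft model proposes a short run of tokens, and the full-size model verifies them, keeping the longest agreeing prefix and substituting its own token at the first mismatch.

```python
# Toy illustration of speculative decoding. The draft model proposes k
# tokens ahead; the target model checks each one in order and the
# accepted prefix is kept, so most steps advance several tokens at once.
# Both "models" here are hypothetical stand-ins, not the OpenVINO GenAI API.

def speculative_generate(target_next, draft_next, prompt, max_new, k=4):
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # Draft model cheaply proposes k candidate tokens.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # Target model verifies candidates; keep the longest agreeing prefix.
        accepted = 0
        for i, t in enumerate(draft):
            if target_next(tokens + draft[:i]) == t:
                accepted += 1
            else:
                break
        tokens.extend(draft[:accepted])
        if accepted < k:
            # First mismatch: fall back to the target model's own token.
            tokens.append(target_next(tokens))
    return tokens[len(prompt):][:max_new]

# Stand-in models: the "target" continues a counting sequence exactly;
# the "draft" agrees most of the time but slips at every 5th position.
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: ctx[-1] + (2 if len(ctx) % 5 == 0 else 1)

print(speculative_generate(target, draft, [0], max_new=8))
# → [1, 2, 3, 4, 5, 6, 7, 8]
```

The output always matches what the target model would have produced alone; the draft model only changes how many target-model verification calls are needed, which is where the speedup comes from when the draft agrees often.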
Download the 2024.5 Release
Download Latest Release Now
Get all the details
See 2024.5 release notes
NNCF RELEASE
Check out the new NNCF release
Helpful Links