We're excited to announce the latest release of the OpenVINO toolkit, 2024.2. This update brings continued improvements in LLM performance, empowering your generative AI workloads with OpenVINO.
What’s new in this release:
More Gen AI coverage and framework integrations to minimize code changes.
- Llama 3 optimizations for CPUs, built-in GPUs, and discrete GPUs, delivering improved performance and efficient memory usage.
- Support for Phi-3-mini, a family of AI models that leverages the power of small language models for faster, more accurate, and cost-effective text processing.
- Python Custom Operation is now enabled in OpenVINO, making it easier for Python developers to code their custom operations instead of using C++ custom operations (which are also still supported). Python Custom Operation empowers users to implement their own specialized operations into any model; a minimal sketch follows this list.
- Notebooks expansion to ensure better coverage for new models. Noteworthy notebooks added: DynamiCrafter, YOLOv10, a chatbot notebook with Phi-3, and QWEN2.
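For illustration, here is a minimal sketch of what a Python custom operation can look like: an identity op that passes its input through unchanged. It is modeled on the custom-operation pattern in the OpenVINO documentation, but the exact import paths and constructor signatures are assumptions, so verify them against the 2024.2 docs before relying on this.

```python
# Minimal sketch of a Python custom operation (identity pass-through).
# NOTE: import paths and constructor signatures are assumptions based on
# the OpenVINO custom-op pattern; check the 2024.2 documentation.
from openvino import Op
from openvino.runtime import DiscreteTypeInfo


class Identity(Op):
    class_type_info = DiscreteTypeInfo("Identity", "extension")

    def __init__(self, inputs):
        super().__init__(self)
        self.set_arguments(inputs)
        self.constructor_validate_and_infer_types()

    def validate_and_infer_types(self):
        # Output keeps the input's element type and (partial) shape.
        self.set_output_type(0, self.get_input_element_type(0),
                             self.get_input_partial_shape(0))

    def clone_with_new_inputs(self, new_inputs):
        return Identity(new_inputs)

    def evaluate(self, outputs, inputs):
        # Copy the input tensor to the output tensor unchanged.
        inputs[0].copy_to(outputs[0])
        return True

    def has_evaluate(self):
        return True
```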
Broader Large Language Model (LLM) support and more model compression techniques.
- GPTQ method for 4-bit weight compression added to NNCF, for more efficient inference and improved performance of compressed LLMs (a sketch follows this list).
- Significant LLM performance improvements and reduced latency on both built-in and discrete GPUs.
- Significant improvement in second-token latency and memory footprint of FP16-weight LLMs on AVX2 (13th Gen Intel® Core™ processors) and AVX512 (3rd Gen Intel® Xeon® Scalable processors) CPU platforms, particularly for small batch sizes.
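For context, a data-aware 4-bit GPTQ compression call with NNCF can look roughly like the sketch below. `ov_model` (an `openvino.Model` holding an LLM) and `calibration_dataset` (an `nncf.Dataset` wrapping sample inputs) are placeholders you would construct yourself; treat the exact parameter set as an assumption and check the NNCF documentation.

```python
# Sketch: 4-bit weight compression with the GPTQ method in NNCF.
# `ov_model` and `calibration_dataset` are placeholders, not fixed names.
import nncf

compressed_model = nncf.compress_weights(
    ov_model,
    mode=nncf.CompressWeightsMode.INT4_SYM,  # 4-bit symmetric weight quantization
    group_size=128,                          # per-group quantization granularity
    ratio=0.8,                               # fraction of weights compressed to 4-bit
    dataset=calibration_dataset,             # GPTQ is data-aware, so a dataset is required
    gptq=True,                               # enable the GPTQ weight-compression method
)
```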
More portability and performance to run AI at the edge, in the cloud, or locally.
- Model Serving Enhancements:
  - Preview: OpenVINO Model Server (OVMS) now supports an OpenAI-compatible API along with Continuous Batching and PagedAttention, enabling significantly higher throughput for parallel inferencing, especially on Intel® Xeon® processors, when serving LLMs to many concurrent users (a client-side sketch follows this list).
  - The OpenVINO backend for Triton Server now supports built-in GPUs and discrete GPUs, in addition to dynamic shapes support.
  - Integration with TorchServe through the torch.compile OpenVINO backend for easy model deployment, provisioning to multiple instances, model versioning, and maintenance (see the torch.compile sketch after this list).
- Preview: addition of the Generate API, a simplified API for text generation using large language models with only a few lines of code. The API is available through the newly launched OpenVINO GenAI package (a usage sketch follows this list).
- Support for Intel Atom® Processor X Series. For more details, see System Requirements.
- Preview: support for Intel® Xeon® 6 processors (formerly code-named Sierra Forest and Granite Rapids).
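Because OVMS exposes an OpenAI-compatible API, a standard OpenAI client can talk to it. The sketch below assumes an OVMS instance serving a Llama 3 model on localhost:8000 with the OpenAI-style endpoints under /v3; the host, port, base path, and model name are deployment-specific assumptions, not fixed values.

```python
# Sketch: querying OVMS through its OpenAI-compatible chat API.
# The base_url path ("/v3"), port, and model name depend on your
# deployment and are assumptions here.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "What is OpenVINO?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```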
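The TorchServe integration builds on the torch.compile OpenVINO backend. As a standalone illustration of that backend (outside TorchServe, which wires it in through its model configuration), the following sketch compiles a torchvision model; importing openvino.torch registers the "openvino" backend.

```python
# Sketch: running a PyTorch model through the torch.compile OpenVINO backend.
import torch
import torchvision.models as models
import openvino.torch  # noqa: F401  (importing registers the "openvino" backend)

model = models.resnet18(weights=None).eval()
compiled_model = torch.compile(model, backend="openvino")

with torch.no_grad():
    output = compiled_model(torch.randn(1, 3, 224, 224))
print(output.shape)
```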
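The new Generate API reduces LLM text generation to a few lines. In the sketch below, "./llama-3-8b-instruct-ov" is a placeholder for a directory that already contains an OpenVINO-converted LLM (for example, one exported with optimum-cli).

```python
# Sketch: text generation with the new Generate API from the OpenVINO GenAI package.
# "./llama-3-8b-instruct-ov" is a placeholder for an OpenVINO-converted LLM directory.
import openvino_genai

pipe = openvino_genai.LLMPipeline("./llama-3-8b-instruct-ov", "CPU")
print(pipe.generate("The Sun is yellow because", max_new_tokens=100))
```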
Download the 2024.2 Release
Download Latest Release Now
Get all the details
See 2024.2 release notes
NNCF RELEASE
Check out the new NNCF release