Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Generative or LLM models inferencing for higher batches

Shravanthi
Beginner

Hi,

 

How can I collect inference results for Stable Diffusion and Llama 2 models at higher batch sizes, and how can I run these models on an Intel GPU?

2 Replies
Peh_Intel
Moderator

Hi Shravanthi,


OpenVINO™ offers two main paths for Generative AI use cases:

  • Using OpenVINO as a backend for Hugging Face frameworks (transformers, diffusers) through the Optimum Intel extension.
  • Using OpenVINO native APIs (Python and C++) with custom pipeline code.


For more information, you can refer to the Optimize and Deploy Generative AI Models.


In addition, there are a few Jupyter notebook tutorials for OpenVINO™ on running Generative AI models.



Regards,

Peh


Peh_Intel
Moderator

Hi Shravanthi,


This thread will no longer be monitored since we have provided a suggestion and an answer. If you need any additional information from Intel, please submit a new question.



Regards,

Peh

