Hi,
Currently there is a VideoQnA sample on OPEA that runs on Xeon. There is customer interest in this use case on an accelerator. Looking for model support on Gaudi for this microservice: https://github.com/opea-project/GenAIComps/blob/main/comps/lvms/src/integrations/video_llama.py
My research on supporting HPU in https://github.com/opea-project/GenAIComps/blob/main/comps/lvms/src/integrations/video_llama.py found the following.
To adapt the video-llama server to HPU, the following needs to occur:
- A Dockerfile like the llava Dockerfile.hpu must be created for the video-llama integration, using the latest Intel Gaudi software PyTorch container as the base image. It should go in this directory: GenAIComps/tree/main/comps/lvms/src/integrations/dependency/video-llama (see the Dockerfile sketch after this list).
- The server.py file for video-llama needs to be modified to use the HPU (see GenAIComps/blob/main/comps/lvms/src/integrations/dependency/llava/llava_server.py for insight, and the Python sketch after this list).
- The model currently specified in video_llama_eval_only_vl.yaml is /home/user/model/Video-LLaMA-2-7B-Finetuned/llama-2-7b-chat-hf, which is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf. The base model is supported on Gaudi, so the fine-tuned model should be supported on Gaudi as well.
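
For reference, here is a minimal sketch of what a Dockerfile.hpu for the video-llama integration could look like, loosely modeled on the llava Dockerfile.hpu pattern. The Gaudi base image tag, the copied paths, and the requirements file location are assumptions and should be checked against the current repo layout and the latest Intel Gaudi software release.

```dockerfile
# Sketch only: the base image tag and paths below are placeholders, not taken from the repo.
FROM vault.habana.ai/gaudi-docker/1.16.2/ubuntu22.04/habanalabs/pytorch-installer-2.2.2:latest

# Standard Gaudi container environment settings
ENV HABANA_VISIBLE_DEVICES=all
ENV OMPI_MCA_btl_vader_single_copy_mechanism=none

# Copy the comps tree and install the video-llama Python dependencies
COPY comps /home/user/comps
RUN pip install --no-cache-dir -r /home/user/comps/lvms/src/integrations/dependency/video-llama/requirements.txt

WORKDIR /home/user/comps/lvms/src/integrations/dependency/video-llama
ENTRYPOINT ["python", "server.py"]
```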
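
For the server.py change, the key piece is selecting the HPU device via the Habana PyTorch bridge that ships in the Gaudi container. Below is a minimal, self-contained sketch of that pattern; the CPU fallback and the demo tensor are only for illustration, and none of the names come from the actual video-llama server.

```python
import torch

# Select the HPU device if the Habana bridge is present, otherwise fall back to CPU.
# Importing habana_frameworks.torch.core registers the "hpu" device with PyTorch.
try:
    import habana_frameworks.torch.core as htcore  # noqa: F401
    import habana_frameworks.torch.hpu as hthpu
    DEVICE = "hpu" if hthpu.is_available() else "cpu"
except ImportError:
    DEVICE = "cpu"

print(f"Running on device: {DEVICE}")

# In the real server, the Video-LLaMA model and its inputs would be moved the same way,
# e.g. model = model.eval().to(DEVICE) and inputs = inputs.to(DEVICE).
x = torch.randn(2, 3).to(DEVICE)
print((x * 2).sum().item())
```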
From what I can tell, there is nothing blocking the implementation of the video-llama service on Gaudi using the Intel Gaudi software. Do you have a specific concern that needs to be addressed, or are you simply interested in the OPEA team doing the integration?
