Hi,
Currently there is a VideoQnA sample on OPEA that runs on Xeon. There is customer interest in this use case on an accelerator. Looking for model support on Gaudi for this microservice: https://github.com/opea-project/GenAIComps/blob/main/comps/lvms/src/integrations/video_llama.py
My research on supporting HPU in https://github.com/opea-project/GenAIComps/blob/main/comps/lvms/src/integrations/video_llama.py found the following.
To adapt the video-llama server to HPU, the following needs to occur:
- A Dockerfile like the llava Dockerfile.hpu must be created for the video-llama integration, using the latest Intel Gaudi software PyTorch container as the base image. It should go in this directory: GenAIComps/tree/main/comps/lvms/src/integrations/dependency/video-llama (see the Dockerfile sketch after this list).
- The server.py file for video-llama needs to be modified to use the HPU (see GenAIComps/blob/main/comps/lvms/src/integrations/dependency/llava/llava_server.py for insight, and the Python sketch after this list).
- The model currently specified in video_llama_eval_only_vl.yaml is /home/user/model/Video-LLaMA-2-7B-Finetuned/llama-2-7b-chat-hf, which is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf. The base model is supported on Gaudi, so the fine-tuned model should be supported on Gaudi as well.
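
For reference, here is a minimal sketch of what a Dockerfile.hpu for the video-llama integration could look like, loosely modeled on the llava Dockerfile.hpu pattern. The Gaudi base image tag, the copied paths, and the requirements file location are assumptions and should be checked against the current repo layout and the latest Intel Gaudi software release.

```dockerfile
# Sketch only: the base image tag and paths below are placeholders, not taken from the repo.
FROM vault.habana.ai/gaudi-docker/1.16.2/ubuntu22.04/habanalabs/pytorch-installer-2.2.2:latest

# Standard Gaudi container environment settings
ENV HABANA_VISIBLE_DEVICES=all
ENV OMPI_MCA_btl_vader_single_copy_mechanism=none

# Copy the comps tree and install the video-llama Python dependencies
COPY comps /home/user/comps
RUN pip install --no-cache-dir -r /home/user/comps/lvms/src/integrations/dependency/video-llama/requirements.txt

WORKDIR /home/user/comps/lvms/src/integrations/dependency/video-llama
ENTRYPOINT ["python", "server.py"]
```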
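
For the server.py change, the key piece is selecting the HPU device via the Habana PyTorch bridge that ships in the Gaudi container. Below is a minimal, self-contained sketch of that pattern; the CPU fallback and the demo tensor are only for illustration, and none of the names come from the actual video-llama server.

```python
import torch

# Select the HPU device if the Habana bridge is present, otherwise fall back to CPU.
# Importing habana_frameworks.torch.core registers the "hpu" device with PyTorch.
try:
    import habana_frameworks.torch.core as htcore  # noqa: F401
    import habana_frameworks.torch.hpu as hthpu
    DEVICE = "hpu" if hthpu.is_available() else "cpu"
except ImportError:
    DEVICE = "cpu"

print(f"Running on device: {DEVICE}")

# In the real server, the Video-LLaMA model and its inputs would be moved the same way,
# e.g. model = model.eval().to(DEVICE) and inputs = inputs.to(DEVICE).
x = torch.randn(2, 3).to(DEVICE)
print((x * 2).sum().item())
```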
From what I can tell, there is nothing blocking the implementation of the video-llama service on Gaudi using the Intel Gaudi software. Do you have a specific concern that needs to be addressed, or are you simply interested in the OPEA team doing the integration?
