Hello,
I'm currently trying to run Llama 3 from the Hugging Face repo, using the OpenVINO backend for inference.
I've followed the tutorials provided by OpenVINO and Hugging Face fairly closely. Here is the code:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
from optimum.intel.openvino import OVModelForCausalLM
model_id = "meta-llama/Meta-Llama-3-8B"
model = OVModelForCausalLM.from_pretrained(model_id, export=True, device="GPU")
tokenizer=AutoTokenizer.from_pretrained(model_id)
pipe = pipeline(
task="text-generation",
model=model,
tokenizer=tokenizer,
model_kwargs={"torch_dtype": torch.bfloat16},
)
k = pipe("Hey how are you doing today?")
print(k)
On CPU, the output is consistent between runs and fairly coherent:
'Hey how are you doing today? I am doing well. I am a little bit tired because I'
while using device="GPU" gives complete nonsense that differs on every run, such as:
aaaaaaaa href="aaaaaaaa\n the right to the the (a)
I've tried tweaking a lot of different components, with little success.
I'm using Meteor Lake / Intel Arc Graphics, with PCI ID 7D55:
0000:00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:7d55] (rev 08)
Some versions of possibly relevant packages I'm using:
- OpenVINO 2024.1.0
- Optimum 1.19.2 (with Optimum-Intel 1.17.0.dev0+bfd0767)
- torch 2.3.0
Any pointers would be greatly appreciated. Thank you!
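For reports like this one, it can help to collect the installed package versions programmatically. Below is a minimal sketch using only the standard library; the helper name `collect_versions` is hypothetical, and the distribution names are assumed to match the PyPI package names.

```python
from importlib import metadata


def collect_versions(packages):
    """Return a dict mapping each distribution name to its installed
    version string, or to 'not installed' when it cannot be found."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions based on the packages listed above.
    wanted = ["openvino", "optimum", "optimum-intel", "torch"]
    for name, version in collect_versions(wanted).items():
        print(f"{name}: {version}")
```

Pasting this output alongside the hardware details makes it easier for others to reproduce the environment.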
- Tags:
- Hugging Face
- Optimum
Hi ayf7,
Thanks for reaching out to us.
We'll investigate the issue and update you as soon as possible. Meanwhile, could you please share which operating system you are using on your machine?
Regards,
Wan
Hi Wan,
Thanks for reaching out. I am using Ubuntu 22.04 LTS.
I also tried a Stable Diffusion model, using this sample code that I found elsewhere:
import requests
import torch
from PIL import Image
from io import BytesIO
from optimum.intel.openvino import OVStableDiffusionImg2ImgPipeline
model_id = "runwayml/stable-diffusion-v1-5"
pipeline = OVStableDiffusionImg2ImgPipeline.from_pretrained(model_id, device="CPU", export=True)
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))
prompt = "A fantasy landscape, trending on artstation"
image = pipeline(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
image.save("fantasy_landscape.png")
Compiling for CPU gave the expected result, while GPU produced noise. So maybe that means the issue is at a lower level?
- ayf
Hi Ayf7,
Thanks for the information.
I've set up the environment via the following installation guide:
- Installed transformers and torch - https://huggingface.co/docs/transformers/en/installation
- Installed optimum and openvino - https://github.com/huggingface/optimum
I was granted access to the model meta-llama/Meta-Llama-Guard-2-8B when I applied for meta-llama/Meta-Llama-3-8B. However, when I ran your code, I encountered the following error:
403 Forbidden: Authorization error
Cannot access content at: https://huggingface.co/api/models/meta-llama/Meta-Llama-Guard-2-8B/tree/main?recursive=True&expand=False.
If you are trying to create or update content,make sure you have a token with the `write` role.
Could you please share the model that you are using so that we can try to replicate the issue?
Regards,
Wan
Hmm, that's strange - if you apply for the meta-llama/Meta-Llama-3-8B model, you should be given access to the 3-8B model, not the Guard one. I think if you submit the request form on this page: https://huggingface.co/meta-llama/Meta-Llama-3-8B you should be given access - is this what you did? If so, they may have made a mistake.
The code I supplied in the original post is the exact code I'm running.
Hi ayf7,
Thanks for the information.
Let me check with the relevant team, and we'll update you as soon as possible.
Regards,
Wan
As a follow-up, I tried the following example code provided by OpenVINO:
https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-pytorch.html
Using the code:
from torchvision.io import read_image
from torchvision.models import resnet50, ResNet50_Weights
import requests, PIL.Image, io, torch
# Get a picture of a cat from the web:
img = PIL.Image.open(io.BytesIO(requests.get("https://placekitten.com/200/300").content))
# Torchvision model and input data preparation from https://pytorch.org/vision/stable/models.html
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.eval()
preprocess = weights.transforms()
batch = preprocess(img).unsqueeze(0)
# PyTorch model inference and post-processing
prediction = model(batch).squeeze(0).softmax(0)
class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = weights.meta["categories"][class_id]
print(f"{category_name}: {100 * score:.1f}% (with PyTorch)")
# OpenVINO model preparation and inference with the same post-processing
import openvino as ov
compiled_model = ov.compile_model(ov.convert_model(model, example_input=batch), device_name="GPU")
prediction = torch.tensor(compiled_model(batch)[0]).squeeze(0).softmax(0)
class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = weights.meta["categories"][class_id]
print(f"{category_name}: {100 * score:.1f}% (with OpenVINO)")
The only addition was in the compile_model call, where I also specified device_name. When I compile with CPU, both PyTorch and OpenVINO output the same value:
Egyptian cat: 22.9% (with PyTorch)
Egyptian cat: 22.9% (with OpenVINO)
If I compile for GPU instead, the output is always an arbitrary class with a low score, for instance:
Egyptian cat: 22.9% (with PyTorch)
analog clock: 1.5% (with OpenVINO)
Some other outputs included "fire screen", "pitcher", "velvet", etc, all with arbitrary scores. When I print out the prediction value, it's clear that the compiled GPU model is not outputting accurate values.
This makes me think it's less of a Hugging Face/Optimum issue and more of an issue with OpenVINO or something lower-level.
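A check like the one described above can be made less manual by comparing the two score vectors directly. The following is a rough sketch in plain Python; `compare_predictions` is a hypothetical helper, and the vectors in the demo are made-up illustrative values, not measurements from this thread.

```python
def argmax(values):
    """Index of the largest element in a non-empty sequence."""
    return max(range(len(values)), key=lambda i: values[i])


def compare_predictions(reference, candidate):
    """Compare two equal-length score vectors.

    Returns whether the top-1 class agrees and the largest
    element-wise absolute difference between the vectors.
    """
    if len(reference) != len(candidate):
        raise ValueError("score vectors must have the same length")
    max_abs_diff = max(abs(r - c) for r, c in zip(reference, candidate))
    return {
        "top1_match": argmax(reference) == argmax(candidate),
        "max_abs_diff": max_abs_diff,
    }


if __name__ == "__main__":
    # Illustrative values only: a reference vector and a badly diverging one.
    reference_scores = [0.10, 0.70, 0.20]
    device_scores = [0.50, 0.30, 0.20]
    print(compare_predictions(reference_scores, device_scores))
```

With the ResNet example, the two `prediction` tensors could be passed in via `.tolist()`. A top-1 mismatch together with a large difference points at a broken device path rather than ordinary rounding.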
Hi ayf7,
Thanks for sharing your findings with us.
I've escalated your findings to the relevant team. We will investigate the issue further and update you as soon as possible.
Regards,
Wan
I've figured out the issue - turns out my kernel version (6.5, which was the default for 22.04) was outdated. I upgraded to 6.9.3 and now the outputs are more reasonable.
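For anyone who wants to check their own kernel before digging further, here is a small sketch using only the standard library. The helper name `parse_kernel_release` is hypothetical, and the comparison target is simply the version reported as working in this thread; the actual minimum kernel required is not established here.

```python
import platform
import re


def parse_kernel_release(release):
    """Turn a release string such as '6.5.0-35-generic' into (6, 5, 0).

    Missing minor or patch components are treated as 0.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor or 0), int(patch or 0))


# Version reported as working in this thread; not an official minimum.
KNOWN_WORKING = (6, 9, 3)

if __name__ == "__main__":
    # platform.release() is only meaningful as a kernel version on Linux.
    current = parse_kernel_release(platform.release())
    status = "at or above" if current >= KNOWN_WORKING else "older than"
    print(f"kernel {current} is {status} {KNOWN_WORKING}")
```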
Hi Ayf7,
Thanks for the information.
We're glad to know that the issue was resolved after you upgraded your kernel version. Is there anything else that we can help you with?
Regards,
Wan
I think for this specific post, I've addressed the issue - the GPU no longer outputs nonsense. However, in some standard benchmarks, the GPU output is not exactly the same as the CPU output - for instance, a matrix multiplication result differs by about 1e-5 between CPU and GPU, and similarly with my NPU. I need to double-check my drivers again, then I may make a separate post for this.
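Small cross-device differences are common in floating-point code, since devices may accumulate sums in a different order or at a different precision, so results are usually compared with a tolerance rather than for exact equality. Below is a minimal sketch; `all_close` is a hypothetical helper and the default tolerances are illustrative assumptions, not values recommended by OpenVINO.

```python
import math


def all_close(a, b, rel_tol=1e-3, abs_tol=1e-4):
    """Return True when every pair of elements agrees within tolerance.

    A pair (x, y) passes when
    |x - y| <= max(rel_tol * max(|x|, |y|), abs_tol).
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )


if __name__ == "__main__":
    # Illustrative values: differences of about 1e-5, as described above.
    cpu_result = [1.0, 2.0, 3.0]
    gpu_result = [1.00001, 2.00001, 2.99999]
    print(all_close(cpu_result, gpu_result))
```

A difference around 1e-5 passes this check, whereas the garbage outputs seen before the kernel upgrade would fail it by a wide margin.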
Hi ayf7,
Thanks for the information.
Yes, you may open a new thread for a new issue as the issue for this thread has been resolved. Thank you for sharing your solution in the OpenVINO™ Community.
Regards,
Wan