Intel® Distribution of OpenVINO™ Toolkit
Community assistance for the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Llama3 performance (HuggingFace + Optimum) on CPU and GPU is completely different

ayf7
Novice
6,777 Views

Hello,


I'm currently trying to run Llama3 from the Hugging Face repo, using the OpenVINO backend for inference.
I've followed the tutorials provided by OpenVINO and Hugging Face pretty faithfully; here is the code:

 

from transformers import AutoTokenizer, pipeline
import torch
from optimum.intel.openvino import OVModelForCausalLM

model_id = "meta-llama/Meta-Llama-3-8B"

# Export the model to OpenVINO IR and compile it for the GPU
model = OVModelForCausalLM.from_pretrained(model_id, export=True, device="GPU")
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},
)

k = pipe("Hey how are you doing today?")
print(k)

 

On CPU, this usually gives the same output, which is pretty coherent:

 

'Hey how are you doing today? I am doing well. I am a little bit tired because I'

 

while using device="GPU" gives complete nonsense, and it's different random nonsense on every run, such as:

 

aaaaaaaa href="aaaaaaaa\n the right to the the (a)

 

 I've tried tweaking a lot of different components, with little success.

 

I'm using Meteor Lake / Intel Arc Graphics, with PCI ID 7D55:

 

0000:00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:7d55] (rev 08)
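
For reference, here's a quick sanity check (a minimal sketch using the standard OpenVINO Python API) to confirm that the GPU plugin can actually see the card:

import openvino as ov

core = ov.Core()
print(core.available_devices)                        # e.g. ['CPU', 'GPU'] if the plugin loads
print(core.get_property("GPU", "FULL_DEVICE_NAME"))  # should name the Arc iGPU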

 

Some versions of possibly relevant packages I'm using:

- OpenVINO 2024.1.0

- Optimum 1.19.2 (with Optimum-Intel 1.17.0.dev0+bfd0767)

- torch 2.3.0

 

Any pointers would be greatly appreciated. Thank you!


11 Replies
Wan_Intel
Moderator
6,735 Views

Hi ayf7,

Thanks for reaching out to us.

 

We'll investigate the issue and update you as soon as possible. Meanwhile, could you please share which operating system you are using on your machine?

 

 

Regards,

Wan

 

ayf7
Novice
6,690 Views

Hi Wan,

 

Thanks for reaching out. I am using Ubuntu 22.04 LTS.

 

I also tried a Stable Diffusion model, using sample code I found elsewhere:

import requests
from PIL import Image
from io import BytesIO
from optimum.intel.openvino import OVStableDiffusionImg2ImgPipeline

model_id = "runwayml/stable-diffusion-v1-5"
# Export to OpenVINO IR and compile for the chosen device
pipeline = OVStableDiffusionImg2ImgPipeline.from_pretrained(model_id, device="CPU", export=True)

# Fetch the example input image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))

prompt = "A fantasy landscape, trending on artstation"
image = pipeline(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
image.save("fantasy_landscape.png")

Compiling on CPU gave the expected result, while GPU output pure noise. So maybe that means there's some issue at a lower level?
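
For completeness, the GPU run can be reproduced by reloading the same pipeline with device="GPU" (the same from_pretrained call as above; note that export=True re-runs the export each time):

pipeline_gpu = OVStableDiffusionImg2ImgPipeline.from_pretrained(model_id, device="GPU", export=True)
image_gpu = pipeline_gpu(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
image_gpu.save("fantasy_landscape_gpu.png")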

 

- ayf

Wan_Intel
Moderator
6,628 Views

Hi ayf7,

Thanks for the information.

 

I've set up the environment via the following installation guide:

 

I was granted access to the model meta-llama/Meta-Llama-Guard-2-8B when I applied for meta-llama/Meta-Llama-3-8B. However, when I ran your code, I encountered the following error:

403 Forbidden: Authorization error

Cannot access content at: https://huggingface.co/api/models/meta-llama/Meta-Llama-Guard-2-8B/tree/main?recursive=True&expand=False.

If you are trying to create or update content, make sure you have a token with the `write` role.

 

Could you please share the model you are using so that we can further replicate the issue?

 

 

Regards,

Wan

 

ayf7
Novice
6,619 Views

Hmm, that's strange - if you apply for the meta-llama/Meta-Llama-3-8B model, you should be given access to the 3-8B model, not the Guard one. I think if you fill out the access request on this page: https://huggingface.co/meta-llama/Meta-Llama-3-8B you should be granted access - is that what you did? They may have made a mistake if that's the case.

The code I supplied in the original post is the exact code I'm running.

Wan_Intel
Moderator
6,611 Views

Hi ayf7,

Thanks for the information.

 

Let me check with the relevant team, and we'll update you as soon as possible.

 

 

Regards,

Wan

 

ayf7
Novice
6,513 Views

As a follow-up, I tried the following example code provided by OpenVINO: 

https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-pytorch.html

 

Using the code:

from torchvision.models import resnet50, ResNet50_Weights
import requests, PIL.Image, io, torch

# Get a picture of a cat from the web:
img = PIL.Image.open(io.BytesIO(requests.get("https://placekitten.com/200/300").content))

# Torchvision model and input data preparation from https://pytorch.org/vision/stable/models.html
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.eval()
preprocess = weights.transforms()
batch = preprocess(img).unsqueeze(0)

# PyTorch model inference and post-processing
prediction = model(batch).squeeze(0).softmax(0)
class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = weights.meta["categories"][class_id]
print(f"{category_name}: {100 * score:.1f}% (with PyTorch)")

# OpenVINO model preparation and inference with the same post-processing
import openvino as ov
compiled_model = ov.compile_model(ov.convert_model(model, example_input=batch), device_name="GPU")

prediction = torch.tensor(compiled_model(batch)[0]).squeeze(0).softmax(0)
class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = weights.meta["categories"][class_id]
print(f"{category_name}: {100 * score:.1f}% (with OpenVINO)")

The only addition was in the compile_model call, where I also specified device_name. When I compile with CPU, both PyTorch and OpenVINO output the same value:

Egyptian cat: 22.9% (with PyTorch)
Egyptian cat: 22.9% (with OpenVINO)

If I compile with GPU instead, the output is always arbitrary, with a low score, for instance:

Egyptian cat: 22.9% (with PyTorch)
analog clock: 1.5% (with OpenVINO)

Some other outputs included "fire screen", "pitcher", "velvet", etc., all with arbitrary scores. When I print out the prediction values, it's clear that the compiled GPU model is not producing accurate outputs.

This makes me think it's less of a Hugging Face/Optimum issue and more an issue with OpenVINO or something lower level.
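
To narrow it down further, here is a minimal sketch (reusing model, batch, and ov from the snippet above, plus numpy) that compares the raw outputs of CPU and GPU compilations of the same converted model:

import numpy as np

# Convert once, compile on both devices, compare raw logits
ov_model = ov.convert_model(model, example_input=batch)
cpu_logits = ov.compile_model(ov_model, device_name="CPU")(batch)[0]
gpu_logits = ov.compile_model(ov_model, device_name="GPU")(batch)[0]

# On a healthy driver stack the two devices should agree to within fp32 noise
print(np.max(np.abs(cpu_logits - gpu_logits)))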

Wan_Intel
Moderator
6,485 Views

Hi ayf7,

Thanks for sharing your findings with us.

 

I've escalated your findings to the relevant team. We will investigate the issue further and update you as soon as possible.

 

 

Regards,

Wan

 

ayf7
Novice
6,384 Views

I've figured out the issue - it turns out my kernel version (6.5, the default for Ubuntu 22.04) was outdated. I upgraded to 6.9.3 and now the outputs are more reasonable.
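
For anyone hitting the same symptom, a quick way to check the running kernel from Python (equivalent to uname -r):

import platform

# Prints the running kernel release, e.g. something like '6.5.0-xx-generic' on stock 22.04
print(platform.release())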

Wan_Intel
Moderator
6,205 Views

Hi ayf7,

Thanks for the information.

 

We're glad to hear that the issue was resolved after you upgraded your kernel version. Is there anything else we can help you with?

 

 

Regards,

Wan

 

ayf7
Novice
6,129 Views

I think for this specific post the issue is addressed - the GPU no longer outputs nonsense. However, I will say that on some standard benchmarks the GPU output is not exactly identical to the CPU's - for instance, matrix multiplication gives outputs that differ by about 1e-5 between CPU and GPU, and similarly with my NPU. I need to double-check my drivers again, then I may make a separate post for this.
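
For what it's worth, a minimal, self-contained sketch (hypothetical shapes) of the kind of cross-device comparison I mean; differences around 1e-5 in fp32 are usually just accumulation-order noise:

import numpy as np
import openvino as ov
import torch

# A trivial matmul (a Linear layer) converted once and compiled on two devices
net = torch.nn.Linear(256, 256).eval()
x = torch.randn(1, 256)
ov_model = ov.convert_model(net, example_input=x)

core = ov.Core()
cpu_out = core.compile_model(ov_model, "CPU")(x.numpy())[0]
gpu_out = core.compile_model(ov_model, "GPU")(x.numpy())[0]

print(np.max(np.abs(cpu_out - gpu_out)))  # ~1e-5 is typical fp32 variance
np.testing.assert_allclose(cpu_out, gpu_out, rtol=1e-3, atol=1e-4)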

Wan_Intel
Moderator
6,071 Views

Hi ayf7,

Thanks for the information.

 

Yes, please open a new thread for the new issue, since the issue in this thread has been resolved. Thank you for sharing your solution with the OpenVINO™ Community.

 

 

Regards,

Wan

 
