Issue
When I compile the exact same model for different devices, I am seeing what I believe are non-trivial differences between the CPU, GPU, and NPU outputs. Operations executed in OpenVINO produce values that differ by up to 1e-4.
Reproducible Steps and Results
Here is some sample code I've run:
import torch
import torch.nn as nn
import torch.nn.functional as F
import openvino as ov
import numpy as np

class Function(nn.Module):
    def __init__(self):
        super(Function, self).__init__()

    def forward(self, x):
        y = F.softmax(x, dim=1)  # replace with any function
        return y

if __name__ == "__main__":
    torch.set_printoptions(precision=8)
    model = Function()
    input_tensor = torch.randn(1, 5)

    # Native inference in PyTorch
    print("Input \t", input_tensor, "\n")
    output = model(input_tensor).numpy()
    print("Native Output -\t", output, "\n")

    # Using OpenVINO conversion
    converted_model = ov.convert_model(model,
                                       input=("x", ov.Shape([1, 5])),
                                       example_input=input_tensor)
    compiled_model_cpu = ov.compile_model(converted_model, device_name="CPU")
    compiled_model_gpu = ov.compile_model(converted_model, device_name="GPU")
    compiled_model_npu = ov.compile_model(converted_model, device_name="NPU")

    # Inference on each device
    output_cpu = compiled_model_cpu(input_tensor)[0]
    output_gpu = compiled_model_gpu(input_tensor)[0]
    output_npu = compiled_model_npu(input_tensor)[0]
    print("CPU Output -\t", output_cpu)
    print("GPU Output -\t", output_gpu)
    print("NPU Output -\t", output_npu, "\n")

    print("Error CPU vs. Native - \t", np.mean(np.abs(output - output_cpu)))
    print("Error GPU vs. Native - \t", np.mean(np.abs(output - output_gpu)))
    print("Error NPU vs. Native - \t", np.mean(np.abs(output - output_npu)))
In this file, I've defined a very simple PyTorch model that executes a single operation, softmax. I run inference natively in PyTorch, then convert the model with OpenVINO and compile it for CPU, GPU, and NPU. For each OpenVINO model, I compute the error relative to the native output as the absolute difference per element, averaged over all elements.
Here is an example output:
Input tensor([[-0.08171148, 1.05321479, 2.15006995, -0.07086628, -2.35258102]])
Native Output - [[0.06876861 0.2139353 0.6406792 0.06951848 0.00709846]]
CPU Output - [[0.0687686 0.2139353 0.6406791 0.06951846 0.00709846]]
GPU Output - [[0.06872559 0.21386719 0.640625 0.06958008 0.00709915]]
NPU Output - [[0.06872559 0.21374512 0.640625 0.06951904 0.00708771]]
Error CPU vs. Native - 1.6577541e-08
Error GPU vs. Native - 4.552165e-05
Error NPU vs. Native - 5.9740152e-05
I believe 1e-8 is within epsilon, but 1e-5 is a pretty significant difference.
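For a rough point of comparison, the machine epsilons of the two float formats involved can be printed directly with NumPy (a minimal sketch, just for reference alongside the errors above):

import numpy as np

# FP32 eps is roughly 1.19e-07, FP16 eps is roughly 9.77e-04,
# which brackets the ~1e-8 (CPU) and ~5e-5 (GPU/NPU) errors measured above.
print("FP32 eps:", np.finfo(np.float32).eps)
print("FP16 eps:", np.finfo(np.float16).eps)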
I've also replaced the softmax with ReLU, sigmoid, and GeLU, and tested different dtypes (f32, f16, i8, u8) as well. I cross-tested the different combinations, and this is what I found:
ReLU, sigmoid, and GeLU were tested on 10000-dimensional vectors, while softmax was tested on a 5-dimensional vector (or something similar).
Hardware
Intel(R) Core(TM) Ultra 9 185H / Meteor Lake
OS
Ubuntu 22.04 LTS
Kernel version 6.9.3
Drivers
GPU Driver: Intel Compute Runtime 24.17.29377.6
NPU Driver: Linux NPU Driver v1.2.0
Level Zero: 1.16.15
I'm using OpenVINO 2024.1 as well.
Hi ayf7,
Thanks for reaching out to us.
We will further investigate the issue and update you as soon as possible.
Regards,
Wan
Dear ayf7,
Thanks for your patience.
We've received feedback from our developer.
Our developer has responded that the Intel® NPU does not have native FP32 execution support. Therefore, all FP32 models are executed in FP16 on the NPU, and the accuracy difference is expected behavior in this case.
On the other hand, since OpenVINO™ relies on OpenCL kernels for its GPU implementation, the GPU plugin prefers FP16 inference precision over FP32 by default.
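If you need results closer to the native FP32 numbers on the GPU, you can explicitly request FP32 inference precision when compiling. A minimal sketch, reusing the converted_model and input_tensor from your script (the NPU plugin may still reject FP32, as noted above, and FP32 execution on GPU is typically slower):

import openvino as ov
import openvino.properties.hint as hints

core = ov.Core()

# Ask the GPU plugin to run in FP32 instead of its default FP16 precision.
compiled_gpu_fp32 = core.compile_model(
    converted_model,
    "GPU",
    {hints.inference_precision: ov.Type.f32},
)

output_gpu_fp32 = compiled_gpu_fp32(input_tensor)[0]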
For more information, please refer to the following links:
- https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/npu-device.html#npu-device
- https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html
Sorry for the inconvenience and thank you for your support.
Regards,
Wan
Hi ayf7,
Thanks for your question.
If you need additional information from Intel, please submit a new question as this thread will no longer be monitored.
Regards,
Wan