Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Compiled models on GPU, NPU output different results

ayf7

Issue

When I compile the exact same model on different devices, I am receiving outputs with what I believe are non-trivial differences between CPU, GPU, and NPU. Operations executed in OpenVINO produce values that differ by up to 1e-4.

 

Reproducible Steps and Results

Here is some sample code I've run:

import torch
import torch.nn as nn
import torch.nn.functional as F
import openvino as ov

import numpy as np

class Function(nn.Module):
    def __init__(self):
        super(Function, self).__init__()

    def forward(self, x):
        y = F.softmax(x, dim=1) # replace with any function
        return y

if __name__ == "__main__":
    torch.set_printoptions(precision=8)

    model = Function()
    input_tensor = torch.randn(1, 5)

    # Native inference in PyTorch
    print("Input  \t", input_tensor, "\n")
    output = model(input_tensor).numpy()
    print("Native Output -\t", output, "\n")

    # Using OpenVINO conversion
    converted_model = ov.convert_model(model,
                                       input=("x", ov.Shape([1, 5])),
                                       example_input=input_tensor
                                       )

    compiled_model_cpu = ov.compile_model(converted_model, device_name="CPU")
    compiled_model_gpu = ov.compile_model(converted_model, device_name="GPU")
    compiled_model_npu = ov.compile_model(converted_model, device_name="NPU")

    output_cpu = compiled_model_cpu(input_tensor)[0]
    output_gpu = compiled_model_gpu(input_tensor)[0]
    output_npu = compiled_model_npu(input_tensor)[0]
    
    # Print the outputs from each device
    print("CPU Output -\t", output_cpu)
    print("GPU Output -\t", output_gpu)
    print("NPU Output -\t", output_npu, "\n")

    print("Error CPU vs. Native - \t", np.mean(np.abs(output - output_cpu)))
    print("Error GPU vs. Native - \t", np.mean(np.abs(output - output_gpu)))
    print("Error NPU vs. Native - \t", np.mean(np.abs(output - output_npu)))

In this file, I've defined a very simple PyTorch model that executes a single operation, softmax. I run inference natively in PyTorch, then compile the model with OpenVINO for CPU, GPU, and NPU. For each OpenVINO model, I calculate the error relative to the native output by taking the absolute difference in each dimension, then averaging all of the differences.

Here is an example output:

Input    tensor([[-0.08171148,  1.05321479,  2.15006995, -0.07086628, -2.35258102]]) 

Native Output -  [[0.06876861 0.2139353  0.6406792  0.06951848 0.00709846]] 

CPU Output -     [[0.0687686  0.2139353  0.6406791  0.06951846 0.00709846]]
GPU Output -     [[0.06872559 0.21386719 0.640625   0.06958008 0.00709915]]
NPU Output -     [[0.06872559 0.21374512 0.640625   0.06951904 0.00708771]] 

Error CPU vs. Native -   1.6577541e-08
Error GPU vs. Native -   4.552165e-05
Error NPU vs. Native -   5.9740152e-05

I believe 1e-8 is within epsilon, but 1e-5 is a pretty significant difference.

I've also replaced the provided PyTorch function with ReLU, sigmoid, and GeLU, and tested different dtypes: f32, f16, i8, and u8. I cross-tested the different combinations, and this is what I found:

[Attachment: image (5).png — table of mean errors for each function/dtype/device combination]

ReLU, sigmoid, and GeLU were tested on 10,000-dimensional vectors, while softmax was tested on a 5-dimensional vector (or something similar).
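
For reference, here is a sketch of the cross-test loop (simplified from what I actually ran; it assumes all three devices are present and uses f32 inputs):

import torch
import torch.nn.functional as F
import openvino as ov
import numpy as np

# Each entry: the function under test and its input shape
cases = {
    "softmax": (lambda x: F.softmax(x, dim=1), (1, 5)),
    "relu":    (F.relu, (1, 10000)),
    "sigmoid": (torch.sigmoid, (1, 10000)),
    "gelu":    (F.gelu, (1, 10000)),
}

class Wrapper(torch.nn.Module):
    def __init__(self, fn):
        super().__init__()
        self.fn = fn

    def forward(self, x):
        return self.fn(x)

for name, (fn, shape) in cases.items():
    x = torch.randn(*shape)
    native = Wrapper(fn)(x).numpy()
    converted = ov.convert_model(Wrapper(fn), example_input=x)
    for device in ("CPU", "GPU", "NPU"):
        compiled = ov.compile_model(converted, device_name=device)
        out = compiled(x)[0]
        err = np.mean(np.abs(native - out))
        print(f"{name:8s} {device}: mean abs error = {err:.3e}")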

 

Hardware

Intel(R) Core(TM) Ultra 9 185H / Meteor Lake

 

OS

Ubuntu 22.04 LTS

Kernel version 6.9.3

 

Drivers

GPU Driver: Intel Compute Runtime 24.17.29377.6

NPU Driver: Linux NPU Driver v1.2.0

Level Zero: 1.16.15

I'm using OpenVINO 2024.1.

3 Replies
Wan_Intel
Moderator

Hi ayf7,

Thanks for reaching out to us.

 

We will further investigate the issue and update you as soon as possible.

 

 

Regards,

Wan
Wan_Intel
Moderator (Accepted Solution)

Dear ayf7,

Thanks for your patience.

 

We've received feedback from our developer.

 

Our developer has responded that the Intel® NPU does not have native FP32 support. Therefore, all FP32 models are executed in FP16 on the NPU, and the accuracy difference is expected behavior in this case. (FP16 has a machine epsilon of roughly 1e-3, so mean absolute differences on the order of 1e-5 to 1e-4 are consistent with FP16 rounding.)

 

On the other hand, since OpenVINO™ relies on OpenCL kernels for its GPU implementation, the GPU plugin also prefers FP16 inference precision over FP32 by default.
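
If accuracy matters more than throughput on the GPU, you can request FP32 execution with the inference precision hint. Below is a minimal sketch, assuming the OpenVINO 2024.x properties API ("converted_model" refers to the ov.Model from your script above); note that the NPU will still execute in FP16, since it lacks native FP32:

import openvino as ov
import openvino.properties.hint as hints

core = ov.Core()
# Override the GPU plugin's default FP16 precision with FP32
compiled_gpu_fp32 = core.compile_model(
    converted_model, "GPU", {hints.inference_precision: ov.Type.f32}
)
# The effective precision can be checked on the compiled model
print(compiled_gpu_fp32.get_property(hints.inference_precision))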

 

For more information, please refer to the following links:

 

Sorry for the inconvenience and thank you for your support.

 

 

Regards,

Wan
Wan_Intel
Moderator

Hi ayf7,

Thanks for your question.

If you need additional information from Intel, please submit a new question as this thread will no longer be monitored.

 

 

Regards,

Wan