OpenCL vectorisation issue in OneAPI driver

OCLdev · ‎09-17-2021

The OneAPI CPU OpenCL drivers (2021.12.6.0.19_160000) seem to have an issue with vectorization resulting in corrupted data. This simple kernel:

__kernel void test(__global float *f, __global float *r) {
    int i = get_global_id(0);

    r[i] = 0.0;
    if (f[i] == 1.0F) {
        r[i] = 1.0F+pow(1.0F, 1.0F);
    }
}

when run over a buffer of length 16 with f equal to:

1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1

results in the following values in r:

2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3126, 0, 0, 0, 0, 3126

Note the odd 3126 values. Turning off vectorisation using

CL_CONFIG_CPU_VECTORIZER_MODE=1

results in the correct values:

2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2
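
To make the comparison explicit, here is a minimal host-side sketch in plain C++ that computes what the first kernel should produce and lists the indices where a device result differs. The function names (reference_result, mismatches) are hypothetical helpers for illustration and are not part of the original test program.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host-side reference for the kernel: r[i] is 0 unless f[i] == 1,
// in which case it is 1 + pow(1, 1) = 2.
std::vector<float> reference_result(const std::vector<float>& f) {
    std::vector<float> r(f.size(), 0.0F);
    for (std::size_t i = 0; i < f.size(); ++i) {
        if (f[i] == 1.0F) {
            r[i] = 1.0F + std::pow(1.0F, 1.0F);
        }
    }
    return r;
}

// Returns the indices where the device output differs from the reference.
std::vector<std::size_t> mismatches(const std::vector<float>& f,
                                    const std::vector<float>& device_r) {
    assert(f.size() == device_r.size());
    const std::vector<float> expected = reference_result(f);
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < f.size(); ++i) {
        if (device_r[i] != expected[i]) {
            bad.push_back(i);
        }
    }
    return bad;
}
```

Applied to the vectorised output above, this would flag indices 10 and 15, the two positions holding 3126.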

The issue seems to be triggered by adding a value to the function's result (here, 1.0F + pow()). For example, this kernel gives the expected result:

__kernel void test(__global float *f, __global float *r) {
    int i = get_global_id(0);

    r[i] = 0.0;
    if (f[i] == 1.0F) {
        r[i] = pow(1.0F, 1.0F);
    }
}

This is on an Ubuntu 18.04 VM with an Intel(R) Core(TM) i7-6800K.

Is there anywhere I can raise a bug report/ticket? Or is this the correct forum to report?

VarshaS_Intel · ‎09-20-2021

Hi,

Thanks for reaching out to us.

Could you please provide the details of the compiler you are using and complete reproducer code with the steps to reproduce the issue?

>>>Is there anywhere I can raise a bug report/ticket? Or is this the correct forum to report?

You can post your queries related to OpenCL in this forum.

Link to the Forum: https://community.intel.com/t5/GPU-Compute-Software/bd-p/gpu-compute-software

Thanks & Regards

Varsha

OCLdev · ‎09-21-2021

Hi Varsha,

Thanks for getting back. I should start by saying I tried this on an Ubuntu F4s_v2 (Intel Xeon 8272CL) VM on Azure and had no issues. The results above came from an Ubuntu VM running on VMware on Windows (on an Intel Core i7-6800K), so it might be specific to this unusual combination. I'll post the steps to run the test in case you need them:

Create Ubuntu 18.04 VM.
Update VM:
sudo apt-get update
Install g++ and pocl to build:
sudo apt-get -y install g++ ocl-icd-opencl-dev opencl-headers libpocl2
Copy OpenCL_test.cpp and cl2.hpp to VM.
Compile test program:
g++ OpenCL_test.cpp -o test /usr/lib/x86_64-linux-gnu/libOpenCL.so
Run test:
azure:~$./test
Platform: Portable Computing Language
OpenCL: Build log:
1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1
2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2
Install Intel OpenCL:
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
sudo apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
rm GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
sudo add-apt-repository "deb https://apt.repos.intel.com/oneapi all main"
sudo apt update
sudo apt-get install -y --no-install-recommends intel-oneapi-runtime-opencl
Run the test; the argument is the platform to use (note that it works on the Azure VM):
./test 0
Platform: Intel(R) OpenCL
OpenCL: Build log: Compilation started
Compilation done
Linking started
Linking done
Device build started
Options used by backend compiler: -cl-std=CL1.2 -w
Device build done
Kernel <test> was successfully vectorized (16)
Done.
1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1
2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2

On the VMware VM the same run gives the wrong result:

./test 0
Platform: Intel(R) OpenCL
OpenCL: Build log: Compilation started
Compilation done
Linking started
Linking done
Device build started
Options used by backend compiler: -cl-std=CL1.2 -w
Device build done
Kernel <test> was successfully vectorized (8)
Done.
1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1
2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3126, 0, 0, 0, 0, 3126

Alberto_R_Intel · ‎09-23-2021

OCLdev, Thank you for posting in the Intel® Communities Support.

Just to let you know, the thread has already been moved to the proper department. They will assist you further with this matter as soon as possible.

Regards,

Albert R.

Intel Customer Support Technician

OCLdev · ‎07-06-2024

I reported this 3 years ago now but it's still not fixed in the latest driver (Windows 2024.2.0.980). It's a bit surprising no-one else has had issues with this given it's giving completely corrupted results for the most basic kernel. Please could you point me to the thread this was moved to? Or provide details to give a bug report to the developers?

cw_intel · ‎07-08-2024

Hi,

VMware is not a validated platform for Intel OpenCL CPU RT. I tested your code on an Ubuntu machine, and the result is correct.

Does your issue occur on all platforms with VMware, or only in your configuration (Ubuntu 18.04 VM + Intel Core i7-6800K)?

Thanks.

OCLdev · ‎07-09-2024

Hi, thanks for getting back about this. I initially tried this years ago on an Ubuntu VM but this latest test was on Windows. I've tried it on a couple of machines with these results:

Intel(R) Core(TM) i7-6800K CPU @ 3.40GHz: Failed
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz: Failed

Just to confirm, this is the very latest version, w_opencl_runtime_p_2024.2.0.980, downloaded from here. The test exe was compiled with Visual Studio. I haven't tested on Linux yet. The test gives the correct result with the NVIDIA GPU driver.

The tests used auto-vectorisation (CL_CONFIG_CPU_VECTORIZER_MODE=0). Setting it to 1 gives the correct result, but that disables vectorisation and has a significant performance impact.
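
As a stop-gap, the variable could in principle be set from inside the test program before any OpenCL call is made. The sketch below is an untested assumption: it assumes the Intel CPU runtime reads CL_CONFIG_CPU_VECTORIZER_MODE from the process environment when the platform is first loaded, which nothing in this thread confirms. The helper names are hypothetical; setenv and _putenv_s are the standard POSIX and Windows CRT calls.

```cpp
#include <cassert>
#include <stdlib.h>
#include <string>

static const char* const kVectorizerVar = "CL_CONFIG_CPU_VECTORIZER_MODE";

// Sets the vectoriser mode for the current process ("0" = auto, "1" = off,
// as used in this thread). Assumption: this must run before the first
// OpenCL call for the runtime to see it.
bool set_vectorizer_mode(const std::string& mode) {
    assert(!mode.empty());
#ifdef _WIN32
    return _putenv_s(kVectorizerVar, mode.c_str()) == 0;
#else
    return setenv(kVectorizerVar, mode.c_str(), 1) == 0;
#endif
}

// Reads the value back; returns an empty string if the variable is unset.
std::string current_vectorizer_mode() {
    const char* value = getenv(kVectorizerVar);
    return value ? std::string(value) : std::string();
}
```

Calling set_vectorizer_mode("1") at the top of main() would be the intended use. If the runtime ignores in-process changes, setting the variable in the shell before launching the program remains the approach already shown to work.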

cw_intel · ‎07-09-2024

Thank you for the feedback. I can reproduce the issue on Windows, and we will investigate it.

OCLdev · ‎09-19-2024

Thanks for looking into it. A couple more bits of info:

The same issue occurs with the latest drivers on Ubuntu 24.04 (intel-oneapi-runtime-opencl).
The really strange thing is that this only seems to occur for sizes that are powers of 2 (see below).
Changing the local workgroup size to 4 results in no error, although this seems very small. However, I'm testing on a 4-core VM, so maybe that has something to do with the issue?

For the original code (local size of NullRange) the result is correct for a size of 15:

1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0
2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0

For a size of 16 the issue occurs:

1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1
2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3126, 0, 0, 0, 0, 3126

For a size of 17 it's fine:

1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0
2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0

It's broken for sizes of 32, 64 etc.

I only noticed this as we break things into power-of-two size chunks, which might explain why others haven't seen this.
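
A size sweep like the one described above could be automated along these lines. This is a sketch under stated assumptions: the input pattern (a 1 at every fifth index) matches the buffers shown for sizes 15 to 17 and is assumed to continue for larger sizes, and Runner is a hypothetical callback that would wrap the actual OpenCL enqueue and read-back, which are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Input pattern from the thread: 1 at indices 0, 5, 10, ... and 0 elsewhere.
std::vector<float> make_input(std::size_t n) {
    std::vector<float> f(n, 0.0F);
    for (std::size_t i = 0; i < n; i += 5) f[i] = 1.0F;
    return f;
}

// Expected kernel output: 2 (1 + pow(1, 1)) where f[i] == 1, otherwise 0.
std::vector<float> expected_output(const std::vector<float>& f) {
    std::vector<float> r(f.size(), 0.0F);
    for (std::size_t i = 0; i < f.size(); ++i)
        if (f[i] == 1.0F) r[i] = 2.0F;
    return r;
}

bool is_power_of_two(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Hypothetical hook: takes f, runs the kernel on the device, returns r.
using Runner = std::function<std::vector<float>(const std::vector<float>&)>;

// Runs every size in [first, last] and returns the sizes whose output is wrong.
std::vector<std::size_t> failing_sizes(std::size_t first, std::size_t last,
                                       const Runner& run) {
    assert(first <= last);
    std::vector<std::size_t> failed;
    for (std::size_t n = first; n <= last; ++n) {
        const std::vector<float> f = make_input(n);
        if (run(f) != expected_output(f)) failed.push_back(n);
    }
    return failed;
}

// True if every failing size is a power of two (the pattern reported here).
bool all_powers_of_two(const std::vector<std::size_t>& sizes) {
    for (std::size_t n : sizes)
        if (!is_power_of_two(n)) return false;
    return true;
}
```

Running failing_sizes(1, 64, run) with a real device runner and then checking all_powers_of_two on the result would confirm or refute the power-of-two pattern without inspecting each buffer by hand.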

cw_intel · ‎11-03-2024

The original issue on Windows has been fixed in the latest version, 2025.0, which is available now. Please test your code with the latest version.

OCLdev · ‎11-10-2024

Working now! Thank you very much for the fix.