OpenCL* for CPU

OpenCL vectorisation issue in OneAPI driver

OCLdev
Beginner

The OneAPI CPU OpenCL drivers (2021.12.6.0.19_160000) seem to have an issue with vectorization resulting in corrupted data. This simple kernel:

__kernel void test(__global float *f, __global float *r) {
    int i = get_global_id(0);

    r[i] = 0.0;
    if (f[i] == 1.0F) {
        r[i] = 1.0F+pow(1.0F, 1.0F);
    }
}

when run over a buffer of length 16 with f equal to:

1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1

results in the following values in r:

2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3126, 0, 0, 0, 0, 3126

Note the odd 3126 values. Turning off vectorisation using

CL_CONFIG_CPU_VECTORIZER_MODE=1

results in the correct values:

2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2
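
As a cross-check on what the kernel should produce, here is a minimal host-side sketch of the same logic in plain C++, with one loop iteration standing in for one work-item. The helper name `reference_test` is hypothetical and not part of the original reproducer. Since pow(1, 1) is exactly 1, every element where f is 1 should come out as exactly 2, and everything else as 0.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar reference for the `test` kernel (hypothetical helper, not from
// the original reproducer). Each loop iteration mirrors one work-item.
std::vector<float> reference_test(const std::vector<float>& f) {
    std::vector<float> r(f.size());
    for (std::size_t i = 0; i < f.size(); ++i) {
        r[i] = 0.0f;
        if (f[i] == 1.0f) {
            // pow(1, 1) is exactly 1, so this stores exactly 2.0f.
            r[i] = 1.0f + std::pow(1.0f, 1.0f);
        }
    }
    return r;
}
```

Comparing the device buffer against this reference element by element makes the corrupted entries easy to spot.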

The issue seems to be triggered by adding a value to the function result (here, 1.0F+pow()). For example, this kernel gives the expected result:

__kernel void test(__global float *f, __global float *r) {
    int i = get_global_id(0);

    r[i] = 0.0;
    if (f[i] == 1.0F) {
        r[i] = pow(1.0F, 1.0F);
    }
}

This is on an Ubuntu 18.04 VM with an Intel(R) Core(TM) i7-6800K.

Is there anywhere I can raise a bug report/ticket? Or is this the correct forum to report?

VarshaS_Intel
Moderator

Hi,


Thanks for reaching out to us.


Could you please provide the details of the compiler you are using and complete reproducer code with the steps to reproduce the issue?


>>>Is there anywhere I can raise a bug report/ticket? Or is this the correct forum to report?

You can post your queries related to OpenCL in this forum.

Link to the Forum: https://community.intel.com/t5/GPU-Compute-Software/bd-p/gpu-compute-software


Thanks & Regards

Varsha


OCLdev
Beginner

Hi Varsha,

 

Thanks for getting back. I should start by saying I tried this on an Ubuntu F4s_v2 (Intel Xeon 8272CL) VM on Azure and had no issues. The above came from an Ubuntu VM running on VMware on Windows (on an Intel Core i7-6800K), so it might be specific to this particular unusual combination. I'll post the steps to run the test in case you need them:

  1. Create Ubuntu 18.04 VM.
  2. Update VM:
    sudo apt-get update
  3. Install g++ and pocl to build:
    sudo apt-get -y install g++ ocl-icd-opencl-dev opencl-headers libpocl2
  4. Copy OpenCL_test.cpp and cl2.hpp to VM.
  5. Compile test program:
    g++ OpenCL_test.cpp -o test /usr/lib/x86_64-linux-gnu/libOpenCL.so
  6. Run test:
    azure:~$./test
    Platform: Portable Computing Language
    OpenCL: Build log:
    1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1
    2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2
  7. Install Intel OpenCL:
    wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
    sudo apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
    rm GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
    sudo add-apt-repository "deb https://apt.repos.intel.com/oneapi all main"
    sudo apt update
    sudo apt-get install -y --no-install-recommends intel-oneapi-runtime-opencl
  8. Run the test; the argument selects the platform to use (note that it works on the Azure VM):
    ./test 0
    Platform: Intel(R) OpenCL
    OpenCL: Build log: Compilation started
    Compilation done
    Linking started
    Linking done
    Device build started
    Options used by backend compiler: -cl-std=CL1.2 -w
    Device build done
    Kernel <test> was successfully vectorized (16)
    Done.
    1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1
    2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2

On the VMware VM, the same test produces the corrupted output:

./test 0
Platform: Intel(R) OpenCL
OpenCL: Build log: Compilation started
Compilation done
Linking started
Linking done
Device build started
Options used by backend compiler: -cl-std=CL1.2 -w
Device build done
Kernel <test> was successfully vectorized (8)
Done.
1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1
2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3126, 0, 0, 0, 0, 3126

 

Alberto_R_Intel
Employee

OCLdev, thank you for posting in Intel® Communities Support.


Just to let you know, the thread has already been moved to the proper department; they will assist you further as soon as possible.


Regards,

Albert R.


Intel Customer Support Technician


OCLdev
Beginner

I reported this 3 years ago, but it's still not fixed in the latest driver (Windows 2024.2.0.980). It's a bit surprising that no one else has had issues with this, given that it produces completely corrupted results for the most basic kernel. Could you please point me to the thread this was moved to, or provide details on how to submit a bug report to the developers?

cw_intel
Moderator

Hi,

 

VMware is not a validated platform for the Intel OpenCL CPU runtime. I tested your code on an Ubuntu machine, and the result is correct.

Does your issue occur on all platforms with VMware, or only in your configuration (Ubuntu 18.04 VM + Intel Core i7-6800K)?

 

Thanks.

OCLdev
Beginner

Hi, thanks for getting back about this. I initially tried this years ago on an Ubuntu VM but this latest test was on Windows. I've tried it on a couple of machines with these results:

  • Intel(R) Core(TM) i7-6800K CPU @ 3.40GHz: Failed.
  • Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz: Failed.

Just to confirm, this is the very latest version, w_opencl_runtime_p_2024.2.0.980, downloaded from here. The test exe was compiled with Visual Studio. I haven't tested on Linux yet. The test gives the correct result with the NVIDIA GPU driver.

The tests used auto-vectorisation (CL_CONFIG_CPU_VECTORIZER_MODE=0). Setting it to 1 gives the correct result, but that turns vectorisation off entirely and has a significant performance impact.
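
For anyone who wants to flip the setting from inside the test program rather than from the shell, a minimal sketch follows. It only sets the environment variable for the current process. The helper name `set_vectorizer_mode` is hypothetical, and it is an assumption that the CPU runtime reads the variable when it is first loaded, so this would need to run before any OpenCL call.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: set CL_CONFIG_CPU_VECTORIZER_MODE for this process.
// In this thread, 0 is the auto-vectorisation setting and 1 turns
// vectorisation off.
// Assumption: the runtime reads the variable at load time, so call this
// before the first OpenCL API call.
bool set_vectorizer_mode(int mode) {
    const std::string value = std::to_string(mode);
#ifdef _WIN32
    return _putenv_s("CL_CONFIG_CPU_VECTORIZER_MODE", value.c_str()) == 0;
#else
    return setenv("CL_CONFIG_CPU_VECTORIZER_MODE", value.c_str(), 1) == 0;
#endif
}
```

Setting the variable in the shell before launching the test, as in the first post, remains the approach actually used in this thread.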

cw_intel
Moderator

Thank you for the feedback. I can reproduce the issue on Windows, and we will investigate it.

OCLdev
Beginner

Thanks for looking into it. A couple more bits of info:

  • Same issue occurs with the latest drivers on Ubuntu 24.04 (intel-oneapi-runtime-opencl).
  • The really strange thing is that this only seems to occur for sizes that are powers of 2 (see below).
  • Changing the local workgroup size to 4 results in no error, although this seems very small. However, I'm testing on a 4-core VM, so maybe this has something to do with the issue?

For the original code (local size of NullRange) the result is correct for a size of 15:

1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0
2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0

For a size of 16 the issue occurs:

1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1
2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3126, 0, 0, 0, 0, 3126

For a size of 17 it's fine:

1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0
2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0

It's also broken for sizes of 32, 64, etc.

I only noticed this as we break things into power-of-two size chunks, which might explain why others haven't seen this.
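
To repeat the size comparison above without eyeballing the output, a small host-side checker along these lines could be used. This is only a sketch: `make_input` and `find_mismatches` are hypothetical helper names, and the code assumes the device results have already been read back into a `std::vector<float>`. It builds the input pattern used in this thread (1 at every fifth index) and reports the indices where the result is not the expected 2 or 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Input pattern from the thread: 1 at indices 0, 5, 10, ... and 0 elsewhere.
// (Hypothetical helper, not from the original reproducer.)
std::vector<float> make_input(std::size_t n) {
    std::vector<float> f(n, 0.0f);
    for (std::size_t i = 0; i < n; i += 5) {
        f[i] = 1.0f;
    }
    return f;
}

// Returns the indices where the device output r differs from the expected
// value: 2 where f is 1, and 0 everywhere else.
std::vector<std::size_t> find_mismatches(const std::vector<float>& f,
                                         const std::vector<float>& r) {
    assert(f.size() == r.size());
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < f.size(); ++i) {
        const float expected = (f[i] == 1.0f) ? 2.0f : 0.0f;
        if (r[i] != expected) {
            bad.push_back(i);
        }
    }
    return bad;
}
```

Fed the size-16 output shown above, this flags indices 10 and 15. Running it over a range of sizes (15, 16, 17, 32, 64, ...) would show directly which ones are affected.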

 

cw_intel
Moderator

The original issue on Windows has been fixed in the latest version, 2025.0, which is available now. Please test your code with the latest version.

OCLdev
Beginner

Working now! Thank you very much for the fix.
