Pointer arithmetic issue in new graphics driver

nielsen__rasmus · ‎08-24-2018

Hi

I have just tracked down a bug in an OpenCL kernel I have written. The code had been working fine until one of the users got a graphics driver update (version 20.19.15.4835).

The code had worked for about a year on a wide assortment of CPUs and integrated and dedicated GPUs, both when compiled as x64 and as x86. The old code still works on the CPU when compiled as either x64 or x86, and on the integrated GPU when compiled as x86. But when run on integrated graphics, with the newest driver, in x64 mode, it fails.

I have been able to track it down to this line of code:

float x1 = (xCoords + turbines * windDirIndex)[rel.downstream];

Seemingly randomly, this line would return 0 instead of the content in xCoords. Changing the code to the following fixes the bug.

float x1 = xCoords[rel.downstream + turbines * windDirIndex];

The variable types are as follows:
xCoords: global float*
turbines: ushort
windDirIndex: ushort

Can anybody explain why the two lines have different behaviour in this very specific case?

Lukasz_W_Intel · ‎09-04-2018

Hello there,

Could you please report this issue here:
https://github.com/intel/intel-graphics-compiler/issues

Please also provide an example shader that reproduces this issue.

Thanks!

Łukasz

Ben_A_Intel · ‎09-04-2018

I agree that adding a github issue (ideally with a simple reproducer) would be helpful and the fastest way to resolve this issue. As to why you are seeing different results, the two lines do look slightly different to the compiler:
- The first computes a new pointer that is an offset from the base pointer, and then indexes into the new pointer.
- The second does some math to compute the index, but then indexes directly into the base pointer.

It's possible that this slightly different order of operations is exposing a bug.

Michael_C_Intel1 · ‎09-07-2018

Hi RasmusN,

Thanks for commenting on your sighting.

Can you clarify what you mean by "on the CPU"... I'm trying to understand the comparison. Which (if any) of these two cases are you describing?

Do you mean you have a program that executes without OpenCL on the CPU that you've ported to be CPU and OpenCL/Intel Graphics?
Do you mean that you have a program that runs through OpenCL for our OpenCL CPU implementation that you are comparing to the CPU and OpenCL/Intel Graphics port?

Regardless... This could be a case where the MSVS project preprocessor macros or compiler toggles associated with the x64 or x86 (32-bit) labels are doing something unexpected with your memory filling assumptions/routines. This may orient the triage person toward how memory is being written in the host program before it is shipped to the OCL target.

Sending a generic reproducer through the GitHub portal mentioned in the previous post is recommended.

-MichaelC

nielsen__rasmus · ‎09-09-2018

Hi all.

Thanks for responding! As per your suggestion, I have created an issue on GitHub, together with a "minimal" example. I had less faith in my understanding of pointer arithmetic than in the correctness of the compiler, which is why I started by creating this thread, so as not to make a fool of myself :)

To respond to your (Michael C) comments/questions:

Exactly, I have an OpenCL kernel, which I then run with the Intel compiler on both a CPU and a GPU, and the two yield different results. I get the same results when the OpenCL device is the CPU and when it is the GPU with the host compiled in 32-bit mode, but when the host is compiled in 64-bit mode and the device is the GPU, the results are different.

Michael_C_Intel1 · ‎09-10-2018

Hi RasmusN,

Thanks for the followup. My following response is a little bit verbose because your topic brings up a few different gaps.

Lots of language here is overloaded so let me try to confirm. Which of these cases applies?

1. You have an MSVS project set up to use x64 and Intel Compiler (icl) for a host side CPU only application (c/c++ src) implementing all your algorithms. This project builds a functional application.
2. You have an MSVS project set up to use x86 (32bit) and Intel Compiler (icl) for a host side CPU only application (c/c++ src) implementing all your algorithms. This project builds a functional application.
3. You have an MSVS project set up to use x64 and Intel Compiler (icl) for a heterogeneous OpenCL target application (c/c++ and cl srcs). This project builds a functional application when the OpenCL target is CPU. This project generates an application which fails when the OpenCL target is Intel Graphics.
4. You have an MSVS project set up to use x86 (32bit) and Intel Compiler (icl) for a heterogeneous OpenCL target application (c/c++ and cl srcs). This project builds a functional application when the OpenCL target is CPU. This project builds a functional application when the OpenCL target is Intel Graphics.

I took a look at the example you provided... immediate observations:

arg 0 is set to kernel inputA and host program relationsBuffer
arg 2 is set to kernel inputC and host program inputA
host relationsBuffer/kernel inputA uses char filling, float for sizing, and int pointer for kernel access...

The aliasing used moves me to ask: are the objects used matching their intent?

Also to be complete... there are a few different compilers related to heterogeneous development.

- Intel Compiler (icc/icl) for x86 and x86_64 programs.
- ioc64/ioc32 from the SDK, which are frontends that drive Intel's OpenCL implementations to build OpenCL kernels.
- igc, as mentioned in the GitHub link, which serves as the OpenCL compiler targeting Intel Graphics (GEN architecture devices). This compiler comes with Intel Graphics implementations such as the Intel Graphics Compute Runtime for OpenCL Driver.

-MichaelC

Michael_C_Intel1 · ‎09-10-2018

Reference: https://github.com/intel/intel-graphics-compiler/issues/21

-MichaelC

nielsen__rasmus · ‎09-11-2018

Hi Michael

Cases 3 and 4 are correct, with one exception: the C++ compiler used is the default Visual Studio C++ compiler, MSVC, not the Intel Compiler. There is no native C/C++ implementation; the code is always run through OpenCL in all scenarios.

So my understanding is that my code and the cl.hpp file are compiled using MSVC, and then the Intel driver is given the source code of the kernel to compile. Is the library that compiles the kernel from a given string and returns the kernel object what you refer to as Intel Compiler (icl)?

I'm sorry for the inconsistent typing and naming. I did a lot of refactoring to cut down a big project to a small example, so I made a few typos. I have fixed them, and it should be more consistent and clear now.

I am not sure what you mean by aliasing? The arrays should be non-overlapping if that is what you refer to?

I am sorry, but I don't know much about the compilers. Where can I check which one is used? My usage was like this:
I installed the Intel OpenCL SDK for Windows 8, with the Visual Studio 2017 plugin, from Intel's website, then created an empty OpenCL project (a template project created by the plugin), wrote the code shown in the GitHub issue in the main.cpp file, and clicked run. I don't have any idea how the Intel compiler pipeline works behind the cl.hpp file, so if you need any more info about the OpenCL compiler, I think I need some direction on how to find out :) . My iGPU is an HD 5500.

Michael_C_Intel1 · ‎09-11-2018

Hello RasmusN,

On compilers:
- icl.exe is the primary compiler executable for Intel Compiler. It's a separate product meant for x86 / x86_64 programs. You described the Intel Compiler earlier in the thread so I sought to clarify... icl.exe is not used for OpenCL kernels. The installed Windows OS suite comes with plugins to replace the cl.exe references within a project/.sln.
- cl.exe is the driver behind MSVS; it sounds like that's what you're using for the host side program.
- The Intel Graphics targeted just in time compilation is handled by intel graphics compiler for OpenCL within Intel Graphics Compute Runtime for OpenCL Driver... This OpenCL implementation comes as part of the Intel Graphics Driver package. This sounds like where you're experiencing unexpected behavior.
Reproducers:
- Creating reproducers is a non trivial effort. It's understood what's supplied can't be perfect. I think the devs on github can work with your source so long as it's representative...
- Sorry for being unclear... by aliasing, I meant a name associated with a buffer that would have a different name when used in the kernel... since the names don't match it encourages confirmation that the proper buffers were used in the proper place in the kernel. We'll see what they say.

Thank you for your forum participation.

-MichaelC