
Hi,

My system is

- OS: Windows 7 64-bit
- CPU: i7-4720HQ
- iGPU: HD 4600
- Graphics driver: 15.38.28.4332 and 15.36.24.64.4264
- OpenCL SDK: 2016 revision

**Problem: when compiled with -cl-fast-relaxed-math, the cos() function behaves like the sin() function on the HD 4600.**

Minimal reproducer code:

    void kernel test_bug(global float* input, global float* output)
    {
        float x = input[0];
        float cos_rand = cos(x);
        output[0] = cos_rand;
    }

The buffers initialization (storing only 1 float):

    intel_bug_input_buffer = cl::Buffer {context, CL_MEM_READ_ONLY, sizeof(float)};
    intel_bug_output_buffer = cl::Buffer {context, CL_MEM_WRITE_ONLY, sizeof(float)};

    intel_bug_input = new float[1];
    intel_bug_input[0] = 1;

    intel_bug_output = new float[1];
    intel_bug_output[0] = 0;

The calling host code:

    queue.enqueueWriteBuffer(intel_bug_input_buffer, CL_TRUE, 0, sizeof(float), intel_bug_input);

    cl::make_kernel<cl::Buffer&, cl::Buffer&> intel_bug(render.prog, "test_bug");
    cl::EnqueueArgs enqueueArgs1(queue, cl::NDRange(1, 1));
    intel_bug(enqueueArgs1, intel_bug_input_buffer, intel_bug_output_buffer).wait();

    queue.enqueueReadBuffer(intel_bug_output_buffer, CL_TRUE, 0, sizeof(float), intel_bug_output);
    printf("cos(%f) returned %f\n", intel_bug_input[0], intel_bug_output[0]);

Testing with different input values and different devices gives the following results:

**HD4600**

    cos(0.000000) returned 0.000000
    cos(1.000000) returned 0.841489
    cos(2.000000) returned 0.909297
    cos(3.140000) returned 0.001593

**i7-4720HQ**

    cos(0.000000) returned 1.000000
    cos(1.000000) returned 0.540302
    cos(2.000000) returned -0.416147
    cos(3.140000) returned -0.999999

**Nvidia 965M**

    cos(0.000000) returned 1.000000
    cos(1.000000) returned 0.540302
    cos(2.000000) returned -0.416147
    cos(3.140000) returned -0.999999
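
Rather than comparing the numbers by eye, the returned values can be checked on the host against both reference functions. The following is a minimal sketch in standard C++; the helper name `classify_result` and the tolerance are assumptions made for illustration and are not part of the original reproducer.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper: reports which reference function a device result
// matches for the input x. The tolerance is an assumed value, chosen loosely
// because relaxed math options may reduce precision.
std::string classify_result(float x, float device_value, float tol = 1e-3f)
{
    const bool matches_cos = std::fabs(device_value - std::cos(x)) < tol;
    const bool matches_sin = std::fabs(device_value - std::sin(x)) < tol;

    // Near x = pi/4 (plus multiples of pi) both functions agree, so the
    // result cannot be attributed to one or the other.
    if (matches_cos && matches_sin) return "ambiguous";
    if (matches_cos) return "cos";
    if (matches_sin) return "sin";
    return "neither";
}
```

With the values listed above, `classify_result(1.0f, 0.841489f)` gives "sin" for the HD 4600 output, while `classify_result(1.0f, 0.540302f)` gives "cos" for the CPU and Nvidia outputs.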

I tried using the OpenCL C native_cos() function instead and, looking at the generated asm in CodeBuilder, I found something strange:

With -cl-fast-relaxed-math, cos() incorrectly returns sin() values while native_cos() returns correct values. But the generated assembly is exactly the same! Both use "math.cos" asm instructions. Can execution differ with the same assembly? Unfortunately, I was looking at the code generated by CodeBuilder but had to build and run the kernel from my app (using the exact same kernel source), as I couldn't run it in CodeBuilder, so maybe my app actually produced different code. That seems doubtful, but I can't know for sure.

Without the flag, both are correct; native_cos() still generates "math.cos" asm instructions, while cos() generates hundreds of instructions.

I can see how someone could have mistaken a cos for a sin in the driver, given how closely they are related, but I can't believe I'm the only one using cos() with -cl-fast-relaxed-math on this hardware/software configuration...

PS: I checked whether the two were simply swapped, but no: sin() correctly returns sin() values...

PPS: attached is the asm generated for this kernel. It is exactly the same for the 3 configurations: [native_cos] / [native_cos + -cl-fast-relaxed-math] / [cos + -cl-fast-relaxed-math].


Hi Teybeo,

I tried really hard to reproduce the issue on my IVB, HSW and BDW boxes running Windows 8.1 and Windows 10 without any success. Everything appears to be working properly.

I wonder what happens when you try the following kernel

    kernel void test_bug(global float* input, global float* output)
    {
        int i = get_global_id(0);
        float x = input[i];
        float cos_rand = cos(x);
        output[i] = cos_rand;
    }

with buffer sizes and nd ranges of size 1024?

I consulted with our driver architect; he looked at the underlying cos code and stated that it is impossible for it to return sin values.


Hi Robert,

I tried your suggestion, but every value in the output buffer is still a sin() value.

One way we could obtain these values with cos() would be by computing cos(Pi/2 - x) or cos(x + 3Pi/2), but the SPIR and assembly generated through CodeBuilder (ioc64?) look fine to me (though I'm definitely not familiar with LLVM, SPIR or HD 4600 asm...).
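
The phase shifts that turn a cosine into a sine can be checked numerically on the host. This is only a sketch in standard C++ with hypothetical function names; it illustrates the identities cos(Pi/2 - x) = sin(x) and cos(x + 3Pi/2) = sin(x) (a shift of 3Pi/4 does not give a sine) and says nothing about what the driver actually does.

```cpp
#include <cmath>

// pi as a double constant (M_PI is not guaranteed by the C++ standard).
constexpr double kPi = 3.14159265358979323846;

// cos(pi/2 - x) equals sin(x) for every x.
double cos_reflected_shift(double x) { return std::cos(kPi / 2.0 - x); }

// cos(x + 3*pi/2) also equals sin(x) for every x.
double cos_three_half_pi_shift(double x) { return std::cos(x + 3.0 * kPi / 2.0); }

// Largest deviation of either shifted cosine from sin(x) over the
// sample inputs used in this thread.
double max_deviation_from_sin()
{
    const double samples[] = {0.0, 1.0, 2.0, 3.14};
    double worst = 0.0;
    for (double x : samples) {
        worst = std::fmax(worst, std::fabs(cos_reflected_shift(x) - std::sin(x)));
        worst = std::fmax(worst, std::fabs(cos_three_half_pi_shift(x) - std::sin(x)));
    }
    return worst;
}
```

Calling `max_deviation_from_sin()` should return a value at the level of double-precision rounding error, which is consistent with an unintended phase shift producing exactly the sin() values reported.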

I tried retrieving the binary of the kernel after it has been built from my app with clGetProgramInfo( ... CL_PROGRAM_BINARIES...) to verify it was similar to the offline compiler output but it's binary so I can't really parse it... I attached the file if this can help (added .txt to the name to allow me to upload it).

Modified buffer setup:

    INTEL_COUNT = 1024;

    intel_bug_input_buffer = cl::Buffer {context, CL_MEM_READ_ONLY, sizeof(float) * INTEL_COUNT};
    intel_bug_output_buffer = cl::Buffer {context, CL_MEM_WRITE_ONLY, sizeof(float) * INTEL_COUNT};

    intel_bug_input = new float[INTEL_COUNT];
    intel_bug_output = new float[INTEL_COUNT];

    for (int i = 0; i < INTEL_COUNT; ++i) {
        intel_bug_input[i] = 1;
        intel_bug_output[i] = 0;
    }

And modified calling code:

    queue.enqueueWriteBuffer(intel_bug_input_buffer, CL_TRUE, 0, sizeof(float) * INTEL_COUNT, intel_bug_input);

    cl::make_kernel<cl::Buffer&, cl::Buffer&> intel_bug(render.prog, "test_bug");
    cl::EnqueueArgs enqueueArgs1(queue, cl::NDRange(INTEL_COUNT));
    intel_bug(enqueueArgs1, intel_bug_input_buffer, intel_bug_output_buffer).wait();

    queue.enqueueReadBuffer(intel_bug_output_buffer, CL_TRUE, 0, sizeof(float) * INTEL_COUNT, intel_bug_output);

    for (int i = 0; i < INTEL_COUNT; i += 128) {
        printf("[%d]: cos(%f) returned %f\n", i, intel_bug_input[i], intel_bug_output[i]);
    }
