clEnqueueReadBuffer transfers truncated data on HD4600

Pavel_S_1 · ‎04-25-2015

In certain cases clEnqueueReadBuffer doesn't transfer all the required data when executed on HD4600. System: Win7 x64, driver version 15.36.19.64.4170, 32-bit application.

It seems that when the destination buffer is page-aligned and the transfer length is not a multiple of 4 KB, only the largest whole multiple of 4 KB is transferred. Sample code:

size_t length = 76800;
char *dst = VirtualAlloc(NULL, length, MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN, PAGE_READWRITE);
cl_mem buf = clCreateBuffer(context, CL_MEM_READ_WRITE, length, NULL, NULL);
clEnqueueReadBuffer(command_queue, buf, CL_TRUE, 0, length, dst, 0, NULL, NULL);

The problem is present only on the Intel GPU; the same code produces correct results when executed on AMD and Nvidia GPUs. The same incorrect behavior can be replicated on correct implementations by calling:

clEnqueueReadBuffer(command_queue, buf, CL_TRUE, 0, length & 0xFFFFF000, dst, 0, NULL, NULL);
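
To make the arithmetic behind that mask concrete, here is a minimal sketch, assuming 4 KB pages as in the report. The helper names page_truncated_length and missing_tail_bytes are hypothetical and are not part of OpenCL or the attached code.

```c
#include <stddef.h>

/* Assumption: 4 KB pages, matching the 0xFFFFF000 mask used in the report. */
#define PAGE_SIZE_BYTES 4096u

/* Largest whole multiple of the page size that fits in `length`.
   Equivalent to `length & 0xFFFFF000` for 32-bit lengths. */
static size_t page_truncated_length(size_t length)
{
    return length & ~(size_t)(PAGE_SIZE_BYTES - 1);
}

/* Number of bytes at the end of the destination that a truncated
   transfer of this kind would leave untouched. */
static size_t missing_tail_bytes(size_t length)
{
    return length - page_truncated_length(length);
}
```

For length = 76800 this gives 73728 bytes transferred (18 whole pages) and 3072 bytes missing at the end of the destination buffer.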

Robert_I_Intel · ‎04-27-2015

Hi Pavel,

Could you please provide the full minimal reproducer with the kernel that you are running?

Typically, on our systems we highly recommend allocating memory using _aligned_malloc w/ 4K alignment and creating a buffer w/ the CL_MEM_USE_HOST_PTR flag from the aligned memory allocation: this guarantees "zero copy" behavior, so instead of using clEnqueueReadBuffer, you could use clEnqueueMapBuffer and no memory is copied around.

BTW, is there any reason not to use that pattern and use VirtualAlloc and clEnqueueReadBuffer instead?

Thanks!

Robert

Pavel_S_1 · ‎04-28-2015

I've attached a cut-down version of my code that reproduces the bug. Compile with MSVS2013.

Additional information: the bug is only present if the destination address lies in the second half of the 32-bit address space. For that, the /LARGEADDRESSAWARE linker flag and the MEM_TOP_DOWN allocation flag are both essential. If either one is not specified, the memory is allocated below 2GB and everything works fine.
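
The conditions reported so far can be collected into one predicate. This is only a sketch of what has been observed in this thread, not a statement about what the driver actually does internally; the function name read_may_be_truncated is hypothetical, and the address is taken as a 32-bit value because the application is 32-bit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true when a read matches all three
   conditions under which truncation was observed in this thread:
   - destination is 4 KB page-aligned,
   - transfer length is not a multiple of 4 KB,
   - destination lies at or above 2 GB in a 32-bit address space. */
static bool read_may_be_truncated(uint32_t dst_addr, size_t length)
{
    bool page_aligned = (dst_addr & 0xFFFu) == 0;
    bool partial_page = (length & 0xFFFu) != 0;
    bool upper_half   = dst_addr >= 0x80000000u;
    return page_aligned && partial_page && upper_half;
}
```

In a 32-bit build it could be called as read_may_be_truncated((uint32_t)(uintptr_t)dst, length) before issuing the read, for example to log a warning on affected drivers.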

The code is not optimized for Intel GPUs in any way; before doing that I need to be sure that my code works properly there. My previous attempt to run the same code on an earlier driver resulted in a driver hang with TDR recovery several seconds later, so I just gave up on them for the time being.

Robert_I_Intel · ‎04-28-2015

Hi Pavel,

I could reproduce the behavior you described. However, the fix is very simple: build your program using the x64 configuration, and then you can use the /LARGEADDRESSAWARE flag.

Pavel_S_1 · ‎04-28-2015

That 'fix' is not a fix at all: the code has to interact with plenty of handwritten assembler code, so 32-bit and /LARGEADDRESSAWARE are a must. A better (and still current) solution is blacklisting Intel GPUs until they can do the processing correctly.
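
A blacklist of this kind could be as simple as a check on the vendor string. The sketch below assumes the string comes from clGetDeviceInfo with CL_DEVICE_VENDOR and that Intel devices report a vendor name containing "Intel"; the function name is_blacklisted_vendor is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the vendor string (as obtained from
   clGetDeviceInfo(..., CL_DEVICE_VENDOR, ...)) names Intel.
   Assumption: the Intel vendor string contains the substring "Intel".
   This matches Intel CPU devices too, so a real check would also test
   CL_DEVICE_TYPE for CL_DEVICE_TYPE_GPU. */
static bool is_blacklisted_vendor(const char *vendor)
{
    return vendor != NULL && strstr(vendor, "Intel") != NULL;
}
```

Devices for which this returns true would simply be skipped during device selection until a fixed driver is available.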

Robert_I_Intel · ‎04-28-2015

Pavel,

I contacted our driver folks to see if they can figure out what's wrong w/ 32-bit mode.

Will keep you posted.

Robert

Pavel_S_1 · ‎04-28-2015

Thanks, I'm looking forward to better drivers.

Robert_I_Intel · ‎06-16-2015

Hi Pavel,

The driver architect has informed me that this issue is fixed. It will probably be another two to six months until a driver with the fix is actually released, since the latest driver update was just released at the beginning of June.

Sorry for the long wait! And thank you again for reporting this issue.

Pavel_S_1 · ‎06-18-2015

I'm glad the issue is finally fixed. Too bad it will take so long before we actually see the fix.

I think Intel should adopt regular beta driver builds to push new versions faster and get more feedback from people who are willing to help you test.

Pavel_S_1 · ‎09-10-2015

I can confirm that the bug is fixed in driver version 15.36.24.64.4264.

Too bad anything after 15.36.21.64.4222 is unusable because it can't duplicate two displays. (I haven't filed a bug report for that yet; I will do so when I gather enough information.)

Pavel_S_1 · ‎10-31-2015

Status update: driver version 15.36.26.4294 is finally the version we can switch to, as it has the fix for this issue and no other problems for us.

Thank you (Intel) again for actually fixing driver bugs. It's really pleasant to see that the company cares about a proper implementation and not just about major games/programs working fine.