OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU
Announcements
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

clEnqueueReadBuffer transfers truncated data on HD4600

Pavel_S_1
Beginner
285 Views

In certain cases, clEnqueueReadBuffer does not transfer all of the requested data when executed on HD4600. System: Win7 x64, driver version 15.36.19.64.4170, 32-bit application.

It seems that when the destination buffer is page-aligned and the transfer length is not a multiple of 4 KB, only the largest multiple of 4 KB is transferred. Sample code:

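size_t length = 76800;  /* 18.75 pages: not a multiple of 4 KB */
/* Page-aligned destination; MEM_TOP_DOWN asks for the highest available address */
char *dst = (char *)VirtualAlloc(NULL, length, MEM_RESERVE | MEM_COMMIT | MEM_TOP_DOWN, PAGE_READWRITE);
cl_int err;
cl_mem buf = clCreateBuffer(context, CL_MEM_READ_WRITE, length, NULL, &err);
/* Blocking read of the whole buffer; per this report, only whole 4 KB pages arrive on the affected driver */
err = clEnqueueReadBuffer(command_queue, buf, CL_TRUE, 0, length, dst, 0, NULL, NULL);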
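A minimal way to confirm how much of the buffer actually arrives, assuming you can pre-fill the host destination before the read: write a sentinel byte pattern into dst, do the blocking read, then count how many trailing bytes still hold the sentinel. The helpers fill_sentinel and count_untouched_tail are hypothetical names for illustration, not part of OpenCL or of the original reproducer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical diagnostic helper: before the read, fill the destination
   with a byte value the device data is unlikely to end with. */
void fill_sentinel(unsigned char *dst, size_t length, unsigned char sentinel)
{
    memset(dst, sentinel, length);
}

/* After a blocking read, count how many trailing bytes still hold the
   sentinel. A non-zero result suggests the transfer stopped short, unless
   the device data itself happens to end with sentinel bytes. */
size_t count_untouched_tail(const unsigned char *dst, size_t length, unsigned char sentinel)
{
    size_t n = 0;
    while (n < length && dst[length - 1 - n] == sentinel)
        ++n;
    return n;
}
```

Call fill_sentinel(dst, length, 0xCD) before clEnqueueReadBuffer and count_untouched_tail(dst, length, 0xCD) after it. If the read stops at a 4 KB boundary, the result should equal the size of the missing tail, unless the device data itself ends in sentinel bytes, so treat it as a hint rather than proof.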

The problem occurs only on the Intel GPU; the same code produces correct results on AMD and Nvidia GPUs. The same incorrect behavior can be reproduced on correct implementations by calling:

clEnqueueReadBuffer(command_queue, buf, CL_TRUE, 0, length & 0xFFFFF000, dst, 0, NULL, NULL);
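The mask in that call rounds the length down to a whole number of 4 KB pages. For the 76800-byte example that leaves 73728 bytes (18 pages), so the last 3072 bytes would be missing. A small sketch of the arithmetic, with hypothetical helper names; it uses ~(4096 - 1) rather than the literal 0xFFFFF000 so the same expression also keeps the high bits of a 64-bit size_t.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES ((size_t)4096)  /* 4 KB page, as in the report */

/* Round a length down to a whole number of 4 KB pages
   (what the 0xFFFFF000 mask does for a 32-bit size_t). */
size_t round_down_to_page(size_t length)
{
    return length & ~(PAGE_SIZE_BYTES - 1);
}

/* Bytes that would be missing if only whole pages were copied. */
size_t bytes_lost(size_t length)
{
    return length - round_down_to_page(length);
}
```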

Robert_I_Intel
Employee

Hi Pavel,

Could you please provide the full minimal reproducer with the kernel that you are running?

Typically, on our systems we strongly recommend allocating memory with _aligned_malloc using 4K alignment and creating a buffer with the CL_MEM_USE_HOST_PTR flag from that aligned allocation: this guarantees "zero copy" behavior, so instead of using clEnqueueReadBuffer you can use clEnqueueMapBuffer, and no memory is copied around.
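A rough sketch of the allocation side of that pattern in portable C. It assumes, for illustration only, that rounding the size up to whole 4 KB pages is acceptable; zc_alloc, zc_padded_size and zc_free are hypothetical names. The OpenCL calls are only described in comments so the block compiles without an OpenCL SDK.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _MSC_VER
#include <malloc.h>
#endif

#define ZC_ALIGN ((size_t)4096)  /* 4K alignment, as recommended above */

/* Round the requested size up to whole 4 KB pages (an assumption made
   here for safety, not a requirement quoted in this thread). */
size_t zc_padded_size(size_t length)
{
    return (length + ZC_ALIGN - 1) & ~(ZC_ALIGN - 1);
}

/* Allocate a 4 KB-aligned host block. The pointer would then be passed to
   clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, size, ptr, &err)
   and accessed via clEnqueueMapBuffer / clEnqueueUnmapMemObject instead of
   clEnqueueReadBuffer. */
void *zc_alloc(size_t length)
{
    size_t size = zc_padded_size(length);
#ifdef _MSC_VER
    return _aligned_malloc(size, ZC_ALIGN);
#else
    return aligned_alloc(ZC_ALIGN, size);  /* C11: size is a multiple of the alignment */
#endif
}

/* Release memory obtained from zc_alloc with the matching deallocator. */
void zc_free(void *p)
{
#ifdef _MSC_VER
    _aligned_free(p);
#else
    free(p);
#endif
}
```

For the 76800-byte case this allocates 77824 bytes (19 pages); the buffer itself can still be created with the original length.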

BTW, is there any reason not to use that pattern, and to use VirtualAlloc and clEnqueueReadBuffer instead?

Thanks!

Robert

Pavel_S_1
Beginner

I've attached a cut-down version of my code that reproduces the bug. Compile it with MSVS2013.

Additional information: the bug is present only if the destination address lies in the upper half of the 32-bit address space (at or above 2 GB). To get such an address, both the /LARGEADDRESSAWARE linker flag and the MEM_TOP_DOWN allocation flag are essential. If either one is missing, the memory is allocated below 2 GB and everything works fine.
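Since the trigger is where the destination lands rather than the length alone, an application could log or branch on the buffer address. A minimal sketch, assuming the 2 GB boundary (0x80000000) described above; the helper names are hypothetical, and the functions take the address as a uintptr_t (e.g. (uintptr_t)dst) so they can be exercised without actually allocating high memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Lowest address in the upper half of a 32-bit address space (2 GB). */
#define UPPER_HALF_START ((uintptr_t)0x80000000u)

/* True if a single address lies at or above 2 GB. */
bool addr_in_upper_half_32(uintptr_t addr)
{
    return addr >= UPPER_HALF_START;
}

/* True if any byte of [start, start + length) lies at or above 2 GB.
   The range is contiguous, so checking its last byte is enough. */
bool range_touches_upper_half_32(uintptr_t start, size_t length)
{
    if (length == 0)
        return false;
    return addr_in_upper_half_32(start + (length - 1));
}
```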

The code is not optimized for Intel GPUs in any way; before doing that, I need to be sure that it works there correctly. My previous attempt to run the same code on an earlier driver resulted in a driver hang, with TDR recovery several seconds later, so I gave up on them for the time being.

Robert_I_Intel
Employee

Hi Pavel,

I could reproduce the behavior you described. However, the fix is very simple: build your program using the x64 configuration, and then you can use the /LARGEADDRESSAWARE flag.

Pavel_S_1
Beginner

That 'fix' is not a fix at all: the code has to interact with plenty of handwritten assembly code, so 32-bit and /LARGEADDRESSAWARE are a must. A better (and still current) solution is blacklisting Intel GPUs until they can do the processing correctly.

Robert_I_Intel
Employee

Pavel,

I contacted our driver folks to see if they can figure out what's wrong with 32-bit mode.

Will keep you posted.

Robert

Pavel_S_1
Beginner

Thanks, I'm looking forward to better drivers.

Robert_I_Intel
Employee

Hi Pavel,

The driver architect has informed me that this issue is fixed. It will probably be another two to six months until a driver with the fix is released, since the last driver update came out at the beginning of June.

Sorry for the long wait! And thank you again for reporting this issue.

Pavel_S_1
Beginner

I'm glad the issue is finally fixed. Too bad it will take so long to actually see it.

I think Intel should adopt regular beta driver builds to push out new versions faster and get more feedback from people who are willing to help with testing.

Pavel_S_1
Beginner

I can confirm that the bug is fixed in driver version 15.36.24.64.4264.

Too bad anything after 15.36.21.64.4222 is unusable because it can't duplicate two displays. (I haven't filed a bug report for that yet; I will once I gather enough information.)
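If the blacklist is eventually relaxed, one option is to gate Intel GPUs on driver version instead of excluding them outright. This sketch is hypothetical: it assumes the application already has the vendor string (e.g. from clGetDeviceInfo with CL_DEVICE_VENDOR), whether the device is a GPU (CL_DEVICE_TYPE), and a dotted driver version string (e.g. CL_DRIVER_VERSION, whose format may not match the display versions quoted in this thread). The threshold is the first version confirmed above to contain the fix; it says nothing about the display-duplication problem.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Compare two dotted numeric version strings component by component.
   Returns <0, 0 or >0; missing trailing components count as 0. */
int compare_dotted_versions(const char *a, const char *b)
{
    while (*a != '\0' || *b != '\0') {
        char *end_a;
        char *end_b;
        unsigned long va = strtoul(a, &end_a, 10);  /* "" parses as 0 */
        unsigned long vb = strtoul(b, &end_b, 10);
        if (va != vb)
            return va < vb ? -1 : 1;
        if (end_a == a && end_b == b)
            break;  /* neither side starts with a digit: stop instead of looping */
        a = (*end_a == '.') ? end_a + 1 : end_a;
        b = (*end_b == '.') ? end_b + 1 : end_b;
    }
    return 0;
}

/* Hypothetical threshold: first driver reported in this thread with the fix. */
#define FIRST_FIXED_DRIVER "15.36.24.64.4264"

/* Hypothetical policy: accept every device except Intel GPUs whose driver
   predates the fixed version. */
bool device_is_usable(const char *vendor, bool is_gpu, const char *driver_version)
{
    bool intel_gpu = is_gpu && vendor != NULL && strstr(vendor, "Intel") != NULL;
    if (!intel_gpu)
        return true;
    return driver_version != NULL &&
           compare_dotted_versions(driver_version, FIRST_FIXED_DRIVER) >= 0;
}
```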

Pavel_S_1
Beginner

Status update: driver version 15.36.26.4294 is finally a version we can switch to, as it has the fix for this issue and no other problems for us.

Thank you again, Intel, for actually fixing driver bugs. It's really pleasant to see a company that cares about a correct implementation and not just about major games and programs working fine.
