In certain cases clEnqueueReadBuffer doesn't transfer all the requested data when executed on HD4600. System: Win7 x64, driver version 15.36.19.64.4170, 32-bit application.
It seems that when the destination buffer is page-aligned and the transfer length is not a multiple of 4 KB, only the part of the data that is a multiple of 4 KB is transferred. Sample code:
size_t length = 76800;
char *dst = VirtualAlloc(NULL, length, MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN, PAGE_READWRITE);
cl_mem buf = clCreateBuffer(context, CL_MEM_READ_WRITE, length, NULL, NULL);
clEnqueueReadBuffer(command_queue, buf, CL_TRUE, 0, length, dst, 0, NULL, NULL);
The problem is present only on the Intel GPU; the same code produces correct results when executed on AMD and Nvidia GPUs. The same incorrect behavior can be replicated on correct implementations by calling:
clEnqueueReadBuffer(command_queue, buf, CL_TRUE, 0, length & 0xFFFFF000, dst, 0, NULL, NULL);
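To make the arithmetic behind `length & 0xFFFFF000` concrete, here is a minimal sketch of how many bytes would go missing for a given length if the transfer is truncated to a multiple of 4 KB, as described above. The helper names `round_down_to_page` and `missing_tail_bytes` are hypothetical and are not part of OpenCL or the original code.

```c
#include <stddef.h>

/* 4 KB page size, as assumed in the post above. */
#define PAGE_SIZE_BYTES 4096u

/* Hypothetical helper: round a length down to a multiple of 4 KB.
   For 32-bit lengths this matches `length & 0xFFFFF000`. */
static size_t round_down_to_page(size_t length)
{
    return length & ~((size_t)PAGE_SIZE_BYTES - 1);
}

/* Hypothetical helper: number of bytes at the end of the destination
   buffer that would be left unwritten by a truncated transfer. */
static size_t missing_tail_bytes(size_t length)
{
    return length - round_down_to_page(length);
}
```

For the length used in the sample (76800 bytes), `round_down_to_page` gives 73728 (18 full pages), so the last 3072 bytes of the destination would not be written.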
Hi Pavel,
Could you please provide the full minimal reproducer with the kernel that you are running?
Typically, on our systems we highly recommend allocating memory using _aligned_malloc w/ 4K alignment and creating the buffer from that aligned allocation w/ the CL_MEM_USE_HOST_PTR flag. This guarantees "zero copy" behavior, so instead of using clEnqueueReadBuffer you could use clEnqueueMapBuffer, and no memory is copied around.
BTW, is there any reason not to use that pattern and use VirtualAlloc and clEnqueueReadBuffer instead?
Thanks!
Robert
I've attached a cut-down version of my code that reproduces the bug. Compile with MSVS2013.
Additional information: the bug is only present if the destination address lies in the upper half of the 32-bit address space. For that, the /LARGEADDRESSAWARE linker flag and the MEM_TOP_DOWN allocation flag are both essential. If either one is not specified, the memory is allocated below 2 GB and everything works fine.
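The conditions reported so far in this thread can be summarized as a small predicate. This is only a sketch that combines the observations above (page-aligned destination, length not a multiple of 4 KB, destination address at or above 2 GB in a 32-bit process); it is not a confirmed description of the driver's behavior, and the function name `matches_reported_conditions` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate: true when a read matches all three conditions
   reported in this thread. dst_addr is the destination pointer value in
   a 32-bit process; length is the requested transfer size in bytes. */
static bool matches_reported_conditions(uint32_t dst_addr, size_t length)
{
    const uint32_t page_mask = 0xFFFu;                  /* 4 KB pages assumed */

    bool page_aligned      = (dst_addr & page_mask) == 0;
    bool partial_last_page = (length & page_mask) != 0;
    bool above_2gb         = dst_addr >= 0x80000000u;   /* upper half of 32-bit space */

    return page_aligned && partial_last_page && above_2gb;
}
```

A check like this could be used in test code to decide which transfers are worth verifying byte by byte; the addresses passed in would come from casting the real destination pointer to an integer.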
The code is not optimized for Intel GPUs in any way; before doing that, I need to be sure that my code works properly there. My previous attempt to run the same code on an earlier driver resulted in a driver hang with TDR recovery several seconds later, so I just gave up on Intel GPUs for the time being.
Hi Pavel,
I could reproduce the behavior you described. However, the fix is very simple: build your program using the x64 configuration, and then you can use the /LARGEADDRESSAWARE flag.
That 'fix' is not a fix at all: the code has to interact with plenty of handwritten assembler code, so 32-bit and /LARGEADDRESSAWARE are a must. A better (and still current) solution is blacklisting Intel GPUs until they can do the processing correctly.
Pavel,
I contacted our driver folks to see if they can figure out what's wrong w/ 32-bit mode.
Will keep you posted.
Robert
Thanks, I'm looking forward to better drivers.
Hi Pavel,
The driver architect has informed me that this issue is fixed. It will probably be another two to six months until a driver with the fix is actually released, since a driver update was just released at the beginning of June.
Sorry for the long wait! And thank you again for reporting this issue.
I'm glad the issue is finally fixed. Too bad it will take so much time to actually see the fix.
I think Intel should adopt regular beta driver builds to push new versions out faster and get more feedback from people who are willing to help with testing.
I can confirm that the bug is fixed in driver version 15.36.24.64.4264.
Too bad anything after 15.36.21.64.4222 is unusable because it can't duplicate two displays. (I haven't filed a bug report for that yet; I will do so when I have gathered enough information.)
Status update: driver version 15.36.26.4294 is finally the version we can switch to, as it has the fix for the issue and no other problems for us.
Thank you (Intel) again for actually fixing driver bugs. It's really pleasant to see that the company cares about a proper implementation and not just about major games/programs working fine.
