- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi, everybody!
I have found 2 problems with OpenCL kernel compilation on Intel HD4600:
1) Kernel Builder can't build some of my kernels and reports this:
OpenCL Intel(R) Graphics device was found! Device name: Intel(R) HD Graphics 4600 Device version: OpenCL 1.2 Device vendor: Intel(R) Corporation Device profile: FULL_PROFILE fcl build 1 succeeded. Build failed!
If i reduce kernel code, then build would finish successfully. But the whole kernel can't be build. And my program also can't do this.
At the same time this kernel can be compiled by Kernel Builder. I can't share my kernel code, because it is proprietary. And while trying to write test kernel to reproduce this problem, I found another one:
2) Kernel Builder hangs up while building following kernel:
__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST; #define ROTR( x, n ) ( ( (x) >> (n) ) | ( (x) << ( 32 - (n) ) ) ) #define SHR( x, n ) ( (x) >> (n) ) #define Ch( x, y, z ) ( ( (x) & (y) ) ^ ( ~(x) & (z) ) ) #define Maj( x, y, z ) ( ( (x) & (y) ) ^ ( (x) & (z) ) ^ ( (y) & (z) ) ) #define SIGMA0( x ) ( ROTR( (x), 2 ) ^ ROTR( (x), 13 ) ^ ROTR( (x), 22 ) ) #define SIGMA1( x ) ( ROTR( (x), 6 ) ^ ROTR( (x), 11 ) ^ ROTR( (x), 25 ) ) #define sigma0( x ) ( ROTR( (x), 7 ) ^ ROTR( (x), 18 ) ^ SHR( (x), 3 ) ) #define sigma1( x ) ( ROTR( (x), 17 ) ^ ROTR( (x), 19 ) ^ SHR( (x), 10 ) ) #define ROUND( A, B, C, D, E, F, G, H, W, k ) { \ (H) += SIGMA1( (E) ) + Ch( (E), (F), (G) ) + (k) + (W); \ (D) += (H); \ (H) += SIGMA0( (A) ) + Maj( (A), (B), (C) ); } void test( __read_only image2d_t image, uint word, uint* digest ) { uint4 storage = read_imageui( image, sampler, (int2)( get_local_id( 0 ), get_group_id( 0 ) ) ); uint a = storage.x; uint b = storage.y; uint c = storage.z; uint d = storage.w; uint e = 5; uint f = 6; uint g = 7; uint h = 8; uint w0 = word; uint w1 = 1; uint w2 = 0; uint w3 = 0; uint w4 = 0; uint w5 = 0; uint w6 = 0; uint w7 = 0; uint w8 = 0; uint w9 = 0; uint wA = 0; uint wB = 0; uint wC = 0; uint wD = 0; uint wE = 0; uint wF = 1; ROUND( a, b, c, d, e, f, g, h, w0, 1 ); ROUND( h, a, b, c, d, e, f, g, w1, 2 ); ROUND( g, h, a, b, c, d, e, f, w2, 3 ); ROUND( f, g, h, a, b, c, d, e, w3, 4 ); ROUND( e, f, g, h, a, b, c, d, w4, 5 ); ROUND( d, e, f, g, h, a, b, c, w5, 6 ); ROUND( c, d, e, f, g, h, a, b, w6, 7 ); ROUND( b, c, d, e, f, g, h, a, w7, 8 ); ROUND( a, b, c, d, e, f, g, h, w8, 9 ); ROUND( h, a, b, c, d, e, f, g, w9, 10 ); ROUND( g, h, a, b, c, d, e, f, wA, 11 ); ROUND( f, g, h, a, b, c, d, e, wB, 12 ); ROUND( e, f, g, h, a, b, c, d, wC, 13 ); ROUND( d, e, f, g, h, a, b, c, wD, 14 ); ROUND( c, d, e, f, g, h, a, b, wE, 15 ); ROUND( b, c, d, e, f, g, h, a, wF, 16 ); digest[ 0 ] = 1 + a; digest[ 1 ] = 2 + b; digest[ 2 ] = 3 + c; digest[ 3 ] = 4 + d; digest[ 4 ] = 5 + e; digest[ 5 ] = 6 + f; digest[ 6 ] = 7 + g; digest[ 7 ] = 8 + h; } __kernel void hangup( __read_only image2d_t image, __constant uint* cdata, __global uint* data ) { const uint gid = get_global_id( 0 ); uint digest[ 8 ]; test( image, data[ gid ], digest ); for( uint index = 0; index < 8; ++index ) { data[ gid ] ^= digest[ index ]; } }
And again, this kernel can be compiled by Kernel Builder. I assume hang occurs on link stage.
If I don't use 2D image, then no problems occur while building.
OpenCL driver version is 10.18.10.3652. Any help will be appreciated.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for this report. I've been able to replicate here and the issue has been reported to the development team. Until a kernel builder fix can be implemented, is it reasonable in your case to switch to buffers?
According to the Optimization Guide:
To improve performance on the Intel Processor Graphics, do the following:
- Avoid images, except for irregular access patterns. For example, use buffers when processing in memory (in row-major) order.
- Use buffers for look-up tables
- Use local memory for explicit caching of buffer values
Regards, Jeff
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Just to clarify, are these issues seen only when using the kernel builder tool or also for regular compilation?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Jeffrey Mcallister (Intel) wrote:
Until a kernel builder fix can be implemented, is it reasonable in your case to switch to buffers?
To improve performance on the Intel Processor Graphics, do the following:
- Avoid images, except for irregular access patterns. For example, use buffers when processing in memory (in row-major) order.
- Use buffers for look-up tables
- Use local memory for explicit caching of buffer values
Unfortunately, most of our kernels use images to get benefit from texture (image) cache. It is better to use images rather then global buffers on discrete GPUs, such as Nvidia and AMD GPUs.
Is it not true for Intel GPUs, because they have no texture cache? If it so, then we should write kernels for Intel GPUs from scratch...
Also, I have found, that hang up occurs when building for HD4600, which has OpenCL 1.2 version. And no hang up occurs when building the same kernel for HD4000 (OpenCL 1.1).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Jeffrey Mcallister (Intel) wrote:
Just to clarify, are these issues seen only when using the kernel builder tool or also for regular compilation?
Both Kernel Builder (32/64 bit) and our code with regular compilation via OpenCL.dll have this issue with hang up.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page