I am trying to disable auto-vectorization, so I used vec_type_hint(uchar) as shown below. However, when I check the generated assembly, the kernel is still compiled as SIMD32. Any advice?
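// MODULATE_SHIFT_FACTOR is not defined in this snippet; it is assumed to be
// supplied elsewhere, for example through a -D build option.
__kernel __attribute__((vec_type_hint(uchar))) void modulate_v1_uchar(global const uchar *pSrc, global uchar *pDst)
//__kernel void modulate_v1_uchar(global const uchar *pSrc, global uchar *pDst)
{
    uint idx = get_global_id(0);   // one work-item per byte
    uchar src = pSrc[idx];
    src >>= MODULATE_SHIFT_FACTOR;
    pDst[idx] = src;
}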
You won't be able to disable vectorization on the GPU; that hint is for the CPU. You could use __attribute__((reqd_work_group_size(X, Y, Z))) to change the SIMD width to 8 (8, 1, 1), 16 (16, 1, 1), or 32 (32, 1, 1).
Great! Thanks, Robert.
I actually have a further question: how is the work-group size (e.g., __attribute__((reqd_work_group_size(X, Y, Z)))) related to the SIMD width? Does it mean that work-items from different work-groups cannot be handled on the same thread? Thanks.
Yes. Typically a work-group maps to a single hardware thread, but it can also span multiple threads or even multiple EUs.
See slides 18 through 34 in the presentation above.
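To make the relationship between work-group size and hardware threads a bit more concrete, here is a rough sketch in plain C. It assumes a simplified model in which each hardware thread executes one SIMD-width batch of work-items, so a work-group occupies ceil(work-group size / SIMD width) threads. The helper name hw_threads_per_work_group is hypothetical and is not part of any OpenCL or driver API; actual scheduling is up to the driver and hardware.

```c
#include <assert.h>

/* Simplified model (an assumption, not a driver API): one hardware thread
 * runs simd_width work-items, so a work-group of work_group_size items
 * needs ceil(work_group_size / simd_width) hardware threads. */
unsigned int hw_threads_per_work_group(unsigned int work_group_size,
                                       unsigned int simd_width)
{
    assert(simd_width != 0u);  /* a SIMD width of zero is meaningless */

    /* Integer ceiling division: a partially filled thread still counts. */
    return (work_group_size + simd_width - 1u) / simd_width;
}
```

Under this model, a (16, 1, 1) work-group compiled as SIMD16 fits in a single hardware thread, while a 256-item work-group at SIMD16 would span 16 threads.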