Beginner

Strange behavior when float16 is used, and the meaning of thread idle

Hi OpenCL experts:

    I came across the sentence "Thread dispatch serialization becomes a gating factor when a kernel has insufficient work per a work-item." on page 6 of the paper "Intel® VTune™ Amplifier XE: Getting started with OpenCL™ performance analysis on Intel HD Graphics". I don't understand what it means.
    Today I wrote a kernel that converts a 3-channel image to grayscale. The three channels are stored in three separate memory buffers, so every work-item has to perform three reads to get its data. When I load with vload4() (SIMD4), the EU array is idle 17% of the time, but with vload16() (SIMD16) it is idle 82% of the time. What causes this? Did I miss something?
    These statistics were collected with a 512x512 image. The numbers of work-items are 128x512 and 32x512 respectively, and the local size is set to NULL. Please help, I have no idea what is going on.

 

1 Reply
Moderator

I'm moving this to the OpenCL* forum.
