OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

Strange behavior when float16 is used, and the meaning of thread idle


Hi OpenCL experts:

    I saw the sentence "Thread dispatch serialization becomes a gating factor when a kernel has insufficient work per a work-item." on page 6 of the paper <Intel® VTune™ Amplifier XE: Getting started with OpenCL™ performance HD Graphics OpenCL™ analysis on Intel HD Graphics>. I don't get the point.
    Today I wrote a kernel that converts a 3-channel image to grayscale. The three channels are stored in three separate memory buffers, so every work-item has to perform three reads to use the data. When I use SIMD4 (vload4()) instructions, the idle EU array figure is 17%, whereas with SIMD16 (vload16()) it is 82%. Which factor caused this? Did I miss something?
    These statistics were collected with a 512x512 image; the numbers of work-items are 128x512 and 32x512 respectively, and the local size is set to NULL. Please help me, I don't have any idea what is going on.
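
    To make the two configurations concrete, here is a minimal host-side sketch in plain C of how the work-item counts follow from the vector width. It is not my actual kernel: the function names (work_items_x, gray_work_item, gray_image) are hypothetical, the float pixel type is an assumption, and the gray weights are the common BT.601 luma coefficients, used here only for illustration. Each call to gray_work_item stands in for one work-item that reads the three planes once each, as vload4() or vload16() would.

```c
#include <stddef.h>

/* Work-items needed along x when each one handles vec_width pixels
   (rounded up so a partial group at the row end is still covered). */
static size_t work_items_x(size_t image_width, size_t vec_width)
{
    return (image_width + vec_width - 1) / vec_width;
}

/* Host-side model of ONE work-item: converts `count` consecutive pixels.
   Assumption: BT.601 luma weights; the real kernel may use others. */
static void gray_work_item(const float *r, const float *g, const float *b,
                           float *gray, size_t first, size_t count)
{
    for (size_t i = first; i < first + count; ++i)
        gray[i] = 0.299f * r[i] + 0.587f * g[i] + 0.114f * b[i];
}

/* Model of the whole 2D dispatch: one "work-item" per group of
   vec_width pixels in x, one row of work-items per image row. */
static void gray_image(const float *r, const float *g, const float *b,
                       float *gray, size_t width, size_t height,
                       size_t vec_width)
{
    size_t nx = work_items_x(width, vec_width);
    for (size_t y = 0; y < height; ++y) {
        for (size_t gx = 0; gx < nx; ++gx) {
            size_t x0 = gx * vec_width;
            /* Clamp the last group so it never runs past the row end. */
            size_t count = (width - x0 < vec_width) ? width - x0 : vec_width;
            gray_work_item(r, g, b, gray, y * width + x0, count);
        }
    }
}
```

    For the 512x512 image, work_items_x(512, 4) is 128 and work_items_x(512, 16) is 32, so the vload4() version dispatches 128x512 = 65,536 work-items and the vload16() version dispatches 32x512 = 16,384, a quarter as many, each doing four times the work.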



I'm moving this to the OpenCL* forum.
