OpenCL* for CPU

Impact of shared memory on workgroup scheduling: Ivy Bridge HD 4000 vs. Haswell HD 4600



I have a kernel that is launched with local size = 16 and global size = 256, and each workgroup allocates 32 KB of shared memory.

I ran the kernel on an Ivy Bridge HD 4000 and measured the GPU as idle about 75% of the time, which is what I expected. Each half-slice has 64 KB of shared memory, so only two workgroups (64 KB / 32 KB) can be resident per half-slice. Each workgroup is scheduled on one EU, so at most two EUs are active per half-slice, which gives an idle fraction of 1 - (2 active EUs) / (8 EUs per half-slice) = 0.75.
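The arithmetic above can be written out as a small model. This is only a sketch of my own reasoning, not a description of how the hardware scheduler actually works: the function name and parameters are made up for illustration, and it assumes one resident workgroup occupies exactly one EU.

```python
def slm_limited_idle_fraction(slm_per_half_slice_kb, slm_per_workgroup_kb,
                              eus_per_half_slice):
    """Idle-EU fraction of one half-slice when shared memory caps residency.

    Assumption (from the post, not from hardware documentation): each
    resident workgroup keeps exactly one EU busy.
    """
    # Number of workgroups whose shared-memory allocations fit at once.
    resident_workgroups = slm_per_half_slice_kb // slm_per_workgroup_kb
    # One EU per workgroup, and never more active EUs than exist.
    active_eus = min(resident_workgroups, eus_per_half_slice)
    idle_eus = eus_per_half_slice - active_eus
    return idle_eus / eus_per_half_slice


# Ivy Bridge HD 4000 numbers used in the post: 64 KB per half-slice,
# 32 KB per workgroup, 8 EUs per half-slice.
ivb_idle = slm_limited_idle_fraction(64, 32, 8)
print(f"Predicted idle fraction: {ivb_idle:.2f}")
```

With the numbers from the post this predicts two resident workgroups per half-slice and six idle EUs out of eight.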

However, when I run the same code on a Haswell HD 4600, the idle fraction is only 20%. The HD 4600 has 20 EUs (10 per half-slice). If each workgroup is scheduled on one EU, 20% idle means 4 of the 20 EUs are idle and 16 are active. This suggests that all 16 hardware threads (or workgroups, 256 / 16) are launched at once, no longer constrained by shared memory.
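To make the discrepancy concrete, here is a rough comparison of two hypotheses for the HD 4600. It is a sketch under stated assumptions: I am guessing that Haswell also has 64 KB of shared memory per half-slice (carried over from the Ivy Bridge figure, not confirmed), and that one workgroup occupies one EU. All names are illustrative.

```python
TOTAL_EUS = 20
EUS_PER_HALF_SLICE = 10
HALF_SLICES = TOTAL_EUS // EUS_PER_HALF_SLICE

GLOBAL_SIZE = 256
LOCAL_SIZE = 16
WORKGROUPS = GLOBAL_SIZE // LOCAL_SIZE  # 16 workgroups in total

SLM_PER_HALF_SLICE_KB = 64  # ASSUMPTION: same as the Ivy Bridge figure
SLM_PER_WORKGROUP_KB = 32


def idle_fraction(active_eus, total_eus):
    """Fraction of EUs with no workgroup, assuming one workgroup per EU."""
    return (total_eus - active_eus) / total_eus


# Hypothesis A: shared memory still caps resident workgroups per half-slice.
slm_cap = HALF_SLICES * (SLM_PER_HALF_SLICE_KB // SLM_PER_WORKGROUP_KB)
active_a = min(WORKGROUPS, slm_cap, TOTAL_EUS)
idle_a = idle_fraction(active_a, TOTAL_EUS)

# Hypothesis B: no shared-memory cap, so every workgroup gets its own EU.
active_b = min(WORKGROUPS, TOTAL_EUS)
idle_b = idle_fraction(active_b, TOTAL_EUS)

print(f"SLM-limited:   {active_a} active EUs, idle fraction {idle_a:.2f}")
print(f"Not limited:   {active_b} active EUs, idle fraction {idle_b:.2f}")
```

Under these assumptions the shared-memory-limited model predicts 4 active EUs (80% idle), while the unconstrained model predicts 16 active EUs (20% idle). Only the second matches what I measured.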

So my question is: what changed in Haswell that allows it to launch workgroups without being constrained by shared memory usage?


