OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

parallelism degree?


Dear all,

I am trying to determine how many instructions run in parallel in an OpenCL kernel, given the kernel's launch parameters. For instance, on a 4-core Xeon I launch 8 work-groups of 32 work-items each (one work-group per hardware thread). The overall parallelism degree is therefore 8 × the parallelism degree of a single work-group.

What is the parallelism degree of a work-group? I know that the code is scalarized and then vectorized to fit the width of the xmm registers, and that the pipeline mechanism must also be taken into account.

Any idea?

Regards, Michael


Thanks for your question, Michael.

You are definitely right. Each work-group is implemented as a loop over its work-items. The loop is then vectorized (unrolled across work-items) to the "float" SIMD width of the CPU, so double-precision operations need 2 SIMD registers for each argument. This gives a parallelism of 8 on today's CPUs (4 for doubles), since a 256-bit register holds 8 floats or 4 doubles.
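To make the "loop over work-items, vectorized to the SIMD width" idea concrete, here is a minimal conceptual sketch. It is not what the Intel OpenCL compiler actually emits; it only models the structure. The kernel body, `kernel_body`, `run_work_group`, and the width of 8 lanes are all hypothetical choices for illustration, and the inner list comprehension over lanes stands in for what would be one vector instruction on real hardware.

```python
# Assumption: 8 float lanes per SIMD instruction (e.g. a 256-bit register).
SIMD_WIDTH = 8


def kernel_body(a: float, b: float) -> float:
    """Hypothetical per-work-item kernel code: out = a * b + 1."""
    return a * b + 1.0


def run_work_group(a, b, simd_width=SIMD_WIDTH):
    """Run one work-group as a loop over its work-items.

    Each outer iteration handles `simd_width` work-items at once; the
    comprehension over lanes stands in for a single vector instruction.
    Returns the results and the number of vector iterations executed.
    """
    local_size = len(a)
    out = [0.0] * local_size
    vector_iters = 0
    i = 0
    while i + simd_width <= local_size:
        out[i:i + simd_width] = [kernel_body(a[i + lane], b[i + lane])
                                 for lane in range(simd_width)]
        vector_iters += 1
        i += simd_width
    # Scalar remainder when local_size is not a multiple of simd_width.
    for j in range(i, local_size):
        out[j] = kernel_body(a[j], b[j])
    return out, vector_iters


if __name__ == "__main__":
    a = [float(i) for i in range(32)]   # 32 work-items, as in the question
    b = [2.0] * 32
    out, iters = run_work_group(a, b)
    print(iters, out[31])               # 4 vector iterations, 63.0
```

With 32 work-items and 8 lanes, the work-group runs as 4 vector iterations; passing `simd_width=4` models the double-precision case and gives 8 iterations.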

In addition, each CPU core can issue several different instructions per cycle. The degree of instruction-level parallelism depends on the CPU model (generation), on the mix of instructions ready for execution in any given cycle, and on the CPU resources available in that clock cycle. However, the OpenCL compiler does not expose this parallelism through additional loop unrolling.
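Putting the numbers from the question and the answer together, a rough estimate of the data parallelism is (work-groups running at once) × (SIMD lanes per instruction). The sketch below assumes one work-group per hardware thread, a 4-core Xeon exposing 8 hardware threads (Hyper-Threading assumed), and 256-bit SIMD registers. The function names are hypothetical. It deliberately leaves out instruction-level parallelism, which depends on the CPU model and is not exposed by extra unrolling. It also treats the figure as an upper bound, because hardware threads on the same core share execution units.

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements one SIMD register holds."""
    return register_bits // element_bits


def estimated_parallelism(num_work_groups: int, hw_threads: int,
                          register_bits: int, element_bits: int) -> int:
    """Rough upper bound on work-items processed at once.

    Assumes one work-group per hardware thread and ignores
    instruction-level parallelism (CPU-model dependent).
    """
    concurrent_groups = min(num_work_groups, hw_threads)
    return concurrent_groups * simd_lanes(register_bits, element_bits)


if __name__ == "__main__":
    # 8 work-groups on 8 hardware threads, 256-bit registers (assumed).
    print(estimated_parallelism(8, 8, 256, 32))  # floats: 8 * 8 = 64
    print(estimated_parallelism(8, 8, 256, 64))  # doubles: 8 * 4 = 32
```

Swapping in `register_bits=128` models the xmm (SSE) width mentioned in the question, which halves the lane count.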

