Dear all,
I am trying to work out how many instructions run in parallel in an OpenCL kernel, as a function of the kernel launch parameters. For instance, on a 4-core Xeon, I launch 8 workgroups of 32 threads (1 workgroup per HW thread). So the overall degree of parallelism is 8 x the degree of parallelism of one workgroup.
What is the degree of parallelism of a workgroup? I know that the code is scalarized and then vectorized to fit the width of the xmm registers, and that the pipeline mechanism must also be taken into account.
Any idea?
Regards, Michael
Thanks Michael for your question.
You are definitely right. Each workgroup is implemented as a loop over its work-items. That loop is then unrolled to the "float" SIMD width of the CPU, so double-precision operations need 2 SIMD registers for each argument. This gives a parallelism of 8 on today's CPUs (4 for doubles).
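To make the loop picture concrete, here is a toy model in Python. It is not the actual runtime: `lanes`, `run_workgroup` and the 256-bit register width are assumptions chosen only so that the numbers match the "8 for floats, 4 for doubles" figures above. It counts how many vector-wide steps one workgroup needs when each vector instruction covers one register's worth of work-items.

```python
# Toy model only: all names and the 256-bit width are illustrative assumptions,
# not part of any OpenCL runtime.
SIMD_BITS = 256  # assumed register width; 256 / 32 gives the 8 float lanes quoted above


def lanes(element_bits, simd_bits=SIMD_BITS):
    """Number of elements of the given size that fit in one SIMD register."""
    return simd_bits // element_bits


def run_workgroup(kernel, local_size, element_bits=32):
    """Run `kernel` for every work-item of one workgroup.

    The outer loop stands for the per-workgroup loop over work-items; each
    outer iteration models one vector instruction covering `width` work-items.
    Returns the per-work-item results and the number of vector steps taken.
    """
    width = lanes(element_bits)
    results = [None] * local_size
    steps = 0
    for base in range(0, local_size, width):
        # Work-items base .. base+width-1 are processed together in this step.
        for item in range(base, min(base + width, local_size)):
            results[item] = kernel(item)
        steps += 1
    return results, steps


if __name__ == "__main__":
    _, float_steps = run_workgroup(lambda i: i * 2, 32, element_bits=32)
    _, double_steps = run_workgroup(lambda i: i * 2, 32, element_bits=64)
    print(float_steps, double_steps)
```

With the 32 work-items per workgroup from the question, the float case takes 4 vector steps and the double case takes 8 under these assumptions.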
In addition, each CPU core can issue multiple different instructions per cycle. The level of instruction-level parallelism depends on the CPU model (generation), on the combination of instructions ready to execute in any given cycle, and on the CPU resources available in that clock cycle. However, the OpenCL compiler wouldn't expose such parallelism by additional loop unrolling.
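Putting the thread's figures together gives a rough count of work-items in flight. This is a back-of-the-envelope sketch: `data_parallel_degree` is a hypothetical helper, the 256-bit register width is an assumption consistent with 8 float lanes, and instruction-level parallelism is deliberately left out because it varies with the CPU model and the instruction mix. Hardware threads that share a core also share its execution units, so the result is a count of work-items being processed together, not a throughput guarantee.

```python
# Hypothetical helper; the 256-bit width is an assumption, not a measured value.
def data_parallel_degree(hw_threads, simd_bits, element_bits):
    """Workgroups running at once times SIMD lanes per vector instruction."""
    return hw_threads * (simd_bits // element_bits)


if __name__ == "__main__":
    # 8 workgroups (one per HW thread), as in the question.
    print(data_parallel_degree(8, 256, 32))  # floats: 8 * 8 = 64
    print(data_parallel_degree(8, 256, 64))  # doubles: 8 * 4 = 32
```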
Arik