How many cores does clEnqueueTask use?

evk8888 · 02-07-2012

Hello guys,

I am using OpenCL 1.1 with a 24-core Intel Xeon processor.

When I enqueue a task for execution using clEnqueueTask(), is it supposed to use a single core for execution?

Thanks

Evgeny_F_Intel · 02-07-2012

Hi,

Yes, it will run in a single thread.

According to the OpenCL spec, a task has a global size of (1,1,1), which means a single work-item.

Thanks,
Evgeny
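
To make the (1,1,1) point concrete: the OpenCL 1.1 spec describes clEnqueueTask as equivalent to clEnqueueNDRangeKernel with work_dim = 1, global_work_size[0] = 1 and local_work_size[0] = 1. The sketch below is plain C, not OpenCL host code. It only models how many work-items an NDRange contains, and `total_work_items` and `print_task_vs_ndrange` are hypothetical helpers, not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not an OpenCL call): the number of work-items in an
 * NDRange is the product of its global sizes over work_dim dimensions. */
size_t total_work_items(unsigned work_dim, const size_t *global_size)
{
    assert(work_dim >= 1 && work_dim <= 3);  /* OpenCL 1.1 allows 1 to 3 dimensions */
    size_t total = 1;
    for (unsigned d = 0; d < work_dim; ++d)
        total *= global_size[d];
    return total;
}

/* Prints the work-item count of a task next to that of a wider 1-D range. */
void print_task_vs_ndrange(void)
{
    const size_t task_range[3] = {1, 1, 1};  /* what a task amounts to */
    const size_t wide_range[1] = {24};       /* assumed: one work-item per core */
    printf("task: %zu work-item(s)\n", total_work_items(3, task_range));
    printf("1-D range of 24: %zu work-item(s)\n", total_work_items(1, wide_range));
}
```

With a single work-item there is nothing to distribute, so a task can occupy at most one hardware thread. To spread a kernel across the 24 cores, it would need to be enqueued as an NDRange with more work-items.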

evk8888 · 02-07-2012

Hello,

Thanks for your reply. I have another question.

Say I use device fission to split a 4-core machine into 4 sub-devices, each with 1 core.

I create a separate queue for each sub-device.

For example:

Is it possible to use clEnqueueReadBuffer() on a cl_mem buffer in the device using the queue of sub-device 1, 2, 3 or 4, irrespective of which sub-device the kernel was executed on?

Or is it possible to have a global queue for data transfers to/from the device and separate queues for the sub-devices to execute tasks?

Will this work?

Thanks a lot.

Evgeny_F_Intel · 02-07-2012

Theoretically it should.
Be aware that, as stated in the Release Notes, device fission is experimental and you might see inconsistent results.

We would be glad to hear your feedback about your experience with this feature.
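
For the 4-core example in the question, an equal partition is requested in cl_ext_device_fission by passing clCreateSubDevicesEXT a property list of the form { CL_DEVICE_PARTITION_EQUALLY_EXT, n, CL_PROPERTIES_LIST_END_EXT }. As I read the extension spec, compute units left over when n does not divide the total are simply not used. The plain-C sketch below only models that arithmetic. `equal_partition_count` and `leftover_units` are hypothetical helpers, not OpenCL functions, and returning 0 for an unsatisfiable request is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical helper (not an OpenCL call): number of sub-devices that an
 * equal partition with n compute units each would yield. Returns 0 when the
 * request cannot be satisfied (assumption of this sketch). */
unsigned equal_partition_count(unsigned compute_units, unsigned n)
{
    if (n == 0 || n > compute_units)
        return 0;
    return compute_units / n;
}

/* Compute units that stay unassigned after that partition. */
unsigned leftover_units(unsigned compute_units, unsigned n)
{
    unsigned count = equal_partition_count(compute_units, n);
    assert(count * n <= compute_units);  /* cannot hand out more than exist */
    return compute_units - count * n;
}
```

For 4 compute units and n = 1 this gives 4 sub-devices with nothing left over, which matches the setup described in the question. On the 24-core machine, n = 5 would give 4 sub-devices and leave 4 cores idle.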

Jim_Vaughn · 02-07-2012

Hi,

If you create different queues for each sub-device, you will have to copy the memory to each "device", giving you separate memory on each device. Also, clEnqueueReadBuffer() takes the queue, program and context, so for all intents and purposes they are separate memory. (right?)

To be fair, the spec for the cl_ext_device_fission extension is very low on information on what I see as a complex subject.

Evgeny_F_Intel · 02-07-2012

Thanks for the good question.

In the Intel implementation, sub-devices share the memory resources of the parent device. The exception is NUMA-aware systems, where the implementation may try to place memory objects on the appropriate NUMA node.

Sub-devices use separate execution units; in the CPU device those are different HW threads.

According to the spec, programs should be compiled separately for each sub-device; however, an implementation may have a single program for all sub-devices of a parent device.

Evgeny