Beginner

How many cores does clEnqueueTask use?

Hello guys,
I am using OpenCL 1.1 on a 24-core Intel Xeon processor.
When I enqueue a task for execution using clEnqueueTask(), is it supposed to use a single core?
Thanks
5 Replies
Employee

Hi,

Yes, it will run in a single thread.

According to the OpenCL spec, a task has a global size of (1,1,1), which means a single work-item.

Thanks,
Evgeny
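
To make the reply above concrete: the OpenCL 1.1 spec describes clEnqueueTask as equivalent to clEnqueueNDRangeKernel with work_dim = 1 and a global and local work size of 1. The sketch below is a plain-C model of that rule, not OpenCL host code; the function names ndrange_work_items and task_work_items are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C model (no OpenCL calls): the number of work-items in an NDRange
 * is the product of the global sizes over its work_dim dimensions.
 * The parameter names mirror those of clEnqueueNDRangeKernel. */
size_t ndrange_work_items(unsigned work_dim, const size_t *global_work_size)
{
    assert(work_dim >= 1 && work_dim <= 3);
    size_t total = 1;
    for (unsigned d = 0; d < work_dim; ++d)
        total *= global_work_size[d];
    return total;
}

/* A task is a 1-D NDRange whose global size is 1, so it always
 * consists of exactly one work-item. */
size_t task_work_items(void)
{
    const size_t global_work_size[1] = { 1 };
    return ndrange_work_items(1, global_work_size);
}
```

In this model a task always yields one work-item, which is why it runs in a single thread. A kernel that should spread across several cores would instead be enqueued with clEnqueueNDRangeKernel and a larger global size.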

Beginner

Hello,
Thanks for your reply. I have another question.
Suppose I use device fission to split a 4-core machine into 4 sub-devices with 1 core each,
and I create a separate queue for each sub-device.
For example,
is it possible to call clEnqueueReadBuffer() on a cl_mem buffer using the queue of sub-device 1, 2, 3, or 4, irrespective of which sub-device executed the kernel?
Or is it possible to have one global queue for data transfers to/from the device and separate queues for the sub-devices to execute tasks?
Will this work?
Thanks a lot.
Employee

Theoretically it should.
Be aware that, as stated in the Release Notes, device fission is experimental and you might see inconsistent results.

We would be glad to hear your feedback about your experience with this feature.

Beginner

Hi,
If you create different queues for each sub-device, you will have to copy the memory to each "device", giving you separate memory on each device. Also, clEnqueueReadBuffer() takes the queue, program and context, so for all intents and purposes they are separate memory. (Right?)
To be fair, the spec for the cl_ext_device_fission extension is very low on information about what I see as a complex subject.
Employee

Thanks for the good question.

In the Intel implementation, sub-devices share the memory resources of the parent device. The exception is NUMA-aware systems, where the implementation may try to place memory objects on the appropriate NUMA node.

Sub-devices use separate execution units; on the CPU device, those are different hardware threads.

According to the spec, programs should be compiled separately for each sub-device; however, an implementation may have a single program for all sub-devices of a parent device.

Evgeny
