OpenCL* for CPU

Fully utilizing a multiprocessor system?

qinq
Beginner
195 Views
Hello

I have a dual Xeon system (each CPU is quad-core). With Intel OpenCL I can only use a single processor at a time: although the query functions report 8 compute units, at most 4 are actually used per kernel invocation.

Any suggestions on how to use both processors simultaneously for the same kernel?
10 Replies
Doron_S_Intel
Employee
Hello,

Could you elaborate a bit? The behaviour you're describing isn't what's expected on such machines. How many device IDs are returned by the call to clGetDeviceIDs with the Intel platform ID? (expected one) How do you measure utilization? (the best way is probably to use Intel GPA)

Thanks,
Doron Singer
qinq
Beginner
I'm very sorry for the late response (the forum doesn't send an email alert for replies).

I get 1 device with 8 compute units (on a Windows 7 x64 machine).
I measure utilization through Windows Task Manager (maximum CPU usage is 50%, i.e. 4 processors used).
Also, there is absolutely no performance change between using setProcesserAffinity with the first 4 processors and with any greater number (here 5-8).
Doron_S_Intel
Employee
Thanks for your reply. In order for us to try and reproduce this issue, could you provide as accurate a spec of your HW setup as possible? Windows 7 version (enterprise/ultimate/etc), which CPU, etc?

Another thing you might want to try (though it requires a bit of work) is to use the device fission extension (clCreateSubDevicesEXT) to create two device IDs identifying the two NUMA nodes in your machine, and to submit jobs to both of them (via two command queues) simultaneously. Does this break the 50% utilization barrier?
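A rough sketch of how the work for a single kernel could be divided between the two sub-devices, assuming a 1D range split into contiguous chunks, one per command queue. The `work_chunk` type and the `split_range` helper are hypothetical names used for illustration and are not part of OpenCL. The resulting offset and size would be passed as `global_work_offset` and `global_work_size` to `clEnqueueNDRangeKernel` on each queue (this needs OpenCL 1.1 or later; in OpenCL 1.0 the offset must be NULL, so it would have to be passed as a kernel argument instead).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: one contiguous slice of a 1D global range. */
typedef struct {
    size_t offset; /* first global work-item index of the slice */
    size_t size;   /* number of work-items in the slice */
} work_chunk;

/* Split `global_size` work-items into `parts` contiguous chunks and
 * return chunk number `index`. The first (global_size % parts) chunks
 * get one extra work-item, so chunk sizes differ by at most one. */
static work_chunk split_range(size_t global_size, size_t parts, size_t index)
{
    assert(parts > 0 && index < parts);

    size_t base  = global_size / parts;
    size_t extra = global_size % parts;

    work_chunk c;
    c.size   = base + (index < extra ? 1 : 0);
    c.offset = index * base + (index < extra ? index : extra);
    return c;
}
```

With two NUMA nodes, `split_range(n, 2, 0)` and `split_range(n, 2, 1)` would give the ranges for the first and second queue. If an explicit local work size is used, the chunk sizes would probably need to be rounded to a multiple of it.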

Thanks,
Doron Singer
qinq
Beginner
Windows 7 Ultimate x64
Dual Intel Xeon E5500
12 GB RAM
(a Dell Precision system)

I also tried testing on another system, with 4 Xeon processors. Initially it ran Windows Server 2003, but the Intel OpenCL SDK doesn't work on it. Ironically, the system also refuses to install Windows 7 x64 (a bug that Microsoft knows about but hasn't fixed yet). Lastly, I tried 32-bit Windows 7 Professional: it worked but recognized only 2 processors (a limitation imposed by Microsoft), and it suffered from the same 50% usage problem, this time on a 32-bit system.
Another question: are there any plans for Intel to support Windows Server?

I will try the fission extension as soon as possible (though even if it works, I need to use the processors as a single device, not two).
Uri_L_Intel
Employee
Hello,
Our SDK is designed to work on any Windows version based on NT 6.0 and above. That includes Windows Vista, Windows 7 and Windows Server 2008.
Windows Server 2003 is based on NT version 5.2 (same as Windows XP x64), which is not supported by this version of the SDK.
For more information on our supported platforms, please visit the product's release notes page on our website: http://software.intel.com/en-us/articles/opencl-release-notes/

Thanks,
Uri Levy

Doron_S_Intel
Employee
Hello again,

So far we've been unable to reproduce this issue, admittedly on somewhat different hardware setups. Could you try and provide us with a reproduction so we can ensure we're trying the right thing to reproduce the issue?

Thanks,
Doron Singer
loriordan
Beginner
Hi, just a thought on the issue: Does your specific Xeon CPU support Hyperthreading (HT)? It may be that the SDK is not using the HT ability of the CPU and running on the physical cores only (of which there may be only 4). As far as I understand HT should report double the number of physical cores on the system. If I am wrong, then disregard this comment.
Doron_S_Intel
Employee
Hello Lee,

It's a good thought, but the SDK is implemented to take advantage of all Intel CPU features, including Hyperthreading technology. Full utilization is expected on supported processors, some of which support Hyperthreading.

Thanks,
Doron
Maxim_S_Intel
Employee
Hi, a utilization of 50% might be related to the fact that your app is doing some other work beyond computing with OCL, e.g. reading data from files, rendering with DX, etc. In those and many other cases an app can be waiting on some OS/driver sync routine for a significant portion of the time.

With the conventional Windows Task Manager that you are currently using to check CPU utilization, you can see how much time your app spends deep in the OS via "View->Show Kernel Times".

Also, if your tasks are really lightweight (e.g. they utilize the CPUs at 100% but only for a short period), then the resolution of Windows Task Manager might be insufficient to capture the load distribution over time.
The best way is to use OCL perf counters to collect the time for the OCL kernel and to double-check whether it is a significant portion of the total wall-clock time.
Refer to http://software.intel.com/en-us/articles/performance-debugging-intro/
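A minimal sketch of that check, assuming the kernel timestamps are already available as nanosecond values. One way to obtain them is `clGetEventProfilingInfo` with `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END` on a queue created with `CL_QUEUE_PROFILING_ENABLE`. The helper names `kernel_duration_ns` and `kernel_time_fraction` are hypothetical and not part of any SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Duration of one kernel execution in nanoseconds, from its
 * start and end profiling timestamps. */
static uint64_t kernel_duration_ns(uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns >= start_ns);
    return end_ns - start_ns;
}

/* Fraction (0.0 to 1.0) of the total wall-clock time that was
 * spent executing kernels. */
static double kernel_time_fraction(uint64_t total_kernel_ns,
                                   uint64_t wall_clock_ns)
{
    assert(wall_clock_ns > 0);
    return (double)total_kernel_ns / (double)wall_clock_ns;
}
```

If the fraction turns out to be small, the app is spending most of its time outside the OCL kernel, and low overall CPU utilization would not be surprising.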

I would also suggest increasing the OCL load, e.g. by processing more data, to check the scaling.
qinq
Beginner
Many thanks for all the replies.
The problem was magically solved with SDK 1.1, and usage went up to 99% as expected.