Hi Guys,
I have recently started working with OpenCL on a dual-socket Intel machine (X5650). How can I control NUMA effects with OpenCL? Is there an API for it, or does the runtime handle it and hide it from the application?
Thanks,
Jianbin
Hello Jianbin,
You can try the following:
That should allow you to enqueue kernels on a single socket using the appropriate sub-device ID, and you can ensure each kernel operates on memory objects allocated on physical pages from that node.
As an aside, the reason there isn't a more straightforward way to go about things is that our testing showed a relatively low return on investment: the performance impact was negligible thanks to Intel QuickPath Interconnect technology.
If you try this and find a case where this has a significant impact, please let us know.
Thanks,
Doron
Doron Singer (Intel) wrote:
If you try this and find a case where this has a significant impact, please let us know.
Reductions! As I reported here: http://software.intel.com/en-us/forums/topic/508377
I haven't tested it on other bandwidth-bound applications, but I think it's generally applicable. Thank you, Doron.
-James