Hi Guys,
I have recently been working with OpenCL on a dual-socket Intel machine (X5650). How can I control NUMA effects with OpenCL? Is there an API for it, or is it handled by the runtime and hidden from the application?
Thanks,
Jianbin
Hello Jianbin,
You can try the following:
- Allocate memory yourself, using something like libnuma to ensure it's all allocated on a single socket. Make sure to align the memory to the size of the OpenCL data type you intend to use.
- Create memory objects using CL_MEM_USE_HOST_PTR to wrap these allocations.
- Use clCreateSubDevices to create sub-devices representing the different NUMA nodes. The current version of the SDK doesn't support partitioning by CL_DEVICE_AFFINITY_DOMAIN_NUMA, but you can use the Intel extension CL_DEVICE_PARTITION_BY_NAMES_INTEL to define yourself which cores are assigned to which sub-device. Read more about it here: http://www.khronos.org/registry/cl/extensions/intel/cl_intel_device_partition_by_names.txt
That should allow you to enqueue kernels on a single socket using the appropriate sub-device ID, and you can ensure each kernel operates on memory objects allocated on physical pages from that node.
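As a rough illustration of the sub-device step, here is a minimal sketch of how the property list for the by-names extension could be built. It is not taken from the SDK: the helper name make_by_names_props is made up for this example, and the assumption that a contiguous range of compute-unit names maps to one socket has to be checked against the actual topology of your machine. So that the snippet compiles without the OpenCL headers, it uses a stand-in typedef and repeats the two constant values as I understand them from the extension specification; real code should include CL/cl.h and CL/cl_ext.h instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins so the sketch compiles without the OpenCL headers.
 * Real code should include <CL/cl.h> and <CL/cl_ext.h>, which provide
 * cl_device_partition_property (an intptr_t) and both constants. */
typedef intptr_t partition_property_t;
#define PARTITION_BY_NAMES_INTEL          0x4052 /* CL_DEVICE_PARTITION_BY_NAMES_INTEL */
#define PARTITION_BY_NAMES_LIST_END_INTEL (-1)   /* CL_PARTITION_BY_NAMES_LIST_END_INTEL */

/* Hypothetical helper: build the property list for one sub-device that
 * covers compute units first_core .. first_core + num_cores - 1.
 * Layout: { BY_NAMES, core, core, ..., LIST_END, 0 }.
 * Returns a malloc'ed array that the caller must free, or NULL on failure. */
partition_property_t *make_by_names_props(int first_core, int num_cores)
{
    assert(first_core >= 0 && num_cores > 0);

    size_t count = (size_t)num_cores + 3; /* tag + cores + list end + 0 */
    partition_property_t *props = malloc(count * sizeof *props);
    if (props == NULL)
        return NULL;

    props[0] = PARTITION_BY_NAMES_INTEL;
    for (int i = 0; i < num_cores; ++i)
        props[1 + i] = first_core + i;  /* compute-unit names */
    props[num_cores + 1] = PARTITION_BY_NAMES_LIST_END_INTEL;
    props[num_cores + 2] = 0;           /* terminates the property list */
    return props;
}
```

One such array per socket would be passed as the properties argument of clCreateSubDevices, and a command queue created on each resulting sub-device. The buffers used by a given queue would then wrap host memory allocated on the matching node (for example with libnuma's numa_alloc_onnode) via clCreateBuffer with CL_MEM_USE_HOST_PTR.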
As an aside, the reason there isn't a more straightforward way to go about this is that our testing showed a relatively low return on investment: the performance impact was negligible thanks to the Intel Quick Path Interconnect technology.
If you try this and find a case where this has a significant impact, please let us know.
Thanks,
Doron
Doron Singer (Intel) wrote:
If you try this and find a case where this has a significant impact, please let us know.
Reductions! As I reported here: http://software.intel.com/en-us/forums/topic/508377
I haven't tested it on other bandwidth-bound applications, but I think it's generally applicable. Thank you, Doron.
-James