
NUMA effects with OpenCL

Jianbin_F_

Hi Guys,

I have recently been working with OpenCL on a dual-socket Intel machine (Xeon X5650). How can I control NUMA effects with OpenCL? Is there an API for it, or is it handled by the runtime and hidden from the application?

Thanks,

Jianbin

Doron_S_Intel
Employee

Hello Jianbin,

You can try the following:

  1. Allocate memory yourself, using something like libnuma to ensure it's all allocated on a single socket.
    Make sure to align the memory to the size of the OpenCL data type you intend to use.
  2. Create memory objects using CL_MEM_USE_HOST_PTR to wrap these allocations.
  3. Use clCreateSubDevices to create sub-devices representing the different NUMA nodes. The current version of the SDK doesn't support partitioning by CL_DEVICE_AFFINITY_DOMAIN_NUMA, but you can use the Intel extension CL_DEVICE_PARTITION_BY_NAMES_INTEL to define yourself which cores are assigned to which sub-device. Read more about it here: http://www.khronos.org/registry/cl/extensions/intel/cl_intel_device_partition_by_names.txt

That should allow you to enqueue kernels on a single socket using the appropriate sub-device ID, and you can ensure each kernel operates on memory objects allocated on physical pages from that node.
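
To make step 3 a bit more concrete, here is a minimal sketch of how the property list for CL_DEVICE_PARTITION_BY_NAMES_INTEL could be put together. The helper name build_by_names_props and the partition_prop typedef are made up for illustration, and the two constants are redefined from cl_ext.h only so the snippet compiles without the OpenCL headers. It also assumes the cores of one socket have contiguous compute unit names (for example 6..11 for the second socket), which may not hold on a given machine, so check the topology first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as published in cl_ext.h for cl_intel_device_partition_by_names;
   redefined here only so the sketch builds without the OpenCL headers. */
#ifndef CL_DEVICE_PARTITION_BY_NAMES_INTEL
#define CL_DEVICE_PARTITION_BY_NAMES_INTEL 0x4052
#endif
#ifndef CL_PARTITION_BY_NAMES_LIST_END_INTEL
#define CL_PARTITION_BY_NAMES_LIST_END_INTEL -1
#endif

/* Hypothetical stand-in: cl_device_partition_property is intptr_t. */
typedef intptr_t partition_prop;

/* Fill props with a by-names partition list covering compute units
   [first_core, first_core + num_cores). Assumes contiguous numbering
   per socket (an assumption, not something the runtime guarantees).
   Returns the number of entries written, or 0 if capacity is too small. */
size_t build_by_names_props(partition_prop *props, size_t capacity,
                            int first_core, int num_cores)
{
    assert(props != NULL && first_core >= 0 && num_cores > 0);

    /* selector + one entry per core + list terminator + final 0 */
    size_t needed = (size_t)num_cores + 3;
    if (capacity < needed)
        return 0;

    size_t i = 0;
    props[i++] = CL_DEVICE_PARTITION_BY_NAMES_INTEL;
    for (int c = 0; c < num_cores; ++c)
        props[i++] = first_core + c;
    props[i++] = CL_PARTITION_BY_NAMES_LIST_END_INTEL;
    props[i++] = 0;
    return i;
}

/* With the real headers, the list would then be used roughly like this
   (this selector creates one sub-device per call):
     cl_device_id sub;
     clCreateSubDevices(device, props, 1, &sub, NULL);
   and buffers wrapping node-local memory would be created with
   clCreateBuffer(ctx, CL_MEM_USE_HOST_PTR, size, host_ptr, &err). */
```

You would build one list per socket, create one sub-device from each, and give each sub-device its own command queue so that kernels touching node-local buffers are enqueued on the matching socket.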

As an aside, the reason there isn't a more straightforward way to go about things is that our testing showed a relatively low return on investment: the performance impact was negligible thanks to the Intel QuickPath Interconnect technology.

If you try this and find a case where this has a significant impact, please let us know.

 

Thanks,

Doron

James_R_

Doron Singer (Intel) wrote:

If you try this and find a case where this has a significant impact, please let us know.

Reductions!  As I reported here: http://software.intel.com/en-us/forums/topic/508377

I haven't tested it on other bandwidth-bound applications, but I think it's generally applicable. Thank you, Doron.

-James
