OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.

Achieving peak bandwidth on multi-socket systems

James_R_
Beginner

Say each CPU socket has 43 GB/s of memory bandwidth through its four memory channels, and I have a dual-socket system. I would expect a reduction operation to reach the combined 86 GB/s, but it doesn't: it tops out at 43 GB/s. Why is that, and is there anything in Intel's OpenCL implementation for CPUs that can fix it?

How could I fix that outside of OpenCL?
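
One common explanation for this symptom, assumed here because the thread itself does not state the cause, is NUMA page placement: if every page of the buffer sits in one socket's memory, only that socket's four channels serve the reduction, no matter how many cores read from it. The Python sketch below is a toy model of that idea. The function name `effective_bandwidth` is hypothetical, and the model ignores the inter-socket link, caches, and latency.

```python
def effective_bandwidth(fractions, per_socket_gbs):
    """Toy model: aggregate streaming bandwidth in GB/s.

    fractions[i] is the share of the buffer resident in socket i's memory.
    Each socket's memory streams its own share at per_socket_gbs, all in
    parallel, so the socket holding the largest share sets the total time.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # Time to stream 1 GB of the buffer = largest share / per-socket bandwidth.
    time_per_gb = max(fractions) / per_socket_gbs
    return 1.0 / time_per_gb


# Figures from the question: 43 GB/s per socket, two sockets.
all_on_one_socket = effective_bandwidth([1.0, 0.0], 43.0)
split_evenly = effective_bandwidth([0.5, 0.5], 43.0)
print(all_on_one_socket, split_evenly)
```

Under these assumptions, a buffer held entirely by one socket is capped at the single-socket figure, while an even split reaches the two-socket total. Real hardware will land somewhere below the ideal.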

2 Replies
Arik_N_Intel
Employee

Dear James,

Can you please check if the following helps?

http://software.intel.com/forums/topic/497429

Thanks,

Arik

James_R_
Beginner

Arik,

This method improves bandwidth substantially (by >1.8x). My code now achieves more than 90% of the platform bandwidth, up from the ~50% of peak I was seeing the other day. I'm a bit surprised it actually worked. I'd like to test it on a four- or eight-socket system now, but I'll have to find one.
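
As a quick sanity check, the figures quoted in this thread are consistent with one another. The short Python calculation below uses only numbers from the posts (43 GB/s per socket, two sockets, roughly 50% of peak before, more than 1.8x after). The variable names are illustrative.

```python
per_socket_gbs = 43.0                     # per-socket bandwidth from the question
sockets = 2                               # dual-socket system
platform_peak = per_socket_gbs * sockets  # combined peak in GB/s

before = 0.5 * platform_peak              # "~50% of peak" reported earlier
after_min = 1.8 * before                  # ">1.8x" improvement, lower bound
fraction_of_peak = after_min / platform_peak

print(f"{after_min:.1f} GB/s, {fraction_of_peak:.0%} of peak")
```

A 1.8x gain over 43 GB/s works out to 77.4 GB/s, which is 90% of the 86 GB/s platform peak, matching the "90+%" reported above.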

Knowing this now makes life more difficult for OpenCL developers with bandwidth-bound kernels on multi-socket nodes.  Thanks a lot, Arik!

-James
