OpenCL* for CPU

Alignment question

ABoxe

In one of the sample apps, there is an align_malloc method.

Inside, there is this assert:

      assert(size/sizeof(void*)*sizeof(void*) == size);

Why must the memory size be divisible by sizeof(void*)?

Thanks,

Aaron

Raghupathi_M_Intel

From the code snippet, it looks like the intention is to check that "size" is a whole multiple of sizeof(void *).
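As a rough illustration of what that assert tests, here is a minimal C sketch. The helper names `is_pointer_multiple` and `round_up_to_pointer_size` are hypothetical and do not come from the sample app; the first restates the assert as a predicate, and the second shows one way a caller could pad a size so that the check passes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 when size is a whole multiple of
   sizeof(void*). Integer division truncates, so
   size / sizeof(void*) * sizeof(void*) == size holds exactly when
   size % sizeof(void*) == 0, which is the condition in the assert. */
static int is_pointer_multiple(size_t size)
{
    return size % sizeof(void *) == 0;
}

/* Hypothetical helper: rounds size up to the next multiple of
   sizeof(void*) so that the check above passes. */
static size_t round_up_to_pointer_size(size_t size)
{
    const size_t unit = sizeof(void *);
    return (size + unit - 1) / unit * unit;
}
```

With sizeof(void*) equal to 8, a size of 14 fails the check, and rounding it up gives 16.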

Thanks,
Raghu

ABoxe

Thanks, Raghu. This method is called on the size of the host memory buffer before calling clCreateBuffer.

So, my question is: does the host memory have to be aligned to sizeof(void*) before passing it into clCreateBuffer? I have a 64-bit system with sizeof(void*) equal to 8.

Can I pass a buffer of size 14 into clCreateBuffer? Is there a penalty if I do?

Thanks,

Aaron

Raghupathi_M_Intel

In this case it looks like it is a requirement (someone from the Xeon Phi team can correct me if I am wrong), but most of the time alignment is needed for performance reasons. You will get better performance if the data is aligned to, say, a cache line. On HD Graphics you will get better performance if the buffer is aligned to a cache line, and the best performance if it is aligned to a page boundary.

You can find out the hard way: if your buffer is not aligned to sizeof(void *) and your application crashes, then you have to make sure this requirement is met. Otherwise, it is for performance reasons.

 

Dmitry_K_Intel

From the Xeon Phi perspective, you will get acceptable performance when buffers are aligned to 64 bytes. To get the best possible performance, please align your buffers to 4K (the standard x86 memory page). The same also applies to sub-buffers and Read/Write/Copy operations: if offsets are aligned properly, the data transfer bandwidth is much higher.
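One way to get a host buffer with the alignment described above is sketched below, assuming a C11 toolchain that provides `aligned_alloc`. The names `PAGE_ALIGNMENT`, `round_up`, `alloc_aligned`, and `is_aligned` are hypothetical and are not taken from the sample app's align_malloc. The size is padded to a multiple of the alignment because C11 `aligned_alloc` asks for that.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* 4K page alignment, as suggested in the reply; use 64 for cache lines. */
#define PAGE_ALIGNMENT ((size_t)4096)

/* Hypothetical helper: rounds size up to the next multiple of alignment. */
static size_t round_up(size_t size, size_t alignment)
{
    return (size + alignment - 1) / alignment * alignment;
}

/* Hypothetical helper: allocates at least size bytes at an address that is
   a multiple of alignment (assumed to be a power of two). Returns NULL on
   failure; release the memory with free(). */
static void *alloc_aligned(size_t size, size_t alignment)
{
    return aligned_alloc(alignment, round_up(size, alignment));
}

/* Hypothetical helper: returns 1 when ptr is a multiple of alignment. */
static int is_aligned(const void *ptr, size_t alignment)
{
    return (uintptr_t)ptr % alignment == 0;
}
```

A buffer obtained from `alloc_aligned(n, PAGE_ALIGNMENT)` could then be handed to clCreateBuffer as the host pointer, for example together with CL_MEM_USE_HOST_PTR. Whether that gives a measurable benefit on a particular device is something to confirm by measurement.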
