How is the number of memory banks determined by the SIMD size?

MAsla5 · ‎02-16-2020

Hi,

My question is: how are memory banks assigned based on the SIMD factor? Why does the compiler assign all memory banks to A and only one to B? I am very new to this field and am struggling to understand these things.

I read in the Intel documentation that memory banking works only on local dimensions by default. In my case, I have two local memory buffers,

Asub and Bsub:

__local Asub[block_size][block_size]

__local Bsub[block_size][block_size]

and my work-group size is specified like this:

__attribute__((reqd_work_group_size(block_size, block_size, 1)))

So, as per my understanding, the flow is like this: the Asub local buffer corresponds to the first dimension of reqd_work_group_size and Bsub to the second. Keeping the Intel documentation's wording in mind, Asub is the lowest dimension, and that is why all memory banks are assigned to the Asub local buffer.

Kindly correct me if I am wrong, and please elaborate in more detail if anybody can.

This brings a second question to mind: if the flow is exactly as I described above, then swapping the Asub and Bsub local buffers should change the behavior, and all memory banks should be assigned to Bsub instead. Is that right?

Thanks!!!

HRZ · ‎02-18-2020

The compiler automatically optimizes the number of banks based on the number of reads from and writes to the local buffers in your code. Writes are connected to all banks, while each read is connected to one bank. Hence, for example, if you have 1 write to and 4 reads from one buffer, then without double-pumping the local memory buffer will have four banks. Note that if accesses to a local buffer are consecutive, they are merged into one larger access and do not increase the number of banks for that buffer; this is usually the case when you use SIMD or loop unrolling. In your case, the difference in the number of banks between the two buffers is a direct result of the number of accesses to and from each buffer.
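To make the counting rule above concrete, here is a toy model in plain C. It is only a sketch of the rule as stated in this reply, not the compiler's actual algorithm: the helper name count_merged_reads is hypothetical, and the offsets are made-up element indices into one local buffer. Reads at consecutive offsets are merged into one wider access; each remaining read would then need its own bank (no double-pumping assumed).

```c
#include <assert.h>

/* Toy model, NOT an SDK API: returns how many separate read accesses
 * remain after merging reads whose element offsets are consecutive.
 * Assumption: offsets[] is sorted in strictly ascending order. */
int count_merged_reads(const int *offsets, int n)
{
    if (n <= 0) {
        return 0;
    }
    int groups = 1; /* the first read always starts a group */
    for (int i = 1; i < n; ++i) {
        assert(offsets[i] > offsets[i - 1]); /* sorted precondition */
        if (offsets[i] != offsets[i - 1] + 1) {
            ++groups; /* gap in the offsets: a new, separate access */
        }
    }
    return groups;
}
```

For example, four reads at offsets {0, 1, 2, 3} (such as buf[row][k] to buf[row][k+3]) collapse into a single access in this model, while four reads at offsets {0, 16, 32, 48} (such as buf[k][col] to buf[k+3][col] with an assumed row length of 16) stay as four separate reads. Two buffers with the same declared size can therefore end up with very different bank counts.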

MEIYAN_L_Intel · ‎02-19-2020

Hi,

For more information on optimizing the implementation of memory blocks, please refer to Chapter 8.4 and Chapter 8.5 of the guide linked below:

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/opencl-sdk/aocl-best-practices-guide.pdf

Thanks