I am using Intel HLS tools to design an FPGA accelerator.
In my application, I have an input array of 512 elements to my component. In each iteration of a loop, there are 3 concurrent loads to that array, without any stores. I am unrolling the loop by 8, so now there are 24 concurrent loads.
By default, the compiler replicates the array in memory 12 times (2 ports per replicate, and we need 24 ports in total). However, based on the access patterns, I have found that this can be optimized: only 3 replicates are needed if each replicate uses different bankbits, i.e., replicate 1 of the array must have bankbits(0,1,2), replicate 2 must have bankbits(3,4,5), and replicate 3 must have bankbits(6,7,8). Stall-free banking cannot be implemented without replicating the memory in this case.
I have gone through the documentation of the HLS tools, but I didn't find anything that explains how to implement this.
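To illustrate the bank selection described above, here is a minimal plain C++ sketch (no HLS headers needed). It assumes the bank bits refer to bits of the element index, which for 512 elements is 9 bits wide (bits 0 to 8); the helper name `bank_of` is hypothetical and only models the idea, it is not part of the HLS tools.

```cpp
// Bank selected by three consecutive bits of the element index, starting at
// `low_bit`. Three bank bits give 2^3 = 8 banks per replicate.
// Assumption: bank bits are counted on the element index, not a byte address.
constexpr unsigned bank_of(unsigned index, unsigned low_bit) {
    return (index >> low_bit) & 0x7u;
}

// Example element index 421 = 0b110'100'101 (fits in 9 bits, i.e. < 512).
constexpr unsigned kExampleIndex = 421;

// Replicate 1 uses bits 0-2, replicate 2 uses bits 3-5, replicate 3 uses bits 6-8.
static_assert(bank_of(kExampleIndex, 0) == 5, "bits 0-2 are 0b101");
static_assert(bank_of(kExampleIndex, 3) == 4, "bits 3-5 are 0b100");
static_assert(bank_of(kExampleIndex, 6) == 6, "bits 6-8 are 0b110");
```

The same index therefore lands in a different bank in each replicate, which is why each group of loads can be steered to the replicate whose bank bits separate its accesses.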
What I basically want is to take an input array A and replicate it 3 times into arrays A1, A2 and A3 in local memory, where I can define separate bankbits for each.
Does anyone have any ideas on this matter?
Thank you in advance.
Hi Dimitris,
I hope you are staying safe.
You can double the number of ports per replicate if you double-pump the memory.
That gives you 4 ports per replicate.
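As a rough check of the numbers in this thread, the replicate count can be worked out with a ceiling division. This is only a sketch in plain C++: it assumes no banking, that every concurrent load needs its own read port, and the names (`replicates_needed`, `kConcurrentLoads`) are hypothetical.

```cpp
// Number of replicates needed to serve `loads` concurrent reads when each
// replicate exposes `ports_per_replicate` read ports (ceiling division).
constexpr unsigned replicates_needed(unsigned loads, unsigned ports_per_replicate) {
    return (loads + ports_per_replicate - 1) / ports_per_replicate;
}

constexpr unsigned kLoadsPerIteration = 3;  // loads per loop iteration
constexpr unsigned kUnrollFactor = 8;       // loop unrolled by 8
constexpr unsigned kConcurrentLoads = kLoadsPerIteration * kUnrollFactor;  // 24

// Single-pumped memory: 2 ports per replicate, as reported by the compiler.
static_assert(replicates_needed(kConcurrentLoads, 2) == 12, "24 / 2 = 12");
// Double-pumped memory: 4 ports per replicate.
static_assert(replicates_needed(kConcurrentLoads, 4) == 6, "24 / 4 = 6");
```

Under these assumptions, double pumping alone halves the replicate count from 12 to 6. Getting down to 3 would still require banking on top of it.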
You also have the flexibility to use memory attributes such as hls_memory, hls_numbanks, hls_bankbits and hls_bankwidth to control the number of banks, the bank-select bits and the bank width.
Please refer to section 5, "Component Memories (Memory Attributes)", of the following documents and try to come up with the best possible solution you can.
Then we can analyze the reports and can work on it further.
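One possible starting point is to copy the input into three local arrays by hand and attach a different hls_bankbits setting to each copy. The sketch below is untested with i++ and makes several assumptions: `USE_INTEL_HLS` is a hypothetical macro you would define yourself when building with the HLS compiler, `REPLICA_ATTRS` and `read_three` are illustrative names, and whether the bank bits count element-index bits or byte-address bits should be checked against the reference manual for your tool version. Without the macro, the attributes expand to nothing and the code builds with any C++ compiler.

```cpp
#include <cassert>
#include <cstddef>

#ifdef USE_INTEL_HLS  // hypothetical macro, defined by the user for i++ builds
#include "HLS/hls.h"
// Assumption: hls_memory and hls_bankbits can be combined on a local array.
#define REPLICA_ATTRS(b0, b1, b2) hls_memory hls_bankbits(b0, b1, b2)
#else
#define REPLICA_ATTRS(b0, b1, b2)  // plain C++ build: no attributes
#endif

constexpr std::size_t kN = 512;  // array size from the question

// Copies A into three local arrays with different bank bits and performs
// one load from each copy. In a real component the three loads would be the
// three access patterns of the unrolled loop.
int read_three(const int *A, unsigned i, unsigned j, unsigned k) {
    assert(i < kN && j < kN && k < kN);  // software-side sanity check

    REPLICA_ATTRS(0, 1, 2) int A1[kN];
    REPLICA_ATTRS(3, 4, 5) int A2[kN];
    REPLICA_ATTRS(6, 7, 8) int A3[kN];

    // Fill all three copies from the single input array.
    for (std::size_t n = 0; n < kN; ++n) {
        A1[n] = A[n];
        A2[n] = A[n];
        A3[n] = A[n];
    }

    return A1[i] + A2[j] + A3[k];
}
```

The copy loop costs extra cycles and memory up front, so the reports should confirm that the resulting three banked memories are actually stall-free before settling on this approach.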
Thanks and Regards
We have not received a response from you to the previous reply that I provided.
Please post a response within the next 15 days so that I can continue to support you.
After 15 days, this thread will be transitioned to community support.
Community users will then be able to help you with your follow-up questions.