
Local memory in one work group

Altera_Forum
Honored Contributor II

Say we define two __local arrays, A[1024] and B[1024], in the kernel function, and the data flow is DDR --> A --> B --> DDR. Do they combine their read/write ports into a common local memory system, or does each one keep its own read/write ports? I am sure that a common local memory system would greatly degrade performance, and even more so when many __local variables are defined.
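For concreteness, the kind of kernel being asked about might look like the sketch below. It is an illustration only: the kernel name `copy_through_local`, the `int` element type, the `+ 1` in the middle stage, and the single work-item (loop-based) form are all assumptions, not the poster's code. The `#ifndef __OPENCL_VERSION__` shim is also an addition, so that the same source builds as plain C for a quick host-side check; an OpenCL C compiler defines that macro and skips the shim.

```c
/* Sketch only: kernel name, element type, and arithmetic are hypothetical. */
#ifndef __OPENCL_VERSION__
/* Host-side shim (assumption): lets this file build as plain C.
   An OpenCL C compiler defines __OPENCL_VERSION__ and skips this part. */
#define __kernel
#define __global
#define __local
#endif

#define N 1024

/* Single work-item kernel with the data flow DDR -> A -> B -> DDR. */
__kernel void copy_through_local(__global const int *in, __global int *out)
{
    __local int A[N];
    __local int B[N];

    for (int i = 0; i < N; ++i)   /* DDR -> A */
        A[i] = in[i];

    for (int i = 0; i < N; ++i)   /* A -> B (placeholder computation) */
        B[i] = A[i] + 1;

    for (int i = 0; i < N; ++i)   /* B -> DDR */
        out[i] = B[i];
}
```

Here A and B are separate arrays that are only ever accessed by name, which is the situation the question is about.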

 

The kernel log says: " .. kernel number of local memory banks : 1 1 1 1 1 1 1 ". Does that mean 7 banks exist in my design?

 

The memory utilization is much higher than what the __local memory defined in the OpenCL code would require. Do the delay operations (with a wider datapath, e.g. 32) need a lot of Block RAMs when the OpenCL code has relatively complex logic? BTW, I have skimmed the *.v generated by AOC, and a lot of FIFOs (with *.IMPL = "ram") are generated for delay operations. Does that mean we should avoid complex logic by splitting the design into multiple kernels?
Altera_Forum
Honored Contributor II

There are a lot of answers to that question because it depends on the kernel. In general I have these recommendations: 

 

1) Use the __restrict keyword any time it's safe to do so (i.e. when you don't need to worry about pointers aliasing the same data) 

2) Make sure you don't use the same pointer to dereference multiple memory spaces (i.e. in this case don't have a pointer that has to access both A and B, use multiple pointers instead) 

3) Use __private temporary storage when possible 
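As a rough illustration of these three recommendations, here is a minimal sketch. Everything in it is an assumption made for the example: the kernel name `scale_through_local`, the `int` type, the `* 2 + 3` computation, and the single work-item form. It uses `restrict`, the spelling of the qualifier in the OpenCL C and C99 specifications. The `#ifndef __OPENCL_VERSION__` shim is only there so that the file also builds as plain C for a host-side check.

```c
/* Sketch only: names and arithmetic are hypothetical. */
#ifndef __OPENCL_VERSION__
/* Host-side shim (assumption): an OpenCL C compiler skips this part. */
#define __kernel
#define __global
#define __local
#define __private
#endif

#define N 1024

/* (1) restrict: the caller promises that in and out never alias. */
__kernel void scale_through_local(__global const int *restrict in,
                                  __global int *restrict out)
{
    __local int A[N];
    __local int B[N];

    /* (2) A and B are only accessed by their own names. Per the advice
       above, avoid one pointer that may refer to either array, such as
       __local int *p = sel ? A : B; */
    for (int i = 0; i < N; ++i)
        A[i] = in[i];

    for (int i = 0; i < N; ++i) {
        /* (3) The intermediate value lives in a __private scalar,
           not in another __local array. */
        __private int t = A[i];
        t = t * 2;
        t = t + 3;
        B[i] = t;
    }

    for (int i = 0; i < N; ++i)
        out[i] = B[i];
}
```

Whether this changes how the compiler builds the local memory system for a given design is something to confirm in the compiler's own reports rather than assume from the sketch.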

 

I can't really comment on whether it makes sense for the algorithm to be split into multiple kernels. In general I try to keep everything combined into a single kernel, since a single pipeline is normally more efficient than a bunch of independent pipelines in terms of both performance and area.
Altera_Forum
Honored Contributor II

Nice recommendations. However, I do not think that keeping everything combined into a single kernel is a good idea, since the resource requirement increases exponentially and the clock frequency drops quickly when the control logic is complex.

Altera_Forum
Honored Contributor II

It really depends on the kernel(s). Sometimes what you said is indeed true; other times it's not. Again, these are just general rules of thumb based on what I've seen across many designs.
