Application Acceleration With FPGAs
Programmable Acceleration Cards (PACs), DCP, DLA, Software Stack, and Reference Designs


Question about sequential load from global memory and parallel load from global memory

New Contributor I

Hi HRZ,

I need some help!

In my project (a single kernel), I am trying to load data from global memory sequentially (that is, serially) into local memory inside the FPGA. I declared many local data arrays, but the report shows no RAM usage, so I suspect that no local memory was generated from RAMs. The system viewer, however, does show local memory, which I don't understand. The report and the relevant piece of code are in the photo below:



Then I changed the code to load data from global memory into local memory in parallel (the other parts are the same). In the code, x is global memory and x_buf_x is local memory. (Attachment: par1.png)


In parallel mode, the local buffers are generated using RAMs in the FPGA, and each local buffer uses 15 RAMs. This part is fine.


My questions are:

1. Does the sequential load generate any local memory? If so, what kind of resource does it use, and why not RAMs?

2. How can I load data into local buffers sequentially and efficiently, with different pieces of data going to different local buffers?

3. Why does the parallel load occupy RAMs while the sequential load does not use any? (The declarations are the same in both versions of the code and the kernel runs correctly in both, but the kernel times differ greatly.)


Thanks for your help!




4 Replies


I will go through your thread description and let you know the feedback soon.



Valued Contributor II

You should look at the line where the buffer is declared to see its area usage, not where it is referenced. The resource usage for the lines you are highlighting shows the area usage of the operation in those lines, not of the buffers involved. The 15 Block RAMs used in the second case are very likely there because the compiler is creating a Cached Load-Store Unit for that line, which uses Block RAMs, while no such unit is being generated in the first case.

New Contributor I

Thanks for your reply!

Yes, in the second case it creates a cached LSU that uses RAMs and loads from global memory (GM) in parallel. But I want to load the data serially (the first case): elements 1 to 256 go to the first local buffer (using RAMs), elements 257 to 512 go to the second buffer (using RAMs), and so on. My first case seems to have failed. How can I do that?

Valued Contributor II

You do not need to concern yourself with whether the compiler uses Block RAMs or not. The compiler might optimize your memory accesses so that they use registers instead of Block RAMs, which is better since it will use fewer resources. As long as the global memory accesses look the way you expect them to in the report, it will be fine.
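
For reference, the serial load pattern described earlier in the thread (elements 1 to 256 into the first buffer, 257 to 512 into the second, and so on) can be sketched in plain C as below. This is only an illustration of the indexing, not the original poster's kernel, which was attached as an image. The names `load_serial`, `NUM_BUFS` and `BUF_LEN`, and the choice of 4 buffers of `float`, are assumptions made for the example. In an OpenCL kernel, `x` would be a `__global` pointer and the buffers would be declared inside the kernel; whether they end up in Block RAMs or registers is left to the compiler, as noted in the reply above.

```c
#include <stddef.h>

#define NUM_BUFS 4    /* hypothetical number of local buffers */
#define BUF_LEN  256  /* elements per buffer, as in the thread */

/* Copy NUM_BUFS * BUF_LEN consecutive elements of x into the buffers,
 * one element per iteration, in order: x[0..255] fills bufs[0],
 * x[256..511] fills bufs[1], and so on. */
static void load_serial(const float *x, float bufs[NUM_BUFS][BUF_LEN])
{
    for (size_t i = 0; i < (size_t)NUM_BUFS * BUF_LEN; i++) {
        size_t which  = i / BUF_LEN;  /* which buffer receives element i */
        size_t offset = i % BUF_LEN;  /* position inside that buffer */
        bufs[which][offset] = x[i];
    }
}
```

A single loop over consecutive addresses keeps the global memory accesses in order, which is the pattern to check for in the report.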