I have generated altera_emif IP with the following parameters:
- Protocol : DDR4
- Target Device: Arria10
- Memory Clock frequency : 1200 MHz
- Clock Rate of user logic: Quarter
- User logic clock: 300 MHz
- DQ Width : 32 bits
- amm_readdata and amm_writedata : 256 bits
The above configuration can be summarized as follows:
- The FPGA receives 64 bits from the DDR4 on every 1200 MHz memory clock cycle (32 bits on the rising edge and 32 bits on the falling edge).
- The Avalon interface runs at 300 MHz (quarter rate).
- The Avalon interface transfers 256 bits of data (32 * 8) on every 300 MHz clock cycle.
- Bandwidth = 1200 * 10^6 (Hz) * 2 * 32 / 10^9 = 76.8 gigabits per second.
Is my understanding correct? Please confirm.
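The arithmetic in the list above can be cross-checked from both sides of the controller. This is only a minimal sketch of the theoretical peak figures; the function names are made up for illustration and are not part of any Intel tool or API.

```python
def memory_side_gbps(mem_clock_mhz: float, dq_width_bits: int) -> float:
    """Theoretical peak on the DDR4 pins: two transfers per memory clock."""
    transfers_per_second = mem_clock_mhz * 1e6 * 2  # double data rate
    return transfers_per_second * dq_width_bits / 1e9


def avalon_side_gbps(user_clock_mhz: float, amm_data_width_bits: int) -> float:
    """Theoretical peak on the Avalon interface: one word per user clock."""
    return user_clock_mhz * 1e6 * amm_data_width_bits / 1e9


# Values taken from the configuration in this post.
mem_gbps = memory_side_gbps(mem_clock_mhz=1200, dq_width_bits=32)
amm_gbps = avalon_side_gbps(user_clock_mhz=300, amm_data_width_bits=256)

print(f"memory side: {mem_gbps:.1f} Gbit/s")
print(f"avalon side: {amm_gbps:.1f} Gbit/s")
```

Both sides give the same figure because a quarter-rate interface is 8 times wider than the DQ bus (4 clocks * 2 edges) and runs at a quarter of the memory clock.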
Hi,
Your understanding of all four points is correct.
One thing to note: the bandwidth calculated so far is the theoretical maximum bandwidth.
Actual data throughput may vary depending on the following factors:
- Whether the user design can process and transfer data on every clock cycle, and whether it accesses SDRAM addresses sequentially or randomly.
- The DDR4 IP controller cannot process a data transfer on every clock cycle. It will gate the avalon_ready signal when it is busy and unable to accept a transfer.
- The DDR4 SDRAM cannot accept a data transfer on every clock cycle, because of the internal write/read turnaround timing requirements and the SDRAM refresh requirement.
Thanks.
Regards,
dlim
Is there an AMM DMA Linux driver example for the host side? I can't find one. Thanks a lot.
Thanks!!
In addition to the above query:
I observed that DDR4 limits the burst length to 8 (BL8).
Does this mean that, with a DQ width of 32, one DDR read request returns a maximum of 256 bits (32 * 8)?
Hi,
Sorry, Intel FPGA doesn't have a DMA Linux driver example, as we provide the DDR4 memory controller IP rather than system-level application solutions.
Regarding your question on the burst length of 8:
- Yes, one read request with a burst length of 8 transfers a total of 256 bits of data (32 x 8).
- Note that the whole burst takes 4 memory clock cycles, with two data transfers per clock cycle (rising edge + falling edge).
- Each transfer (beat) within the burst carries only 32 bits; the 256 bits are delivered as 8 such transfers from a single read command.
Thanks.
Regards,
dlim
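The BL8 numbers above can be written out as a short calculation. This is just an illustrative sketch using the 32-bit DQ width and 1200 MHz memory clock from this thread; the helper name is hypothetical.

```python
BURST_LENGTH = 8  # BL8: eight beats per read/write command


def bl8_burst(dq_width_bits: int, mem_clock_mhz: float):
    """Return (bits moved, memory clocks used, duration in ns) for one BL8 burst."""
    bits = dq_width_bits * BURST_LENGTH   # 8 beats of DQ-width data
    mem_clocks = BURST_LENGTH // 2        # 2 beats per clock (both edges)
    duration_ns = mem_clocks * 1e3 / mem_clock_mhz
    return bits, mem_clocks, duration_ns


bits, mem_clocks, duration_ns = bl8_burst(dq_width_bits=32, mem_clock_mhz=1200)
user_clock_period_ns = 1e3 / 300  # 300 MHz quarter-rate user clock

print(f"{bits} bits in {mem_clocks} memory clocks ({duration_ns:.2f} ns)")
print(f"user clock period: {user_clock_period_ns:.2f} ns")
```

One BL8 burst therefore occupies the data bus for the same time as one 300 MHz user clock cycle, which matches the 256-bit Avalon data width.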
Previous query:
I observed that DDR4 limits the burst length to 8 (BL8).
Does this mean that, with a DQ width of 32, one DDR read request returns a maximum of 256 bits (32 * 8)?
Following up on the burst length of 8:
I instantiated a DDR4 controller IP for an Arria 10 device and simulated the example design.
I found that amm_burstcount = 58 in the example design.
This seems to contradict the statement that the DDR4 IP constrains the burst length to 8 (fixed BL8).
Can someone please clarify this?
Thanks in advance!
Hi,
The data transaction flow has two sides, as shown below.
- User logic <=> DDR4 IP <=> DDR4 SDRAM
BL8 applies to the data transactions between DDR4 IP <=> DDR4 SDRAM, as defined by the JEDEC spec.
I believe the higher burst count occurs in the example design's data flow between User logic <=> DDR4 IP, right?
The user logic can send a large amount of data to the DDR4 IP, but it is queued inside the DDR4 IP and transferred to the DDR4 SDRAM later as BL8 bursts.
I hope this clears up your doubt. Thanks.
Regards,
dlim
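To make the distinction concrete, here is a rough bookkeeping sketch of how many BL8 bursts on the SDRAM side are needed to carry one Avalon burst. It only counts bits; how the controller actually queues and schedules the commands is internal to the IP and is not modelled here. The function name is hypothetical.

```python
def bl8_bursts_needed(amm_burstcount: int,
                      amm_data_width_bits: int,
                      dq_width_bits: int,
                      burst_length: int = 8) -> int:
    """Count the BL8 bursts needed to move one Avalon burst (bit counting only)."""
    bits_per_bl8 = dq_width_bits * burst_length
    total_bits = amm_burstcount * amm_data_width_bits
    # Ceiling division, in case the Avalon word is not a multiple of one BL8.
    return -(-total_bits // bits_per_bl8)


# amm_burstcount = 58 from the example design, 256-bit Avalon data, 32-bit DQ.
n = bl8_bursts_needed(amm_burstcount=58, amm_data_width_bits=256, dq_width_bits=32)
print(f"58 Avalon words -> {n} BL8 bursts on the SDRAM side")
```

With this configuration each 256-bit Avalon word corresponds to exactly one BL8 burst, so amm_burstcount on the user side and the fixed BL8 on the memory side do not conflict.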
Hi,
Thanks, it is clear now.
Next, I am calculating the DDR4 latency.
The time between issuing the read request and retrieving the first word from memory is:
Latency (ns) = CAS latency * 2000 / data rate (MT/s), which is the same as CAS latency / memory clock (MHz) * 1000
Example: for DDR4-2400 (memory clock 1200 MHz) with CL = 15:
Latency = (15 / 1200) * 1000 = 12.5 nanoseconds
My question is:
If I request a burst count of 32 (4 * BL8), what would be the total latency to receive the data?
Is it 4 (BL8) read requests * 12.5 = 50 nanoseconds?
Or 1 read request * 12.5 = 12.5 nanoseconds?
Thanks in advance!
Hi,
For the estimated latency, please refer to the A10 EMIF user guide (page 418, table 394).
Thanks.
Regards,
dlim
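For the SDRAM-side CAS latency arithmetic alone, a rough idealized sketch is shown below. It is not a substitute for the user guide figures: it ignores controller, PHY and Avalon latency, and it assumes the reads are issued back to back with a command spacing of 4 memory clocks (the `tccd_clocks` parameter, an assumption that depends on the bank-group access pattern). The function names are hypothetical.

```python
def cas_latency_ns(cl_cycles: int, mem_clock_mhz: float) -> float:
    """CAS latency in ns: CL memory clocks at the given memory clock."""
    return cl_cycles * 1e3 / mem_clock_mhz


def ideal_read_time_ns(num_bl8: int, cl_cycles: int, mem_clock_mhz: float,
                       burst_length: int = 8, tccd_clocks: int = 4) -> float:
    """Idealized time from first read command to last data beat.

    Assumes back-to-back read commands spaced tccd_clocks apart (assumption),
    with no refresh, row activation, or controller overhead.
    """
    clocks = cl_cycles + (num_bl8 - 1) * tccd_clocks + burst_length // 2
    return clocks * 1e3 / mem_clock_mhz


first_word = cas_latency_ns(cl_cycles=15, mem_clock_mhz=1200)
four_bursts = ideal_read_time_ns(num_bl8=4, cl_cycles=15, mem_clock_mhz=1200)

print(f"CAS latency: {first_word:.2f} ns")
print(f"4 x BL8, idealized: {four_bursts:.2f} ns")
```

Under these assumptions the CAS latency is paid once and the following bursts are pipelined behind it, so the total is neither 4 times the CAS latency nor the CAS latency alone. Real numbers will be higher once the controller and PHY latency are included.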