Solved: DDR4 Memory Access with Target Device - Arria10

SKGR0 · ‎11-07-2019

I have generated altera_emif IP with the following parameters:

Protocol : DDR4
Target Device: Arria10
Memory Clock frequency : 1200 MHz
Clock Rate of user logic: Quarter
User logic clock: 300 MHz
DQ Width : 32 bits
amm_readdata and amm_writedata : 256 bits

This configuration amounts to the following statements:

1. The FPGA receives 64 bits from the DDR4 on every 1200 MHz memory clock cycle (32 bits on the rising edge and 32 bits on the falling edge).
2. The Avalon interface runs at 300 MHz (quarter rate).
3. The Avalon interface transfers 256 bits of data (32 * 8) on every 300 MHz clock cycle.
4. Bandwidth = 1200 * 10^6 (Hz) * 2 * 32 / 10^9 = 76.8 gigabits per second.

Is my understanding correct? Please confirm.
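
The arithmetic in statements 1 to 4 can be checked with a short Python sketch. The function names are hypothetical and only restate the figures given in this post; they are not part of any Intel tool or API.

```python
def ddr_peak_bandwidth_gbps(mem_clock_mhz: float, dq_width_bits: int) -> float:
    """Theoretical peak bandwidth on the DDR4 side, in Gbit/s."""
    transfers_per_second = mem_clock_mhz * 1e6 * 2  # two transfers per clock (DDR)
    return transfers_per_second * dq_width_bits / 1e9


def avalon_peak_bandwidth_gbps(user_clock_mhz: float, amm_width_bits: int) -> float:
    """Theoretical peak bandwidth on the Avalon side, in Gbit/s."""
    return user_clock_mhz * 1e6 * amm_width_bits / 1e9


ddr_side = ddr_peak_bandwidth_gbps(1200, 32)        # memory clock, DQ width
avalon_side = avalon_peak_bandwidth_gbps(300, 256)  # quarter-rate clock, amm width
print(f"DDR4 side:   {ddr_side:.1f} Gbit/s")
print(f"Avalon side: {avalon_side:.1f} Gbit/s")
```

Both sides come out to the same figure, which is what a quarter-rate interface with a data width of 8 x DQ should give.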

Deshi_Intel · ‎11-08-2019

Hi,

Your understanding on all points 1 to 4 is correct.

One thing to note: the bandwidth calculated so far is the theoretical maximum bandwidth.

Actual data throughput may vary depending on the following factors:

- Whether the user design can process and transfer data on every clock cycle, and whether it accesses SDRAM addresses sequentially or randomly.
- The DDR4 IP controller cannot process a data transfer on every clock cycle. It deasserts the avalon_ready signal when it is busy and unable to accept a transfer.
- The DDR4 SDRAM cannot accept a data transfer on every clock cycle because of internal read/write turnaround timing requirements and SDRAM refresh cycles.

Thanks.

Regards,

dlim

JET60200 · ‎11-08-2019

Is there an AMM DMA Linux driver example for the host side? I can't find one. Thanks a lot.

SKGR0 · ‎11-08-2019

Thanks!!

In addition to the above query,

I observed that DDR4 limits the burst length to 8 (BL8).

Does this mean that, with a DQ width of 32, one DDR read request returns a maximum of 256 bits (32 * 8)?

Deshi_Intel · ‎11-08-2019

Hi,

Sorry, Intel FPGA doesn't have a DMA Linux driver example, as we provide the DDR4 memory controller IP rather than system-level application solutions.

Regarding your question on the burst length of 8:

- Yes, one read request with a burst length of 8 transfers a total of 256 bits of data (32 x 8).
- Note that the whole burst takes 4 memory clock cycles, with two data transfers per clock cycle (rising edge + falling edge).
- Each transfer (beat) carries only 32 bits; the 256 bits are delivered as 8 such transfers from a single read command.
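
The burst arithmetic above can be laid out as a small Python sketch. The helper name and dictionary keys are hypothetical; the numbers are the ones from this thread (32-bit DQ, BL8, 1200 MHz memory clock, 300 MHz user clock).

```python
def bl8_burst(dq_width_bits: int = 32, burst_length: int = 8,
              mem_clock_mhz: float = 1200.0) -> dict:
    """Summarize one DDR4 burst: size, memory clocks used, and duration."""
    beats_per_clock = 2  # one beat on the rising edge, one on the falling edge
    mem_clocks = burst_length // beats_per_clock
    return {
        "bits_per_beat": dq_width_bits,
        "bits_per_burst": dq_width_bits * burst_length,
        "mem_clocks": mem_clocks,
        "duration_ns": mem_clocks * 1000.0 / mem_clock_mhz,
    }


burst = bl8_burst()
user_clock_period_ns = 1000.0 / 300.0  # quarter-rate user clock at 300 MHz
print(burst)
print(f"user clock period: {user_clock_period_ns:.3f} ns")
```

One BL8 burst occupies 4 memory clocks, which is the same length of time as one 300 MHz user clock cycle, so each 256-bit Avalon word lines up with one burst.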

Thanks.

Regards,

dlim

SKGR0 · ‎11-14-2019

Previous query:

I observed that DDR4 limits the burst length to 8 (BL8).

Does this mean that, with a DQ width of 32, one DDR read request returns a maximum of 256 bits (32 * 8)?

A further question on the burst length of 8:

I instantiated a DDR4 controller IP for an Arria 10 device and simulated the example design.

I found that amm_burstcount = 58 in the example design.

This seems to contradict the statement that the DDR4 IP constrains the burst length to 8 (fixed BL8).

Can someone please clarify this?

Thanks in advance!

Deshi_Intel · ‎11-18-2019

Hi,

There are two sides to the data transaction flow:

User logic <=> DDR4 IP <=> DDR4 SDRAM

BL8 applies to the transactions between DDR4 IP <=> DDR4 SDRAM, as defined by the JEDEC spec.

I believe the higher burst count occurs in the example design's data flow between User logic <=> DDR4 IP, right?

User logic can send a lot of data to the DDR4 IP, but it is queued and processed inside the DDR4 IP and transferred to the DDR4 SDRAM later as BL8 bursts.
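
As a rough illustration of the two sides, the sketch below counts how many BL8 bursts one Avalon burst corresponds to. It assumes the configuration from this thread (256-bit amm data, 32-bit DQ); the function name is hypothetical, and how the controller actually schedules and reorders the memory commands is not modeled.

```python
def bl8_bursts_for_avalon_burst(amm_burstcount: int, amm_width_bits: int = 256,
                                dq_width_bits: int = 32,
                                burst_length: int = 8) -> int:
    """Number of BL8 memory bursts needed to carry one Avalon burst."""
    bits_per_bl8 = dq_width_bits * burst_length    # 256 bits per memory burst
    total_bits = amm_burstcount * amm_width_bits   # payload of the Avalon burst
    return -(-total_bits // bits_per_bl8)          # ceiling division


print(bl8_bursts_for_avalon_burst(58))  # amm_burstcount seen in the example design
```

With these widths each Avalon word is exactly one BL8 burst, so amm_burstcount = 58 on the user side becomes 58 separate BL8 transfers on the memory side.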

I hope this clears up your doubt. Thanks.

Regards,

dlim

SKGR0 · ‎11-18-2019

Hi

Thanks, it is clear now.

A further question on calculating the DDR4 latency.

The time between issuing the read command and receiving the first word from memory is

Latency = CAS latency / data rate (MT/s) * 2000 nanoseconds

Example: for DDR4-2400 (memory clock 1200 MHz, data rate 2400 MT/s) with CL = 15

Latency = (15 / 2400) * 2000 = 12.5 nanoseconds

My question is:

If I request a burst count of 32 (4 * BL8), what would be the total latency to receive the data?

Is it 4 (BL8) read requests * 12.5 = 50 nanoseconds?

Or 1 read request * 12.5 = 12.5 nanoseconds?

Thanks in advance!

Deshi_Intel · ‎11-19-2019

Hi,

For the estimated latency, refer to the Arria 10 EMIF user guide (page 418, Table 394):

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug-20115.pdf
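
As a rough cross-check of the DRAM-side numbers only (not a substitute for the table in the user guide), the sketch below converts CAS latency to nanoseconds and estimates an idealized time for several pipelined BL8 reads. When the clock is given in MHz the conversion factor is 1000; the factor 2000 applies only when dividing by the data rate in MT/s. The function names are hypothetical, and the 4-clock spacing between read commands is an assumption: the real spacing depends on bank-group timing, and controller and PHY latency are not included.

```python
def cas_latency_ns(cl_cycles: int, mem_clock_mhz: float) -> float:
    """CAS latency converted from memory clock cycles to nanoseconds."""
    return cl_cycles * 1000.0 / mem_clock_mhz


def pipelined_read_time_ns(n_bursts: int, cl_cycles: int, mem_clock_mhz: float,
                           read_spacing_clocks: int = 4,
                           burst_clocks: int = 4) -> float:
    """Idealized time from the first read command to the last data beat.

    Assumes back-to-back reads to an open row, spaced read_spacing_clocks
    apart (an assumption, not a guaranteed value), so CAS latency is paid
    once rather than once per burst.
    """
    total_clocks = cl_cycles + (n_bursts - 1) * read_spacing_clocks + burst_clocks
    return total_clocks * 1000.0 / mem_clock_mhz


print(f"CL = 15 at 1200 MHz: {cas_latency_ns(15, 1200):.2f} ns")
print(f"4 x BL8, idealized:  {pipelined_read_time_ns(4, 15, 1200):.2f} ns")
```

Under these assumptions the four bursts overlap with the latency of the first one, so the total is well below four times the single-read latency.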

Thanks.

Regards,

dlim