
DDR4 Memory Access with Target Device - Arria 10

SKGR0
Beginner

I have generated an altera_emif IP with the following parameters:

  • Protocol: DDR4
  • Target device: Arria 10
  • Memory clock frequency: 1200 MHz
  • Clock rate of user logic: Quarter
  • User logic clock: 300 MHz
  • DQ width: 32 bits
  • amm_readdata and amm_writedata width: 256 bits

 

From this configuration, I conclude the following:

  1. The FPGA receives 64 bits from the DDR4 on every 1200 MHz memory clock cycle (32 bits on the rising edge and 32 bits on the falling edge).
  2. The Avalon interface runs at 300 MHz (quarter rate).
  3. The Avalon interface transfers 256 bits of data (32 * 8) on every 300 MHz clock cycle.
  4. Bandwidth = 1200 * 10^6 Hz * 2 * 32 bits / 10^9 = 76.8 Gbit/s.
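The four statements above can be checked with a short sketch that computes the peak bandwidth at both ends of the controller, the DDR4 pins and the Avalon-MM port. With this configuration, both ends must give the same figure. The function and variable names are illustrative only and are not Intel IP parameters.

```python
# Peak-bandwidth check for the altera_emif configuration in this post.
# Function and variable names are illustrative, not Intel IP parameters.

def memory_side_bandwidth_gbps(mem_clock_mhz: float, dq_width_bits: int) -> float:
    """Peak bandwidth at the DDR4 pins: 2 transfers (rising + falling edge) per memory clock."""
    transfers_per_second = mem_clock_mhz * 1e6 * 2
    return transfers_per_second * dq_width_bits / 1e9


def avalon_side_bandwidth_gbps(user_clock_mhz: float, amm_width_bits: int) -> float:
    """Peak bandwidth on the Avalon-MM port: one amm_readdata/amm_writedata word per user clock."""
    return user_clock_mhz * 1e6 * amm_width_bits / 1e9


MEM_CLOCK_MHZ = 1200
DQ_WIDTH = 32
RATE_RATIO = 4                                # quarter rate: user clock = memory clock / 4
USER_CLOCK_MHZ = MEM_CLOCK_MHZ / RATE_RATIO   # 300 MHz
AMM_WIDTH = DQ_WIDTH * 2 * RATE_RATIO         # 256 bits (= 32 * 8)

mem_bw = memory_side_bandwidth_gbps(MEM_CLOCK_MHZ, DQ_WIDTH)
amm_bw = avalon_side_bandwidth_gbps(USER_CLOCK_MHZ, AMM_WIDTH)

print(f"Avalon word: {AMM_WIDTH} bits at {USER_CLOCK_MHZ:.0f} MHz")
print(f"Memory side: {mem_bw:.1f} Gbit/s, Avalon side: {amm_bw:.1f} Gbit/s ({mem_bw / 8:.1f} GB/s)")
```

Both sides come out at 76.8 Gbit/s (9.6 GB/s). Quarter rate trades a 4x slower clock for a 4x wider Avalon word (2 edges * 4 * 32 bits = 256 bits).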

 

Is my understanding correct? Please confirm.

 

Deshi_Intel
Moderator

Hi,

Your understanding of points 1 to 4 is correct.

Note that all the bandwidth figures discussed so far are the "theoretical maximum bandwidth".

Actual data transfer throughput will be lower than this, depending on the following factors:

  1. Whether the user application can process and transfer data on every clock cycle, and whether it accesses SDRAM addresses sequentially or randomly.
  2. The DDR4 IP controller cannot sustain a data transfer on every clock cycle. It deasserts the avalon_ready signal when it is busy and unable to accept a transfer.
  3. The DDR4 SDRAM cannot transfer data on every clock cycle, because of its internal read/write turnaround timing requirements and its refresh cycle requirements.
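To put factor 3 in perspective, here is a rough sketch of how much of the 76.8 Gbit/s peak is lost to refresh alone. It assumes JEDEC DDR4 timing for an 8 Gb device in the normal temperature range (tREFI = 7.8 us, tRFC = 350 ns); check the data sheet of your memory device. Turnaround, row misses and controller back-pressure (factors 1 and 2) are not modelled, so the result is an optimistic upper bound, not a throughput prediction.

```python
# Rough upper bound on DDR4 throughput after refresh overhead only.
# ASSUMPTION: JEDEC DDR4 8 Gb device, normal temperature range
# (tREFI = 7.8 us, tRFC1 = 350 ns). Check your memory device's data sheet.

PEAK_GBPS = 76.8      # theoretical peak for x32 DDR4 at 1200 MHz
T_REFI_NS = 7800.0    # average interval between REFRESH commands
T_RFC_NS = 350.0      # time the device is busy for each REFRESH


def refresh_limited_bandwidth(peak_gbps: float, t_refi_ns: float, t_rfc_ns: float) -> float:
    """Bandwidth left after removing the fraction of time spent refreshing.

    Read/write turnaround, row misses and controller back-pressure are NOT modelled.
    """
    refresh_fraction = t_rfc_ns / t_refi_ns
    return peak_gbps * (1.0 - refresh_fraction)


bw = refresh_limited_bandwidth(PEAK_GBPS, T_REFI_NS, T_RFC_NS)
print(f"Refresh overhead: {T_RFC_NS / T_REFI_NS:.1%}, upper bound: {bw:.1f} Gbit/s")
```

With these assumed values, refresh alone costs about 4.5%, leaving roughly 73.4 Gbit/s before any access-pattern effects are considered.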

 

Thanks.

 

Regards,

dlim

JET60200
New Contributor I

Is there an Avalon-MM (AMM) DMA Linux driver example for the host side? I couldn't find one. Thanks a lot.

SKGR0
Beginner

Thanks!!

In addition to the above query:

I observed that DDR4 limits the burst length to 8 (BL8).

Does this mean that, with a DQ width of 32, one DDR read request returns a maximum of 256 bits (32 * 8)?

 

Deshi_Intel
Moderator

Hi,

Sorry, Intel FPGA doesn't have a DMA Linux driver example; we provide the DDR4 memory controller IP, not system-level application solutions.

Regarding your question on burst length 8:

  • Yes, one read request with a burst length of 8 transfers a total of 256 bits of data (32 x 8).
  • Note, however, that this takes 4 memory clock cycles, with 2 data transfers per clock cycle (rising edge + falling edge).
  • Each beat of the burst transfers only 32 bits; the 256 bits are delivered as 8 consecutive beats in response to a single read command.
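The arithmetic in the three points above can be checked with a small sketch. It also shows that, with a x32 interface at quarter rate, one BL8 burst carries exactly one 256-bit Avalon word. The variable names are illustrative, not EMIF parameter names, and the values are the ones used in this thread.

```python
# One DDR4 BL8 read burst on a x32 interface at a 1200 MHz memory clock.
# Values match this thread's configuration; adjust for other setups.

DQ_WIDTH = 32            # bits transferred per beat (one clock edge)
BURST_LENGTH = 8         # BL8, fixed by JEDEC DDR4
MEM_CLOCK_MHZ = 1200
RATE_RATIO = 4           # quarter-rate user logic

beats_per_clock = 2                                      # rising + falling edge
bits_per_burst = DQ_WIDTH * BURST_LENGTH                 # 256 bits
mem_clocks_per_burst = BURST_LENGTH // beats_per_clock   # 4 memory clocks
t_ck_ns = 1000.0 / MEM_CLOCK_MHZ                         # ~0.833 ns per memory clock
burst_time_ns = mem_clocks_per_burst * t_ck_ns           # ~3.33 ns on the DQ bus
amm_width = DQ_WIDTH * beats_per_clock * RATE_RATIO      # 256-bit Avalon word

print(f"{bits_per_burst} bits in {mem_clocks_per_burst} memory clocks ({burst_time_ns:.2f} ns)")
print(f"Avalon words per BL8 burst: {bits_per_burst / amm_width:g}")
```

The 4 memory clocks of one burst equal exactly one 300 MHz user clock, which is why the quarter-rate Avalon port can accept or return one 256-bit word per cycle.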

 

Thanks.

 

Regards,

dlim

SKGR0
Beginner

Previous query:

I observed that DDR4 limits the burst length to 8 (BL8).

Does this mean that, with a DQ width of 32, one DDR read request returns a maximum of 256 bits (32 * 8)?

A further question on burst length 8:

I instantiated a DDR4 controller IP for an Arria 10 device and simulated the example design.

I found that amm_burstcount = 58 in the example design.

This seems to contradict the statement that the DDR4 IP constrains the burst length to 8 (fixed BL8).

Can someone please clarify this?

 

Thanks in advance!

Deshi_Intel
Moderator

Hi,

The data transaction flow has two legs:

  • User logic <=> DDR4 IP <=> DDR4 SDRAM

BL8 applies to transactions between the DDR4 IP and the DDR4 SDRAM; it is defined by the JEDEC specification.

I believe the higher burstcount you saw is on the User logic <=> DDR4 IP leg of the example design, right?

User logic can send a large amount of data to the DDR4 IP in one burst; the IP queues and processes it internally, and later transfers it to the DDR4 SDRAM in BL8 bursts.
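A minimal sketch of how the two legs relate, assuming amm_burstcount counts 256-bit Avalon words (standard Avalon-MM burst semantics) and that the IP splits the transfer into BL8 memory bursts. The helper bl8_bursts_for is hypothetical; the real command scheduling and reordering happen inside the EMIF IP.

```python
# Sketch: how an Avalon-MM burst from user logic maps onto DDR4 BL8 bursts.
# ASSUMPTIONS: amm_burstcount counts amm_readdata/amm_writedata words,
# x32 DQ, quarter rate, BL8. bl8_bursts_for is a hypothetical helper.

DQ_WIDTH = 32
BURST_LENGTH = 8
AMM_WIDTH = 256  # bits per Avalon word at quarter rate with x32


def bl8_bursts_for(amm_burstcount: int, amm_width: int = AMM_WIDTH,
                   dq_width: int = DQ_WIDTH, burst_length: int = BURST_LENGTH) -> int:
    """Number of BL8 memory bursts needed to move one Avalon burst (rounded up)."""
    total_bits = amm_burstcount * amm_width
    bits_per_mem_burst = dq_width * burst_length
    return (total_bits + bits_per_mem_burst - 1) // bits_per_mem_burst


n = bl8_bursts_for(58)  # burst count seen in the example design
print(f"amm_burstcount=58 -> {n} BL8 bursts, {58 * AMM_WIDTH} bits total")
```

Under these assumptions each 256-bit Avalon word corresponds to exactly one BL8 burst, so a burstcount of 58 becomes 58 BL8 bursts on the memory side rather than a violation of BL8.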

 

I hope this clears up your doubt. Thanks.

 

Regards,

dlim

SKGR0
Beginner

Hi

Thanks, it is clear now.

Next, I am calculating the DDR4 latency.

The time between issuing the read request and retrieving the first word from memory is:

Latency (ns) = CAS latency / data rate (MT/s) * 2000 = CAS latency / memory clock (MHz) * 1000

Example: for DDR4-2400 (memory clock 1200 MHz) with CL = 15:

Latency = 15 / 2400 * 2000 = 15 / 1200 * 1000 = 12.5 nanoseconds

My question is:

If I request a burst count of 32 (4 * BL8), what would be the total latency to receive the data?

Is it 4 (BL8) read requests * 12.5 = 50 nanoseconds?

Or 1 read request * 12.5 = 12.5 nanoseconds?
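A small sketch of the CAS latency conversion, assuming CL is given in memory clock cycles (tCK) as in DDR4 data sheets. Because the DDR data rate in MT/s is twice the memory clock in MHz, the two forms of the formula agree; for DDR4-2400 with CL = 15 both give 12.5 ns. This covers only the CAS latency at the SDRAM and excludes the controller and PHY latency inside the EMIF IP. The function names are hypothetical.

```python
# Convert a CAS latency given in memory clock cycles into nanoseconds.
# The two forms agree because the DDR data rate (MT/s) is twice the memory clock (MHz).

def cas_latency_ns_from_clock(cl_cycles: int, mem_clock_mhz: float) -> float:
    """CL cycles * clock period (1000 / MHz gives ns)."""
    return cl_cycles * 1000.0 / mem_clock_mhz


def cas_latency_ns_from_data_rate(cl_cycles: int, data_rate_mts: float) -> float:
    """Same result expressed with the data rate (2 transfers per clock)."""
    return cl_cycles * 2000.0 / data_rate_mts


print(cas_latency_ns_from_clock(15, 1200))      # DDR4-2400, CL15
print(cas_latency_ns_from_data_rate(15, 2400))  # same device, data-rate form
```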

 

Thanks in advance!

 

Deshi_Intel
Moderator

Hi,

For an estimate of the latency, please refer to the Arria 10 EMIF user guide (page 418, Table 394).
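As a rough, best-case illustration of the burst-of-32 question (not a substitute for the user guide figures), the sketch below computes when the data of 4 back-to-back BL8 reads arrives at the SDRAM pins. Every parameter is an assumption: the row is already open, no refresh or turnaround intervenes, additive latency is 0, and READ commands are issued every 4 memory clocks (the DDR4 tCCD_S minimum, which applies only when consecutive reads go to different bank groups). EMIF controller and PHY latency is not included.

```python
# Idealized read timeline for 4 back-to-back BL8 bursts (32 beats in total).
# ASSUMPTIONS (hypothetical best case): row already open, no refresh or
# turnaround, AL = 0, READ commands every T_CCD = 4 clocks so the data bus
# has no gaps. Controller/PHY latency inside the EMIF IP is NOT included.

MEM_CLOCK_MHZ = 1200
CL = 15               # read latency in memory clocks
BURST_LENGTH = 8
N_BURSTS = 4
T_CCD = 4             # clocks between READ commands (assumed tCCD_S)

t_ck_ns = 1000.0 / MEM_CLOCK_MHZ
clocks_per_burst = BURST_LENGTH // 2   # 8 beats on both clock edges

# The last READ is issued (N_BURSTS - 1) * T_CCD clocks after the first;
# its data starts CL clocks later and occupies clocks_per_burst clocks.
last_data_clock = (N_BURSTS - 1) * T_CCD + CL + clocks_per_burst

first_data_ns = CL * t_ck_ns
total_ns = last_data_clock * t_ck_ns
print(f"first data after {first_data_ns:.1f} ns, all {N_BURSTS} bursts done after {total_ns:.1f} ns")
```

Under these assumptions the CAS latency is paid roughly once: the first data appears after CL (12.5 ns) and the remaining bursts stream behind it, finishing about 31 memory clocks (about 25.8 ns) after the first READ rather than after 4 * CL. Latency measured through the EMIF IP will be higher.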

 

Thanks.

 

Regards,

dlim

 
