FPGA Intellectual Property
PCI Express*, Networking and Connectivity, Memory Interfaces, DSP IP, and Video IP

QSYS amm interconnect delay with EMIF (Arria 10)

rled64
New Contributor I

Hello community,

Using the Terasic HAN Pilot Platform (Arria 10), I recently found some interesting behavior in read/write operations to DDR4 external memory through the EMIF, depending on whether or not I use Qsys.

Basically, I have two components: the Intel EMIF IP configured for DDR4, and my custom Avalon memory-mapped (Avalon-MM) controller, which issues reads and writes to the EMIF.

When I have a simple top-level design with these two modules connected directly, reads and writes (during a burst) work fine and there is no waitrequest. I can basically reach the theoretical data rate during the burst, which here is 266 MHz x 256 bits (on the FPGA logic side), corresponding to 2133 MT/s x 32 bits, i.e. about 68 Gbps.
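To make the arithmetic explicit, here is a small sketch that computes the peak rate on both sides of the EMIF. It assumes the controller runs in quarter-rate mode, so emif_usr_clk = data rate / 8, which is consistent with the 266 MHz / 256-bit figures above; the constant and function names are only illustrative.

```python
# Peak bandwidth on both sides of the EMIF, assuming quarter-rate operation
# (user clock = DDR data rate / 8, so 32 DQ bits x 8 = 256 fabric bits).
DDR_RATE_MTPS = 2133.33          # DDR4-2133, mega-transfers per second
DDR_WIDTH_BITS = 32              # external DQ width
USR_CLK_MHZ = DDR_RATE_MTPS / 8  # ~266.67 MHz emif_usr_clk
AMM_WIDTH_BITS = 256             # Avalon-MM data width on the FPGA side


def gbps(mhz: float, width_bits: int) -> float:
    """Throughput in Gbit/s for one width_bits transfer per clock/transfer."""
    return mhz * 1e6 * width_bits / 1e9


fabric_gbps = gbps(USR_CLK_MHZ, AMM_WIDTH_BITS)
pins_gbps = gbps(DDR_RATE_MTPS, DDR_WIDTH_BITS)
print(f"fabric side: {fabric_gbps:.1f} Gbps")
print(f"pin side:    {pins_gbps:.1f} Gbps")
```

Both sides come out at roughly 68.3 Gbps, which is why a 256-bit bus at emif_usr_clk can keep a 32-bit DDR4-2133 interface saturated when there is no waitrequest.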

I then did the same thing using Platform Designer (Qsys): I added these two modules, connected them and exported the necessary inputs/outputs. I instantiate the Qsys system in my top-level design, make the connections... and in this case the data rate is roughly 9x lower than before! There is a waitrequest on every read/write within a burst: the EMIF Avalon-MM interface's ready signal (i.e. waitrequest_n) goes low for 8 clock cycles before each beat is accepted. So a burst of 16 writes doesn't take 16 clock cycles, but 144 clock cycles!
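The 144-cycle figure follows directly if every beat is held off by 8 cycles of waitrequest and then accepted on the next cycle. A minimal sketch of that model (burst_cycles is a hypothetical helper, and it assumes the stall is identical for every beat):

```python
def burst_cycles(beats: int, stall_per_beat: int) -> int:
    """Clock cycles for a burst when every beat is held off by
    stall_per_beat cycles of waitrequest before being accepted."""
    return beats * (1 + stall_per_beat)


ideal = burst_cycles(16, 0)    # one beat per clock
stalled = burst_cycles(16, 8)  # 8 stall cycles before each beat
print(ideal, stalled, stalled / ideal)  # 16 144 9.0
```

So a fixed 8-cycle stall per beat costs a factor of 9 in throughput, not 8, because the accepting cycle itself is still needed.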

Why would this delay occur when I connect my modules through Qsys instead of directly in my top level? I see that the Qsys system introduces amm_interconnect bridges. I suppose this might be related, but I can't understand why it would add such a delay, especially since nothing else is connected, so there cannot be any arbitration delay.

I would be glad to hear your thoughts on this!

Thanks

sstrell
Honored Contributor III

That is odd. In Platform Designer (PD), hover your cursor over the connection dots between the controller and the EMIF IP to see whether additional conversion logic is being inserted due to some mismatch between the two sides. The tooltip will indicate this and may clue you in to why this is happening.

You could also try adding Signal Tap to the design and tap into the interconnect to see what may be happening.

AdzimZM_Intel
Employee

Hello,


Can you post some snapshots in this thread of the signals being delayed by the waitrequest?


Do you observe any differences between the two designs?


Does the delay only occur when running transactions in burst mode, or does it also occur with single read and write transactions?


Which Quartus version are you using?


Are there any timing issues reported in the design?


Regards,

Adzim


rled64
New Contributor I

Thank you @sstrell and @AdzimZM_Intel for the quick replies,

I didn't know we could quickly view interconnect information on the Platform Designer connection nodes!
Indeed, there was a clock-domain-crossing (CDC) bridge between my Avalon-MM controller and the EMIF Avalon-MM agent, because Quartus considered them to be in different clock domains. The reason is that in my system view I had first exported emif_usr_clk in order to access it in my top-level design, then added an input clock bridge to bring it back in and feed it to my controller (master_amm_burst_0). It's indeed more logical to connect it directly inside Qsys and export it through an output clock bridge for other user logic. The screenshots below show the previous and the new connections; the new one removes the need for an interconnect adapter:

Before:

[Screenshot: clock connections in Platform Designer before the change]

After:

[Screenshot: clock connections in Platform Designer after the change]

The ready signal no longer goes low on every write, and I am almost at the theoretical data rate. I say almost, because I noticed another issue, which shows up clearly when I perform long writes.
For instance, over 20 bursts with a burst length of 127 on the 256-bit data bus, the "ready" signal (i.e. waitrequest_n) goes low roughly every 2090 clock cycles (of emif_usr_clk) and stays low for 80-100 cycles. Here is the SignalTap capture of the signals during a write.

[Screenshot: SignalTap capture of the Avalon-MM signals during a write]

Mainly, you can see master_amm_address incrementing each time a new burst of 127 starts over the 20 bursts, and you can see that master_amm_ready goes low and stalls master_amm_writedata and write_count for 80-100 cycles.
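To put numbers on such a capture, one option is to export the master_amm_ready samples from SignalTap (for example to CSV; the parsing is not shown) into a Python list and measure the stall pattern. The sketch below uses hypothetical helpers, low_runs and stall_stats, and a synthetic trace, not the data captured above.

```python
from typing import List, Tuple


def low_runs(ready: List[int]) -> List[Tuple[int, int]]:
    """Return (start_index, length) for every run of consecutive 0 samples."""
    runs = []
    start = None
    for i, v in enumerate(ready):
        if v == 0 and start is None:
            start = i                      # a stall begins
        elif v != 0 and start is not None:
            runs.append((start, i - start))  # a stall ends
            start = None
    if start is not None:                  # trace ended while stalled
        runs.append((start, len(ready) - start))
    return runs


def stall_stats(ready: List[int]) -> Tuple[List[int], List[int]]:
    """Cycles between the starts of consecutive stalls, and each stall length."""
    runs = low_runs(ready)
    periods = [b[0] - a[0] for a, b in zip(runs, runs[1:])]
    lengths = [n for _, n in runs]
    return periods, lengths


# Synthetic trace (hypothetical): 1990 ready cycles then 90 stalled, three times.
trace = ([1] * 1990 + [0] * 90) * 3
periods, lengths = stall_stats(trace)
print(periods, lengths)  # [2080, 2080] [90, 90, 90]
```

A constant period with a roughly constant stall length, independent of the burst boundaries, points at something periodic inside the memory controller rather than at the master's own bursts.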

Do you have an idea where this behavior could come from? I do have timing issues:

[Screenshot: timing analysis results]

But I'm not really comfortable with timing constraints! If you need more information, don't hesitate to ask. In case it is relevant, here are also the Avalon-MM host parameters of my custom controller component in the Component Editor:

[Screenshot: Avalon-MM host interface parameters in the Component Editor]

I only matched the parameters to the EMIF Avalon-MM interface, i.e. "Address units" set to WORDS and "Max. pending read transactions" set to 64; nothing else was changed.
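The "Address units" setting determines how far master_amm_address must advance between incrementing bursts. With WORDS, one address step is one full data-bus word; with SYMBOLS (assuming the usual 8-bit symbols, i.e. byte addressing) one step is one byte, so a 256-bit bus moves 32 addresses per beat. The helper below is a hypothetical illustration of that rule, not part of any Intel API.

```python
DATA_WIDTH_BITS = 256
BYTES_PER_WORD = DATA_WIDTH_BITS // 8  # 32 bytes per beat on a 256-bit bus
BURST_LEN = 127


def next_burst_address(addr: int, burst_len: int, units: str) -> int:
    """Start address of the next incrementing burst for the given address units."""
    if units == "WORDS":
        return addr + burst_len                   # one address per data word
    if units == "SYMBOLS":
        return addr + burst_len * BYTES_PER_WORD  # one address per 8-bit symbol
    raise ValueError(f"unknown address units: {units}")


print(next_burst_address(0, BURST_LEN, "WORDS"))    # 127
print(next_burst_address(0, BURST_LEN, "SYMBOLS"))  # 4064
```

If the master's setting and its address arithmetic disagree, bursts would overlap or leave gaps in memory, so it is worth checking that the increment seen in SignalTap matches the configured units.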

Regards,

AdzimZM_Intel
Employee

Hello,


To view the system interconnect, you can go to Platform Designer -> System -> Show System with Platform Designer Interconnect.

Then you can add the interconnect signals to SignalTap to see the data.


I think you can check whether there are any warnings in the Analysis & Synthesis compilation report related to the address range of the memory transfer.


Try to solve the timing issue first; it can impact the memory functionality.

Check the DDR Report for a summary of the memory timing.

Also check which path fails timing, i.e. from which module to which.

I expect the failure may be in a core timing path.


Regards,

Adzim


rled64
New Contributor I

Hello,

I found that the waitrequest every ~2080 cycles is related to the refresh interval tREFI of 7.8 µs, which corresponds exactly to 2080 cycles at ~266.67 MHz. So I believe everything works well; I just have to take this into account when estimating the maximum DDR data rate, which is about 65 Gbps instead of 68 Gbps, which is fine.
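The refresh overhead can be estimated the same way. This sketch assumes a stall of 90 cycles per refresh (the midpoint of the observed 80-100) and ignores any other controller overhead such as bank or row switching, so it is only a rough estimate.

```python
T_REFI_US = 7.8                 # DDR4 average refresh interval, microseconds
USR_CLK_MHZ = 2133.33 / 8       # ~266.67 MHz emif_usr_clk (quarter rate)
PEAK_GBPS = USR_CLK_MHZ * 256 / 1000  # 256-bit bus, Mbit/s -> Gbit/s
STALL_CYCLES = 90               # assumed: midpoint of the observed 80-100 cycles

refresh_period_cycles = T_REFI_US * USR_CLK_MHZ  # ~2080 cycles between refreshes
efficiency = (refresh_period_cycles - STALL_CYCLES) / refresh_period_cycles
print(f"{refresh_period_cycles:.0f} cycles per refresh interval")
print(f"efficiency {efficiency:.3f}, sustained ~{PEAK_GBPS * efficiency:.1f} Gbps")
```

With these assumptions the sustained rate lands at roughly 65 Gbps, in line with the estimate above.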

Regarding the timing errors, I don't have any setup violations, but I have 4 hold violations, for instance on this path:

Slack: -0.346
From node: u0|emif_0|emif_0|arch|arch_inst|io_tiles_wrap_inst|io_tiles_inst|tile_gen[0].tile_ctrl_inst|pa_core_clk_out[0]
To node: auto_fab_0|alt_sld_fab_0|alt_sld_fab_0|auto_signaltap_auto_signaltap_0|sld_signaltap_inst|acq_data_in_reg[102]
Launch clock: u0|emif_0|emif_0_core_usr_clk
Latch clock: u0|emif_0|emif_0_core_usr_clk
Relationship: 0.000
Clock skew: 3.609
Data delay: 3.688
Model: Slow 900mV 0C Model


I also have a single timing failure in the "Report DDR" section of the Timing Analyzer, for all four models (slow/fast, 0C/100C), on the "Read Capture" path of the EMIF:

[Screenshot: Report DDR results showing the Read Capture failure]

The design actually works, but I don't know what to do about these errors. Do you think it is worth fixing them? Maybe they are only due to the SignalTap logic.

Thanks

sstrell
Honored Contributor III

Yes, remove Signal Tap and see whether you still have timing issues.

AdzimZM_Intel
Employee

Hi


Do you still have timing issues after removing Signal Tap?


Regards,

Adzim


rled64
New Contributor I

Hi Adzim,

Sorry for the late response. I was in fact able to resolve the timing issue while keeping SignalTap: it was caused by the trigger clock, which was set to "DDR4A_REF_CLK" (the reference clock input of the EMIF) instead of the EMIF's emif_usr_clk. Since both have the same frequency, I have trouble understanding why using the dedicated reference clock in SignalTap generated timing errors. If you have any insight about this, you're welcome to share it.


I will close the topic after this; thanks again for the help!

AdzimZM_Intel
Employee

I'm glad your question has been addressed. I will now transition this thread to community support. If you have a new question, feel free to open a new thread to get support from Intel experts. Otherwise, community users will continue to help you on this thread. Thank you.

