
Avalon to AXI implementation

MichaelB
New Contributor I

Hi,

 

Currently I'm thinking about implementing my own Avalon <-> AXI4 (MM) adapter instead of using the QSYS autogenerated adapter.

 

Currently we are using an AXI4 DMA that streams data into the DDR4.

Because the DDR4 uses an Avalon interface, I already tried the autogenerated converter, which is too slow to support our data rates.

Even with pending transactions set to 64 and a burst size of 16, we are not able to achieve a data rate > 3 Gbps (the AXI side uses burst size 16, too).

The DDR4 (1600 MHz) Avalon interface runs at 200 MHz @ 512b and the DMA at 160 MHz @ 256b.

Since the DMA in the 160 MHz domain gets overflows, I can be sure the conversion is the issue (I did not expect this, because the receiving side has a higher frequency and double the data width).
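As a rough sanity check (a small Python sketch using only the numbers above; the raw figures ignore any protocol overhead):

# Raw (theoretical) bandwidth of both sides, using the figures above.
dma_rate_gbps  = 160e6 * 256 / 1e9   # AXI DMA side: 160 MHz @ 256 bit  -> ~40.96 Gbps
ddr4_rate_gbps = 200e6 * 512 / 1e9   # DDR4 Avalon side: 200 MHz @ 512 bit -> ~102.4 Gbps

print(f"DMA side raw bandwidth : {dma_rate_gbps:.2f} Gbps")
print(f"DDR4 side raw bandwidth: {ddr4_rate_gbps:.2f} Gbps")

# The observed ~3 Gbps is far below either raw figure, so the limit must be the
# adapter/interconnect efficiency rather than the interfaces themselves.
print(f"3 Gbps is only {3 / dma_rate_gbps:.1%} of the DMA side's raw bandwidth")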

 

 

We had the same issue with AXI DMA <-> AXI HBM2 as well.

There we implemented our own AXI <-> AXI connection, which supports our required data rates much better than the autogenerated AXI converter (verified in simulation and on the FPGA).

We already had several debug sessions with Premier Support about this; the final result was to NOT use the autogenerated adapter and to use our own.

 

Now with DDR4 we are facing the same throughput limitation (the AXI <-> Avalon conversion is the bottleneck).

 

Could you give me advice on implementing this conversion?

Are there any data sheets that describe the adapter autogenerated by QSYS?

 

Furthermore, I saw that I can edit the maximum pending read transactions on the DDR4 EMIF core and on the Avalon Clock Crossing Bridge, but not the maximum pending write transactions. Is there a reason why I cannot edit these parameters in those IP cores?

 

Kind regards,

 Michael

RichardTanSY_Intel

Hi @MichaelB 

 

Sorry for the delay in response.

You may check out the user guide below for more information on the adapter autogenerated by Platform Designer.

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug-qpp-platform-designer.pdf

 

Could you share screenshots of the DDR4 EMIF core and Avalon Clock Crossing Bridge settings that show you can edit the maximum pending read transactions but not the maximum pending write transactions?

 

Best Regards,
Richard Tan

p/s: If any answer from the community or Intel support is helpful, please feel free to give Kudos.

 

MichaelB
New Contributor I

Hi Richard,

 

Thanks for your reply!

 

In the EMIF DDR4 & Avalon CCB settings I'd like to increase pending writes (CCB Avalon_M will write data to Avalon_S of DDR4):

 

AXI_M (write) -> Avalon CCB -> DDR4

 

AXI burst length = 16 (data width = 256) & Avalon burst length = 8 (data width = 512), based on my calculation:

16*256 = 8*512
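A quick check that both bursts carry the same payload (a minimal Python sketch with the numbers above):

# Payload carried by one burst on each side.
axi_burst_bits    = 16 * 256   # AXI: burst length 16 @ 256 bit   = 4096 bit
avalon_burst_bits = 8 * 512    # Avalon: burst length 8 @ 512 bit = 4096 bit

assert axi_burst_bits == avalon_burst_bits
print(f"{axi_burst_bits} bit = {axi_burst_bits // 8} bytes per burst on both sides")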

 

Will the interconnect resolve data width conversion and align the bursts?

 

Screenshots of CCB & EMIF:

[Screenshots: EMIF DDR4 parameters, EMIF DDR4 AVM settings, Avalon CCB AVM settings, Avalon CCB parameters]

 

Let me know if you need further information!

Best regards,

 Michael

RichardTanSY_Intel

Hi @MichaelB 

 

From what I found, the DDR4 EMIF core and the Avalon-MM Clock Crossing Bridge do not seem to support maximum pending write transactions. The interface must have both the response and writeresponsevalid signals, which these IPs do not have. You could create a custom component, though, by adding the respective signals. FYI, maximum pending read transactions require the readdatavalid signal.
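To illustrate why those signals matter, here is a simplified, hypothetical Python model (not the actual interconnect logic): outstanding transactions can only be tracked if the interface reports completions, which is what writeresponsevalid (for writes) and readdatavalid (for reads) provide.

# Simplified model: without a completion event the master cannot know when an
# in-flight transaction has finished, so it cannot safely keep more than one pending.
class OutstandingTracker:
    def __init__(self, max_pending):
        self.max_pending = max_pending
        self.pending = 0

    def can_issue(self):
        return self.pending < self.max_pending

    def issue(self):       # command accepted on the bus
        assert self.can_issue()
        self.pending += 1

    def complete(self):    # completion seen, e.g. writeresponsevalid / readdatavalid
        self.pending -= 1

# With no writeresponsevalid there is never a complete() event for writes, so a
# "maximum pending write transactions" parameter cannot be exposed for these IPs.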

 

Will the interconnect resolve data width conversion and align the bursts?

Make sure there are no errors in the system messages; Platform Designer should take care of most of the interconnect between the interfaces.

You may check chapter 5.1, Memory-Mapped Interfaces, for further details:

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug-qpp-platform-designer.pdf#page=208

 

If you have further questions on EMIF, I would recommend opening a new forum case for EMIF-related questions, as I am unfortunately not an expert in EMIF.

 

Best Regards,
Richard Tan


MichaelB
New Contributor I

Hi Richard,

 

Yes, I recognised this too; some signals are missing (EMIF core & Avalon CCB).

  • Is there an option in the Avalon CCB & EMIF core to enable those?
  • How can I create a custom component for a standard IP? Would this be a custom component instantiating the EMIF core?

 

I tried to edit the interface of those IP cores in the component section in QSYS, but I cannot add further signals.

I already read through the EMIF user guide (https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/stratix-10/ug-s10-emi.pdf) but could not find any setting to enable/disable pending write transactions.

 

Currently I am using the autogenerated interconnect from QSYS to resolve 256b AXI (BL = 16) to 512b Avalon (BL = 8).

Here I did not see any errors in QSYS, only the hint that an Avalon adapter will be inserted between AXI <-> Avalon.

 

Are there any special settings of the mm_interconnect I have to configure to do the bus conversion?

From the Platform Designer documentation I assumed this is done by the mm_interconnect automatically.

 

Do you have any reference design (QSYS) where an AXI <-> Avalon connection with bit-width conversion + burst conversion is done?

That would help me understand the settings on both sides and align them for the best throughput.

 

Kind regards,

  Michael

RichardTanSY_Intel

Hi @MichaelB

 

Is there an option in the Avalon CCB & EMIF core to enable those?

Unfortunately I do not see a way to enable those. 

 

How can I create a custom component for a standard IP? Would this be a custom component instantiating the EMIF core?

You may launch the Component Editor by double-clicking New Component at the top of the IP Catalog or by selecting New Component from the File menu.

You may check out the training video below: Chapter 9, Creating a Custom Component.

https://www.intel.com/content/www/us/en/programmable/support/training/course/oqsys3000.html

 

Are there any special settings of the mm_interconnect I have to configure to do the bus conversion?

You do not need to configure anything. As you mentioned, this will be done by mm_interconnect automatically.

https://www.youtube.com/watch?v=LdD2B1x-5vo

 

Do you have any reference design (QSYS) where an AXI <-> Avalon connection with bit-width conversion + burst conversion is done?

This is the closest design example that I found with an AXI and Avalon connection.

https://www.intel.com/content/altera-www/global/en_us/index/support/support-resources/design-examples/design-software/qsys/exm-demo-axi3-memory.html

 

 

Best Regards,
Richard Tan


 

RichardTanSY_Intel

Hi @MichaelB 

 

May I know whether my latest reply helps?

Do you need further help regarding this case?

 

Best Regards,
Richard Tan


MichaelB
New Contributor I

Hi Richard,

 

Yes, I won't use outstanding transactions, since they are not supported by the EMIF core anyway.

Furthermore, I think there is little benefit in wrapping the standard component in a custom component, since outstanding transactions are not supported by the EMIF core itself anyway.

 

Would you recommend using a CCB or the autogenerated clock crossing of the mm_interconnect between two Pipeline Bridges?

I would then configure it without any outstanding transactions and just define the burst size.

 

Would this be a valid design for high throughput from AXI master to DDR?

Again, this is the main reason why I opened this thread.

 

With > 3 Gbps I get overflows on the AXI master side; here I'm running at 160 MHz @ 256b and don't know why I have overflows.
The DDR is running at 200 MHz @ 512b, so it doesn't make sense to me that I get overflows. We faced such issues previously and checked in simulation that the Avalon <-> AXI (mm_interconnect) does not respond fast enough with a valid indication.

 

Furthermore, we figured out that doing the protocol conversion and CDC in one step (AXI 160 MHz @ 256b <-> Avalon 200 MHz @ 512b) is even worse in throughput than doing the protocol conversion first and then the CDC as Avalon <-> Avalon only.

 

Here I really want to be sure we can support a data rate > 10 Gbps, which should be possible with a burst size of 32 in a 160 MHz @ 256b domain.
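Just to quantify that (a small Python sketch; the per-burst overhead below is purely an assumed number for illustration):

# Required bus efficiency for 10 Gbps in the 160 MHz @ 256 bit domain.
raw_gbps = 160e6 * 256 / 1e9             # ~40.96 Gbps raw
required_efficiency = 10 / raw_gbps      # ~24 %
print(f"Required efficiency for 10 Gbps: {required_efficiency:.0%}")

# With burst length 32 and an assumed overhead of a few idle cycles per burst,
# the achievable efficiency stays well above that requirement.
burst_len, overhead_cycles = 32, 4       # overhead value is an assumption
efficiency = burst_len / (burst_len + overhead_cycles)
print(f"Efficiency at BL = 32 with {overhead_cycles} idle cycles per burst: {efficiency:.0%}")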

 

Would you recommend simply setting the maximum read/write outstanding transactions to 0 and doing the connection with Avalon <-> AXI burst/bit-width conversion only?

 

Kind regards,

 Michael

Pramod_atintel
Employee

Hi Michael,

 

I am building a system with PCIe endpoint <-> AXI <-> HBM2. 

I had a query regarding whether the AXI interface is able to support a larger burstcount. I have enabled burstcount greater than 32 in the HBM controller.

Still, when I increase the burstcount in software to greater than 2, the HBM controller doesn't respond with data.

Did you face such issues interfacing AXI with HBM?

 

Best Regards,

Pramod

 

MichaelB
New Contributor I

Hi Pramod,

 

We built a similar architecture where we had multiple masters connected to a single HBM2 channel, plus PCIe DMA access.
Here the HBM2 burst controller seems to have a bug in some specific Quartus versions (Q20.4 and lower).

We succeeded with a solution provided by Intel to patch the IP files after IP generation (yes, sadly you won't be able to use the common tool flow anymore). With Quartus 21.1 this fix is provided in the IP again.


https://www.intel.com/content/www/us/en/support/programmable/articles/000086781.html

It took us a long time to debug this; hopefully it will help you.

 

FYI:
We have not switched to 21.1 yet because with 20.4 we get no timing violations in our design, whereas with 21.1 the retiming process does not work properly again. With 21.1 we got tremendous timing violations, and we did not get a clear explanation from the Intel support team as to why this happens with a version upgrade.
Let's see if this is fixed in future versions...

 

Kind regards,

 Michael

Pramod_atintel
Employee

Hi Michael,

Thanks for the reply.

I am using Quartus version 21.3. That should have solved the burst count issue, but I am still seeing the same problem.

I will try replacing the auto-generated file from the link you sent and check.

In 21.3 I am getting some timing violations, but most of them are on false paths (Signal Tap related).

Did you use AVMM or AVST for the PCIe interface?

 

Best Regards,

Pramod

MichaelB
New Contributor I

Hi Pramod,

OK, I would have assumed that should already be resolved.
Could you share the configuration of the HBM2 controller IP file? Then I could compare it with our IP file.

Maybe there are some further settings necessary to enable the burst support.

We are using the PCIe AVMM (DMA) connected to the HBM2 (Stratix 10 MX).
Do you use any module in between PCIe <-> HBM2?

We faced some issues with the AXI bridge in between and just connected the AVMM of the PCIe directly to the HBM2 AXI ports.

Kind regards,

 Michael

Pramod_atintel
Employee

Hi Michael,

 

There is an option in the HBM controller which says enable burstcount greater than 2 for the AXI interface. I have enabled it and entered a value of 128. Ideally, that should have solved the issue.

HBM2 controller IP file -- you mean the XML file which contains the configuration, right?

I have attached the XML file here.

I started with the design example PCIe + DDR + HBM2:

https://fpgacloud.intel.com/devstore/platform/19.1.0/Pro/an881-pcie-avmm-dma-gen3x16-ddr4-and-hbm2/

I upgraded the design from 19.2 to 21.3. I couldn't get the DMA software working in our Ubuntu 18.04 setup.

The problem was that PIO operations were working fine, but DMA operations were always failing. Some of the functions in the provided driver require an older Linux kernel version, while Ubuntu 18.04 ships a newer kernel.

If you can provide some pointers on your setup or on how to make it work, that would be helpful.

Hence I replaced the AVMM PCIe IP with the MCDMA IP, whose software works on Ubuntu 18.04.

The example design has an HBM-to-AXI Verilog wrapper file, which I have not modified.

I am able to do PIO operations and DMA operations where the payload is limited to 64 bytes at a time.
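For what it's worth, 64 bytes corresponds to only one or two data beats depending on the AXI data width, so each DMA operation ends up as a very short burst on the bus (a tiny Python check; the widths below are just example values, not the actual ones from the design):

payload_bytes = 64
for width_bits in (256, 512):                  # example data widths only
    beats = payload_bytes // (width_bits // 8)
    print(f"{width_bits}-bit data bus: {beats} beat(s) per 64-byte transfer")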

 

Best Regards,

Pramod

 

 

 

 

 

 


MichaelB
New Contributor I

Hi Pramod,

 

Here is a screenshot of our current controller settings in the HBM2 controller IP core (Quartus 20.4 / IP version 19.6.1):

[Screenshot: HBM2 controller settings]

 

Unfortunately, we did not test the example design. Furthermore, we wrote our own driver for the PCIe DMA access, which I'm not allowed to share because of confidential information.

We implemented our own converter (confidential, too) because the Intel AXI Bridge could not achieve the promised data rate.

 

We had several discussion/debug sessions with Intel Premier Support over weeks, but none of the approaches using only Intel IP were successful.
We added a Signal Tap to the mm_interconnect (autogenerated by QSYS) and to the AXI Bridge IP core and saw incorrect AXI transactions that blocked or tremendously decreased the throughput. We proved that in our simulations, too.


Furthermore, the data rate between PCIe DMA and HBM2 wasn't our main topic. We had some additional AXI buses connected to the HBM2 which provide the data (these had a high data-rate requirement).
The readout of the data via PCIe DMA did not have a specific data-rate requirement.

For the AXI connection to the AXI HBM2 we implemented our own fabric, too, instead of using the autogenerated mm_interconnect.

 

Kind regards,

 Michael

Pramod_atintel
Employee

Hi Michael,

 

In the DMA transfer between PCIe <-> HBM, I am seeing a very high data rate on the receive port (Rx, from FPGA to host) and very low bandwidth on the transmit port (Tx, from host PC to FPGA).

Did you face such issues?

 

Regards,

Pramod
