
Avalon to AXI implementation

MichaelB
New Contributor I

Hi,

 

Currently I'm thinking about implementing my own Avalon <-> AXI4 (MM) adapter instead of using the QSYS autogenerated adapter.

 

Currently we are using an AXI4 DMA that streams data into the DDR4.

Because the DDR4 uses an Avalon interface, I already tried the autogenerated converter, which is too slow to support our data rates.

Even with pending transactions set to 64 and a burst size of 16, we are not able to achieve a data rate > 3 Gbps (the AXI side uses burst size 16, too).

The DDR4 (1600 MHz) Avalon interface runs at 200 MHz @ 512b and the DMA at 160 MHz @ 256b.

Since the DMA in the 160 MHz domain gets overflows, I can be sure the conversion is the issue (I did not expect this, because the receiving side has a higher frequency and double the data width).
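As a rough sanity check (a small Python sketch using only the numbers above; the raw figures ignore any protocol overhead):

# Raw (theoretical) bandwidth of both sides, using the figures above.
dma_rate_gbps  = 160e6 * 256 / 1e9   # AXI DMA side: 160 MHz @ 256 bit  -> ~40.96 Gbps
ddr4_rate_gbps = 200e6 * 512 / 1e9   # DDR4 Avalon side: 200 MHz @ 512 bit -> ~102.4 Gbps

print(f"DMA side raw bandwidth : {dma_rate_gbps:.2f} Gbps")
print(f"DDR4 side raw bandwidth: {ddr4_rate_gbps:.2f} Gbps")

# The observed ~3 Gbps is far below either raw figure, so the limit must be the
# adapter/interconnect efficiency rather than the interfaces themselves.
print(f"3 Gbps is only {3 / dma_rate_gbps:.1%} of the DMA side's raw bandwidth")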

 

 

We had the same issue with AXI DMA <-> AXI HBM2 as well.

There we implemented our own AXI <-> AXI connection, which supports our required data rates much better than the autogenerated AXI converter (verified in simulation and on the FPGA).

We already had several debug sessions with Premier Support about this; the final result was to NOT use the autogenerated adapter and to use our own.

 

Now with DDR4 we are facing the same throughput limitation (the AXI <-> Avalon conversion is the bottleneck).

 

Could you give me advice on implementing this conversion?

Are there any data sheets that describe the adapter autogenerated by QSYS?

 

Furthermore, I saw that I can edit the maximum pending read transactions on the DDR4 EMIF core and on the Avalon Clock Crossing Bridge, but not the maximum pending write transactions. Is there a reason why I cannot edit these parameters in those IP cores?

 

Kind regards,

 Michael

RichardTanSY_Intel

Hi @MichaelB 

 

Sorry for the delay in response.

You may check out the user guide below for more information on the adapter autogenerated by Platform Designer.

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug-qpp-platform-designer.pdf

 

Could you share screenshots of the DDR4 EMIF core and Avalon Clock Crossing Bridge settings that show you can edit the maximum pending read transactions but not the maximum pending write transactions?

 

Best Regards,
Richard Tan

p/s: If any answer from the community or Intel support is helpful, please feel free to give Kudos.

 

MichaelB
New Contributor I

Hi Richard,

 

Thanks for your reply!

 

In the EMIF DDR4 & Avalon CCB settings I'd like to increase pending writes (CCB Avalon_M will write data to Avalon_S of DDR4):

 

AXI_M (write) -> Avalon CCB -> DDR4

 

AXI burst length = 16 (data width = 256) & Avalon burst length = 8 (data width = 512), based on my calculation:

16*256 = 8*512
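A quick check that both bursts carry the same payload (a minimal Python sketch with the numbers above):

# Payload carried by one burst on each side.
axi_burst_bits    = 16 * 256   # AXI: burst length 16 @ 256 bit   = 4096 bit
avalon_burst_bits = 8 * 512    # Avalon: burst length 8 @ 512 bit = 4096 bit

assert axi_burst_bits == avalon_burst_bits
print(f"{axi_burst_bits} bit = {axi_burst_bits // 8} bytes per burst on both sides")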

 

Will the interconnect resolve data width conversion and align the bursts?

 

Screenshots of CCB & EMIF:

[Screenshots: EMIF DDR4 parameters, EMIF DDR4 AVM settings, Avalon CCB AVM settings, Avalon CCB parameters]

 

Let me know if you need further information!

Best regards,

 Michael

RichardTanSY_Intel

Hi @MichaelB 

 

From what I found, the DDR4 EMIF core and the Avalon-MM Clock Crossing Bridge do not seem to support maximum pending write transactions. The interface must have both the response and writeresponsevalid signals, which these IPs do not have. You could create a custom component, though, by adding the respective signals. FYI, maximum pending read transactions require the readdatavalid signal.
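To illustrate why those signals matter, here is a simplified, hypothetical Python model (not the actual interconnect logic): outstanding transactions can only be tracked if the interface reports completions, which is what writeresponsevalid (for writes) and readdatavalid (for reads) provide.

# Simplified model: without a completion event the master cannot know when an
# in-flight transaction has finished, so it cannot safely keep more than one pending.
class OutstandingTracker:
    def __init__(self, max_pending):
        self.max_pending = max_pending
        self.pending = 0

    def can_issue(self):
        return self.pending < self.max_pending

    def issue(self):       # command accepted on the bus
        assert self.can_issue()
        self.pending += 1

    def complete(self):    # completion seen, e.g. writeresponsevalid / readdatavalid
        self.pending -= 1

# With no writeresponsevalid there is never a complete() event for writes, so a
# "maximum pending write transactions" parameter cannot be exposed for these IPs.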

 

Will the interconnect resolve data width conversion and align the bursts?

Make sure there are no errors in the system messages; Platform Designer should take care of most of the interconnect between the interfaces.

You may check chapter 5.1, Memory-Mapped Interfaces, for further details:

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug-qpp-platform-designer.pdf#page=208

 

If you have further questions on EMIF, I would recommend opening a new forum case for EMIF-related questions, as I am unfortunately not an expert in EMIF.

 

Best Regards,
Richard Tan


MichaelB
New Contributor I

Hi Richard,

 

Yes, I recognised this too; some signals are missing (EMIF core & Avalon CCB).

  • Is there an option in the Avalon CCB & EMIF core to enable those?
  • How can I create a custom component for a standard IP? Would this be a custom component instantiating the EMIF core?

 

I tried to edit the interface of those IP cores in the component section in QSYS, but I cannot add further signals.

I already read through the EMIF user guide (https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/stratix-10/ug-s10-emi.pdf) but could not find any setting to enable/disable pending write transactions.

 

Currently I am using the autogenerated interconnect from QSYS to resolve 256b AXI (BL = 16) to 512b Avalon (BL = 8).

Here I did not see any errors in QSYS, only the hint that an Avalon adapter will be inserted between AXI <-> Avalon.

 

Are there any special settings of the mm_interconnect I have to configure to do the bus conversion?

From the Platform Designer documentation I assumed this is done by the mm_interconnect automatically.

 

Do you have any reference design (QSYS) where an AXI <-> Avalon connection with bit-width conversion + burst conversion is done?

That would help me understand the settings on both sides and align them for the best throughput.

 

Kind regards,

  Michael

RichardTanSY_Intel

Hi @MichaelB

 

Is there an option in the Avalon CCB & EMIF core to enable those?

Unfortunately I do not see a way to enable those. 

 

How can I create a custom component for a standard IP? Would this be a custom component instantiating the EMIF core?

You may launch the Component Editor by double-clicking New Component at the top of the IP Catalog or by selecting New Component from the File menu.

You may check out the training video below: Chapter 9, Creating a Custom Component.

https://www.intel.com/content/www/us/en/programmable/support/training/course/oqsys3000.html

 

Are there any special settings of the mm_interconnect I have to configure to do the bus conversion?

You do not need to configure anything. As you mentioned, this will be done by mm_interconnect automatically.

https://www.youtube.com/watch?v=LdD2B1x-5vo

 

Do you have any reference design (QSYS) where an AXI <-> Avalon connection with bit-width conversion + burst conversion is done?

This is the closest design example that I found with an AXI and Avalon connection.

https://www.intel.com/content/altera-www/global/en_us/index/support/support-resources/design-examples/design-software/qsys/exm-demo-axi3-memory.html

 

 

Best Regards,
Richard Tan


 

RichardTanSY_Intel

Hi @MichaelB 

 

May I know whether my latest reply helps?

Do you need further help regarding this case?

 

Best Regards,
Richard Tan


MichaelB
New Contributor I

Hi Richard,

 

Yes, I won't use outstanding transactions, since they are not supported by the EMIF core anyway.

Furthermore, I think there is little benefit in wrapping the standard component in a custom component, since outstanding transactions are not supported by the EMIF core itself anyway.

 

Would you recommend using a CCB or the autogenerated clock crossing of the mm_interconnect between two Pipeline Bridges?

I would then configure it without any outstanding transactions and just define the burst size.

 

Would this be a valid design for high throughput from AXI master to DDR?

Again, this is the main reason why I opened this thread.

 

With > 3 Gbps I get overflows on the AXI master side; here I'm running at 160 MHz @ 256b and don't know why I have overflows.
The DDR is running at 200 MHz @ 512b, so it doesn't make sense to me that I get overflows. We faced such issues previously and checked in simulation that the Avalon <-> AXI (mm_interconnect) does not respond fast enough with a valid indication.

 

Furthermore, we figured out that doing the protocol conversion and CDC in one step (AXI 160 MHz @ 256b <-> Avalon 200 MHz @ 512b) is even worse in throughput than doing the protocol conversion first and then the CDC as Avalon <-> Avalon only.

 

Here I really want to be sure we can support a data rate > 10 Gbps, which should be possible with a burst size of 32 in a 160 MHz @ 256b domain.
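Just to quantify that (a small Python sketch; the per-burst overhead below is purely an assumed number for illustration):

# Required bus efficiency for 10 Gbps in the 160 MHz @ 256 bit domain.
raw_gbps = 160e6 * 256 / 1e9             # ~40.96 Gbps raw
required_efficiency = 10 / raw_gbps      # ~24 %
print(f"Required efficiency for 10 Gbps: {required_efficiency:.0%}")

# With burst length 32 and an assumed overhead of a few idle cycles per burst,
# the achievable efficiency stays well above that requirement.
burst_len, overhead_cycles = 32, 4       # overhead value is an assumption
efficiency = burst_len / (burst_len + overhead_cycles)
print(f"Efficiency at BL = 32 with {overhead_cycles} idle cycles per burst: {efficiency:.0%}")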

 

Would you recommend simply setting the maximum read/write outstanding transactions to 0 and doing the connection with Avalon <-> AXI burst/bit-width conversion only?

 

Kind regards,

 Michael

Pramod_atintel
Employee

Hi Michael,

 

I am building a system with PCIe endpoint <-> AXI <-> HBM2. 

I had a query regarding whether the AXI interface is able to support a larger burstcount. I have enabled burstcount greater than 32 in the HBM controller.

Still, when I increase the burstcount in software to greater than 2, the HBM controller doesn't respond with data.

Did you face such issues interfacing AXI with HBM?

 

Best Regards,

Pramod

 

MichaelB
New Contributor I

Hi Pramod,

 

We built a similar architecture where we had multiple masters connected to a single HBM2 channel, plus PCIe DMA access.
Here the HBM2 burst controller seems to have a bug in some specific Quartus versions (Q20.4 and lower).

We succeeded with a solution provided by Intel to patch the IP files after IP generation (yes, sadly you won't be able to use the common tool flow anymore). With Quartus 21.1 this fix is provided in the IP again.


https://www.intel.com/content/www/us/en/support/programmable/articles/000086781.html

It took us a long time to debug this; hopefully it will help you.

 

FYI:
We have not switched to 21.1 yet because with 20.4 we get no timing violations in our design, whereas with 21.1 the retiming process does not work properly again. With 21.1 we got tremendous timing violations, and we did not get a clear explanation from the Intel support team as to why this happens with a version upgrade.
Let's see if this is fixed in future versions...

 

Kind regards,

 Michael

Pramod_atintel
Employee

Hi Michael,

Thanks for the reply.

I am using Quartus version 21.3. That should have solved the burst count issue, but I am still seeing the same problem.

I will try replacing the auto-generated file from the link you sent and check.

In 21.3 I am getting some timing violations, but most of them are on false paths (Signal Tap related).

Did you use AVMM or AVST for the PCIe interface?

 

Best Regards,

Pramod

MichaelB
New Contributor I

Hi Pramod,

OK, I would have assumed that should already be resolved.
Could you share the configuration of the HBM2 controller IP file? Then I could compare it with our IP file.

Maybe there are some further settings necessary to enable the burst support.

We are using the PCIe AVMM (DMA) connected to the HBM2 (Stratix 10 MX).
Do you use any module in between PCIe <-> HBM2?

We faced some issues with the AXI bridge in between and just connected the AVMM of the PCIe directly to the HBM2 AXI ports.

Kind regards,

 Michael

Pramod_atintel
Employee

Hi Michael,

 

There is an option in the HBM controller which says enable burstcount greater than 2 for the AXI interface. I have enabled it and entered a value of 128. Ideally, that should have solved the issue.

HBM2 controller IP file -- you mean the XML file which contains the configuration, right?

I have attached the XML file here.

I started with the design example PCIe + DDR + HBM2:

https://fpgacloud.intel.com/devstore/platform/19.1.0/Pro/an881-pcie-avmm-dma-gen3x16-ddr4-and-hbm2/

I upgraded the design from 19.2 to 21.3. I couldn't get the DMA software working in our Ubuntu 18.04 setup.

The problem was that PIO operations were working fine, but DMA operations were always failing. Some of the functions in the provided driver require an older Linux kernel version, while Ubuntu 18.04 ships a newer kernel.

If you can provide some pointers on your setup or on how to make it work, that would be helpful.

Hence I replaced the AVMM PCIe IP with the MCDMA IP, whose software works on Ubuntu 18.04.

The example design has an HBM-to-AXI Verilog wrapper file, which I have not modified.

I am able to do PIO operations and DMA operations where the payload is limited to 64 bytes at a time.
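For what it's worth, 64 bytes corresponds to only one or two data beats depending on the AXI data width, so each DMA operation ends up as a very short burst on the bus (a tiny Python check; the widths below are just example values, not the actual ones from the design):

payload_bytes = 64
for width_bits in (256, 512):                  # example data widths only
    beats = payload_bytes // (width_bits // 8)
    print(f"{width_bits}-bit data bus: {beats} beat(s) per 64-byte transfer")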

 

Best Regards,

Pramod

 

 

 

 

 

 


MichaelB
New Contributor I

Hi Pramod,

 

Here is a screenshot of our current controller settings in the HBM2 controller IP core (Quartus 20.4 / IP version 19.6.1):

[Screenshot: HBM2 controller settings]

 

Unfortunately, we did not test the example design. Furthermore, we wrote our own driver for the PCIe DMA access, which I'm not allowed to share because of confidential information.

We implemented our own converter (confidential, too) because the Intel AXI Bridge could not achieve the promised data rate.

 

We had several discussion/debug sessions with Intel Premier Support over weeks, but none of the approaches using only Intel IP were successful.
We added a Signal Tap to the mm_interconnect (autogenerated by QSYS) and to the AXI Bridge IP core and saw incorrect AXI transactions that blocked or tremendously decreased the throughput. We proved that in our simulations, too.


Furthermore, the data rate between PCIe DMA and HBM2 wasn't our main topic. We had some additional AXI buses connected to the HBM2 which provide the data (these had a high data-rate requirement).
The readout of the data via PCIe DMA did not have a specific data-rate requirement.

For the AXI connection to the AXI HBM2 we implemented our own fabric, too, instead of using the autogenerated mm_interconnect.

 

Kind regards,

 Michael

Pramod_atintel
Employee

Hi Michael,

 

In the DMA transfer between PCIe <-> HBM, I am seeing a very high data rate on the receive port (Rx, from FPGA to host) and very low bandwidth on the transmit port (Tx, from host PC to FPGA).

Did you face such issues?

 

Regards,

Pramod
