This is a follow-up to the thread at https://community.intel.com/t5/FPGA-Intellectual-Property/HBM2-interfacing-with-the-PCI-express-hard-IP-STRATIX-10-MX/td-p/1233329/page/2.
We are trying to use a 512-bit interface to HBM2 from a Stratix 10 MX FPGA, but are having trouble with a design file from a reference design. The file in question is 'axi_bridge_to_hbmc_if.v', from https://fpgacloud.intel.com/devstore/platform/18.1.0/Pro/pci-express-gen3-x16-avmm-dma-with-hbm2-memory-reference-design/?wapkw=hbm2%20interface%20pci%20express. This file merges two 256-bit AXI4 slave interfaces (from two HBM2 pseudo-channels) into a single 512-bit interface.
We've resolved several bugs in this file, but are still encountering issues in our tests. We're pretty sure the error is in this file because we successfully tested our design with a single 256-bit interface to HBM2 after removing only this file-component. In the linked thread it was said that this design was still under validation. Has there been any update on this?
BoonT_Intel has already left Intel, so I will be the support agent helping you from here.
FYI, I would appreciate it if you could file a new forum thread going forward instead of posting on an old thread, which may no longer be accessible to Intel support agents.
I noticed you are still using the old AN881 reference design compiled with Quartus Pro v18.1. I checked the reference design link and found that it already contains an updated version of the reference design targeting Quartus Pro v19.2.
- I traced the Intel internal record history, and BoonT_Intel did validate the v19.2 reference design.
So, can you try out this latest v19.2 reference design?
Unfortunately, we still encounter the error when using the latest reference design. I will provide some details on the error and our design below:
We are attempting to modify the reference design to provide HBM2 access to an application kernel on the FPGA fabric. This is currently being done by MUXing two AXI4 master interfaces, one from the AVMM PCIe IP and the other from the application kernel, for connection to the AXI bridge that merges the HBM2 pseudo-channels.
We use our own AXI4 master interface component to handle transfers from the application kernel to HBM2. This component uses an FSM for each AXI4 channel (aw, w, b, ar, r), plus FIFOs, so that data can be pipelined into and out of the application kernel to and from HBM2. The error takes the form of a hang, and occurs specifically when we attempt to transfer more than 256 data words “in a row” (our AXI4 component accepts a ‘size’ signal which determines how many reads/writes to send in a row, controlled by the FSMs).
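The 256-word threshold coincides with the AXI4 limit of 256 beats per INCR burst (AxLEN is 8 bits), which may or may not be related to the hang. AXI4 also forbids a burst from crossing a 4 KB address boundary. The following is a minimal Python sketch of splitting a long transfer into bursts that respect both rules. It assumes a 64-byte (512-bit) beat and a beat-aligned start address; `split_into_bursts` is a hypothetical helper, not part of the reference design, and the HBM2 controller may impose tighter limits than the ones modelled here.

```python
AXI4_MAX_BEATS = 256   # AXI4 INCR burst: AxLEN is 8 bits, so 1..256 beats
AXI4_BOUNDARY = 4096   # a burst must not cross a 4 KB address boundary


def split_into_bursts(start_addr, total_beats, bytes_per_beat=64):
    """Return a list of (address, beats) tuples covering the transfer.

    Assumption for this sketch: start_addr is aligned to bytes_per_beat.
    """
    if start_addr % bytes_per_beat != 0:
        raise ValueError("sketch assumes a beat-aligned start address")
    bursts = []
    addr = start_addr
    remaining = total_beats
    while remaining > 0:
        # Beats that still fit before the next 4 KB boundary.
        room = (AXI4_BOUNDARY - addr % AXI4_BOUNDARY) // bytes_per_beat
        beats = min(remaining, AXI4_MAX_BEATS, room)
        bursts.append((addr, beats))
        addr += beats * bytes_per_beat
        remaining -= beats
    return bursts
```

With 64-byte beats, only 64 beats fit in a 4 KB page, so in this model a request for 300 words becomes several shorter bursts rather than one long one. The AxLEN value for each burst would be `beats - 1`.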
We do not think that our AXI4 master interface component is the failure source: when interfacing with just one of the HBM2 pseudo-channels (using only 256-bit bandwidth and as such not using the AXI bridge component from the reference design), no errors are encountered. I’ve been looking primarily into the axi_bridge_to_hbmc_if.v file from the reference design, as it seems to contain all the AXI4 interface merging logic, but have so far been unable to find a bug that would cause the error we’re seeing.
I’d greatly appreciate any ideas you have on what could be causing this, or any debugging advice.
Just to confirm again: are you using Quartus v19.2 specifically? The reference design is validated with v19.2 only.
Also, to confirm, the failure only occurs after you modified the reference design, right?
- The failure occurs on the path where your kernel application transfers data to HBM2? May I know whether this path also uses PCIe, or some other protocol?
You mentioned the failure only occurs once the transfer exceeds a certain size. You could:
- Check the back-pressure handling block in your design, if one is present
- Check the buffer design in your software application driver to ensure it can handle large data transfers
- Ensure your Quartus FPGA design is timing clean
- If you want to drill down further, use Signal Tap on the path AXI master -> interconnect block(s) -> HBM2 soft interface to gradually isolate the failure point
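For the Signal Tap step, one way to narrow down a hang is to count completed valid/ready handshakes per AXI4 channel at each tap point and see which channel is holding valid high with ready low at the end of the capture. The following is a minimal Python sketch of that bookkeeping. It assumes the capture has already been converted into a list of per-clock dictionaries with hypothetical keys such as `awvalid` and `awready`; the conversion from the actual Signal Tap export is not shown, and the key names are illustrative only.

```python
AXI_CHANNELS = ("aw", "w", "b", "ar", "r")


def count_handshakes(samples, channels=AXI_CHANNELS):
    """Count cycles where valid and ready are both high, per channel."""
    counts = {ch: 0 for ch in channels}
    for s in samples:
        for ch in channels:
            if s.get(ch + "valid", 0) and s.get(ch + "ready", 0):
                counts[ch] += 1
    return counts


def stalled_channels(samples, window=4, channels=AXI_CHANNELS):
    """Channels with valid high and ready low for the last `window` samples."""
    tail = samples[-window:]
    if len(tail) < window:
        return []
    return [
        ch for ch in channels
        if all(s.get(ch + "valid", 0) and not s.get(ch + "ready", 0)
               for s in tail)
    ]
```

Comparing the counts on the master side of the bridge with those on each pseudo-channel side should show where beats stop flowing. For example, more accepted `aw` handshakes than `b` responses points at the write path.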
I made a mistake in my last message. We are not modifying the reference design, just using the 'AXI Bridge' component that merges the two HBM2 pseudo-channel AXI4 interfaces. The failure only occurs when we use this component in our own design (i.e. when we wish to have a 512-bit wide AXI4 interface between kernel and 2 HBM2 pseudo-channels). We do not run into any error when this component is omitted (when we instead use a 256-bit wide AXI4 interface between kernel and 1 HBM2 pseudo-channel).
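To make the merge concrete: a bridge of this kind has to split each 512-bit write word into two 256-bit halves and, on the read side, hold a beat from one pseudo-channel until its partner arrives from the other, since the two channels need not respond on the same cycle. The following is a minimal behavioural sketch in Python. It assumes the low half maps to pseudo-channel 0 and the high half to pseudo-channel 1. That mapping and the names `split_word`, `merge_word` and `ReadMerger` are hypothetical and not taken from axi_bridge_to_hbmc_if.v.

```python
from collections import deque

HALF_BITS = 256
HALF_MASK = (1 << HALF_BITS) - 1


def split_word(word512):
    """Split a 512-bit integer into (low, high) 256-bit halves."""
    return word512 & HALF_MASK, (word512 >> HALF_BITS) & HALF_MASK


def merge_word(low, high):
    """Recombine two 256-bit halves into one 512-bit integer."""
    return (high << HALF_BITS) | low


class ReadMerger:
    """Pairs read beats from two channels that may arrive at different times."""

    def __init__(self):
        self.queues = (deque(), deque())  # one FIFO per pseudo-channel

    def push(self, channel, beat):
        self.queues[channel].append(beat)

    def pop(self):
        """Return a merged word only when both channels hold a beat."""
        if self.queues[0] and self.queues[1]:
            low = self.queues[0].popleft()
            high = self.queues[1].popleft()
            return merge_word(low, high)
        return None
```

In this model, if one pseudo-channel stops returning beats, `pop` returns `None` indefinitely, which would look like a hang from the 512-bit side.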
We've been using Quartus v19.4, and are limited to this version because of the PCIe FPGA board that we are using. Is there any way around the fact that the reference design is only validated for v19.2? Is there by chance another IP that just merges two AXI4 interfaces and has been validated on v19.4?
Thanks for the debugging pointers; I will begin looking into these.
Unfortunately, the original developer who created this reference design has already left Intel, so there is no one to continue looking into this reference design project.
The last known good Quartus version validated with the reference design is v19.2. We can't guarantee it will continue to work with newer Quartus versions, due to unforeseen changes that may appear in both the IP design and the Quartus software.
If you truly suspect the "AXI bridge" doesn't integrate well with your design requirements, my advice is to consider designing your own bridge to manage the data transactions.
- Most of the time, a design block in a reference design is meant to work for a specific test/demo purpose only, and is not robust enough to be ported to a different design