Agilex F-Tile tx_ready fails to assert

jmcguire3

I'm starting a new thread, because the previous discussion was booted into the void:

https://community.intel.com/t5/Intel-SoC-FPGA-Embedded/Agilex-Ftile-is-not-asserting-tx-ready-after-the-reset-sequence/m-p/1463488

I have an 027 SoC EVM with the AGIB027R31B1V1VAA part installed. I removed the heatsink and verified this.

[Image: AGI027_package_marking_lo-res.jpg]

I have two implementations of a soooper-simple design that attempts to drive fixed patterns out the FGT transceivers. One implementation works; the other does not. Both designs use a common top-level Verilog file (with minor accommodations for the core-interface signal variances). The end goal is to drive signals at 1.5 Gbps out the four lanes in one of the QSFP-DD cages on the EVM. (There's a bus-twist issue in the pin assignments, but I will address that in a separate post.)

So, the working design uses Platform Designer to instantiate the HSSI refclk generator and connects it to four instances of an x1 PMA core (source files attached). The outputs are pretty much what you'd expect.

[Images: quartus_platform_fgt_four-inst.png, fgt-four-inst.png]

The reference clock on the board is 156.25 MHz, and the core is configured for an FGT output, Simplex TX, NRZ line coding, and a bit rate of 1562.5 Mbps (10x the ref clock). The F-Tile PLL is running in integer mode. The AIB data interface is set to a 20-bit width, and I'm providing a fixed 20-bit pattern on each output, validated by the 78.125-ish MHz measurement in the bottom-right corner of the scope screengrab. The PMA config is mostly default values: Elastic FIFO for the PMA FIFO; the other two are Phase Comp. The interface is single rate, with the Core FIFO partial-full threshold dropped to "5" to clear the attendant error.
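
The rate arithmetic above, as a small Python sketch. The reference clock, multiplier, and width come from the post; the 10-ones/10-zeros word is a hypothetical stand-in for the actual pattern in the attached Verilog. Note that the 78.125 MHz tone only follows if the word repeats once per 20 bits: a word with a shorter repeat (for example 1010...) would produce a higher tone.

```python
REFCLK_MHZ = 156.25   # board reference clock (from the post)
MULTIPLIER = 10       # line rate = 10x the reference clock
WIDTH = 20            # AIB data-interface width in bits


def min_period(bits: str) -> int:
    """Shortest repeat length of the stream formed by sending `bits` forever."""
    n = len(bits)
    for p in range(1, n + 1):
        if n % p == 0 and bits == bits[:p] * (n // p):
            return p
    return n  # unreachable: p == n always matches


def pattern_tone_mhz(bits: str, rate_mbps: float) -> float:
    """Fundamental frequency of a repeating NRZ pattern: rate / period."""
    return rate_mbps / min_period(bits)


rate_mbps = REFCLK_MHZ * MULTIPLIER   # serial line rate in Mbps
word_clk_mhz = rate_mbps / WIDTH      # parallel word clock in MHz
word = "1" * 10 + "0" * 10            # hypothetical fixed 20-bit word
print(rate_mbps, word_clk_mhz, pattern_tone_mhz(word, rate_mbps))
```

Running it prints `1562.5 78.125 78.125`: the scope tone and the word clock coincide only because this particular word has a 20-bit period.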

Where this goes to hell is when I change from four-instances of x1 lanes to one-instance of x4 lanes ... which should be a perfectly valid configuration, right?  Same design, same clocking structure, same top-level Verilog, same physical hardware platform.  All I asked for was a 4-channel binder group in Platform Designer (which presumably coordinates operation of the channels as a group.) 

[Images: quartus_platform_fgt_one-inst_annot.png, fgt-one-inst.png]

I brought all the status and FIFO semaphores up to the top level at one point, and "tx_ready" is not asserting in the one-instance x4 case. None of the four tx_ready lines asserts. Ever. Now, before you ask: yes, there's a reset-release block in Platform Designer, and there's a state machine in the top-level Verilog that enforces the reset-request behavior at power-on per the F-Tile PMA documentation.
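
A behavioral sketch of the kind of power-on reset-request sequencer described above, written in Python so it runs standalone. The state names, the request/acknowledge handshake, and the `ToyPhy` timing are all hypothetical; the actual port names and ordering are in the F-Tile PMA documentation, not here. What it illustrates: a sequencer that waits on tx_ready parks in `WAIT_READY` forever if tx_ready never asserts.

```python
from enum import Enum, auto


class State(Enum):
    REQUEST = auto()     # assert the reset request
    WAIT_ACK = auto()    # hold the request until the PHY acknowledges it
    RELEASE = auto()     # drop the request
    WAIT_READY = auto()  # wait for tx_ready
    RUN = auto()         # lane is up; start driving data


def step(state: State, ack: bool, ready: bool) -> tuple[State, bool]:
    """One clock of a hypothetical reset sequencer.

    Returns (next_state, reset_request)."""
    if state is State.REQUEST:
        return State.WAIT_ACK, True
    if state is State.WAIT_ACK:
        return (State.RELEASE, True) if ack else (State.WAIT_ACK, True)
    if state is State.RELEASE:
        return State.WAIT_READY, False
    if state is State.WAIT_READY:
        return (State.RUN, False) if ready else (State.WAIT_READY, False)
    return State.RUN, False


class ToyPhy:
    """Stand-in for the PHY: acks a request after one clock and raises
    tx_ready two clocks after release, unless ready_works is False."""

    def __init__(self, ready_works: bool):
        self.ready_works = ready_works
        self.req_clocks = 0
        self.idle_clocks = 0

    def outputs(self) -> tuple[bool, bool]:
        ack = self.req_clocks >= 1
        ready = self.ready_works and self.idle_clocks >= 2
        return ack, ready

    def clock(self, req: bool) -> None:
        if req:
            self.req_clocks, self.idle_clocks = self.req_clocks + 1, 0
        else:
            self.req_clocks, self.idle_clocks = 0, self.idle_clocks + 1


def run(ready_works: bool, clocks: int = 20) -> State:
    """Clock the sequencer against the toy PHY and return the final state."""
    phy, state = ToyPhy(ready_works), State.REQUEST
    for _ in range(clocks):
        ack, ready = phy.outputs()
        state, req = step(state, ack, ready)
        phy.clock(req)
    return state


print(run(True).name)   # RUN
print(run(False).name)  # WAIT_READY
```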

I have pored over the litany of warning messages produced by the compile process, and the one-instance design generates this gem that caught my attention:

Warning(16735): Verilog HDL warning at directphy_f_sip_460_p4taocq.sv(781): actual bit length 4 differs from formal bit length 1 for port "dphy_reset_status_tx_ready_i"

So there appears to be a width-mismatch in the core generation script when the "Number of PMA Lanes" parameter is something other than 1.  I dug further into the directphy_f_sip_460_p4taocq.sv source and found this not-confidence-inspiring comment:

// Soft reset controller:
// Replaced the parameter l_sys_xcvrs with num_xcvr_per_sys*num_sys_cop to make it more explicit that the soft reset controller operates
// on each transceiver lane in a system copy, or reset group, but each reset group is tied to a single system copy
// The number of ports is unchanged. The ports with the '__MB__ markup are not brought out to the top level in the IP, but are
// connected at the qtlg step using cross-module references.
// TX outputs are present only if the tx is enabled

It makes sense that the Soft Reset Controller would try to resolve all the individual tx_ready signals for a group before letting the group run.  And it makes sense that the tx_ready width-mismatch might make the compiler shrug its shoulders and just insert the default GND for the signals it can't explicitly map.  And it would make sense that this is being caused by a script-hack that [poorly] implements cross-module referencing rather than using the hierarchy properly. 
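
A minimal Python sketch of that hypothesis, assuming (1) standard unsigned Verilog width rules for port connections (a wider source is truncated to its low bits, a narrower source is zero-extended) and (2) that the soft reset controller AND-reduces the per-lane tx_ready bits, which is an assumption since the generated RTL isn't shown. Which case applies at line 781 depends on whether `dphy_reset_status_tx_ready_i` is an input or an output of that instance.

```python
def resize(value: int, src_width: int, dst_width: int) -> int:
    """Unsigned Verilog assignment rule applied to a port connection:
    the destination keeps the low dst_width bits of the source, and any
    destination bits beyond src_width read as 0 (zero-extension)."""
    value &= (1 << src_width) - 1          # the value as the source drives it
    return value & ((1 << dst_width) - 1)  # truncate; zero-extend is implicit


def and_reduce(vec: int, width: int) -> bool:
    """Verilog-style &vec[width-1:0]: true only if every bit is 1."""
    mask = (1 << width) - 1
    return (vec & mask) == mask


LANES = 4
all_lanes_ready = 0b1111  # hypothetical: every lane reports tx_ready

# Case A: 4-bit actual driving a 1-bit *input* formal.
# Only lane 0 reaches the module; lanes 1-3 are silently dropped.
seen_by_module = resize(all_lanes_ready, LANES, 1)

# Case B: 1-bit *output* formal driving a 4-bit actual.
# Bits [3:1] of the 4-bit net are driven 0, so a 4-wide AND never passes.
lane0_ready = 1
seen_by_group = resize(lane0_ready, 1, LANES)

print(f"case A: module sees {seen_by_module:01b}")
print(f"case B: group vector {seen_by_group:04b}, "
      f"all ready = {and_reduce(seen_by_group, LANES)}")
```

Only case B matches the "tied to GND, group never releases" reading; in case A the controller would see lane 0 alone, so the other three lanes could not block it.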

I'm convinced that Quartus has a problem implementing the F-Tile PMA core properly for anything other than the demo designs.  I'm running 23.3, but have experienced this issue with 23.1, and 22.3 is completely non-functional with the F-Tile PMA core. 

Archive is attached with screenshots, verilog, and .qsf files.  I did not include anything else, because you'll need to re-implement the Platform Designer stuff for your particular version of Quartus, and the design is simple enough that it's not an imposition to clean-sheet this from scratch with the included info as a guide. 

If you want me to dig further into this, I have a "fix other people's problems" rate of $250/hr with a 4-hour minimum. But you're more than welcome to fix it yourselves.

jmcguire3

Did a whole lot more reading about the F-Tile clocking architecture, and I'm convinced that my inexperience/ignorance of F-Tile internals has contributed to some of this.  Here's a visual clue:

[Image: quartus_platform_fgt_one-inst_annot02.png]

I had followed the instructions, the tutorials, and the demo designs, but none of them were really clear about the SystemPLL requirements. When you choose the 1-instance, 4-channel configuration, you are required to use a SystemPLL. That's about all the information Quartus provides. It makes sense that one PMA channel can't drive the clock tree of other PMA channels, because there isn't an internal routing resource for that (according to the F-Tile User Guide). However, you can (and MUST) use the SystemPLL in the multi-channel configuration, because the SystemPLL DOES have a distribution capability.

[Image: f-tile_pma_clocking_tx-only_annot.jpg]

Here's the rub.  When I changed to the 1x4 configuration, Platform Designer complained that the "PMA Clocking" selection was invalid. So I changed that option to "SystemPLL," which resulted in an error message that "sys_PLL_clk_link" needed to be connected to ... something.  I found a radio button in the F-Tile RefClk generator IP that enabled the SystemPLL link, and connected that to the F-Tile instance.  Then Platform Designer complained that the two SystemPLL rates didn't match between those two blocks.  Here's where my inexperience kicks in - I just made the rates match to clear the error message.  The rate I chose was 805.6640625 MHz because that's what the clock generator block defaulted to. 

It didn't occur to me that Quartus would accept that rate and compile just fine. I expected that Platform Designer, which has any number of configuration validations, would confirm that the input and output clock rates of the "everything is bypassed" transmitter configuration were sane. Nope.

In my application, the output PMA serializer is running at 1562.5 Mbps (10x the 156.25 MHz ref clock). I chose a 20-bit data path width, which means the PMA word clock is 78.125 MHz (1562.5 / 20). Because every element in the Tx path is just pass-thru, there's no rate adaptation. The clocking, whether it's PMA or SystemPLL, had better be 78.125 MHz at every stage in the path. Otherwise, you'll see FIFO empty or full flags bashing around in what should be a synchronous flow-thru system. I did observe the FIFO flags doing weird things when they shouldn't, and I'm pretty sure the clock rate mismatch is responsible for that. There's still some F-Tile configuration weirdness going on, because I have observed the tx_clkout signal, and it was nominally 78.125 MHz, not 40.283 MHz (which is what 805.6640625 / 20 would give). I'm not sure what clock the core was providing on the tx_clkout pin. Maybe I was getting the PMA clock, but the F-Tile was using the SystemPLL clock for some of the internal bits? I don't have any visibility into those elements.
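
A rough Python sketch of why a rate mismatch makes FIFO flags thrash, assuming one word per clock on each side of a FIFO whose write and read clocks differ. The 32-word depth, the half-full starting point, and which clock drives which side are hypothetical; the post doesn't give the real F-Tile FIFO depths. The two rates are the 78.125 MHz word clock and the 805.6640625 MHz SystemPLL divided by the 20-bit width.

```python
import math

PMA_WORD_CLK_MHZ = 1562.5 / 20   # 78.125 MHz word clock (from the post)
SYSPLL_MHZ = 805.6640625         # default SystemPLL rate (from the post)
WIDTH = 20                       # data-path width in bits


def time_to_flag_ns(wr_mhz: float, rd_mhz: float, depth: int, start_fill: int) -> float:
    """Time (ns) until a FIFO of `depth` words, starting at `start_fill`,
    hits full (writer faster) or empty (reader faster). One word per clock
    on each side; infinite if the rates match exactly."""
    drift = wr_mhz - rd_mhz  # net words gained per microsecond
    if drift > 0:
        return (depth - start_fill) * 1000.0 / drift
    if drift < 0:
        return start_fill * 1000.0 / -drift
    return math.inf


mismatched_clk = SYSPLL_MHZ / WIDTH
print(f"SystemPLL / {WIDTH} = {mismatched_clk} MHz")
# Hypothetical 32-deep FIFO starting half full:
print(f"matched rates : {time_to_flag_ns(PMA_WORD_CLK_MHZ, PMA_WORD_CLK_MHZ, 32, 16)} ns")
print(f"mismatched    : {time_to_flag_ns(PMA_WORD_CLK_MHZ, mismatched_clk, 32, 16):.1f} ns")
```

With matched clocks the fill level never drifts; with the mismatched pair, a half-full 32-word FIFO hits a flag in well under a microsecond and keeps hitting it after every recovery. A phase-compensation FIFO assumes both sides run at the same frequency, so it has no way to absorb that drift.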

I have configured the SystemPLL to be 78.125MHz and ... still no output.  Tx_Ready still doesn't assert.

jmcguire3

Correction: went back and double-checked.  Tx_Ready *did* assert with the SystemPLL set to 78.125MHz.  Haven't seen that happen in this configuration until now.

But the outputs are still all-zeroes.  Something else is still suspending proper operation.

Kshitij_Intel

Hi,


To debug your design, I need you to share it. Don't worry about the Quartus version; just tell me which version you built the design in, so that I can debug it in that same version.


Thank you,

Kshitij Goel


jmcguire3

Okay, took a while to figure out what I think you want ... a .qar archive with the "service request" features enabled, right?

https://www.intel.com/content/www/us/en/docs/programmable/683463/21-3/archiving-projects-for-service-requests.html

Archive and log files are attached.

Kshitij_Intel

Hi,


Thank you for sharing the project QAR. I will look into it.


Thank you,

Kshitij Goel


Kshitij_Intel

Hi,


It's good to hear that your tx_ready asserts now. Are you toggling the reset line on the reconfiguration controller as part of post-device configuration? If not, toggle it once.


Thank you,

Kshitij Goel


jmcguire3

The design I uploaded has the F-Tile reset signal sourced from (init_done || 1v2reset).  1v2reset is connected to GPIO Pin AB53 (FPGA_RESETn on the schematic, page 16) that's routed through the MAX10 to SYS_PB1 on the EVM. 

Pressing-and-holding the SYS_PB1 button initiates and maintains the reset.  The FGT outputs enter a "hard low" where both signals of the diff pair are down near ground. 

Releasing the SYS_PB1 button returns the FGT outputs to a differential-zero - the (p) signals are above ground by about 100mV, and the (n) signals are up around 350mV.  It's definitely a distinct change from the in-reset state, but the FGT outputs are stuck at all-zeroes.
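
A quick Python check of the out-of-reset readings. The voltages are the approximate values quoted above, and the mapping of a positive Vp - Vn to logic 1 is an assumed convention, not something the post states.

```python
def differential_mv(vp_mv: float, vn_mv: float) -> float:
    """Differential voltage Vp - Vn."""
    return vp_mv - vn_mv


def common_mode_mv(vp_mv: float, vn_mv: float) -> float:
    """Common-mode voltage (Vp + Vn) / 2."""
    return (vp_mv + vn_mv) / 2


def nrz_level(vdiff_mv: float) -> int:
    """Assumed convention: positive differential = 1, otherwise 0
    (zero is lumped with 0 here only for simplicity)."""
    return 1 if vdiff_mv > 0 else 0


vp, vn = 100.0, 350.0  # approximate readings from the post, out of reset
vdiff = differential_mv(vp, vn)
print(f"Vdiff = {vdiff} mV, Vcm = {common_mode_mv(vp, vn)} mV, bit = {nrz_level(vdiff)}")
```

This prints `Vdiff = -250.0 mV, Vcm = 225.0 mV, bit = 0`: a steady negative differential, which is consistent with a lane that is out of reset but transmitting a constant 0.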

Kshitij_Intel

Hi,


As we have not received any response from you to our previous reply, please log in to ‘https://supporttickets.intel.com’, view the details of the desired request, and post a response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support, and community users will be able to help you with your follow-up questions.


Thank you,

Kshitij Goel


jmcguire3

Scraping me off to community support already?  C'mon, man, you haven't even asked me to reboot my PC or to try upgrading to Windows 11.
