Re: TSE MAC catatonic

Altera_Forum · ‎06-23-2011

I've got a real head-scratcher of a problem with the Altera TSE MAC core. Here's the outline:

* Altera/Terasic DE2-115 board, Cyclone IV-E FPGA

* Full Quartus subscription, but no TSE license (using OpenCore+ for TSE)

* TSE instantiated in SOPC builder, two mSGDMAs instantiated for RX/TX DMA.

* TSE is configured in 10/100 small mode, MII.

* PHY is an 88E1111 strapped to MII mode (the DE2-115 has 2 of these and each one has an MII/RGMII strap jumper).

* System is running full-blown vanilla Linux. Wrote a custom driver from scratch for the TSE/DMAs (replacing altera_tse.c and atse.c, which are absolutely terrible).

So here's the problem: The TSE is initialized and the link is brought up. PHY MDIO stuff works 100% perfectly and it is able to autonegotiate the correct mode.

Packets are coming in on the PHY's RX interface; I can see them on SignalTap, and they look good (0x55 55 ... 5D header plus correct packet contents). However, the TSE MAC appears to ignore them completely: no data ever appears on the Avalon-ST receive source, and all MAC-sourced Avalon-ST signals on the receive side just idle low. The MAC is in promiscuous mode for now, so all receive filtering should be disabled.

When the OS attempts to send packets, the TX DMA kicks in and does its thing, a very correct-looking packet goes out to the TSE's Avalon-ST transmit sink... and it completely ignores that too. No data appears on the MII TX signals, they are constantly low. The Avalon-ST valid/sop/eop signals all appear to be operating correctly as well.

I am using Quartus 9.1sp2 on Linux, but I tried regenerating/resynthesizing the entire system with Quartus 11.0 as well since there is quite the errata list for the TSE. The entire system's behavior was identical. I have also tried using the TSE in 10/100/1000 mode with RGMII and the behavior was identical (actually, I was trying this first until the RGMII DDR timing constraint nightmare set in).

So, the question is: Under what circumstances would the TSE MAC not allow packet RX/TX data to pass? As far as I can tell the MAC has been reset and initialized properly, readback of the Command_Config register gives 0x01000053 (RX_ENA and TX_ENA set among others). What could possibly be wrong?!
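For anyone decoding that readback by hand, here is a minimal sketch of a Command_Config decoder. The bit positions are what I recall from the TSE user guide and should be checked against your core version; the `TSE_CMD_*` macro names and the `tse_decode_command_config` helper are made up for illustration and are not part of any Altera header.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed Command_Config bit positions (verify against the TSE user guide). */
#define TSE_CMD_TX_ENA        (UINT32_C(1) << 0)
#define TSE_CMD_RX_ENA        (UINT32_C(1) << 1)
#define TSE_CMD_PROMIS_EN     (UINT32_C(1) << 4)
#define TSE_CMD_CRC_FWD       (UINT32_C(1) << 6)
#define TSE_CMD_HD_ENA        (UINT32_C(1) << 10)
#define TSE_CMD_SW_RESET      (UINT32_C(1) << 13)
#define TSE_CMD_LOOP_ENA      (UINT32_C(1) << 15)
#define TSE_CMD_NO_LGTH_CHECK (UINT32_C(1) << 24)
#define TSE_CMD_RX_ERR_DISC   (UINT32_C(1) << 26)

struct tse_flag {
    uint32_t mask;
    const char *name;
};

static const struct tse_flag tse_flags[] = {
    { TSE_CMD_TX_ENA,        "TX_ENA" },
    { TSE_CMD_RX_ENA,        "RX_ENA" },
    { TSE_CMD_PROMIS_EN,     "PROMIS_EN" },
    { TSE_CMD_CRC_FWD,       "CRC_FWD" },
    { TSE_CMD_HD_ENA,        "HD_ENA" },
    { TSE_CMD_SW_RESET,      "SW_RESET" },
    { TSE_CMD_LOOP_ENA,      "LOOP_ENA" },
    { TSE_CMD_NO_LGTH_CHECK, "NO_LGTH_CHECK" },
    { TSE_CMD_RX_ERR_DISC,   "RX_ERR_DISC" },
};

/* Print the name of every known bit that is set in cfg.
 * Returns the set bits that are not covered by the table above. */
static uint32_t tse_decode_command_config(uint32_t cfg)
{
    uint32_t unknown = cfg;
    for (size_t i = 0; i < sizeof tse_flags / sizeof tse_flags[0]; i++) {
        if (cfg & tse_flags[i].mask) {
            printf("%s\n", tse_flags[i].name);
            unknown &= ~tse_flags[i].mask;
        }
    }
    return unknown;
}
```

Under those assumed bit positions, 0x01000053 breaks down into TX_ENA, RX_ENA, PROMIS_EN, CRC_FWD and NO_LGTH_CHECK, with SW_RESET, LOOP_ENA and RX_ERR_DISC all clear.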

Altera_Forum · ‎06-24-2011

You said you don't have the license for the TSE core. I don't know how this is handled in Linux, but with Windows you have a small alert window saying that you are evaluating an Opencore licensed IP. If you click on cancel or close that window, then all the unlicensed Opencore IPs will stop working. This would result in what you see.

Altera_Forum · ‎06-24-2011

Yes, I get that window with the cancel button. The system behavior is identical whether I click Cancel or not, both in 9.1sp2 and 11.0, which seems very strange.

I tried instantiating the OpenCore+ evaluation timeout indicator core; the timeout indicator works correctly (asserts only after the Cancel button is pressed). I wonder if the TSE core is timing out immediately after programming, since core timeout matches the problem description.

Altera_Forum · ‎06-27-2011

No, in that case the problem is somewhere else... If the OpenCore timeout indicator says everything is okay, then the TSE should work too.

Can you add the statistics counter option when generating the TSE core, and have your software read them? You should be able to see if the received packets counter and/or error counter are increased.

I suppose you did it already, but check that the TSE core isn't in loopback mode.

Altera_Forum · ‎06-27-2011

--- Quote Start ---

Can you add the statistics counter option when generating the TSE core, and have your software read them? You should be able to see if the received packets counter and/or error counter are increased.

--- Quote End ---

I tried this. etherStatsOctets increases in a plausible way (it sees network broadcast traffic). ifInErrors goes to 1 after the first received packet but never increases after that. The MAC should be configured to pass along RX error frames (RX_ERR_DISC cleared), but it does not appear to be doing so: nothing ever happens on the RX ST interface. None of the counters relevant to the TX side seem to be incrementing.

Here is a partial dump of some of the counters after a few seconds, during which several packets should have been successfully transferred in both directions:


 aFramesTransmittedOK=00000000    
 aFramesReceivedOK=00000000   
 aFramesCheckSequenceErrors=00000000
 aAlignmentErrors=00000000          
 aOctetsTransmittedOK=00000000
 aOctetsReceivedOK=00000000   
 ifInErrors=00000001       
 ifOutErrors=00000000
 etherStatsDropEvent=00000000
 etherStatsOctets=000009a2   
 etherStatsPkts=00000001  
 etherStatsUndersizePkts=00000000
 etherStatsOversizePkts=00000000 
 etherStatsJabbers=00000000     
 etherStatsFragments=00000000

--- Quote Start ---

I suppose you did it already, but check that the TSE core isn't in loopback mode.

--- Quote End ---

It is not. If I turn loopback mode on, the octet counters take off counting at high speed, but no actual packets are transferred. I bet this is an implementation artifact (maybe they assume there is valid MII data every cycle).

Altera_Forum · ‎06-28-2011

That's odd, it seems that something gets stuck inside the TSE core. But at least the fact that the octets counter increases shows that the opencore evaluation is still running.

I don't remember if this is possible with the small TSE core, but can you disable CRC checking?

Are the clocks correctly connected between the TSE core and the PHY chip?

Does the hardware design comply with all the timing requirements?

If you could replace the DMAs with Altera's SGDMAs and try to run a standard software example with uC/OS and the Niche stack (such as the sockets server example), you could check whether this is a hardware or software issue.

Altera_Forum · ‎06-29-2011

--- Quote Start ---

I don't remember if this is possible with the small TSE core, but can you disable CRC checking?

--- Quote End ---

I already have it off. I do need the full core anyway; the small TSE can't do TX store-and-forward mode, which my design will need to eliminate a potential source of TX packet loss.

--- Quote Start ---

Are the clocks correctly connected between the TSE core and the PHY chip?

--- Quote End ---

As far as I can tell, yes. For plain MII mode I am getting both RX and TX clocks directly from the PHY. Both clocks are definitely toggling; the TSE gets stuck in its software reset if they don't (I had it happen). Further, the RX MII data looks perfect in SignalTap.

--- Quote Start ---

Does the hardware design comply with all the timing requirements?

--- Quote End ---

Aside from a couple of harmless issues, yes. (There is an asynch reset that feeds all the reset-CDCs which has overly tight constraints on it). All clock domains related to the TSE should be good. The entire rest of the system (3 clock domains) works flawlessly.

--- Quote Start ---

If you could replace the DMAs with Altera's SGDMAs and try to run a standard software example with uC/OS and the Niche stack (such as the sockets server example), you could check whether this is a hardware or software issue.

--- Quote End ---

Hm, now we come to the tough one. This isn't a Nios system, it is a 4-CPU superscalar MIPS with full cache coherence (yes, cache-coherent DMA). The CPUs and system design are well-proven, and the fact that the MAC and ST interfaces are what's stuck should hopefully rule out all the really tough problems. But unfortunately I can't just drop in Altera's software stack and test it out.

One particularly notable issue related to the different CPU type is the I/O access width when poking the TSE registers. The driver I wrote is very careful to do only aligned full-word reads/writes, so that byte-enables are unnecessary for the TSE. The TSE registers do get written with the intended values so I know this is done right. The Avalon-MM transactions look correct on SignalTap as well. If it would help to post and annotate the exact sequence of read/write transactions to the TSE let me know.

Altera_Forum · ‎06-30-2011

Yes a capture of the Avalon slave interface could be useful. I have a (working!) board with some MII PHY chips, I can do it on my side too.

There are two more things that would come to mind, since you are using a different CPU... one would be to check you are using the correct endianness (I think you do, but you never know, it can be worth mentioning) and that the register addresses described in the TSE datasheet are double word addresses, not byte addresses.
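To make the addressing point concrete, here is a small sketch of register accessors that take the datasheet's 32-bit word offsets. The Command_Config offset of 0x02 is what I remember from the register map, so treat it as an assumption; the `tse_*` helper names are hypothetical.

```c
#include <stdint.h>

/* Datasheet offset in 32-bit words (assumed value, check the register map). */
#define TSE_REG_COMMAND_CONFIG 0x02u

/* Convert a datasheet word offset into the byte offset a CPU address uses. */
static inline uint32_t tse_reg_byte_offset(uint32_t dword_offset)
{
    return dword_offset * 4u;
}

/* Indexing a 32-bit pointer scales by 4 automatically, so the datasheet
 * offset can be used directly and every access is an aligned full word. */
static inline uint32_t tse_read_reg(const volatile uint32_t *base,
                                    uint32_t dword_offset)
{
    return base[dword_offset];
}

static inline void tse_write_reg(volatile uint32_t *base,
                                 uint32_t dword_offset, uint32_t value)
{
    base[dword_offset] = value;
}
```

With these, Command_Config at word offset 0x02 ends up at byte offset 0x08 from the MAC base address. Treating 0x02 as a byte address instead would issue an unaligned access inside the first register of the map.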

Altera_Forum · ‎07-01-2011

While trying to get you a clean capture of the MAC initialization, I found the bug... it was quite embarrassing. I missed a ~ operator when masking out the half-duplex enable bit on link-up, so when the link came up the Command_Config register got mistakenly cleared. Oops.
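The driver code itself was not posted, so the following is only a reconstruction of that class of bug. It assumes the half-duplex enable bit (HD_ENA) sits at bit 10 of Command_Config; the function names are invented for the example.

```c
#include <stdint.h>

/* Assumed position of the half-duplex enable bit in Command_Config. */
#define TSE_CMD_HD_ENA (UINT32_C(1) << 10)

/* Buggy: ANDs with the bit itself, which keeps only HD_ENA and
 * throws away every other bit, including RX_ENA and TX_ENA. */
static uint32_t clear_hd_ena_buggy(uint32_t cfg)
{
    return cfg & TSE_CMD_HD_ENA;
}

/* Fixed: ANDs with the complement, which clears only HD_ENA. */
static uint32_t clear_hd_ena_fixed(uint32_t cfg)
{
    return cfg & ~TSE_CMD_HD_ENA;
}
```

Starting from the 0x01000053 readback mentioned earlier in the thread, the buggy version yields 0 (MAC fully disabled), while the fixed version leaves the value unchanged because HD_ENA was already clear.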

It looks like things are mostly working now. I have some DMA alignment issues to take care of, but the MAC seems to be OK. I did have to switch to Quartus 11.0 after encountering an RX FIFO bug in 9.1sp2 (underflow, most likely).

Thanks for the help though!

Altera_Forum · ‎07-04-2011

There is an option in the MAC to add/remove two extra null bytes at the beginning of the packet. As the Ethernet header is 14 bytes, those two padding bytes make it 16 bytes instead, which is useful for some drivers that prefer to have the IP header aligned on 32-bits. Your alignment problem could come from there.

Altera_Forum · ‎07-04-2011

--- Quote Start ---

There is an option in the MAC to add/remove two extra null bytes at the beginning of the packet.

--- Quote End ---

I'm aware of this option. Under Linux there is a constant, NET_IP_ALIGN, set to either 0 or (more typically) 2, which determines whether the network stack expects the extra 2 bytes or not. On mipsel it is 2, so my driver sets RX_SHIFT16 and TX_SHIFT16 and adjusts the DMA pointers as it should. Everything here works as expected. (The existing altera_tse.c and atse.c drivers seem to completely misunderstand it.)
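The arithmetic behind the 2-byte pad is easy to check. This is just an illustration of the alignment math; the constant and function names are made up here and are not the kernel's.

```c
#include <stddef.h>

/* Length of an untagged Ethernet header: 6 + 6 + 2 bytes. */
#define EXAMPLE_ETH_HEADER_LEN 14u

/* Offset of the IP header from the start of a word-aligned buffer,
 * given the number of pad bytes placed in front of the Ethernet header. */
static size_t ip_header_offset(size_t pad_bytes)
{
    return pad_bytes + EXAMPLE_ETH_HEADER_LEN;
}

/* True if the IP header lands on a 32-bit boundary. */
static int ip_header_is_word_aligned(size_t pad_bytes)
{
    return ip_header_offset(pad_bytes) % 4u == 0;
}
```

With no pad the IP header starts at offset 14, which is not a multiple of 4; with a 2-byte pad it starts at offset 16, which is.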

The main alignment issue is because my system busses and DMA are 64-bit wide, as are the CPUs' cache interfaces, but the CPUs are otherwise 32-bit. This means my buffers are only aligned on 4-byte boundaries, not 8. The unaligned transfer support in the mSGDMA controllers does work as advertised, which is an easy fix.

With those problems sorted I can boot my system on a uniprocessor kernel over the network for about 2 minutes, then the transmit DMA deadlocks due to a missed TX completion. With all 4 CPUs active the DMA lockup happens within a second.

The issue is a race between the RX/TX IRQ handlers and the DMA hardware. It is possible for the mSGDMA to complete a descriptor and push it to the response buffer at the same time the CPU drains the response buffer and clears the interrupt flag. Altera's SGDMA has similar problems: the descriptor update can race with the descriptor chain walk, and the CPU can wind up not seeing freshly completed packets.

Altera's TSE/SGDMA driver appears to avoid this problem by simply polling for TX completion... which is an utter travesty. The entire point of having scatter-gather DMA is to queue multiple packets and then move on to processing something else, not to sit there busy-looping.

It looks like I am going to have to write my own DMA controller to avoid this bug. I don't want to change the mSGDMA code, it is already far more complicated than it really should be. The DMA hardware needs to have cleaner IRQ behavior to close the race windows without polling hacks. It will just take me a few days to sort out properly.

Altera_Forum · ‎07-05-2011

I wonder if you could "fix" this problem by polling the DMA status just once, after having cleared the IRQ. It should be enough to ensure that you don't miss any completed descriptor.
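As a rough illustration of that idea, here is a toy software model of the race. It is not the mSGDMA register interface: `struct fake_dma` and every function below are hypothetical, and the model assumes that a completion arriving after the IRQ clear would raise the interrupt again on real hardware.

```c
#include <stdbool.h>

/* Toy model of a DMA response buffer with one shared IRQ flag. */
struct fake_dma {
    unsigned pending;        /* completions waiting in the response buffer */
    bool irq_flag;           /* shared interrupt flag */
    unsigned late_arrivals;  /* completions that land just before the clear */
};

/* Take everything currently in the response buffer. */
static unsigned drain_responses(struct fake_dma *d)
{
    unsigned n = d->pending;
    d->pending = 0;
    return n;
}

/* Clear the IRQ flag. The late completions are added first to simulate
 * hardware finishing a descriptor in the window between drain and clear,
 * so their interrupt is wiped along with the old one. */
static void clear_irq(struct fake_dma *d)
{
    d->pending += d->late_arrivals;
    d->late_arrivals = 0;
    d->irq_flag = false;
}

/* Naive handler: drain, then clear. Late completions are stranded. */
static unsigned handler_naive(struct fake_dma *d)
{
    unsigned n = drain_responses(d);
    clear_irq(d);
    return n;
}

/* Handler with one extra poll after the clear, as suggested above. */
static unsigned handler_recheck(struct fake_dma *d)
{
    unsigned n = drain_responses(d);
    clear_irq(d);
    n += drain_responses(d);
    return n;
}
```

In this model the naive handler leaves one completion sitting in the buffer with the IRQ flag already cleared, which is the deadlock described earlier, while the version with the extra poll picks it up.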

Of course a more elegant way would be to have a clear-IRQ signal on each descriptor rather than in a common register, so that any unprocessed descriptor would still raise the IRQ. I don't know how many modifications to the mSGDMA code would be required, but BadOmen should have a good idea about this, I guess.