FPGA Intellectual Property
PCI Express*, Networking and Connectivity, Memory Interfaces, DSP IP, and Video IP

TSE or SGDMA related problem with concurrent tx and rx

Altera_Forum
Honored Contributor II
2,514 Views

Hi, 

 

Premise: I don't know whether this problem is actually related to the TSE or the SGDMA; it might even be the Nios TSE driver, although I don't think so. 

 

I'm sending and receiving Ethernet packets using the standard TSE reference design with minimal modifications (I only added another timer and a few PIOs for test purposes). 

I don't have timing issues in the design, and it shows no evident problems with the TCP stack and software like simple_socket_server. 

In my design I actually don't use the TCP stack; I simply send and receive raw Ethernet packets. 

Everything works fine as long as I don't have both an RX and a TX packet at the same moment (at least this seems to be the issue): an RX packet is lost if it arrives while my board is transmitting. 

Example: I have a test device which loops back the TX packet. If I connect it directly to my dev board I usually don't receive the answer (although sometimes I do...); if I connect them through an Ethernet switch (which introduces a small delay between TX and RX) it works. 

The PHY is correctly set to 100 Mbit full duplex (I checked with an Ethernet tester), so there shouldn't be any problem with concurrent RX and TX packets. 

Could there be a problem with the memory available for RX? I use the reference design's standard settings. 

Can anyone tell me whether the problem is related to the TSE or the SGDMA? 

 

Regards
17 Replies
Altera_Forum
Honored Contributor II

More information... and a further request for help 

 

I now think it is definitely a TSE device issue. In fact, this is what I discovered: 

I directly inspected the TSE statistics counters (TSE base address + 0x60 to 0xE0) after sending 14 packets, which were looped back by my external test device. 

aFramesTransmittedOK = 14 

aFramesReceivedOK = 9 

aFrameCheckSequenceErrors = 0 

aAlignmentErrors = 0 

ifInErrors = 5 

ifOutErrors = 0 

etherStatsJabbers = 0 

etherStatsFragments = 5 

 

This data matches my application's result: only 9 packets received out of 14. 

The etherStatsFragments counter tells me the 5 lost packets were undersized frames with a CRC error. 

How can this happen? I'm not an Ethernet guru, and I don't know the MAC's and PHY's internal workings. 

Could it be that the part of the RX frame that overlaps with a TX frame gets lost, as if the device were working in half-duplex mode (which is not the case)? 

How can I overcome the problem? Is there some special configuration in the TSE that I'm missing? 

Thank you for any help
Altera_Forum
Honored Contributor II

That's strange... could it be noise going from the tx line to the rx line? 

You could put some SignalTap probes on the MII interface (between the MAC and the PHY) to see exactly what is sent and received, and compare them. That way you'll know whether the problem is inside the TSE MAC or further out.
Altera_Forum
Honored Contributor II

Thank you for the answer 

I did just what you suggested, and I discovered it's probably related to noise. IMHO it's not related to the FPGA but to the Ethernet PHY. 

I'm using a DBC3C40 dev board, which has a National DP83640 PHY. 

I monitored the CRS signal, and I see that when I get the RX error, CRS goes low before the complete frame has been received. When TX and RX frames don't overlap, CRS behaves correctly, as far as I could test. 

This is the timing: 

time 0: start transmission of tx_frame 

time 640ns: start receiving rx_frame (crs goes high) 

time 5.36us: end of tx_frame 

time 6.32us: end of rx_frame (crs goes low) 

When the frame is correctly received by my application, CRS goes low at time 6.32us. When I get the error, CRS goes low at time 2.40us (this is quite deterministic). 

 

I changed the Ethernet cable, but the behaviour is always the same. There's a second PHY device available on the DBC3C40 board and I could try switching the design to that one. Anyway, I'd like to understand why it fails.
Altera_Forum
Honored Contributor II

Found the cause of the problem!!! 

But now I really need help from you expert guys. 

There's a bug in the RMII2MII block supplied with the dev board; I've attached the VHD file. 

 

The block defines two internal signals, rx_clk and tx_clk, but there is a synchronization issue with rx_clk: the block restarts it when an input frame is received (triggered by CRS going high), yet this clock is also used as the output clock for the TSE MAC. 

Depending on the frame's arrival time (CRS assertion), mac_rx_tx_clk either goes half a period out of phase (hence the frame error) or remains in phase with tx_clk (the no-error case). 

I don't know much about the MII interface; can anyone suggest how I can modify the VHDL code to fix the bug? 

Or better: Are there any other RMII2MII converters available? 

 

Thank you 

Cris
Altera_Forum
Honored Contributor II

The other problem with that code is that it generates a gated clock for the MAC. Gated clocks aren't recommended and can lead to problems. I would rewrite the converter to use two clocks generated from a PLL instead. Unfortunately I don't have any card with an RMII PHY, so I can't help you. 

 

There seems to be a Verilog code posted here (that obviously needs some serious reformatting): http://www.opencores.org/forum,ethernet%20mac,0,1407. I don't know if it works, and I find it rather strange to use two clocks in the sensitivity list. I'm not sure it is synthesizable.
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

The other problem with that code is that it is generating a gated clock for the MAC. It isn't recommended and can lead to problems. I would rewrite the converter to use two clocks generated from a PLL instead. 

--- Quote End ---  

 

 

Hi Daixiwen, 

I replaced the original gated clock for the MAC with one that is simply generated by dividing the input clock by 2, without a PLL. It now works perfectly, although I haven't tested it for long yet. 

This solution is probably not ideal and may be susceptible to timing problems when the design is recompiled. 

If I run into further problems, I'll try the PLL solution as you suggest. 

 

Thank you
Altera_Forum
Honored Contributor II

I'm returning to this old post because I discovered the above problem is still present in some situations. 

I hope Daixiwen or other Altera gurus have some hints for me. 

 

I now think the problem is not with the MAC or PHY, but with the DMA. 

In fact, I switched from the TSE to the OpenCores MAC and I get exactly the same behaviour. 

I can deterministically reproduce a condition where the TX frame gets corrupted if an RX frame is received at the same time. 

Here is more exactly what I see: 

I monitor the txen and txd signals that drive the PHY, and crs or mac_rxdv to identify the RX frame timing. 

The TX frame is about 400 bytes long (txen pulse about 32us). 

The RX frame starts being received a few microseconds after the txen rising edge. 

In most cases the txen length is correct, as is the data on txd. But in many cases I see corrupted data on txd, as if it had been overwritten with RX data. As a consequence, the txen pulse also becomes lengthened or shortened. 

As I said in previous posts, this only happens when I have RX data. 

I checked the mac txclk, too, but it's perfectly in phase and it doesn't miss any cycle. 

 

I noticed that the first 240-270 bytes of the TX frame are ALWAYS correct: the data corruption always starts at a random position AFTER this threshold. 

I don't know the DMA descriptors' inner workings; is 256 bytes perhaps the size of a single descriptor's buffer? If so, I suppose I have some problem with descriptor chaining. 

Please help. 

 

Thank you in advance for any support
Altera_Forum
Honored Contributor II

The SGDMA buffers in the descriptors can be up to 64k long. That said, I faced a similar problem when improving/debugging the TSE drivers for eCos: I had a bug in buffer allocation, and sometimes a received packet would end up in a buffer used for transmit. 

Are you using the standard Altera TSE drivers with uCOS/Interniche? If so, what version? 

 

You could add some debug messages displaying the buffer address and length used during each DMA transfer, just to check that you don't have any overlap. 

It could also be a cache coherency problem. Check that the driver properly flushes the cache before a DMA read and invalidates it after the DMA write (or uses the alt_remap_uncached() functions).
Altera_Forum
Honored Contributor II

Hi Daixiwen, thanks for answering me. 

As I said before, I had the same problem both with the TSE and with the OpenCores MAC, which is what I currently use in this design. 

Apart from the MAC, I use the standard uCOS/Interniche. 

I have two ethernet ports in my design. 

The only difference from standard usage is that on one port I send raw Ethernet packets with the net->pkt_send function. 

On the other port I have no problems, but in that case I think the TCP protocol automatically recovers from any loss. 

 

Anyway, yesterday I found a way to solve the problem. 

I defined ETH_OCM_SYNC_TX in the OpenCores MAC driver, so that the driver transmits synchronously and blocks until the frame is transmitted, rather than using interrupts. 

It now uses the net->raw_send function rather than net->pkt_send. 

Now everything is working perfectly, so it really was a driver problem in managing concurrent TX and RX data. 

Overall performance has barely decreased: I only see minimal drawbacks on the other Ethernet port (the one using the TCP/IP stack), so I could use this as the final solution. I think I can also modify the driver to use sync mode only on the first port. 

 

Regarding your message: 

Version: I use Nios IDE 9.0sp2 with the uCOS and Interniche versions that come with it. 

Cache: I assumed the standard driver already flushed the cache; I have no driver knowledge, so I don't know how to check whether flushing is performed, nor how to force it. 

Debug: same as above. I really don't know where I need to place the messages; maybe where putq and getq are called? 

 

 

Cris
Altera_Forum
Honored Contributor II

For the debug messages: I was thinking more of the tse_mac_raw_send() and tse_mac_rcv() functions in the driver. 

I too assumed that the Altera drivers properly handle the data cache; I was just wondering whether you were using a more "exotic" OS. 

I just had a look at the driver, and from what I see, it requires uncached buffers. The Interniche stack is set up properly to use them, but if you directly call the raw_send() function, you need to use an uncached buffer. If you don't, you'll run into cache problems one day. I haven't looked at the OpenCores driver, but I guess it is the same thing.
Altera_Forum
Honored Contributor II

Hi Daixiwen, 

I agree it could be a cache problem. How do I flush the cache as you suggest? 

Here is my code. As you can see, I tried inserting a couple of flush functions I found around (now commented out), but they had no effect. 

Maybe this is not the correct location for them. Do I need to place them inside the pkt_send function? 

 

void ecat_send(char *data, int len)
{
    PACKET outpkt;
    /*
    OS_ENTER_CRITICAL();
    alt_dcache_flush_all();
    alt_icache_flush_all();
    OS_EXIT_CRITICAL();
    */
    LOCK_NET_RESOURCE(FREEQ_RESID);
    outpkt = pk_alloc(len + 20);
    UNLOCK_NET_RESOURCE(FREEQ_RESID);
    if (!outpkt) {
        printf("Error: pk_alloc() failed");
        dtrap();
        return;
    }
    outpkt->net = ecat_net;
    memcpy(outpkt->nb_buff + ETHHDR_BIAS, data, len + ETHHDR_SIZE);
    outpkt->nb_plen = ETHHDR_BIAS + len;
    /* if a packet oriented send exists, use it: */
    if (outpkt->net->pkt_send) {
        outpkt->nb_prot = outpkt->nb_buff;
        outpkt->net->pkt_send(outpkt);
    } else {
        outpkt->net->raw_send(ecat_net, outpkt->nb_buff, outpkt->nb_plen);
        LOCK_NET_RESOURCE(FREEQ_RESID);
        pk_free(outpkt);
        UNLOCK_NET_RESOURCE(FREEQ_RESID);
    }
}

 

Regards
Altera_Forum
Honored Contributor II

In these cases it is better to use uncached buffers, through the alt_remap_uncached() function. Using alt_dcache_flush_all() will flush the whole data cache (as the name suggests), and it can also have bad side effects if a DMA is writing to memory at the same time. 

That said, I see that you use the pk_alloc() function, which should already return an uncached buffer, so you shouldn't need to do anything. To be sure, you could add a printf() to display the value of outpkt->nb_buff in hexadecimal; it should have bit 31 set. 

 

I see that you free the packet just after calling raw_send(). This will work properly only if you ensure that the packet has been sent before raw_send() returns. This could explain why you had to add the sync option with the OpenCores driver, but AFAIK the TSE driver uses a synchronous write, so it doesn't explain why you had the same problem with the TSE.
Altera_Forum
Honored Contributor II

Thank you, 

I'll check bit 31 of the nb_buff address when I test the board again. 

 

About freeing the packet: 

I derived my send function from arp_send() and similar ones. I don't know the proper usage of those free and lock calls. 

In the OpenCores driver, raw_send is NULL when I don't define sync mode, so pkt_send is used. In sync mode it's the opposite: pkt_send is NULL and raw_send is used. 

In all cases the TSE driver used pkt_send. 

Clearly, the pkt_send functions differ from one driver to the other.
Altera_Forum
Honored Contributor II

I don't remember how the Interniche drivers are made, but basically the free and lock rules are very simple: 

- Whenever you touch the packet queues or allocation/deallocation, you must lock the stack first. 

- When sending in sync mode, the send function only returns once the packet has been sent. The packet therefore can (and must!) be deallocated after the send call. 

- When sending in async mode, the send function returns once it has set up the DMA, probably before the packet has actually been sent. In that case you mustn't deallocate the packet, because if it is allocated again before it has been sent, its contents could be overwritten. The deallocation must be done from the ISR that the DMA triggers after the transfer is complete. 

I'm guessing that the difference between pkt_send and raw_send is that one is synchronous and the other isn't, but they really didn't choose descriptive names for them. As far as I know the TSE driver doesn't use interrupts on the transmit DMA and only does synchronous transfers.
Altera_Forum
Honored Contributor II

Hi SEILASER, Daixiwen, 

I want to use the RJ45 port on the NEEK. The hardware part has been built; now, on the software side, I've run into some troubles: 

1: I can't fully understand the Tripple speed ethernet.pdf document. 

2: In the Nios II software I want to use only the HAL, not a TCP/IP stack (lwIP or Interniche), because I don't understand them at all. 

I want to begin by implementing a simple function, such as loopback. I'm trying the loopback function now, but I don't know how to write a MAC frame. What exactly do I need to configure in the MAC registers?
Altera_Forum
Honored Contributor II

Read the MAC datasheet to learn everything about the registers. You'll need to set up the MAC correctly: detect the PHY, configure it, detect whether the cable is connected, the link speed, and whether it is half or full duplex, then configure the MAC accordingly. You can have a look at the Interniche driver to see how it is done. 

Then you write your Ethernet packet contents somewhere in memory and set up the SGDMA to read it and send it to the MAC. 

In my opinion it is easier to learn how to use the TCP/IP stack than to try to send the packets yourself. Have a look at the simple sockets server example and modify it to suit your needs. Read some documentation about the BSD sockets API; it isn't that hard to use.
Altera_Forum
Honored Contributor II

Can you post your VHDL code for the MII to RMII converter? Thank you.
