FPGA Intellectual Property
PCI Express*, Networking and Connectivity, Memory Interfaces, DSP IP, and Video IP

no free buffers for rx

Altera_Forum
Honored Contributor II

Hello,

How do I increase the receive buffer size?

So far I have done the following:

1. Increased the receive depth of the TSE MAC.

2. Increased MAXBIGPKTS from 30 to 512 in "ipport.h".

3. Increased NUMBIGBUFS from 30 to 512 in "ipport.h".

4. Placed .heap and .stack in different memory regions.

 

But so far there is no success.

I have an image server that sends 512 UDP packets of 512 bytes each. On the other end a Nios II based receiver collects all these UDP packets in either internal or external memory and sends them serially to another Cyclone device for further processing.

Currently I am able to receive only 4 of those 512-byte UDP packets, not more than that.
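For reference, the buffer-count changes above live in "ipport.h". A sketch of what they look like after editing; the names MAXBIGPKTS and NUMBIGBUFS and the old value of 30 are as reported in this thread, so check your own ipport.h for the exact names and defaults:

```c
/* ipport.h (excerpt) - NicheStack big packet buffer sizing.
   Values changed from the reported default of 30 to 512. */
#define MAXBIGPKTS 512   /* was 30: cap on big packets held by the stack */
#define NUMBIGBUFS 512   /* was 30: number of big packet buffers */
```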

 

 

regards 

 

kaushal
Altera_Forum
Honored Contributor II

You can try enabling -O3 optimization, which will speed up the stack, but I doubt you'll be able to achieve your task using the stack alone. I would suggest doing the UDP reception in hardware. I have successfully sent and received UDP data in hardware without any problems.

Altera_Forum
Honored Contributor II

You can also check if it is possible for your server to wait a bit between each packet, giving time to the Nios system to process them. Or use a TCP connection.

Altera_Forum
Honored Contributor II

If you have systematic packet loss at high packet rates, I'm not sure that using TCP will work as well as you want it to. Yes, it will do retransmissions for lost packets and things like 'slow start', but the overall outcome won't necessarily be what you want. 

 

If the sending side doesn't spread the packets out over time, they will be transmitted at LAN line rate. A 100MHz CPU is unlikely to be able to do 'normal' IP processing at line rate for 100M Ethernet, and GbE speeds will be completely out of reach. It is also very likely that the sending system discards UDP transmit requests when the application exceeds the network bandwidth (rather than blocking the send() call). 

 

For maximum throughput you may need to write your own Ethernet code that does the minimum required. 

You might get away with a much faster version of the IP checksum routine - the list archives might be illuminating... 

Also consider using UDP with a simple ack strategy.
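For the checksum idea, here is a plain, portable version of the Internet one's-complement checksum (RFC 1071 style) as a baseline; a faster Nios variant would typically sum 32 bits at a time and fold the carries once at the end:

```c
#include <stdint.h>
#include <stddef.h>

/* One's-complement Internet checksum (RFC 1071), straightforward version.
   Sums 16-bit big-endian words, pads an odd trailing byte with zero,
   folds the carries, and returns the complement. */
uint16_t ip_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {               /* sum 16-bit big-endian words */
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len)                        /* odd trailing byte, zero-padded */
        sum += (uint32_t)(p[0] << 8);

    while (sum >> 16)               /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

On the RFC 1071 sample data {00 01 f2 03 f4 f5 f6 f7} the folded sum is 0xddf2, so the checksum is 0x220d.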
Altera_Forum
Honored Contributor II

One useful feature of TCP when the receiving CPU is a bit slow is the window size adjustment. It should make the source automatically reduce the transmit rate to give the destination enough time to process the data, without any packet loss. 

IIRC a checksum on the UDP data contents is optional (at least on IPv4), so if the source doesn't provide UDP data checksum, optimizing the IP checksum routine won't speed things up a lot.
Altera_Forum
Honored Contributor II

I've seen serious problems due to unwanted interactions between TCP 'slow start', 'delayed acks' and Nagle (disabled) on zero-delay local networks with connections carrying uni-directional non-bulk traffic (i.e. relaying messages received on another network). 

To ensure data was actually sent we had to send data in the reverse direction at least every 4 Ethernet packets.
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

You can also check if it is possible for your server to wait a bit between each packet, giving time to the Nios system to process them. Or use a TCP connection. 

--- Quote End ---  

 

 

Hello There, 

 

Yes, I have put a small delay between packets (on the image source/server side), and in that case my Nios II based client receives all the UDP packets and does the required job (also sending them on to the other FPGA for further processing). But in the practical scenario the server will send UDP packets continuously, without delay.

 

Regards 

kaushal
Altera_Forum
Honored Contributor II

The only way you'll guarantee not to discard any of the packets (assuming they actually reach your Ethernet interface) is to have enough buffering for the entire packet burst available directly to the Ethernet MAC unit - e.g. some kind of descriptor DMA engine with over 512 rx buffers. 

(Well, a full hardware (or 95% hardware) solution might work - but you'd have to implement what is effectively an Ethernet switch.) 

 

Do some sums: 512 bytes of UDP data + about 48 bytes of header is about 4500 bits. At 100M Ethernet your 100MHz Nios has about 4500 clocks to process each packet. Run the network at GbE speeds and that might drop as low as 450, depending on the performance of the sender. 

 

Hand-crafted code might manage to process the packets, but a general UDP/IP stack won't. And you wouldn't want the processor to be doing anything else at all.
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

You can try enabling -O3 optimization, which will speed up the stack, but I doubt you'll be able to achieve your task using the stack alone. I would suggest doing the UDP reception in hardware. I have successfully sent and received UDP data in hardware without any problems. 

--- Quote End ---  

 

 

Please shed some more light on your suggestion. I mean, how do I do UDP reception in hardware? I am working at the application layer, and UDP packet reception is done by the TCP/IP stack and the MAC/PHY layers.
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

I've seen serious problems due to unwanted interactions between TCP 'slow start', 'delayed acks' and Nagle (disabled) on zero-delay local networks with connections carrying uni-directional non-bulk traffic (i.e. relaying messages received on another network). 

To ensure data was actually sent we had to send data in the reverse direction at least every 4 Ethernet packets. 

--- Quote End ---  

 

I've had similar problems, and found out that the cause was that the RX FIFO in the TSE component was much smaller than the TCP window size reported by the Niche Stack (16k IIRC), so when the stack was too slow to receive a full burst, some packets would be lost. Increasing the Rx FIFO size to something larger than the maximum TCP window size solved all our problems, but you must of course be sure that you have enough resources in the FPGA for that (especially if you plan to have simultaneous high speed TCP connections, which would require increasing the FIFO size even further). 

 

--- Quote Start ---  

Please shed some more light on your suggestion. I mean, how do I do UDP reception in hardware? I am working at the application layer, and UDP packet reception is done by the TCP/IP stack and the MAC/PHY layers. 

--- Quote End ---  

 

I think he is talking about this design (http://www.alterawiki.com/wiki/nios_ii_udp_offload_example), which would indeed be the best choice for high performance UDP reception. I think there were some problems adapting this example to Quartus 11 though, but a search on the forum should give you more information.
Altera_Forum
Honored Contributor II

If your bursts of packets are well spaced out in time, then some sort of DMA engine copying them from the RX FIFO into external memory (from where the IP stack would process them) might help for 100M Ethernet; for GbE you don't stand a chance (external memory isn't fast enough). 

 

I don't know if one is available (I've not looked at the TSE MAC). But if you aren't that tight on FPGA resources, you could use a Nios CPU as a custom DMA controller. A single M9K block dual-ported to tightly coupled I and D ports should be enough for code and data. You'll need to do burst writes to SDRAM, which might require a small data cache with 32-byte lines, but check first that the SDRAM interface doesn't merge sequential writes. 

At 100M Ethernet a 100MHz Nios has 32 clocks per 32-bit word to copy the data - probably just enough.
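The inner loop of such a software "DMA" is just a word copy. In the real system src would be a volatile pointer to the FIFO's data register and dst would point into SDRAM (both names here are illustrative, not actual TSE registers):

```c
#include <stdint.h>
#include <stddef.h>

/* Word-at-a-time copy of the kind a Nios-as-DMA-controller would run.
   At 100M Ethernet one 32-bit word arrives every 32 CPU clocks at
   100MHz, so each iteration has to fit inside that budget. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    while (nwords--)
        *dst++ = *src++;
}
```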
Altera_Forum
Honored Contributor II

Hello there, 

 

Please see the print screen of the TSE MAC options in SOPC Builder. 

 

1. I have enabled -O3 optimization too, but I don't get the desired result.  

2. I have also disabled the unwanted services in ipport.h. 

 

In ipport.h there is a statement like: 

# define BB_ALLOC(size) npalloc(size) /*Big packet buffer alloc*/ 

 

In this define statement, is "size" the same thing I gave to MAXBIGPKTS (512), or something else? From this statement it looks like it assigns the memory region for the received/transmitted packets. 

 

With regards 

kaushal
Altera_Forum
Honored Contributor II

Yes, this is the function used to allocate the packet buffers for the InterNiche stack. You can override it with your own function if you want to use a different memory for that, such as a high speed on-chip RAM block.
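A minimal sketch of such an override: a fixed-block pool carved out of a dedicated RAM region. Only BB_ALLOC is the stack's own hook (per the ipport.h excerpt quoted above); BB_FREE is its presumed counterpart, and the buffer size, pool code, and linker placement are all hypothetical illustrations that would need adapting to the real port:

```c
#include <stddef.h>

#define PKT_BUF_SIZE  1600   /* big-buffer size: an assumption, not the stack's value */
#define PKT_BUF_COUNT 512

/* In a real port this array would be placed in the dedicated packet RAM
   through a linker section; here it is ordinary static storage. */
static unsigned char pkt_pool[PKT_BUF_COUNT][PKT_BUF_SIZE];
static void *free_stack[PKT_BUF_COUNT];
static int   free_top = -1;

void pkt_pool_init(void)
{
    int i;
    for (i = 0; i < PKT_BUF_COUNT; i++)
        free_stack[++free_top] = pkt_pool[i];
}

void *pkt_alloc(size_t size)
{
    if (size > PKT_BUF_SIZE || free_top < 0)
        return NULL;            /* exhausted: the stack reports "no free buffers" */
    return free_stack[free_top--];
}

void pkt_free(void *ptr)
{
    free_stack[++free_top] = ptr;
}

/* ipport.h would then be pointed at the pool, e.g.:
   #define BB_ALLOC(size) pkt_alloc(size)
   #define BB_FREE(ptr)   pkt_free(ptr)  */
```

When the pool runs dry pkt_alloc returns NULL, which is exactly the "no free buffers for rx" condition; sizing PKT_BUF_COUNT to the whole burst is what avoids it.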

Altera_Forum
Honored Contributor II

I am now able to receive 460 of the 512 UDP packets of size 512. I added on-chip memory using SOPC Builder, connected it to the CPU and the TSE MAC (please see the attached configuration), and made small changes related to the packet memory in ipport.h. 

If I send more packets from the server side, it again starts saying "no free buffer for rx". 

 

So I am still missing 512 - 460 = 52 UDP packets.  

I just wanted to know what the data flow is in this case after adding the packet memory. Is it something like 

 

tse_mac ---> packet memory ---> TCP/IP (InterNiche) ---> CPU ---> application 

 

or something else, so that I can add sufficient memory in the right place?  

 

with regards 

kaushal
Altera_Forum
Honored Contributor II

I'm not sure I understand how you made your connections to the packet memory. If you are attempting to do as described in application note 440 (http://www.altera.com/literature/an/an440.pdf), then the packet memory needs to be connected to the SGDMA's read and write masters (and only them, not the descriptor masters) and to a CPU data master. 

What I would do to get the best performance is to connect the m_read and m_write masters to the packet memory's s1 interface (nothing else from m_read/m_write, and nothing else on s1) and add a tightly coupled data master to the CPU, individually connected to the packet memory's s2 interface. 

And of course, once you do that, modify the ipport.h file so that the memory allocation functions use the packet memory.
Altera_Forum
Honored Contributor II

Hello Daixiwen, 

 

I have connected my packet memory according to your previous post: 

 

1. s1 of the packet memory is connected to m_read and m_write of the SGDMA. 

2. s2 is connected to the data master of the CPU. 

 

By doing so, the data master bus clashes with the flash (please see the attached print screen of SOPC Builder); if I remove the flash connection, there is no reset vector for the CPU. 

 

As you mentioned in your previous post, please suggest how I direct the memory allocation functions to use the packet memory - what changes do I make in ipport.h so that the memory allocation functions use the packet memory? 

 

With Regards 

 

kaushal
Altera_Forum
Honored Contributor II

Did you use the 'reassign base addresses' menu item? It should move the addresses around to avoid clashes between peripherals. Just ensure that both s1 and s2 stay at the same address (if they change, you can put them back at a fixed address and click on the lock next to it, to prevent the 'reassign base addresses' option from changing them). 

As for the ipport.h modifications, I don't remember the exact procedure, as it was done a long time ago, but it should be explained in the application note. I think there is just a define to add or to uncomment somewhere to tell the driver to use a specific address instead of allocating the packet memory through malloc().
Altera_Forum
Honored Contributor II

It is also worth checking that the sending side (or a network switch) isn't discarding some of the packets. 

The 'traditional' behaviour of UDP sockets is to discard rather than block the application when the app exceeds the number of messages the sending OS is willing to queue.
Altera_Forum
Honored Contributor II

Hello Daixiwen, 

 

I have added the tightly coupled data master to the CPU, individually connected to the packet memory's s2 interface,  

 

and connected the m_read and m_write masters to the packet memory's s1 interface.  

 

After that I assigned the base addresses through menu --> assign base addresses. 

I got two errors saying "descriptor_memory.s1: appears at more than one address" and "ddr2_bot.s1: appears at more than one address". Though I reassigned the base addresses, nothing happened and these two errors remained.  

Please see the attached print screen of my SOPC Builder - am I making a wrong connection?
Altera_Forum
Honored Contributor II

Hello dsl, 

 

When I increase the TSE MAC FIFO depth I receive more UDP packets (though I have now reached the maximum TSE MAC FIFO depth), so it seems that the network switch isn't discarding any UDP packets.
Altera_Forum
Honored Contributor II

If you have some pipeline bridges, try to set their base address to 0 and lock it. It should get rid of the message.
