Hello. I am trying to get as much data rate as possible from the NicheStack running on a Nios II processor (currently with MicroC/OS-II). I am using TCP for transmission. I think I've got all I can from HW optimization (Nios memory interfaces and so on). Now I want to understand how to optimize the SW to send data faster. So, several "has anybody tried this?" questions:

1) What are the actual stack/heap memory requirements of MicroC/OS-II + NicheStack? The system is used only for data transmission (no other tasks), so knowing figures for these two components would be useful for me. All my memory is on-chip, and this is the limiting factor for Fmax at the moment.

2) Has anybody compared the transfer rate with and without MicroC/OS-II? What rate benefit can a "superloop" give?

3) Is it possible to tune the NicheStack to force it to use jumbo frames (8K, for example)? Right now it uses 1500-byte packets, and so far I haven't been able to increase the packet size. Can IPv6 help with this? If it is possible, what benefit can be expected?

4) What is the benefit of UDP compared with TCP for Nios II + MicroC/OS-II + NicheStack?

Thank you all in advance for your answers.
Hi, you may want to start from Altera AN440 (http://www.altera.com/support/examples/nios2/exm-ethernet-acceleration.html). UDP vs TCP: speed vs reliability.
* With UDP you send and assume it was received - the Nios has less to process (less communication overhead)
* With TCP it is ensured that the packet is received (retransmission, etc.)
Thank you for the answer.
> you may want to start from this Altera AN440.
I've just read AN440. There are a lot of good ideas, but no clear answers to my questions above.
> UDP vs TCP: speed vs reliability.
I know what the difference is. I would like to know the practical speed benefit of UDP compared to TCP for the Nios/NicheStack implementation (approximate speed gain in percent).
What's the desired throughput? And what's your current figure? Nios is a rather "slow" machine compared to hardware protocol processors, so it's unlikely you can exploit the full link bandwidth; this is true at least for TCP, which involves a lot of protocol management. UDP is much faster, especially with short packets; but, as dipling has already pointed out, you have to trade speed for reliability.
Be careful with jumbo frames. You need to know that every device in the path can support such frames. Otherwise the frame will be truncated or dropped. I would recommend sticking with the 1518 (1500 MTU) max frame size.
> What's the desired throughput? And what's your current figure?
This doesn't matter. I don't want to hit any given speed; I want only to take as much as possible from the HW that I have.
> UDP is much faster, especially with short packets;
Much faster - how much? 200%? 50%? I need to know what the benefit will be, to decide whether it is useful to spend time chasing it or not.
--- Quote Start ---
I want only to take as much as possible from the HW that I have.
--- Quote End ---
I am not aware of any cookbook or example project that supplies a "hardware and software design for maximum network performance using Nios II and software only". AN440 is about as good as it gets, and you probably need to work through it and do your own measurements. From past experience, of all the things you are asking about (UDP vs. TCP, uC/OS-II vs. superloop, jumbo vs. not), I think the only one worth pursuing is UDP, simply because once you are talking about UDP you open the door to an easy hardware implementation, which will get you performance more or less as fast as the media will allow. Most people are interested in the superloop because of the uC/OS-II licensing fees that must be paid. Good luck.
With the UDP offload example (on the wiki (http://www.alterawiki.com/wiki/nios_ii_udp_offload_example)) you can hit line rate (or very near it), if your application can work that way. It's what I would do with Nios II and UDP. You could see a 10x+ performance increase over TCP on a GbE network here. Cheers, slacker
Thank you all for your answers. Please, can somebody answer: what are the actual stack/heap memory requirements of MicroC/OS-II + NicheStack? How can I determine this value, from datasheets or maybe during debug? I am using on-chip memory, so memory size is critical for Fmax.
--- Quote Start ---
> UDP is much faster, especially with short packets;
Much faster - how much? 200%? 50%? I need to know what the benefit will be, to decide whether it is useful to spend time chasing it or not.
--- Quote End ---

A comparison between TCP and UDP speed must take into account how data is exchanged. Let's assume your client needs to transfer 100 bytes of data at once and then wait for a 100-byte answer from the server.

UDP would send a single packet, 100 bytes of data payload + 42 bytes of protocol overhead; then you'd wait for the single-packet answer from the server.

TCP instead would initially send a packet with the 100-byte data payload + 54 bytes of protocol overhead, then wait for an acknowledgement from the server. The ack frame may or may not contain the actual answer: depending on the server's speed and its TCP implementation, the answer data could be sent later in another packet. Finally, the client must send its own acknowledgement for the answer frame. So, in this simple case you would have 3 to 4 packets for one client-server transaction, with the associated delays due to protocol management and transmission times on the physical medium.

In a different situation, where the client is supposed to send a lot of data in streaming fashion, TCP can be more convenient, since the transport protocol allows continuous transfer of reliable data without introducing significant delays.
Please can somebody explain how I can get an MTU larger than 15xx bytes from the NicheStack, so it transmits packets with sizes like 9600 bytes? I've tried a few things, but my TCP packets are being split by the stack into 15xx-byte pieces (I am watching with Wireshark on the PC connected to the board). From my experiments, using large packets should give a high real-world performance gain, but I still can't use packets larger than 15xx bytes.
You'll find the requested information in this document (Section 11-5): http://www.altera.com/literature/ug/ug_ethernet.pdf
Just ensure that the peer NIC supports them as well. The TSE MAC (if that's what you are using) does support jumbo frames.
I haven't used the NicheStack in a long while, but I think it will need to be modified in several places to support jumbo frames. To begin with, the MBUFs used to store the packet data can have two different sizes, and the biggest one is just big enough for a standard frame. You'll either need to increase the MBUF size for the big packets, or somehow manage to use chains of MBUFs to store the data. There are probably several places in the code where the packet size is checked too, so you'll have to go into the source and modify it yourself.
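As a sketch of the kind of change meant above: the NicheStack port header defines the two buffer pool sizes, and the "big" one would need to grow to hold a whole jumbo frame. The identifier names below are illustrative, written from memory; check the definitions in your own port's `ipport.h` before touching anything, and remember that every big buffer now eats roughly 9.6 KB of your on-chip RAM.

```c
/* ipport.h fragment - ILLUSTRATIVE ONLY, names and defaults may differ
 * in your NicheStack port; verify against your own ipport.h.
 * A 9600-byte MTU needs "big" buffers large enough for a whole frame. */
#define BIGBUFSIZE   9632   /* was ~1536: 9600-byte payload + headers + slack */
#define NUMBIGBUFS   10     /* fewer big buffers, or on-chip RAM won't fit   */
```

Even with the pools resized, the per-interface MTU value and any hard-coded `1500`/`1514` checks in the stack and driver sources still have to be found and raised by hand, as described above.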