Nios® V/II Embedded Design Suite (EDS)

Actual maximum data rate of Nios w/ 10/100/1000 TSE MAC on Stratix IV

Altera_Forum
Honored Contributor II

Dear all, 

 

I'm currently developing a Gigabit Ethernet interface project on a Stratix IV that uses a Nios (uCOS-II) with the 10/100/1000 TSE MAC. The project's job is to forward data from several DSP boards to a PC over a Gigabit Ethernet connection. 

 

So far, the data rate I get from my Stratix IV is only 20 Mbps, and I'm wondering why it is so slow. 

According to Altera application note AN 440, TCP throughput using the 10/100/1000 TSE MAC on Stratix II is about 116 Mbps. (Has anyone tried to implement this app note? :confused:)  
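To compare figures like 20 Mbps and 116 Mbps fairly, it helps to compute throughput the same way every time. Below is a minimal C sketch, assuming you count the payload bytes handed to the stack and read a tick counter before and after a transfer. The function and parameter names are hypothetical; with uCOS-II the ticks could come from OSTimeGet() and the tick rate from OS_TICKS_PER_SEC.

```c
#include <stdint.h>

/* Hypothetical helper: payload throughput in megabits per second
 * (1 Mbps = 1,000,000 bit/s). elapsed_ticks and ticks_per_sec describe
 * whatever timer was used. Returns 0.0 when no time has elapsed. */
double throughput_mbps(uint64_t bytes, uint32_t elapsed_ticks,
                       uint32_t ticks_per_sec)
{
    double seconds;

    if (elapsed_ticks == 0u || ticks_per_sec == 0u)
        return 0.0; /* avoid dividing by zero on a too-short measurement */
    seconds = (double)elapsed_ticks / (double)ticks_per_sec;
    return (double)bytes * 8.0 / seconds / 1e6;
}
```

For example, 2,500,000 bytes sent over 100 ticks at 100 ticks per second works out to 20 Mbps. Counting only application payload, as assumed here, gives a lower number than counting whole frames on the wire.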

 

I've been trying to apply some of the design tips from this app note. However, when I switched to faster packet memory (using on-chip memory instead of DDR3 for the packet buffers, and modifying ipport.h as described in AN 440), my Nios software hung. It stopped before reaching the main program: the Nios console only showed "==== Software License Reminder ====", which is printed at the very beginning of the program.  

 

Does anyone know what causes this problem? :confused: 

 

Please help me :( I'd really appreciate any help or comments. Thanks.
Altera_Forum
Honored Contributor II

In my experience, the two changes with the most impact on network performance are putting the packet memory on-chip and turning on compiler optimizations. You should use at least -O2; I don't think -O3 had a huge impact. 

Are the SGDMAs properly connected to the on-chip memories? 

Does your hardware design meet all timing requirements?
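One way to confirm that the -O2 advice actually took effect is a check compiled into the network code itself. This sketch relies on GCC's predefined `__OPTIMIZE__` macro, which nios2-elf-gcc, being GCC, should also define for any level above -O0. It cannot tell -O1 from -O2, so it only catches a completely unoptimized build. Both function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if this file was compiled with any
 * optimization level (GCC defines __OPTIMIZE__ for -O1, -O2, -O3, -Os),
 * 0 for -O0. It cannot report which level was used. */
int build_is_optimized(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: print a reminder at startup for -O0 builds. */
void warn_if_unoptimized(void)
{
    if (!build_is_optimized())
        printf("warning: network code built with -O0; try -O2\n");
}
```

Because the macro reflects the flags of the file being compiled, the check only says something about the code it lives in, so it belongs in the same sources (or library) as the networking code.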
Altera_Forum
Honored Contributor II

Hi Daixiwen, thanks for your reply! 

 

I based my current design on the AN 440 design. I added an on-chip memory (M144K type) whose two ports are connected to the TSE's SGDMA TX, SGDMA RX, and the CPU data master (I attached a snapshot of my SOPC system).  

Then, as described in AN 440, I made the following modifications to ipport.h: 

 

#define ALTERA_M144K_FOR_PACKETS 1
#define ALTERA_M144K_ALLOC_BASE PACKET_MEMORY_BASE
#define ALTERA_M144K_ALLOC_SPAN PACKET_MEMORY_SPAN

 

#ifdef ALTERA_TRIPLE_SPEED_MAC
char * ncpalloc(unsigned size);
void ncpfree(void *ptr);
//char * ncpballoc(unsigned size);
//int ncpbfree(void *ptr);
#if ALTERA_M144K_FOR_PACKETS
#define BB_ALLOC(size) ncpballoc(size)  /* Big packet buffer alloc */
#define BB_FREE(ptr) ncpbfree(ptr)
#define LB_ALLOC(size) ncpballoc(size)  /* Little packet buffer alloc */
#define LB_FREE(ptr) ncpbfree(ptr)
#else
#define BB_ALLOC(size) ncpalloc(size)   /* Big packet buffer alloc */
#define BB_FREE(ptr) ncpfree(ptr)
#define LB_ALLOC(size) ncpalloc(size)   /* Little packet buffer alloc */
#define LB_FREE(ptr) ncpfree(ptr)
#endif
#else /* Not ALTERA_TRIPLE_SPEED_MAC */
#define BB_ALLOC(size) npalloc(size)    /* Big packet buffer alloc */
#define BB_FREE(ptr) npfree(ptr)
#define LB_ALLOC(size) npalloc(size)    /* Little packet buffer alloc */
#define LB_FREE(ptr) npfree(ptr)
#endif /* ALTERA_TRIPLE_SPEED_MAC */

 

#if ALTERA_M144K_FOR_PACKETS
#define NUMBIGBUFS 30
#define NUMLILBUFS 30
/* some maximum packet buffer numbers */
#define MAXBIGPKTS 30
#define MAXLILPKTS 30
#define MAXPACKETS (MAXLILPKTS+MAXBIGPKTS)
#define BIGBUFSIZE 54272
#define LILBUFSIZE 1280
#else
#define NUMBIGBUFS 30
#define NUMLILBUFS 30
/* some maximum packet buffer numbers */
#define MAXBIGPKTS 30
#define MAXLILPKTS 30
#define MAXPACKETS (MAXLILPKTS+MAXBIGPKTS)
#endif

 

That's all, and the result was as I described in my first post: my program just stopped and only showed

"==== Software License Reminder ====" 

 

Do you know what's wrong with my system design? Are the modifications I made in ipport.h enough to keep NicheStack from copying packets into and out of its memory buffers? 

 

In my current system, the original data is stored in DDR3. What I expect is that TSE transmission works like this:

- My system receives data from the DSP boards and stores it in DDR3.

- When transmitting via the TSE, the original data is copied to the on-chip memory (used as the TSE buffer) and then sent.

 

Is my understanding correct? 

 

In your experience, what is the maximum data rate that you can achieve?
Altera_Forum
Honored Contributor II

And this is the memory allocation code I took from AN 440: 

 

#include "system.h"
#include "ipport.h"
#include "libport.h"
#include "in_utils.h"
#include "osport.h"

#ifdef ALT_INICHE
#include "alt_iniche_dev.h"
#include "sys/alt_cache.h"
#endif

#ifdef UCOS_II /* If uCOS-II, bring in the needed include files */
#include "includes.h"
#endif

#ifdef ALTERA_M144K_FOR_PACKETS

#ifndef ALTERA_M144K_ALLOC_BASE
#error ALTERA_M144K_ALLOC_BASE must be defined when using ALTERA_M144K_FOR_PACKETS
#endif
#ifndef ALTERA_M144K_ALLOC_SPAN
#error ALTERA_M144K_ALLOC_SPAN must be defined when using ALTERA_M144K_FOR_PACKETS
#endif

extern OS_EVENT * mheap_sem_ptr;

char * ncpballoc(unsigned size)
{
    static unsigned int next=0; // offset into memory
    char* mem=0;
#ifdef UCOS_II
    INT8U err;
#endif

#ifdef UCOS_II
    OSSemPend(mheap_sem_ptr, 0, &err);
    if(err)
    {
        int errct = 0;
        /* sometimes we get a "timeout" error even though we passed a zero
         * to indicate we'll wait forever. When this happens, try again: */
        while(err == 10)
        {
            if(errct++ > 1000)
            {
                panic("npalloc"); /* fatal? */
                return NULL;
            }
            OSSemPend(mheap_sem_ptr, 0, &err);
        }
    }
#endif

    if(size > (ALTERA_M144K_ALLOC_SPAN - next))
    {
        panic("out of mem\n");
        return (NULL);
    }

#ifdef UCOS_II
    err = OSSemPost(mheap_sem_ptr);
#endif

    mem = (char *) alt_remap_uncached((char*)(ALTERA_M144K_ALLOC_BASE + next), size);
    if(size & 0x3)
    {
        size &= ~0x3;
        size += 0x4;
    }
    //printf("mram alloc: %u for %d\n", mem, size);
    next+=size;
    MEMSET(mem, 0, size);
    return mem;
}

int ncpbfree(char *ptr)
{
    panic("M144K_FREE CALLED --- ERROR");
    dtrap();
    return 0;
}

#endif //ALTERA_MRAM_FOR_PACKETS
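Reading ncpballoc() as posted, a few details look fragile: OSSemPost() runs before `next` is advanced, so two tasks could be handed the same offset; the bounds check uses the size before it is rounded up to 4 bytes, so the last allocation can overrun the span by up to 3 bytes; and the out-of-memory path returns without posting the semaphore. The sketch below is not the AN 440 code, only a minimal plain-C bump allocator showing one way to order those steps. `bump_pool`, `bump_alloc` and the lock/unlock hooks (standing in for OSSemPend()/OSSemPost() on mheap_sem_ptr) are hypothetical names, and the uncached remapping is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bump allocator over a fixed memory region.
 * lock/unlock are hypothetical hooks standing in for OSSemPend()/OSSemPost();
 * they may be NULL in a single-threaded test. */
typedef struct {
    unsigned char *base;   /* start of the region (e.g. the packet memory) */
    size_t span;           /* size of the region in bytes */
    size_t next;           /* offset of the first unused byte */
    void (*lock)(void);
    void (*unlock)(void);
} bump_pool;

/* Returns zeroed memory at an offset that is a multiple of 4, or NULL when
 * the region is full. Nothing is ever freed, like ncpballoc()/ncpbfree(). */
void *bump_alloc(bump_pool *p, size_t size)
{
    unsigned char *mem = NULL;

    if (size > p->span)
        return NULL;                        /* also avoids overflow below */
    size = (size + 3u) & ~(size_t)3u;       /* round up BEFORE the check */

    if (p->lock)
        p->lock();
    if (size <= p->span - p->next) {        /* check and advance under */
        mem = p->base + p->next;            /* the same lock           */
        p->next += size;
    }
    if (p->unlock)
        p->unlock();                        /* released on both paths */

    if (mem != NULL)
        memset(mem, 0, size);
    return mem;
}
```

Like ncpballoc(), it never frees anything, which only works if buffers are allocated once and kept (ncpbfree() above panics if it is ever called).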
Altera_Forum
Honored Contributor II

Those two lines seem suspicious. Are you sure you need such big sizes? 

#define BIGBUFSIZE 54272
#define LILBUFSIZE 1280

With 128k of memory you can't even allocate 3 big buffers (3 × 54272 = 162,816 bytes).
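The arithmetic behind this point, as a small C sketch. It assumes "128k" means 131,072 bytes and counts only the packet data buffers; anything else the stack places in the same memory is ignored, so the real requirement is higher. Sizes are rounded up to 4 bytes the way ncpballoc() rounds, and the function names are hypothetical.

```c
#include <stddef.h>

/* Round n up to a multiple of 4 bytes, as ncpballoc() does. */
static size_t round4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Hypothetical helper: bytes needed for nbig big buffers and nlil little
 * buffers (data buffers only, no other stack structures). */
size_t pool_bytes(size_t nbig, size_t bigsize, size_t nlil, size_t lilsize)
{
    return nbig * round4(bigsize) + nlil * round4(lilsize);
}

/* Hypothetical helper: how many buffers of bufsize bytes fit in span bytes. */
size_t buffers_that_fit(size_t span, size_t bufsize)
{
    return bufsize ? span / round4(bufsize) : 0u;
}
```

With the ipport.h values posted above, pool_bytes(30, 54272, 30, 1280) is 1,666,560 bytes, about 12.7 times the 131,072 available, and buffers_that_fit(131072, 54272) is 2.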
Altera_Forum
Honored Contributor II

Hi! Thanks for your reply. 

 

Hmm... the reason I need a 128K buffer is that I plan to store the data from the DSP board (in the SOPC system I attached in the previous post, this is the data coming from the FIFO module) in the on-chip memory. So I'm thinking of using DDR3 for my program and the on-chip memory for data. Is that OK? 

 

By the way, you're right: I need to decrease the number of packets. There are too many.
Altera_Forum
Honored Contributor II

I think you can reduce the packet sizes too... They won't be bigger than about 1500 bytes anyway.
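A sketch in C of where the roughly 1500-byte figure comes from. It assumes untagged Ethernet II frames; whether the FCS is kept in the buffer and how much extra driver headroom is needed are left as parameters, because they depend on the MAC and driver configuration. The function name is hypothetical.

```c
#include <stddef.h>

/* Ethernet II framing sizes for untagged frames. */
#define ETH_HDR_LEN 14u   /* destination MAC + source MAC + EtherType */
#define ETH_MTU   1500u   /* largest payload: IP header + TCP/UDP + data */
#define ETH_FCS_LEN  4u   /* CRC-32 frame check sequence */

/* Hypothetical helper: smallest buffer for one full-size frame plus 'extra'
 * bytes of driver headroom, rounded up to a multiple of 4 bytes. Whether the
 * FCS is stored (store_fcs) and how much headroom is needed are assumptions
 * to be checked against the actual MAC/driver settings. */
size_t frame_buffer_size(int store_fcs, size_t extra)
{
    size_t n = ETH_HDR_LEN + ETH_MTU + (store_fcs ? ETH_FCS_LEN : 0u) + extra;
    return (n + 3u) & ~(size_t)3u;
}
```

frame_buffer_size(1, 0) is 1520, so 30 big buffers of that size would need 45,600 bytes instead of the 1,628,160 bytes taken by 30 buffers of 54272 bytes.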

Altera_Forum
Honored Contributor II

Sorry, I didn't answer your question: yes, 128k is fine. I used 64k on the Cyclone III development board and managed to make it work.

Altera_Forum
Honored Contributor II

We did get 900 Mbps using the offset UDP example on a Cyclone III.

Ask your rep for the design.

Daixiwen probably knows about the design.
Altera_Forum
Honored Contributor II

Hi all! Thanks for the reply 

 

I found the problem. Thanks to Daixiwen for pointing out the suspicious part :). It was caused by an inappropriate buffer and packet size configuration. After several trials, I found that if the defined buffer and packet sizes were larger than what NicheStack requires, NicheStack failed to start (the FPGA couldn't be reached over a TCP/IP connection). Now my FPGA data rate is 72 Mbps. 

 

Even though the data rate is higher now, it is still lower than I expected. So I'm now trying to replace the network checksum function with a hardware accelerator built with the C2H compiler. However, when I built the application, I got the following error: 

getValue Error : unknown parameter c2hacceleratorgroup  

Does anybody know what causes this error and how to solve it? 

 

Is it possible to make a TCP offload design? If so, what do I have to consider? 

 

Thanks
Altera_Forum
Honored Contributor II

Possibly the simplest way to get a significant TCP checksum speedup is to use a custom instruction that adds the high and low 16 bits of one register onto a second register. This saves all the faffing about that currently happens trying to generate 'add with carry' (Nios II has no carry flag). Also make sure the checksum loop has no data stalls.
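To make the suggestion concrete, here is a plain-C sketch of an RFC 1071 Internet checksum written around the operation such a custom instruction would perform. `add_halves()` is a hypothetical software stand-in for the custom instruction, not an existing Nios II or NicheStack function, and `inet_checksum()` is likewise a hypothetical name. Because each 32-bit word contributes at most 0x1FFFE, the loop needs no carry handling; the end-around carries are folded once at the end.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of the suggested custom instruction:
 * add the high and low 16-bit halves of 'word' onto the accumulator. */
static uint32_t add_halves(uint32_t acc, uint32_t word)
{
    return acc + (word >> 16) + (word & 0xFFFFu);
}

/* Hypothetical RFC 1071 checksum over 'len' bytes in network byte order.
 * No carry handling in the loop is needed for buffers up to about 128 KB. */
uint16_t inet_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    while (len >= 4) {
        uint32_t w = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                     ((uint32_t)p[2] << 8) | (uint32_t)p[3];
        sum = add_halves(sum, w);
        p += 4;
        len -= 4;
    }
    if (len >= 2) {                          /* trailing 16-bit word */
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)                            /* odd byte, padded with zero */
        sum += (uint32_t)p[0] << 8;

    while (sum >> 16)                        /* fold end-around carries */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

For the 20-byte IPv4 header 4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7 (checksum field zeroed) it returns 0xB861. RFC 1071 also notes that the sum can be computed in host byte order and the result byte-swapped, which would let the loop use aligned 32-bit loads instead of assembling bytes.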

Altera_Forum
Honored Contributor II

Hey Guys, 

 

I'm about to do the same thing with a Cyclone III/IV. What data rate did you actually achieve? I have a basic design working that runs a TCP task and a UDP task on a Nios running uCOS and NicheStack. 

In the worst case, I was hoping to get it to transmit 96 Mbps plus overhead. 

What did you achieve with the Cyclone III, and did you use NicheStack?
Altera_Forum
Honored Contributor II

Hi, 

We used an adapted version of the offset UDP example and NicheStack. 

TCP/IP for control and UDP for streaming. 

And we are filling the bus, so 92-96 Mbps. 

It works great; the first products have already shipped.
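As a sanity check on the 92-96 Mbps figure (and the 96 Mbps target mentioned earlier), the best-case UDP payload rate can be worked out from the per-frame overhead on the wire. A C sketch assuming untagged Ethernet, IPv4 without options, the minimum 12-byte inter-frame gap and one datagram per frame; the function name is hypothetical.

```c
/* Per-frame overhead on the wire, in bytes, for IPv4/UDP over Ethernet. */
#define PREAMBLE_SFD 8u   /* preamble + start-of-frame delimiter */
#define ETH_HDR     14u   /* MAC addresses + EtherType (untagged) */
#define ETH_FCS      4u   /* CRC-32 frame check sequence */
#define IFG         12u   /* minimum inter-frame gap */
#define IPV4_HDR    20u   /* no IP options */
#define UDP_HDR      8u

/* Hypothetical helper: best-case UDP payload rate for a line rate and a
 * payload per datagram (18..1472 bytes, so the frame needs no padding and
 * is not fragmented). */
double udp_goodput_mbps(double line_rate_mbps, unsigned payload)
{
    double wire = payload + UDP_HDR + IPV4_HDR + ETH_HDR + ETH_FCS
                + PREAMBLE_SFD + IFG;
    return line_rate_mbps * payload / wire;
}
```

For 1472-byte payloads each frame occupies 1538 byte times, so udp_goodput_mbps(100.0, 1472) is about 95.7 Mbps and udp_goodput_mbps(1000.0, 1472) is about 957 Mbps.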
Altera_Forum
Honored Contributor II

Thanks for the reply, agdepus. 

  • For UDP, by the offset example do you mean this one? http://www.alterawiki.com/wiki/nios_ii_udp_offload_example

  • Correct me if I'm wrong: you use the Nios, NicheStack and uCOS for TCP, and for UDP you bypass all of that and use only HDL? 

  • What did you adapt? My understanding was that the offload example bypasses the Nios entirely; have you mixed the two? I still have to understand this fully, though. 

  • Was the UDP offload example difficult to implement? Do you have any tips or example code to shorten my learning curve? 

Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

We did get 900 mbs using the offset UDP example on a cyclone III..... 

Ask you rep for the design. 

 

Daixiwen probably know about the design 

--- Quote End ---  

 

 

Hi agdepus and Daixiwen, 

 

I have ported the UDP offload example to the DE2-115 board (Cyclone IV E), and I'm achieving 231 Mbps. I want to reach around 700 Mbps, but when I increase the frequency of the PLL clocks feeding sys (100 MHz) and pio (35 MHz), the .elf file fails to download to the board (the .sof file downloads without problems). I'm using the fastest Nios core in the design. 

 

Currently I am working on the timing constraints.  

 

What other factors do I have to consider to reach around 700 Mbps? 

Why does the .elf file fail to download to the board after I increase the clock speed?
Altera_Forum
Honored Contributor II

Hi, 

I'm a bit rusty in this field (I don't do much design these days), but you can't expect a fully working offloader without properly constraining the design. The 100 MHz clock in particular can be a stretch for the Nios. 

If the logic cannot run at that clock speed, your .elf download will fail.