
Altera TSE driver and example program for lwIP (1.3.2)

Altera_Forum
Honored Contributor II
17,485 Views

After many requests and complaints about the lack of support and documentation for lwIP on the Altera TSE, I have developed a drop-in TSE driver and example program and made them available to the NIOS II community. This was done with NIOS II 8.1 SP0.01. I don't expect difficulty with version 9.x. 

 

This targets lwIP 1.3.2 (the latest version as of this post) and consists of a minimal program and an HTTP server based on the http server in the lwIP contrib folder. The lwIP TSE driver uses the altera_avalon_tse driver and SGDMA as-is. There is a complete (as in 41-step) set of instructions for creating the project and example program. More information and the link to the driver are available here: 

 

http://lwip.wikia.com/wiki/available_device_drivers#lwip_1.3.2 

 

Please direct any questions, changes for NIOS II 9.1, or comments to this thread. 

 

12-16-2010 update: This example works with NIOS Version 10.0 with some tweaks to the procedure to create the project. Also, a lwIP 1.4 release candidate has been out for a while and it drops into this example (in place of 1.3) without changes. 

 

Bill
Altera_Forum

 

--- Quote Start ---  

 

ERROR : MAC Group[0] - No PHY connected!  

ERROR : PHY[0.0] - No PHY connected! Speed = 100, Duplex = Full  

 

--- Quote End ---  

 

 

WAIT!!! I missed this - the PHY is not detected. In 8.1 (which I'm still using) the Marvell 88E1119R is not supported. If it's not supported in altera_avalon_tse.h/.c you will have to add it. Model it on the other PHYs' #defines and the initialization structure table to add support for the 88E1119R. 

 

The OUI, model and rev are: 

enum { MV88E1119_OUI = 0x005043, MV88E1119_MODEL = 0x28, MV88E1119_REV = 0x2 };

Bill A
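For reference, the three values above are what a driver would extract from the PHY's two identifier registers (MII registers 2 and 3). A minimal sketch of that split in plain C follows. The struct and function names are made up for illustration and are not part of altera_avalon_tse, and the register values in the usage note are back-computed from the enum above rather than read from a board.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields (not an Altera type). */
struct phy_id {
    uint32_t oui;    /* 22 identifier bits taken from registers 2 and 3 */
    uint8_t  model;  /* 6-bit model number */
    uint8_t  rev;    /* 4-bit revision number */
};

/* Split MII PHY identifier registers 2 and 3 into OUI, model and revision. */
struct phy_id phy_id_decode(uint16_t reg2, uint16_t reg3)
{
    struct phy_id id;

    /* Register 2 supplies the upper 16 bits, register 3 bits 15:10 the rest. */
    id.oui   = ((uint32_t)reg2 << 6) | ((reg3 >> 10) & 0x3Fu);
    id.model = (uint8_t)((reg3 >> 4) & 0x3Fu);   /* register 3 bits 9:4 */
    id.rev   = (uint8_t)(reg3 & 0x0Fu);          /* register 3 bits 3:0 */
    return id;
}
```

With register values 0x0141 and 0x0E82, this yields OUI 0x005043, model 0x28 and rev 0x2, matching the enum. If the PHY on your board reports something else, the profile you add needs those numbers instead.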
Altera_Forum

You can also call alt_tse_phy_add_profile (I'm not sure when) to add the PHY without customizing altera_avalon_tse.c/.h. 

 

Bill
Altera_Forum

 

--- Quote Start ---  

Hi! 

 

Many thanks to Bill for the useful example and step-by-step walk-through. I successfully compiled lwIP_NIOS_II_Example in EDS 10.0. I ran into a couple of minor issues: 

1) Steps 25-40: I tried to set paths and symbols through the GUI, but this didn't work. The workaround was to edit lwIP_NIOS_II_Example/Makefile manually. Also, to get "INFO" output like that in Readme.txt, ALT_DEBUG must be defined for the lwIP_NIOS_II_Example_bsp project. 

2) TSE-related components in SOPC must have the standard names: 'tse_mac', 'sgdma_tx', 'sgdma_rx' and 'descriptor_memory'. If different names are used, the code compiles successfully but does not work. 

 

--- Quote End ---  

 

 

Hi Igor! Can you post your lwIP_NIOS_II_Example/Makefile please?  

Thanks!
Altera_Forum

 

--- Quote Start ---  

You can also call alt_tse_phy_add_profile (I'm not sure when) to add the PHY without customizing altera_avalon_tse.c/.h. 

 

Bill 

--- Quote End ---  

 

 

Bill!  

 

I've done as you suggested and now the driver detects my PHY and correctly reports the network speed. DHCP does not work at 1000M yet, but I think this might be a hardware problem, probably with clock phase. I'm still trying to figure this out... 

 

Thank you for your help,  

Igor
Altera_Forum

 

--- Quote Start ---  

Hi Igor! Can you post your lwIP_NIOS_II_Example/Makefile please?  

Thanks! 

--- Quote End ---  

 

Attached. Diff at line 7 only: 

ALT_INCLUDE_DIRS := . ./lwIP/src/include ./lwIP/src/include/ipv4
Altera_Forum

Guys, I am in trouble. 

First of all, I cannot get this example working. 

Basically, I need to build an application similar to FTP: an application that has two open connections on two ports. Over one port it receives commands and sends responses. 

Over the second port I send data. But that is the general picture. 

At this stage I want to build an application (based on BillA's example) that is able to receive packets from the other side (a PC, for example). 

When I use BillA's example and there is a connection to the external network, I receive the packets, exactly as BillA said, but this does not happen at the transport layer (not through the tcp_... functions). I think it happens at the IP layer (through the netif... functions). 

If I want to receive packets, do I need to define callback functions, and why? 

If YES, how do I do it? Can somebody show me with an example? I do not need the application; I need a little push, some help. 

P.S. When I tried to find the problem, I found that in the httpd.c file, in httpd_init, there is some error (maybe it is not an error and it is supposed to be like this, but I think there is an error). 

In the line pcb = tcp_listen(pcb); the tcp_listen function returns NULL, and I think it must return something different. 

Thank you, guys!
Altera_Forum

 

--- Quote Start ---  

Guys, I am in trouble. 

First of all, I cannot get this example working. 

Basically, I need to build an application similar to FTP: an application that has two open connections on two ports. Over one port it receives commands and sends responses. 

Over the second port I send data. But that is the general picture. 

At this stage I want to build an application (based on BillA's example) that is able to receive packets from the other side (a PC, for example). 

When I use BillA's example and there is a connection to the external network, I receive the packets, exactly as BillA said, but this does not happen at the transport layer (not through the tcp_... functions). I think it happens at the IP layer (through the netif... functions). 

--- Quote End ---  

 

 

Why do you say this? 

 

If you look at the code in httpd.c, you will see how to make a connection, handle incoming data (replace http_recv with your function), and send data (send the first chunk, then send any subsequent data from the http_sent callback). 

 

HTTP runs on top of TCP so this is using the TCP transport. If you want a reliable connection this is how you should do it. 

 

 

--- Quote Start ---  

 

If I want to receive packets, do I need to define callback functions, and why? 

 

--- Quote End ---  

Yes. Because the stack runs and can receive data at any time, you have to expect to be told when data is there - at any time. You respond to that data (your commands) by sending data out (on the same connection or another if you choose). 

 

 

--- Quote Start ---  

 

If YES, how do I do it? Can somebody show me with an example? I do not need the application; I need a little push, some help. 

--- Quote End ---  

Use the code in httpd.c - replace http_recv with your incoming data handler, and send data as is done there - use http_send and use the http_sent callback if one call to send data isn't enough (you cannot control this because of the tcp_sndbuf check so implement sending just as is done in httpd.c). 
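The send pattern described above (hand over only as much as the stack currently has room for, then continue from the sent callback) can be sketched without lwIP at all. The following is a plain-C model, not lwIP code and not a copy of httpd.c: `send_state`, `send_more`, `write_fn` and `count_bytes` are hypothetical names, and the `room` argument stands in for whatever tcp_sndbuf reports at the time of the call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-connection state: what is still waiting to go out. */
struct send_state {
    const char *data;  /* next byte to send */
    size_t      left;  /* bytes not yet handed to the stack */
};

/* Hypothetical hook standing in for the stack's write call.
 * Returns 0 on success, nonzero if the stack refused the data. */
typedef int (*write_fn)(void *ctx, const char *data, size_t len);

/* Hand over as much as fits in `room` bytes. Call once to start, then
 * again from the "sent" callback until st->left reaches 0.
 * Returns the number of bytes queued by this call. */
size_t send_more(struct send_state *st, size_t room, write_fn write, void *ctx)
{
    assert(st != NULL && write != NULL);

    size_t len = st->left < room ? st->left : room;
    if (len == 0)
        return 0;              /* nothing left, or no room right now */

    if (write(ctx, st->data, len) != 0)
        return 0;              /* refused: try again on a later callback */

    st->data += len;
    st->left -= len;
    return len;
}

/* Example sink for illustration: just counts the bytes it is given. */
int count_bytes(void *ctx, const char *data, size_t len)
{
    (void)data;
    *(size_t *)ctx += len;
    return 0;
}
```

In a real handler, `room` would presumably come from tcp_sndbuf(pcb) and the hook would wrap tcp_write; that wiring is an assumption here. The point is that the caller never decides how much goes out per call, which is why the continuation has to live in the sent callback.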

 

 

--- Quote Start ---  

 

P.S. When I tried to find the problem, I found that in the httpd.c file, in httpd_init, there is some error (maybe it is not an error and it is supposed to be like this, but I think there is an error). 

In the line pcb = tcp_listen(pcb); the tcp_listen function returns NULL, and I think it must return something different. 

 

--- Quote End ---  

Maybe you need to change LWIPOPTS.h to allow for more TCP connections or PCBs? It is an error to return NULL and you need to debug why but my guess is not enough resources for your call. 

 

Bill A
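As a starting point for that check, these are the lwIP pool-size options that bound how many TCP PCBs and segments exist. The option names are lwIP's own; the values below are illustrative guesses for an application with two listening ports, not settings taken from the example project.

```c
/* lwipopts.h fragment (values are assumptions; tune for your memory budget) */

/* Listening PCBs: one is consumed per port passed to tcp_listen(). */
#define MEMP_NUM_TCP_PCB_LISTEN  2

/* Active connection PCBs: one per simultaneously open TCP connection. */
#define MEMP_NUM_TCP_PCB         4

/* Queued TCP segments shared by all connections. */
#define MEMP_NUM_TCP_SEG         16
```

If tcp_listen still returns NULL with enough listening PCBs configured, it is worth checking that the earlier tcp_new call did not already return NULL.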
Altera_Forum

Actually, when I connect my board (at this stage I am using a Cyclone III Development Kit board), I do not see it in the network sniffer, not even at the ARP level.

Altera_Forum

Do you know the hardware is OK? Can you build simple_socket_server with Interniche and be sure the board talks and connects? Several people have reported that this example works as-is, so if it doesn't, I wouldn't be looking into lwIP or this example but deeper into the hardware (PHY, MAC, SGDMA, SOPC configuration, etc...). 

 

Bill
Altera_Forum

I am using an Altera evaluation board, so I think the HW is OK.

Altera_Forum

I'm not sure what to tell you. You have the hardware and 2 examples that should work (lwIP and Interniche). lwIP doesn't work. Only you can tell if Interniche does or doesn't work. If it does work, then we know to look for a difference in the interface on this board to the TSE driver. If Interniche doesn't work, there is no point looking at lwIP since it won't work either. They use the same TSE/SGDMA driver code and I know the lwIP specific code is good - others have confirmed this (plus, I use a derivative of it in shipping products). 

 

Bill
Altera_Forum

And what about the MAC address? 

Must I enter the CORRECT MAC address, or is it enough to enter a MAC address that is different from the other MAC addresses in my subnet?
Altera_Forum

You can use the one that the dev kit will use when running e.g. the Interniche examples. If it doesn't come with a MAC address, you will have to use a temporary one but be sure to stay behind a router or directly connected to a PC. I use a real test MAC address from our block of addresses that I know is not in use locally. 

 

Bill
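If you do have to make up a temporary address, one common convention (not something the dev kit or this example enforces) is to mark it as locally administered: set bit 1 of the first octet and keep bit 0, the multicast bit, clear. A small sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the address is unicast and flagged as locally administered. */
bool mac_is_local_unicast(const uint8_t mac[6])
{
    assert(mac != NULL);
    return (mac[0] & 0x01u) == 0 && (mac[0] & 0x02u) != 0;
}

/* Force the two flag bits of the first octet to "local, unicast",
 * leaving every other bit as the caller chose it. */
void mac_make_local_unicast(uint8_t mac[6])
{
    assert(mac != NULL);
    mac[0] = (uint8_t)((mac[0] | 0x02u) & ~0x01u);
}
```

This only keeps a made-up address out of the vendor-assigned space. It does not guarantee uniqueness, so the advice about staying behind a router or on a direct link still applies.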
Altera_Forum

Bill! 

 

Could you give some hints on how to improve throughput in an lwIP-based solution? 

The main task of my application is transmitting a constant-rate bitstream (ADC data) to a remote computer. The expected payload is about 100 Mbit/sec, so, to have some margin, I'd like to achieve 150 Mbit/sec UDP throughput. 

For experimenting I use the lwIP_Example you kindly posted here, plus a routine I added for generating UDP traffic (see udp_test.c attached; udp_test_init() is called just after the IP address is assigned by DHCP, and udp_test() is called repeatedly from the lwIP loop in main()). I measure throughput with the 'bm' utility from Altera's "Nios II Ethernet Acceleration Design Example". At the moment I get 55 Mbit/sec. 

What improvements can be made on the software side? 

1) Switching from -O0 to -O3 improved the bitrate from 11.5M to 55M (almost 5 times). What about other compiler options? 

2) I see that the lwIP code contains provisions for linking several packets into a chain. Can I benefit from this feature, and how do I use it? 

3) What are the most performance-critical parts of lwIP that can be disabled? 

4) I have a point-to-point link and can assume that the receiver's IP and MAC are known in advance. Is it possible to save on routing? 

5) Any other ideas? 

 

Thank you in advance, 

Igor
Altera_Forum

5) To get higher UDP bandwidth, you can also generate the UDP packets in hardware, as shown in this example: http://www.nioswiki.com/exampledesigns/nios2udpoffloadexample

Altera_Forum

Igor, 

 

This question comes up a lot on the lwIP forums. Unfortunately it is often answered with "lwIP is lightweight and performance is secondary", and IMO the performance part of lwIP is neglected. It could perform *way* better. 

 

However, I spent months optimizing lwIP, Altera drivers and my code to get the performance we require for the 100MHz NIOS II we're running. Without my effort our product line for this would have been canned. Out of the box performance is poor with the Altera hardware and lwIP. 

 

I will try to list these in order of importance:
  1. Optimize -O3 as you saw helps a lot. As difficult as it is, I debug this way. For better debugging and not a huge hit in performance, use -O1. 

  2. Use the inline IP header checksum in lwIP (I contributed this by the way). It helps a lot. 

  3. Do the UDP/TCP checksum in Verilog/VHDL (in hardware). If you can't, use assembly code for inet_chksum. If you can't, use the optimized (option 3) C inet_chksum. Or, simply disable UDP checksum in LWIPOPTS.h. UDP tends to drop packets, not change bytes in packets. Running with UDP checksumming disabled will not be an issue normally. 

  4. Replace SMEMCPY with an efficient inline memory copy. 

  5. Do the following code/data relocations:
     - Put inet_chksum in onchip RAM (if you use it) 
     - Put ethernetif in onchip RAM 
     - Put ethernetif->lwipRxPbuf in onchip RAM 
     - Put tse in onchip RAM 
     - Put tse_mac_device in onchip RAM 
     - Use separate memory pools and put PBUF_POOL in onchip RAM 
     - Put netif_list and netif_default in onchip RAM 
     - Put pbuf_header in onchip RAM 
     - Put lwip_stats in onchip RAM 
     - Put arp_table in onchip RAM 
     - Put find_entry in onchip RAM 
     - Put etharp_send_ip in onchip RAM 
     - Put etharp_find_addr in onchip RAM 
     - Put etharp_output in onchip RAM 
     - Put etharp_query in onchip RAM

  6. Use udp_sendto_if to send UDP packets. 

  7. Replace memcpy with a more efficient memcpy than Altera's. 

  8. Remove the memory copy for unaligned transfers in lwip_tse_mac.c. 

  9. Use chained SGDMA transfers. 

  10. Don't wait for a packet to be sent - use pbuf reference counts and delete the previously sent pbuf on the next pbuf send. This removes the wait for completion of each packet sent. 

  11. Rewrite/refactor the SGDMA driver - it's very inefficient. 

  12. Rewrite/refactor the TSE driver - it's very inefficient.
You can do only some of these things if you want lower yet still much better performance. If you do 1 through 6, I believe that you will meet your goals. I did all of the above (UDP checksumming off) and have over 800Mb UDP output. 
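Items 4 and 5 above can be illustrated together. The sketch below shows one way to write a small word-at-a-time copy and to tag a function for placement in a fast memory using a GCC section attribute. The section name `.onchip_ram.txt` and the `FAST_CODE` macro are assumptions (the real name depends on what the on-chip memory is called in your system and linker script), the attributes are GCC extensions, and this is not the copy routine used in the product described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placement macro: only active when building for Nios II, so the sketch
 * still compiles on a host. ".onchip_ram.txt" is a placeholder name. */
#if defined(__NIOS2__)
#define FAST_CODE __attribute__((section(".onchip_ram.txt")))
#else
#define FAST_CODE
#endif

/* 32-bit word type that is allowed to alias other types (GCC extension). */
typedef uint32_t __attribute__((may_alias)) word_t;

/* Copy len bytes. Uses 32-bit moves when both pointers are word aligned,
 * then finishes (or falls back entirely) with a byte loop. */
FAST_CODE void copy_fast(void *dst, const void *src, size_t len)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    assert((dst != NULL && src != NULL) || len == 0);

    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;
        for (; len >= 4; len -= 4)
            *dw++ = *sw++;
        d = (uint8_t *)dw;
        s = (const uint8_t *)sw;
    }
    while (len--)
        *d++ = *s++;
}
```

Whether a routine like this beats the library memcpy on a given system is something to measure rather than assume. The same placement idea applies to the data items in the list, using a data section name instead of a code one.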

 

This might seem overwhelming, but as I said I had to preserve a line of products we develop. 

 

Bill
Altera_Forum

For the inet checksum there has to be a significant gain in defining a custom instruction that adds the two 16-bit halves of a 32 bit word together, then adds that 17 bit result to a 32 bit result. 

 

A second version that includes a 'rotate by 8 bits' for misaligned data would also help. 

 

These would speed up a C/asm version without adding the full complexity of a full VHDL version.
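As a rough software model of that idea: the step function below is what the proposed custom instruction would compute, and the rest is the usual Internet checksum wrapped around it. The function names are hypothetical, the bytes are assembled in network order so the result does not depend on host endianness, and the rotate variant for misaligned data is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the custom instruction: acc += high half + low half of word. */
static inline uint32_t chksum_step(uint32_t acc, uint32_t word)
{
    return acc + (word >> 16) + (word & 0xFFFFu);
}

/* Fold the carries above bit 15 back in until the sum fits in 16 bits. */
static inline uint16_t chksum_fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xFFFFu) + (acc >> 16);
    return (uint16_t)acc;
}

/* Internet checksum (ones' complement of the ones' complement sum) over
 * len bytes. A trailing partial word is padded with zero bytes. The
 * 32-bit accumulator cannot overflow for any Ethernet-sized buffer. */
uint16_t inet_chksum_model(const uint8_t *data, size_t len)
{
    uint32_t acc = 0;
    size_t i = 0;

    assert(data != NULL || len == 0);

    for (; i + 4 <= len; i += 4) {
        uint32_t word = ((uint32_t)data[i] << 24) |
                        ((uint32_t)data[i + 1] << 16) |
                        ((uint32_t)data[i + 2] << 8) |
                        (uint32_t)data[i + 3];
        acc = chksum_step(acc, word);
    }
    if (i < len) {                       /* 1 to 3 leftover bytes */
        uint32_t word = 0;
        for (unsigned shift = 24; i < len; i++, shift -= 8)
            word |= (uint32_t)data[i] << shift;
        acc = chksum_step(acc, word);
    }
    return (uint16_t)~chksum_fold(acc);
}
```

On the target, the inner loop would read aligned 32-bit words directly and issue the custom instruction once per word, so the per-word cost drops to a load plus one opcode.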
Altera_Forum

Yes, true, dsl - we also did a custom opcode for htonl and ntohl.

Altera_Forum

I've a custom opcode that uses the 'B' register number to select between: 

- 32 bit byte reverse 

- 16 bit byte reverse 

- 32 bit bit reverse 

- 16 bit bit reverse 

- 8 bit bit reverse 

It should be possible to make the supported transformations configurable in the sopc builder - to save fpga real estate, but we aren't that short of gates. 

I also did G.711 a-law <=> u-law in combinatorial logic, but they take about 12ns :-(
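A plain-C reference model of such a multi-function opcode can be handy for checking the hardware against. The sketch below assumes a selector value picks the operation, as described above. The enum names and their numbering are invented here (the actual 'B' register encoding is not given), and on the target the function body would be replaced by the custom instruction itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical selector values, one per supported transformation. */
enum rev_op { REV_BYTES32, REV_BYTES16, REV_BITS32, REV_BITS16, REV_BITS8 };

/* Reverse the low `width` bits of x (width is 8, 16 or 32). */
static uint32_t reverse_bits(uint32_t x, unsigned width)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < width; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Software model of the opcode: apply the selected reversal to x. */
uint32_t rev_model(enum rev_op op, uint32_t x)
{
    switch (op) {
    case REV_BYTES32:
        return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
               ((x << 8) & 0x00FF0000u) | (x << 24);
    case REV_BYTES16:
        return ((x >> 8) & 0x00FFu) | ((x << 8) & 0xFF00u);
    case REV_BITS32:
        return reverse_bits(x, 32);
    case REV_BITS16:
        return reverse_bits(x, 16);
    case REV_BITS8:
        return reverse_bits(x, 8);
    }
    assert(!"unknown selector");
    return 0;
}
```

The 32-bit and 16-bit byte reversals are the operations behind htonl/ntohl and htons/ntohs on a little-endian core such as Nios II.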
Altera_Forum

Bill! 

 

So far I have tried only options 2 and 3, and I am really amazed! After I upgraded lwIP_Example to lwIP-1.4.0rc1 (with LWIP_INLINE_IP_CHKSUM defined) the benchmark improved from 55M to 95M. And after I also disabled the UDP checksum I got 310M!!! 

 

Daixiwen! 

 

Very interesting link! I hadn't thought about moving the main data stream away from the Nios internals before, but I will surely try it now. 

 

Many thanks to all for hints and suggestions! 

Igor
Altera_Forum

Hi BillA and guys. I am still having trouble. 

It still does not work. 

As I understand it, when I register http_accept as the accept callback (as you did) and set a breakpoint inside http_accept, then after an external TCP client (a PC program) connects to the board, the callback should run and I should stop at that breakpoint. But this does not happen. In subsequent actions, too, the program does not stop at breakpoints in the callback functions. What could be the reason? 

 

Thanks,Slava.