We are currently working with two motherboards, both of which have dual GigE 82574L based NICs. The 82574L, with current drivers, appears to be losing or mangling data when multicast traffic above a very modest level is present.
The boards are a dual Xeon system on a Supermicro X8DTL-i motherboard and an Atom D510 based system on a Supermicro X7SPA-H motherboard.
The Xeon based system is a stock 32-bit Ubuntu 10.04 install:
Kernel: Linux ubuntu 2.6.32-24-generic #39-Ubuntu SMP Wed Jul 28 06:07:29 UTC 2010 i686 GNU/Linux
e1000e: 1.2.10-NAPI (direct download from intel support site)
The Atom based system is a custom 32-bit kernel.org install:
Kernel: 2.6.31-12.xxxx32 #5 SMP Wed Jul 21 13:58:54 PDT 2010 i686 GNU/Linux
The systems are purpose built for handling video flows. The Xeon system, at the moment, does have a stock Ubuntu desktop installed in that particular image. The Atom based system has virtually no processes running, as it runs an embedded OS build with no desktop whatsoever. Either way, load and memory are not an issue; the relative footprints are tiny on both systems.
We have confirmed that ASPM is disabled in the BIOS and in the kernel on both systems, as mentioned on the SourceForge forum. We have upgraded the BIOS on the Xeon based system to the latest version from Supermicro, and it made no difference. The Atom based board already has the current BIOS.
Both systems are exhibiting similar issues - they start losing data under even modest multicast transmit load.
On the same VLAN is an MPEG-2 video encoder that outputs streams (2x) at ~5 Mbps and also HD streams at ~19 Mbps. The encoder generates the traffic without error, and it can be received and processed without errors at various XP and Vista desktops running professional analyzer software when multicast directly from the encoder. However, if video streams are processed on either 82574L GigE controller based machine, or even if known "good" streams are relayed (multicast) through the 82574 based machines, the streams become impaired and data does not arrive at the end point. Data flow does not stop; data is simply missing. However, ethtool reports no errors or lost data on the transmit side.
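The ethtool check described above can be scripted so that error, drop, and flow-control counters are all reviewed together. The following is only a minimal sketch: it assumes the usual "name: value" layout printed by `ethtool -S <interface>`, the helper names are made up for illustration, and the keyword list is a guess, since counter names differ from driver to driver.

```python
def parse_ethtool_stats(text):
    """Parse 'name: value' lines (as printed by `ethtool -S`) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and any non-numeric lines.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def nonzero_matching(stats, keywords=("xoff", "xon", "drop", "err", "missed")):
    """Return counters that are non-zero and whose name contains a keyword.

    The keyword list is an assumption; adjust it to the names your driver exposes.
    """
    return {
        name: value
        for name, value in stats.items()
        if value and any(word in name for word in keywords)
    }
```

In practice the text would come from something like `subprocess.run(["ethtool", "-S", "eth0"], capture_output=True, text=True).stdout`, where `eth0` is a placeholder interface name. Comparing two snapshots taken a few seconds apart shows which counters are actually moving.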
The problem appears to be directly related to sending multicast traffic. It is possible to receive and relay a single multicast stream (~5 Mbps), multicast in and multicast out. If, however, a second stream of either type (unicast or multicast UDP) is added to the output, the second and subsequent streams are impaired (data is missing). Two instances of a different but similar application, each running its own stream in its own process, exhibit the same impairment. The streams are UDP multicast, as is common in IPTV networks. The same stream or streams can play normally at over 40 Mbps of simultaneous video, on the same switched network, without any issue when 82574 based devices are not involved.
It is possible to receive a single multicast on an 82574 based machine and output it to a number of unicast addresses without error. We have successfully relayed a single input stream to five simultaneous unicast outputs (directed at an analyzer) and all five unicast streams are clean at the analyzer, with no missing data. However, if we send one multicast output and one unicast output, the unicast output (if it is the second output generated) is missing data when it arrives at the analyzer. Likewise, dual multicast output results in the second stream also being impaired at source and destination.
When this test is performed using the Xeon based board, the errors in the stream are also reported on the console by our relay application. The identical test, performed on the Atom based system, does NOT report the stream errors at the console, but they are still detected at the receiving analyzer station.
There appears to be a material problem with multicast handling on the 82574.
REQUEST FOR HELP
Our initial day of testing leads us to believe that there is a material problem with the implementation of multicast support in the current 1.2.10 and earlier versions of the e1000e driver. We are happy to test new versions and provide feedback where possible, as this is a major blocking issue for our commercial projects.
Can Intel or any third party provide any insight into possible solutions or workarounds for this issue? We have some experience with driver development and can engage in discussions at the driver level.
This issue has been resolved. Multicast appears to function correctly on the 82574 and performs well in a stable test environment. Wireshark for packet sniffing, and a line by line review of switch configurations and test program parameters, helped resolve the issue.
The cause of our issues appears to be a misconfiguration in a VLAN combined with the lack of a defined output interface for the second output multicast stream. As a result, multicast flooded on to the admin network and triggered an XOFF flow control broadcast storm that made the network temporarily unusable.
Once the VLAN was reconfigured and the streams were kept on the video network where they belong, everything started working normally.
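For anyone chasing a similar problem, here is a rough sketch of how an output interface can be pinned for a multicast sender using the standard `IP_MULTICAST_IF` socket option. This is not our relay application's code. The function names and all addresses are hypothetical, and it assumes an IPv4 sender on Linux.

```python
import socket


def make_multicast_sender(local_if_ip, ttl=1):
    """Create a UDP socket whose multicast output is pinned to one local interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Without IP_MULTICAST_IF the kernel chooses the egress interface from the
    # routing table, which may not be the intended (video) network.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                    socket.inet_aton(local_if_ip))
    # Keep the TTL small so a misrouted stream does not travel far.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def multicast_interface(sock):
    """Read back the interface address currently used for multicast output."""
    raw = sock.getsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, 4)
    return socket.inet_ntoa(raw)
```

A sender would then do something like `make_multicast_sender("10.0.1.5").sendto(payload, ("239.1.1.1", 5004))`, where the interface address, group, and port are placeholders. Each output stream gets its own socket with the interface set explicitly, so a second stream cannot silently fall back to the default route.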
Not sure if your solution can resolve my issue which I have just posted here (last reply):
I am not using multicast, but I simply see any network activity, be it NFS or scp, quickly hang. Surprisingly, the node has no problems with outward bound NFS requests.
What were you looking for on Wireshark?