The issue is with the onboard Intel NICs, which keep dropping multicast packets:
NIC Name: Intel 82574L GigE Eth Controller
Driver Date: 9/29/2010
Driver Version: 220.127.116.11
BIOS Rev: 2.0a
Intel Westmere CPUs: X5680 @ 3.33GHz (6 cores each, 2 CPUs installed for a total of 12 cores).
Operating System Details:
Windows 7 (64bit)
Memory: 8GB Reg. ECC DDR3
We have a network where multicast senders are connected to a D-Link Web Smart DGS-1224T switch. Our Supermicro unit with the above NIC is connected to the same switch. The unit acts as a consumer of multicast packets, e.g. a VLC player running on the unit.
We see packet losses on a multicast session under the following conditions:
1. Start 2 streams (5 Mbps each) from the sender to multicast addresses such as 18.104.22.168:1231 & 22.214.171.124:1232
2. Start a consumer (e.g. VLC player) instance on the unit in question and tune it to the multicast 126.96.36.199:1231. This session will experience a lot of packet losses.
3. Start another consumer (e.g. VLC player) instance on the unit and tune it to the other multicast 188.8.131.52:1232. Now the packet losses on the first session (184.108.40.206) disappear and the stream runs fine; in fact, both streams run fine without any packet losses.
4. Stop either consumer and the remaining running consumer starts to experience packet losses again.
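For anyone who wants to reproduce the steps above without VLC, here is a minimal Python sketch of a consumer that joins one multicast group and counts gaps. It assumes the streams are MPEG-TS over UDP, which the thread does not state, and uses the transport stream continuity counter as the loss indicator. The function names are made up for illustration. Repeated packets and the discontinuity flag are not handled, so treat the count as a rough indicator only.

```python
import socket
import time


def make_membership_request(group: str, iface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq value used with IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def count_continuity_errors(datagram: bytes, last_cc: dict) -> int:
    """Count MPEG-TS continuity-counter gaps in one UDP datagram.

    Assumption: the datagram holds whole 188-byte TS packets.
    last_cc maps PID -> last counter seen and is updated in place.
    """
    errors = 0
    for off in range(0, len(datagram) - 187, 188):
        pkt = datagram[off:off + 188]
        if pkt[0] != 0x47:          # not a TS sync byte, skip
            continue
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        has_payload = bool(pkt[3] & 0x10)
        cc = pkt[3] & 0x0F
        if pid == 0x1FFF or not has_payload:
            continue                # counter only advances on payload packets
        prev = last_cc.get(pid)
        if prev is not None and cc != (prev + 1) & 0x0F:
            errors += 1
        last_cc[pid] = cc
    return errors


def receive(group: str, port: int, seconds: float = 10.0) -> int:
    """Join one group, read datagrams for `seconds`, return the gap count."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_membership_request(group))
    sock.settimeout(1.0)
    last_cc, errors = {}, 0
    deadline = time.monotonic() + seconds
    try:
        while time.monotonic() < deadline:
            try:
                data = sock.recv(65535)
            except socket.timeout:
                continue
            errors += count_continuity_errors(data, last_cc)
    finally:
        sock.close()
    return errors

# Example call (239.1.1.1 is a placeholder group, not one from this thread):
# print(receive("239.1.1.1", 1231))
```

Running one instance per stream, then stopping one, should mirror steps 2 to 4 if the behavior is independent of the player.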
Extensive debugging on our end indicates that if there are "m" multicast streams on the network, and the switch is not doing true multicast but is just flooding all packets to all ports, then to get loss-free multicast streams on the unit we have to run _exactly_ "m" consumers on the unit. This is very strange behavior: the NIC is a gigabit Ethernet card and the total multicast traffic is only 10 Mbps, so it should be able to handle all the packets sent to it without any errors.
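To put the load in numbers, here is a rough Python estimate. The 1316-byte payload (seven 188-byte TS packets per datagram) is an assumption, a common choice for MPEG-TS over UDP but not something measured here, and header overhead is ignored.

```python
def stream_load(stream_mbps: float, streams: int,
                link_mbps: float = 1000.0, payload_bytes: int = 1316) -> dict:
    """Estimate link utilization and datagram rate for flooded streams.

    payload_bytes is an assumed UDP payload size; headers are ignored.
    """
    total_bps = stream_mbps * streams * 1e6
    return {
        "utilization": total_bps / (link_mbps * 1e6),
        "packets_per_second": total_bps / (payload_bytes * 8),
    }


load = stream_load(5.0, 2)
print(f"{load['utilization']:.1%} of the link, "
      f"{load['packets_per_second']:.0f} datagrams/s")
```

For two 5 Mbps streams this comes to about 1% of a gigabit link and a little under a thousand datagrams per second, which is far below what the link rate alone would limit.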
When we added another NIC (a Broadcom gigabit NIC) to the unit (by removing the graphics card and using that PCIe slot), we did not see this behavior. That NIC was able to handle the traffic flood without any problem, and the consumers on the unit did not encounter any multicast packet losses.
We also noticed that if we set the speed and duplex settings on the NIC and on the switch port to "100 Mbps Full Duplex", the packet drops are not observed. However, we need to use 1 Gbps Full Duplex on the NIC for our product.
Another observation with respect to the Intel onboard NIC is that if the multicasts are set to 220.127.116.11:1231 & 18.104.22.168:1232 (same multicast destination IP, but different ports), then the multicast sessions work just fine without any packet losses.
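One background fact that may be relevant to the same-IP, different-port observation: a NIC's hardware multicast filter works on the destination MAC address, and for IPv4 that MAC is derived only from the low 23 bits of the group address (RFC 1112). The port plays no part in it. Whether this filter is actually involved in the drops is an assumption, not something established in this thread. The sketch below shows the mapping. The addresses are placeholders, not the ones used in the test.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet destination MAC.

    The MAC is 01:00:5e followed by the low 23 bits of the group address.
    """
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF
    mac = (0x01005E << 24) | low23
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


# Two streams to the same group share one MAC regardless of port,
# while different groups normally need separate filter entries.
for g in ("239.1.1.1", "239.1.1.2"):
    print(g, multicast_mac(g))
```

Because only 23 bits are used, 32 different group addresses map to each MAC, so two distinct groups can also share one filter entry.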
Please let us know how to fix this issue.
PS: Even though I have mentioned above that the switch is a D-Link switch, this issue was also reproduced on Alcatel-Lucent and Cisco switches at one of our client sites.
We are not seeing the issue you described, but then we are not testing on the exact same system you are using. In fact, our test system does not have nearly as powerful processing capabilities as yours. Maybe the load on the core doing the processing is so small that the core is going into some lower power state. Therefore, I would like you to try an experiment to collect more information.
Just as an experiment, try disabling C-States on your system. Even if this makes the problem go away, I am not suggesting this as a fix. This is an experiment to help collect more data. The SuperMicro system we used has this BIOS setting on the Advanced tab, Processor & Clock Options, Intel (R) C-STATE tech; however, your system might have a different setting name.
You could also experiment with driver settings to find out if you get better performance for this application. For example, if you disable interrupt moderation or set the interrupt moderation rate to minimal or low, does the dropped packet issue disappear?
Let us know what you find.
Sorry for the late reply. The only unit in our office had to be shipped back to SuperMicro due to some issues with it. We got a replacement unit just a day ago. However, this new unit is behaving differently. The only difference on the new unit is that the BIOS rev has changed from 2.0a to 2.0b.
Now I see the following behavior:
1. Suppose there are 3 multicast video streams continuously flowing in the network.
2. Previously, if we join only one stream, we used to see packet losses. Now we do not see those packet drops.
3. Now, I join one more stream from the available 3 streams. I do not see any packet drops in either player session.
4. If I stop either player, I see packet drops for about 10-11 seconds on the remaining stream, and then the stream shows no more packet drops. This is the changed behavior: before the update to BIOS rev 2.0b, we used to see continuous packet drops (whenever the number of player sessions was less than the number of multicast streams).
Also, I tried disabling the C-States in the BIOS as well as changing interrupt moderation (disabled/low/moderate) as you asked, without any luck.
Your experience with the BIOS update causing a change in your results is not uncommon. One of the other engineers here told me that the latest BIOS version for your motherboard is version 2.0c. I recommend upgrading your BIOS to the latest version as the next step.
We have not been able to see any dropped packets when we test here. However, we want to make sure that we do not have a driver problem, so we are not giving up. I will send a private message with a list of questions that we hope will help us figure out the cause and resolution.
We are experiencing the exact same problem on Linux, also with the Intel 82574L. We tried different PCI adapters with the same chipset, and different drivers, and the results are still bad. In our system we are kind of stuck because the chipset is on the motherboard anyway.
Was this issue ever resolved? Is this a hardware problem with the chipset? I would love to hear back with a solution.
Thanks in advance!
We made a change in the Windows driver for this controller in July 2011 that helped in some multicast environments with multiple multicast streams. (See the version 16.4 changes at http://www.intel.com/support/network/sb/CS-006333.htm.)
I am not sure if there were any corresponding changes to the Linux driver since then or if any changes were needed for the Linux driver.
I only know of one report for issues with multicast in Linux with this controller, and that turned out to be a configuration issue. Once the network configuration issues were taken care of, the issue was resolved. You can see what was posted at http://communities.intel.com/message/101307#101307.
I sent an inquiry to the Linux driver developer, but I will be surprised if there are any outstanding issues being worked on like the one covered in this thread.
Are you using the latest e1000e driver and a recent stable kernel?
If you could, please share some details about your setup, configuration, distro, kernel, hardware, exact adapters used, etc. Maybe someone will spot something in the details that will help.
I just found out that the Linux driver versions 1.4.1 and later had an update for the same issue as the July Windows driver update. So if this is a driver bug, we will have to troubleshoot it from scratch. I will send you my email address in a private message, so I can get the information needed to replicate what you are seeing.
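A quick way to check whether a Linux box is already on a driver at or above 1.4.1 is to read the output of `ethtool -i <interface>`. The Python sketch below parses that output. The helper names are made up for illustration, the interface name is a placeholder, and version string formats vary between in-kernel and out-of-tree builds (some kernels report the kernel release instead), so treat the comparison as a rough check only.

```python
import subprocess


def parse_ethtool_info(text: str) -> dict:
    """Parse 'key: value' lines as printed by `ethtool -i`."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")   # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


def version_tuple(version: str) -> tuple:
    """Turn a string such as '1.4.4-k' into (1, 4, 4); stops at non-digits."""
    parts = []
    for piece in version.split("-")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_multicast_update(info: dict, minimum: tuple = (1, 4, 1)) -> bool:
    """True if the driver is e1000e and its version is at least `minimum`."""
    return (info.get("driver") == "e1000e"
            and version_tuple(info.get("version", "")) >= minimum)


def read_ethtool_info(iface: str) -> dict:
    """Run `ethtool -i` for one interface (needs ethtool installed)."""
    out = subprocess.run(["ethtool", "-i", iface],
                         capture_output=True, text=True, check=True).stdout
    return parse_ethtool_info(out)

# Example call (eth0 is a placeholder interface name):
# print(has_multicast_update(read_ethtool_info("eth0")))
```

Posting the raw `ethtool -i` output along with the kernel version would also cover most of the driver details asked for above.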