I am looking for help with a tricky, intermittent problem. I am getting random receiver hangs on PRO 1000 GE ports configured in 100Mbs / Full Duplex mode under Windows 2003 Server. The link to the switch stays up, but the device stops receiving packets at the server. It comes back to life after 30 or 40 minutes if we wait it out. I am not sure it would always recover on its own, but on several occasions it has.
Ports hang a couple times a day, more on some machines than others. I have 6 servers running in this solution. Two machines hang more often than the others. We have moved from the motherboard NICs to the Pro1000 NIC but that is also showing the problem and at about the same rate.
The symptom is that the devices configured in 100Mbs / full duplex stop receiving packets. If I disable and enable the device under the Windows 2003 Server Control Panel / Network Connection Tool the device starts working again.
We have been running the ProSet Drivers (Wx64 DRV - 10.3.49.2 shp 200,800 10-17-2008 e1q51x64.sys) on other platforms without this issue. I know the drivers are older but I would like to solve the issue before I just start swapping out SW and/or HW.
Traffic on the 100mbs link is very light. It is the control / management connection to the system.
I have a high performance Supermicro Seaburg server platform that has four Intel devices. Two 82576EB on a Pro1000 PCI-E card and two 82575EB devices on the motherboard. Three devices are configured for 100mbs/full duplex and one (PCI-E Card) is 1Gbe.
The 1Gbe port is receiving 300-600 Mb/s under constant load, with a light transmit load.
We have written a watchdog script to ping and reset the device when it stops for now.
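For anyone who wants to try the same workaround, here is a minimal sketch of such a watchdog in Python. It is not our actual script. The names `ping_once`, `Watchdog` and `run`, the three-failure threshold, and the reset command are all assumptions made for illustration. The reset command is passed in as a parameter because how you disable and re-enable the adapter depends on your tooling.

```python
import subprocess
import sys
import time


def ping_once(host, timeout_ms=1000):
    """Send one ICMP echo request; True if the ping command exits with 0.

    Caveat: the exit code of ping is only a rough health signal.
    """
    if sys.platform.startswith("win"):
        cmd = ["ping", "-n", "1", "-w", str(timeout_ms), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(max(1, timeout_ms // 1000)), host]
    return subprocess.call(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ) == 0


class Watchdog:
    """Call reset() after max_failures consecutive failed probes."""

    def __init__(self, probe, reset, max_failures=3):
        self.probe = probe              # callable returning True when healthy
        self.reset = reset              # callable that resets the adapter
        self.max_failures = max_failures
        self.failures = 0               # consecutive failed probes so far
        self.resets = 0                 # resets issued so far

    def tick(self):
        """Run one probe and return "ok", "fail" or "reset"."""
        if self.probe():
            self.failures = 0
            return "ok"
        self.failures += 1
        if self.failures < self.max_failures:
            return "fail"
        self.reset()
        self.failures = 0
        self.resets += 1
        return "reset"


def run(host, reset_cmd, interval_s=3.0, max_failures=3):
    """Probe host forever; reset_cmd is a site-specific argv list (assumption)."""
    dog = Watchdog(
        probe=lambda: ping_once(host),
        reset=lambda: subprocess.call(reset_cmd),
        max_failures=max_failures,
    )
    while True:
        if dog.tick() == "reset":
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "receiver reset")
        time.sleep(interval_s)
```

A call might look like `run("192.0.2.10", ["devcon", "restart", "<hardware-id>"])`, where both the address and the `devcon` arguments are placeholders. Requiring several consecutive failures before resetting avoids bouncing the port on a single dropped ping.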
Does anyone know what is causing the 100Mbs device to stop receiving packets?
I suspect it has something to do with interrupt or receive ring processing, or with the high traffic on the 1Gbe link.
Thanks in advance,
Switch - Catalyst 2950 switch (IOS 12.1(9) EA1)
SUPERMICRO X7DWU Motherboard, BIOS: Phoenix Technologies LTD 1.2 11/04/2008 BMC:60 OS: Windows 2003 64b
Four NICs - 2 on Motherboard 82575EB and 2 82576EB on dual port Pro1000
Onboard = Intel 82575EB Gigabit Card
Dualport Card = Intel Gigabit ET Dual Port Server Adapter
Main driver is the E1Q51X64.sys, version 10.3.49.2
Gigabit Master Slave Mode = Auto Detect
Jumbo Packet = Disabled
Locally Administered Address = (blank)
Log Link State Event = Enabled
Adaptive Interframe Spacing = Disabled
Flow Control = Rx and Tx Enabled
Interrupt Moderation Rate = Adaptive
Low Latency Interrupts = Nothing configured
Receive Buffers = 256
Transmit Buffers = 512
Priority and VLAN = Priority and VLAN Enabled
Receive Side Scaling = Enabled
Receive Side Scaling Queues = 1 Queue
TCP/IP Offloading Options:
IPv4 Checksum Offload = Checked/Enabled
TCP Checksum Offload (IPv4) = Checked/Enabled
UDP Checksum Offload (IPv4) = Checked/Enabled
Offload TCP Segmentation = Checked/Enabled
Wait for Link = Auto Detect
C:\Temp>filever /v \winnt\system32\drivers\e1q51x64.sys
--a-- Wx64 DRV - 10.3.49.2 shp 200,800 10-17-2008 e1q51x64.sys
Language 0x0000 (Language Neutral)
CharSet 0x04b0 Unicode
CompanyName Intel Corporation
FileDescription Intel(R) Gigabit Adapter NDIS 5.x driver
ProductName Intel(R) Gigabit Adapter
FileVersion 10.3.49.2 built by: WinDDK
LegalCopyright Copyright(C) 2008, Intel Corporation. All rights reserved.
Thanks for the suggestions.
I don't think it's a flow control issue. The networking guy on-site said the Catalyst 2950 switch was still sending packets to the Intel NIC.
Interestingly enough, we put a keep-alive script in place to detect the failure on one system yesterday. It pings a neighbor every three seconds. That system has not failed, but the receivers on 3 of the other systems stopped multiple times last night. All systems are running the keep-alive script now.
I am testing the ProSet 15.2 drivers now.
I have a similar problem with an Intel Gigabit CT Desktop Adapter on Windows XP Professional Service Pack 3. After some hours, or days, we lose the network (no reception). The only solution is to manually disable and enable the network interface. Since we have not found a real solution, we have also written a small script like yours. Our switches are HP ones.
The e1q drivers were updated in the recently posted version 15.5 software package. You might want to give them a try.
Drivers for Windows XP* are here: http://downloadcenter.intel.com/detail_desc.aspx?agr=Y&DwnldID=18717
Hi Mark -
I work with the original poster of this message and there is some curiosity on whether your mention of the new drivers implies acknowledgement of a correction to the issue listed or if the recommended action is to install the latest drivers in hopes that it will fix the issue. Could you let us know which it is?
Thanks in advance!
This suggestion falls in the category of "install the latest drivers in hopes that it will fix the issue." I suggested updating the driver because the e1q driver used by both your onboard network connections and the adapter was updated in this release.
Also, I think sage99's suggestions have some merit even if flow control is not the issue. Turning off some features that are not needed might cause the issue to go away. However, since you have a workaround in place that is working for you, I can understand your reluctance to try other workarounds.
Besides the driver, the other thing the affected machines have in common is the rest of the system components. I noticed your BIOS version is 1.2. I see that the version posted by Supermicro is version 1.2b. I could not find any information about what might have changed in the BIOS, so I have no way of knowing if a BIOS update would make any difference.
I see from trassatti's post that the unresponsive ports are set for 100Mbs / full duplex. If you configure them to auto, do the symptoms change?
Do you know if anything is recorded in the event logs when the port stops responding to traffic?
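If the keep-alive script records a timestamp and result for every probe, those records can be reduced to outage windows and compared against event log timestamps. The following is only a sketch of that idea. The function name `outage_windows` and the `(timestamp, ok)` sample format are assumptions, not anything taken from the scripts mentioned in this thread.

```python
from datetime import datetime, timedelta


def outage_windows(samples):
    """Collapse time-ordered (timestamp, ok) probe samples into outages.

    Each window is (start, end): start is the first failed probe, end is
    the first successful probe after it, or None if still down at the
    end of the data.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                      # outage begins
        elif ok and start is not None:
            windows.append((start, ts))     # outage ends
            start = None
    if start is not None:
        windows.append((start, None))       # outage still in progress
    return windows


# Example with probes three seconds apart (made-up data).
t0 = datetime(2010, 1, 1, 0, 0, 0)
step = timedelta(seconds=3)
probes = [(t0 + i * step, ok) for i, ok in enumerate([True, False, False, True, False])]
for start, end in outage_windows(probes):
    print(start, "->", end if end is not None else "still down")
```

Each window can then be matched by time against the System event log to see whether the driver logged anything when the receiver stopped.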
Again, I am not aware of any specific fix for your issue. I am just thinking about things to look at to better understand the issue and find a workaround that does not require a script to monitor and then reset the connection.
I noticed fbernon reports a similar symptom. The Ethernet controller on that adapter is different from the controllers you have, but the driver is the same. Nevertheless, a fix for one might not be a fix for the other, because the cause might be different (or maybe the same).
Please post here if you try the new drivers or any configuration changes and your results.