I have three IBM x3850 servers, each with two Intel PRO/1000 PT quad-port NICs. All but one port on the two NICs seem to flap when connected to our Cisco switch. The problem exists on all three servers. Essentially, when the network is connected to one of these ports, the counters in the network status reset to 0 every few seconds. The connections work fine when connected to the internal Broadcom NIC on the same server.
Currently running Windows 2008 R2 SP1.
The Intel driver version reported by Device Manager is: 220.127.116.11
The configuration of the ports on our Cisco switch is as follows.
switchport access vlan 211
switchport mode access
I am sorry to hear you're having connection issues. The base driver you are using has been very stable. The driver has not been changed in a year and is the latest driver. I am not aware of any connection issues with that driver.
Let me see if I can help you out by setting up something similar here and running some tests. I do not have an IBM server for testing, but I can check for any network driver issues by installing the adapter on my desktop machine and running traffic through a VLAN on my switch.
Do you have any connectivity? Was the connection working before some change and then stopped working? Are there any messages in the event log concerning the network connection?
Do you have any teams configured? If yes, what are the configuration details?
Are the adapters assigned to any virtual connections via Hyper-V?
What is the Intel(R) PROSet version? You can check the version on the Link Speed tab for the physical adapter port.
What is the part number of the adapter? You can get the part number by clicking on the Identify Adapter button on the Link Speed tab.
What is the VLAN driver version? You can check this version on the driver tab on the VLAN network adapter in Windows device manager.
Actually, these servers were former VMware ESX servers, and yes, they will become Hyper-V servers, but the Hyper-V role has not been applied yet.
As for teams, yes. We do have two internal Broadcom NICs that are configured as trunks and teamed with the Broadcom BACS tool. For now I have connectivity by creating a virtual NIC using the Broadcom tool that tags the appropriate VLAN on the trunk.
I have found that if I tinker with some of the NIC settings I can get connectivity, but it never survives a reboot. For example, this morning I changed "Wait for link" from Auto Detect to Off. That did not help, but when I reverted to the default setting it started working. Again, it didn't survive a reboot. It always reports a 1 Gb connection; it's just that we never get connectivity (unless I tinker) and the receive counter resets to 0 every few seconds.
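To put a number on the counter resets rather than eyeballing the status window, something like the following rough sketch could be used. It assumes the receive counter has already been sampled at a fixed interval by some other means (not shown here); the function name and the sample values are made up for illustration only.

```python
def count_counter_resets(samples):
    """Count how often a cumulative counter drops below its previous value.

    A healthy receive counter only ever increases, so any drop between
    two consecutive samples is treated as a reset.
    """
    resets = 0
    for prev, curr in zip(samples, samples[1:]):
        if curr < prev:
            resets += 1
    return resets


# Hypothetical samples of a receive counter, taken every few seconds.
example_samples = [0, 1500, 3200, 0, 900, 2100, 0, 400]
print(count_counter_resets(example_samples))
```

A count that keeps growing while the link is reported as up would match the behavior described above.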
As for the event logs, I can't find anything that sticks out other than "Intel® PRO/1000 PT Quad Port LP Server Adapter # 4 Link has been established: 1000Mbps"
Proset Version is 18.104.22.168
VLAN driver: 9.8.33.00
I actually didn't have a VLAN interface created; I created an access port on the switch. In my troubleshooting earlier I did attempt to create an untagged VLAN NIC, but that failed also (I'm not even sure that would work normally).
As for my configuration, the goal is to have our two Broadcom NICs teamed and connected to a Microsoft virtual switch in Hyper-V. Then, on the two Intel cards, one port on one card would be an isolated metadata network for the cluster, and a port on the other card would be the Hyper-V host connection. Again, the metadata network is working fine. At first I thought this was a hardware failure, but then I realized I had the same problem on the other two servers as well.
The driver and software versions you reported are the latest. However, this sounds like a previously reported issue when using that adapter model with the IBM x3850. The good news is that we might be able to resolve the issue by making a driver change.
I do not know of any workaround, but since you have an IBM version of the adapter in an IBM server you might want to check with IBM support to see if they know of a way to work around the issue. I have not researched this issue on IBM's web site at all. You could try whatever version of Ethernet drivers and software IBM has posted to see if those versions work any better. You must uninstall any software and drivers you downloaded from the Intel Download Center before you will be able to install what you get from IBM.
I will post here if I find out about any workarounds or a new driver release for this issue.
First, thanks for all your help. I decided to move the tests to our lab to see if I could find a solution. Using an isolated Cisco 3650, I was quickly able to recreate the problem as soon as I installed Windows 2008 R2 on one of the other boxes. In the lab I tried every driver version I could find (both IBM and Intel), still with no success. I also experimented with various spanning-tree settings and port configurations, but it always flapped.
I had my laptop connected to the console of the Cisco switch so I could see exactly when the flapping started. The funny thing is, when the server was booted the link came up and was stable. It was only when Windows 2008 R2 started to load that the flapping started. Again, that points to a driver issue.
In the end I gave up trying to figure it out. My solution was to pull the Intel cards from each of the servers and replace them with Broadcom NetXtreme II NICs. After that change everything seemed to work. After a comprehensive test in the lab, we replaced all the Intel cards in the IBMs at the datacenter. We were finally able to bring up our Hyper-V cluster and have been trouble-free since.
I'm guessing there is some incompatibility with Windows 2008 R2 on this particular IBM. Again, these NICs worked just fine with ESX installed on the servers. Not sure what to think. I now have a stack of 15 or so Intel cards sitting on our workbench.
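For anyone wanting to do the same from a saved console capture, here is a minimal sketch that counts link-down events per interface. It assumes the switch prints the usual IOS `%LINK-3-UPDOWN` messages; the interface name and the log lines in the sample are invented for illustration and are not from my actual capture.

```python
import re
from collections import Counter

# Matches the IOS link-state message, for example:
# %LINK-3-UPDOWN: Interface GigabitEthernet1/0/5, changed state to down
LINK_RE = re.compile(
    r"%LINK-3-UPDOWN: Interface ([^,\s]+), changed state to (up|down)"
)


def count_link_downs(lines):
    """Return a dict mapping interface name to its number of 'down' events."""
    downs = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        if match and match.group(2) == "down":
            downs[match.group(1)] += 1
    return dict(downs)


# Hypothetical console output (timestamps and interface are made up).
sample_log = [
    "00:01:10: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/5, changed state to up",
    "00:02:41: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/5, changed state to down",
    "00:02:44: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/5, changed state to up",
    "00:02:49: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/5, changed state to down",
]
print(count_link_downs(sample_log))
```

Comparing the timestamps of the first down event against the point where the OS starts loading is what would show whether the flapping begins with the driver.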