We've just gotten in a bunch of new Dell R840 servers. We run Oracle Linux 6.10, and we have the Intel X710 card.
On several of these servers, NICs appear to be flapping up and down. From dmesg:
    [ 3343.096396] bond0: link status definitely down for interface eth2, disabling it
    [ 3346.090416] bond0: link status definitely up for interface eth2
    [ 3350.082850] bond0: link status definitely down for interface eth2, disabling it
    [ 3358.067763] bond0: link status definitely up for interface eth2
The same messages show up in syslog. In /proc/net/bonding/bond0, "MII Status" flaps between up and down, though the speed and duplex always show correct values.
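To put numbers on the flapping, the bonding messages can be parsed to see how long each interface stayed down. This is only a rough sketch: the function names (`parse_link_events`, `down_durations`) are made up for illustration, and it assumes the message format is exactly what appears in the dmesg excerpt above.

```python
import re

# Matches bonding driver messages of the form seen in the dmesg excerpt, e.g.
# [ 3343.096396] bond0: link status definitely down for interface eth2, disabling it
LINK_EVENT = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<bond>\S+): link status definitely "
    r"(?P<state>up|down) for interface (?P<iface>\S+?)(?:,|\s|$)"
)


def parse_link_events(text):
    """Return (timestamp, bond, iface, state) tuples in the order they appear."""
    return [
        (float(m.group("ts")), m.group("bond"), m.group("iface"), m.group("state"))
        for m in LINK_EVENT.finditer(text)
    ]


def down_durations(events):
    """Pair each 'down' with the next 'up' on the same interface.

    Returns a list of (iface, seconds_down) tuples.
    """
    went_down = {}
    durations = []
    for ts, bond, iface, state in events:
        key = (bond, iface)
        if state == "down":
            went_down[key] = ts
        elif key in went_down:
            durations.append((iface, round(ts - went_down.pop(key), 3)))
    return durations


# Sample input copied from the dmesg output in this post.
SAMPLE = """\
[ 3343.096396] bond0: link status definitely down for interface eth2, disabling it
[ 3346.090416] bond0: link status definitely up for interface eth2
[ 3350.082850] bond0: link status definitely down for interface eth2, disabling it
[ 3358.067763] bond0: link status definitely up for interface eth2
"""

events = parse_link_events(SAMPLE)
outages = down_durations(events)
for iface, seconds in outages:
    print(f"{iface} was down for {seconds} s")
```

To run it against a live system, `SAMPLE` could be replaced with the text of `dmesg` (for example read from `sys.stdin`). Comparing outage counts per interface across servers may help show whether the flaps cluster on ports that share one switch.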
We have tried:
    # ethtool -i eth2
    driver: i40e
    version: 2.1.14-k
    firmware-version: 6.80 0x80003d74 18.8.9
    bus-info: 0000:17:00.2
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: yes
    supports-register-dump: yes
    supports-priv-flags: yes

    # uname -r
    4.1.12-124.27.1.el6uek.x86_64
Any help would be appreciated!
    # cat ifcfg-bond0
    BOOTPROTO="none"
    DEVICE="bond0"
    ONBOOT=yes
    PEERDNS=no
    PEERROUTES=no
    DEFROUTE=no
    TYPE="Bond"
    BONDING_MASTER=yes
    NM_CONTROLLED=no
    BONDING_OPTS="mode=1 use_carrier=0 primary=eth0 arp_ip_target=10.10.10.254 arp_interval=1000"

    # cat ifcfg-bond0.156
    BOOTPROTO="none"
    IPADDR="10.10.10.104"
    NETMASK="255.255.255.0"
    GATEWAY="10.10.10.254"
    DEVICE=bond0.156
    ONBOOT=yes
    PEERDNS=yes
    PEERROUTES=yes
    DNS1=10.10.10.21
    DNS2=10.10.10.22
    VLAN=yes
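One thing worth noting about the BONDING_OPTS in ifcfg-bond0: as far as I understand the bonding driver documentation, a non-zero arp_interval turns on ARP monitoring, so a slave is judged up or down by whether traffic to and from arp_ip_target is seen, not by physical carrier. That would be consistent with the speed and duplex looking fine while the link status flaps. The sketch below only classifies an options string. The helper names are hypothetical, and it makes no claim about how the driver itself parses options.

```python
def parse_bonding_opts(opts):
    """Split a BONDING_OPTS-style string into a dict of option name -> value."""
    return dict(item.split("=", 1) for item in opts.split())


def link_monitor(opts):
    """Report which link monitor the options enable.

    Assumption: a non-zero arp_interval enables ARP monitoring and a
    non-zero miimon enables MII monitoring; both default to 0 (off).
    """
    arp = int(opts.get("arp_interval", 0)) > 0
    mii = int(opts.get("miimon", 0)) > 0
    if arp and mii:
        return "arp+mii"
    if arp:
        return "arp"
    if mii:
        return "mii"
    return "none"


# Options copied from ifcfg-bond0 in this post.
ifcfg_opts = parse_bonding_opts(
    "mode=1 use_carrier=0 primary=eth0 "
    "arp_ip_target=10.10.10.254 arp_interval=1000"
)
# Options Oracle support asked to add in /etc/modprobe.d/bonding.conf.
modprobe_opts = parse_bonding_opts("mode=balance-alb miimon=100")

print("ifcfg-bond0 monitor: ", link_monitor(ifcfg_opts))
print("bonding.conf monitor:", link_monitor(modprobe_opts))
```

If that reading is right, the two configurations ask for different monitors, and with ARP monitoring anything upstream that drops traffic to 10.10.10.254 could show up as a link flap even though the NIC is healthy.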
#2: Yes, it's the 4-port onboard NIC. Here is the info from Dell's DRAC:
Intel(R) 10GbE 4P X710 rNDC
#3 is going to be harder to get. I will need someone at our data center to open the case and read that information off the hardware. I'll try to get it.
We also have a case open with Oracle support for this. I'll follow up if I get any further information from them. They had me add the following, which did not make any difference:
    # cat /etc/modprobe.d/bonding.conf
    alias bond0 bonding
    options bonding max_bonds=2
    ## ADDED THIS:
    options bond0 mode=balance-alb miimon=100
I really appreciate the answers. After a lot of troubleshooting, we have determined that the issue was our B-side top-of-rack switches. They were not passing traffic correctly. A reboot of the switch actually fixed our issue.
Thanks for the views and the replies!