Hello everybody,
We're running the Ubuntu Natty 2.6.38-13.53 kernel with the ixgbe 3.2.9-k2 and ixgbevf 1.0.19-k0 drivers. We use 82599EB dual-port NICs. Each port spawns 10 VFs, which are then attached to virtual machines under KVM.
Frequently we experience network failures, which start like this:
Dec 22 14:41:07 ccmaster kernel: [190048.835136] DRHD: handling fault status reg 2
Dec 22 14:41:07 ccmaster kernel: [190048.864523] DMAR:[DMA Read] Request device [03:11.7] fault addr 79634000
Dec 22 14:41:07 ccmaster kernel: [190048.864525] DMAR:[fault reason 06] PTE Read access is not set
Dec 22 14:41:07 ccmaster kernel: [190049.014923] DRHD: handling fault status reg 102
Dec 22 14:41:07 ccmaster kernel: [190049.044511] DMAR:[DMA Read] Request device [03:11.7] fault addr 79634000
Dec 22 14:41:07 ccmaster kernel: [190049.044513] DMAR:[fault reason 06] PTE Read access is not set
Dec 22 14:41:08 ccmaster kernel: [190050.355215] DRHD: handling fault status reg 202
Dec 22 14:41:08 ccmaster kernel: [190050.385040] DMAR:[DMA Read] Request device [03:11.7] fault addr 77a92000
Dec 22 14:41:08 ccmaster kernel: [190050.385041] DMAR:[fault reason 06] PTE Read access is not set
Dec 22 14:41:09 ccmaster kernel: [190051.007798] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 2
Dec 22 14:41:09 ccmaster kernel: [190051.043515] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 2
Dec 22 14:41:09 ccmaster kernel: [190051.471541] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 7
Dec 22 14:41:09 ccmaster kernel: [190051.510908] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 7
Dec 22 14:41:10 ccmaster kernel: [190051.885971] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 3
Dec 22 14:41:10 ccmaster kernel: [190051.923664] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 6
Dec 22 14:41:10 ccmaster kernel: [190051.925334] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 3
Dec 22 14:41:10 ccmaster kernel: [190051.964411] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 6
Dec 22 14:41:10 ccmaster kernel: [190052.195640] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 4
Dec 22 14:41:10 ccmaster kernel: [190052.235159] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 4
Dec 22 14:41:11 ccmaster kernel: [190053.001909] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 2
Dec 22 14:41:11 ccmaster kernel: [190053.040401] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 2
Dec 22 14:41:12 ccmaster kernel: [190053.882821] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 3
Dec 22 14:41:12 ccmaster kernel: [190053.920700] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 6
Dec 22 14:41:12 ccmaster kernel: [190053.922305] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 3
Dec 22 14:41:12 ccmaster kernel: [190053.960197] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 6
Dec 22 14:41:13 ccmaster kernel: [190054.612941] ixgbe 0000:03:00.1: eth103: Detected Tx Unit Hang
Dec 22 14:41:13 ccmaster kernel: [190054.612943] Tx Queue <0>
Dec 22 14:41:13 ccmaster kernel: [190054.612944] TDH, TDT <100>, <122>
Dec 22 14:41:13 ccmaster kernel: [190054.612944] next_to_use <122>
Dec 22 14:41:13 ccmaster kernel: [190054.612945] next_to_clean <102>
Dec 22 14:41:13 ccmaster kernel: [190054.612946] tx_buffer_info[next_to_clean]
Dec 22 14:41:13 ccmaster kernel: [190054.612946] time_stamp <10121fa2d>
Dec 22 14:41:13 ccmaster kernel: [190054.612947] jiffies <10121fc55>
Dec 22 14:41:13 ccmaster kernel: [190054.838626] ixgbe 0000:03:00.1: eth103: tx hang 1 detected on queue 0, resetting adapter
Dec 22 14:41:13 ccmaster kernel: [190054.838782] ixgbe 0000:03:00.1: eth103: Reset adapter
Dec 22 14:41:13 ccmaster kernel: [190054.866337] ixgbe 0000:03:00.1: eth103: RXDCTL.ENABLE on Rx queue 20 not cleared within the polling period
Dec 22 14:41:13 ccmaster kernel: [190055.083995] br103: port 1(eth103) entering forwarding state
Dec 22 14:41:13 ccmaster kernel: [190055.232255] ixgbe 0000:03:00.1: master disable timed out
Dec 22 14:41:15 ccmaster kernel: [190057.418550] ixgbe 0000:03:00.1: eth103: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Dec 22 14:41:15 ccmaster kernel: [190057.420402] br103: port 1(eth103) entering forwarding state
Dec 22 14:41:15 ccmaster kernel: [190057.420405] br103: port 1(eth103) entering forwarding state
Dec 22 14:41:15 ccmaster kernel: [190057.451889] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 7
Dec 22 14:41:15 ccmaster kernel: [190057.491834] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 7
Dec 22 14:41:15 ccmaster kernel: [190057.538455] ixgbe 0000:03:00.1: eth103: NIC Link is Down
Dec 22 14:41:16 ccmaster kernel: [190058.001181] DRHD: handling fault status reg 302
Dec 22 14:41:16 ccmaster kernel: [190058.029084] DMAR:[DMA Read] Request device [03:11.7] fault addr 79634000
Dec 22 14:41:16 ccmaster kernel: [190058.029086] DMAR:[fault reason 06] PTE Read access is not set
Dec 22 14:41:16 ccmaster kernel: [190058.338892] DRHD: handling fault status reg 402
Dec 22 14:41:16 ccmaster kernel: [190058.367063] DMAR:[DMA Read] Request device [03:11.7] fault addr 77a92000
Dec 22 14:41:16 ccmaster kernel: [190058.367064] DMAR:[fault reason 06] PTE Read access is not set
Dec 22 14:41:16 ccmaster kernel: [190058.508637] br103: port 1(eth103) entering forwarding state
Dec 22 14:41:17 ccmaster kernel: [190058.874750] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 3
Dec 22 14:41:17 ccmaster kernel: [190058.874797] ixgbe 0000:03:00.1: eth103: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Dec 22 14:41:17 ccmaster kernel: [190058.876606] br103: port 1(eth103) entering forwarding state
Dec 22 14:41:17 ccmaster kernel: [190058.876609] br103: port 1(eth103) entering forwarding state
Dec 22 14:41:17 ccmaster kernel: [190058.912721] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 6
Dec 22 14:41:17 ccmaster kernel: [190058.914695] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 3
Dec 22 14:41:17 ccmaster kernel: [190058.952633] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 6
Dec 22 14:41:17 ccmaster kernel: [190058.981264] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 2
Dec 22 14:41:17 ccmaster kernel: [190059.021207] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 2
Dec 22 14:41:17 ccmaster kernel: [190059.184576] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 4
Dec 22 14:41:17 ccmaster kernel: [190059.224515] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 4
Dec 22 14:41:17 ccmaster kernel: [190059.449368] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 7
Dec 22 14:41:17 ccmaster kernel: [190059.488744] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 7
Dec 22 14:41:19 ccmaster kernel: [190060.872227] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 3
Dec 22 14:41:19 ccmaster kernel: [190060.911541] ixgbe 0000:03:00.1: eth103: ...
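To see at a glance which device and addresses the DMAR faults hit, and which VFs keep resetting, a log excerpt like the one above can be tallied with a short script. This is a minimal sketch and not part of the original post: the regular expressions are modeled only on the message formats visible in this log, and the `summarize` helper and `sample` list are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Patterns modeled on the messages in the log above (assumption: other
# kernel or driver versions may word these messages differently).
DMAR_RE = re.compile(
    r"DMAR:\[DMA (?:Read|Write)\] Request device \[([0-9a-f:.]+)\] "
    r"fault addr ([0-9a-f]+)"
)
VF_RESET_RE = re.compile(r"VF Reset msg received from vf (\d+)")


def summarize(lines):
    """Count DMAR faults per (device, address) and VF resets per VF index."""
    faults = Counter()
    resets = Counter()
    for line in lines:
        m = DMAR_RE.search(line)
        if m:
            faults[(m.group(1), m.group(2))] += 1
            continue
        m = VF_RESET_RE.search(line)
        if m:
            resets[int(m.group(1))] += 1
    return faults, resets


# A few lines copied from the log above.
sample = [
    "Dec 22 14:41:07 ccmaster kernel: [190048.864523] DMAR:[DMA Read] Request device [03:11.7] fault addr 79634000",
    "Dec 22 14:41:07 ccmaster kernel: [190049.044511] DMAR:[DMA Read] Request device [03:11.7] fault addr 79634000",
    "Dec 22 14:41:08 ccmaster kernel: [190050.385040] DMAR:[DMA Read] Request device [03:11.7] fault addr 77a92000",
    "Dec 22 14:41:09 ccmaster kernel: [190051.007798] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 2",
    "Dec 22 14:41:09 ccmaster kernel: [190051.043515] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 2",
    "Dec 22 14:41:09 ccmaster kernel: [190051.471541] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 7",
]

faults, resets = summarize(sample)
for (device, addr), count in faults.items():
    print(f"device {device} fault addr {addr}: {count} fault(s)")
for vf, count in sorted(resets.items()):
    print(f"vf {vf}: {count} reset message(s)")
```

Run against the full log, the same function would show whether the faults stay confined to one requesting device (here `03:11.7`) and a small set of addresses.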
Hi Alex,
Was this issue ever resolved? Was it related to the slot, as in your post at http://communities.intel.com/message/146581#146581?
Mark H
Hi Mark,
thank you for noticing my question.
We did not find an exact root cause for the issue.
A couple of things we did:
- downgraded to kernel 2.6.38-8.48 (which has the same ixgbe/ixgbevf drivers)
- used the following sysctl settings for bridges:
net.bridge.bridge-nf-call-iptables=0
net.bridge.bridge-nf-call-ip6tables=0
net.bridge.bridge-nf-call-arptables=0
Currently we're not experiencing this issue. Do you have any clue what issues those DRHD/DMAR faults and "Tx Hang" messages point to?
Thanks,
Alex.
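Whether the three bridge settings listed above are actually in effect on a host can be checked by reading them back from `/proc/sys`. The following is a minimal sketch, not something from the original post: `read_sysctl` and `bridge_nf_status` are hypothetical helper names, and the entries only exist while the kernel's bridge netfilter code is loaded, so a missing key is reported as `None` rather than treated as an error.

```python
from pathlib import Path

# The three keys from the post above; each dot maps to a path separator
# under /proc/sys.
BRIDGE_NF_KEYS = [
    "net.bridge.bridge-nf-call-iptables",
    "net.bridge.bridge-nf-call-ip6tables",
    "net.bridge.bridge-nf-call-arptables",
]


def read_sysctl(key, proc_sys="/proc/sys"):
    """Return the current value of a sysctl key as a string, or None if absent."""
    path = Path(proc_sys).joinpath(*key.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        # Key not present (for example, bridge support not loaded) or unreadable.
        return None


def bridge_nf_status(proc_sys="/proc/sys"):
    """Map each bridge-nf key to its current value (None when not present)."""
    return {key: read_sysctl(key, proc_sys) for key in BRIDGE_NF_KEYS}


if __name__ == "__main__":
    for key, value in bridge_nf_status().items():
        shown = value if value is not None else "(not present)"
        print(f"{key} = {shown}")
```

The `proc_sys` parameter exists only so the helpers can be pointed at a test directory; on a real host the default is what you want.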
Hi Alex,
Most Linux questions are outside my area of expertise, so I will do some checking with our developers to see what I can find out. Since your connections are stable, you might not want to spend time experimenting with updated drivers, KVM, or an updated kernel. Newer versions of any of these might make a difference. Of course, I do not have any specific information that making any of those changes will help.
I know that suggesting driver and component updates is the default answer from technical support guys, but that is what I am. I cannot help myself. I always recommend updating to the latest versions UNLESS everything is stable and performing as desired. Sometimes leaving things alone is a good choice.
I will let you know what I find out from our developers. Have a great day.
Mark H
Mark, thanks for your reply.
I am basically looking for a clue as to what these error messages might indicate. As a dev myself, I know that whenever a component prints an error message, something went wrong, and the component knows which operation did not succeed (though maybe not why). So if you can dig out a clue as to what error these messages indicate, it might help a lot.
Thanks again!
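One thing that can be read directly from the "Detected Tx Unit Hang" dump earlier in the thread is how old the stuck transmit buffer was when the hang was reported: the `time_stamp` and `jiffies` fields are hexadecimal tick counts, so their difference is the age in jiffies. The sketch below is an illustration only. The helper name `hang_age_jiffies` is hypothetical, and the tick rate (HZ) of the kernel in question is not stated in the thread, so several common values are shown as assumptions.

```python
def hang_age_jiffies(time_stamp_hex, jiffies_hex):
    """Difference between two hexadecimal jiffies values from the hang dump."""
    return int(jiffies_hex, 16) - int(time_stamp_hex, 16)


# Values copied from the "Detected Tx Unit Hang" dump in the log.
age = hang_age_jiffies("10121fa2d", "10121fc55")
print(f"buffer age: {age} jiffies")

# HZ is an assumption: the thread does not say which tick rate this kernel uses.
for hz in (100, 250, 1000):
    print(f"  at HZ={hz}: {age / hz:.2f} s")
```

Whatever the actual tick rate, the buffer had been pending for hundreds of ticks, which is consistent with the driver declaring a hang and resetting the adapter.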
Hi Alex,
I did some digging with the experts and found that this problem was first reported more than two years ago. It appears to involve bugs in the BIOS, the kernel, and the driver.
Here are the Bugzilla reports that should help with understanding the problem:
https://bugzilla.redhat.com/show_bug.cgi?id=541397
https://bugzilla.redhat.com/show_bug.cgi?id=538163
https://bugzilla.redhat.com/show_bug.cgi?id=568153 (ixgbe related)
Regards,
Patrick
Hello Patrick,
thanks for looking at this as well.
Those bug reports, however, seem to concern older kernel versions than the one we're using (2.6.38-8), although the ixgbe driver that Ubuntu ships with this kernel (3.2.9-k2) is somewhat dated.
Currently, we don't see this issue anymore, perhaps due to disabling the bridge-nf settings.
Thanks!
Alex.
