Hello everybody,
We're running the Ubuntu Natty 2.6.38-13.53 kernel with the ixgbe 3.2.9-k2 and ixgbevf 1.0.19-k0 drivers. We use 82599EB dual-port NICs. Each port spawns 10 VFs, which are attached to virtual machines under KVM.
We frequently experience network failures that start like this:
Dec 22 14:41:07 ccmaster kernel: [190048.835136] DRHD: handling fault status reg 2
Dec 22 14:41:07 ccmaster kernel: [190048.864523] DMAR:[DMA Read] Request device [03:11.7] fault addr 79634000
Dec 22 14:41:07 ccmaster kernel: [190048.864525] DMAR:[fault reason 06] PTE Read access is not set
Dec 22 14:41:07 ccmaster kernel: [190049.014923] DRHD: handling fault status reg 102
Dec 22 14:41:07 ccmaster kernel: [190049.044511] DMAR:[DMA Read] Request device [03:11.7] fault addr 79634000
Dec 22 14:41:07 ccmaster kernel: [190049.044513] DMAR:[fault reason 06] PTE Read access is not set
Dec 22 14:41:08 ccmaster kernel: [190050.355215] DRHD: handling fault status reg 202
Dec 22 14:41:08 ccmaster kernel: [190050.385040] DMAR:[DMA Read] Request device [03:11.7] fault addr 77a92000
Dec 22 14:41:08 ccmaster kernel: [190050.385041] DMAR:[fault reason 06] PTE Read access is not set
Dec 22 14:41:09 ccmaster kernel: [190051.007798] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 2
Dec 22 14:41:09 ccmaster kernel: [190051.043515] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 2
Dec 22 14:41:09 ccmaster kernel: [190051.471541] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 7
Dec 22 14:41:09 ccmaster kernel: [190051.510908] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 7
Dec 22 14:41:10 ccmaster kernel: [190051.885971] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 3
Dec 22 14:41:10 ccmaster kernel: [190051.923664] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 6
Dec 22 14:41:10 ccmaster kernel: [190051.925334] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 3
Dec 22 14:41:10 ccmaster kernel: [190051.964411] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 6
Dec 22 14:41:10 ccmaster kernel: [190052.195640] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 4
Dec 22 14:41:10 ccmaster kernel: [190052.235159] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 4
Dec 22 14:41:11 ccmaster kernel: [190053.001909] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 2
Dec 22 14:41:11 ccmaster kernel: [190053.040401] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 2
Dec 22 14:41:12 ccmaster kernel: [190053.882821] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 3
Dec 22 14:41:12 ccmaster kernel: [190053.920700] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 6
Dec 22 14:41:12 ccmaster kernel: [190053.922305] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 3
Dec 22 14:41:12 ccmaster kernel: [190053.960197] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 6
Dec 22 14:41:13 ccmaster kernel: [190054.612941] ixgbe 0000:03:00.1: eth103: Detected Tx Unit Hang
Dec 22 14:41:13 ccmaster kernel: [190054.612943] Tx Queue <0>
Dec 22 14:41:13 ccmaster kernel: [190054.612944] TDH, TDT <100>, <122>
Dec 22 14:41:13 ccmaster kernel: [190054.612944] next_to_use <122>
Dec 22 14:41:13 ccmaster kernel: [190054.612945] next_to_clean <102>
Dec 22 14:41:13 ccmaster kernel: [190054.612946] tx_buffer_info[next_to_clean]
Dec 22 14:41:13 ccmaster kernel: [190054.612946] time_stamp <10121fa2d>
Dec 22 14:41:13 ccmaster kernel: [190054.612947] jiffies <10121fc55>
Dec 22 14:41:13 ccmaster kernel: [190054.838626] ixgbe 0000:03:00.1: eth103: tx hang 1 detected on queue 0, resetting adapter
Dec 22 14:41:13 ccmaster kernel: [190054.838782] ixgbe 0000:03:00.1: eth103: Reset adapter
Dec 22 14:41:13 ccmaster kernel: [190054.866337] ixgbe 0000:03:00.1: eth103: RXDCTL.ENABLE on Rx queue 20 not cleared within the polling period
Dec 22 14:41:13 ccmaster kernel: [190055.083995] br103: port 1(eth103) entering forwarding state
Dec 22 14:41:13 ccmaster kernel: [190055.232255] ixgbe 0000:03:00.1: master disable timed out
Dec 22 14:41:15 ccmaster kernel: [190057.418550] ixgbe 0000:03:00.1: eth103: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Dec 22 14:41:15 ccmaster kernel: [190057.420402] br103: port 1(eth103) entering forwarding state
Dec 22 14:41:15 ccmaster kernel: [190057.420405] br103: port 1(eth103) entering forwarding state
Dec 22 14:41:15 ccmaster kernel: [190057.451889] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 7
Dec 22 14:41:15 ccmaster kernel: [190057.491834] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 7
Dec 22 14:41:15 ccmaster kernel: [190057.538455] ixgbe 0000:03:00.1: eth103: NIC Link is Down
Dec 22 14:41:16 ccmaster kernel: [190058.001181] DRHD: handling fault status reg 302
Dec 22 14:41:16 ccmaster kernel: [190058.029084] DMAR:[DMA Read] Request device [03:11.7] fault addr 79634000
Dec 22 14:41:16 ccmaster kernel: [190058.029086] DMAR:[fault reason 06] PTE Read access is not set
Dec 22 14:41:16 ccmaster kernel: [190058.338892] DRHD: handling fault status reg 402
Dec 22 14:41:16 ccmaster kernel: [190058.367063] DMAR:[DMA Read] Request device [03:11.7] fault addr 77a92000
Dec 22 14:41:16 ccmaster kernel: [190058.367064] DMAR:[fault reason 06] PTE Read access is not set
Dec 22 14:41:16 ccmaster kernel: [190058.508637] br103: port 1(eth103) entering forwarding state
Dec 22 14:41:17 ccmaster kernel: [190058.874750] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 3
Dec 22 14:41:17 ccmaster kernel: [190058.874797] ixgbe 0000:03:00.1: eth103: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Dec 22 14:41:17 ccmaster kernel: [190058.876606] br103: port 1(eth103) entering forwarding state
Dec 22 14:41:17 ccmaster kernel: [190058.876609] br103: port 1(eth103) entering forwarding state
Dec 22 14:41:17 ccmaster kernel: [190058.912721] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 6
Dec 22 14:41:17 ccmaster kernel: [190058.914695] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 3
Dec 22 14:41:17 ccmaster kernel: [190058.952633] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 6
Dec 22 14:41:17 ccmaster kernel: [190058.981264] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 2
Dec 22 14:41:17 ccmaster kernel: [190059.021207] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 2
Dec 22 14:41:17 ccmaster kernel: [190059.184576] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 4
Dec 22 14:41:17 ccmaster kernel: [190059.224515] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 4
Dec 22 14:41:17 ccmaster kernel: [190059.449368] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 7
Dec 22 14:41:17 ccmaster kernel: [190059.488744] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 7
Dec 22 14:41:19 ccmaster kernel: [190060.872227] ixgbe 0000:03:00.1: eth103: VF Reset msg received from vf 3
Dec 22 14:41:19 ccmaster kernel: [190060.911541] ixgbe 0000:03:00.1: eth103: ...
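To get an overview of a longer log than the excerpt above, the messages can be tallied by type. The following is a minimal Python sketch, not something from the original report: the regular expressions are modeled only on the line formats shown above, and the function name `summarize` is made up for illustration.

```python
import re
from collections import Counter

# Patterns modeled on the message formats in the log excerpt (an assumption:
# other kernel or driver versions may word these messages differently).
DMAR_RE = re.compile(
    r"DMAR:\[DMA (?:Read|Write)\] Request device \[([0-9a-f:.]+)\] fault addr ([0-9a-f]+)"
)
VF_RESET_RE = re.compile(r"VF Reset msg received from vf (\d+)")
TX_HANG_RE = re.compile(r"Detected Tx Unit Hang")


def summarize(lines):
    """Tally DMAR faults per (device, address), VF resets per VF index, and Tx hangs."""
    faults = Counter()
    resets = Counter()
    tx_hangs = 0
    for line in lines:
        match = DMAR_RE.search(line)
        if match:
            faults[(match.group(1), match.group(2))] += 1
            continue
        match = VF_RESET_RE.search(line)
        if match:
            resets[int(match.group(1))] += 1
            continue
        if TX_HANG_RE.search(line):
            tx_hangs += 1
    return faults, resets, tx_hangs
```

Any iterable of log lines can be passed in, for example an open file object for the kernel log (its path depends on the syslog configuration). In the excerpt above, every DMAR fault names the same requester, 03:11.7, with two distinct fault addresses, so a tally like this mainly helps confirm whether that pattern holds over a longer period.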
Hi Alex,
Was this issue ever resolved? Was it related to the slot, as in your post at http://communities.intel.com/message/146581#146581?
Mark H
Hi Mark,
Thank you for noticing my question.
We did not find an exact root cause for the issue.
A couple of things we did:
- downgraded to kernel 2.6.38-8.48 (which has the same ixgbe/ixgbevf drivers)
- used the following sysctl settings for bridges:
net.bridge.bridge-nf-call-iptables=0
net.bridge.bridge-nf-call-ip6tables=0
net.bridge.bridge-nf-call-arptables=0
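To confirm that the three settings above are actually in effect on a running host, they can be read back from `/proc/sys`. This is a minimal Python sketch under the assumption that the bridge-nf entries are exposed under `/proc/sys/net/bridge/`; the function name `read_bridge_nf` and its `proc_sys` parameter are hypothetical and exist only to make the check testable.

```python
from pathlib import Path

# The three keys listed above. A sysctl key maps to a path under /proc/sys
# by replacing the dots with path separators.
BRIDGE_NF_KEYS = (
    "net.bridge.bridge-nf-call-iptables",
    "net.bridge.bridge-nf-call-ip6tables",
    "net.bridge.bridge-nf-call-arptables",
)


def read_bridge_nf(proc_sys="/proc/sys"):
    """Return {key: integer value}, or None for entries that do not exist."""
    values = {}
    for key in BRIDGE_NF_KEYS:
        path = Path(proc_sys, *key.split("."))
        try:
            values[key] = int(path.read_text().strip())
        except FileNotFoundError:
            # Entry missing, for example because the module providing it is not loaded.
            values[key] = None
    return values
```

A value of 0 for all three keys matches the configuration described above. Values set at runtime do not survive a reboot unless they are also written to a sysctl configuration file such as `/etc/sysctl.conf`.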
Currently we're not experiencing this issue. Do you have any clue what issues those DRHD/DMAR faults and "Tx Hang" messages point to?
Thanks,
Alex.
Hi Alex,
Most Linux questions are outside my area of expertise, so I will do some checking with our developers to see what I can find out. Since your connections are stable, you might not want to spend time experimenting with updated drivers, KVM, or an updated kernel. Newer versions of any of these might make a difference. Of course, I do not have any specific information that making any of those changes will help.
I know that suggesting driver and component updates is the default answer from technical support guys, but that is what I am. I cannot help myself. I always recommend updating to the latest versions UNLESS everything is stable and performing as desired. Sometimes leaving things alone is a good choice.
I will let you know what I find out from our developers. Have a great day.
Mark H
Mark, thanks for your reply.
I am basically looking for a clue as to what these error messages might indicate. As a dev myself, I know that when a component prints an error message, something went wrong, and the component knows which operation did not succeed (though maybe not why). So if you can dig up a clue about what errors these messages indicate, it would help a lot.
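One way to narrow this down from the host side is to work out which VF the faulting requester in the DMAR messages, [03:11.7], corresponds to, so it can be compared with the VF numbers in the "VF Reset msg" lines. The sketch below assumes the kernel exposes `virtfnN` symlinks in the PF's sysfs directory and that the PCI domain is 0000, as in the `ixgbe 0000:03:00.1` messages; the function name `vf_index_for_bdf` is hypothetical.

```python
import os


def vf_index_for_bdf(pf_bdf, vf_bdf, sysfs="/sys/bus/pci/devices"):
    """Return N if the PF's virtfnN symlink resolves to vf_bdf, otherwise None."""
    pf_dir = os.path.join(sysfs, pf_bdf)
    for name in os.listdir(pf_dir):
        if not name.startswith("virtfn"):
            continue
        # Each virtfnN entry is a symlink to the VF's own PCI device directory,
        # whose last path component is the VF's domain:bus:device.function.
        target = os.path.basename(os.path.realpath(os.path.join(pf_dir, name)))
        if target == vf_bdf:
            return int(name[len("virtfn"):])
    return None
```

For example, `vf_index_for_bdf("0000:03:00.1", "0000:03:11.7")` would check the port that logs the resets; a result of `None` would suggest that the faulting function belongs to the other port's PF, which can be checked the same way.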
Thanks again!
Hi Alex,
I did some digging with the experts and discovered that this was found more than two years ago. It appears to involve bugs in the BIOS, kernel, and driver.
Here are the Bugzilla reports that should help with understanding the problem:
https://bugzilla.redhat.com/show_bug.cgi?id=541397
https://bugzilla.redhat.com/show_bug.cgi?id=538163
https://bugzilla.redhat.com/show_bug.cgi?id=568153 (ixgbe related)
Regards,
Patrick
Hello Patrick,
Thanks for looking at this as well.
Those bug reports, however, seem to concern older kernel versions than the one we're using (2.6.38-8). That said, the ixgbe driver shipped by Ubuntu with this kernel (3.2.9-k2) is somewhat dated.
Currently we don't see this issue anymore, perhaps because we disabled the bridge-nf settings.
Thanks!
Alex.
