
Kernel Panic when bridging two 10G x520-DAs

idata
Employee

I have two 10G X520-DA2 NICs (82599EB) running the latest ixgbe driver (3.8.21) under the latest CentOS 6.2 kernel (2.6.32-220.13.1.el6.x86_64).

Due to issues with VLAN tagging over the bridged interfaces ( http://communities.intel.com/message/152866 ), I have the bridges configured as:

# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.001b21d73a78       no              eth0
                                                        eth2
br253           8000.001b21d73a78       no              eth0.253
                                                        eth2.253
br353           8000.001b21d73a78       no              eth0.353
                                                        eth2.353
br653           8000.001b21d73a78       no              eth0.653
                                                        eth2.653

Iptables and ip6tables are called on the bridge devices:

net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
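
For anyone comparing their own host against the settings above, here is a minimal Python sketch that reads the bridge netfilter sysctls directly. It assumes they are exposed under /proc/sys/net/bridge, which is only the case while the bridge module (or br_netfilter on newer kernels) is loaded; the function name read_bridge_nf_settings is made up for this example.

```python
from pathlib import Path

# Assumed location of the bridge netfilter sysctls on Linux. The directory is
# absent when the bridge (or br_netfilter) module is not loaded.
BRIDGE_SYSCTL_DIR = Path("/proc/sys/net/bridge")


def read_bridge_nf_settings(base=BRIDGE_SYSCTL_DIR):
    """Return {file name: integer value} for each bridge-nf-call-* entry."""
    settings = {}
    if not base.is_dir():
        return settings
    for entry in sorted(base.glob("bridge-nf-call-*")):
        settings[entry.name] = int(entry.read_text().strip())
    return settings


if __name__ == "__main__":
    # Print in the same "net.bridge.<name> = <value>" form that sysctl uses.
    for name, value in read_bridge_nf_settings().items():
        print(f"net.bridge.{name} = {value}")
```

A value of 1 means bridged frames are passed through the corresponding iptables or ip6tables chains, which is the path visible in the panic reported in this thread.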

About 5-10 seconds after passing traffic through the br253 bridge device, the kernel panics with the following:

kernel:general protection fault: 0000 [#1] SMP
kernel:last sysfs file: /sys/devices/virtual/net/br653/bridge/multicast_startup_query_interval
kernel:Stack:
kernel:Call Trace:
kernel:Code: 5f 3a 00 48 8b 05 19 1a e6 00 48 c7 c2 f8 b2 fa 81 48 85 c0 74 26 48 8b 4b 08 48 3b 48 08 77 11 eb 1a 66 0f 1f 84 00 00 00 00 00 <48> 39 48 08 73 0b 48 89
kernel:Kernel panic - not syncing: Fatal exception

The console also shows an additional line:

stack-protector: Kernel stack is corrupted in: ffffffff8148f073

Any help/pointers would be appreciated.

5 Replies
idata
Employee

Using a replayable pcap of live traffic through a 1G based bridge, I am able to replicate the panic on demand.

Using this pcap, I captured a crash dump of the kernel during the panic:

KERNEL: /usr/lib/debug/lib/modules/2.6.32-220.7.1.el6.x86_64.debug/vmlinux
DUMPFILE: ./vmcore [PARTIAL DUMP]
CPUS: 6
DATE: Fri Apr 20 06:21:21 2012
UPTIME: 00:02:37
LOAD AVERAGE: 0.03, 0.04, 0.01
TASKS: 179
NODENAME: test.cluster
RELEASE: 2.6.32-220.7.1.el6.x86_64.debug
VERSION: #1 SMP Wed Mar 7 01:52:51 GMT 2012
MACHINE: x86_64 (3200 Mhz)
MEMORY: 15.9 GB
PANIC: "Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff814bd0c3"
PID: 0
COMMAND: "swapper"
TASK: ffffffff81821020 (1 of 6) [THREAD_INFO: ffffffff81794000]
CPU: 0
STATE: TASK_RUNNING (PANIC)

From the log dump:

th2: no IPv6 routers present
eth0: no IPv6 routers present
eth4: no IPv6 routers present
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff814bd0c3
Pid: 0, comm: swapper Not tainted 2.6.32-220.7.1.el6.x86_64.debug #1
Call Trace:
[] ? panic+0x78/0x148
[] ? icmp_send+0x743/0x780
[] ? __stack_chk_fail+0x1b/0x30
[] ? icmp_send+0x743/0x780
[] ? ipt_do_table+0x3cb/0x678 [ip_tables]
[] ? sched_clock+0x9/0x10
[] ? sched_clock_local+0x25/0x90
[] ? sched_clock_cpu+0xb8/0x110
[] ? ipt_hook+0x23/0x30 [iptable_filter]
[] ? nf_iterate+0x69/0xb0
[] ? br_nf_forward_finish+0x0/0x140 [bridge]
[] ? nf_hook_slow+0xa4/0x140
[] ? br_nf_forward_finish+0x0/0x140 [bridge]
[] ? br_nf_forward_ip+0x1ee/0x3c0 [bridge]
[] ? nf_iterate+0x69/0xb0
[] ? br_forward_finish+0x0/0x60 [bridge]
[] ? nf_hook_slow+0xa4/0x140
[] ? br_forward_finish+0x0/0x60 [bridge]
[] ? cpu_clock+0x57/0x80
[] ? __br_forward+0x0/0xc0 [bridge]
[] ? __br_forward+0x72/0xc0 [bridge]
[] ? br_flood+0xc1/0xd0 [bridge]
[] ? br_flood_forward+0x15/0x20 [bridge]
[] ? br_handle_frame_finish+0x27e/0x2a0 [bridge]
[] ? nf_bridge_alloc+0x30/0xc0 [bridge]
[] ? br_nf_pre_routing_finish+0x228/0x340 [bridge]
[] ? br_nf_pre_routing+0x45f/0x760 [bridge]
[] ? nf_iterate+0x69/0xb0
[] ? br_handle_frame_finish+0x0/0x2a0 [bridge]
[] ? nf_hook_slow+0xa4/0x140
[] ? br_handle_frame_finish+0x0/0x2a0 [bridge]
[] ? br_handle_frame+0x18c/0x250 [bridge]
[] ? __netif_receive_skb+0x569/0x740
[] ? __netif_receive_skb+0x130/0x740
[] ? get_rps_cpu+0x126/0x3b0
[] ? get_rps_cpu+0x0/0x3b0
[] ? netif_receive_skb+0x58/0x60
[] ? napi_skb_finish+0x50/0x70
[] ? vlan_gro_receive+0x84/0xa0
[] ? ixgbe_poll+0xd43/0x1410 [ixgbe]
[] ? net_rx_action+0x188/0x3a0
[] ? net_rx_action+0x100/0x3a0
[] ? __do_softirq+0xdd/0x200
[] ? call_softirq+0x1c/0x30
[] ? do_softirq+0xad/0xe0
[] ? irq_exit+0x95/0xa0
[] ? do_IRQ+0x75/0xf0
[] ? ret_from_intr+0x0/0x16
[] ? intel_idle+0xe8/0x170
[] ? intel_idle+0xe1/0x170
[] ? __atomic_notifier_call_chain+0x0/0xa0
[] ? cpuidle_idle_call+0xa7/0x150
[] ? cpu_idle+0xbb/0x110
[] ? rest_init+0x7a/0x80
[] ? start_kernel+0x456/0x462
[] ? x86_64_start_reservations+0x125/0x129
[] ? x86_64_start_kernel+0xfa/0x109

=================================
[ INFO: inconsistent lock state ]
2.6.32-220.7.1.el6.x86_64.debug #1
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/0 [HC0[0]:SC1[2]:HE0:SE0] takes:
(pgd_lock){+.?...}, at: [] vmalloc_sync_all+0x80/0x170
{SOFTIRQ-ON-W} state was registered at:
[] __lock_acquire+0x63c/0x1570
[] lock_acquire+0xa4/0x120
[] _spin_lock+0x36/0x70
[] __change_page_attr_set_clr+0x1ac/0xbd0
[] change_page_attr_set_clr+0x13e/0x530
[] _set_memory_wb+0x2f/0x40
[] ioremap_change_attr+0x17/0x40
[] kernel_map_sync_memtype+0x86/0xf0
[] __ioremap_caller+0x292/0x3c0
[] ioremap_cache+0x14/0x20
[] acpi_os_map_memory+0x17/0x20
[] acpi_tb_verify_table+0x2e/0x5c
[] acpi_load_tables+0x3e/0x133
[] acpi_...
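
To see at a glance which modules a trace like the one above passes through, a short Python sketch can count frames per module. It assumes the frame layout shown in this post (an address in brackets, an optional "?", then symbol+offset/size and an optional [module] suffix); the name frames_by_module and the "vmlinux" label for built-in symbols are choices made for this example, not anything produced by the crash utility.

```python
import re
from collections import Counter

# Matches frames such as "[] ? br_flood+0xc1/0xd0 [bridge]". The bracketed
# address may be empty (as in this post) or filled in, and the trailing
# [module] is missing for symbols built into the kernel image.
FRAME_RE = re.compile(
    r"^\[.*?\]\s+(?:\?\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def frames_by_module(trace_text):
    """Count call-trace frames per module; built-in symbols count as 'vmlinux'."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            counts[match.group("module") or "vmlinux"] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python frames.py < trace.txt
    for module, count in frames_by_module(sys.stdin.read()).most_common():
        print(f"{module}: {count}")
```

Lines that are not stack frames, such as "Call Trace:" or the lockdep banner, are skipped because they do not match the pattern.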
st4
New Contributor III

Hi GaryMol,

Thank you for the information. We will check on this.

rgds,

wb

st4
New Contributor III

Hi Garymol,

We did further checking on this, but we are sorry to inform you that we don't support the bridging driver.

rgds,

wb

 

SMcLe4
Beginner

This (or something similar) is also occurring on kernel 4.1.3 (CentOS 7.1).

[126295.760534] ------------[ cut here ]------------
[126295.760567] WARNING: CPU: 10 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x24f/0x260()
[126295.760572] NETDEV WATCHDOG: eno2 (ixgbe): transmit queue 8 timed out
[126295.760575] Modules linked in: fuse(E) btrfs(E) xor(E) raid6_pq(E) ufs(E) hfsplus(E) hfs(E) vfat(E) msdos(E) fat(E) xfs(E) binfmt_misc(E) target_core_user(E) uio(E) target_core_pscsi(E) target_core_file(E) target_core_iblock(E) iscsi_target_mod(E) drbd(E) lru_cache(E) libcrc32c(E) target_core_mod(E) iptable_filter(E) ip_tables(E) bonding(E) dm_mod(E) iTCO_wdt(E) iTCO_vendor_support(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) raid10(E) aesni_intel(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) pcspkr(E) sb_edac(E) edac_core(E) i2c_i801(E) joydev(E) lpc_ich(E) mei_me(E) ioatdma(E) mfd_core(E) mei(E) shpchp(E) wmi(E) ipmi_devintf(E) ipmi_si(E) 8250_fintek(E) ipmi_msghandler(E)
[126295.760641] acpi_power_meter(E) acpi_pad(E) ext4(E) mbcache(E) jbd2(E) raid1(E) sd_mod(E) ast(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) i2c_algo_bit(E) drm_kms_helper(E) ttm(E) drm(E) ahci(E) libahci(E) ixgbe(E) mdio(E) libata(E) ptp(E) pps_core(E) nvme(E) dca(E)
[126295.760670] CPU: 10 PID: 0 Comm: swapper/10 Tainted: G E 4.1.3-1.el7.elrepo.x86_64 #1
[126295.760674] Hardware name: Supermicro X10DRW-E/X10DRW-NT, BIOS 1.0a 01/07/2015
[126295.760679] 0000000000000000 b7a43922beff28d0 ffff88087fd03d28 ffffffff816d6058
[126295.760686] 0000000000000000 ffff88087fd03d80 ffff88087fd03d68 ffffffff8107d51a
[126295.760691] 0000000000000000 0000000000000008 ffff88046a7e0000 0000000000000040
[126295.760696] Call Trace:
[126295.760700] [] dump_stack+0x45/0x57
[126295.760720] [] warn_slowpath_common+0x8a/0xc0
[126295.760725] [] warn_slowpath_fmt+0x55/0x70
[126295.760734] [] dev_watchdog+0x24f/0x260
[126295.760739] [] ? dev_graft_qdisc+0x80/0x80
[126295.760750] [] call_timer_fn+0x39/0x110
[126295.760754] [] ? dev_graft_qdisc+0x80/0x80
[126295.760760] [] run_timer_softirq+0x240/0x350
[126295.760771] [] ? lapic_next_deadline+0x33/0x40
[126295.760777] [] __do_softirq+0xf4/0x2d0
[126295.760782] [] irq_exit+0x125/0x130
[126295.760792] [] smp_apic_timer_interrupt+0x4a/0x60
[126295.760798] [] apic_timer_interrupt+0x6e/0x80
[126295.760801] [] ? cpuidle_enter_state+0xa9/0x1f0
[126295.760815] [] ? cpuidle_enter_state+0x78/0x1f0
[126295.760821] [] cpuidle_enter+0x17/0x20
[126295.760828] [] cpu_startup_entry+0x35c/0x3f0
[126295.760835] [] start_secondary+0x173/0x1e0
[126295.760839] ---[ end trace 3143549a7bfdab83 ]---
[126295.760847] ixgbe 0000:01:00.1 eno2: initiating reset due to tx timeout
[126295.761042] ixgbe 0000:01:00.1 eno2: Reset adapter

*-network:0
  description: Ethernet interface
  product: Ethernet Controller 10-Gigabit X540-AT2
  vendor: Intel Corporation
  physical id: 0
  bus info: pci@0000:01:00.0
  logical name: eno1
  version: 01
  serial: 00:25:90:fa:60:7e
  size: 10Gbit/s
  width: 64 bits
  clock: 33MHz
  capabilities: pm msi msix pciexpress bus_master cap_list ethernet physical tp 100bt-fd 1000bt-fd autonegotiation
  configuration: autonegotiation=on broadcast=yes driver=ixgbe driverversion=4.0.1-k duplex=full firmware=0x800003e2 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=10Gbit/s
  resources: irq:24 memory:c7800000-c79fffff ioport:6020(size=32) memory:c7a04000-c7a07fff memory:90000000-900fffff memory:90100000-901fffff
*-network:1
  description: Ethernet interface
  product: Ethernet Controller 10-Gigabit X540-AT2
  vendor: Intel Corporation
  physical id: 0.1
  bus info: pci@0000:01:00.1
  logical name: eno2
  version: 01
  serial: 00:25:90:fa:60:7e
  size: 10Gbit/s
  width: 64 bits
  clock: 33MHz
  capabilities: pm msi msix pciexpress bus_master cap_list ethernet physical tp 100bt-fd 1000bt-fd autonegotiation
  configuration: autonegotiation=on broadcast=yes driver=ixgbe driverversion=4.0.1-k duplex=full firmware=0x800003e2 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=10Gbit/s
  resources: irq:50 memory:c7600000-c77fffff ioport:6000(size=32) memory:c7a00000-c7a03fff memory:90200000-902fffff memory:90300000-903fffff
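
To check how often this watchdog message fires and on which queues, a small Python sketch can pull the events out of saved dmesg output. It assumes the message wording shown in the log above ("NETDEV WATCHDOG: <iface> (<driver>): transmit queue <n> timed out"); the function name find_tx_timeouts is hypothetical.

```python
import re

# Pattern for the watchdog line as it appears in the log above. Using
# finditer means it also works when several log lines are run together.
WATCHDOG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NETDEV WATCHDOG: (?P<iface>\S+) "
    r"\((?P<driver>[^)]+)\): transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(dmesg_text):
    """Return one dict per transmit-queue timeout found in dmesg_text."""
    return [
        {
            "timestamp": float(m.group("ts")),
            "iface": m.group("iface"),
            "driver": m.group("driver"),
            "queue": int(m.group("queue")),
        }
        for m in WATCHDOG_RE.finditer(dmesg_text)
    ]


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python tx_timeouts.py
    for event in find_tx_timeouts(sys.stdin.read()):
        print(event)
```

Grouping the results by interface and queue would show whether the timeouts stay on one queue or move around.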

Offtopic here: It's taken me longer to create a login, verify my email 4 times, choose a username twice, accept Intel's broken SSL cert and write this reply than it did for me to find this bug. intel_admin your forums / community is woeful - no wonder no one replies to issues here.

st4
New Contributor III

Hi s_mcleod,

Thank you for the post. We will check on this.

rgds,

wb
