Beginner
1,700 Views

i40e / X710-DA2 segfault on Ubuntu 16.04

Hello!

I have a problem running an X710-DA2 on my servers: the kernel crashes when I load the i40e driver. It happens both with the stock firmware and driver and with the upgraded versions.

Platform: Supermicro X9DRW with dual Intel(R) Xeon(R) CPU E5-2620, latest BIOS

OS: Ubuntu 16.04.1 LTS, linux 4.4.0-57

NIC firmware: fw 5.0.40043 api 1.5 nvm 5.04 0x800024c6 0.0.0 (latest)

i40e driver: 1.5.25 (latest, downloaded and compiled)

Modules installed: GBC Photonics SP-MM85030D-GP -SFP+

dmesg:

Jan 3 18:01:58 ceph6 kernel: [ 739.510036] i40e: Intel(R) 40-10 Gigabit Ethernet Connection Network Driver - version 1.5.25

Jan 3 18:01:58 ceph6 kernel: [ 739.510041] i40e: Copyright(c) 2013 - 2016 Intel Corporation.

Jan 3 18:01:58 ceph6 kernel: [ 739.527324] i40e 0000:04:00.0: fw 5.0.40043 api 1.5 nvm 5.04 0x800024c6 0.0.0

Jan 3 18:01:58 ceph6 kernel: [ 739.765165] i40e 0000:04:00.0: MAC address: 3c:fd:fe:a2:19:54

Jan 3 18:01:58 ceph6 kernel: [ 739.789909] i40e 0000:04:00.0: AQ command Config VSI BW allocation per TC failed = 14

Jan 3 18:01:58 ceph6 kernel: [ 739.789912] i40e 0000:04:00.0: Failed configuring TC map 255 for VSI 390

Jan 3 18:01:58 ceph6 kernel: [ 739.789915] i40e 0000:04:00.0: failed to configure TCs for main VSI tc_map 0x000000ff, err I40E_ERR_INVALID_QP_ID aq_err I40E_AQ_RC_EINVAL

Jan 3 18:01:59 ceph6 kernel: [ 739.833189] divide error: 0000 [#1] SMP

Jan 3 18:01:59 ceph6 kernel: [ 739.833324] Modules linked in: i40e(OE+) vxlan ip6_udp_tunnel udp_tunnel intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper input_leds joydev sb_edac cryptd serio_raw edac_core ipmi_si mei_me 8250_fintek mei ipmi_msghandler shpchp ioatdma lpc_ich mac_hid autofs4 hid_generic usbhid hid psmouse isci igb ahci libsas libahci dca ptp scsi_transport_sas megaraid_sas pps_core i2c_algo_bit wmi fjes

Jan 3 18:01:59 ceph6 kernel: [ 739.835034] CPU: 0 PID: 2386 Comm: insmod Tainted: G OE 4.4.0-57-generic #78-Ubuntu

Jan 3 18:01:59 ceph6 kernel: [ 739.835306] Hardware name: Supermicro X9DRW/X9DRW, BIOS 3.0c 03/24/2014

Jan 3 18:01:59 ceph6 kernel: [ 739.835518] task: ffff880868b9f000 ti: ffff88046c1c0000 task.ti: ffff88046c1c0000

Jan 3 18:01:59 ceph6 kernel: [ 739.835754] RIP: 0010:[] [] i40e_pf_config_rss+0x1ef/0x230 [i40e]

Jan 3 18:01:59 ceph6 kernel: [ 739.836059] RSP: 0018:ffff88046c1c37a0 EFLAGS: 00010246

Jan 3 18:01:59 ceph6 kernel: [ 739.836227] RAX: 0000000000000000 RBX: ffff88086bd33c00 RCX: 0000000000000000

Jan 3 18:01:59 ceph6 kernel: [ 739.836452] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000200

Jan 3 18:01:59 ceph6 kernel: [ 739.836679] RBP: ffff88046c1c3808 R08: ffff88046fc1a120 R09: ffff88046f8032c0

Jan 3 18:01:59 ceph6 kernel: [ 739.836904] R10: ffff88086bd33c00 R11: 0000000000000000 R12: 0000000000000000

Jan 3 18:01:59 ceph6 kernel: [ 739.837130] R13: ffff88046da74008 R14: ffff88046c099000 R15: ffff88046da74000

Jan 3 18:01:59 ceph6 kernel: [ 739.837359] FS: 00007f5815768700(0000) GS:ffff88046fc00000(0000) knlGS:0000000000000000

Jan 3 18:01:59 ceph6 kernel: [ 739.837615] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

Jan 3 18:01:59 ceph6 kernel: [ 739.837796] CR2: 00007fe8a4fcc13c CR3: 000000046a7f2000 CR4: 00000000000406f0

Jan 3 18:01:59 ceph6 kernel: [ 739.838022] Stack:

Jan 3 18:01:59 ceph6 kernel: [ 739.838085] 0000000000000005 00000000001c0ac0 00000000000e0000 ffff88046c1c37e8

Jan 3 18:01:59 ceph6 kernel: [ 739.838335] ffffffffc03b9e39 ffff88046da74f28 ffff88046da74008 00000000ffd84a52

Jan 3 18:01:59 ceph6 kernel: [ 739.847061] ffff88046da74000 0000000000000000 ffff88046da74008 0000000000000000

Jan 3 18:01:59 ceph6 kernel: [ 739.855800] Call Trace:

Jan 3 18:01:59 ceph6 kernel: [ 739.864529] [] ? i40e_write_rx_ctl+0x39/0x90 [i40e]

Jan 3 18:01:59 ceph6 kernel: [ 739.873487] [] i40e_setup_pf_switch+0x308/0x590 [i40e]

Jan 3 18:01:59 ceph6 kernel: [ 739.882566] [] i40e_probe.part.58+0xd50/0x1be0 [i40e]

Jan 3 18:01:59 ceph6 kernel: [ 739.891572] [] ? radix_tree_lookup+0xd/0x10

Jan 3 18:01:59 ceph6 kernel: [ 739.900540] [] ? irq_to_desc+0x17/0x20

Jan 3 18:01:59 ceph6 kernel: [ 739.909424] [] ? irq_get_irq_data+0xe/0x20

Jan 3 18:01:59 ceph6 kernel: [ 739.918278] [] ? mp_map_pin_to_irq+0xb5/0x300

Jan 3 18:01:59 ceph6 kernel: [ 739.927153] [] ? acpi_ut_remove_reference+0x2e/0x31

Jan 3 18:01:59 ceph6 kernel: [ 739.936072] [] ? __slab_free+0xcb/0x2c0

Jan 3 18:01:59 ceph6 kernel: [ 739.944972] [] ? mp_map_gsi_to_irq+0x98/0xc0

Jan 3 18:01:59 ceph6 kernel: [ 739.953757] [] ? acpi_register_gsi_ioapic+0xbe/0x180

Jan 3 18:01:59 ceph6 kernel: [ 739.962466] [] ? acpi_pci_irq_enable+0x1bf/0x1e4

Jan 3 18:01:59 ceph6 kernel: [ 739.971114] [] ? pci_conf1_read+0xb8/0xf0

Jan 3 18:01:59 ceph6 kernel: [ 739.979739] [] ? raw_pci_read+0x23/0x40

Jan 3 18:01:59 ceph6 kernel: [ 739.988340] [] ? pci_bus_read_config_word+0x9c/0xb0

Jan 3 18:01:59 ceph6 kernel: [ 739.996976] [] ? do_pci_enable_device+0xdd/0x110

Jan 3 18:01:59 ceph6 kernel: [ 740.005459] [] ? pci_enable_device_flags+0xe4/0x130

Jan 3 18:01:59 ceph6 kernel: [ 740.013867] [] i40e_probe+0x1e/0x30 [i40e]

Jan 3 18:01:59 ceph6 kernel: [ 740.022228] [] local_pci_probe+0x45/0xa0

Jan 3 18:01:59 ceph6 kernel: [ 740.030571] [] pci_device_probe+0x103/0x150

Jan 3 18:01:59 ceph6 kernel: [ 740.038806] [] driver_probe_device+0x222/0x4a0

Jan 3 18:01:59 ceph6 kernel: [ 740.046946] [] __driver_attach+0x84/0x90

Jan 3 18:01:59 ceph6 kernel: [ 740.055009] [] ? driver_probe_device+0x4a0/0x4a0

Jan 3 18:01:59 ceph6 kernel: [ 740.063118] [] bus_for_each_dev+0x6c/0xc0

Jan 3 18:01:59 ceph6 kernel: [ 740.071205] [] driver_attach+0x1e/0x20

Jan 3 18:01:59 ceph6 kernel: [ 740.079029] [] bus_add_driver+0x1eb/0x280

Jan 3 18:01:59 ceph6 kernel: [ 740.086629] [] ? 0xffffffffc01ea000

Jan 3 18:01:59 ceph6 kernel: [ 740.093977] [] driver_register+0x60/0xe0

Jan 3 18:01:59 ceph6 kernel: [ 740.101076] [] __pci_register_driver+0x4c/0x50

Jan 3 18:01:59 ceph6 kernel: [ 740.107997] [] i40e_init_module+0xa6/0x1000 [i40e]

Jan 3 18:01:59 ceph6 kernel: [ 740.114831] [] do_one_initcall+0xb3/0x200

Jan 3 18:01:59 ceph6 kernel: [ 740.121523] [] ? kmem_cache_alloc_trace+0x183/0x1f0

Jan 3 18:01:59 ceph6 kernel: [ 740.128183] [] do_init_module+0x5f/0x1cf

Jan 3 18:01:59 ceph6 kernel: [ 740.134706] [] load_module+0x166f/0x1c10

Jan 3 18:01:59 ceph6 kernel: [ 740.141064] [] ? __symbol_put+0x60/0x60

Jan 3 18:01:59 ceph6 kernel: [ 740.147352] [] ? kernel_read+0x50/0x80

Jan 3 18:01:59 ceph6 kernel: [ 740.153662] [<ffffffff8110b...
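
For readers triaging a trace like the one above, here is a minimal Python sketch that pulls out the three facts that matter most: the kind of fault, the function and module named on the RIP line, and the process that triggered it. The regular expressions and the helper name `summarize_oops` are illustrative assumptions written against the lines in this post, not a general oops parser.

```python
import re

# Three lines taken from the log in this post.
LOG = [
    "Jan 3 18:01:59 ceph6 kernel: [ 739.833189] divide error: 0000 [# 1] SMP",
    "Jan 3 18:01:59 ceph6 kernel: [ 739.835034] CPU: 0 PID: 2386 Comm: insmod "
    "Tainted: G OE 4.4.0-57-generic # 78-Ubuntu",
    "Jan 3 18:01:59 ceph6 kernel: [ 739.835754] RIP: 0010:[] [] "
    "i40e_pf_config_rss+0x1ef/0x230 [i40e]",
]

# "<fault name>: <4 hex digits> [#N]" right after the timestamp bracket.
ERROR_RE = re.compile(r"\]\s+([A-Za-z ]+?): [0-9a-f]{4} \[#\s*\d+\]")
# "symbol+0xoffset/0xsize [module]" on the RIP line.
RIP_RE = re.compile(
    r"RIP: .*?([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?: \[(\w+)\])?"
)
COMM_RE = re.compile(r"\bComm: (\S+)")


def summarize_oops(lines):
    """Return the fault type, faulting symbol, module and process name."""
    summary = {}
    for line in lines:
        m = ERROR_RE.search(line)
        if m:
            summary.setdefault("error", m.group(1))
        m = RIP_RE.search(line)
        if m:
            summary.setdefault("function", m.group(1))
            summary.setdefault("offset", int(m.group(2), 16))
            summary.setdefault("module", m.group(4))
        m = COMM_RE.search(line)
        if m:
            summary.setdefault("comm", m.group(1))
    return summary


if __name__ == "__main__":
    print(summarize_oops(LOG))
```

On these lines the summary shows a divide error inside `i40e_pf_config_rss` in the `i40e` module, triggered by `insmod`. In other words this is a kernel oops during driver probe rather than a user-space segfault.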

12 Replies
Beginner
38 Views

I've discovered that the crash above occurs only when the physical link is active while the driver is loading. When the interfaces on the switch are disabled, the driver loads successfully and both eth* interfaces are present. Some errors occur when I then enable the switch interfaces with the driver already loaded:

Jan 4 14:50:34 ceph6 kernel: [ 169.326507] i40e 0000:04:00.1: VEB bw config failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EINVAL

Jan 4 14:50:34 ceph6 kernel: [ 169.326515] i40e 0000:04:00.1: Failed configuring TC for VEB seid=161

Jan 4 14:50:34 ceph6 kernel: [ 169.327690] i40e 0000:04:00.1: AQ command Config VSI BW allocation per TC failed = 14

Jan 4 14:50:34 ceph6 kernel: [ 169.327697] i40e 0000:04:00.1: Failed configuring TC map 255 for VSI 391

Jan 4 14:50:34 ceph6 kernel: [ 169.327701] i40e 0000:04:00.1: Failed configuring TC for VSI seid=391

Jan 4 14:50:48 ceph6 kernel: [ 184.011042] i40e 0000:04:00.0: VEB bw config failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EINVAL

Jan 4 14:50:48 ceph6 kernel: [ 184.011050] i40e 0000:04:00.0: Failed configuring TC for VEB seid=160

Jan 4 14:50:48 ceph6 kernel: [ 184.013100] i40e 0000:04:00.0: AQ command Config VSI BW allocation per TC failed = 14

Jan 4 14:50:48 ceph6 kernel: [ 184.013107] i40e 0000:04:00.0: Failed configuring TC map 255 for VSI 390

Jan 4 14:50:48 ceph6 kernel: [ 184.013110] i40e 0000:04:00.0: Failed configuring TC for VSI seid=390

Jan 4 14:52:33 ceph6 kernel: [ 289.006692] i40e 0000:04:00.0 eth2: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None

Jan 4 14:52:55 ceph6 kernel: [ 310.487711] i40e 0000:04:00.1 eth3: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
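
To make the numeric fields in these messages easier to read, here is a small Python sketch that decodes them. It rests on two assumptions. First, that the "TC map" value is a bitmask with one bit per traffic class, which fits the first post's log printing the same value as `tc_map 0x000000ff`. Second, that return code 14 corresponds to `I40E_AQ_RC_EINVAL`, the name the first post's log prints for the same failure. The helper names are hypothetical, and the code table deliberately contains only that one pairing.

```python
import re

# Only the pairing visible in this thread's logs; other codes are not listed.
AQ_RC_NAMES = {14: "I40E_AQ_RC_EINVAL"}

AQ_FAIL_RE = re.compile(r"AQ command (.+?) failed = (\d+)")
TC_MAP_RE = re.compile(r"Failed configuring TC map (\d+) for VSI (\d+)")


def enabled_tcs(tc_map):
    """List the traffic classes whose bit is set (assumes an 8-bit bitmask)."""
    return [tc for tc in range(8) if tc_map & (1 << tc)]


def describe(line):
    """Turn one i40e log line into a short readable summary, or None."""
    m = AQ_FAIL_RE.search(line)
    if m:
        code = int(m.group(2))
        name = AQ_RC_NAMES.get(code, "unknown code")
        return f"{m.group(1)}: {name} ({code})"
    m = TC_MAP_RE.search(line)
    if m:
        return f"VSI {m.group(2)}: traffic classes {enabled_tcs(int(m.group(1)))}"
    return None


if __name__ == "__main__":
    for line in (
        "i40e 0000:04:00.1: AQ command Config VSI BW allocation per TC failed = 14",
        "i40e 0000:04:00.1: Failed configuring TC map 255 for VSI 391",
    ):
        print(describe(line))
```

Read this way, the driver is asking the firmware to enable all eight traffic classes on the main VSI, and the firmware is rejecting the request as invalid.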

Beyond that, I set up a 20 Gbps LACP bond, but the TX rate does not exceed 10 Mbps. RX is fine, at over 5 Gbps. IRQs are balanced across the cores.
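
One way to confirm a TX/RX asymmetry like this per port, independently of any benchmark tool, is to sample the kernel's byte counters under `/sys/class/net/<iface>/statistics/`. The following is a minimal Python sketch; the function names are hypothetical and the one-second interval is an arbitrary choice.

```python
import time
from pathlib import Path


def read_counter(iface, name):
    """Read one statistics counter (e.g. tx_bytes) for a network interface."""
    return int(Path(f"/sys/class/net/{iface}/statistics/{name}").read_text())


def rate_mbps(start_bytes, end_bytes, seconds):
    """Convert a byte-counter delta over an interval into megabits per second."""
    return (end_bytes - start_bytes) * 8 / seconds / 1e6


def sample_rates(iface, seconds=1.0):
    """Measure TX and RX throughput of one interface over a short interval."""
    names = ("tx_bytes", "rx_bytes")
    before = {n: read_counter(iface, n) for n in names}
    time.sleep(seconds)
    after = {n: read_counter(iface, n) for n in names}
    return {n: rate_mbps(before[n], after[n], seconds) for n in names}
```

Calling `sample_rates("eth2")` and `sample_rates("eth3")` (the interface names from the log above) while traffic is flowing would show whether both bond members are limited on TX or only one of them.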

Community Manager
38 Views

Hi Domel,

Thank you for the post. I can see that the module you used is a GBC Photonics SP-MM85030D-GP SFP+, which is not a supported model. Please refer to the URL below for the validated modules:

http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000007045.html

We recommend using a validated and supported fiber module with X710 series network adapters. Could you check further with such a module? Thanks.

 

 

 

Rgds,

wb

Community Manager
38 Views

Hi Domel,

Please feel free to post an update once you have tested with a supported fiber module.

rgds,

wb

Beginner
38 Views

Hi wb,

Thanks for your advice. We need some more time to check that; I'll keep you informed.

Thanks,

Dominik

Community Manager
38 Views

Hi Dominik,

Thank you for the reply. I will wait for your update and hope to hear good news from you.

rgds,

wb

Community Manager
38 Views

Hi Dominik,

Any update? Please feel free to share the result.

Thanks,

wb

Beginner
38 Views

Hi,

In the end we replaced the X710 with an X520-DA2, and it works fine. I don't know what was wrong with the X710 adapters. There were 2-3 occasions when they worked at full speed, but after a reboot the problems appeared again.

Thanks,

Dominik

Community Manager
38 Views

Hi Dominik,

Thank you for the update. Are you saying the X520-DA2 has the same issue after a reboot?

rgds,

wb

Beginner
38 Views

No, with the X520-DA2 everything works fine. Sorry for being imprecise.

Community Manager
38 Views

Hi Dominik,

No worries, and thank you for the clarification. Just to double-check: do you mean the same problem also occurs when a validated fiber module is used on the X710-DA2? And if you are going to use the X520 NIC instead, do you need any further assistance?

Please feel free to update me.

Thanks,

wb

Beginner
38 Views

Hi,

I don't have any module from the supported list, so I couldn't check that. I won't be using the X710 for now, and I don't know whether I will in the future. If I do, I'll let you know. For now you can close this topic.

Thanks!

Dominik

Community Manager
38 Views

Hi Dominik,

Thank you for the update :) Please feel free to contact us if you have other inquiries.

Rgds,

wb