Due to HERE, we need to upgrade our Intel X710 NICs from firmware 6.01 to 6.80. I'm trying to do this with the official update tool from the Intel website. During the update, the server hits a kernel panic and reboots. After the reboot, the firmware is updated, however. I have attached the crash dump from the kernel panic as well as the log the update program created.
Obviously this is not acceptable: I need to be able to install the firmware and reboot at a later time.
Any tip or help is greatly appreciated.
[15:16:22] root@<servername> ~ #
[347584.120173] Kernel panic - not syncing: 00: An NMI occurred. Depending on your system the reason for the NMI is logged in any one of the following resources:
[347584.120173] 1. Integrated Management Log (IML)
[347584.120173] 2. OA Syslog
[347584.120173] 3. OA Forward Progress Log
[347584.120173] 4. iLO Event Log
[347584.258142] CPU: 0 PID: 0 Comm: swapper/0 ve: 0 Tainted: G OE ------------ 3.10.0-1062.4.2.vz7.116.7 #1 116.7
[347584.312085] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 05/24/2019
[347584.343782] Call Trace:
[347584.355497] <NMI> [<ffffffffae3acae2>] dump_stack+0x19/0x1b
[347584.383154] [<ffffffffae3a5bb8>] panic+0xe8/0x21f
[347584.401024] i40e 0000:07:00.0: Error I40E_AQ_RC_EINVAL adding RX filters on PF, promiscuous mode forced on
[347584.419433] bond0: link status definitely up for interface transfer0, 10000 Mbps full duplex
[347584.419451] bond0: first active interface up!
[347584.513762] [<ffffffffadc9b97f>] nmi_panic+0x3f/0x40
[347584.537354] [<ffffffffc06ce3df>] hpwdt_pretimeout+0x6f/0xb0 [hpwdt]
[347584.566974] [<ffffffffae3b7a0c>] nmi_handle.isra.0+0x8c/0x150
[347584.594184] [<ffffffffae3b61bb>] ? save_paranoid+0xfb/0x140
[347584.615264] i40e 0000:07:00.1 transfer1: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
[347584.663018] [<ffffffffae3b7c94>] do_nmi+0x1c4/0x460
[347584.686556] [<ffffffffadc30d61>] ? is_ISA_range+0x1/0x20
[347584.712394] [<ffffffffae3b6daf>] end_repeat_nmi+0x1e/0x81
[347584.719428] bond0: link status definitely up for interface transfer1, 10000 Mbps full duplex
[347584.778384] [<ffffffffae3b4cf4>] ? intel_idle+0xd4/0x225
[347584.804509] [<ffffffffae3b4cf4>] ? intel_idle+0xd4/0x225
[347584.830202] [<ffffffffae3b4cf4>] ? intel_idle+0xd4/0x225
[347584.857389] <EOE> [<ffffffffae1f0825>] cpuidle_enter_state+0x45/0xd0
[347584.888669] [<ffffffffae1f098e>] cpuidle_idle_call+0xde/0x230
[347584.916457] [<ffffffffadc3833e>] arch_cpu_idle+0xe/0xc0
[347584.941577] [<ffffffffadd06b6a>] cpu_startup_entry+0x14a/0x1e0
[347584.969434] [<ffffffffae39aff7>] rest_init+0x77/0x80
[347584.992989] [<ffffffffae99c1da>] start_kernel+0x45f/0x480
[347585.018947] [<ffffffffae99bb7b>] ? repair_env_string+0x5c/0x5c
[347585.047655] [<ffffffffae99b120>] ? early_idt_handler_array+0x120/0x120
[347585.079859] [<ffffffffae99b72f>] x86_64_start_reservations+0x24/0x26
[347585.111871] [<ffffffffae99b885>] x86_64_start_kernel+0x154/0x177
[347585.142131] [<ffffffffadc000d5>] start_cpu+0x5/0x14
[347585.204565] Kernel Offset: 0x2cc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[347585.264217] [Firmware Warn]: ERST: Firmware does not respond in time
[347585.295900] Rebooting in 10 seconds..
Intel(R) Ethernet NVM Update Tool
NVMUpdate version 126.96.36.199
Copyright (C) 2013 - 2018 Intel Corporation.
./nvmupdate64e -l x710.log -b
Config file read.
Config file doesn't have any OROM components specified for device 'XL710'. Tool will use current device's combo set for the OROM update.
Inventory
[00:007:00:00]: Intel(R) Ethernet Converged Network Adapter X710-2
Flash inventory started.
Shadow RAM inventory started.
Alternate MAC address is not set.
Shadow RAM inventory finished.
Flash inventory finished.
OROM inventory started.
OROM inventory finished.
[00:007:00:01]: Intel(R) Ethernet Converged Network Adapter X710
Device already inventoried.
Update
[00:007:00:00]: Intel(R) Ethernet Converged Network Adapter X710-2
Creating backup images in directory: 3CFDFEE819CC.
Backup images created.
Flash update started.
Thank you for contacting Intel Customer Support!
Please share the following information so that we can look into your query.
1.) The PBA number of the adapter, so that we can identify whether you are using an OEM or retail version of the Intel Ethernet adapter. You may refer to the link below on where to find the PBA number. You may also provide photos of the adapter showing the markings (white sticker) on the physical card so that we can double-check it. The PBA is a 6-digit number located at the end of the serial number.
2.) You mentioned that during the update the server generates a kernel panic and reboots, but that the firmware is updated afterwards.
Have you double-checked that the NIC's firmware is still at 6.80 after the reboot?
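One way to check this from the running system is `ethtool -i <interface>`, which prints a `firmware-version:` line for the driver. The following is a minimal Python sketch, assuming `ethtool` is installed and that the interface is named `transfer0` as in the log above; the helper names `parse_firmware_version` and `read_firmware_version` are made up for illustration.

```python
import subprocess


def parse_firmware_version(ethtool_output):
    """Return the value of the 'firmware-version:' line, or None if absent."""
    for line in ethtool_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "firmware-version":
            return value.strip()
    return None


def read_firmware_version(interface):
    """Run `ethtool -i <interface>` and return the reported firmware version.

    Assumes ethtool is installed and the interface exists.
    """
    result = subprocess.run(
        ["ethtool", "-i", interface],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_firmware_version(result.stdout)
```

Calling `read_firmware_version("transfer0")` after the reboot should return a string that starts with the firmware version if the update persisted.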
Looking forward to your response
In case we don't hear from you, we'll follow up after 3 business days.
Intel Customer Support
Thank you for your reply. While I was waiting, I did some more googling and found a solution. In THIS Proxmox forum post, someone has the same error/kernel panic (though not caused by a NIC firmware update). There, the HP watchdog timer (the hpwdt kernel module, which also shows up in the call trace as hpwdt_pretimeout) seems to be the problem. Removing hpwdt from the loaded kernel modules fixes my problem, or rather lets the update finish properly! Thank you for your help anyway, this can be marked as resolved.
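For anyone who wants to script this, here is a rough Python sketch of the workaround, not an official procedure. It assumes it is run as root from the directory containing `nvmupdate64e`, reuses the command line from the update log above, and reloads hpwdt after the update, which is my own addition. The function names are made up for illustration.

```python
import subprocess


def is_module_loaded(name, proc_modules_text):
    """Check a /proc/modules-style listing for a module name (first column)."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def update_commands(module="hpwdt"):
    """Commands to run as root: unload the watchdog, update, reload it.

    Reloading the module afterwards is an assumption, not part of the
    original fix.
    """
    return [
        ["modprobe", "-r", module],
        ["./nvmupdate64e", "-l", "x710.log", "-b"],
        ["modprobe", module],
    ]


def run_update(dry_run=True):
    """Print the commands, and execute them only if dry_run is False."""
    with open("/proc/modules") as f:
        loaded = is_module_loaded("hpwdt", f.read())
    # If hpwdt is not loaded, only the update command itself is needed.
    commands = update_commands() if loaded else update_commands()[1:2]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

With the default `dry_run=True`, `run_update()` only prints what it would do, so the sequence can be reviewed before anything touches the NIC.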
We appreciate your effort in sharing the resolution. This will be a great help for other users having the same problem. We are also glad to hear that the issue is now resolved. With this, please be advised that we will now close this request. If you have any other inquiries in the future, please do not hesitate to post a new question.
May you have a lovely day!
Intel Customer Support