Hi All,
How are you? I hope this is the right place to post.
Below is an issue we are seeing: NVMe errors on an Optane P5800X SSD, PHAL135400VK400BGN, FW rev L0310100. (As far as I can tell, the latest firmware for the P5800X is L0310200.)
Has anyone seen a similar issue and can advise?
- In the kernel log, we see the following just before the system hangs and reboots.
As you can see, the driver lost contact (timeout) with the NVMe controller.
4:23:07 nblab37 kernel: [ 9472.111963] nvme nvme10: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Aug 11 14:23:54 nblab37 kernel: [ 9506.768941] watchdog: BUG: soft lockup - CPU#32 stuck for 22s! [kworker/32:0:65627]
Aug 11 14:23:54 nblab37 kernel: [ 9515.090600] nvme nvme9: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Aug 11 14:23:54 nblab37 kernel: [ 9518.848748] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache binfmt_misc nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm input_leds joydev rndis_host cdc_ether usbnet mii isst_if_mbox_pci nbimpu(O) isst_if_mmio isst_if_common mei_me mei ioatdma wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter mac_hid sch_fq_codel sunrpc msr ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear hid_generic usbhid hid ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea crct10dif_pclmul sysfillrect crc32_pclmul sysimgblt fb_sys_fops ghash_clmulni_intel drm aesni_intel ixgbe glue_helper crypto_simd cryptd nvme mdio dca nvme_core ahci i2c_i801 libahci [last unloaded: diag_slim_drv]
Aug 11 14:23:59 nblab37 kernel: [ 9521.264707] CPU: 32 PID: 65627 Comm: kworker/32:0 Tainted: G OE 5.4.0 #3
Aug 11 14:23:59 nblab37 kernel: [ 9521.264707] Hardware name: Supermicro SYS-220U-TNR/X12DPU-6, BIOS 1.1 08/12/2021
Aug 11 14:23:59 nblab37 kernel: [ 9521.264715] Workqueue: events psi_avgs_work
Aug 11 14:23:59 nblab37 kernel: [ 9521.264722] RIP: 0010:_raw_spin_unlock_irqrestore+0x15/0x20
Aug 11 14:23:59 nblab37 kernel: [ 9521.264724] Code: ff 7f 5b 44 89 f0 41 5c 41 5d 41 5e 41 5f 5d c3 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 40 00 48 89 f7 57 9d <0f> 1f 44 00 00 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 49 89 f8 b8 00
Aug 11 14:23:59 nblab37 kernel: [ 9521.264725] RSP: 0018:ffffaba18d33cc80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Aug 11 14:23:59 nblab37 kernel: [ 9521.264726] RAX: 0000000000000001 RBX: ffff9941f143eec0 RCX: 0000000000000000
Aug 11 14:23:59 nblab37 kernel: [ 9521.264727] RDX: ffff9941f143eec8 RSI: 0000000000000246 RDI: 0000000000000246
Aug 11 14:23:59 nblab37 kernel: [ 9521.264727] RBP: ffffaba18d33cc80 R08: ffff9941f5ecc660 R09: 000000000002aa00
Aug 11 14:23:59 nblab37 kernel: [ 9521.264728] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
Aug 11 14:23:59 nblab37 kernel: [ 9521.264728] R13: 0000000000000246 R14: 0000000000000001 R15: 0000000000000001
Aug 11 14:23:59 nblab37 kernel: [ 9522.070031] FS: 0000000000000000(0000) GS:ffff9961fec00000(0000) knlGS:0000000000000000
Aug 11 14:23:59 nblab37 kernel: [ 9522.070033] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 11 14:23:59 nblab37 kernel: [ 9522.070035] CR2: 000000059669e000 CR3: 0000001e55e84001 CR4: 0000000000760ee0
Aug 11 14:23:59 nblab37 kernel: [ 9522.070036] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 11 14:23:59 nblab37 kernel: [ 9522.070037] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 11 14:23:59 nblab37 kernel: [ 9522.070037] PKRU: 55555554
Aug 11 14:23:59 nblab37 kernel: [ 9522.070038] Call Trace:
Aug 11 14:23:59 nblab37 kernel: [ 9522.070040] <IRQ>
Aug 11 14:23:59 nblab37 kernel: [ 9522.070045] __wake_up_common_lock+0x8a/0xc0
Aug 11 14:23:59 nblab37 kernel: [ 9522.070047] __wake_up_sync_key+0x1e/0x30
Aug 11 14:23:59 nblab37 kernel: [ 9522.875353] sock_def_readable+0x40/0x70
Aug 11 14:23:59 nblab37 kernel: [ 9522.875357] __netlink_sendskb+0x42/0x50
Aug 11 14:23:59 nblab37 kernel: [ 9522.875360] netlink_broadcast_filtered+0x332/0x3e0
Aug 11 14:23:59 nblab37 kernel: [ 9522.875361] nlmsg_notify+0xc9/0xe0
Aug 11 14:23:59 nblab37 kernel: [ 9522.875364] ? smp_irq_move_cleanup_interrupt+0xcb/0xd2
Aug 11 14:23:59 nblab37 kernel: [ 9522.875368] rtnl_notify+0x34/0x40
Aug 11 14:23:59 nblab37 kernel: [ 9522.875371] __neigh_notify+0x86/0xd0
Aug 11 14:23:59 nblab37 kernel: [ 9522.875373] ? neigh_periodic_work+0x220/0x220
Aug 11 14:23:59 nblab37 kernel: [ 9522.875375] neigh_timer_handler+0xaa/0x280
Aug 11 14:23:59 nblab37 kernel: [ 9522.875377] call_timer_fn+0x32/0x130
Aug 11 14:23:59 nblab37 kernel: [ 9522.875378] __run_timers.part.0+0x180/0x280
Aug 11 14:23:59 nblab37 kernel: [ 9523.680682] run_timer_softirq+0x2a/0x50
Aug 11 14:23:59 nblab37 kernel: [ 9523.680685] __do_softirq+0xd1/0x2c1
Aug 11 14:23:59 nblab37 kernel: [ 9523.680689] irq_exit+0xae/0xb0
Aug 11 14:23:59 nblab37 kernel: [ 9523.680690] smp_apic_timer_interrupt+0x7b/0x140
Aug 11 14:23:59 nblab37 kernel: [ 9523.680692] apic_timer_interrupt+0xf/0x20
Aug 11 14:23:59 nblab37 kernel: [ 9523.680693] </IRQ>
Aug 11 14:23:59 nblab37 kernel: [ 9523.680694] RIP: 0010:mutex_lock+0x0/0x40
Aug 11 14:23:59 nblab37 kernel: [ 9523.680695] Code: a6 4d 62 ff 66 0f 1f 44 00 00 0f 1f 44 00 00 55 be 02 00 00 00 48 89 e5 e8 fd fa ff ff 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 <0f> 1f 44 00 00 55 48 89 e5 41 54 49 89 fc e8 8d de ff ff 31 c0 65
Aug 11 14:23:59 nblab37 kernel: [ 9523.680696] RSP: 0018:ffffaba1a2a5be28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Aug 11 14:23:59 nblab37 kernel: [ 9523.680698] RAX: 0000000000000000 RBX: ffff9941f67e9428 RCX: ffff9940570e9430
Aug 11 14:23:59 nblab37 kernel: [ 9523.680698] RDX: 0000000000000001 RSI: ffff9941fec210b0 RDI: ffff9941f67e93c8
Aug 11 14:23:59 nblab37 kernel: [ 9523.680699] RBP: ffffaba1a2a5be60 R08: 000073746e657665 R09: 8080808080808080
Aug 11 14:23:59 nblab37 kernel: [ 9523.680700] R10: ffff99605a2cef6c R11: 0000000000000018 R12: ffff9941f67e9428
Aug 11 14:23:59 nblab37 kernel: [ 9523.680700] R13: ffff9941f67e93c8 R14: 0000000000000000 R15: ffff99605a2cef00
Aug 11 14:23:59 nblab37 kernel: [ 9523.680703] ? psi_avgs_work+0x32/0xd0
Aug 11 14:23:59 nblab37 kernel: [ 9523.680705] process_one_work+0x1eb/0x3b0
Aug 11 14:23:59 nblab37 kernel: [ 9523.680707] worker_thread+0x4d/0x400
Aug 11 14:23:59 nblab37 kernel: [ 9524.486013] kthread+0x104/0x140
Aug 11 14:23:59 nblab37 kernel: [ 9524.486015] ? process_one_work+0x3b0/0x3b0
Aug 11 14:23:59 nblab37 kernel: [ 9524.486016] ? kthread_park+0x90/0x90
Aug 11 14:23:59 nblab37 kernel: [ 9524.486017] ret_from_fork+0x1f/0x40
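A value of all ones in both registers (CSTS=0xffffffff, PCI_STATUS=0xffff) usually means that reads to the device are no longer being answered, i.e. the controller has dropped off the PCIe bus, rather than the controller reporting a real status. Below is a minimal Python sketch for pulling these events out of a saved kernel log. The function name `find_controller_resets` is illustrative, and the regular expression is only assumed to match the message format seen in this log.

```python
import re

# Matches the nvme driver message from the log above (format assumed from this thread).
RESET_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): controller is down; will reset: "
    r"CSTS=(?P<csts>0x[0-9a-fA-F]+), PCI_STATUS=(?P<pci>0x[0-9a-fA-F]+)"
)


def find_controller_resets(lines):
    """Return one dict per 'controller is down' message found in the log lines."""
    events = []
    for line in lines:
        match = RESET_RE.search(line)
        if not match:
            continue
        csts = int(match.group("csts"), 16)
        pci_status = int(match.group("pci"), 16)
        events.append({
            "controller": match.group("ctrl"),
            "csts": csts,
            "pci_status": pci_status,
            # All ones in both registers suggests the device stopped answering reads.
            "all_ones": csts == 0xFFFFFFFF and pci_status == 0xFFFF,
        })
    return events


# Two lines taken from the kernel log in this post.
sample = [
    "nblab37 kernel: [ 9472.111963] nvme nvme10: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff",
    "Aug 11 14:23:54 nblab37 kernel: [ 9515.090600] nvme nvme9: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff",
]
for event in find_controller_resets(sample):
    print(event["controller"], "all_ones" if event["all_ones"] else "status_reported")
```

Running the same function over the full log (for example, the lines of a saved syslog file) shows which controllers dropped and in what order.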
- There are also errors in the NVMe error-log (added below)
nvme error-log /dev/nvme10
Error Log Entries for device:nvme10 entries:64
.................
Entry[ 0]
.................
error_count : 1
sqid : 129
cmdid : 0xffff
status_field : 0xc00c(INTERNAL: The command was not completed successfully due to an internal error)
parm_err_loc : 0xffff
lba : 0
nsid : 0xffffffff
vs : 0
cs : 0
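The status_field value 0xc00c can be broken down by hand. In the NVMe error log entry, bit 0 of this 16-bit field is the phase tag and bits 15:1 carry the status itself. The sketch below decodes it under that layout; the function name `decode_status_field` is illustrative and not part of nvme-cli.

```python
def decode_status_field(raw):
    """Split the 16-bit status_field of an NVMe error log entry into its parts."""
    status = raw >> 1  # bit 0 is the phase tag; the status proper is bits 15:1
    return {
        "phase": raw & 0x1,
        "sc": status & 0xFF,                # Status Code
        "sct": (status >> 8) & 0x7,         # Status Code Type (0 = generic command status)
        "crd": (status >> 11) & 0x3,        # Command Retry Delay
        "more": bool((status >> 13) & 0x1),  # More information available
        "dnr": bool((status >> 14) & 0x1),   # Do Not Retry
    }


fields = decode_status_field(0xC00C)
print(fields)
```

For 0xc00c this gives a generic status code of 0x06 (Internal Error, matching the text nvme-cli printed) with both the More and Do Not Retry bits set.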
- The SSD model of the Xiphos is PHAL135400VK400BGN - FW rev L0310100
nblab62:~> sudo nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     PHAL135400VK400BGN   INTEL SSDPF21Q400GB                      1         400.09 GB / 400.09 GB      4 KiB + 0 B      L0310100
/dev/nvme10n1    PHAL11110060400AGN   INTEL SSDPF21Q400GB                      1         400.09 GB / 400.09 GB      4 KiB + 0 B      L0310100
/dev/nvme11n1    PHAL135400VT400BGN   INTEL SSDPF21Q400GB                      1         400.09 GB / 400.09 GB      4 KiB + 0 B      L0310100
/dev/nvme12n1    PHAL135401LT400BGN   INTEL SSDPF21Q400GB                      1         400.09 GB / 400.09 GB      4 KiB + 0 B      L0310100
/dev/nvme13n1    81N0A0C5T5M8         KCD6XLUL960G                             1         571.93 GB / 960.20 GB      512 B + 0 B      0106
/dev/nvme14n1    81N0A0BQT5M8         KCD6XLUL960G                             1         167.47 GB / 960.20 GB      512 B + 0 B      0106
/dev/nvme15n1    S546NE0R600370       SAMSUNG MZWLJ7T6HALA-00007               1         7.68 TB / 7.68 TB          512 B + 0 B      EPK9AB5Q
/dev/nvme16n1    PHAL135400Q7400BGN   INTEL SSDPF21Q400GB                      1         400.09 GB / 400.09 GB      4 KiB + 0 B      L0310100
/dev/nvme17n1    PHAL135400PS400BGN   INTEL SSDPF21Q400GB                      1         400.09 GB / 400.09 GB      4 KiB + 0 B      L0310100
/dev/nvme18n1    PHAL135400N3400BGN   INTEL SSDPF21Q400GB                      1         400.09 GB / 400.09 GB      4 KiB + 0 B      L0310100
/dev/nvme1n1     PHAL135400VZ400BGN   INTEL SSDPF21Q400GB                      1         400.09 GB / 400.09 GB      4 KiB + 0 B      L0310100
/dev/nvme2n1     PHAL135400S5400BGN   INTEL SSDPF21Q400GB                      1         400.09 GB / 400.09 GB      4 KiB + 0 B      L0310100
/dev/nvme3n1     PHAL135401NB400BGN   INTEL SSDPF21Q400GB                      1         400.09 GB / 400.09 GB      4 KiB + 0 B      L0310100
/dev/nvme4n1     PHAL135401P8400BGN   INTEL SSDPF21Q400GB                      1         400.09 GB / 400.09 GB      4 KiB + 0 B      L0310100
/dev/nvme5n1     PHAL135400KS400BGN   INTEL SSDPF21Q400GB                      1         400.09 GB / 400.09 GB      4 KiB + 0 B      L0310100
/dev/nvme6n1     PHAL135400MD400BGN   INTEL SSDPF21Q400GB                      1         400.09 GB / 400.09 GB      4 KiB + 0 B      L0310100
/dev/nvme7n1     PHAL135400RF400BGN   INTEL SSDPF21Q400GB                      1         400.09 GB / 400.09 GB      4 KiB + 0 B      L0310100
/dev/nvme8n1     PHAL135400NQ400BGN   INTEL SSDPF21Q400GB                      1         400.09 GB / 400.09 GB      4 KiB + 0 B      L0310100
/dev/nvme9n1     PHAL135400U7400BGN   INTEL SSDPF21Q400GB                      1         400.09 GB / 400.09 GB      4 KiB + 0 B      L0310100
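With this many drives, it can help to check mechanically which ones are still on the old firmware. The following is a rough Python sketch that scans `nvme list` text output. It assumes the node is the first token, the serial number is the second, and the firmware revision is the last token on each row, which holds for the listing above but may not hold for every nvme-cli version. The function name and the two constants are illustrative.

```python
TARGET_FW = "L0310200"         # firmware revision mentioned in this thread
MODEL_TOKEN = "SSDPF21Q400GB"  # model string of the Optane drives in the listing above


def drives_needing_update(nvme_list_lines, model_token=MODEL_TOKEN, target_fw=TARGET_FW):
    """Return (node, serial, firmware) for matching drives not on the target firmware."""
    outdated = []
    for line in nvme_list_lines:
        fields = line.split()
        # Data rows start with a /dev/ node; skip the header and separator rows.
        if len(fields) < 4 or not fields[0].startswith("/dev/"):
            continue
        node, serial, firmware = fields[0], fields[1], fields[-1]
        if model_token in fields and firmware != target_fw:
            outdated.append((node, serial, firmware))
    return outdated


# Two rows copied from the listing in this post.
sample = [
    "/dev/nvme9n1 PHAL135400U7400BGN INTEL SSDPF21Q400GB 1 400.09 GB / 400.09 GB 4 KiB + 0 B L0310100",
    "/dev/nvme13n1 81N0A0C5T5M8 KCD6XLUL960G 1 571.93 GB / 960.20 GB 512 B + 0 B 0106",
]
print(drives_needing_update(sample))
```

Splitting on whitespace keeps the sketch independent of column alignment, at the cost of relying on serial numbers and firmware revisions containing no spaces.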
Thanks,
Aviv G.
Hello, AvivGraupen.
Thank you for posting on the Intel Community Support Forum.
We received your ticket regarding this particular error with the Optane SSD, and I will be reviewing this with you.
We have no details on this exact problem, but we always recommend making sure the drive is running the latest firmware version.
1. Can you update the firmware using the Intel Memory and Storage Tool (Intel MAS) CLI and check if the error persists? Commands:
- "intelmas show -intelssd": Displays all connected Intel drives and the index of each.
- "intelmas load -intelssd X": Updates the firmware of the drive; replace X with the drive index.
2. If the error persists, please use the Intel MAS CLI to generate the "SMART", "Health", and "Show All" reports and share them with us:
- "intelmas show -smart -intelssd X": Replace X with the correct drive index.
- "intelmas show -a -intelssd X": Displays all the drive details.
- "intelmas show -nvmelog SmartHealthInfo -intelssd X"
3. Has this been tested on other systems, or with other OS versions or distributions?
4. How exactly is the drive connected to the system?
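With many drives in one system, the three report commands from step 2 can be generated per drive index instead of typed by hand. Below is a minimal Python sketch that only builds and prints the command lines quoted above. The function name `intelmas_report_commands` is illustrative, and the drive index 0 is a placeholder for whatever index "intelmas show -intelssd" reports.

```python
import shlex


def intelmas_report_commands(drive_index):
    """Build the three intelmas report command lines for one drive index."""
    idx = str(drive_index)
    return [
        ["intelmas", "show", "-smart", "-intelssd", idx],
        ["intelmas", "show", "-a", "-intelssd", idx],
        ["intelmas", "show", "-nvmelog", "SmartHealthInfo", "-intelssd", idx],
    ]


# Print the commands instead of running them, so they can be reviewed first.
# Index 0 is a placeholder, not a value taken from this thread.
for argv in intelmas_report_commands(0):
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run` once the commands have been checked, assuming intelmas is installed and on the PATH.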
I will follow up on August 19th in case additional time is required.
Regards,
Bruce C.
Intel Customer Support Technician
Hello, AvivGraupen.
I wanted to follow up on your thread in case you had any questions regarding my previous message.
I will follow up again on August 24th in case additional time is required.
Regards,
Bruce C.
Intel Customer Support Technician
Hi,
Thank you. For now, the Optane SSD engineering team has given us a new firmware (L0310200) for the end customer to install, along with a new BIOS (1.4) for the Supermicro server. We will update with the results once we have them.
Thanks,
Aviv G.
Hello, AvivGraupen.
Thank you for letting me know.
I hope you encounter no problems with the new firmware provided.
I will keep the thread open and will follow up on August 25th just in case.
Regards,
Bruce C.
Intel Customer Support Technician
Hello, AvivGraupen.
This post is to follow up on the status of your thread and check if everything is working fine.
I will follow up again on August 30th to provide additional time.
Regards,
Bruce C.
Intel Customer Support Technician
Hi All,
Last Sunday, NB (the end customer) tested firmware L0310200 on the Optane SSD P5800X, without updating the Supermicro server BIOS. So far they have reported no kernel crashes, so it seems that updating the firmware solved the issue.
If any issues appear, I will update again.
Thank you.
Aviv G.
Hello, AvivGraupen.
Good day,
Thank you for letting us know.
I'm glad to hear that everything has been working fine so far.
I will follow up on August 31st just in case.
Regards,
Bruce C.
Intel Customer Support Technician
Hello, AvivGraupen.
Good day,
This post is just a quick follow up on the status of your thread.
I will follow up again on September 5th to provide additional time in case you want to keep the thread open.
Regards,
Bruce C.
Intel Customer Support Technician
Hi All,
The end customer has reported no kernel crashes, so it seems that updating the firmware to L0310200 solved the issue.
If anything changes, I will let you know.
Thank you.
Aviv G
Hello, AvivGraupen.
Thank you for letting us know. I'm glad to hear that no more issues showed up.
Since that is the case, this thread will now be closed and will no longer be monitored by Intel support. If you require any assistance from Intel in the future, please open a new thread and reference this one, or contact us using any of the available support methods:
- https://www.intel.com/content/www/us/en/support/contact-intel.html
Regards,
Bruce C.
Intel Customer Support Technician
