We are experiencing persistent I/O request timeouts on Linux with P4500/P4510 NVMe drives.
The affected systems run RHEL 7.5 with kernel version 3.10.0-957.1.3.el7.x86_64.
The P4500 drives are on firmware version DV10170.
The P4510 drives are on firmware version VDV10131.
Both systems have 15 drives; one has 10 P4500 drives and 5 P4510 drives, and the other has 15 P4510 drives.
This is the kernel log output for one of the drives; all drives show the same errors:
nvme nvme14: I/O 487 QID 83 timeout, aborting
nvme nvme14: Abort status: 0x0
nvme nvme14: I/O 487 QID 83 timeout, reset controller
INFO: task dmcrypt_write:2573 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Attached is the output of ssu.sh on the system with 10 P4500 and 5 P4510 drives.
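
For anyone who wants to check how the timeouts are spread across drives, here is a minimal sketch that tallies the timeout messages per controller. It assumes the kernel log lines follow exactly the format quoted above; the function name count_timeouts and the regular expression are illustrative and not part of any existing tool.

```python
import re
from collections import Counter

# Matches lines such as "nvme nvme14: I/O 487 QID 83 timeout, aborting".
# The pattern is an assumption based only on the log lines quoted in this post.
TIMEOUT_RE = re.compile(r"nvme (nvme\d+): I/O (\d+) QID (\d+) timeout, (.+)")


def count_timeouts(lines):
    """Count timeout messages, keyed by (controller, action taken)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            controller, _tag, _qid, action = match.groups()
            counts[(controller, action.strip())] += 1
    return counts


if __name__ == "__main__":
    # Sample input copied from the log excerpt in this post.
    sample = [
        "nvme nvme14: I/O 487 QID 83 timeout, aborting",
        "nvme nvme14: Abort status: 0x0",
        "nvme nvme14: I/O 487 QID 83 timeout, reset controller",
    ]
    for (controller, action), n in sorted(count_timeouts(sample).items()):
        print(controller, action, n)
```

The same function could be fed the lines of a saved dmesg or /var/log/messages capture. Lines that do not match the pattern, such as the "Abort status" line, are skipped.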