
I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs

SGoel5
New Contributor

Hi,

We are experiencing persistent I/O request timeouts on Linux with P3520/P4600 SSDs. We have tried multiple kernels (3.10, 4.4, 4.9) and see the timeouts on all of them. The P4600 seems to be more prone to these than the P3520, though we see them on the latter as well. We have the latest firmware installed on both drives, which are housed in the same machine (Supermicro 5018R-WR with X10SRW-F motherboard and E5-1650 V4 CPU). We can reproduce the timeouts by simply running mkfs -t xfs on the drive.

Here is the output from isdct (version isdct-3.0.9.400-17.x86_64):

- Intel SSD DC P3520 Series CVPF717100L01P2JGN -

Bootloader : MB1B0105

DevicePath : /dev/nvme0n1

DeviceStatus : Healthy

Firmware : MDV10271

FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

Index : 0

ModelNumber : INTEL SSDPEDMX012T7

ProductFamily : Intel SSD DC P3520 Series

SerialNumber : CVPF717100L01P2JGN

- Intel SSD DC P4600 Series BTLE736007F54P0KGN -

Bootloader : 0110

DevicePath : /dev/nvme1n1

DeviceStatus : Healthy

Firmware : QDV10150

FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

Index : 1

ModelNumber : INTEL SSDPEDKE040T7

ProductFamily : Intel SSD DC P4600 Series

SerialNumber : BTLE736007F54P0KGN
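
When comparing several drives, it can help to turn this kind of output into structured records. The following is a minimal sketch, assuming the output keeps the "- header -" / "Key : Value" layout shown above; the function name `parse_isdct_show` and the shortened `SAMPLE` text are illustrative and not part of the Intel tool.

```python
def parse_isdct_show(text):
    """Split `isdct show -intelssd` style output into one dict per drive."""
    drives = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # A header line looks like "- Intel SSD DC P4600 Series <serial> -".
        if line.startswith("- ") and line.endswith(" -"):
            current = {}
            drives.append(current)
            continue
        # Property lines look like "Key : Value"; split on the first colon only
        # so values such as "/dev/nvme0n1" or full sentences stay intact.
        if current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return drives


# Shortened copy of the output quoted in this post (assumed layout).
SAMPLE = """\
- Intel SSD DC P3520 Series CVPF717100L01P2JGN -
DevicePath : /dev/nvme0n1
Firmware : MDV10271
Index : 0
- Intel SSD DC P4600 Series BTLE736007F54P0KGN -
DevicePath : /dev/nvme1n1
Firmware : QDV10150
Index : 1
"""

for drive in parse_isdct_show(SAMPLE):
    print(drive["DevicePath"], drive["Firmware"])
```

Splitting on the first colon only is a deliberate choice here, since several values in the real output contain further punctuation.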

Here are the messages the 4.9 kernel prints when using the P4600:

[ 151.297903] nvme nvme1: I/O 568 QID 1 timeout, aborting

[ 151.303130] nvme nvme1: I/O 569 QID 1 timeout, aborting

[ 151.308347] nvme nvme1: I/O 570 QID 1 timeout, aborting

[ 151.313562] nvme nvme1: I/O 571 QID 1 timeout, aborting

[ 151.355465] nvme nvme1: completing aborted command with status: 0000

[ 151.411273] nvme nvme1: completing aborted command with status: 0000

[ 151.466903] nvme nvme1: completing aborted command with status: 0000

[ 151.522609] nvme nvme1: completing aborted command with status: 0000

[ 151.578226] nvme nvme1: completing aborted command with status: 0000

...

[ 165.395295] nvme nvme1: Abort status: 0x0

[ 165.399296] nvme nvme1: Abort status: 0x0

[ 165.403299] nvme nvme1: Abort status: 0x0

[ 165.407304] nvme nvme1: Abort status: 0x0
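
To see whether the timeouts cluster on one controller or queue, the log can be tallied by (controller, QID). This is a rough sketch, assuming the 4.9-style message format quoted above; the regular expression, the `count_timeouts` helper, and the `SAMPLE_LOG` excerpt are illustrative only, and other kernel versions word these messages differently.

```python
import re
from collections import Counter

# Matches lines such as:
#   [ 151.297903] nvme nvme1: I/O 568 QID 1 timeout, aborting
TIMEOUT_RE = re.compile(r"nvme (\S+): I/O (\d+) QID (\d+) timeout, aborting")


def count_timeouts(log_text):
    """Count timeout messages per (controller, queue id)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            controller, _command_id, qid = match.groups()
            counts[(controller, int(qid))] += 1
    return counts


# Excerpt of the messages quoted in this post.
SAMPLE_LOG = """\
[ 151.297903] nvme nvme1: I/O 568 QID 1 timeout, aborting
[ 151.303130] nvme nvme1: I/O 569 QID 1 timeout, aborting
[ 151.308347] nvme nvme1: I/O 570 QID 1 timeout, aborting
[ 151.313562] nvme nvme1: I/O 571 QID 1 timeout, aborting
[ 151.355465] nvme nvme1: completing aborted command with status: 0000
"""

for (controller, qid), n in count_timeouts(SAMPLE_LOG).items():
    print(controller, "QID", qid, "timeouts:", n)
```

In practice the input could be the saved output of dmesg; only the "timeout, aborting" lines are counted, so the later completion and abort-status lines are ignored.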

We would appreciate your help in resolving this issue.

Regards,

Shantanu Goel

25 REPLIES

SGoel5
New Contributor

Hi,

1. We have seen this issue on at least 3 P4600s and 2 P3520s. We have a total of 8 P4600s and 4 P3520s. We are in the process of replacing the P3520s with P4600s as the workload has proven to be more write-intensive than originally anticipated.

2. We have seen the timeouts on both Supermicro (X10SRW-F) and Intel (S2600WT) systems, which suggests the motherboard model is not a factor here. In our test Supermicro machine, the 2 SSDs are installed in separate PCIe slots and both exhibit timeouts, which suggests that changing the PCIe slot is not likely to resolve the issue.

3. No, we are not using any RAID controller and access the drive directly via /dev/nvme* devices.

4. We will certainly try the new firmware once it is released. If you can provide us a beta version to test sooner we would be happy to do so as well.

Regards,

Shantanu

idata
Esteemed Contributor III

Hello Shantanu,

Again, thank you for answering our questions. We'll study this new information, and as soon as I have relevant information I'll post it here.

Unfortunately, we are not able to provide a beta version of the firmware.

Thank you for your patience.

Regards,

Andres V.

idata
Esteemed Contributor III

Hello Shantanu,

I would like to inform you that version 3.0.10 of the Intel® Solid State Drive Data Center Tool is now available, and it includes firmware version QDV10190 for your Intel® SSD DC P4600. Could you please download the tool (https://downloadcenter.intel.com/download/27497/Intel-SSD-Data-Center-Tool?v=t), update the firmware, and test again?

I'll be waiting for your response.

Regards,

Andres V.

SGoel5
New Contributor

Hi,

I am afraid the new firmware does not resolve the problem.

# isdct version

- Version Information -

Name: Intel(R) Data Center Tool

Version: 3.0.10

Description: Interact and configure Intel SSDs.

# isdct show -intelssd

- Intel SSD DC P3520 Series CVPF717100L01P2JGN -

Bootloader : MB1B0105

DevicePath : /dev/nvme0n1

DeviceStatus : Healthy

Firmware : MDV10271

FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

Index : 0

ModelNumber : INTEL SSDPEDMX012T7

ProductFamily : Intel SSD DC P3520 Series

SerialNumber : CVPF717100L01P2JGN

- Intel SSD DC P4600 Series BTLE736007F54P0KGN -

Bootloader : 0122

DevicePath : /dev/nvme1n1

DeviceStatus : Healthy

Firmware : QDV10170

FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

Index : 1

ModelNumber : INTEL SSDPEDKE040T7

ProductFamily : Intel SSD DC P4600 Series

SerialNumber : BTLE736007F54P0KGN

When I run: mkfs -t xfs -f /dev/nvme1n1

The driver still prints the following errors:

nvme 0000:02:00.0: Aborting I/O 534 QID 1

nvme 0000:02:00.0: Aborting I/O 535 QID 1

nvme 0000:02:00.0: Aborting I/O 536 QID 1

nvme 0000:02:00.0: Aborting I/O 537 QID 1

nvme 0000:02:00.0: Aborting I/O 796 QID 1

nvme 0000:02:00.0: Aborting I/O 797 QID 1

nvme 0000:02:00.0: Aborting I/O 798 QID 1

nvme 0000:02:00.0: Aborting I/O 799 QID 1

Thanks,

Shantanu

idata
Esteemed Contributor III

Hello Shantanu,

I notice from your last post that the firmware version you have currently installed on your Intel® SSD DC P4600 is QDV10170.

The Detailed Description in the Intel® SSD Data Center Tool site (https://downloadcenter.intel.com/download/27497?v=t) states the following:

Could you please check this example and reproduce the firmware update procedure? These images are from page 67 of the Intel® Solid State Drive Data Center Tool – User Guide (https://www.intel.com/content/dam/support/us/en/documents/memory-and-storage/Intel_SSD_DCT_3_0_x_Use...). Please keep in mind that on Linux systems, the tool must be run with root privileges. This can be done with either sudo or su.

Linux users must call the load function twice with a system shutdown and reboot in between.

First update:

The user then shuts down the system and reboots.

In the second update, the tool shows the next update.

The user shuts down the system and reboots.
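
Because the update takes two load passes with a reboot in between, it may be useful to check the reported firmware after each reboot before deciding whether another pass is needed. The sketch below only inspects text in the layout that `isdct show -intelssd` printed earlier in this thread; it does not run the tool. The helper names are hypothetical, and the target QDV10190 is taken from the earlier reply about version 3.0.10.

```python
def firmware_by_index(show_output):
    """Map each drive's Index to its Firmware from `isdct show -intelssd` text."""
    result = {}
    fields = {}
    # A trailing sentinel header flushes the last drive's fields.
    for raw in show_output.splitlines() + ["- end -"]:
        line = raw.strip()
        if line.startswith("- ") and line.endswith(" -"):
            if "Index" in fields and "Firmware" in fields:
                result[int(fields["Index"])] = fields["Firmware"]
            fields = {}
        elif ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return result


def needs_another_load(show_output, index, target):
    """Return True while the drive at `index` does not yet report `target`."""
    return firmware_by_index(show_output)[index] != target


# Shortened copy of the output posted after the first update attempt.
AFTER_FIRST_LOAD = """\
- Intel SSD DC P4600 Series BTLE736007F54P0KGN -
Bootloader : 0122
Firmware : QDV10170
Index : 1
"""

print(needs_another_load(AFTER_FIRST_LOAD, 1, "QDV10190"))
```

With the output posted above, the P4600 at index 1 still reports QDV10170, so this check would indicate that the second load and reboot have not yet taken effect.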

In case you get any error message while performing the update, please share the screenshots associated with the firmware update process.

In a previous message you mentioned that you are using Red Hat* Enterprise Linux* 6. Is it version 6.5 or 6.6?

I'll be waiting for your response.

Regards,

Andres V.