Beginner

I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs

Hi,

We are experiencing persistent I/O request timeouts on Linux with P3520/P4600 SSDs. We have tried multiple different kernels (3.10, 4.4, 4.9) and see the timeouts on all of them. The P4600 seems to be more prone to these than the P3520 though we see them on the latter as well. We have the latest firmware installed on both drives which are housed in the same machine (Supermicro 5018R-WR with X10SRW-F motherboard and E5-1650 V4 CPU). We can reproduce the timeouts by simply running mkfs -t xfs on the drive.

Here is the output from isdct (version isdct-3.0.9.400-17.x86_64):

- Intel SSD DC P3520 Series CVPF717100L01P2JGN -

Bootloader : MB1B0105

DevicePath : /dev/nvme0n1

DeviceStatus : Healthy

Firmware : MDV10271

FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

Index : 0

ModelNumber : INTEL SSDPEDMX012T7

ProductFamily : Intel SSD DC P3520 Series

SerialNumber : CVPF717100L01P2JGN

- Intel SSD DC P4600 Series BTLE736007F54P0KGN -

Bootloader : 0110

DevicePath : /dev/nvme1n1

DeviceStatus : Healthy

Firmware : QDV10150

FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

Index : 1

ModelNumber : INTEL SSDPEDKE040T7

ProductFamily : Intel SSD DC P4600 Series

SerialNumber : BTLE736007F54P0KGN

Here are the messages the 4.9 kernel prints when using the P4600:

[ 151.297903] nvme nvme1: I/O 568 QID 1 timeout, aborting

[ 151.303130] nvme nvme1: I/O 569 QID 1 timeout, aborting

[ 151.308347] nvme nvme1: I/O 570 QID 1 timeout, aborting

[ 151.313562] nvme nvme1: I/O 571 QID 1 timeout, aborting

[ 151.355465] nvme nvme1: completing aborted command with status: 0000

[ 151.411273] nvme nvme1: completing aborted command with status: 0000

[ 151.466903] nvme nvme1: completing aborted command with status: 0000

[ 151.522609] nvme nvme1: completing aborted command with status: 0000

[ 151.578226] nvme nvme1: completing aborted command with status: 0000

...

[ 165.395295] nvme nvme1: Abort status: 0x0

[ 165.399296] nvme nvme1: Abort status: 0x0

[ 165.403299] nvme nvme1: Abort status: 0x0

[ 165.407304] nvme nvme1: Abort status: 0x0
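For anyone triaging similar logs, here is a minimal sketch of how the timeout lines above could be tallied per controller and queue. It assumes the message format printed by the 4.9 kernel in this post; other kernels word the message differently, so the pattern would need adjusting. The names `TIMEOUT_RE`, `count_timeouts`, and `SAMPLE` are illustrative only.

```python
import re
from collections import Counter

# Matches lines such as:
#   [ 151.297903] nvme nvme1: I/O 568 QID 1 timeout, aborting
# Assumption: this is the 4.9-era wording shown in the post above.
TIMEOUT_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): I/O (?P<tag>\d+) QID (?P<qid>\d+) timeout, aborting"
)


def count_timeouts(log_text):
    """Return a Counter keyed by (controller, queue id) of timed-out commands."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match["ctrl"], int(match["qid"]))] += 1
    return counts


# Excerpt copied from the kernel messages quoted in this post.
SAMPLE = """\
[ 151.297903] nvme nvme1: I/O 568 QID 1 timeout, aborting
[ 151.303130] nvme nvme1: I/O 569 QID 1 timeout, aborting
[ 151.308347] nvme nvme1: I/O 570 QID 1 timeout, aborting
[ 151.313562] nvme nvme1: I/O 571 QID 1 timeout, aborting
[ 151.355465] nvme nvme1: completing aborted command with status: 0000
"""

if __name__ == "__main__":
    for (ctrl, qid), n in sorted(count_timeouts(SAMPLE).items()):
        print(f"{ctrl} queue {qid}: {n} timed-out commands")
```

In practice the text would come from saved `dmesg` output rather than an inline string. Grouping by queue ID makes it easy to see whether the timeouts are confined to a single queue, as they are in the excerpt here.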

We would appreciate your help in resolving this issue.

Regards,

Shantanu Goel

Community Manager

Hello Shantanu Goel,

Thank you for your interest in the Intel® SSD P3520 Series and the Intel® SSD P4600 Series.

I understand that your system is experiencing persistent I/O request timeouts.

Could you please tell me which specific Linux* distribution you are using, and briefly describe the intended use of the SSDs?

Additionally, so that we can provide adequate assistance, please share the report generated by the Intel® System Support Utility for the Linux* Operating System (https://downloadcenter.intel.com/download/26735/).

I'll be waiting for your response.

Regards,

Andres V.
Beginner

Hi Andres,

We are using RHEL 6 and create a filesystem on the SSD to store data.

Please see the attached SSU output from the system.

Thanks,

Shantanu

Community Manager

Hello Shantanu,

Thank you for providing the requested file.

We will analyze the provided data and get back to you via this community thread as soon as we have relevant information.

Thank you for your patience.

Regards,

Andres V.
Community Manager

Hello Shantanu,

In order to further understand the issue, your system, and the troubleshooting that you have performed, could you please answer the following questions?

  • How many SSDs of each model are experiencing the timeout issue? How many SSDs do you have of each series?
  • Have you tried connecting the SSDs to different slots? Have you been able to test the drives in another motherboard or system?
  • Are you using any kind of RAID controller?

Regarding the Intel® SSD DC P4600 Series, a new firmware version is tentatively expected within the next couple of weeks as part of the latest Intel® Solid State Drive Data Center Tool release. Please keep checking the download link (https://downloadcenter.intel.com/download/27248?v=t), then update your firmware and test again.

I'll be waiting for your response.

Regards,

Andres V.
Beginner

Hi,

1. We have seen this issue on at least 3 P4600s and 2 P3520s. We have a total of 8 P4600s and 4 P3520s. We are in the process of replacing the P3520s with P4600s as the workload has proven to be more write-intensive than originally anticipated.

2. We have seen the timeouts on both Supermicro (X10SRW-F) and Intel (S2600WT) systems which suggests the motherboard model is not a factor here. In our test Supermicro machine, we have the 2 SSDs installed in separate PCIe slots and both exhibit timeouts which would seem to suggest that changing the PCIe slot is not likely to resolve the issue.

3. No, we are not using any RAID controller and access the drive directly via /dev/nvme* devices.

4. We will certainly try the new firmware once it is released. If you can provide us a beta version to test sooner we would be happy to do so as well.

Regards,

Shantanu

Community Manager

Hello Shantanu,

Again, thank you for answering our questions.

We'll study this new information, and as soon as I have anything relevant I'll post it here.

Unfortunately, we are not able to provide a beta version of the firmware.

Thank you for your patience.

Regards,

Andres V.
Community Manager

Hello Shantanu,

I would like to inform you that version 3.0.10 of the Intel® Solid State Drive Data Center Tool is now available, and it includes firmware version QDV10190 for your Intel® SSD DC P4600.

Could you please download the corresponding tool (https://downloadcenter.intel.com/download/27497/Intel-SSD-Data-Center-Tool?v=t), update the firmware, and test again?

I'll be waiting for your response.

Regards,

Andres V.
Beginner

Hi,

I am afraid the new firmware does not resolve the problem.

# isdct version

- Version Information -

Name: Intel(R) Data Center Tool

Version: 3.0.10

Description: Interact and configure Intel SSDs.

# isdct show -intelssd

- Intel SSD DC P3520 Series CVPF717100L01P2JGN -

Bootloader : MB1B0105

DevicePath : /dev/nvme0n1

DeviceStatus : Healthy

Firmware : MDV10271

FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

Index : 0

ModelNumber : INTEL SSDPEDMX012T7

ProductFamily : Intel SSD DC P3520 Series

SerialNumber : CVPF717100L01P2JGN

- Intel SSD DC P4600 Series BTLE736007F54P0KGN -

Bootloader : 0122

DevicePath : /dev/nvme1n1

DeviceStatus : Healthy

Firmware : QDV10170

FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

Index : 1

ModelNumber : INTEL SSDPEDKE040T7

ProductFamily : Intel SSD DC P4600 Series

SerialNumber : BTLE736007F54P0KGN
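As an aside for readers comparing firmware levels across several drives, below is a rough sketch that turns `isdct show -intelssd` text into a dictionary per drive. It assumes the layout shown in this post: a `- <product and serial> -` header line per drive, followed by `Key : Value` lines. The function name `parse_isdct_show` and the abbreviated `SAMPLE` are illustrative and not part of the tool.

```python
def parse_isdct_show(text):
    """Parse `isdct show -intelssd` text into {drive header: {field: value}}.

    Assumption: each drive section starts with a "- ... -" header line and
    fields are rendered as "Key : Value", as in the output quoted above.
    """
    drives = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # the forum rendering puts blank lines between fields
        if line.startswith("- ") and line.endswith(" -"):
            # Start a new section keyed by the header text without the dashes.
            current = drives.setdefault(line[2:-2], {})
        elif current is not None and " : " in line:
            key, value = line.split(" : ", 1)
            current[key.strip()] = value.strip()
    return drives


# Abbreviated copy of the output quoted in this post.
SAMPLE = """\
- Intel SSD DC P3520 Series CVPF717100L01P2JGN -

Bootloader : MB1B0105

DevicePath : /dev/nvme0n1

Firmware : MDV10271

- Intel SSD DC P4600 Series BTLE736007F54P0KGN -

Bootloader : 0122

DevicePath : /dev/nvme1n1

Firmware : QDV10170
"""

if __name__ == "__main__":
    for header, fields in parse_isdct_show(SAMPLE).items():
        print(f"{fields['DevicePath']}: firmware {fields['Firmware']} ({header})")
```

Splitting on the first ` : ` only keeps values that themselves contain a colon intact.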

When I run: mkfs -t xfs -f /dev/nvme1n1

The driver still prints the following errors:

nvme 0000:02:00.0: Aborting I/O 534 QID 1

nvme 0000:02:00.0: Aborting I/O 535 QID 1

nvme 0000:02:00.0: Aborting I/O 536 QID 1

nvme 0000:02:00.0: Aborting I/O 537 QID 1

nvme 0000:02:00.0: Aborting I/O 796 QID 1

nvme 0000:02:00.0: Aborting I/O 797 QID 1

nvme 0000:02:00.0: Aborting I/O 798 QID 1

nvme 0000:02:00.0: Aborting I/O 799 QID 1

Thanks,

Shantanu

Community Manager

Hello Shantanu,

I notice from your last post that the firmware version currently installed on your Intel® SSD DC P4600 is QDV10170.

The Detailed Description on the Intel® SSD Data Center Tool site (https://downloadcenter.intel.com/download/27497?v=t) states the following:

Could you please check this example and repeat the firmware update procedure? These images are from page 67 of the Intel® Solid State Drive Data Center Tool – User Guide (https://www.intel.com/content/dam/support/us/en/documents/memory-and-storage/Intel_SSD_DCT_3_0_x_Use...). Please keep in mind that on Linux systems the tool must be run with root privileges, using either sudo or su.

Linux users must call the load function twice, with a system shutdown and reboot in between.

First update:

The user then shuts down the system and reboots.

In the second update, the tool shows the next update.

The user shuts down the system and reboots.

If you get any error message while performing the update, please share screenshots of the firmware update process.

In a previous message you mentioned that you are using Red Hat* Enterprise Linux* 6. Is it version 6.5 or 6.6?

I'll be waiting for your response.

Regards,

Andres V.
Beginner

Hi,

I power-cycled the system and ran the load again, but the tool still reports the drive as having the latest firmware. When I first downloaded and ran isdct 3.0.10, it did report newer firmware and successfully updated the drive. All commands were run as root.

Here is the version of the tool:

# isdct version

- Version Information -

Name: Intel(R) Data Center Tool

Version: 3.0.10

Description: Interact and configure Intel SSDs.

When I attempt to load the firmware now, this is the output I get from the tool:

# isdct load -intelssd 1

WARNING! You have selected to update the drives firmware!

Proceed with the update? (Y|N): Y

Updating firmware...

- Intel SSD DC P4600 Series BTLE736007F54P0KGN -

Status : The selected Intel SSD contains current firmware as of this tool release.

# isdct show -intelssd 1

- Intel SSD DC P4600 Series BTLE736007F54P0KGN -

Bootloader : 0122

DevicePath : /dev/nvme1n1

DeviceStatus : Healthy

Firmware : QDV10170

FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

Index : 1

ModelNumber : INTEL SSDPEDKE040T7

ProductFamily : Intel SSD DC P4600 Series

SerialNumber : BTLE736007F54P0KGN

The version of RHEL is 6.9

Thanks,

Shantanu

Community Manager

Hello Shantanu,

A software compatibility issue may be causing this. As you can see in the following image, the Intel® SSD Data Center Tool is supported on the following operating systems, and RHEL 6.9 is not one of them:

Do you have access to a PC with any of the listed operating systems? Could you please try again to install the latest firmware using the official tool?

It's important for us to find out whether version QDV10190 solves the issue you are experiencing.

I'll be waiting for your response.

Regards,

Andres V.
Beginner

Hi,

RHEL 6.6 is very old (released in 2014) and we have long since upgraded our systems to 6.9, so I am unable to test on that release. I am surprised your tool releases have not kept up with vendor OS releases. Both isdct versions 3.0.9 and 3.0.10 did update the firmware to a newer release without complaint, so it is not clear what the nature of the incompatibility is, since the tool itself does not print any message indicating one.

Shantanu

Community Manager

Hello Shantanu,

Thank you for your feedback.

Regarding your comment:

"Both isdct versions 3.0.9 and 3.0.10 did update the firmware to a newer release without complaint so it is not clear what the nature of the incompatibility is here since the tool itself does not print message indicating as such."

Are you referring to an update to firmware version QDV10170 or to firmware version QDV10190? Have you been able to update the SSDs that do not show the persistent I/O request timeouts? Do you have any Intel® SSD DC P4600 with firmware version QDV10190?

Regards,

Andres V.
Beginner

Hi,

I was referring to the fact that on the test machine we initially used isdct 3.0.9 to upgrade the P4600 firmware from QDV10130 to QDV10150, and subsequently isdct 3.0.10 to upgrade it from QDV10150 to QDV10170. As the output I posted above shows, isdct 3.0.10 reports QDV10170 as the latest available firmware revision and states that the drive already has that revision installed. It does not report QDV10190 as being available. Could this be a discrepancy in the firmware revision between the documentation and the tool itself?

The P4600s we tried deploying in production have firmware QDV10130 and they all exhibit the timeouts, so until this issue is resolved these drives are unusable for us. We have had great success with your SATA SSDs (S3700, S3600, S3610, S3520) on various versions of the OS and Linux kernels, which is why we purchased their NVMe counterparts. As of now, however, the experience with them has been a disappointing one, so we would really appreciate help in resolving the issue.

Thanks,

Shantanu

Community Manager

Hello Shantanu,

Thank you for your feedback.

I'll inform the corresponding team of the current state of the issue you are experiencing, and ask whether there is an explanation for this discrepancy in the firmware version.

I'll contact you as soon as I have more information.

Thank you for your patience.

Regards,

Andres V.
Community Manager

Hello Shantanu,

I would like to inform you that the discrepancy was due to a documentation error that has already been fixed. As you can see at https://downloadcenter.intel.com/download/27497?v=t, under the What's new? section:

Intel® SSD DC P4500/P4600 Series products; latest firmware revision QDV10170

This means that you have installed the correct firmware version.

Regarding the timeout issue, we are still investigating, and I will get in touch with you when we find something relevant.

Regards,

Andres V.
Beginner

Hi Andres,

Thank you for the information and please keep us posted.

Regards,

Shantanu

Community Manager

Hello Shantanu,

I just wanted to let you know that all the information associated with the issue you are experiencing has been escalated to our engineering team.

As soon as I receive any update from them I'll contact you.

Regards,

Andres V.
Beginner

Hi Andres,

Thanks for escalating this issue and please keep us posted.

Regards,

Shantanu
