nvme 3608 error msg: IO queues not created

idata
Esteemed Contributor III

Hi all,

On my server (HPE DL380 G9) I have a problem with my P3608 4TB card.

After about 3 months in operation, I lost the first controller after a reboot and the second one at the next reboot. Now I can't access the card with the isdct tool; it seems that the controller is offline.

Short description:

I have created a RAID 0 with two cards (one file system with 7.4 TB capacity).

I use a Debian 8 system with the newest SSD firmware (8DV101F0).

Does anyone have a good idea?

Thx

Roger

21 REPLIES

idata
Esteemed Contributor III

Hello Roger_MCH,

First of all, we would like to know whether you followed the instructions at http://www.intel.com/content/dam/support/us/en/documents/ssdc/data-center-ssds/Intel_Linux_NVMe_Guid... while the card was still working.

What kind of workload do you put on the SSD?

We will be waiting for your response. If you need further assistance, let us know here or contact our support department at http://www.intel.de/content/www/de/de/support/contact-support.html# @18

Regards,

NC

idata
Esteemed Contributor III

Hello NC,

Many thanks for your feedback. We use Proxmox as our virtualization solution. It runs on the Ubuntu 16.04 LTS 4.4 kernel, and the NVMe driver is enabled by default.

In the current system, I have combined two P3608 4TB cards into a RAID 0 volume:

# zpool create -f -o ashift=12 ssd_pool /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

The server runs a weather application that imports and writes a lot of data. The data can be weather models (large files) or small files such as point data.

This is already the second card that has dropped offline or deactivated itself. I opened a support case with Intel; the result was a 1:1 replacement without any additional information.

Regards,

Roger

idata
Esteemed Contributor III

Hello Roger_MCH,

Without performing more troubleshooting and getting more data, it may be hard to positively diagnose the cause of your drive failures.

However, it might be worth noting that each P3608 counts as two drives. So if you RAID 0 two pairs of these SSDs, it would be the same as creating a RAID 0 array with four SSDs, which does increase the expected failure rate by about 80%. Perhaps a RAID 10 would be a better option?

The endurance rating of the Intel® SSD DC P3608 allows for 3 drive writes per day, or 21.90 Petabytes Written. Exceeding this would be another possible cause for the drive to fail earlier than expected.

In many cases, if an SSD fails in a RAID, we recommend removing the drive from the array and performing a secure erase/low-level format. More often than not, this allows the drive to recover successfully, although this may have been out of the question if your drives were no longer detected at BIOS level.

Best regards,

Carlos A.
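To make the striping argument concrete: a RAID 0 array is lost as soon as any one member fails. The following is a minimal Python sketch, assuming independent failures and a purely hypothetical per-controller failure probability. The reply does not state which value lies behind the "about 80%" figure, so the numbers printed here are illustrative only and not Intel data.

```python
def raid0_failure_probability(p_member: float, members: int) -> float:
    """Probability that a stripe set loses data: at least one member fails.

    Assumes member failures are independent, which is a simplification.
    """
    if not 0.0 <= p_member <= 1.0:
        raise ValueError("p_member must be between 0 and 1")
    if members < 1:
        raise ValueError("members must be at least 1")
    return 1.0 - (1.0 - p_member) ** members


def relative_increase(p_member: float, before: int, after: int) -> float:
    """Relative growth in array failure probability when going from
    `before` to `after` striped members."""
    old = raid0_failure_probability(p_member, before)
    new = raid0_failure_probability(p_member, after)
    return (new - old) / old


if __name__ == "__main__":
    # Hypothetical per-member failure probabilities, not vendor figures.
    for p in (0.01, 0.05, 0.10):
        two = raid0_failure_probability(p, 2)
        four = raid0_failure_probability(p, 4)
        print(f"p={p:.2f}: 2 members {two:.4f}, 4 members {four:.4f}, "
              f"increase {relative_increase(p, 2, 4):.0%}")
```

Under these assumptions the increase from two to four members works out to (1 - p)^2, so it depends on the per-member probability and approaches 100% as p gets small.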

idata
Esteemed Contributor III

Hi Carlos

I also think that a RAID 10 would be better, but the cost is also much higher. We write about 500 GB to 700 GB of data per day to the disk. This should not be a problem.
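As a rough cross-check of the endurance figures quoted in this thread, the arithmetic can be sketched in Python. The five-year rating period is an assumption (it is the value that makes 3 drive writes per day on a 4 TB card come out to 21.9 PB). The daily volume is the upper figure mentioned above, decimal units are assumed, and only host writes are counted, so write amplification from ZFS or the flash itself is ignored.

```python
CAPACITY_TB = 4.0       # one P3608 card, as stated in the thread
DWPD = 3.0              # drive writes per day, as quoted in the thread
RATING_YEARS = 5        # assumption: period over which the DWPD rating applies
DAILY_WRITES_TB = 0.7   # upper end of the 500 GB to 700 GB per day reported
CARDS = 2               # the pool stripes across two cards


def rated_writes_pb(capacity_tb: float, dwpd: float, years: int) -> float:
    """Total rated writes in petabytes (1 PB = 1000 TB)."""
    return capacity_tb * dwpd * 365 * years / 1000.0


def years_to_rated_writes(rated_pb: float, daily_tb_per_card: float) -> float:
    """Years until one card reaches its rated writes at a steady daily volume."""
    return rated_pb * 1000.0 / daily_tb_per_card / 365


if __name__ == "__main__":
    rated = rated_writes_pb(CAPACITY_TB, DWPD, RATING_YEARS)
    # Assumes writes are spread evenly across the striped cards.
    per_card = DAILY_WRITES_TB / CARDS
    print(f"Rated writes per card: {rated:.2f} PB")
    print(f"Host writes per card:  {per_card:.2f} TB/day "
          f"({per_card / CAPACITY_TB:.3f} drive writes per day)")
    print(f"Years to reach rating: {years_to_rated_writes(rated, per_card):.0f}")
```

With these inputs the reported workload sits far below the rated 3 drive writes per day, which is consistent with the expectation that write volume alone should not be the problem.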

Question: how can I perform a secure erase/low-level format of the SSD card, or rather of the affected controller?

Error messages from /var/log/messages:

Dec 14 17:58:51 zuenjv05 kernel: [ 9.822799] nvme 0000:87:00.0: Failed status: 0x3, reset controller.

Dec 14 17:58:51 zuenjv05 kernel: [ 9.823291] nvme 0000:87:00.0: Cancelling I/O 0 QID 4

Dec 14 17:58:51 zuenjv05 kernel: [ 9.823294] nvme 0000:87:00.0: Cancelling I/O 1 QID 4

Dec 14 17:58:51 zuenjv05 kernel: [ 9.823296] nvme 0000:87:00.0: Cancelling I/O 2 QID 4

Dec 14 17:58:51 zuenjv05 kernel: [ 9.823298] nvme 0000:87:00.0: Cancelling I/O 0 QID 1

Dec 14 17:58:51 zuenjv05 kernel: [ 11.383135] nvme 0000:87:00.0: IO queues not created
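When triaging messages like these across several reboots, it can help to group them by PCI address. This is a minimal Python sketch that assumes the syslog line layout shown above; the function name and the embedded sample are illustrative and not part of any Intel tool.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   Dec 14 17:58:51 host kernel: [ 9.822799] nvme 0000:87:00.0: IO queues not created
NVME_LINE = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] nvme "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (?P<msg>.*)$"
)


def group_nvme_messages(lines):
    """Return {pci_address: [(uptime_seconds, message), ...]} for nvme kernel lines."""
    events = defaultdict(list)
    for line in lines:
        match = NVME_LINE.search(line.strip())
        if match:
            events[match["pci"]].append((float(match["uptime"]), match["msg"]))
    return dict(events)


SAMPLE = """\
Dec 14 17:58:51 zuenjv05 kernel: [ 9.822799] nvme 0000:87:00.0: Failed status: 0x3, reset controller.
Dec 14 17:58:51 zuenjv05 kernel: [ 9.823291] nvme 0000:87:00.0: Cancelling I/O 0 QID 4
Dec 14 17:58:51 zuenjv05 kernel: [ 11.383135] nvme 0000:87:00.0: IO queues not created
"""

if __name__ == "__main__":
    for pci, entries in group_nvme_messages(SAMPLE.splitlines()).items():
        print(pci)
        for uptime, msg in entries:
            print(f"  {uptime:>10.6f}s  {msg}")
```

Feeding the full /var/log/messages file through the same function would show whether 0000:87:00.0 is the only address reporting resets or whether its sibling controller on the same card logs them too.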

Some additional details:

root@zuenjv05:/sys# lspci -nn | grep -i ssd

0d:00.0 Non-Volatile memory controller [0108]: Intel Corporation PCIe Data Center SSD [8086:0953] (rev 02)

0e:00.0 Non-Volatile memory controller [0108]: Intel Corporation PCIe Data Center SSD [8086:0953] (rev 02)

86:00.0 Non-Volatile memory controller [0108]: Intel Corporation PCIe Data Center SSD [8086:0953] (rev 02)

87:00.0 Non-Volatile memory controller [0108]: Intel Corporation PCIe Data Center SSD [8086:0953] (rev 02)

root@zuenjv05:/sys# find . -name "*nvme*"

./bus/pci/drivers/nvme

./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:01.0/0000:0d:00.0/nvme

./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:01.0/0000:0d:00.0/nvme/nvme0

./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:01.0/0000:0d:00.0/nvme/nvme0/nvme0n1

./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:02.0/0000:0e:00.0/nvme

./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:02.0/0000:0e:00.0/nvme/nvme1

./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:02.0/0000:0e:00.0/nvme/nvme1/nvme1n1

./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:01.0/0000:86:00.0/nvme

./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:01.0/0000:86:00.0/nvme/nvme2

./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:02.0/0000:87:00.0/nvme

./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:02.0/0000:87:00.0/nvme/nvme3

./block/nvme0n1

./block/nvme1n1

./class/nvme

./class/nvme/nvme0

./class/nvme/nvme1

./class/nvme/nvme2

./class/nvme/nvme3

./class/block/nvme0n1

./class/block/nvme1n1

./module/nvme

./module/nvme/drivers/pci:nvme
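The listing above shows nvme2 and nvme3 registered as controllers, but without the corresponding nvme2n1 and nvme3n1 block devices. A small Python sketch of that check follows. It works on the paths as printed by find (on a live system they would sit under /sys), and the helper name is hypothetical.

```python
import re

CONTROLLER = re.compile(r"/class/nvme/(nvme\d+)$")
NAMESPACE = re.compile(r"/class/block/(nvme\d+)n\d+$")


def controllers_without_namespaces(paths):
    """Return controllers listed under class/nvme that have no class/block namespace."""
    controllers = set()
    with_namespace = set()
    for path in paths:
        path = path.strip()
        if m := CONTROLLER.search(path):
            controllers.add(m.group(1))
        elif m := NAMESPACE.search(path):
            with_namespace.add(m.group(1))
    return sorted(controllers - with_namespace)


# Subset of the find output shown above.
FIND_OUTPUT = """\
./class/nvme/nvme0
./class/nvme/nvme1
./class/nvme/nvme2
./class/nvme/nvme3
./class/block/nvme0n1
./class/block/nvme1n1
"""

if __name__ == "__main__":
    print(controllers_without_namespaces(FIND_OUTPUT.splitlines()))
```

A controller that appears in this list has been bound by the driver but exposes no usable block device, which matches the "IO queues not created" message for 0000:87:00.0.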

Handle 0x00EC, DMI type 203, 34 bytes
OEM-specific Type
        Header and Data:
                CB 22 EC 00 FE FF FE FF 86 80 53 09 86 80 09 37
                01 08 EB 00 00 00 10 0A 02 01 FF FF 01 02 03 04
                00 00
        Strings:
                PciRoot(0x0)/Pci(0x3,0x2)/Pci(0x0,0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)
                NVMe.Slot.2.1
                NVM Express Controller
                Slot 2

Handle 0x00ED, DMI type 203, 34 bytes
OEM-specific Type
        Header and Data:
                CB 22 ED 00 FE FF FE FF 86 80 53 09 86 80 09 37
                01 08 EB 00 00 00 10 0A 02 02 FF FF 01 02 03 04
                00 00
        Strings:
                PciRoot(0x0)/Pci(0x3,0x2)/Pci(0x0,0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)
                NVMe.Slot.2.2
                NVM Express Controller
                Slot 2

....

Handle 0x00F0, DMI type 203, 34 bytes
OEM-specific Type
        Header and Data:
                CB 22 F0 00 FE FF FE FF 86 80 53 09 86 80 09 37
                01 08 EF 00 00 00 09 0A 05 02 FF FF 01 02 03 04
                00 00
        Strings:
                PciRoot(0x1)/Pci(0x2,0x0)/Pci(0x0,0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)
                PCI.Slot.5.2
                NVM Express Controller
                Slot 5

Handle 0x00F1, DMI type 203, 34 bytes
OEM-specific Type
        Header and Data:
                CB 22 F1 00 FE FF FE FF 86 80 53 09 86 80 09 37
                01 08 EF 00 00 00 09 0A 05 03 FF FF 01 02 03 04
                00 00
        Strings:
                PciRoot(0x1)/Pci(0x2,0x0)/Pci(0x0,0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)
                PCI.Slot.5.3
                NVM Express Controller
                Slot 5
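The raw bytes of these OEM records can be partly read by hand. The first four bytes are the standard SMBIOS structure header (type, length, handle). The Python sketch below additionally reads bytes 8 to 15 as four little-endian 16-bit PCI IDs. That layout is an assumption, based only on the values matching the 8086:0953 shown by lspci above, and is not taken from HPE documentation.

```python
import struct

# First 16 bytes of handle 0x00EC as printed above.
RECORD_HEX = "CB 22 EC 00 FE FF FE FF 86 80 53 09 86 80 09 37"


def decode_record(hex_dump: str) -> dict:
    """Decode the SMBIOS header plus the assumed PCI ID fields of a type 203 record."""
    data = bytes.fromhex(hex_dump)
    # Standard SMBIOS header: type (1 byte), length (1 byte), handle (2 bytes, little-endian).
    dmi_type, length, handle = struct.unpack_from("<BBH", data, 0)
    # Assumed layout: vendor, device, subsystem vendor, subsystem device at offset 8.
    vendor, device, sub_vendor, sub_device = struct.unpack_from("<4H", data, 8)
    return {
        "type": dmi_type,
        "length": length,
        "handle": handle,
        "pci_id": f"{vendor:04x}:{device:04x}",
        "subsystem_id": f"{sub_vendor:04x}:{sub_device:04x}",
    }


if __name__ == "__main__":
    for key, value in decode_record(RECORD_HEX).items():
        print(f"{key}: {value}")
```

The decoded type (203), length (34), and handle (0x00EC) agree with the "Handle 0x00EC, DMI type 203, 34 bytes" line that dmidecode prints for the same record, which gives some confidence that the bytes were copied intact.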

root@zuenjv05:/dev# dmidecode --type 9

# dmidecode 2.12

SMBIOS 2.8 present.

....

....

Handle 0x00BC, DMI type 9, 17 bytes

System Slot

X...