
nvme 3608 error msg: IO queues not created

Hi all,

On my server (HPE DL380 G9), I have a problem with my P3608 4TB card.

After about three months in operation, I lost the first controller after a reboot and the second one at the next reboot. Now I can't access the card with the isdct tool; the controller appears to be offline.

Short description:

I created a RAID 0 with two cards (one file system with 7.4 TB capacity).

 

I use a Debian 8 system with the newest SSD firmware (8DV101F0)

 

 

Does anyone have a good idea?

 

 

Thx

Roger


Hello Roger_MCH,

 

 

First of all, we would like to know whether you followed these instructions while the card was still working: http://www.intel.com/content/dam/support/us/en/documents/ssdc/data-center-ssds/Intel_Linux_NVMe_Guid...

 

 

What kind of workload do you put on the SSD?

 

We will be waiting for your response. If you need further assistance, let us know here or contact our support department: http://www.intel.de/content/www/de/de/support/contact-support.html

 

 

Regards,

 

NC

Hello NC,

Many thanks for your feedback. We use Proxmox as our virtualization solution. It uses the Ubuntu 16.04 LTS 4.4 kernel, and the NVMe driver is enabled by default.

 

 

In the current system, I have combined two P3608 4TB cards into a RAID 0 volume (# zpool create -f -o ashift=12 ssd_pool /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1).

 

 

The server runs a weather application that imports and writes a lot of data. The data can be weather models (large files) or small files such as point data.

This is already the second card that has dropped out or deactivated itself. I opened a support case with Intel; the result was a 1:1 replacement without any additional information.

Regards,

Roger


Hello Roger_MCH,

 

 

Without performing more troubleshooting and getting more data, it may be hard to positively diagnose the cause of your drive failures.

 

 

However, it is worth noting that each P3608 counts as two drives. So if you RAID 0 two of these cards (two pairs of SSDs), it is the same as creating a RAID 0 array with four SSDs, which increases the expected failure rate by about 80%. Perhaps RAID 10 would be a better option?
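To make the RAID 0 point concrete, here is a minimal sketch of how the chance of losing a stripe grows with the number of member drives. It assumes that drives fail independently and that every member has the same failure probability over the period considered; the value 0.02 is purely hypothetical and is not a figure from Intel or from this thread.

```python
def stripe_failure_probability(p_drive: float, n_drives: int) -> float:
    """Probability that at least one of n independent drives fails.

    A RAID 0 stripe is lost as soon as any single member fails, so this is
    also the probability of losing the whole array.
    """
    return 1.0 - (1.0 - p_drive) ** n_drives


# Hypothetical per-controller failure probability (an assumption, not a spec).
P_DRIVE = 0.02

for n in (1, 2, 4):
    # One P3608 card exposes two controllers, so two cards give n = 4.
    print(f"{n} drive(s): {stripe_failure_probability(P_DRIVE, n):.4f}")
```

For small per-drive probabilities the result is close to n times the single-drive value, so the outcome depends heavily on the per-drive figure you assume.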

 

 

The endurance rating of the Intel® SSD DC P3608 allows for 3 drive writes per day, or 21.90 Petabytes Written. Exceeding this would be another possible cause for the drive to fail earlier than expected.
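The two endurance figures quoted above are consistent with each other if the rating period is five years. That period is an assumption here (it is not stated in this thread), as is the use of decimal units (1 PB = 1000 TB). A short sketch of the arithmetic:

```python
def petabytes_written(capacity_tb: float, dwpd: float, years: float,
                      days_per_year: int = 365) -> float:
    """Total rated writes in PB for a given capacity, DWPD and rating period."""
    total_tb = capacity_tb * dwpd * days_per_year * years
    return total_tb / 1000.0  # decimal units: 1 PB = 1000 TB


# 4 TB card, 3 drive writes per day, assumed 5-year rating period.
pbw = petabytes_written(capacity_tb=4.0, dwpd=3.0, years=5)
print(f"{pbw:.2f} PB")
```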

 

 

In many cases, if an SSD fails in a RAID, we recommend removing the drive from the array and performing a secure erase/low-level format. More often than not, this allows the drive to recover successfully. This may not have been possible, though, if your drives were no longer detected at BIOS level.

 

 

Best regards,

 

Carlos A.

Hi Carlos

I also think that RAID 10 would be better, but the cost is also much higher. We write about 500 GB to 700 GB of data per day to the disk. This should not be a problem.
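As a rough check of that statement, the sketch below compares the daily write volume with what a 3 DWPD rating on a 4 TB card allows per day. It deliberately takes the worst case in which all 700 GB land on a single card, and it ignores any write amplification added by ZFS or the drive itself, so treat it as an estimate only.

```python
def fraction_of_rated_writes(daily_writes_gb: float, capacity_tb: float,
                             dwpd: float) -> float:
    """Share of the rated daily write budget that a workload consumes."""
    rated_gb_per_day = capacity_tb * 1000.0 * dwpd  # decimal units
    return daily_writes_gb / rated_gb_per_day


# Worst case: the upper figure from the post, all written to one 4 TB card.
share = fraction_of_rated_writes(daily_writes_gb=700.0, capacity_tb=4.0, dwpd=3.0)
print(f"{share:.1%} of the rated daily writes")
```

Even under that pessimistic assumption the host writes stay well below the rating, which supports the view that endurance is unlikely to be the cause here.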

 

Question: how can I do a secure erase/low-level format of the SSD card, or rather of the affected controller?

 

 

Error messages from /var/log/messages:

Dec 14 17:58:51 zuenjv05 kernel: [ 9.822799] nvme 0000:87:00.0: Failed status: 0x3, reset controller.

Dec 14 17:58:51 zuenjv05 kernel: [ 9.823291] nvme 0000:87:00.0: Cancelling I/O 0 QID 4

Dec 14 17:58:51 zuenjv05 kernel: [ 9.823294] nvme 0000:87:00.0: Cancelling I/O 1 QID 4

Dec 14 17:58:51 zuenjv05 kernel: [ 9.823296] nvme 0000:87:00.0: Cancelling I/O 2 QID 4

Dec 14 17:58:51 zuenjv05 kernel: [ 9.823298] nvme 0000:87:00.0: Cancelling I/O 0 QID 1

Dec 14 17:58:51 zuenjv05 kernel: [ 11.383135] nvme 0000:87:00.0: IO queues not created
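When several controllers are installed, it helps to group such kernel messages by PCI address. The following is a minimal sketch that does this for lines in the format shown above; the function names are made up for this example, and the regular expression only covers the `nvme <domain:bus:dev.fn>: <message>` layout seen in this log.

```python
import re
from collections import defaultdict

# Matches e.g. "nvme 0000:87:00.0: IO queues not created"
LINE_RE = re.compile(
    r"nvme (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (?P<msg>.*)$"
)

SAMPLE_LOG = [
    "Dec 14 17:58:51 zuenjv05 kernel: [ 9.822799] nvme 0000:87:00.0: Failed status: 0x3, reset controller.",
    "Dec 14 17:58:51 zuenjv05 kernel: [ 9.823291] nvme 0000:87:00.0: Cancelling I/O 0 QID 4",
    "Dec 14 17:58:51 zuenjv05 kernel: [ 11.383135] nvme 0000:87:00.0: IO queues not created",
]


def summarize_nvme_messages(lines):
    """Group nvme driver messages by the PCI address of the controller."""
    summary = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            summary[match.group("addr")].append(match.group("msg").strip())
    return dict(summary)


def controllers_without_io_queues(lines):
    """Return PCI addresses that logged 'IO queues not created'."""
    summary = summarize_nvme_messages(lines)
    return sorted(
        addr for addr, msgs in summary.items()
        if any("IO queues not created" in msg for msg in msgs)
    )


print(controllers_without_io_queues(SAMPLE_LOG))
```

In practice you would feed it the lines of /var/log/messages instead of the sample list.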

 

 

Some additional details:

root@zuenjv05:/sys# lspci -nn | grep -i ssd

0d:00.0 Non-Volatile memory controller [0108]: Intel Corporation PCIe Data Center SSD [8086:0953] (rev 02)

0e:00.0 Non-Volatile memory controller [0108]: Intel Corporation PCIe Data Center SSD [8086:0953] (rev 02)

86:00.0 Non-Volatile memory controller [0108]: Intel Corporation PCIe Data Center SSD [8086:0953] (rev 02)

87:00.0 Non-Volatile memory controller [0108]: Intel Corporation PCIe Data Center SSD [8086:0953] (rev 02)

root@zuenjv05:/sys# find . -name "*nvme*"

./bus/pci/drivers/nvme

./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:01.0/0000:0d:00.0/nvme

./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:01.0/0000:0d:00.0/nvme/nvme0

./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:01.0/0000:0d:00.0/nvme/nvme0/nvme0n1

./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:02.0/0000:0e:00.0/nvme

./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:02.0/0000:0e:00.0/nvme/nvme1

./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:02.0/0000:0e:00.0/nvme/nvme1/nvme1n1

./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:01.0/0000:86:00.0/nvme

./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:01.0/0000:86:00.0/nvme/nvme2

./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:02.0/0000:87:00.0/nvme

./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:02.0/0000:87:00.0/nvme/nvme3

./block/nvme0n1

./block/nvme1n1

./class/nvme

./class/nvme/nvme0

./class/nvme/nvme1

./class/nvme/nvme2

./class/nvme/nvme3

./class/block/nvme0n1

./class/block/nvme1n1

./module/nvme

./module/nvme/drivers/pci:nvme
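In the listing above, nvme0 and nvme1 each have a namespace (nvme0n1, nvme1n1), while nvme2 and nvme3 appear only as controllers. A small sketch that extracts this from such a `find` listing is shown below; the function name and the regular expressions are illustrative and only assume the sysfs naming visible in the output.

```python
import re

# ".../nvme/nvme2" is a controller, ".../nvme/nvme0/nvme0n1" is a namespace.
CTRL_RE = re.compile(r"/nvme/(nvme\d+)$")
NS_RE = re.compile(r"/nvme/(nvme\d+)/(nvme\d+n\d+)$")

FIND_OUTPUT = [
    "./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:01.0/0000:0d:00.0/nvme/nvme0",
    "./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:01.0/0000:0d:00.0/nvme/nvme0/nvme0n1",
    "./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:02.0/0000:0e:00.0/nvme/nvme1",
    "./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:02.0/0000:0e:00.0/nvme/nvme1/nvme1n1",
    "./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:01.0/0000:86:00.0/nvme/nvme2",
    "./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:02.0/0000:87:00.0/nvme/nvme3",
]


def controllers_without_namespaces(paths):
    """Return controllers that show up in sysfs but expose no namespace."""
    controllers, with_namespace = set(), set()
    for path in paths:
        path = path.strip()
        ns_match = NS_RE.search(path)
        if ns_match:
            with_namespace.add(ns_match.group(1))
            continue
        ctrl_match = CTRL_RE.search(path)
        if ctrl_match:
            controllers.add(ctrl_match.group(1))
    return sorted(controllers - with_namespace)


print(controllers_without_namespaces(FIND_OUTPUT))
```

A controller without a namespace has no block device, which matches the missing /dev/nvme2n1 and /dev/nvme3n1 in the ./block entries.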

Handle 0x00EC, DMI type 203, 34 bytes
OEM-specific Type
    Header and Data:
        CB 22 EC 00 FE FF FE FF 86 80 53 09 86 80 09 37
        01 08 EB 00 00 00 10 0A 02 01 FF FF 01 02 03 04
        00 00
    Strings:
        PciRoot(0x0)/Pci(0x3,0x2)/Pci(0x0,0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)
        NVMe.Slot.2.1
        NVM Express Controller
        Slot 2

Handle 0x00ED, DMI type 203, 34 bytes
OEM-specific Type
    Header and Data:
        CB 22 ED 00 FE FF FE FF 86 80 53 09 86 80 09 37
        01 08 EB 00 00 00 10 0A 02 02 FF FF 01 02 03 04
        00 00
    Strings:
        PciRoot(0x0)/Pci(0x3,0x2)/Pci(0x0,0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)
        NVMe.Slot.2.2
        NVM Express Controller
        Slot 2

....

Handle 0x00F0, DMI type 203, 34 bytes
OEM-specific Type
    Header and Data:
        CB 22 F0 00 FE FF FE FF 86 80 53 09 86 80 09 37
        01 08 EF 00 00 00 09 0A 05 02 FF FF 01 02 03 04
        00 00
    Strings:
        PciRoot(0x1)/Pci(0x2,0x0)/Pci(0x0,0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)
        PCI.Slot.5.2
        NVM Express Controller
        Slot 5

Handle 0x00F1, DMI type 203, 34 bytes
OEM-specific Type
    Header and Data:
        CB 22 F1 00 FE FF FE FF 86 80 53 09 86 80 09 37
        01 08 EF 00 00 00 09 0A 05 03 FF FF 01 02 03 04
        00 00
    Strings:
        PciRoot(0x1)/Pci(0x2,0x0)/Pci(0x0,0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)
        PCI.Slot.5.3
        NVM Express Controller
        Slot 5
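The UEFI device paths in these DMI records can be lined up with the sysfs paths from the `find` output, because each `Pci(device,function)` node corresponds to the `dev.fn` part of one PCI address in the chain. The sketch below compares only those device/function pairs. It does not verify which `PciRoot` belongs to which root bus, so that part of the mapping remains an assumption, and the helper names are made up for this example.

```python
import re

PCI_NODE_RE = re.compile(r"Pci\((0x[0-9A-Fa-f]+),(0x[0-9A-Fa-f]+)\)")
BDF_RE = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:([0-9a-f]{2})\.([0-7])")


def uefi_dev_fn(uefi_path):
    """(device, function) pairs from a UEFI path such as PciRoot(0x1)/Pci(0x2,0x0)/..."""
    return [(int(d, 16), int(f, 16)) for d, f in PCI_NODE_RE.findall(uefi_path)]


def sysfs_dev_fn(sysfs_path):
    """(device, function) pairs from a sysfs path such as .../0000:85:02.0/0000:87:00.0/..."""
    return [(int(d, 16), int(f, 16)) for d, f in BDF_RE.findall(sysfs_path)]


# DMI string for PCI.Slot.5.3 and the sysfs path of the failing controller.
SLOT_5_3 = "PciRoot(0x1)/Pci(0x2,0x0)/Pci(0x0,0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)"
NVME3 = "./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:02.0/0000:87:00.0/nvme/nvme3"

print(uefi_dev_fn(SLOT_5_3) == sysfs_dev_fn(NVME3))
```

Under that assumption, the controller at 0000:87:00.0 that logged "IO queues not created" would be the one labeled PCI.Slot.5.3.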

root@zuenjv05:/dev# dmidecode --type 9
# dmidecode 2.12
SMBIOS 2.8 present.
....
....
Handle 0x00BC, DMI type 9, 17 bytes
System Slot
X...


Hello Roger_MCH,

 

 

The same tool we always recommend for monitoring data center drives can also be used to perform a secure erase:

 

 

- Intel® SSD Data Center Tool (available for Windows*, VMware*, and Linux*): https://downloadcenter.intel.com/download/26221/Intel-SSD-Data-Center-Tool

- User and Installation Guides for the Intel® SSD Data Center Tool: http://www.intel.com/content/www/us/en/support/solid-state-drives/000020016.html

 

 

If you haven't already completed the RMA process for your drives, you may try this out and let us know if this helps.

 

 

Best regards,

 

Carlos A.

Hi Carlos A.

I know the document.

I installed the card in a Windows workstation, together with the Data Center Tool software. The result was the same.

Status: Selected drive is in a disable Logical state.

Is there any way to re-enable the card?

It seems that I'm not alone with this problem.

Merry Christmas to you all.

Regards,

Roger


Hello Roger_MCH,

 

 

You may try re-seating the drive, and then running the delete command again. But since the drives are part of a raid array, it would be best to try doing this in a different system.

 

 

However, if it all fails, then the next step would be to proceed with the RMA.

 

 

We hope you have a wonderful Christmas weekend as well.

 

 

Best regards

 

Carlos A.

Hello Carlos A.

I removed the card from the original system and installed it in a Windows computer. Unfortunately, this did not help.

 

 

Many thanks for your support.

 

Greetings and a happy new year.

 

 

Roger

Hi Roger_MCH,

 

 

We understand that you removed the SSD and installed it in a different (Windows*) system, but it is still not working. Is that correct?

 

 

Can you please let us know what is the status of the LED lights on your SSD?

 

Also, could you please run Intel® SSD Data Center Tool and type: isdct.exe show -a

 

 

We will be waiting for your response.

 

 

Regards,

 

NC

Hi all,

I wish you a good start into the new year!

The card was installed in an HP DL380 G9 server.

 

I removed the (disabled) card from the server and installed it in a Windows workstation.

The LED status

The command output

C:\isdct>isdct.exe show -a -intelssd 0

- Intel SSD DC P3608 Series CVF85486002X4P0DGN-1 -

AggregationThreshold : Selected drive is in a disable logical state.

 

AggregationTime : Selected drive is in a disable logical state.

 

ArbitrationBurst : Selected drive is in a disable logical state.

 

Bootloader : 8B1B0133

 

CoalescingDisable : Selected drive is in a disable logical state.

 

ControllerCompatibleIDs : PCI\\VEN_8086&DEV_0953&REV_02PCI\\VEN_8086&DEV_0953PCI\\VEN_8086&CC_010802PCI\\VEN_8086&CC_0108PCI\\VEN...

 

ControllerDescription : @oem168.inf,%pci\\ven_8086&dev_0953.devicedesc%;Intel(R) Solid-State Drive P3700/P3600/P3500/750 Series

 

ControllerID : PCI\\VEN_8086&DEV_0953&SUBSYS_37098086&REV_02\\6&37F79443&0&00080008

 

ControllerIDEMode : False

 

ControllerManufacturer : @oem168.inf,%intel%;Intel

 

ControllerService : IaNVMe

 

DevicePath : \\\\.\\SCSI1:

 

DeviceStatus : *ASSERT_405C786C AA

 

DriverDescription : Intel(R) Solid-State Drive P3700/P3600/P3500/750 Series

 

DriverMajorVersion : 1

 

DriverManufacturer : Intel

 

DriverMinorVersion : 7

 

ErrorString : *ASSERT_405C786C AA

 

Firmware : 8DV101F0

 

FirmwareUpdateAvailable : Please contact Intel Customer Support for further assistance at the following website: http://www.intel.com/go/ssdsupport.

 

HighPriorityWeightArbitration : Selected drive is in a disable logical state.

 

IOCompletionQueuesRequested : Selected drive is in a disable logical state.

 

IOSubmissionQueuesRequested : Selected drive is in a disable logical state.

 

Index : 0

 

Intel : True

 

IntelGen3SATA : False

 

IntelNVMe : True

 

InterruptVector : Selected drive is in a disable logical state.

 

LatencyTrackingEnabled : Selected drive is in a disable logical state.

 

LowPriorityWeightArbitration : Selected drive is in a disable logical state.

 

MediumPriorityWeightArbitration : Selected drive is in a disable logical state.

 

ModelNumber : INTEL SSDPECME040T4

 

NVMeControllerID : 0

 

NVMeMajorVersion : 1

 

NVMeMinorVersion : 0

 

NVMePowerState : Selected drive is in a disable logical state.

 

NVMeTertiaryVersion : 0

 

NamespaceId : 4294967295

 

NativeMaxLBA : Selected drive is in a disable logical state.

 

NumErrorLogPageEntries : 63

 

OEM : Generic

 

PCILinkGenSpeed : 3

 

PCILinkWidth : 4

 

PNPString : PCI\\VEN_8086&DEV_0953&SUBSYS_37098086&REV_02\\6&37F79443&0&00080008

 

PowerGovernorMode : Selected drive is in a disable logical state.

 

Product : Fultondale X8

 

ProductFamily : Intel SSD DC P3608 Series

 

ProductProtocol : NVME

 

SCSIPortNumber : 1

 

SMARTEnabled : True

 

SMARTHealthCriticalWarningsConfiguration : Selected drive is in a disable logical state.

 

SMBusAddress : Selected drive is in a disable logical state.

 

SectorSize : 512

 

SerialNumber : CVF85486002X4P0DGN-1

 

TCGSupported : False

 

TempThreshold : Selected drive is in a disable logical state.

 

TimeLimitedErrorRecovery : Selected drive is in a disable logical state.

 

TrimSupported : True

 

VolatileWriteCacheEnabled : Selected drive is in a disable logical state.

 

WriteAtomicityDisableNormal : Selected drive is in a disable logical state.

C:\isdct>isdct.exe show -a -intelssd 1

- Intel SSD DC P3608 Series CVF85486002X4P0DGN-2 -

AggregationThreshold : Selected drive is in a disable logical state.

 

AggregationTime : Selected drive is in a disable logical state.

 

ArbitrationBurst : Selected drive is in a disable logical state.

 

Bootloader : 8B1B0133

 

CoalescingDisable : Selected drive is in a disable logical state.

 

ControllerCompatibleIDs : PCI\\VEN_8086&DEV_0953&REV_02PCI\\VEN_8086&DEV_0953PCI\\VEN_8086&CC_010802PCI\\VEN_8086&CC_0108PCI\\VEN...

 

ControllerDescription : @oem168.inf,%pci\\ven_8086&dev_0953.devicedesc%;Intel(R) Solid-State Drive P3700/P3600/P3500/750 Series

 

ControllerID : PCI\\VEN_8086&DEV_0953&SUBSYS_37098086&REV_02\\6&146834E4&0&00100008

 

ControllerIDEMode : False

 

ControllerManufacturer : @oem168.inf,%intel%;Intel

 

ControllerS...
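Output like this is long, so it can be easier to reduce it to the properties that actually carry information. The following is a minimal sketch that parses the `key : value` lines of `isdct show -a` as they appear in this post and separates the properties that report the disabled state. It assumes the ` : ` separator seen above; the function names are made up for this example and are not part of the Intel tool.

```python
DISABLED = "Selected drive is in a disable logical state."

SAMPLE = """\
- Intel SSD DC P3608 Series CVF85486002X4P0DGN-1 -
AggregationThreshold : Selected drive is in a disable logical state.
Bootloader : 8B1B0133
DeviceStatus : *ASSERT_405C786C AA
Firmware : 8DV101F0
NVMePowerState : Selected drive is in a disable logical state.
"""


def parse_isdct_show(text):
    """Turn 'key : value' lines into a dict; other lines are skipped."""
    props = {}
    for line in text.splitlines():
        if " : " not in line:
            continue  # header lines and blank lines
        key, value = line.split(" : ", 1)
        props[key.strip()] = value.strip()
    return props


def disabled_properties(props):
    """Names of the properties the tool could not read from the drive."""
    return sorted(key for key, value in props.items() if value == DISABLED)


props = parse_isdct_show(SAMPLE)
print("DeviceStatus:", props.get("DeviceStatus"))
print("Unreadable:", disabled_properties(props))
```

For this drive, the DeviceStatus and ErrorString values (`*ASSERT_405C786C AA`) are probably the most useful details to pass on to support.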

Hello Roger_MCH,

 

 

Happy new year as well!

 

 

From what we can see, the drive still reports a disabled logical state, even though the LED is solid green.

First we need to ask: do you have another P3608 card, the one you tested in Linux?

We ask because that card showed up correctly when you ran the command, and when we compare the ISNs of the two cards, we see that they are different. If this is the case, we suggest that you get in touch with our support department for further assistance: http://www.intel.eu/content/www/eu/en/support/contact-support.html

 

Please let us know once you get in contact with them.

 

 

Regards,

 

NC

 


Hi NC,

What do you mean by checking the ISNs (Intel Software Network)?

Yes, we bought a lot of these cards.

And yes, I opened two Intel support tickets (01911082 and 02193992) a long time ago. But the answers, and the analysis and resolution of the problem, were not very helpful.

Thanks again for your support.

Regards,

Roger


Hi Roger_MCH,

 

 

First of all, the ISN is the unique number we use to identify each SSD (much like a serial number).

 

 

We will verify the case numbers you provided and we will get back to you with a response.

 

 

Regards,

 

NC

Hello Roger_MCH,

 

 

Please note that both cases have been escalated and our engineering team is working on them. We will get back to you soon, either in this thread or by e-mail.

 

 

Regards,

 

NC

Roger_MCH,

 

 

We would like to verify something else with you before continuing with the resolution for this case:

 

 

- Can you confirm that the cards are connected directly to the server and not to a RAID controller card?

- Did you install our NVMe* driver?

- Did you try the cards with a supported OS and test them with the Intel® SSD Data Center Tool?

This is to rule out any other possible cause. We will be waiting for your response.

 

 

Regards,

 

NC

 


Hi Roger_MCH,

 

 

We are following up to ask whether you can reply to the questions in our previous post.

 

In case you don't need further assistance, please let us know as well.

 

 

Regards,

 

NC

Hi NC,

- The cards are installed in PCIe x8 slots (without any other devices). I built the ZFS volume across all P3608 controllers.

- Proxmox uses the NVMe driver from Ubuntu 16.04.

- Yes, see my answer to Carlos in this thread from 23.12.2016 (Windows 7).

Notes

For reporting, I use the Linux package of the Intel® SSD Data Center Tool, DataCenterTool_3_0_2_Linux.zip (version 3.0.2).

Question

For Windows there is a new version of the Intel® SSD Data Center Family for NVMe Drivers (version 1.8.0.1011), but not for Linux. Could you provide the newest version for Linux?

Regards,

Roger


Hi Roger_MCH,

 

 

Thanks for the details provided.

 

We will raise your question with our team and keep you posted.

 

 

Regards,

 

NC

Hi Roger_MCH,

 

 

We understand that our support department has already been in touch with you. If you have any other questions or need more help, do not hesitate to contact us again.

 

 

Have a nice weekend.

 

 

Regards,

 

NC