MHick15
Beginner
697 Views

Intel VROC software RAID5 failed to assemble after single disk failure

Dear all,

 

I created a RAID5 array using Intel VROC following the recommendations provided here:

https://www.intel.com/content/dam/support/us/en/documents/memory-and-storage/ssd-software/Linux_VROC...

 

After a single drive failed yesterday, the drives are marked as removed and the array state changed to "active, FAILED, Not Started". The full mdadm --detail output can be found below.

 

[root@nfs41 tmp]# mdadm --detail /dev/md124
/dev/md124:
        Container : /dev/md/imsm1, member 0
     Raid Devices : 9
    Total Devices : 8

            State : active, FAILED, Not Started
   Active Devices : 8
  Working Devices : 8
   Failed Devices : 0
    Spare Devices : 0

Consistency Policy : unknown

             UUID : 62b45e71:ad4983c4:8bb60881:54701169
   Number  Major  Minor  RaidDevice State
      -      0       0       0     removed
      -      0       0       1     removed
      -      0       0       2     removed
      -      0       0       3     removed
      -      0       0       4     removed
      -      0       0       5     removed
      -      0       0       6     removed
      -      0       0       7     removed
      -      0       0       8     removed

      -    259      16       8     sync  /dev/nvme9n1
      -    259       9       1     sync  /dev/nvme11n1
      -    259      11       2     sync  /dev/nvme12n1
      -    259       1       3     sync  /dev/nvme13n1
      -    259       6       4     sync  /dev/nvme14n1
      -    259      13       5     sync  /dev/nvme15n1
      -    259      12       6     sync  /dev/nvme16n1
      -    259      10       7     sync  /dev/nvme17n1

[root@nfs41 tmp]# mdadm --detail /dev/md126
/dev/md126:
          Version : imsm
       Raid Level : container
    Total Devices : 8

  Working Devices : 8

             UUID : 55e62d4e:b4d10491:8ca2f983:286df7c9
    Member Arrays : /dev/md124

   Number  Major  Minor  RaidDevice

      -    259      16       -       /dev/nvme9n1
      -    259       9       -       /dev/nvme11n1
      -    259      11       -       /dev/nvme12n1
      -    259       1       -       /dev/nvme13n1
      -    259       6       -       /dev/nvme14n1
      -    259      13       -       /dev/nvme15n1
      -    259      12       -       /dev/nvme16n1
      -    259      10       -       /dev/nvme17n1
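
The /dev/md124 listing above can be checked mechanically for RAID slots that have no member device. What follows is a minimal sketch in Python, assuming the text layout shown in this post (a "Raid Devices" line, then member rows that end in a /dev/... path). The names missing_slots and SAMPLE are illustrative and not part of mdadm. A RAID 5 array tolerates one missing member, so a single missing slot would normally leave the array degraded rather than failed.

```python
import re

# Excerpt of the `mdadm --detail /dev/md124` output quoted in this thread.
SAMPLE = """\
     Raid Devices : 9
    Total Devices : 8
            State : active, FAILED, Not Started
   Number  Major  Minor  RaidDevice State
      -      0       0       0     removed
      -    259      16       8     sync  /dev/nvme9n1
      -    259       9       1     sync  /dev/nvme11n1
      -    259      11       2     sync  /dev/nvme12n1
      -    259       1       3     sync  /dev/nvme13n1
      -    259       6       4     sync  /dev/nvme14n1
      -    259      13       5     sync  /dev/nvme15n1
      -    259      12       6     sync  /dev/nvme16n1
      -    259      10       7     sync  /dev/nvme17n1
"""


def missing_slots(detail_text):
    """Return the RaidDevice slot numbers that have no backing device."""
    raid_devices = None
    present = set()
    for line in detail_text.splitlines():
        m = re.match(r"\s*Raid Devices\s*:\s*(\d+)", line)
        if m:
            raid_devices = int(m.group(1))
            continue
        # Member rows: Number Major Minor RaidDevice State... /dev/<name>
        # Rows marked "removed" carry no device path and are skipped.
        m = re.match(r"\s*\S+\s+\d+\s+\d+\s+(\d+)\s+.*(/dev/\S+)\s*$", line)
        if m:
            present.add(int(m.group(1)))
    if raid_devices is None:
        raise ValueError("no 'Raid Devices' line found")
    return sorted(set(range(raid_devices)) - present)


print(missing_slots(SAMPLE))  # -> [0]
```

On the output above, this reports slot 0 as the only one without a member, which is consistent with a single failed drive.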

 

 

Any idea how to get the RAID working again?

 

Cheers,

Markus

11 Replies
JosafathB_Intel
Moderator
384 Views

Hello MHick15,

 

Thank you for contacting Intel® Memory & Storage Support.

 

As we understand it, you need assistance with your Intel® Virtual RAID on CPU (Intel® VROC) array. To help us better understand your setup, please provide your system configuration, including the manufacturer, model and part number of all your system components, along with a copy of the SSU logs.

 

1-    Go to https://downloadcenter.intel.com/download/26735/ and download the software.

2-    When finished downloading it, open it.

3-    Attach the file obtained to your reply post.

 

We would also appreciate further information about the disk that failed. Are you able to test it outside of the RAID array to check the drive's health?

 

Would you mind sharing some screenshots of the pre-boot solution with us? This would help us better understand your RAID status.

 

We will be looking forward to your reply.

 

Best regards,

 

Josh B.

Intel Customer Support Technician

A Contingent Worker at Intel

MHick15
Beginner
384 Views

Dear Josh,

 

Thank you for your quick reply! I ran SSU and attached the output. Storage information is missing, I guess because we only use NVMe drives (?) and the OS is loaded into RAM.

 

I downloaded the Intel SSD Data Center Tool and tried to run a health check:

 

[root@nfs41 bin]# isdct show -intelssd 1

- Intel SSD DC P4510 Series BTLJ83300ES54P0DGN -

Bootloader : 0203
DevicePath : /dev/nvme10
DeviceStatus : *ASSERT_100DCA30 E5
Firmware : VDV10131
FirmwareUpdateAvailable : Please contact Intel Customer Support for further assistance at the following website: http://www.intel.com/go/ssdsupport.
Index : 1
ModelNumber : INTEL SSDPE2KX040T8
ProductFamily : Intel SSD DC P4510 Series

[root@nfs41 bin]# isdct show -sensor -intelssd 1

DeviceStatus : *ASSERT_100DCA30 E5

 

I tried to update the firmware, but this fails:

 

[root@nfs41 bin]# isdct load -f -intelssd 1
Updating firmware...

- Intel SSD DC P4510 Series BTLJ83300ES54P0DGN -

Status : Selected drive is in a disable logical state.

 

I can test the drive on another system later today, but to me it seems like the drive is dead. Do you have any hint as to why the software RAID-5 became suicidal after the loss of one drive?

 

Thanks a lot for your help!

 

Cheers,

Markus

 

 

 

JosafathB_Intel
Moderator
384 Views

Hello MHick15,

 

Thank you for your reply.

 

Please review and provide us with the following information:

 

• Based on the SSU report, it seems that your BIOS mode is set to "Legacy". We advise you to check with your motherboard OEM (original equipment manufacturer), in your case Supermicro*, for the recommended BIOS settings for your server.

 

• Your Intel® SSD DC P4510 Series (BTLJ83300ES54P0DGN) is reported as being in a "disable logical state". That being said, we would appreciate it if you could check all of your SSDs and update the firmware on the ones that require it. For simple examples of how to update the firmware (and also extract SMART attributes and other information from Intel® Data Center SSDs), please refer to https://www.intel.com/content/www/us/en/support/articles/000055357/memory-and-storage.html

 

• The SMART logs extracted from your Intel® SSD DC P4510 Series.

 

The Intel® SSD Data Center Tool (DCT) can be used to read out the device information ("Show Device Information"); please provide us with this output.

 

https://downloadcenter.intel.com/download/28999/Intel-SSD-Data-Center-Tool-Intel-SSD-DCT-?v=t

 

For a guide on how to use this tool, please visit the following link:

 

https://www.intel.com/content/dam/support/us/en/documents/memory-and-storage/Intel_SSD_DCT_3_0_x_Use...

 

Section 2.1 includes a guide on how to get this information.

 

Make sure to use the -all parameter.

 

Please let us know how many of the drives in your RAID array are affected and cannot be updated with the new firmware, so that we can start the warranty replacement process for the ones that qualify.

 

We will be looking forward to your reply.

 

Best regards,

 

Josh B.

Intel Customer Support Technician

A Contingent Worker at Intel

JosafathB_Intel
Moderator
384 Views

Hello MHick15,

 

We were reviewing your community post and we would like to know if you need further assistance with your Intel® Virtual RAID on CPU (Intel® VROC) or if we can close this community thread.

 

We will be looking forward to your reply.

 

Best regards,

 

Josh B.

Intel Customer Support Technician

A Contingent Worker at Intel

JosafathB_Intel
Moderator
384 Views

Hello MHick15,

 

We have not heard from you since your reply 12 days ago. Please let us know if you need further assistance with Intel® Virtual RAID on CPU (Intel® VROC). We will be looking forward to your reply.

 

Best regards,

 

Josh B.

Intel Customer Support Technician

A Contingent Worker at Intel

MHick15
Beginner
384 Views

Dear Josh,

 

Sorry for my late reply; we fixed our issue with the RAID. The NVMe disk seems to be dead. Should we contact Intel directly to get a replacement, or should we contact our reseller?

 

Cheers,

Markus

JosafathB_Intel
Moderator
384 Views

Hello MHick15,

 

Thank you for your reply.

 

We will be more than happy to assist you. If you are interested in continuing with the troubleshooting or the warranty replacement of your SSD, we would appreciate it if you could provide us with the information requested in our previous post.

 

If you prefer to expedite the process by visiting your reseller and processing the warranty with them, please let us know.

 

We will be looking forward to your reply.

 

Best regards,

 

Josh B.

Intel Customer Support Technician

A Contingent Worker at Intel

MHick15
Beginner
384 Views

Dear Josh,

 

Here is the information provided by smartctl and the Intel SSD tool:

 

SMART:

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-957.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                      INTEL SSDPE2KX040T8
Serial Number:                     BTLJ83300ES54P0DGN
Firmware Version:                  VDV10131
PCI Vendor/Subsystem ID:           0x8086
IEEE OUI Identifier:               0x5cd2e4
Total NVM Capacity:                69,793,218,560 [69.7 GB]
Unallocated NVM Capacity:          69,793,218,560 [69.7 GB]
Controller ID:                     0
Number of Namespaces:              0
Local Time is:                     Thu Dec 12 13:42:03 2019 CET
Firmware Updates (0x02):           1 Slot
Optional Admin Commands (0x000e):  Format Frmw_DL NS_Mngmt
Optional NVM Commands (0x0006):    Wr_Unc DS_Mngmt
Maximum Data Transfer Size:        32 Pages
Warning Comp. Temp. Threshold:    70 Celsius
Critical Comp. Temp. Threshold:    80 Celsius

Supported Power States
St Op    Max  Active    Idle  RL RT WL WT Ent_Lat Ex_Lat
 0 +   20.00W      -       -   0 0 0 0       0      0

=== START OF SMART DATA SECTION ===
Read NVMe SMART/Health Information failed: NVMe Status 0x4006
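
The status word in the last smartctl line can be unpacked by hand. What follows is a minimal sketch in Python, assuming the value is the NVMe completion status field as the Linux kernel reports it: status code in bits 7:0, status code type in bits 10:8, "More" in bit 13 and "Do Not Retry" in bit 14. Under that assumption, 0x4006 reads as generic status 0x06, which the NVMe specification names "Internal Error", with Do Not Retry set. The function name decode_nvme_status and the partial lookup table are illustrative only.

```python
# A few generic (status code type 0) NVMe status codes; deliberately partial.
GENERIC_STATUS_NAMES = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
    0x04: "Data Transfer Error",
    0x06: "Internal Error",
    0x0B: "Invalid Namespace or Format",
}


def decode_nvme_status(status):
    """Split a 15-bit NVMe status value into its fields (assumed layout)."""
    sc = status & 0xFF          # status code, bits 7:0
    sct = (status >> 8) & 0x7   # status code type, bits 10:8
    if sct == 0:
        name = GENERIC_STATUS_NAMES.get(sc, "unknown generic status")
    else:
        name = "not decoded in this sketch"
    return {
        "sc": sc,
        "sct": sct,
        "more": bool(status & 0x2000),  # bit 13
        "dnr": bool(status & 0x4000),   # bit 14, Do Not Retry
        "name": name,
    }


print(decode_nvme_status(0x4006))
```

If the assumption holds, the drive itself is rejecting the SMART log request with an internal error and asking the host not to retry.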

 

Intel SSD Data Center Tool:

- Intel SSD DC P4510 Series BTLJ83300ES54P0DGN -

AdminPath : /dev/nvme0
AggregationThreshold : Selected drive is in a disable logical state.
AggregationTime : Selected drive is in a disable logical state.
ArbitrationBurst : Selected drive is in a disable logical state.
Bootloader : 0203
CoalescingDisable : Selected drive is in a disable logical state.
DevicePath : /dev/nvme0
DeviceStatus : *ASSERT_100DCA30 E5
DirectivesSupported : False
DynamicMMIOEnabled : The selected drive does not support this feature.
EnduranceAnalyzer : Selected drive is in a disable logical state.
ErrorString : *ASSERT_100DCA30 E5
Firmware : VDV10131
FirmwareActivationNoticesConfiguration : Selected drive is in a disable logical state.
FirmwareUpdateAvailable : Please contact Intel Customer Support for further assistance at the following website: http://www.intel.com/go/ssdsupport.
FormatNVMCryptoEraseSupported : True
FormatNVMSupported : True
HighPriorityWeightArbitration : Selected drive is in a disable logical state.
IOCompletionQueuesRequested : Selected drive is in a disable logical state.
IOSubmissionQueuesRequested : Selected drive is in a disable logical state.
Index : 0
Intel : True
IntelGen3SATA : False
IntelNVMe : True
InterruptVector : Selected drive is in a disable logical state.
IsDualPort : False
LatencyTrackingEnabled : Selected drive is in a disable logical state.
LowPriorityWeightArbitration : Selected drive is in a disable logical state.
MediumPriorityWeightArbitration : Selected drive is in a disable logical state.
ModelNumber : INTEL SSDPE2KX040T8
NVMe1Point2OrGreater : True
NVMeControllerID : 0
NVMeMajorVersion : 1
NVMeMinorVersion : 2
NVMePowerState : Selected drive is in a disable logical state.
NVMeTertiaryVersion : 0
NamespaceAttributeNoticesConfiguration : Selected drive is in a disable logical state.
NamespaceId : 4294967295
NamespaceManagementSupported : True
NativeMaxLBA : Selected drive is in a disable logical state.
NumErrorLogPageEntries : 63
NumberOfNamespacesSupported : 0
OEM : Generic
PCIBus : 104
PCIDevice : 0
PCIDomain : 0
PCIFunction : 0
PCILinkGenSpeed : 3
PCILinkWidth : 4
PLITestTimeInterval : The selected drive does not support this feature.
PhyConfig : The selected drive does not support this feature.
PhySpeed : The selected drive does not support this feature.
PhysicalSectorSize : The selected drive does not support this feature.
PowerGovernorAveragePower : Selected drive is in a disable logical state.
PowerGovernorBurstPower : Selected drive is in a disable logical state.
PowerGovernorMode : Selected drive is in a disable logical state.
Product : CliffdaleRefresh
ProductFamily : Intel SSD DC P4510 Series
ProductProtocol : NVME
ReadErrorRecoveryTimer : Selected drive is in a disable logical state.
SMARTEnabled : True
SMARTHealthCriticalWarningsConfiguration : Selected drive is in a disable logical state.
SMBusAddress : Selected drive is in a disable logical state.
SMI : False
SectorSize : 512
SelfTestSupported : False
SerialNumber : BTLJ83300ES54P0DGN
TCGSupported : False
TelemetryLogNoticesConfiguration : Selected drive is in a disable logical state.
TelemetryLogSupported : False
TempThreshold : Selected drive is in a disable logical state.
TemperatureLoggingInterval : The selected drive does not support this feature.
ThermalThrottleEnabled : Selected drive is in a disable logical state.
TimeLimitedErrorRecovery : Selected drive is in a disable logical state.
TrimSupported : True
VolatileWriteCacheEnabled : Selected drive is in a disable logical state.
WriteAtomicityDisableNormal : Selected drive is in a disable logical state.
WriteCacheReorderingStateEnabled : The selected drive does not support this feature.
WriteCacheState : The selected drive does not support this feature.
WriteErrorRecoveryTimer : Selected drive is in a disable logical state.

 

Health status:

isdct show -all -sensor -intelssd 0
DeviceStatus : *ASSERT_100DCA30 E5
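
A long "Key : Value" dump like the one above is easier to read once it is reduced to the device status and the list of properties the drive refuses to report. What follows is a minimal sketch in Python, assuming the "Key : Value" layout shown in this post. It also assumes that a working drive reports DeviceStatus as "Healthy", which is an assumption about the tool's normal output and not something shown in this thread. The names parse_isdct, summarize and SAMPLE are illustrative.

```python
DISABLED_MSG = "Selected drive is in a disable logical state."

# Short excerpt of the isdct output quoted in this thread.
SAMPLE = """\
- Intel SSD DC P4510 Series BTLJ83300ES54P0DGN -

Bootloader : 0203
DevicePath : /dev/nvme0
DeviceStatus : *ASSERT_100DCA30 E5
Firmware : VDV10131
NVMePowerState : Selected drive is in a disable logical state.
NativeMaxLBA : Selected drive is in a disable logical state.
SectorSize : 512
"""


def parse_isdct(text):
    """Collect 'Key : Value' lines into a dict; other lines are ignored."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep and key.strip():
            props[key.strip()] = value.strip()
    return props


def summarize(props):
    """Report the device status and which properties are unavailable."""
    status = props.get("DeviceStatus", "")
    blocked = sorted(k for k, v in props.items() if v == DISABLED_MSG)
    # "Healthy" as the good value is an assumption, see the note above.
    return {"healthy": status == "Healthy", "status": status, "blocked": blocked}


print(summarize(parse_isdct(SAMPLE)))
```

Run against the full dump, the same summary would list every property that returns the "disable logical state" message next to the ASSERT status.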

 

If possible, I would like to process the warranty with Intel directly. How should I proceed from here?

 

Thanks and best regards,

Markus

 

JosafathB_Intel
Moderator
384 Views

Hello MHick15,

 

Thank you for your reply.

 

To further assist you, we would appreciate it if you could provide us with the SMART logs collected with the Intel® SSD Data Center Tool (DCT) (https://downloadcenter.intel.com/download/29185?v=t), so that we can better understand the cause of the issue you are experiencing.

 

As soon as you provide us with that information, we will put you in contact with our advanced technical support department in charge of your country/GEO to continue with the process.

 

We will be looking forward to your reply.

 

Best regards,

 

Josh B.

Intel Customer Support Technician

A Contingent Worker at Intel

 

MHick15
Beginner
384 Views

Dear Josh,

 

Trying to read out the SMART values using Intel DCT results in an error:

 

[root@backup003 ~]# isdct show -all -smart -intelssd 0

Status : Internal Error

 

The same command works fine on another machine with the same NVMe.

 

Cheers,

Markus

 

JosafathB_Intel
Moderator
384 Views

Hello MHick15,

 

Thank you for your reply and for the information provided.

 

You are going to receive an email shortly from our advanced technical support department to further assist you.

 

Thank you for your patience and understanding.

 

Best regards,

 

Josh B.

Intel Customer Support Technician

A Contingent Worker at Intel
