Dear all,
I created a RAID5 array using Intel VROC following the recommendations provided here:
After a single drive failed yesterday, the drives are marked as removed and the state changed to "active, FAILED, Not Started". The full mdadm --detail output can be found below.
[root@nfs41 tmp]# mdadm --detail /dev/md124
/dev/md124:
Container : /dev/md/imsm1, member 0
Raid Devices : 9
Total Devices : 8
State : active, FAILED, Not Started
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Consistency Policy : unknown
UUID : 62b45e71:ad4983c4:8bb60881:54701169
Number  Major  Minor  RaidDevice  State
-       0      0      0           removed
-       0      0      1           removed
-       0      0      2           removed
-       0      0      3           removed
-       0      0      4           removed
-       0      0      5           removed
-       0      0      6           removed
-       0      0      7           removed
-       0      0      8           removed
-       259    16     8           sync     /dev/nvme9n1
-       259    9      1           sync     /dev/nvme11n1
-       259    11     2           sync     /dev/nvme12n1
-       259    1      3           sync     /dev/nvme13n1
-       259    6      4           sync     /dev/nvme14n1
-       259    13     5           sync     /dev/nvme15n1
-       259    12     6           sync     /dev/nvme16n1
-       259    10     7           sync     /dev/nvme17n1
[root@nfs41 tmp]# mdadm --detail /dev/md126
/dev/md126:
Version : imsm
Raid Level : container
Total Devices : 8
Working Devices : 8
UUID : 55e62d4e:b4d10491:8ca2f983:286df7c9
Member Arrays : /dev/md124
Number  Major  Minor  RaidDevice
-       259    16     -           /dev/nvme9n1
-       259    9      -           /dev/nvme11n1
-       259    11     -           /dev/nvme12n1
-       259    1      -           /dev/nvme13n1
-       259    6      -           /dev/nvme14n1
-       259    13     -           /dev/nvme15n1
-       259    12     -           /dev/nvme16n1
-       259    10     -           /dev/nvme17n1
Any idea how to get the RAID working again?
Cheers,
Markus
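As a reading aid for the `/dev/md124` listing above, here is a minimal Python sketch (not something posted in the thread) that works out which RAID slot has no device node behind it. The function name `summarize_members` and the constant `MD124_DETAIL` are hypothetical; the sample text is copied from the post, and the parser assumes member rows look the way they do in this particular output.

```python
import re

# Lines copied from the `mdadm --detail /dev/md124` output quoted above.
MD124_DETAIL = """\
Raid Devices : 9
Total Devices : 8
State : active, FAILED, Not Started
- 0 0 0 removed
- 0 0 1 removed
- 0 0 2 removed
- 0 0 3 removed
- 0 0 4 removed
- 0 0 5 removed
- 0 0 6 removed
- 0 0 7 removed
- 0 0 8 removed
- 259 16 8 sync /dev/nvme9n1
- 259 9 1 sync /dev/nvme11n1
- 259 11 2 sync /dev/nvme12n1
- 259 1 3 sync /dev/nvme13n1
- 259 6 4 sync /dev/nvme14n1
- 259 13 5 sync /dev/nvme15n1
- 259 12 6 sync /dev/nvme16n1
- 259 10 7 sync /dev/nvme17n1
"""


def summarize_members(detail_text):
    """Return (expected_slots, {slot: device_node}, missing_slots)."""
    expected = None
    members = {}
    for line in detail_text.splitlines():
        match = re.match(r"\s*Raid Devices\s*:\s*(\d+)", line)
        if match:
            expected = int(match.group(1))
            continue
        fields = line.split()
        # Member rows end in a device node; the fourth column is the RaidDevice slot.
        if len(fields) >= 6 and fields[-1].startswith("/dev/") and fields[3].isdigit():
            members[int(fields[3])] = fields[-1]
    if expected is None:
        raise ValueError("no 'Raid Devices' line found")
    missing = [slot for slot in range(expected) if slot not in members]
    return expected, members, missing


expected, members, missing = summarize_members(MD124_DETAIL)
print(f"slots expected: {expected}, members present: {len(members)}")
print(f"slots without a device node: {missing}")
# RAID 5 keeps one parity block per stripe, so it tolerates one missing member.
print(f"within RAID 5 fault tolerance: {len(missing) <= 1}")
```

For the listing in this thread, only slot 0 has no device node, which is within what RAID 5 tolerates on paper. The sketch only reads the table; it does not explain why the volume was reported as "Not Started".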
Hello MHick15,
Thank you for contacting Intel® Memory & Storage Support.
We understand that you need assistance with your Intel® Virtual RAID on CPU (Intel® VROC). If that is correct, we would appreciate it if you could provide your system configuration, including the manufacturer, model, and part number of all system components, along with a copy of the SSU logs, so that we can better understand your setup.
1- Go to https://downloadcenter.intel.com/download/26735/ and download the software.
2- When finished downloading it, open it.
3- Attach the file obtained to your reply post.
We would also appreciate further information regarding the disk that failed. Are you able to test it outside of the RAID array to check the drive's health?
Would you mind sharing some screenshots of the pre-boot solution with us? This will help us better understand your RAID status.
We will be looking forward to your reply.
Best regards,
Josh B.
Intel Customer Support Technician
A Contingent Worker at Intel
Dear Josh,
Thank you for your quick reply! I ran SSU and attached the output. Storage information is missing, I guess because we only use NVMe drives (?) and the OS is loaded into RAM.
I downloaded the Intel SSD Data Center Tool and tried to run a health check:
[root@nfs41 bin]# isdct show -intelssd 1
- Intel SSD DC P4510 Series BTLJ83300ES54P0DGN -
Bootloader : 0203
DevicePath : /dev/nvme10
DeviceStatus : *ASSERT_100DCA30 E5
Firmware : VDV10131
FirmwareUpdateAvailable : Please contact Intel Customer Support for further assistance at the following website: http://www.intel.com/go/ssdsupport.
Index : 1
ModelNumber : INTEL SSDPE2KX040T8
ProductFamily : Intel SSD DC P4510 Series
[root@nfs41 bin]# isdct show -sensor -intelssd 1
DeviceStatus : *ASSERT_100DCA30 E5
I tried to update the firmware, but this fails:
[root@nfs41 bin]# isdct load -f -intelssd 1
Updating firmware...
- Intel SSD DC P4510 Series BTLJ83300ES54P0DGN -
Status : Selected drive is in a disable logical state.
I can test the drive on another system later today, but to me it seems like the drive is dead. Do you have any hint as to why the software RAID-5 became suicidal after the loss of one drive?
Thanks a lot for your help!
Cheers,
Markus
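The `isdct show` output above is a flat list of `Key : Value` lines, so the assert code can be picked out with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names `parse_isdct` and `has_assert` are hypothetical, the sample is copied from the post, and treating a `*ASSERT` prefix in `DeviceStatus` as a failure marker is inferred only from this thread's output, not from Intel documentation.

```python
# Lines copied from the `isdct show -intelssd 1` output quoted above.
ISDCT_SHOW = """\
- Intel SSD DC P4510 Series BTLJ83300ES54P0DGN -
Bootloader : 0203
DevicePath : /dev/nvme10
DeviceStatus : *ASSERT_100DCA30 E5
Firmware : VDV10131
Index : 1
ModelNumber : INTEL SSDPE2KX040T8
ProductFamily : Intel SSD DC P4510 Series
"""


def parse_isdct(text):
    """Collect 'Key : Value' lines into a dict; other lines are ignored."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            props[key.strip()] = value.strip()
    return props


def has_assert(props):
    """Assumption: a DeviceStatus starting with '*ASSERT' marks a faulted drive."""
    return props.get("DeviceStatus", "").startswith("*ASSERT")


props = parse_isdct(ISDCT_SHOW)
print(props["DevicePath"], props["DeviceStatus"], has_assert(props))
```

The same parser can be pointed at the output for every drive index to list which members report an assert, which is the per-drive check the later replies ask for.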
Hello MHick15,
Thank you for your reply.
Please review and provide us with the following information:
• Based on the SSU report, it seems that your BIOS mode is set to "Legacy". We advise you to check the recommended BIOS settings for your server with your motherboard OEM (original equipment manufacturer), in your case Supermicro*.
• Your Intel® SSD DC P4510 Series (BTLJ83300ES54P0DGN) is in a disable logical state. That being said, we would appreciate it if you could check all of your SSDs and update the firmware on those that require it. For simple examples on how to update the firmware (and also extract SMART attributes and other information from Intel® Data Center SSDs), please refer to https://www.intel.com/content/www/us/en/support/articles/000055357/memory-and-storage.html
• The SMART logs extracted from your Intel® SSD DC P4510 Series.
Intel's Data Center Tool (DCT) can be used to read out the device information (Show Device Information); please provide us with this output.
https://downloadcenter.intel.com/download/28999/Intel-SSD-Data-Center-Tool-Intel-SSD-DCT-?v=t
For a guide on how to use this tool, please visit the following link:
Section 2.1 includes a guide on how to get this information.
Make sure to use the parameter -all.
Please let us know how many of the drives in your RAID array are affected and cannot take the firmware update, so that we can start the warranty replacement process for those that qualify.
We will be looking forward to your reply.
Best regards,
Josh B.
Intel Customer Support Technician
A Contingent Worker at Intel
Hello MHick15,
We were reviewing your community post and we would like to know if you need further assistance with your Intel® Virtual RAID on CPU (Intel® VROC) or if we can close this community thread.
We will be looking forward to your reply.
Best regards,
Josh B.
Intel Customer Support Technician
A Contingent Worker at Intel
Hello MHick15,
We have not heard from you since your reply 12 days ago. Please let us know if you need further assistance related to the Intel® Virtual RAID on CPU (Intel® VROC). We will be looking forward to your reply.
Best regards,
Josh B.
Intel Customer Support Technician
A Contingent Worker at Intel
Dear Josh,
Sorry for my late reply; we fixed our issue with the RAID. The NVMe disk seems to be dead. Should we contact Intel directly to get a replacement, or should we contact our reseller?
Cheers,
Markus
Hello MHick15,
Thank you for your reply.
We will be more than happy to assist you. If you are interested in continuing with the troubleshooting or the warranty replacement of your SSD, we would appreciate it if you could provide the information requested in our previous post.
If you prefer to expedite the process by visiting your reseller and processing the warranty with them, please let us know.
We will be looking forward to your reply.
Best regards,
Josh B.
Intel Customer Support Technician
A Contingent Worker at Intel
Dear Josh,
Here is the information provided by smartctl and the Intel SSD tool:
SMART:
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-957.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: INTEL SSDPE2KX040T8
Serial Number: BTLJ83300ES54P0DGN
Firmware Version: VDV10131
PCI Vendor/Subsystem ID: 0x8086
IEEE OUI Identifier: 0x5cd2e4
Total NVM Capacity: 69,793,218,560 [69.7 GB]
Unallocated NVM Capacity: 69,793,218,560 [69.7 GB]
Controller ID: 0
Number of Namespaces: 0
Local Time is: Thu Dec 12 13:42:03 2019 CET
Firmware Updates (0x02): 1 Slot
Optional Admin Commands (0x000e): Format Frmw_DL NS_Mngmt
Optional NVM Commands (0x0006): Wr_Unc DS_Mngmt
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 70 Celsius
Critical Comp. Temp. Threshold: 80 Celsius
Supported Power States
St  Op  Max     Active  Idle  RL  RT  WL  WT  Ent_Lat  Ex_Lat
0   +   20.00W  -       -     0   0   0   0   0        0
=== START OF SMART DATA SECTION ===
Read NVMe SMART/Health Information failed: NVMe Status 0x4006
Intel SSD Data Center Tool:
- Intel SSD DC P4510 Series BTLJ83300ES54P0DGN -
AdminPath : /dev/nvme0
AggregationThreshold : Selected drive is in a disable logical state.
AggregationTime : Selected drive is in a disable logical state.
ArbitrationBurst : Selected drive is in a disable logical state.
Bootloader : 0203
CoalescingDisable : Selected drive is in a disable logical state.
DevicePath : /dev/nvme0
DeviceStatus : *ASSERT_100DCA30 E5
DirectivesSupported : False
DynamicMMIOEnabled : The selected drive does not support this feature.
EnduranceAnalyzer : Selected drive is in a disable logical state.
ErrorString : *ASSERT_100DCA30 E5
Firmware : VDV10131
FirmwareActivationNoticesConfiguration : Selected drive is in a disable logical state.
FirmwareUpdateAvailable : Please contact Intel Customer Support for further assistance at the following website: http://www.intel.com/go/ssdsupport.
FormatNVMCryptoEraseSupported : True
FormatNVMSupported : True
HighPriorityWeightArbitration : Selected drive is in a disable logical state.
IOCompletionQueuesRequested : Selected drive is in a disable logical state.
IOSubmissionQueuesRequested : Selected drive is in a disable logical state.
Index : 0
Intel : True
IntelGen3SATA : False
IntelNVMe : True
InterruptVector : Selected drive is in a disable logical state.
IsDualPort : False
LatencyTrackingEnabled : Selected drive is in a disable logical state.
LowPriorityWeightArbitration : Selected drive is in a disable logical state.
MediumPriorityWeightArbitration : Selected drive is in a disable logical state.
ModelNumber : INTEL SSDPE2KX040T8
NVMe1Point2OrGreater : True
NVMeControllerID : 0
NVMeMajorVersion : 1
NVMeMinorVersion : 2
NVMePowerState : Selected drive is in a disable logical state.
NVMeTertiaryVersion : 0
NamespaceAttributeNoticesConfiguration : Selected drive is in a disable logical state.
NamespaceId : 4294967295
NamespaceManagementSupported : True
NativeMaxLBA : Selected drive is in a disable logical state.
NumErrorLogPageEntries : 63
NumberOfNamespacesSupported : 0
OEM : Generic
PCIBus : 104
PCIDevice : 0
PCIDomain : 0
PCIFunction : 0
PCILinkGenSpeed : 3
PCILinkWidth : 4
PLITestTimeInterval : The selected drive does not support this feature.
PhyConfig : The selected drive does not support this feature.
PhySpeed : The selected drive does not support this feature.
PhysicalSectorSize : The selected drive does not support this feature.
PowerGovernorAveragePower : Selected drive is in a disable logical state.
PowerGovernorBurstPower : Selected drive is in a disable logical state.
PowerGovernorMode : Selected drive is in a disable logical state.
Product : CliffdaleRefresh
ProductFamily : Intel SSD DC P4510 Series
ProductProtocol : NVME
ReadErrorRecoveryTimer : Selected drive is in a disable logical state.
SMARTEnabled : True
SMARTHealthCriticalWarningsConfiguration : Selected drive is in a disable logical state.
SMBusAddress : Selected drive is in a disable logical state.
SMI : False
SectorSize : 512
SelfTestSupported : False
SerialNumber : BTLJ83300ES54P0DGN
TCGSupported : False
TelemetryLogNoticesConfiguration : Selected drive is in a disable logical state.
TelemetryLogSupported : False
TempThreshold : Selected drive is in a disable logical state.
TemperatureLoggingInterval : The selected drive does not support this feature.
ThermalThrottleEnabled : Selected drive is in a disable logical state.
TimeLimitedErrorRecovery : Selected drive is in a disable logical state.
TrimSupported : True
VolatileWriteCacheEnabled : Selected drive is in a disable logical state.
WriteAtomicityDisableNormal : Selected drive is in a disable logical state.
WriteCacheReorderingStateEnabled : The selected drive does not support this feature.
WriteCacheState : The selected drive does not support this feature.
WriteErrorRecoveryTimer : Selected drive is in a disable logical state.
health status:
isdct show -all -sensor -intelssd 0
DeviceStatus : *ASSERT_100DCA30 E5
If possible, I would like to process the warranty with Intel directly. How should I proceed from here?
Thanks and best regards,
Markus
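The `NVMe Status 0x4006` value in the smartctl output above can be split into its bit fields. The following Python sketch assumes the number is the completion-queue status field with the phase bit already shifted out (status code in bits 0-7, status code type in bits 8-10, More in bit 13, Do Not Retry in bit 14); that layout is an assumption about how smartctl reports the value. The function name `decode_nvme_status` is hypothetical, and the table lists only a few generic status codes.

```python
# A few generic command status codes (status code type 0) from the NVMe specification.
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
    0x04: "Data Transfer Error",
    0x06: "Internal Error",
    0x07: "Command Abort Requested",
}


def decode_nvme_status(status):
    """Split a status value into its fields (assumed layout, see the note above)."""
    code = status & 0xFF
    code_type = (status >> 8) & 0x7
    if code_type == 0:
        description = GENERIC_STATUS.get(code, "unlisted generic status")
    else:
        description = "non-generic status code type"
    return {
        "status_code": code,
        "status_code_type": code_type,
        "more": bool(status & 0x2000),
        "do_not_retry": bool(status & 0x4000),
        "description": description,
    }


decoded = decode_nvme_status(0x4006)
for field, value in decoded.items():
    print(f"{field}: {value}")
```

Under that assumption, 0x4006 reads as a generic "Internal Error" with the Do Not Retry bit set, meaning the drive rejected the SMART log request rather than the host sending a malformed command.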
Hello MHick15,
Thank you for your reply.
To further assist you, we would appreciate it if you could provide the SMART logs using Intel's Data Center Tool (DCT) (https://downloadcenter.intel.com/download/29185?v=t), so that we can better understand the cause of the issue you are experiencing.
As soon as you provide that information, we will put you in contact with the advanced technical support department in charge of your country/GEO to continue with the process.
We will be looking forward to your reply.
Best regards,
Josh B.
Intel Customer Support Technician
A Contingent Worker at Intel
Dear Josh,
Trying to read out the SMART values using Intel DCT results in an error:
[root@backup003 ~]# isdct show -all -smart -intelssd 0
Status : Internal Error
The same command works fine on another machine with the same NVMe.
Cheers,
Markus
Hello MHick15,
Thank you for your reply and for the information provided.
You will shortly receive an email from our advanced technical support department to further assist you.
Thank you for your patience and understanding.
Best regards,
Josh B.
Intel Customer Support Technician
A Contingent Worker at Intel