Intel RAID 10 Failed - 2 out of 4 drives became non-members of array

RFili1
Novice

This happens:

1. After BIOS update;

2. After CMOS reset;

3. After some power outages;

4. After ANYTHING that switches the SATA mode to AHCI instead of RAID.

Questions:

1. Why adjacent members? It is always disks 3 and 4 that get kicked out, leading to a failed array.

2. How do we recover? Recreating the array, reinstalling Windows and recovering from backup is unacceptable. We create RAID for safety, not harm.

Thank you,

Vane

38 Replies
idata
Employee

Hello Vane,

Thank you for joining the community.

Could you provide us with the System Report for Intel® Rapid Storage Technology? Please attach the report to the thread.

https://www.intel.com/content/www/us/en/support/articles/000006351/technologies.html

Regards,

Leonardo C.

RFili1
Novice

Hi,

How can I get that report? I mean the RAID is dead. Fortunately I have a notification email. I hope it suffices.

Thank you,

Vane

Volume InTosh: Verification in progress.

System Report

System Information

OS name: Microsoft Windows 10 Pro

OS version: 10.0.16299

System name: SRVPC

System manufacturer: Gigabyte Technology Co., Ltd.

System model: Z170-HD3P

Intel® Rapid Storage Technology enterprise Information

Kit installed: 15.9.0.1015

User interface version: 15.9.0.1015

Language: English (United States)

RAID option ROM version: 15.2.0.2754

Driver version: 15.9.0.1015

ISDI version: 15.9.0.1015

Storage System Information

RAID Configuration

Array Name: SATA_Array_0000

Size: 7,630,915 MB

Available space: 8 MB

Number of volumes: 1

Volume member: InTosh

Number of array disks: 4

Array disk: X7KYS37AS

Array disk: X7OZNB2GS

Array disk: X7OZNBJGS

Array disk: X7OZNR9GS

Disk data cache: Enabled

Array Name: SATA_Array_0001

Size: 122,104 MB

Available space: 1 MB

Number of volumes: 2

Volume member: Pager

Volume member: Booster

Number of array disks: 1

Array disk: 0000_0000_0100_0000_E4D2_5C87_EFEA_4E01.

Disk data cache: Enabled

Volume name: InTosh

Status: Verifying 0% complete

Type: RAID 10

Size: 3,815,453 MB

System volume: Yes

Data stripe size: 64 KB

Acceleration mode: Enhanced

Write-back cache: Off

Initialized: Yes

Parity errors: 0

Blocks with media errors: 0

Physical sector size: 4096 Bytes

Logical sector size: 512 Bytes

Volume name: Pager

Status: Normal

Type: RAID 0

Usage: Data volume

Size: 56,568 MB

System volume: No

Data stripe size: 128 KB

Write-back cache: Off

Initialized: Yes

Parity errors: 0

Blocks with media errors: 0

Physical sector size: 512 Bytes

Logical sector size: 512 Bytes

Volume name: Booster

Status: Normal

Type: RAID 0

Usage: Cache volume

Size: 65,532 MB

System volume: No

Data stripe size: 128 KB

Write-back cache: Off

Initialized: Yes

Parity errors: 0

Blocks with media errors: 0

Physical sector size: 512 Bytes

Logical sector size: 512 Bytes

Hardware Information

Controller name: Intel(R) Chipset SATA/PCIe RST Premium Controller

Type: SATA

Mode: RAID

Number of SATA ports: 7

Number of volumes: 3

Volume: InTosh

Volume: Pager

Volume: Booster

Number of spares: 0

Number of available disks: 0

Rebuild on Hot Plug: Disabled

Manufacturer: Intel Corporation

Model number: 0x2822

Product revision: 49

Direct attached disk: X7KYS37AS

Direct attached disk: X7OZNB2GS

Direct attached disk: X7OZNBJGS

Direct attached disk: X7OZNR9GS

Direct attached disk: 0000_0000_0100_0000_E4D2_5C87_EFEA_4E01.

Disk on Controller 0, Port 0

Status: Normal

Type: SATA disk

Location type: Internal

Usage: Array disk

Size: 1,863 GB

System disk: No

Disk data cache: Enabled

Command queuing: NCQ

Transfer rate: 6 Gb/s

Model: TOSHIBA DT01ACA200

Serial number: X7KYS37AS

SCSI device ID: 0

Firmware: MX4OABB0

Physical sector size: 4096 Bytes

Logical sector size: 512 Bytes

Disk on Controller 0, Port 1

Status: Normal

Type: SATA disk

Location type: Internal

Usage: Array disk

Size: 1,863 GB

System disk: No

Disk data cache: Enabled

Command queuing: NCQ

Transfer rate: 6 Gb/s

Model: TOSHIBA DT01ACA200

Serial number: X7OZNB2GS

SCSI device ID: 0

Firmware: MX4OABB0

Physical sector size: 4096 Bytes

Logical sector size: 512 Bytes

Disk on Controller 0, Port 2

Status: Normal

Type: SATA disk

Location type: Internal

Usage: Array disk

Size: 1,863 GB

System disk: No

Disk data cache: Enabled

Command queuing: NCQ

Transfer rate: 6 Gb/s

Model: TOSHIBA DT01ACA200

Serial number: X7OZNBJGS

SCSI device ID: 0

Firmware: MX4OABB0

Physical sector size: 4096 Bytes

Logical sector size: 512 Bytes

Disk on Controller 0, Port 3

Status: Normal

Type: SATA disk

Location type: Internal

Usage: Array disk

Size: 1,863 GB

System disk: No

Disk data cache: Enabled

Command queuing: NCQ

Transfer rate: 6 Gb/s

Model: TOSHIBA DT01ACA200

Serial number: X7OZNR9GS

SCSI device ID: 0

Firmware: MX4OABB0

Physical sector size: 4096 Bytes

Logical sector size: 512 Bytes

Disk on Controller 1, Port 0

Status: Normal

Type: PCIe SSD

Location type: Internal

Usage: Cache device

Size: 119 GB

System disk: No

Port interface: NVMe

PCIe link speed: 4000 MB/s

PCIe link width: x4

Model: INTEL SSDPEKKW128G8

Serial number: 0000_0000_0100_0000_E4D2_5C87_EFEA_4E01.

SCSI device ID: 1

Firmware: 001C

Physical sector size: 512 Bytes

Logical sector size: 512 Bytes

ATAPI device on Controller 0, Port 4

Location type: Internal

Transfer rate: 1.5 Gb/s

Model: HL-DT-ST BD-RE BH16NS55

Serial number: Not Available

Firmware: 1.02

ATAPI device on Controller 0, Port 5

Location type: Internal

Transfer rate: 1.5 Gb/s

Model: TSSTcorpDVD-ROM TS-H353B

Serial number: Not Available

Firmware: LE05
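
A quick sanity check on the sizes in the report above: for a RAID 10 volume, the usable size should be about half of the allocated array space, because every block is stored twice. The sketch below uses only the three numbers copied from the report; the function name is made up for illustration and is not part of any Intel tool.

```python
# Values copied from the System Report above (all in MB).
ARRAY_SIZE_MB = 7_630_915       # SATA_Array_0000 "Size"
ARRAY_AVAILABLE_MB = 8          # SATA_Array_0000 "Available space"
VOLUME_SIZE_MB = 3_815_453      # Volume InTosh "Size"


def raid10_usable_mb(array_size_mb: int, available_mb: int) -> int:
    """Half of the allocated raw space, since RAID 10 stores each block twice."""
    return (array_size_mb - available_mb) // 2


if __name__ == "__main__":
    expected = raid10_usable_mb(ARRAY_SIZE_MB, ARRAY_AVAILABLE_MB)
    # Compare the computed size with the size the report shows for InTosh.
    print(expected, VOLUME_SIZE_MB, expected == VOLUME_SIZE_MB)
```

With these inputs the computed size matches the reported volume size, which is consistent with all four disks having been full members of the array when the report was generated.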

idata
Employee

Hello Vane,

Thank you for your response.

The report you sent shows that your RAID volume has a "Verifying" status. Could you access the BIOS Configuration Utility on Intel® RAID Controllers? Press Ctrl+I during the BIOS POST to open it. Once you are there, please send us a picture so we can see the status of your RAID.

Just to verify: is the operating system (OS) installed on the RAID volume?

Regards,

Leonardo C.

RFili1
Novice

Good afternoon,

The report was done before the RAID failed. The OS is installed on this volume, yes.

idata
Employee

Hello Vane,

Thanks for the screenshot you sent us.

It looks like the RAID structure got corrupted in two of the drives. Since the status of the RAID 10 volume appears as "Failed", it is not possible to rebuild that volume.

As a suggestion, you can try contacting a data recovery center for assistance in recovering the data.

Regards,

Leonardo C.

RFili1
Novice

Hello intel_corp,

I managed to restore the data, I'm an IT professional.

The point is that "RAID structure got corrupted in two of the drives" is a recurring error.

It has happened to me, accidentally or on purpose, each and every time I reset the BIOS to defaults (which disables RAID and replaces it with AHCI).

It has happened to other people since at least 2011.

It is always the same: the last 2 disks are kicked out. Not 1 and 3, not 1 and 4, not 2 and 3, not 2 and 4; all of these are fault tolerant. Never 1 and 2 either, which is not fault tolerant but ... never happens.

Always 3 and 4 are kicked out.
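
The point about which two-disk losses a 4-disk RAID 10 can survive can be checked with a short sketch. It assumes the mirror pairs are disks (1, 2) and (3, 4), which is what the reasoning in this post implies; the pairing Intel RST actually uses is not confirmed anywhere in this thread, and the names below are made up for illustration.

```python
from itertools import combinations

# ASSUMPTION: mirror pairs as implied by the post, not taken from Intel docs.
MIRROR_PAIRS = [(1, 2), (3, 4)]


def survives(lost: set) -> bool:
    """The array stays readable if every mirror pair keeps at least one disk."""
    return all(any(disk not in lost for disk in pair) for pair in MIRROR_PAIRS)


def classify_two_disk_losses() -> dict:
    """Map each possible pair of lost disks to True (survives) or False (failed)."""
    return {pair: survives(set(pair)) for pair in combinations((1, 2, 3, 4), 2)}


if __name__ == "__main__":
    for pair, ok in classify_two_disk_losses().items():
        print(pair, "degraded but readable" if ok else "failed")
```

Under that assumption, four of the six possible two-disk losses leave the array readable, and only losing (1, 2) or (3, 4) together fails it. Disks 3 and 4 dropping out together every time is therefore one of the two worst cases.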

The bottom line is that this is a well-known bug that effectively prevents millions of users from using RAID 10 configurations,

despite the excellent redundancy of this RAID level!

I think this issue should be treated more diligently.

So, is there an effort here to solve it? Where can we expect a firmware update or something?

Thank you,

Vane

RFili1
Novice

Any update, please?

Thank you.

idata
Employee

Hello Vane,

Thank you for your response. We are glad to know that you have restored the data. Given that, according to the information on this case, restoring the BIOS to defaults or updating the BIOS is what causes this reaction in the RAID structures, we recommend notifying the motherboard manufacturer, since they built the BIOS.

Regards,

Leonardo C.

RFili1
Novice

The Net is full of examples of the same issue on different mobos. There is something wrong with the Intel chipset/UEFI firmware: the same chipset/firmware that can't preserve data when creating a RAID1 if that RAID1 is on the third and fourth SATA connectors and the disks are GPT.

So there is a clear Intel issue with: UEFI + non-first pair of SATA connectors + GPT.

RAID1 - can't preserve data for GPT; you have to convert to MBR before creating the RAID;

RAID10 - you lose the non-first pair of disks and the whole array.

Didn't have time and energy to test RAID0 and.

Didn't ........ to test other combination like 1st, 2nd SATA with 5th and 6th. Nor 3rd,4th with 5th and 6th ....

Please escalate!

idata
Employee

Hello Vane,

Thank you for your response.

I understand this situation. Allow me to share with you that the SATA controller distribution is selected by the manufacturer of the motherboard, and Intel® does not have control over that distribution, nor over how the BIOS is built. We recommend contacting the motherboard maker to replicate the scenario with the exact same motherboard model. If they can replicate the issue, then they should be able to identify the root cause.

Regards,

Leonardo C.

RFili1
Novice

Hi,

The mobo manufacturer is Gigabyte. They do not respond to tickets. Can you initiate an action against them or something? What people see not working is Intel: Intel RST, Intel RAID, etc.

Thank you,

Vane

n_scott_pearson
Super User

You could do something radical like pick up the phone and call them. Imagine, using a phone in this day and age.

...S

RFili1
Novice

Very good piece of advice. Thank you. However I prefer written stuff.

Later Edit: they do not have a number anyway ... Imagine ...

emors
Novice

It's not a Gigabyte issue; the exact same thing happened to me on a Supermicro C7Z370 and an Asus Maximus VIII Z170.

Here is the scenario:

1) Reset CMOS (because support, for some other issue, had dictated this)

2) During boot (directly after the CMOS reset), a 2-drive RAID0 comes up: the first drive is OK, the second one is a "non-RAID Disk", and the array status is FAILED.

I should be able to reset CMOS, then hit CTRL-I on the 1st reboot, reset PCH Storage to RAID, and then everything should be OK.

Right?

emors
Novice

Just ran a test on the Supermicro C7Z370:

0) system had 2 SSDs in a RAID 0 stripe on the chipset's controller, Win10 installed

1) power on PC

2) hit CMOS reset button, system powers off, restarts

3) first BIOS splash screen, hit DELETE

4) second BIOS splash screen, HIT DELETE

5) short delay, entered SETUP

6) set PCH Storage to RAID (IRST)

7) SAVE, RESET

8) at Intel RAID Controller prompt, CTRL-I

9) Drive 0 - display as part of RAID stripe

Drive 1 - "Non-RAID Device"

RAID STRIPE Status : FAILED

Literally, nothing else was done to this system.

Resetting CMOS IS DESTROYING THE RAID STRIPE!

PS, Supermicro's support does have an 800 number and you can get a real person on the phone. I have waited various times, up to 7-8 minutes, but most times someone picks up in a few minutes!

AND THAT INITIAL LEVEL OF SUPPORT PERSON HAS DIRECT ACCESS TO PRODUCT ENGINEERS.

n_scott_pearson
Super User

Everything stated so far points to the SuperMicro BIOS being broken and responsible for these problems occurring. Only the BIOS is associated with (and affected by) CMOS resets. Tell these support engineers to exercise this (so-called great) access to product engineers and get this broken BIOS fixed.

...S

emors
Novice

> Everything stated so far points to the SuperMicro BIOS being broken and responsible for these problems occurring.

No, it does not, because the other person was seeing the same thing on a Gigabyte mobo and I have seen it on an Asus mobo as well. This is all documented above.

Everything points to a logic error in the Intel RAID Controller firmware.

n_scott_pearson
Super User

Nope. Again, CMOS reset has nothing to do with RAID. I repeat: NOTHING! It is a BIOS issue. I am willing to bet that these boards share a common BIOS root (AMI most likely) and the bug is in this root.

Regardless, these board manufacturers need to have their BIOS engineers (or AMI, if they are outsourcing their BIOS) properly diagnose the issue. If it is something that Intel needs to look at, then these BIOS engineers can report it through their direct channel to Intel.

...S

emors
Novice

> CMOS reset has nothing to do with RAID

The motherboard BIOS Settings have some fields that control how the chipset's SATA controller behaves.

The default mode is AHCI, which is a non-RAID controller. If the user wishes to use the controller's RAID feature, they change this setting to Intel RST.

When the chipset's controller goes through initialization, it reads these settings and behaves accordingly.

During the failing scenario, the controller was in RAID Mode, had created a RAID 0 Stripe, and saved its configuration.

When the CMOS Reset occurs, the mode is switched back to AHCI. When the controller initializes, it detects that it is currently in AHCI Mode, but also knows it was in RAID Mode and apparently marks the RAID Stripe FAILED.

Even if you switch back to Intel RST Mode it is too late, the RAID Stripe continues to be in FAILED state and data is lost.
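
To make the sequence described above concrete, here is a toy state model of the hypothesis. It is only a sketch of the behaviour as guessed in this post from the outside; it is not based on Intel firmware source or documentation, and every name in it is invented for illustration.

```python
from enum import Enum


class Mode(Enum):
    AHCI = "AHCI"
    RAID = "RAID"


class Status(Enum):
    NORMAL = "Normal"
    FAILED = "Failed"


class ToyController:
    """Toy model of the hypothesis in this post; NOT real Intel firmware logic."""

    def __init__(self) -> None:
        # Persistent state: the stripe status survives reboots.
        self.status = Status.NORMAL

    def boot(self, mode: Mode) -> Status:
        # Hypothesised rule: initialising in AHCI mode while a RAID
        # configuration exists marks the stripe FAILED, and the mark sticks.
        if mode is Mode.AHCI and self.status is Status.NORMAL:
            self.status = Status.FAILED
        return self.status


def replay(modes) -> list:
    """Boot a fresh toy controller through a sequence of modes."""
    ctrl = ToyController()
    return [ctrl.boot(mode).value for mode in modes]


if __name__ == "__main__":
    # Normal RAID boot, CMOS reset (AHCI boot), then switching back to RAID.
    print(replay([Mode.RAID, Mode.AHCI, Mode.RAID]))
```

In this model a single initialisation in AHCI mode is enough to leave the stripe FAILED for good. The behaviour one would hope for instead is that the AHCI boot leaves the saved RAID configuration untouched, so that switching back to RAID restores a Normal stripe.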

emors
Novice

Leonardo C,

Can you contact the RAID firmware team and get their opinion on this matter?

thanks,
