Software Storage Technologies
Virtual RAID, RSTe, and Memory Drive Technology

Getting massive Parity errors for RAID5 arrays with more than 3 drives. Drives are fine. Should we ignore or is there a fix for this?

GraniteStateColin
4,595 Views

Similar to the post at https://forums.intel.com/s/question/0D50P0000490VxbSAE/vroc-raid5-parity-errors, we get dozens to hundreds of parity errors after every Intel Rapid Storage Technology (IRST) Verification and Repair run on multiple computers running RAID5. The drives all test out fine, and I don't see any parity errors if I put those same drives in RAID1 configurations. The drives are mostly Samsung Evo 860 SSDs (SATA, not NVMe). There are some EVO 850 drives, but some systems run exclusively new 860 drives and experience the same problems.

 

We only see these errors in RAID5 volumes with more than 3 drives. At only 3 drives, no errors. And the number of errors increases significantly (from a couple dozen to a few hundred) when moving from 4 drives to 5.

 

As far as we know, we have not had any problems with the RAID volumes. So we're not sure whether this is a reporting problem and everything is fine, or an indication that Intel's RAID5 implementation is fatally flawed and can't support more than 3 drives. That would be terrible, because performance on SATA increases with more drives, and the chief benefit of RAID5 over other forms of RAID is the lower cost of redundancy: it uses the equivalent of a single drive of many for parity, whereas RAID1 uses half the drives for the mirror copy.
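
To put numbers on that cost argument, here is a minimal sketch of the usable-capacity arithmetic. It assumes equal-size drives and a two-way mirror for the RAID1 comparison; the function name is just for illustration and is not part of any Intel tool.

```python
def raid5_usable_fraction(drives: int) -> float:
    """Fraction of raw capacity left for data in RAID 5 (equal-size drives assumed)."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    # One drive's worth of capacity is spent on parity, spread across all drives.
    return (drives - 1) / drives


# Assumption: a two-way mirror, where every block is stored twice.
MIRROR_USABLE_FRACTION = 0.5

for n in (3, 4, 5):
    print(
        f"{n} drives: RAID 5 usable = {raid5_usable_fraction(n):.0%}, "
        f"mirror usable = {MIRROR_USABLE_FRACTION:.0%}"
    )
```

The volume sizes in the reports further down fit this arithmetic: 1,430,812 MB over 3 data drives and 1,907,750 MB over 4 data drives both come out to roughly 476,937 MB per drive.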

 

Also note that when IRST Verification and Repair runs, the initial report, when it starts, always shows 0 errors. It's only the final report AFTER it runs that reports the parity errors. This is part of what makes us think it could be a reporting problem, rather than actual errors.
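
For anyone unfamiliar with what a parity check means here: in a generic RAID 5 layout, the parity block of each stripe is the XOR of that stripe's data blocks, and a verify pass recomputes it and compares. The sketch below illustrates only that general idea with made-up two-byte blocks. I don't know how IRST actually implements Verification and Repair, so treat the function names and the block layout as assumptions.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity_block):
    """True if the stored parity matches the parity recomputed from the data."""
    return xor_blocks(data_blocks) == parity_block


# Hypothetical stripe on a 4-drive array: 3 data blocks plus 1 parity block.
data = [b"\x0f\xf0", b"\x33\x33", b"\xaa\x55"]
parity = xor_blocks(data)

# A stale parity block (one flipped bit) is what a verify pass would count
# as a parity error, even though every drive still reads back without error.
stale_parity = bytes([parity[0] ^ 0x01]) + parity[1:]

print(stripe_is_consistent(data, parity))        # consistent stripe
print(stripe_is_consistent(data, stale_parity))  # mismatched stripe
```

A mismatch of this kind does not involve a media error on any drive, which would be consistent with the drives testing fine individually while the volume still reports parity errors.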

 

Regardless of whether this is a reporting problem or an actual RAID issue, these drives appear fine in all other checks (including in IRST arrays with 3 or fewer drives). This therefore appears to be a software or Intel firmware problem in Rapid Storage Technology's RAID5 support, which fails with more than 3 drives.

 

Here's the relevant segment of the report after running Verification and Repair, this one with 5 drives:

Volume name: SSD RAID5

Status: Normal

Type: RAID 5

Size: 1,907,750 MB

System volume: Yes

Data stripe size: 32 KB

Write-back cache: Write through

Initialized: Yes

Parity errors: 527

Blocks with media errors: 0

Physical sector size: 512 Bytes

Logical sector size: 512 Bytes

 

Here's another from a different system with 4 drives:

Volume name: RAID5 System

Status: Normal

Type: RAID 5

Size: 1,430,812 MB

System volume: Yes

Data stripe size: 32 KB

Write-back cache: Write through

Initialized: Yes

Parity errors: 20

Blocks with media errors: 0

Physical sector size: 512 Bytes

Logical sector size: 512 Bytes
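
To track these counts from run to run, a small script along these lines could pull the numbers out of a saved report. It assumes the report has been copied into plain text with one "Key: value" pair per line, as in the excerpts above; I have not confirmed an export format for IRST, and the function name is hypothetical.

```python
def parse_volume_report(text: str) -> dict:
    """Parse 'Key: value' lines into a dict, skipping blank or malformed lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


# Excerpt of the 5-drive report quoted above.
report = """\
Volume name: SSD RAID5
Type: RAID 5
Size: 1,907,750 MB
Parity errors: 527
Blocks with media errors: 0
"""

fields = parse_volume_report(report)
parity_errors = int(fields["Parity errors"].replace(",", ""))
media_errors = int(fields["Blocks with media errors"].replace(",", ""))
print(f"{fields['Volume name']}: {parity_errors} parity errors, {media_errors} media errors")
```

Logging these two numbers after each run makes it easier to show whether the parity count grows over time while media errors stay at 0.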

 

Please advise:

  1. Is this a serious error that warrants shutting down the arrays, or is it just a reporting bug whose parity errors we can ignore? (If it is just a reporting error, please fix it.)
  2. Is there a fix or work-around that still includes the use of RAID5 with 4 or 5 drives?
  3. If we need to wait for an IRST or driver update, is there a scheduled release date?
0 Kudos
29 Replies
David_V_Intel
Employee
915 Views
Hello,

I am following up with your case and see that we have not heard back from you. If you need more assistance do not hesitate to reply.

Regards,
David V
Intel Customer Support Technician
Under Contract to Intel Corporation
0 Kudos
GraniteStateColin
915 Views

Have you been able to reproduce the problem? If still no, please tell me what you tested. Did you use the bestselling Samsung Evo drives? How many? I have not found anyone using Intel RST RAID5 with 4 or more drives who can run the Verify and Repair operation without getting parity errors, so I'm a little suspicious when you claim you can't reproduce the problem. I don't believe anyone tried, or you would have reproduced it. It happens 100% of the time in all cases, in my experience.

0 Kudos
BRudo
Novice
915 Views

It's because you didn't hear us. All your answers mean "we are tired of saying nothing".

0 Kudos
David_V_Intel
Employee
915 Views
Hello,

Thank you for patiently waiting. There is a newer version of the driver for Intel® Rapid Storage Technology. Please attempt installing that version and check to see if the error persists. Please refer to the link below:

https://downloadcenter.intel.com/download/28650/Intel-Rapid-Storage-Technology-Intel-RST-User-Interface-and-Driver?product=55005

If this does not work please let me know.

Regards,
David V
Intel Customer Support Technician
Under Contract to Intel Corporation
0 Kudos
David_V_Intel
Employee
915 Views
Hello,

This information has been forwarded to the appropriate department for investigation. I cannot guarantee a time-frame since this will take time, but I appreciate all of the feedback that has been provided so far.

Regards,
David V
Intel Customer Support Technician
Under Contract to Intel Corporation
0 Kudos
GraniteStateColin
915 Views

Has there been any update on this? I still see the problem in version 17.2.6.1027. In fact, the number of errors now exceeds 1,000 parity errors on every run on the systems with 5 drives. On the systems with only 4 drives, the number is much lower, hovering around 20-30 parity errors (always 0 on systems with only 3 drives). I have yet to see any consequences from this, and all drives individually continue to report they are fine, so I suspect it's a reporting error rather than an actual problem. But I am very nervous about assuming my data is fine and that Intel telling me my drive has over 1,000 errors is just a reporting glitch.

 

Please press your engineering team for more information on this.

0 Kudos
GraniteStateColin
915 Views

David, could you please give an update? I'm now having some performance issues with database systems running on these computers, and I can't tell whether they're related to the Intel driver bugs for RAID5 with 4+ drives or unrelated. Where does this stand now, 4 months after this bug was confirmed?

0 Kudos
GraniteStateColin
915 Views

Alright, another month has passed. The only conclusion I can reach is that Intel seems to have completely collapsed. Still struggling to get a 10nm CPU out the door, while the rest of the industry is producing 7nm chips. And can't even figure out how to make RAID5 work with more than 3 drives.

 

I'll check back from time to time to see if you ever respond that this is fixed, but for now, I'm moving on and shifting our company away from buying any more Intel. Your response on this has been so atrocious.

0 Kudos