Getting massive Parity errors for RAID5 arrays with more than 3 drives. Drives are fine. Should we ignore or is there a fix for this?

GraniteStateColin · ‎02-02-2019

Similar to the post in the forum at https://forums.intel.com/s/question/0D50P0000490VxbSAE/vroc-raid5-parity-errors, we get dozens to hundreds of Parity errors following every Intel Rapid Storage Technology Verification and Repair run on multiple computers running RAID5. The drives all test out fine and I don't see any parity errors if I put those same drives in RAID1 configurations. The drives are mostly Samsung Evo 860 SSD drives (SATA, not NVMe). There are some EVO 850 drives, but some systems are running exclusively new 860 drives and experience the same problems.

We only see these errors in RAID5 volumes with more than 3 drives. At only 3 drives, no errors. And the number of errors increases significantly (from a couple dozen to a few hundred) when moving from 4 drives to 5.

As far as we know, we have not had any problems with the RAID volumes. So we're not sure if this is a reporting problem and everything is fine, or if this is an indication that Intel's RAID5 system is fatally flawed and can't support more than 3 drives (which would be terrible, because performance increases on SATA with more drives and the chief benefit of RAID5 over other forms of RAID is to lower the cost of redundancy by only using a single drive's worth of capacity out of many, vs RAID1, which uses half the drives for redundancy).
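To put numbers on that cost argument, here is a minimal sketch assuming textbook RAID rules (RAID5 gives up one drive's worth of capacity, two-way RAID1 gives up half). The function name `usable_fraction` is made up for illustration and is not part of any Intel tool. The array sizes come from the two reports in this thread, and the volume sizes those reports show are close to what this predicts.

```python
def usable_fraction(level: str, drives: int) -> float:
    """Fraction of raw capacity left for data under textbook RAID rules."""
    if level == "RAID1":
        if drives < 2:
            raise ValueError("RAID1 needs at least 2 drives")
        return 0.5  # assumption: plain two-way mirroring
    if level == "RAID5":
        if drives < 3:
            raise ValueError("RAID5 needs at least 3 drives")
        return (drives - 1) / drives  # one drive's worth goes to parity
    raise ValueError(f"unsupported level: {level}")


# Array sizes (MB) as reported by IRST for the 5-drive and 4-drive systems.
for drives, array_mb in ((5, 2_384_699), (4, 1_907_759)):
    expected_mb = array_mb * usable_fraction("RAID5", drives)
    print(f"{drives} drives: about {expected_mb:,.0f} MB usable")
```

Going from 3 to 5 drives moves the usable share from two thirds to four fifths, which is why being limited to 3 drives would hurt.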

Also note that when IRST Verification and Repair runs, the initial report, when it starts, always shows 0 errors. It's only the final report AFTER it runs that reports the parity errors. This is part of what makes us think it could be a reporting problem, rather than actual errors.

Regardless of whether this is a reporting problem or an actual RAID issue, because these drives appear in all other checks to be fine (including in IRST arrays with 3 or fewer drives), this appears to be a software or Intel firmware problem for the Rapid Storage Technology support for RAID5, where it fails with more than 3 drives.
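For readers unfamiliar with what a "parity error" means here, the following is a rough sketch of what a verification pass checks in textbook RAID 5: the XOR of the data blocks in each stripe should equal the stored parity block. This is an assumption about the general technique, not a description of how Intel's Verification and Repair is actually implemented, and the helper names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def count_parity_errors(stripes):
    """Count stripes whose stored parity does not match the XOR of their data.

    stripes: iterable of (data_blocks, stored_parity) pairs.
    """
    return sum(1 for data, parity in stripes if xor_blocks(data) != parity)


# Two toy stripes with 3 data blocks each (as in a 4-drive RAID 5).
stripes = [
    ([b"\x01\x02", b"\x04\x08", b"\x10\x20"], b"\x15\x2a"),  # consistent
    ([b"\xff\x00", b"\x0f\xf0", b"\x00\x00"], b"\x00\x00"),  # stored parity is wrong
]
errors = count_parity_errors(stripes)
print(f"Parity errors: {errors}")
```

A mismatch only says that data and parity disagree. It does not say which of the two is wrong, which is part of why it is hard to tell from the count alone whether this is a reporting issue.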

Here's the relevant segment of the report after running Verification and Repair, this one with 5 drives:

Volume name: SSD RAID5

Status: Normal

Type: RAID 5

Size: 1,907,750 MB

System volume: Yes

Data stripe size: 32 KB

Write-back cache: Write through

Initialized: Yes

Parity errors: 527

Blocks with media errors: 0

Physical sector size: 512 Bytes

Logical sector size: 512 Bytes

Here's another from a different system with 4 drives:

Volume name: RAID5 System

Status: Normal

Type: RAID 5

Size: 1,430,812 MB

System volume: Yes

Data stripe size: 32 KB

Write-back cache: Write through

Initialized: Yes

Parity errors: 20

Blocks with media errors: 0

Physical sector size: 512 Bytes

Logical sector size: 512 Bytes
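For anyone who wants to track these counts across machines and runs, a small script can pull the fields out of a saved report. This is a sketch that assumes the report is plain "Key: value" text like the excerpts above; `parse_report` is a hypothetical helper, not an Intel API.

```python
def parse_report(text: str) -> dict:
    """Collect 'Key: value' lines into a dict (later duplicates overwrite earlier ones)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


sample = """Volume name: RAID5 System
Status: Normal
Type: RAID 5
Parity errors: 20
Blocks with media errors: 0
"""

report = parse_report(sample)
parity_errors = int(report["Parity errors"])
print(report["Volume name"], "parity errors:", parity_errors)
```

Because duplicate keys overwrite each other, this should be run on a single volume section rather than on a whole system report, where fields such as "Size" and "Type" repeat.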

Please advise:

Is this a serious error that we should shut down the arrays, or is this just a reporting bug and we should ignore the parity errors (if it is just a reporting error, please fix)?
Is there a fix or work-around that still includes the use of RAID5 with 4 or 5 drives?
If we need to wait for an updated IRST or driver update, is there a scheduled release date?

David_V_Intel · ‎02-05-2019

Hello GraniteStateColin, Thank you for posting on the Intel ® communities. I will need some more information in order to do some testing and see why the error is happening. Please provide me with a System Support Utility report. This report can be generated with our tool, which you can download from the link below: https://downloadcenter.intel.com/download/25293/Intel-System-Support-Utility-for-Windows- Make sure to attach the created report to this thread. Also, please attach the full system report as well as screenshots showing the parity errors you mention. Regards, David V Intel Customer Support Technician Under Contract to Intel Corporation

BRudo · ‎02-06-2019

I have the same problem with parity errors on RAID-5 with 4 drives (NAS-edition Toshiba and WD HDDs).

The scan runs monthly and fixes hundreds of parity errors.

The attached zip contains the results of the SSU scan.

Please advise: what happens to the data if one drive in the array really fails? Will such parity errors prevent the array from being restored in that case?
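As background to this question, here is a toy sketch assuming textbook RAID 5, where a missing block is rebuilt by XOR-ing the surviving data blocks with the parity block. Under that assumption, a stripe whose stored parity really is inconsistent would rebuild to wrong data. Whether the errors IRST reports reflect real inconsistencies on disk is the open question in this thread, and none of this is confirmed for Intel's implementation.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# One stripe of a 4-drive array: 3 data blocks plus parity.
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
good_parity = xor_blocks(data)
stale_parity = b"\x00\x00"  # pretend the stored parity is out of date

# The drive holding data[1] fails; rebuild it from the survivors.
survivors = [data[0], data[2]]
rebuilt_ok = xor_blocks(survivors + [good_parity])
rebuilt_bad = xor_blocks(survivors + [stale_parity])

print("rebuilt from good parity matches: ", rebuilt_ok == data[1])
print("rebuilt from stale parity matches:", rebuilt_bad == data[1])
```

In this model the rebuild still completes in both cases. The risk is silent corruption in the affected stripes rather than a failed restore.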

GraniteStateColin · ‎02-05-2019

SSU and IRST reports attached from 2 computers running RAID5. Computer names reflected in the file names. Cygnus is running Windows Server 2016. Dagobah is running Windows 10. In both cases, if I remove drives and run these as RAID5 with only 3 drives or RAID1 with 2, no problems. Which drives are included doesn’t matter. The problem seems to be RAID5 with more than 3 drives. Also note the much larger number of errors in the RAID5 array with 5 drives compared to the RAID5 array with 4 drives. This is further detailed in my original report. I believe this is a defect in the Intel software or chipset for RAID5 with 4+ drives. Please advise if this is just a reporting defect and I can ignore the parity errors or if this is a critical problem with the Intel software. If the latter, when will you release a fix? Thanks, Colin

David_V_Intel · ‎02-05-2019

Hello GraniteStateColin, Thank you for your response. Please provide me with the attachments mentioned. I do not see any attachment, so it might not have been uploaded correctly. You can always compress both files into a .zip folder in case there are problems uploading. Regards, David V Intel Customer Support Technician Under Contract to Intel Corporation

GraniteStateColin · ‎02-05-2019

Attached again. Did you receive this time?

David_V_Intel · ‎02-06-2019

Hello GraniteStateColin, Thank you for your response. Are you attaching the file in the reply box in this thread? If not, then let me know, because there are no attachments in this thread. Regards, David V Intel Customer Support Technician Under Contract to Intel Corporation

GraniteStateColin · ‎02-06-2019

I don’t know what you mean by “reply box.” I’m attaching files, just like I’ve done hundreds or thousands of times before – I hit the Attach File button in Outlook and select the files to attach, then send. It sounds like your system is stripping out the attachments. I may have used Drag and Drop the first time (not sure), but used the Attach File button the second time. I believe both methods do the same thing though. I’ve attached again using the same method, but this time, I’ve put the two text files into a ZIP file first. I’ve read that some systems get confused when attachments are just plain text files. Either way looks like you guys need a more modern system for handling support tickets. If it’s not compatible with the #1 e-mail program in the world (Microsoft Outlook), that’s not a good sign. Disappointing that Intel of all companies would have a problem like that. Thanks, Colin

David_V_Intel · ‎02-07-2019

Hello GraniteStateColin, Thank you for your response. This support request has been created in the communities, this means that you need to attach the files in the thread itself, in the forum and not via e-mail because we are not receiving it. Please reply directly in the communities and attach it there so we can access it and help you further. Regards, David V Intel Customer Support Technician Under Contract to Intel Corporation

GraniteStateColin · ‎02-11-2019

Wow, I see that all my e-mails to you are appearing as posts to this forum (unlike this post, which I'm making directly at forums.intel.com). I'm apparently not able to edit them either. Please grab the data you need to troubleshoot and delete or remove that information from my post. I do not want the serial numbers or other unique information being made public like this.

Also, please see the other post in this thread from BRudo who reports the same bug in the IRST software.

GraniteStateColin · ‎02-08-2019

For security reasons, I don’t want to post files to a public or semi-public forum that include details on our computer configurations. Is there another way I can get you the files? They’re just text, I’ll paste the text below (4 files total: 2 SSU Reports and 2 IRST Verification and Repair post-run reports, 1 each for two computers named Cygnus and Dagobah). Also, in my testing, this problem occurs in all cases – connect 4 or 5 drives in RAID5, run the Verification and Repair at least twice, and you’ll see the Parity Errors when the Verification and Repair finishes (reports 0 errors at start, errors only appear AFTER the process completes). Have you tried to reproduce the problem?

***********************

Volume SSD RAID5: Verification and repair complete.

System Report

System Information
OS name: Microsoft Windows Server 2016 Standard
OS version: 10.0.14393
System name: CYGNUS
System manufacturer: Gigabyte Technology Co., Ltd.
System model: Z170X-Gaming 3
Processor: GenuineIntel Intel64 Family 6 Model 94 Stepping 3 3.401 GHz
BIOS: American Megatrends Inc., F22j
PCH: 0xA145

Intel® Rapid Storage Technology enterprise Information
Kit installed: 16.8.0.1000
User interface version: 16.8.0.1000
Language: English (United States)
RAID option ROM version: 15.2.0.2754
Driver version: 16.8.0.1000
ISDI version: 16.8.0.1000

Storage System Information

RAID Configuration
Array Name: SATA_Array_0000
Size: 2,384,699 MB
Available space: 10 MB
Number of volumes: 1
Volume member: SSD RAID5
Number of array disks: 5
Array disk: S21HNXAG841427H
Array disk: S3Z1NB0K855037V
Array disk: S3Z1NB0K855034K
Array disk: S21HNXAG841423R
Array disk: S3Z1NB0KA45024N
Disk data cache: Disabled

Volume name: SSD RAID5
Status: Normal
Type: RAID 5
Size: 1,907,750 MB
System volume: Yes
Data stripe size: 32 KB
Write-back cache: Write through
Initialized: Yes
Parity errors: 527
Blocks with media errors: 0
Physical sector size: 512 Bytes
Logical sector size: 512 Bytes

Hardware Information
Controller name: Intel(R) Chipset SATA/PCIe RST Premium Controller \\Scsi0
Type: SATA
Mode: RAID
Number of SATA ports: 6
Number of volumes: 1
Volume: SSD RAID5
Number of spares: 0
Number of available disks: 0
Rebuild on Hot Plug: Disabled
Manufacturer: Intel Corporation
Model number: 0x2822
Product revision: 49
Direct attached disk: S21HNXAG841427H
Direct attached disk: S3Z1NB0K855037V
Direct attached disk: S3Z1NB0K855034K
Direct attached disk: S21HNXAG841423R
Direct attached disk: S3Z1NB0KA45024N

Disk on Controller 0, Port 0
Status: Normal
Type: SATA SSD
Location type: Internal
Usage: Array disk
Size: 466 GB
System disk: No
Disk data cache: Disabled
Command queuing: NCQ
Transfer rate: 6 Gb/s
Model: Samsung SSD 850 EVO 500GB
Serial number: S21HNXAG841427H
SCSI device ID: 0
Firmware: EMT01B6Q
Physical sector size: 512 Bytes
Logical sector size: 512 Bytes

Disk on Controller 0, Port 1
Status: Normal
Type: SATA SSD
Location type: Internal
Usage: Array disk
Size: 466 GB
System disk: No
Disk data cache: Disabled
Command queuing: NCQ
Transfer rate: 6 Gb/s
Model: Samsung SSD 860 EVO 500GB
Serial number: S3Z1NB0K855037V
SCSI device ID: 0
Firmware: RVT01B6Q
Physical sector size: 512 Bytes
Logical sector size: 512 Bytes

Disk on Controller 0, Port 2
Status: Normal
Type: SATA SSD
Location type: Internal
Usage: Array disk
Size: 466 GB
System disk: No
Disk data cache: Disabled
Command queuing: NCQ
Transfer rate: 6 Gb/s
Model: Samsung SSD 860 EVO 500GB
Serial number: S3Z1NB0K855034K
SCSI device ID: 0
Firmware: RVT01B6Q
Physical sector size: 512 Bytes
Logical sector size: 512 Bytes

Disk on Controller 0, Port 3
Status: Normal
Type: SATA SSD
Location type: Internal
Usage: Array disk
Size: 466 GB
System disk: No
Disk data cache: Disabled
Command queuing: NCQ
Transfer rate: 6 Gb/s
Model: Samsung SSD 850 EVO 500GB
Serial number: S21HNXAG841423R
SCSI device ID: 0
Firmware: EMT01B6Q
Physical sector size: 512 Bytes
Logical sector size: 512 Bytes

Disk on Controller 0, Port 5
Status: Normal
Type: SATA SSD
Location type: Internal
Usage: Array disk
Size: 466 GB
System disk: No
Disk data cache: Disabled
Command queuing: NCQ
Transfer rate: 6 Gb/s
Model: Samsung SSD 860 EVO 500GB
Serial number: S3Z1NB0KA45024N
SCSI device ID: 0
Firmware: RVT01B6Q
Physical sector size: 512 Bytes
Logical sector size: 512 Bytes

ATAPI device on Controller 0, Port 4
Location type: Internal
Transfer rate: 1.5 Gb/s
Model: ATAPI iHAS224 Y
Serial number: Not Available
Firmware: ZL0W

*******************************************

Volume RAID5 System: Verification and repair complete.

System Report

System Information
OS name: Microsoft Windows 10 Pro
OS version: 10.0.17763
System name: DAGOBAH
System manufacturer: To Be Filled By O.E.M.
System model: To Be Filled By O.E.M.
Processor: GenuineIntel Intel64 Family 6 Model 158 Stepping 10 3.201 GHz
BIOS: American Megatrends Inc., P1.30
PCH: 0xA305

Intel® Rapid Storage Technology enterprise Information
Kit installed: 16.8.0.1000
User interface version: 16.8.0.1000
Language: English (United States)
RAID option ROM version: 16.7.0.3513
Driver version: 16.8.0.1000
ISDI version: 16.8.0.1000

Storage System Information

RAID Configuration
Array Name: SATA_Array_0000
Size: 1,907,759 MB
Available space: 8 MB
Number of volumes: 1
Volume member: RAID5 System
Number of array disks: 4
Array disk: S3Z1NB0K316414Y
Array disk: S3Z1NB0K316411M
Array disk: S3Z1NB0K855040A
Array disk: S3Z1NB0K722361X
Disk data cache: Enabled

Volume name: RAID5 System
Status: Normal
Type: RAID 5
Size: 1,430,812 MB
System volume: Yes
Data stripe size: 32 KB
Write-back cache: Write through
Initialized: Yes
Parity errors: 20
Blocks with media errors: 0
Physical sector size: 512 Bytes
Logical sector size: 512 Bytes

Hardware Information
Controller name: Intel(R) Chipset SATA/PCIe RST Premium Controller \\Scsi1
Type: SATA
Mode: RAID
Number of SATA ports: 6
Number of volumes: 1
Volume: RAID5 System
Number of spares: 0
Number of available disks: 0
Rebuild on Hot Plug: Disabled
Manufacturer: Intel Corporation
Model number: 0x2822
Product revision: 16
Direct attached disk: S3Z1NB0K316414Y
Direct attached disk: S3Z1NB0K316411M
Direct attached disk: S3Z1NB0K855040A
Direct attached disk: S3Z1NB0K722361X

Disk on Controller 0, Port 0
Status: Normal
Type: SATA SSD
Location type: Internal
Usage: Array disk
Size: 466 GB
System disk: No
Disk data cache: Enabled
Command queuing: NCQ
Transfer rate: 6 Gb/s
Model: Samsung SSD 860 EVO 500GB
Serial number: S3Z1NB0K316414Y
SCSI device ID: 0
Firmware: RVT01B6Q
Physical sector size: 512 Bytes
Logical sector size: 512 Bytes

Disk on Controller 0, Port 1
Status: Normal
Type: SATA SSD
Location type: Internal
Usage: Array disk
Size: 466 GB
System disk: No
Disk data cache: Enabled
Command queuing: NCQ
Transfer rate: 6 Gb/s
Model: Samsung SSD 860 EVO 500GB
Serial number: S3Z1NB0K316411M
SCSI device ID: 0
Firmware: RVT01B6Q
Physical sector size: 512 Bytes
Logical sector size: 512 Bytes

Disk on Controller 0, Port 2
Status: Normal
Type: SATA SSD
Location type: Internal
Usage: Array disk
Size: 466 GB
System disk: No
Disk data cache: Enabled
Command queuing: NCQ
Transfer rate: 6 Gb/s
Model: Samsung SSD 860 EVO 500GB
Serial number: S3Z1NB0K855040A
SCSI device ID: 0
Firmware: RVT01B6Q
Physical sector size: 512 Bytes
Logical sector size: 512 Bytes

Disk on Controller 0, Port 3
Status: Normal
Type: SATA SSD
Location type: Internal
Usage: Array disk
Size: 466 GB
System disk: No
Disk data cache: Enabled
Command queuing: NCQ
Transfer rate: 6 Gb/s
Model: Samsung SSD 860 EVO 500GB
Serial number: S3Z1NB0K722361X
SCSI device ID: 0
Firmware: RVT01B6Q
Physical sector size: 512 Bytes
Logical sector size: 512 Bytes

Empty port
Port: 4
Port location: Internal

Empty port
Port: 5
Port location: Internal

*******************************************

# SSU Scan Information
Scan Info:
Version:"2.5.0.15"
Date:"02/05/2019"
Time:"00:00:35.6301639"

# Scanned Hardware
Computer:
BaseBoard Manufacturer:"ASRock"
BIOS Mode:"UEFI"
BIOS Version/Date:"American Megatrends Inc. P1.30 , 11/08/2018 12:00 AM"
CD or DVD:"HL-DT-ST BD-RE WH16NS40"
Embedded Controller Version:"255.255"
Platform Role:"Desktop"
Processor:"Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz , GenuineIntel"
Secure Boot State:"Off"
SMBIOS Version:"3.1"
Sound Card:"USB Audio Device"
Sound Card:"USB Audio Device"
Sound Card:"NVIDIA High Definition Audio"
Sound Card:"Intel(R) Display Audio"
Sound Card:"High Definition Audio Device"
Sound Card:"USB Audio Device"
Sound Card:"NVIDIA Virtual Audio Device (Wave Extensible) (WDM)"
System Manufacturer:"To Be Filled By O.E.M."
System Model:"To Be Filled By O.E.M."
System SKU:"To Be Filled By O.E.M."
System Type:"x64-based PC"
- "Display"
Intel ® Graphics Driver Version:"25.20.100.6519"
- "Intel(R) UHD Graphics 630"
Adapter Compatibility:"Intel Corporation"
Adapter DAC Type:"Internal"
Adapter RAM:"1.00 GB"
Availability:"Running or Full Power"
Bits Per Pixel:"32"
- "Caption":"Intel(R) UHD Graphics 630"
Link:"http://www.intel.com/content/www/us/en/search.html?keyword=UHD+Graphics+630"
CoInstallers:"nvdispgenco6441771.dll,NvGenericCoInstall,nvdispco6441771.dll,NVDisplayCoInstall"
Color Table Entries:"4294967296"
Dedicated Video Memory:"Not Available"
Driver:"igdkmd64.sys"
Driver Date:"01/08/20

BRudo · ‎02-13-2019

Hello GraniteStateColin,

In the next comment, David_V_Intel recommends that you downgrade your installed Intel Rapid Storage Technology to version 15.7.0.1014.

Please let me know if this fixes the problem. My motherboard came with driver version 16.5.0.1027, and that version generates parity errors. Updating to 16.8.0.1000 did not help.

David_V_Intel · ‎02-13-2019

Hello GraniteStateColin, Thank you for patiently waiting. Upon review of the reports we were able to determine that your systems have Intel ® Chipset 100 series. Please try downloading the latest driver/interface supported by your system in the link below: https://downloadcenter.intel.com/download/26865/Intel-Rapid-Storage-Technology-Intel-RST-User-Interface-and-Driver?product=55005 Parity errors could appear if a non-supported version of Intel ® Rapid Storage Technology is in use. Regards, David V Intel Customer Support Technician Under Contract to Intel Corporation

GraniteStateColin · ‎02-13-2019

One system does, the other that I've posted here has 300 series, specifically, Intel(R) 300 Series Chipset Family LPC Controller (Z390) - A305. Both have parity errors, and ONLY when running RAID5 with 4+ drives, so the chipset is clearly not the problem. RAID1 and RAID5 with 3 drives both work without yielding any parity errors.

Could you confirm that 4+ drive RAID5 works internally for Intel in your testing?

David_V_Intel · ‎02-20-2019

Hello GraniteStateColin, Thank you for patiently waiting. I would like to ask a few questions so we can investigate further about this: 1 - What is the size of the hard drives? Are they all the same size or mismatch? 2 - How much data is in the array when they run the scan? Or is it empty? 3 - Have you installed the latest BIOS version from ASUS*? Regards, David V Intel Customer Support Technician Under Contract to Intel Corporation

GraniteStateColin · ‎02-21-2019

1 - What is the size of the hard drives? Are they all the same size or mismatch?

I have provided that data already. All drives in the arrays are the same size. In the case of the newer system, the drives are all identical drives: Samsung EVO 860, 500GB. The exact size is 476,940MB. For the older system, there are some Evo 860s and some Evo 850s, but all the same size. On other computers (all have the same problem if RAID5 with more than 3 drives, problems go away with 3 or fewer drives), the sizes are similarly matched. Most of those other systems exhibiting identical symptoms are running all Evo 850s.

The only common configuration these systems may have is that all are running Samsung EVO drives (850 or 860). The motherboard, other hardware and even operating systems vary (Windows 10 or Windows Server 2016).

2 - How much data is in the array when they run the scan? Or is it empty?

Between 25% - 33% full.

3 - Have you installed the latest BIOS version from ASUS*?

Neither of these systems uses an ASUS motherboard, but the latest BIOS is installed on the older Gigabyte system. I had upgraded the BIOS in the newer ASRock 300 series/8th gen i7-8700 CPU motherboard in December 2018. I see there is a newer BIOS that just came out in January, but it only appears to address memory timings for overclocking for systems running the 8086K CPU (which I'm not).

Again: Does Intel not get these errors when running 4 or more SATA drives in RAID5? And is there reason to believe our data is at risk? This is clearly a bug on the RAID5 configuration with more than 3 drives. What is not clear is if the bug is in the reporting or if there really are data-threatening errors on the volumes.

David_V_Intel · ‎02-28-2019

Hello GraniteStateColin, Thank you for patiently waiting. In order to use 4 NVMe devices, you have to use at least 1 CPUa device in the platform. According to the Software specification I found, Intel ® Rapid Storage Technology only supports RAID 0 and RAID 1 for CPUa device. This also makes me think that although Intel ® Rapid Storage Technology doesn’t block the CPUa device in RAID 5 configuration it is not officially supported. Please also note that this applies to CPUa device. SATA Raid 5 is always supported. Note: CPUa stands for CPU Attached Storage which is the one that you attached in the PCIe port. Regards, David V Intel Customer Support Technician Under Contract to Intel Corporation

BRudo · ‎02-28-2019

I have a feeling nobody reads the thread before answering. Quote:

---cut---

The drives are mostly Samsung Evo 860 SSD drives (SATA, not NVMe).

---cut---

And I have exactly the same problem (to remind you, in case you forgot: parity errors) with 4 x 4 TB WD and Toshiba HDDs.

GraniteStateColin · ‎02-28-2019

Indeed. I am not aware that this configuration works for ANYONE. I have not heard of a single case with 4+ SATA drives working in RAID5. Note that I don't know it doesn't work for anyone, I've just not found it to work on any of my systems and not heard a single report or response from anyone that this configuration works. David_V is clearly not actually trying this, he's just posting the occasional response. Very frustrating. Certainly starting to make me want to avoid Intel solutions in the future.

GraniteStateColin · ‎02-28-2019

David V, all the drives are SATA drives, not PCIe. Not sure what leads you to think otherwise, but these are all Samsung Evo 850 or 860 drives. Those are SATA.

If this should work, what is Intel's proposed solution?

Thanks,

Colin

David_V_Intel · ‎03-11-2019

Hello, Thank you for your patience. We have been trying to replicate the issue on-site but have not been able to do so, I have not received any parity errors. This could be related to the SSD brand and model running on this computer. Please refer to the system manufacturer or motherboard manufacturer for more information. Regards, David V Intel Customer Support Technician Under Contract to Intel Corporation