
Help! Intel RST RAID 1 Corrupting SSDs

Michael4000
Novice

We have multiple HP Z2 G9 workstations using Intel RST motherboard RAID in a RAID 1 configuration. We use Western Digital Black SN850 and Samsung 990 Pro SSDs. Over the past few months, we have experienced multiple problems, to the point of being chronic.

 

Problems:

1. After the computer sits overnight, it says "Boot Device not found". After power cycling, the computer reboots normally.

2. Intel RST reports SMART errors on one or both drives.

3. Intel RST reports one of the drives as "Unidentified" after a few days or weeks.

 

One computer has been in service since last November, and only in the past couple of weeks has it started showing the "Boot Device not found" error. It restarts normally after a soft boot. Could this be a driver issue with the latest drivers?

 

We have sent multiple SSDs in under warranty to WD and Samsung. Finally, on one of the RMA repairs, Samsung reported that RAID is corrupting the drive.

 

All drivers are up to date from HP, including BIOS and Intel RST. All computers have 14th-gen i7 processors.

 

Any help would be appreciated since we are at a standstill.

30 Replies
Rick_Queneau
Beginner

Robbie,

My name is Rick Queneau and I have an almost identical problem to the one Michael4000 is having. According to my event logs, it started on August 25th of last year. I have two Samsung 990 Pro 1TB NVMe SSDs configured in a RAID 1 mirror using the onboard IRST BIOS utility of my ASUS motherboard. It had been working flawlessly since February of 2023. I was looking at HWInfo today to try to figure out why I was having some performance issues and saw that it was reporting both Samsung SSDs as degraded with 0% life remaining. I installed the IRST Windows app and it shows the status in the attached screenshot.

So that we don't beat a dead horse:

Yes, everything was running fine up until recently.

Running Windows 11 Pro 23H2 OS Build 22631.4602

13th Gen Intel i9-13900K 3.00 GHz

64GB RAM

Firmware has not changed recently.

 

I was thinking about breaking the mirror, replacing one of the SSDs and then rebuilding, then doing it again to replace the other (old) SSD. However, I can't find an IRST procedure for doing this, at least for the first part. How do I safely break the mirror without losing my boot SSD? How do I rebuild the mirror without wiping out the data on the boot SSD? If you have this procedure handy, I would appreciate a link to it. But I digress. This is a bad situation and like Michael, I'd like to get it resolved.

Please help.

Michael4000
Novice

Rick,

 

Thanks for the report. It's good to hear I am not the only one, and hearing that it is occurring in computers other than HP computers should be helpful to Intel.

 

With RAID 1, breaking the mirror is as simple as turning the computer off, removing power, and removing one of the SSDs. The remaining SSD, as long as it is in good condition, will allow your computer to boot back up. Installing a new SSD will automatically put the RST system into "Rebuild" mode, and once booted into Windows the rebuilding will take place. You can monitor the rebuilding process in the RST Windows app. It will tell you the percentage of the rebuild completed.

 

If you are putting in a used SSD, it is best to erase that SSD first, so RST sees it as a new SSD. You can put in a used SSD that was using RST RAID in another computer, but you will then have to delete that RAID volume in the RST utility in the BIOS. It can be done, but it is a bit nerve-racking to be deleting RAID volumes on a running computer.

 

Michael

Rick_Queneau
Beginner

Michael,

Too Late!!!

As bad luck would have it, my system tanked last night and refused to boot at all. The AMI BIOS kept popping up with a warning for both of the Samsung SSDs stating that I needed to back up the data and that failure was imminent.

Long story short, I was able to break the mirror and get back to having two separate 1TB SSDs. The M.2-1 would not boot no matter what I tried. I went into the BIOS and swapped M.2-1 with M.2-2 (Boot Order), hoping that the other SSD still had a valid MBR/GPT. To my surprise, I was able to boot from M.2-2. It took a while, but it came up. Plenty of OS and application issues to resolve once I booted, but that's to be expected.

I was able to run Samsung Magician and found some odd results. The M.2-2 SSD that I'm able to boot from has a total of 10587 'Power On Hours', while the M.2-1 SSD that I can't boot from has only 9692 hours. Both are configured the same. Both show a status of 'Critical' though. M.2-1 shows 19.1TB written while M.2-2 shows 18.1TB written.
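A quick arithmetic check on the Magician figures above, as a minimal Python sketch. The power-on hours and terabytes written are the values quoted in this post. The 600 TBW rated endurance is an assumption taken from Samsung's published figure for the 1TB 990 PRO and should be verified against the datasheet. The function names are made up for illustration.

```python
# Figures quoted in the post (Samsung Magician readings).
POWER_ON_HOURS = {"M.2-1": 9692, "M.2-2": 10587}
TB_WRITTEN = {"M.2-1": 19.1, "M.2-2": 18.1}

# ASSUMPTION: rated write endurance (TBW) for a 1TB 990 PRO.
# Not stated in the thread; check the vendor datasheet before relying on it.
RATED_TBW = 600.0


def hours_to_days(hours: float) -> float:
    """Convert power-on hours to days of continuous operation."""
    return hours / 24.0


def endurance_used_percent(tb_written: float, rated_tbw: float) -> float:
    """Share of the rated write endurance consumed, in percent."""
    return 100.0 * tb_written / rated_tbw


if __name__ == "__main__":
    for slot in sorted(POWER_ON_HOURS):
        days = hours_to_days(POWER_ON_HOURS[slot])
        used = endurance_used_percent(TB_WRITTEN[slot], RATED_TBW)
        print(f"{slot}: {days:.1f} days powered on, "
              f"{used:.1f}% of assumed rated endurance written")
    gap = POWER_ON_HOURS["M.2-2"] - POWER_ON_HOURS["M.2-1"]
    print(f"Power-on gap between the two drives: {gap} hours")
```

If the assumed rating is right, both drives have written only a few percent of their rated endurance, so a reading of 0% life remaining is hard to explain by write wear alone.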

Ran the Full and Short 'Diagnostic Scans' on both, and the scans said both SSDs are 'Good'. But when I run the Short or Extended SMART Self-Test, those fail almost immediately with the message '-Status: Aborted by unknown error ({0{){7}'. Can't find any definition for that. In the 'S.M.A.R.T.' report window, the only issue that shows up is a 'Critical' status at 'Byte 0' with a Raw Data value of 4. Again, no idea what that means exactly. The description simply says 'Critical Warning' related to the controller.
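For reference, byte 0 of the NVMe SMART / Health Information log is the Critical Warning bit field, so the raw value 4 can be decoded bit by bit. A minimal Python sketch, assuming Magician displays that byte unchanged; the bit descriptions paraphrase the NVMe specification, and the function name is made up for illustration.

```python
# Bit meanings for byte 0 ("Critical Warning") of the NVMe SMART / Health
# Information log page, paraphrased from the NVMe base specification.
CRITICAL_WARNING_BITS = {
    0: "available spare capacity has fallen below the threshold",
    1: "a temperature is outside its over/under threshold",
    2: "NVM subsystem reliability is degraded (media or internal errors)",
    3: "media has been placed in read-only mode",
    4: "volatile memory backup device has failed",
    5: "persistent memory region is read-only or unreliable",
}


def decode_critical_warning(raw_value: int) -> list[str]:
    """Return the description of every warning bit set in raw_value."""
    return [
        text
        for bit, text in CRITICAL_WARNING_BITS.items()
        if raw_value & (1 << bit)
    ]


if __name__ == "__main__":
    # ASSUMPTION: the "Raw Data value of 4" is the unmodified byte 0.
    for line in decode_critical_warning(4):
        print(line)
```

Under that assumption, a raw value of 4 means only bit 2 is set: the drive itself is reporting degraded reliability, rather than exhausted spare capacity or a read-only state.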

So, even though the Short and Full tests say they correct errors and check data integrity, I don't think there's a way to fix the problem with the controller. Boot time is around 5 minutes. For an SSD, that's awful. Since both of the SSDs were affected, I simply DO NOT believe the problem was with the SSDs themselves. Like you, I believe the issue was caused by the Intel RST RAID. BTW, I have a pair of Samsung 870 EVO 4TB SSDs configured as a mirror (Dynamic Disk) in Windows. No issues on those.

Data is all backed up now and I'm opening a support case with Samsung to get the SSDs replaced. I may just go out and buy one and do a 'Migrate' of the current M.2-2 before it completely craps out, though.

Robbie, if you read this entry, please let me know if you want me to upload any diagnostics.

Michael4000
Novice

Sorry to hear that both disks are in such poor shape.

 

The problems you are experiencing with failures on the short and long SMART tests are the same ones I experienced with several Western Digital SSDs. I was lucky. I only had one drive fail at a time, and was able to mirror the system again after WD repaired the SSDs.

 

Since your Windows dynamic software RAID drives are working fine, it points to Intel RST having the issue.

 

I sent a forum message to Robbie yesterday.  I hope we hear back from him soon.

Rick_Queneau
Beginner

I opened a support ticket with Samsung. The guy I spoke with was scratching his head wondering how the heck two SSDs could fail at the same time. What are the odds? But the diagnostics don't lie. Event Viewer shows issues for both SSDs starting in August of last year.

Anyway, I have to send my SSDs to Samsung in order to get them replaced. So in order to have a functioning system in the meantime, I need to go buy one and install it, and then use Magician to migrate the bootable M.2-2 to the new M.2-1. I don't think that should be an issue. I'll just need to go into the BIOS and swap the boot order again.

I'm thinking of going with either a Windows Mirror on the boot drives, or getting a PCIe card that does hardware RAID1. Thoughts?

Michael4000
Novice

Poor Samsung and WD are probably getting a lot of broken SSDs due to this RST issue now.

 

I've looked into Windows mirror.   Microsoft doesn't support Windows RAID for boot drives.  I did find someone who did it, but it was a real pain.  You have to modify files in the boot sector so Windows will see the drives.  If one drive goes out, you have to go through the complicated process again.

 

I've been looking for RAID controllers also.  Unfortunately, the ones I have found are in the hundreds of dollars.  There was an $80 Acer one, but it turned out it didn't have onboard RAID.  It just used Intel RST, so we are back where we started with that one.  

 

No response from Intel on this issue yet.  I messaged Robbie, and haven't heard back.

 

Intel please help with this!

Rick_Queneau
Beginner

Michael, 

I created a mirrored boot drive already using Windows. But first, I used Samsung Magician to perform a data migration from the still bootable SSD to a new one. It's really a data clone since the data remains on the old SSD. Anyway, the migration created all of the partitions (EFI, Recovery, Boot as well as the reserved). I made sure the drive was a GPT drive in Disk Management and then verified a few things using diskpart. Then I used the 'bcdboot' command to make sure the proper information was copied to the EFI partition. I first had to assign the EFI partition a drive letter, though. Once the 'bcdboot' command was complete, I removed the drive letter from the EFI partition.
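The EFI step described above can be sketched roughly as follows. This is a dry-run Python helper that only prints the diskpart and bcdboot commands; it does not execute anything. The disk number, partition number, drive letter S, Windows path, and function name are hypothetical placeholders, not values from this post, so check them with 'list disk' and 'list partition' in diskpart first.

```python
def efi_refresh_commands(
    disk: int,
    efi_partition: int,
    letter: str = "S",
    windows_dir: str = r"C:\Windows",
) -> dict[str, list[str]]:
    """Build (but do not run) the commands for refreshing the EFI boot files.

    All arguments are placeholders to be replaced with the real layout.
    """
    select = [f"select disk {disk}", f"select partition {efi_partition}"]
    return {
        # Step 1 (inside diskpart): give the EFI partition a temporary letter.
        "diskpart_assign": select + [f"assign letter={letter}"],
        # Step 2 (elevated prompt): copy boot files to that partition.
        "bcdboot": [f"bcdboot {windows_dir} /s {letter}: /f UEFI"],
        # Step 3 (inside diskpart): remove the temporary letter again.
        "diskpart_remove": select + [f"remove letter={letter}"],
    }


if __name__ == "__main__":
    # ASSUMPTION: disk 1, partition 1 is the EFI partition on the new SSD.
    for step, lines in efi_refresh_commands(disk=1, efi_partition=1).items():
        print(f"[{step}]")
        for line in lines:
            print("  " + line)
```

Printing the commands instead of running them leaves a chance to confirm the disk and partition numbers before anything touches the boot configuration.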

The final set of steps was to make the new drive a dynamic drive and then create the mirror. It's been running fine for two days now. I get a message each morning from Magician warning me about the failing SSD and asking me if it can send info to Samsung. And the AMI BIOS still reports the same SSD as having imminent failure. So, for now I'm covered. When I boot, I get a Windows screen that gives me a choice of booting into Windows 11 or booting into a secondary OS (the mirror drive). Screenshot below shows the mirrored boot drives. Drive 2 is primary.

As for hardware RAID, I ran into the same issues. Not sure how much I want to spend versus having to deal with these issues again. If I was dealing with a server farm, I'd go with hardware RAID every time. But since this is a personal PC that I also use for work related tasks due to the very large screen, I'm in the decision phase currently. Can you send me the names of the RAID units you found? I'd like to check them out. One of the concerns I have with those is if any of them can preserve the source drive when creating the mirror. 

Michael4000
Novice

Thank you for detailing the procedure you used for setting up RAID with software.

 

If one drive goes bad, will the computer start up on its own?

 

What's the process for remirroring?

 

I'm still looking for affordable hardware RAID solutions. I did some research this morning, but the controllers I found were $1000. Yikes! That's OK on a server, but not a workstation.

Rick_Queneau
Beginner

Once you have the mirror created in Windows, using regular Windows Disk Management, you get a blue boot screen on startup that gives you the choice of booting from the regular Windows system or from the 'Secondary Plex', which is the mirror. It defaults to the regular Windows system after 30 seconds, or you can just hit 'Enter'. I would imagine that if the primary were unavailable, it would boot from the secondary plex automatically. Haven't tested that scenario.

What I'm unsure of is if you would need to change the boot order or do a boot override in the BIOS.

Anyway, one thing I wasn't happy with was the fact that I wasn't seeing 'Windows Boot Loader' in the SSD name listing of the boot devices. This tells me that the bcdboot command didn't work as expected. So, I booted to the Windows Install Media, went to a Command Prompt and ran a CHKDSK on the drives. Took almost 12 hours!!! Seemed to fix the problem.

I also bought and downloaded AOMEI Partition Assistant. I really like this utility. It allows you to convert a Windows Dynamic Drive back to a Basic Drive without having to delete the partitions or reinstall Windows, and with no data loss. You can clone SSDs in minutes and do a raft of other things that you just can't do in Windows. I got the 'Pro' version and it was worth the $54 I paid for it. It also has a feature where you can create a bootable USB drive on another system and use it to boot up a system that won't boot. Then you can use the Partition Assistant utilities to fix the boot problems. I actually did this with the one M.2 SSD that I was unable to boot from as a test of the utility. Worked like a charm. They also have an enterprise/tech version for the IT department. Highly recommend looking into it. I probably should have used this utility from the start. Would have saved a lot of time and frustration. It can't fix the actual damage to my two original 1TB SSDs, but it would have made the soft repairs and cloning a lot easier and quicker.

Michael4000
Novice

Thanks for all of the testing and tips. I'll have to look into the AOMEI Partition Assistant. It sounds like a nice tool. I'm still a bit unsure of the exact steps to set up a bootable software mirror, and whether, if one drive fails, it will automatically fail over to the other.

 

By the way, I started a new thread, and have enlisted the help of an Intel engineer if you would like to follow along.  Here is the new thread.

 

https://community.intel.com/t5/Rapid-Storage-Technology/Intel-Tech-Support-Please-Help-with-SSD-Corruption-with-RST/m-p/1659076#M13634

 
