
RST and SATA RAID0 is going haywire in Windows 11

PeaCatWR52
Novice
4,119 Views

Over the last 3 days I’ve had intermittent problems with my RAID 0 array (comprised of two identical make and model SATA3 240GB SSDs) and am now at the point where I think I may have lost all the data on my "long term" storage.

Firstly - the PC boots fine off another disk because the OS is on an Intel PCIe NVMe.

Secondly, the array initially shows as OK in the Intel RST option ROM, whose version shows as 17.5.2.4317.

PeaCatWR52_1-1707318440436.png

Optane Memory and Storage Management was throwing application errors, but I figured that out - the current version in the Windows Store does NOT work with the 17.x drivers and option ROM. Period. And believe me I tried every darned combo.

So I rolled the software all the way back to the versions installed from the original driver ISO for my Lenovo P330 (the driver and app are in folders K1RAD63US14 and U1RAU12AP17.) And that was the ONLY combination where the RST Optane app would even load without throwing application errors.

With the software now apparently working again and Device Manager showing the RST is on driver 17.11.3.1010, I can finally get back into the RST app...

PeaCatWR52_0-1707318291716.png

the drive that held all my documents appears like this in Disk Management:

PeaCatWR52_2-1707318608027.png

 

And in RST, initially it shows this...

PeaCatWR52_3-1707318664790.png

But I still cannot see any of the data, and after a while, it shows this:

PeaCatWR52_4-1707318701016.png

PeaCatWR52_6-1707320927026.png

 

If I reboot, I get the same thing again. Option ROM and RST both say there's nothing wrong with either disk initially but after a while it says one of the disks is inaccessible.

I may be able to lay my hands on another PC with the option ROM on it and see if the drives work on that, but wondered if there is any way to recover the data from the drives.

I mean, bit of a Hail Mary but maybe the community knows of a tool kicking around that can provide meaningful diags on the RAID array instead of just telling me "it's got errors but we ain't gonna tell you what they are, or why we don't think there are any issues for the first 10-20 minutes after a cold boot".

23 Replies
n_scott_pearson
Super User
3,717 Views

No "long-term" storage should ever be implemented using RAID0. A single drive failure results in the loss of all data on the array - and I have a feeling that you have a drive that is really failing. Working for a few minutes and then failing is the classic indicator for a component that is in the process of failing and this manifests as it warms up.

Hope this helps,

...S

PeaCatWR52
Novice
3,646 Views
Sure. But it's a drive array I've had for years as an offline backup of data, much of which is stored elsewhere in a bunch of places like cloud mailboxes and other drives.

The array's survived being moved from one computer to another more than once and it predates me using cloud storage.

Also, the other PC confirmed that actually BOTH these drives are working... By having 2 instances of Phison Toolbox open side by side with each one doing a health check of the array.

I'm not at home yet but can upload the proof later. So the issue could be specific to RST on the <2 year old workstation, a motherboard fault, a dodgy Windows update... Without getting my hands on another workstation with RST RAID it'll be difficult to tell.
PeaCatWR52
Novice
3,710 Views

Addendum: I have put both drives in another computer (an old Dell 790 machine that's been in a cupboard for a few years).

Both drives are being detected correctly by the BIOS AND by Windows, but the Phison Toolbox (the app provided by the vendor) is having trouble scanning one of the two disks, so I guess it's died and I've lost all the data on both as a result.

 

I don't know if this means anything, but STORAHCI is writing warnings to the event log about resetting \Device\RAIDPort0, per https://answers.microsoft.com/en-us/windows/forum/all/event-id-129-storahci-resetting-raidport0/7b30c512-6597-438b-80cb-22fb2f85d62e which points to https://downloadcenter.intel.com/download/23538/AHCI-Intel-Rapid-Storage-Technology-Driver-for-Intel-NUC.

After booting Windows, both drives appear in Disk Management. Disk 2 is the one that Phison Toolbox hangs on while interrogating.

PeaCatWR52_0-1707344248166.png

Here's Disk 2 in the Phison Toolbox. Health isn't optimal but it hasn't failed completely...

PeaCatWR52_1-1707344574839.png

So is there any chance whatsoever that the RAID array is recoverable? (I expect no, but you never know if you don't ask...)

Kiriakos-GR
New Contributor I
3,665 Views

@PeaCatWR52 your skills as a PC doctor are very poor.

Reasonable people study first and accept the risks and chances of failure with RAID-0.

I give my vote of trust to @n_scott_pearson, and you had better believe him.

End of the story...

PeaCatWR52
Novice
3,621 Views

Actually, Scott is TECHNICALLY correct if I'm talking about data that's "crown jewels". Data that I can't afford to lose, that's not stored anywhere else, etc.

Except, that isn't the purpose of the array.

This RAID0 array is a cheap cache for storing non-critical archive material on a local drive. Basically, if I need to reinstall software, mount an ISO image, clone an old HDD to a new one for a family member, or take an image of a new laptop HDD so that I can reimage it back to factory, etc etc, this is the rig I use (I also have external drive bays and a drive docking station with cloning function.)

The 480GB volume is more than sufficient for holding most stuff, and it has the advantage of being a lot quicker to work with than a 6TB SATA mechanical drive. These days most of the backup/restore/clone work is with SSDs, so it makes sense to use SSDs. I was able to buy 2 matching 240GB SSDs for 30% less than buying the 480GB version - and it actually is a bit quicker than a single 480GB drive.

The home PC already had RST built in, and a single dead, slightly slower 480GB drive would lose the data just the same as a dead 240GB drive in a faster RAID0 config would have. So this was a total no-brainer.

Would you implement a RAID 5 array with enterprise grade mechanical drives purely for a Steam library and maybe 100GB worth of ISOs... inside a home computer which can access a 1TB NAS on the LAN? If you would, then you've got more money than sense.

 

PeaCatWR52
Novice
3,620 Views

This evening I've run both drives through the Phison Toolbox app on the (~10 year old!) computer.  (The machine's off the network hence photos rather than saved reports).

I had trouble with the Phison Toolbox diags yesterday because I was scanning both disks in the same running instance. If I launch the app twice, so each disk is scanned under a different process/session, it works fine. Here it is, a photo of the two sessions running side by side.

PeaCatWR52_0-1707430569557.png

And here's what I got when I switched to the SMART test.

PeaCatWR52_1-1707430705392.png

The stats may be difficult to read from the photo so as a general pointer:

  • Unrecoverable errors (from first use) - 14 on one disk, 37 on the other.
  • Power-on hour count: 28137 on one disk, 28152 on the other. (Aggregates as 168 weeks, or 3 years, 3 months)
  • CRC errors to date: 489, 258.
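
As a rough cross-check of the power-on figures above, the hour counts can be converted into days, weeks and years. This is only a minimal sketch: the labels "disk A" and "disk B" are placeholders (the photo does not say which count belongs to which SSD), and a year is assumed to be 365.25 days.

```python
HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7
DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years


def power_on_summary(hours: int) -> dict:
    """Convert a SMART power-on hour count into days, weeks and years."""
    days = hours / HOURS_PER_DAY
    return {
        "days": days,
        "weeks": days / DAYS_PER_WEEK,
        "years": days / DAYS_PER_YEAR,
    }


# Hour counts quoted in the post; the disk labels are hypothetical.
for label, hours in (("disk A", 28137), ("disk B", 28152)):
    summary = power_on_summary(hours)
    print(
        f"{label}: {hours} h = {summary['days']:.0f} days "
        f"= {summary['weeks']:.1f} weeks = {summary['years']:.2f} years"
    )
```

Both disks come out within a day of each other, which fits two drives that have always been powered together as a pair.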

I've had that PC on for 24 hours now and both drives are still showing as fine. So it seems that the fault could be with the SATA controller, or RST - because the other PC can't see anything wrong with the disks.

To test the theory I'm going to need another PC with the RST option ROM and plug the drives into it. I've already moved RAID arrays from one PC to another before, so if there's nothing wrong with the disks, when I plug them into another PC in the right order and set up the RAID array exactly how I had it set up in this PC, the disk will reappear and be readable.

And if that happens, then I'll log a support call with the PC manufacturer, because it's an enterprise grade workstation and it's still under warranty.

JayB_Intel
Moderator
3,364 Views

Hi PeaCatWR52,


We would like to know the exact computer/system that you are setting up. When we looked up Lenovo P330, it also shows Lenovo ThinkStation P330 Workstation; is this correct? This is for us to make sure that we provide accurate advice suitable to your system in terms of compatibility.



Best regards,

Jay B.


PeaCatWR52
Novice
3,358 Views
Yes, that is correct - ThinkStation P330.

This may be relevant - after close inspection the event logs were going crazy with storage device event errors even with the SSDs removed. But that has stopped since a Windows 11 update earlier this week. I also had to uninstall the Windows Store version of the Optane management app because it doesn't work with v17 drivers at all.

I've put the disks in an older machine, one passes all integrity checks but the other has a SMART warning (not a total fail though).

I'm getting my hands on a different Win10Pro workstation with a factory restore image including stock option ROM and original RST management utilities on it at the weekend, and if the raid array mounts fine on that then it effectively proves that the underlying issue is Windows Store + Windows Updates + Lenovo/ Intel driver updates resulting in an incompatible combination on the ThinkStation.
JayB_Intel
Moderator
3,294 Views

Hi PeaCatWR52,


Thank you for sharing your findings on the issue. We will do further research and investigation on this matter and post the response on this thread once available.



Best regards,

Jay B.


JayB_Intel
Moderator
3,189 Views

Hi PeaCatWR52,


I hope you are doing well. To proceed with the case, it is likely that the drive is failing and that is why you are seeing it report ok at first and then it drops.


You may try this section (If the RAID 0 volume failed due to a failed drive) of this article: Recovering a RAID 0 Volume Failure Using Intel® Rapid Storage Technology - https://www.intel.com/content/www/us/en/support/articles/000006437/technologies.html to see if it helps. However, if this does not work, you may consider trying a 3rd party data recovery app.


Please be advised that moving the drives between systems and checking them (it looks like individually) likely didn't help, as the RAID metadata can easily get corrupted when drives are moved between systems and examined individually.


Lastly, RAID 0 is simply a big volume. If one of the drive members fails at any level, all data is compromised. If you do not have any data backup and one of the drives in RAID 0 already failed, then all data is lost unfortunately.
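
To illustrate why a single failed member compromises the whole volume, here is a minimal sketch of how RAID 0 maps a byte offset in the volume onto its members. The 128 KiB stripe size is a hypothetical value chosen for the example, not one read from this array, and the sketch does not model the Intel RST on-disk metadata.

```python
from typing import Set, Tuple

STRIPE_SIZE = 128 * 1024  # hypothetical stripe size in bytes
MEMBERS = 2               # two SSDs, as in this thread


def locate(logical_offset: int,
           stripe_size: int = STRIPE_SIZE,
           members: int = MEMBERS) -> Tuple[int, int]:
    """Map a volume byte offset to (member index, byte offset on that member)."""
    stripe_index, within = divmod(logical_offset, stripe_size)
    member = stripe_index % members
    member_offset = (stripe_index // members) * stripe_size + within
    return member, member_offset


def members_touched(start: int, length: int,
                    stripe_size: int = STRIPE_SIZE,
                    members: int = MEMBERS) -> Set[int]:
    """Return the set of members holding any byte of [start, start + length)."""
    first = start // stripe_size
    last = (start + length - 1) // stripe_size
    return {stripe % members for stripe in range(first, last + 1)}


print(locate(0))                          # first stripe lives on member 0
print(locate(STRIPE_SIZE))                # second stripe lives on member 1
print(members_touched(0, 1024 * 1024))    # a 1 MiB file spans both members
```

Under these assumptions, any file larger than one stripe has pieces on both disks, so losing either disk leaves every such file with holes.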



Best regards,

Jay B.


PeaCatWR52
Novice
3,096 Views
Sorry, been busy.

I got another 2 PCs - one a Dell T1700, the other a Lenovo M91P, both with RST option ROM. The Dell is on factory recovery Windows 7; the M91P is on a clean install of Windows 10 with matched Intel RST and drivers from the Lenovo support site. Windows Update is blocked as I've taken the machine off the network.

On both PCs the OS hangs or bluescreens while starting if the failing disk is plugged into a SATA port but it doesn't hang if it's plugged into the Lenovo eSATA port.

The Dell option ROM shows both drives as non-RAID, with one disk healthy and the other with a warning. I can't boot off an internal drive if the BIOS is set to RAID instead of AHCI, but a WinPE USB does boot, so I'm trying that with a third party tool that does detect the array correctly and can show me a file system.

I don't have time this week to write up a full report but what I can tell so far is that one SSD is healthy and the other has an issue triggering a SMART warning about a month ago.

No idea why, but instead of throwing up a message to the effect of "mate, your raid disk's got a SMART WARNING, you might want to check it out" either at the option ROM stage or in Windows, nothing was picked up and there's no direct evidence of an imminent disk failure in event logs.

I think the OS is being too clever, trying to mount and index the array then hitting issues, while the Intel RST is acting as if the drive has a minor smart warning that isn't worth interrupting boot for.
PeaCatWR52
Novice
3,094 Views
What I really need is a WinFE type pen drive (or a Linux equivalent) that has rst drivers but doesn't mount the array as read/write. Recovery tools can then mount it read only and skip the sanity checks. That'll take me a while to rig up. Chances are the data's inaccessible but you never know. I recovered data from a HDD that melted in a house fire once; it just took a few months of lateral thinking.
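
One way to keep everything read-only would be to work from image files of the two SSDs rather than the disks themselves, and interleave the stripes in software. The following is only a sketch under several assumptions: the images were taken beforehand with some imaging tool, the member order and stripe size are known, and the striped data starts at offset 0 of each image. It is not the RST driver's own method and it ignores the RST metadata entirely.

```python
import io
from typing import BinaryIO, Iterator, Sequence


def destripe(members: Sequence[BinaryIO], stripe_size: int) -> Iterator[bytes]:
    """Yield the logical RAID 0 volume chunk by chunk, round-robin across members.

    Each member is only ever read from, never written to.
    """
    while True:
        for member in members:
            chunk = member.read(stripe_size)
            if not chunk:
                return  # stop at the end of the shortest member
            yield chunk


# Tiny in-memory demonstration with a 4-byte stripe (hypothetical values).
logical = bytes(range(16))
member0 = logical[0:4] + logical[8:12]    # stripes 0 and 2
member1 = logical[4:8] + logical[12:16]   # stripes 1 and 3

rebuilt = b"".join(destripe([io.BytesIO(member0), io.BytesIO(member1)], 4))
print(rebuilt == logical)
```

With real images, the `io.BytesIO` objects would be replaced by files opened with `open(path, "rb")`, and the rebuilt stream written to a third file that recovery tools can then scan.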
JayB_Intel
Moderator
3,107 Views

Hi PeaCatWR52,


Were you able to check the previous post? Let us know if you still need assistance. 



Best regards,

Jay B.


PeaCatWR52
Novice
3,065 Views

Thanks Jay, yes I have checked the post.

I know what I'll be doing - if you saw the previous message,  you'll know I have 1 machine without RST RAID (Dell 790) and 3 machines with RST RAID (one Dell T1700 and two Lenovos, the P330 and M91P).

I know the Dell 790 SMART test and the Phison SSD diagnostics tools (if run on that machine) think the SSDs are a bit "worn" but not dead yet. It boots up fine, but of course has no idea that they're a RAID stripe, so as far as it's concerned there are two unpartitioned SSDs in the system.

The Dell T1700 boots up fine into Windows if the better condition disk is plugged in, but doesn't if both are plugged into SATA ports - but booting into a WinPE environment or a Linux one does appear to bypass that problem (but then of course I can't run the RST software). I do have a third party tool on the T1700 that can see there's a RAID stripe across the 2 SSDs, created in RST, as long as the drives are plugged into a USB-C dock not into the SATA ports. But it fails when trying to recover the files. By the looks of things, the disk emulation in the dock masks the SMART errors.

So next step is to test fully on the M91P. I know already if the less reliable SSD is plugged into the eSATA port instead of a main SATA port, the boot sequence doesn't try to initialise it right away. If that all fails, I'll go into the option ROM and recreate the array there.

It might take a couple of weeks but like I said, this is an academic exercise not a "my world will collapse if I can't get the data back" situation. I have backups elsewhere, but that's really not the point - I've already recovered what I need from elsewhere.

But it'd be fun to get to the bottom of what ACTUALLY is going on here, because I bet you good money if I formatted both SSDs and permanently de-RAIDED them, at least one of them would be in full working order and the other would be past its best but still usable.

JayB_Intel
Moderator
3,054 Views

Hi PeaCatWR52,


Thank you for the update. We will look into your findings and check how we can move forward with your query. We will do further research and investigation on this matter and post the response on this thread once available.


We appreciate your patience for this one.



Best regards,

Jay B.



ryan_c
Beginner
3,017 Views

I came across this from Google. I've also been seeing some strangeness with RAID0. For me, as far as I can tell, it only happens when more than two devices are put into RAID0 using Intel RST. We have lots of deployments out there with 4TB m.2 NVMe drives in RAID0 that work just fine. However, when we put three 4TB m.2 NVMe drives in RAID0, within 24 hours the Windows GUI starts to lock up in strange ways, then completely locks up about 20 minutes later. These are brand new Dell workstations with brand new Corsair P3 m.2 SSDs. Let me know if you think this should be its own post or not.

PeaCatWR52
Novice
2,950 Views
I'm beginning to wonder if it is related to...

https://support.microsoft.com/en-gb/topic/kb5028997-instructions-to-manually-resize-your-partition-to-install-the-winre-update-400faa27-9343-461c-ada9-24c8229763bf

Booted into Linux and guess what, just like on the non updated Win7 installs, both drives mount. But on updated Windows 10 or 11, I can't even boot to the login screen.
alexfankle
Beginner
2,977 Views

It's really helpful, thank you.

JayB_Intel
Moderator
2,821 Views

Hi PeaCatWR52,


We would like to clarify and share a few points in addressing your concern. Please see below for your reference:


1) Did you configure the RAID 0 array as a data only volume or as Windows OS boot drive + data?

 

2) When checking this article:

https://www.intel.com/content/www/us/en/support/articles/000006437/technologies.html

 

Were you able to see the part about "If the RAID 0 volume failed due to a failed drive", which is about resetting the disks to normal and attempting data recovery, and is also mentioned in section 11.4.2.5 of the Intel RST guide?

 



We would suggest doing this on the original Lenovo P330 computer.

 

3) Intel Optane Memory and Storage Management software offers some level of health monitoring. Please see screenshot below:

 

But this is not as comprehensive as some dedicated hardware RAID controllers.

 

Email notification can also be set up.

 



Please try the above steps and see if we can have it going with Windows 11. We will be waiting for any update. Thank you.



Best regards,

Jay B.


PeaCatWR52
Novice
2,799 Views

1) Did you configure the RAID 0 array as a data only volume or as Windows OS boot drive + data?

Data only. The array was set up as a store separate to the OS, about seven years ago.

  • The RAID0 pair of disks has been swapped across different workstations three times - it started off in a HP tower with Intel RAID, then moved seamlessly to a Dell, and 2 years ago I plugged it into the Lenovo P330.
  • It's been working without a hitch until February this year.
  • At the time it failed, a bunch of different things happened. Firstly, Microsoft Updates insisted on overriding the Optane app, installing one that works with v18+ drivers but not v17.

2) When checking this article... were you able to see the part about "If the RAID 0 volume failed due to a failed drive" which is about resetting the disks to normal and attempt data recover...

Yes. That's how I know third party recovery tools can interrogate both disks individually AND scan the RAID array - here's DMDE on Linux, and me playing with command line tools on the Dell T1700.

PeaCatWR52_0-1709747736592.png

PeaCatWR52_1-1709747788490.png

Stellar Data Recovery is even able to bring up the file system from the RAID0 array, but it falls over when trying to restore files.
Here's what RST option ROM and Lenovo Diags show. Note, options 1-5 in the option ROM are not even selectable anymore so it defaults to EXIT.

 

PeaCatWR52_2-1709747916351.png

PeaCatWR52_3-1709747950862.png

 

 

3) Intel Optane Memory and Storage Management software offer some level of health monitoring.

Yes, but not to the point where it's useful if I'm honest. I get far more useful information out of Phison Toolbox and other diag tools than I get from RST. And, another issue now is that I have to keep uninstalling the Windows Store version of Optane tools and rolling back the driver, because no matter how many times I tell Windows 11 to STOP automatically updating drivers and apps, it keeps on doing it, although that's a new phenomenon - it's only been doing this since late January so I blame a Microsoft update.

 

PeaCatWR52_4-1709748353845.png

And, the minute THAT goes on the PC, then I get this when I try to use the software no matter what driver's installed.

PeaCatWR52_5-1709748435113.png

 

ONLY the driver AND tool combination from the Lenovo recovery disk, e.g. U1RAU12AP17 (per my very first message), works. See attached.
