Rapid Storage Technology
Intel® RST, RAID

Which disk is in trouble?

JSchw10
Beginner

I have a two-disk RAID 1 array. My system started behaving very badly, and I suspected a problem somewhere in the disk subsystem. Windows 10 Resource Monitor showed the disks active 100% of the time, and the graph had a very peculiar pattern even when no process was particularly busy. Response times were climbing to over a second, and applications were not responding.

The problem is that the RST software reported that everything was fine. I couldn't find any way to diagnose the problem until suddenly one of the drives fell over dead. The volume is now degraded, but system performance is back to normal. Now that the drive is junk, I can tell which one it is.

My question is: how could I have told that a drive was failing, and how could I have told which one?

idata
Employee

JerrySNB: Thank you very much for contacting the Intel® Rapid Storage Technology community. We will do our best to assist you with this scenario.

To find out which hard drive is defective, you can open the Ctrl+I menu by pressing those keys repeatedly while the computer is starting up. The menu lists each hard drive along with its status. There should be an error message next to the disk that is not working, and the serial number shown there lets you identify the physical drive.

If you have any further questions, please let me know.

Regards,

Alberto R

JSchw10
Beginner

I understand what you're saying, but that doesn't solve my problem. The volume was reported as normal because both drives were still working, yet one of them was in serious trouble. I could tell because the disk activity figures (percent busy, queue length, response time) were not right: the disk was 100% busy, the queue length was sometimes as high as 50, and response times often reached 1000 ms or more.

Once the drive stopped working completely, the volume went into a degraded state and I could see which drive had died. It took a week for this to happen, during which time the system was nearly unusable.

How could I have told which drive was getting ready to fail? Neither the RST software, BIOS, nor controller reported a problem.
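The symptoms described above (100% busy, long queues, response times of a second or more) can be turned into a simple check. The following is a minimal Python sketch, not anything provided by Intel® RST. The `DiskSample` type, the function names, and all three thresholds are assumptions chosen for illustration, and the samples would have to be read off Resource Monitor or a performance counter by some other means. Because Windows typically presents an RST RAID 1 volume as a single disk, a check like this can only show that the volume is in distress, not which member drive is causing it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiskSample:
    """One observation of disk activity (hypothetical structure)."""
    busy_percent: float   # share of time the disk was active, 0-100
    queue_length: float   # outstanding requests waiting on the disk
    response_ms: float    # average response time in milliseconds


# Hypothetical thresholds for illustration; tune them for your own system.
BUSY_LIMIT = 95.0
QUEUE_LIMIT = 10.0
RESPONSE_LIMIT_MS = 500.0


def is_distressed(sample: DiskSample) -> bool:
    """True when the disk is saturated and also slow or backed up."""
    saturated = sample.busy_percent >= BUSY_LIMIT
    slow = sample.response_ms >= RESPONSE_LIMIT_MS
    backed_up = sample.queue_length >= QUEUE_LIMIT
    return saturated and (slow or backed_up)


def sustained_distress(samples: List[DiskSample],
                       min_fraction: float = 0.8) -> bool:
    """True when at least min_fraction of the samples look distressed."""
    if not samples:
        return False
    bad = sum(1 for s in samples if is_distressed(s))
    return bad / len(samples) >= min_fraction


if __name__ == "__main__":
    # Values modelled on the figures quoted in the post above.
    observed = [DiskSample(100.0, 50.0, 1000.0) for _ in range(5)]
    print(sustained_distress(observed))
```

Requiring saturation together with slow responses or a long queue avoids flagging a disk that is merely busy with a large but healthy transfer.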

idata
Employee

JerrySNB: Thank you for providing those details. Normally, the firmware of the hard drive reports any problems with the drive's health status to the Intel® RST tool. Sometimes, as in this case, the firmware does not report any inconsistencies on the hard drive, so the tool shows no errors or problems with the RAID structure.

As an option, you can check whether the manufacturer of the hard drive offers a tool or application for monitoring its health.

We apologize for any inconvenience.

If you have any questions, please let me know.

Regards,

Alberto R

n_scott_pearson
Super User

I thought that RST regularly polled S.M.A.R.T. data from the drives and could generate an alert if something was wrong?

idata
Employee

N. Scott Pearson: Yes, that is correct, but the S.M.A.R.T. data is generated by the firmware of the hard drive, so Intel® RST can only raise an alert when the firmware itself reports a problem.

Regards,

Alberto R

JSchw10
Beginner

I tried looking at the S.M.A.R.T. information in my system's BIOS, but that shows the raw data.

By the time I realized what was happening, it was too late to download anything. The system was too unresponsive. I was doing everything through my phone. I suppose I could have downloaded a utility to my phone and then moved it over to my PC.

In any case, I think we've taken this as far as we can.
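Raw S.M.A.R.T. values like the ones the BIOS displays can be triaged by hand. The sketch below is a minimal Python illustration, assuming the raw values have already been copied down per drive. The attribute IDs 5, 187, 197 and 198 are widely used for sector and error counts, but the meaning of a raw value can vary by vendor and model. The function names and the serial numbers `SERIAL-A` and `SERIAL-B` are hypothetical. A clean result does not prove a drive is healthy, since the firmware may not record the problem at all, which appears to be what happened in this thread.

```python
from typing import Dict, List

# Attribute IDs commonly associated with failing media. Raw-value semantics
# can differ between vendors, so treat this table as an assumption.
WATCHED = {
    5: "Reallocated Sectors Count",
    187: "Reported Uncorrectable Errors",
    197: "Current Pending Sector Count",
    198: "Offline Uncorrectable Sector Count",
}


def suspicious_attributes(raw: Dict[int, int]) -> List[str]:
    """List watched attributes whose raw value is above zero."""
    return [
        f"{WATCHED[attr_id]} (ID {attr_id}) = {raw[attr_id]}"
        for attr_id in sorted(WATCHED)
        if raw.get(attr_id, 0) > 0
    ]


def rank_drives(drives: Dict[str, Dict[int, int]]) -> List[str]:
    """Return serial numbers ordered from most to least suspicious."""
    def score(serial: str) -> int:
        # Sum of the watched raw counters; missing attributes count as zero.
        return sum(drives[serial].get(attr_id, 0) for attr_id in WATCHED)

    return sorted(drives, key=score, reverse=True)


if __name__ == "__main__":
    # Hypothetical raw values keyed by drive serial number.
    drives = {
        "SERIAL-A": {5: 0, 197: 0, 198: 0},
        "SERIAL-B": {5: 12, 197: 3},
    }
    for serial in rank_drives(drives):
        print(serial, suspicious_attributes(drives[serial]))
```

Keying the data by serial number matters because the serial is what ties a suspicious reading back to a physical drive in the array.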

JSchw10
Beginner

Sadly, it doesn't look like Western Digital's software can see past the RAID controller, so that's no help.

idata
Employee

JerrySNB: Thank you very much for providing those updates. We are sorry to hear that the configuration does not work as expected. As you mentioned, we did the best we could on our side to provide the information you were looking for.

If you have any questions, please let me know.

Regards,

Alberto R
