
IMS MFSYS25 - Bad sector found

MJaen1
Beginner

Hi,

One of my drives raised the following alert:

Description: Bad sector has been found on physical disk

Probable Cause: Encountered a media error.

Corrective Action: Check if the virtual drive is redundant or not. If the virtual drive is not redundant, there may be data lost. The user should replace the bad physical disk and rebuild the storage pool, and if the virtual drive is offline reload the data from a backup.

What exactly does this mean? The "Corrective Action" is somewhat ambiguous...

One physical drive has a bad sector, but the physical drive should be able to handle bad sectors and relocate them internally.

Does this alert now mean that the drive (already) failed to relocate the sector because too many bad sectors have already been relocated?

Or does this mean nothing at the moment, because the drive relocated the sector and simply informed the SCM about this first occurrence as well?

In the first case I should really replace the drive immediately, in the other case ...

(If I got an alert for every bad sector, I would sooner or later drown in such alerts.)

The problem is that this is a Seagate Savvio 10K 2 900 GB ST9900805SS (6 months in use, 5Y warranty).

For a replacement, Seagate requires several checks to be run on the drive, which is not possible as long as it is running inside the IMS.

Also, if they find out that there was a single bad sector which was properly relocated, they would probably charge me...

What would be a good way to get the full SMART information from that drive, including the size of the remaining relocation table?

Any advice?

Just for Intel: the IMS is fully upgraded to comply with the latest THOL, i.e. the PSs and IO fans have been upgraded as well as the FW (which was done because we had a lot of trouble with Toshiba MBF2600RC drives and bad sectors).

Thanks

M.

6 Replies
idata
Employee

Have you considered taking that drive offline and letting the IMS rebuild on a spare? You can then pull that drive, put it in an external enclosure or another machine with a SAS card, and run full diagnostics on it.
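Once the drive is attached to a SAS card in another machine, something like the following might be a starting point for pulling the health data. This is only a sketch under assumptions: it assumes smartmontools is installed on that machine, that the drive shows up under a device path such as `/dev/sdb` (hypothetical), and that the report contains the "Elements in grown defect list" line that smartctl normally prints for SCSI/SAS drives. As far as I know, SAS drives report the number of reallocated (grown) defects rather than the remaining size of the relocation table, so this gives the count of relocated sectors, not the headroom.

```python
import re
import subprocess
import sys


def read_smart_report(device: str) -> str:
    """Run `smartctl -x DEVICE` (smartmontools) and return its text output."""
    # smartctl uses its exit status as a bit mask, so a non-zero code does
    # not necessarily mean the call failed; we do not raise on it here.
    result = subprocess.run(
        ["smartctl", "-x", device], capture_output=True, text=True
    )
    return result.stdout


def grown_defect_count(report: str):
    """Return the grown defect count from a smartctl report, or None if absent."""
    match = re.search(r"Elements in grown defect list:\s*(\d+)", report)
    return int(match.group(1)) if match else None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Device path is passed on the command line, e.g. /dev/sdb (hypothetical).
    report = read_smart_report(sys.argv[1])
    print("Grown defects:", grown_defect_count(report))
```

A long self-test (`smartctl -t long` on the same device) before sending the drive in would probably also be worth running, since a single relocated sector and a drive that keeps growing defects look the same in one alert.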

I ran across your thread as I am preparing to put 14 new ST9900805SS drives in an MFSYS25 chassis and was Googling to see if anyone had any negative experiences with this model in the older MFSYS25 units.

Jason

Daniel_O_Intel
Employee

I would recommend replacing the drive with a new one.

Now that you have the new IO fans and Power Supplies, you shouldn't be getting any new bad sectors.

JLupo
Beginner

Does anyone know the procedure to swap a drive in the MFSYS25 when a hot spare isn't present?

Edward_Z_Intel
Employee

The simple way is to make the new drive a hot spare first, and then remove the faulty one. But if you don't have a free slot for the new drive, follow the steps below:

1. If the disk is already dead, remove it directly. If it's still accessible, select the drive and click "Force Offline", and then remove it.

2. Insert the new drive.

3. Select the Storage Pool, and then click "Rebuild".

4. Select the new drive to rebuild on.

 

MJaen1
Beginner

Hi.

Assuming you have a spare slot available, you could just add the drive, define it as a spare (non-revertible!) and then "force offline" the old drive.

Unfortunately, you then have to run the RAID without redundancy for several hours. Since it is possible that another sector on another drive is bad, you might end up with unrecoverable data, maybe even with a completely failed RAID. Intel has no information on these critical situations and how the storage manager will behave.
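To put a rough number on that risk, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: a pool of 6 drives (so 5 surviving drives are read in full), 900 GB per drive, and an unrecoverable read error rate of 1 in 10^16 bits, which is a typical order of magnitude quoted for enterprise drives and not a figure taken from this thread or from the ST9900805SS datasheet. It also treats bit errors as independent, which a drive that is already developing bad sectors will not honor.

```python
import math


def p_read_error_during_rebuild(surviving_drives, drive_bytes, ure_per_bit):
    """Chance of at least one unrecoverable read error while reading all
    surviving drives once, assuming independent errors per bit."""
    bits_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^bits, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # All three values are illustrative assumptions, see the text above.
    p = p_read_error_during_rebuild(
        surviving_drives=5, drive_bytes=900e9, ure_per_bit=1e-16
    )
    print(f"Chance of a read error during rebuild: {p:.2%}")
```

Under these assumptions the result is roughly 0.36% per rebuild. That sounds small, but it is the optimistic case for healthy drives; with drives that have a history of bad sectors, the real risk during the unprotected hours is likely higher.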

Two years ago I requested a change to that procedure, so that one could have a "replace" where all available drives are taken into account when rebuilding the replacement drive, and only after the successful transition is the to-be-replaced drive shut down. But they decided to stop development of the IMS completely instead and even refined the EoL date. (This is the worst I have ever seen in IT, especially from Intel. Now I am having headaches explaining this to my customers!)

M.

Edward_Z_Intel
Employee

Now I understand what you mean... Good point.
