
IMSM 8.8/8.9/9.5 & Win7 in-box RAID driver - Random drive failures with 8.9

idata
Employee

This thread is effectively a continuation of "Random drive fails with new Matrix Storage Manager 8.9" (/thread/5036?start=0&tstart=0), which has been locked. That is a pity, because it was a valuable source of information.

I am about to carry out a major hardware and OS update of a PC with a RAID10 setup. The system has previously run 8.8 under Vista without problems. The update will include Win7, but I am unsure which version of IMSM/IRST to use:

  • Version 8.8.0.1009 is not certified for Win7;
  • 8.9.0.1023 is certified for Win7 but has the well-documented problem of dropping drives;
  • the pre-release RST 9.5.0.1037 has still not been officially released and, judging from web reports, has many other problems;
  • or the in-box Win7 driver.

Intel's Elizabeth stated in post 423 of the original thread (/thread/5036?start=423&tstart=0) that the Win7 in-box driver is similar to 8.9.0.1023, but not identical. Has anyone using the Win7 in-box RAID driver had any problems with this version? Note, though, that unless the "manager" part of IMSM has been installed separately, the user may not be aware that the driver is randomly dropping drives from the array if they recover quickly.

I would appreciate feedback from anyone using the embedded Win7 RAID driver.

idata
Employee

Hi,

I will update my results: now 1.5 months without any problems using 8.8. I will probably not upgrade to any new version; why break something that is already working? This question is also for Intel: why is this new fancy UI (9.5) needed? I would be happier with a command-line interface and a working driver...

thanks,

Imre

idata
Employee

Intel has released RST 9.6.0.1014 on its website: http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=15251&lang=eng. Reports say that it includes TRIM pass-through for SSDs (except with RAID 5).

Has anyone who had the "random drive failures" problem with 8.9 and conventional HDDs tried these new drivers?

Daniel_T_Intel
Employee

A "Random drive failure" condition has been corrected with 9.6.0.

idata
Employee

It may have been, but I cannot find confirmation of this in the release notes.

What does concern me in the "Known Issues" section is "Event ID 9 "iaStor" error in the system log", which is very similar to the error messages some were getting with 8.9.

Daniel_T_Intel
Employee

The resolved issue that pertains to this problem is 2724057.

Event ID 9 is a generic I/O timeout message. Sometimes it results from a bug in the driver and sometimes from an issue with a physical drive. Items in the Known Issues section are still under investigation, so it has not yet been determined whether this is a driver bug or a valid event (i.e. the driver is working properly and reporting a genuine error).
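If you want to see whether Event ID 9 timeouts pile up on one device, or coincide with dropped drives, it helps to count them. Below is a minimal Python sketch, assuming the System log has been exported to CSV with columns named TimeCreated, ProviderName, Id and Message (names that mirror Get-WinEvent properties; the export layout itself is an assumption). The sample rows are illustrative, not taken from anyone's log.

```python
import csv
import io
import re
from collections import Counter

# Matches a Windows device path such as \Device\Ide\iaStor0 inside the message text.
DEVICE_RE = re.compile(r"\\Device\\[^\s,]+")


def count_timeouts(csv_text, provider="iaStor", event_id=9):
    """Count Event ID 9 timeouts per device path.

    Assumes a CSV export with TimeCreated, ProviderName, Id and Message columns
    (a hypothetical layout; adjust the names to match your own export).
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["ProviderName"] != provider or int(row["Id"]) != event_id:
            continue  # skip events from other sources or with other IDs
        match = DEVICE_RE.search(row["Message"])
        counts[match.group(0) if match else "<unknown device>"] += 1
    return counts


# Illustrative sample data only.
SAMPLE = r'''TimeCreated,ProviderName,Id,Message
2010-03-01 14:39:45,iaStor,9,"The device, \Device\Ide\iaStor0, did not respond within the timeout period."
2010-03-01 14:41:02,iaStor,9,"The device, \Device\Ide\iaStor0, did not respond within the timeout period."
2010-03-01 14:50:10,Disk,7,"The device, \Device\Harddisk1\DR1, has a bad block."
'''

if __name__ == "__main__":
    for device, n in count_timeouts(SAMPLE).items():
        print(device, n)
```

A cluster of timeouts at the same moments a member is reported failed would point towards the driver/controller path rather than random media errors, though it cannot settle the question on its own.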

idata
Employee

Well, I upgraded to 9.6 a few days ago -- maybe even a week ago -- and have not had a single problem. It has been so stable that I even enabled the write cache, which I don't normally do.

And I was one of the ones having horrible problems with 8.9 or whatever it was.

This is W7 64-bit with enterprise class WD RE drives in RAID10 config.

idata
Employee

az-djt wrote:

The resolved issue that pertains to this problem is 2724057.

Event ID 9 is a generic I/O timeout message. Sometimes it results from a bug in the driver and sometimes from an issue with a physical drive. Items in the Known Issues section are still under investigation, so it has not yet been determined whether this is a driver bug or a valid event (i.e. the driver is working properly and reporting a genuine error).

I can accept what you say about Event ID 9, but the description of Ref 2724057 appears to refer to a different type of issue: the creation of a RAID1 volume from the OS source drive. This is also a "Known Issue", not a "Resolved Issue". Have you quoted the wrong reference?

idata
Employee

cymru1 wrote:

Well, I upgraded to 9.6 a few days ago -- maybe even a week ago -- and have not had a single problem. It has been so stable that I even enabled the write cache, which I don't normally do.

And I was one of the ones having horrible problems with 8.9 or whatever it was.

This is W7 64-bit with enterprise class WD RE drives in RAID10 config.

 

Thanks for that. It is the first positive confirmation I have seen from someone who had the dropped-drive problem with 8.9 that 9.6 works for them.

Daniel_T_Intel
Employee

Sorry about that, I got ahead of myself. The resolved issue that is tied to this random drive failure problem is 3005402. A better explanation of the issue is this:

When running in a RAID 1 configuration (RAID 10 is a combination of RAID 0 and RAID 1), there is a corner-case condition where RAID metadata being written to a drive may not complete successfully. This is caused by a synchronization issue between the Intel® MSM driver and a Microsoft driver component. When the error occurs, the Intel® MSM driver believes that the disk drive is not responding and has no choice but to mark it as FAILED. The root cause was identified after the Intel® MSM 8.9.0 driver release, and the resolution is available for general release in our 9.6-based driver. This synchronization issue is a pre-existing condition (i.e. it was not introduced in the 8.9 driver release); drivers prior to Intel® MSM 8.9.0 could also exhibit this behavior. Although the issue was identified and resolved in a RAID 1 configuration and has not been seen in other RAID configurations, it could be encountered in them.
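To make that failure mode concrete, here is a toy Python model. It is not Intel's driver code; the 5-second timeout, the member names and the state strings are all hypothetical. It shows why a lost or late completion for a metadata write looks, from the driver's side, exactly like a dead disk: either way the member is marked FAILED, even though the disk itself is healthy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Member:
    name: str
    state: str = "NORMAL"


def write_metadata(members: List[Member],
                   completion_ms: Dict[str, Optional[float]],
                   timeout_ms: float = 5000.0) -> List[Member]:
    """Toy model of a mirrored metadata update with a completion timeout.

    completion_ms maps each member to the time its write completion was seen,
    or None if the completion never reached the driver (for example, lost in a
    race between two driver components). All values are hypothetical.
    """
    for member in members:
        latency = completion_ms.get(member.name)
        if latency is None or latency > timeout_ms:
            # A healthy disk whose completion was lost is indistinguishable
            # here from a disk that really stopped responding.
            member.state = "FAILED"
    return members


if __name__ == "__main__":
    mirror = [Member("port0"), Member("port1")]
    # port1's completion is lost, so it is failed although the disk is fine.
    for m in write_metadata(mirror, {"port0": 8.0, "port1": None}):
        print(m.name, m.state)
```

In this model the "failed" disk would pass any later surface test, because the fault lies in the completion path rather than on the platters.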

idata
Employee

az-djt wrote:

Sorry about that, I got ahead of myself. The resolved issue that is tied to this random drive failure problem is 3005402. A better explanation of the issue is this:

When running in a RAID 1 configuration (RAID 10 is a combination of RAID 0 and RAID 1), there is a corner-case condition where RAID metadata being written to a drive may not complete successfully. This is caused by a synchronization issue between the Intel® MSM driver and a Microsoft driver component. When the error occurs, the Intel® MSM driver believes that the disk drive is not responding and has no choice but to mark it as FAILED. The root cause was identified after the Intel® MSM 8.9.0 driver release, and the resolution is available for general release in our 9.6-based driver. This synchronization issue is a pre-existing condition (i.e. it was not introduced in the 8.9 driver release); drivers prior to Intel® MSM 8.9.0 could also exhibit this behavior. Although the issue was identified and resolved in a RAID 1 configuration and has not been seen in other RAID configurations, it could be encountered in them.

 

Thanks for explaining why 8.9 was dropping drives. This is the first explanation of the failure mechanism that I have seen. Previously, all Intel had posted on this forum was that they could not reproduce the failure, and presumably therefore could not find the problem or its origin. It's comforting to know that the cause has been found and a positive fix applied in 9.6.0.1014. May I ask the source of the root-cause explanation?

idata
Employee

I upgraded to 9.6 about 10 days ago. I have an SSD, and a Raptor plus 4 WD Greens in RAID. I've had no drive failures, and the system seems robust and less glitchy, but a message appears each time on boot: "data on one or more volumes is protected from a disk failure". That is obviously not filling me with confidence.

If anyone knows how I can go about finding out exactly what that means and how I can rectify it, I'm all ears.

idata
Employee

data on one or more volumes is protected from a disk failure

That's actually completely normal, and should fill you with confidence. It's the message that tells you your RAID array is working.
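The message is a redundancy status. As a rough illustration, here is a Python sketch of how such a status could be derived from member health. It is a simplified model, not the logic of Intel's software, and the status strings are hypothetical. RAID 10 is treated as RAID 0 striped across RAID 1 mirror pairs, so the volume survives as long as every pair keeps at least one healthy member.

```python
from typing import List


def volume_status(level: int, healthy: List[bool]) -> str:
    """Classify a volume from the health of its members (True = healthy).

    For RAID 10 the list is assumed to be ordered in mirror pairs:
    [pair0_a, pair0_b, pair1_a, pair1_b, ...].
    """
    failed = healthy.count(False)
    if level == 0:
        # Striping only: any lost member loses the volume; no redundancy at all.
        return "failed" if failed else "not redundant"
    if level == 1:
        if failed == 0:
            return "protected"
        return "degraded" if failed < len(healthy) else "failed"
    if level == 5:
        if failed == 0:
            return "protected"
        return "degraded" if failed == 1 else "failed"
    if level == 10:
        pairs = [healthy[i:i + 2] for i in range(0, len(healthy), 2)]
        if any(not any(pair) for pair in pairs):
            return "failed"  # some stripe has lost both copies of its data
        return "degraded" if failed else "protected"
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    print(volume_status(10, [True, True, True, True]))    # protected
    print(volume_status(10, [True, False, True, False]))  # degraded
    print(volume_status(10, [False, False, True, True]))  # failed
```

So "protected" simply means every member of a redundant volume is present and healthy.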

idata
Employee

Ah *slaps forehead*.

Thanks!

There is one other thing, though: in the advanced properties of each member of the RAID there is a value, "disk data cache".

It's enabled on 1 disk and disabled on the other 3.

Any ideas, anyone? Googling this throws up pages about Windows data caching or enabling write-back cache; it is very hard to find information about a member drive having it switched on.

Daniel_T_Intel
Employee

It was not until recently that Intel obtained an actual failing system on which to perform the root-cause analysis.

idata
Employee

Update: I have now used driver version 9.6 on another of my systems for 1 week, with no drive failures so far.

But I still have random major slowdowns on my system. In Resource Monitor I can see that disk activity is ~50 KB/s, the disk queue is >40 and active time is 100%; the system is almost unresponsive for a couple of minutes, then everything speeds up and continues as normal. Any ideas why?
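Those two numbers together go a long way towards explaining the unresponsiveness. By Little's Law, the average time an I/O spends in the system equals the number of outstanding I/Os divided by the completion rate. A quick Python check, assuming an average I/O size of 4 KB (an assumption; the real size is not stated) and that the disk is roughly in steady state during the slowdown:

```python
def mean_wait_seconds(queue_length: float, throughput_kb_s: float,
                      io_size_kb: float = 4.0) -> float:
    """Little's Law, W = L / lambda: mean time an I/O spends queued plus in service.

    queue_length    -- average outstanding I/Os (L), e.g. the disk queue length
    throughput_kb_s -- observed transfer rate in KB/s
    io_size_kb      -- assumed average I/O size; 4 KB is only an illustrative guess
    """
    iops = throughput_kb_s / io_size_kb  # completion rate (lambda), I/Os per second
    return queue_length / iops


if __name__ == "__main__":
    # ~40 queued I/Os draining at ~50 KB/s, as reported above
    print(f"{mean_wait_seconds(40, 50):.1f} s per I/O")  # 3.2 s per I/O
```

A larger average I/O size makes the wait longer still; it only drops below a second for requests of roughly 1 KB or smaller. Waits of that order are enough to stall anything that touches the disk.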

idata
Employee

Hello impc,

I might need to start my own thread about this issue, however... I have exactly the same issue that you describe in your post, and I have been trying to resolve it for weeks now. The problem can occur at any time and under any workload (simply checking web mail, or even in games).

The desktop I'm using is only about 4 months old, and this problem has only been occurring for about a month, so something must have changed around that time.

Luckily, I have finally managed to catch the issue while running a Data Collector Set in Windows 7, which lets perfmon export the performance characteristics. They show disk activity suddenly crawling to a halt: my disk queue length goes to about 40-50, the active time is pegged at 100%, and the system becomes pretty much unresponsive for a couple of minutes. Then the hard disks suddenly come alive and performance goes back to normal until it happens again. Over an 8-10 hour period I can expect to experience the problem once or twice an hour on average, and it is extremely frustrating!

Another odd thing I have noticed is that it's not only my SATA RAID 0 volume that grinds to a halt: my external USB drive, attached via eSATA to the back of my motherboard, also shows the vastly increased disk queue length and 100% active time. For a while I wondered whether this was a hardware problem with the motherboard, memory or physical disks. However, since it affects an external eSATA disk at the same time, I'm starting to lean towards the Intel ICH10R controller portion of the motherboard simply becoming "overwhelmed" and unable to keep up.

Over the last few weeks I have updated my BIOS and all drivers/configuration applications from my motherboard manufacturer (such as Asus AISuite, EPU-6, the ATI graphics card, Western Digital disks and the JMICRON IDE/SATA controller) just in case, but nothing has improved. I have also tried various versions of the Intel Rapid Storage Technology program, as well as upgrading and downgrading the Intel ICH10R controller driver, but nothing has improved the issue.

I have also left memtest86+ running repeated memory tests over a 3-day period, and it has not found any memory faults.

My computer setup is:

  • Windows 7 Home Premium 64bit. Fully patched from MS.
  • QuadCore Intel Core i7 920
  • Asus P6T
  • 3 x 2 GB Corsair XMS3 CM3X2G1600C7 DDR3-1333 SDRAM
  • ATI Radeon HD 5700 Series 1GB
  • Storage controller: Intel(R) ICH8R/ICH9R/ICH10R/DO/PCH SATA RAID Controller, which is present on the Asustek P6T motherboard.
  • 2 x Western Digital 1500HLFS 10K VelociRaptor SATA disks in a RAID 0 configuration on the above Intel storage controller.
  • Corsair 750XT Modular Power Supply.
  • No overclocking has been performed on this PC. If I can get this issue sorted out, I may do small amounts of overclocking in the future.
  • Currently running IRST version 9.6.0.1014; my Intel ICH10R SATA RAID controller driver version is 8.9.0.1023.
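Note that the list above reports the 9.6.0.1014 application running alongside an 8.9.0.1023 controller driver, so the application version alone does not tell you which driver is loaded. Here is a minimal Python sketch for checking a reported driver version against the 9.6 release. Treating 9.6.0.1014 as the first build carrying the fix for issue 3005402 is an assumption based on Intel's post earlier in this thread.

```python
from typing import Tuple

# Assumption: the first driver build carrying the fix for issue 3005402,
# based on Intel's statement in this thread that the fix ships in the 9.6 driver.
FIXED_IN = "9.6.0.1014"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '8.9.0.1023' into a tuple of integers."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_has_fix(driver_version: str, fixed_in: str = FIXED_IN) -> bool:
    # Compare numerically, field by field; a plain string comparison would
    # wrongly sort "10.0" before "9.6".
    return parse_version(driver_version) >= parse_version(fixed_in)


if __name__ == "__main__":
    for v in ("8.8.0.1009", "8.9.0.1023", "9.6.0.1014"):
        print(v, driver_has_fix(v))
```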

If you could let me know if you find anything out, that would be fantastic!

I'm not sure what else I can try.

I have also uploaded a CSV file of the performance counters for a period when this problem occurred.

The megaupload link for this is http://www.megaupload.com/?d=MMB38Y5I

Performance looks fine from 14:38:00 until 14:39:45, when the problem strikes; the symptoms continue until 14:47:25, when they stop just as suddenly and normal performance resumes.

 

I really hope someone out there has some advice for me!

Thanks!

idata
Employee

Hi,

And thanks for the input, Bleachy28!

This is definitely the same issue as mine. It's completely random, but it happens more than once in 24 hours.

It's my RAID10 volume; it has not happened on the RAID0 volume (which is idle most of the time anyway).

Adding my system specs:

W7 64bit ultimate

Core i7 920

Gigabyte GA-EX58-UD4P (ICH10R)

Western Digital 4x 1T in RAID10 & 2x 1T in RAID0

6G RAM

Driver version 9.6.0.1014

Wheee, it happened again while I was typing this.

This issue could be moved to another thread.

Thanks,

impc

idata
Employee

Sounds like the same issue I describe here (/thread/12917?tstart=0) with the 9.6 series drivers on my machine. Glad to hear it's not just me; disappointed, as the drivers apparently have had one such root cause fixed (another post elsewhere here describes a fix appearing first in the 9.6 drivers).

Is the Win64 commonality important, I wonder? Motherboards differ, as do the Intel controllers, CPU configurations and hard drive types...

Is anyone from Intel able to replicate this using any of your test hardware configured with Win64?

BTW, as referenced in my post, I'm now back on the 8.8 series driver, which is working a treat. No problems since installing it...

idata
Employee

Hi again,

I have solved my random "freeze" issue.

I exchanged my WD Green drives for WD RE3s, and all is working fine now.

Thanks everyone!

idata
Employee

I have just updated a P35/ICH9R-based system (Asus P5K Dlx) running Win7 64-bit from the in-box driver (8.6.2.1012) plus the manager from 8.8.0.1009 to 9.6.0.1014 (driver and manager). I did not uninstall the original driver and manager first; I just ran the new installer, which removed the old version and installed the new one, followed by a reboot.

Refer to posts 1 & 6.

This works fine with no problems at all. The only slightly annoying thing is the pop-up informing you that the array is protected against loss of data.

idata
Employee

I hope someone can help! Please!

I've gone from 8.9 to 8.8, and then about 3 weeks ago to 9.6.0.1014.

Everything was fine except today:

I got a message in Windows from the storage manager saying one drive had failed (the same drive that always failed under 8.8). HOWEVER:

this time, on reboot, ALL of the drives show up in the IMSM BIOS as "OFFLINE MEMBER".

Never having had this before, I have no idea where to go next. Do I create a RAID volume at this point? Are the disks really offline? I mean, is it possible there is a hardware issue?

EDIT:

OK, I pulled out the failed drive and tested it on another computer with WD's diagnostics: it always comes out fine, though.

It's always the same port that fails for me. Has anyone else found this?

I've NEVER had another port fail; it's always port 2. Is it possible the SATA port is dodgy?
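One way to test the "dodgy port" idea is to move drives between ports and note, for each failure, which port and which drive (by serial number) were involved. If failures stay on one port while different drives sit there, the port, cable or controller channel is the suspect; if they follow one drive around, the drive is. A minimal Python sketch of that bookkeeping follows; the record format and serial numbers are hypothetical.

```python
from typing import List, Tuple

# Each record: (SATA port number, drive serial) noted when a member is reported failed.
Failure = Tuple[int, str]


def classify(failures: List[Failure]) -> str:
    """Say whether logged failures follow a port, follow a drive, or can't tell yet."""
    ports = {port for port, _ in failures}
    drives = {serial for _, serial in failures}
    if len(ports) == 1 and len(drives) > 1:
        return "follows the port"   # different drives fail on the same port
    if len(drives) == 1 and len(ports) > 1:
        return "follows the drive"  # the same drive fails on different ports
    return "inconclusive"           # e.g. the same drive always on the same port


if __name__ == "__main__":
    # Hypothetical log after swapping drives between ports 2 and 3.
    log = [(2, "WD-SERIAL-A"), (2, "WD-SERIAL-B")]
    print(classify(log))  # follows the port
```

Without a swap, the same drive failing on the same port comes out "inconclusive", which is why moving the drives matters.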
