We are experiencing random, occasional but catastrophic array corruption on two servers that are on test before being moved to a hosting centre.
SuperMicro SuperServer 6017R-TDLRF 1U Server
Incorporating SuperMicro X9DRD-LF Motherboard with Intel C602 Chipset with latest BIOS.
64GB ECC RAM
1x Xeon E5-2630v2 CPU
2x Intel DC S3700 800GB SSD Drives in RAID 1 (Mirror) on RSTe Hardware RAID.
Windows Server Enterprise 2008 R2, fully updated.
Under heavy load, after a random period of time, often when doing a Windows backup, the array corrupts and the following event log messages are generated. There are varying quantities of each message...
Event ID: 55
The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume VMs.
Event ID: 12289
Volume Shadow Copy Service error: Unexpected error CreateFileW(\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy25\,0x80000000,0x00000003,...). hr = 0x800703ed, The volume does not contain a recognized file system.
Please make sure that all required file system drivers are loaded and that the volume is not corrupted.
Event ID: 136
The default transaction resource manager on volume E: encountered an error while starting and its metadata was reset. The data contains the error code.
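Since the quantity of each message varies between incidents, it may help to tally them per incident so that runs can be compared. The following is only a rough sketch: it assumes the event log has been exported to plain text with one "Event ID: N" line per event, as in the excerpts above. The function name `tally_event_ids` and the export layout are my own assumptions, not anything produced by Windows or the Intel tools.

```python
import re
from collections import Counter

# Matches lines such as "Event ID: 55" (assumed export layout, see above).
EVENT_ID_RE = re.compile(r"^\s*Event ID:\s*(\d+)\s*$")


def tally_event_ids(lines, watched=(55, 12289, 136)):
    """Count how often each watched event ID appears in exported log text.

    `lines` is any iterable of strings. IDs outside `watched` are ignored.
    Watched IDs that never appear are reported with a count of 0.
    """
    counts = Counter()
    for line in lines:
        match = EVENT_ID_RE.match(line)
        if match:
            event_id = int(match.group(1))
            if event_id in watched:
                counts[event_id] += 1
    # Return a plain dict with an explicit zero for IDs that were not seen.
    return {event_id: counts[event_id] for event_id in watched}
```

Calling `tally_event_ids(open("export.txt"))` on a hypothetical export file returns a dictionary keyed by event ID, which can be recorded alongside the driver version under test.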
A chkdsk on a corrupted volume shows hundreds of lines of errors. I can post these too, but I do not think the exact errors are relevant, as they vary each time. They include:
The object id index entry in file 0x19 points to file 0x174c
but the file has no object id in it.
I am really sorry for your trouble but let me help you with this.
All the information you have provided is very complete and helpful for our investigation. However, I have a few more questions that will give me a better picture of the issue:
1. The Intel DC S3700 SSD firmware version is 5DV10270.
2. The two Intel SSDs are connected directly to the two Intel Chipset RAID ports. There is no external RAID card. It is Intel C602 chipset RAID. See my first post.
3. This is a Windows Enterprise Server and is running Microsoft Hyper-V. There are four VMs, but none have direct disk access. The corruption is beneath the logical disk level, so I do not believe that anything within the OS can be doing this other than the Intel RAID driver. Also, the only thing we can change that makes the problem go away is the Intel RAID driver version. When the array corrupts, one drive is intact and the other is scrambled - this points at driver or firmware to me. My software developer hat makes me think driver race condition.
4. Yes, we have tried the iRSTe driver provided by SuperMicro - this corrupts. We have also tried an update supplied by SuperMicro - this corrupts. We have also tried every driver version downloadable from the Intel site. These all corrupt, except 22.214.171.1243. However, SuperMicro are simply supplying the Intel driver, as I would expect them to. Boston, the UK distributor for SuperMicro, have recently supplied us with several more driver versions between 3.6 and 3.8, as we have offered to pinpoint the driver version where the corruption began. We will do this to assist Intel in solving the problem, if we know that the information will be used - would you find this useful? Please confirm, as these tests all cost my company further time and money. However, the fact remains that the current driver version corrupts.
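To keep the number of test runs down when pinpointing the version where the corruption began, the search can be done as a bisection rather than by testing every version in turn. This is a minimal sketch under two assumptions of mine: the versions are ordered oldest to newest, and once the fault is introduced every later version also corrupts. `corrupts` is a hypothetical callback standing in for a manual soak test of one driver version. Because the failure is random, a "clean" result only means no corruption was seen in the test window.

```python
def first_bad_version(versions, corrupts):
    """Return the earliest version for which corrupts(version) is true.

    versions: driver versions ordered oldest to newest.
    corrupts: hypothetical callable that runs a soak test on one version
              and returns True if the array corrupted.
    Returns None if the list is empty or the newest version tests clean.
    Assumes every version after the first bad one is also bad.
    """
    if not versions or not corrupts(versions[-1]):
        return None
    lo, hi = 0, len(versions) - 1
    # Invariant: versions[hi] is known bad; everything before lo is known good.
    while lo < hi:
        mid = (lo + hi) // 2
        if corrupts(versions[mid]):
            hi = mid       # fault is at mid or earlier
        else:
            lo = mid + 1   # fault was introduced after mid
    return versions[lo]
```

With, say, eight candidate versions this needs about four soak tests instead of eight, which matters when each test ties up a server for days.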
Thanks for the information.
Please note that our drivers are generic drivers for OEM Systems like SuperMicro. This is because they create their own software and special drivers for their units.
Have you tried the IRSTe driver version provided by SuperMicro?
>Please note that our drivers are generic drivers for OEM Systems like SuperMicro.
>This is because they create their own software and special drivers for their units.
So the OEM drivers should work then?
But perhaps not have extra features that SuperMicro implement?
I've never been told that an OEM driver will not work before, and I've been in the business for some time.
But I can see how that might cut down on your support overhead :-)
>Have you tried the IRSTe driver version provided by SuperMicro?
You asked this in question number 4 above and I answered above.
Summary of reply to question 4: The problem is the same, whether using SuperMicro supplied drivers or Intel supplied drivers.
This is why it is logical to conclude that, since the problem is present in both driver versions, the problem is in the core code.
But I can see that someone higher up in Intel is listening, as the later driver versions have just been pulled from the Intel download centre!
I'm glad to know I'm getting somewhere, even though it might at first glance appear otherwise :-)
BEng Hons Information Systems Engineering,