Hi, everyone... I'm desperate for help here. I'm running an Intel DP55KG motherboard with an i7-860 processor. 4GB of memory, dual channel, running at 1333mhz. All multipliers and bus speeds are set to default values.
I have run into a brick wall, in that for days I could not get past a BSOD loop, with the MACHINE_CHECK_EXCEPTION error, STOP code 0x0000009C. From what I have read this seems to be hardware related. The computer and everything in it is only 3 months old.
For my OS I'm running Windows 7 x-64 version. I have four hard drives, two of which are in RAID-0, the other two in standard single-drive mode.
After hours... and hours... of troubleshooting (I am a computer tech, so I tried all of the standard stuff already)... This BSOD was hardcore, it would BSOD trying to enter safe mode, recovery/repair mode... whether run from the HD or booted from rescue media, same result.
After pure luck (well, after a lot of time and troubleshooting), I realized that if I set my (4 core) i7-860 to use only 2 of the 4 cores, the error ceases to happen. Hyper-threading can be enabled without trouble... but as soon as I change it from 2 to ALL cores, back to the same BSOD.
I have run the memory test utility provided with Windows 7, the original 'memtest', and memtest86+ -- with no memory errors after several passes. I have attempted to remove all extra PCI bus hardware, and dropped the 4GB (2x2GB) down to 1 stick, and exchanged them out, to eliminate it being a bad stick -- same result... Hard drives pass SMART tests. Thermal values are within normal range (58c-62c thermal *margin*), with case temps being around 38 degrees at both board thermal sensors.
All fans are working properly, and in fact, I currently have FIVE 120mm fans, plus the processor fan. I have been running this same config since I built the machine around early September of this year, without a hitch.
I will mention the problem originally started after a random thermal trip, denoted in the BIOS logs (only happened once). I have tried disabling everything on-board, removing any non-essential hardware, etc... and the only way I was able to get anything accomplished was through Hiren's boot CD, which ran fine. Voltages on all the rails are pretty spot on, 12v, 5v, 3.3v -- nothing strange hinting towards power problems.
I have been struggling with Intel's Desktop Control Center and event log errors about it (what appears to be shoddy code, lots of .NET problems it looks like, lots of 'can't connect' errors, etc.). I reinstalled this software and updated my chipset software in the period prior to the reboot, which could leave the *possibility* of a software issue. Something feels fishy about the Desktop Control Center, as it has always been flaky, and is used to overclock, monitor system health, etc... and could be involved in throwing errors (or this is the result of a latent one?).. I have uninstalled it -- regardless, I still see remnants of at least two services left behind and running (and who knows what else), named IOCBIOS.SYS and a TURBOB.sys (both signed as "Intel Extreme Tuning Utility").
I have used Sysinternals "autoruns" and have disabled any extra or possibly problematic shell extensions or services that no longer exist -- (though I have not disabled the two Intel Control Center ones, for fear of what havoc that may cause).
Last, but not least, I have unplugged all CD-rom drives, and all hard drives not linked to the OS/boot, tested both sticks of memory 1 at a time, swapped them, stood on one foot and held my breath....
So, my question to the more experienced is... where do I go from here? What further troubleshooting can I do? Memory checks out fine, I am able to boot into OS with only 2 processors activated and run OK, and I allowed (once I could boot) the (Windows 7) recovery process to run, which gave me this dump info:
Event Name: StartupRepairOffline
Problem signature 01: 6.1.7600.16385
Problem signature 02: 6.1.7600.16385
Problem signature 03: Unknown
Problem signature 04: 21350134
Problem signature 05: AutoFailover
Problem signature 06: 1
Problem signature 07: BadDriver
Which was followed by an ominous sign, which I have found to usually be caused by bad memory in my experience:
Instruction at 0xFB39584D referenced memory at 0x00000008; the memory could not be read.
This popup only happened one time... but that "referenced memory" location seems extremely low... almost as if it were base memory or cache memory or something "low level" like that... (though I could be mistaken).
This is about the best information I can give you all, as it is all I have been able to find... but I do know one thing... it is something that is being tripped when those extra 2 cores are enabled, with 100% certainty. I'm really hoping it's not processor/board problem, as said... brand new equipment. what a shame, i have been working to fix this day and night since it happened.
Os Version: 6.1.7600.2.0.0.256.1
Locale ID: 1033 (English)
Lastly, I still do want to partially blame the Intel Desktop Control Manager, as it has been giving me errors in the event log like this for quite some time, similar to these, but these seemed to be some of them that were occurring right before it all went downhill, just a ton of unhandled exceptions and basically "I don't know what to do's" or "I'm having a hard time reading this value, etc"... here are just some of the event log entries (all with eventid of "0".. helpful...)
2010-01-09, 17:19:18.0527859 : Error : Unhandled exception detected while executing virtual device command response.: IdOfEvent: 384 | IdOfItem: QPI_FREQUENCY_MONITOR | Status: NoError | Return Value: Intel.PerfTune.DeviceData.DerivedMonitorItemData`1[System.Single]
2010-01-09, 17:23:26.2128751 : Warning : Failed to load the config file. Using internally-defined default configuration settings.
2010-01-09, 17:23:48.8875942 : Warning : Hardware monitoring subsystem failed to initialize a virtual device.: QPI_FREQUENCY_MONITOR
2010-01-09, 18:52:55.4728341 : Error : Remoting Client is Unresponsive with Exception: 1
2010-01-09, 20:06:29.5478794 : Error : Unhandled exception detected while executing virtual device command response.: IdOfEvent: 153100 | IdOfItem: MCH_TEMPERATURE_MONITOR | Status: ErrorUnexpected | Return Value: NULL
2010-01-09, 20:06:40.5175068 : Error : Unhandled exception detected while executing virtual device command response.: IdOfEvent: 153135 | IdOfItem: CPU_CORE_TEMPERATURE_MONITOR | Status: ErrorUnexpected | Return Value: NULL... etc, etc.
basically it keeps going through the components, saying unhandled exception from all the thermal monitors, voltage monitors,
2010-01-09, 20:19:10.6517305 : Error : Unhandled exception detected while executing virtual device command response.: CDV(READ_ITEM, 153157, UNCORE_SPEED_MONITOR) : Inputs=[ (HOST_CLOCK_FREQUENCY,) (UNCORE_MULTIPLIER,18) ] := Error calculating derived value!
then the last events that seem to have happened before the ominous event appear to be:
2010-01-09, 22:07:11.5624686 : Warning : Failed to load the config file. Using internally-defined default configuration settings.
2010-01-09, 22:07:17.0380783 : Error : Could not load last known go...
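For anyone wading through a log like this, a rough Python sketch for tallying the entries by severity and by monitored item may help show which monitors fail most often. It assumes every entry follows the `<date>, <time> : <Level> : <message>` shape seen in the excerpts above; the function name `summarize` and the regular expressions are illustrative only, not part of any Intel tool.

```python
import re
from collections import Counter

# Assumed line shape, inferred from the pasted excerpts:
#   "<YYYY-MM-DD>, <HH:MM:SS.fraction> : <Level> : <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}), (\d{2}:\d{2}:\d{2}\.\d+) : (\w+) : (.*)$"
)
# Some messages carry an "IdOfItem: NAME" field naming the monitor involved.
ITEM_RE = re.compile(r"IdOfItem: (\w+)")


def summarize(lines):
    """Count entries per severity level and per monitored item."""
    levels = Counter()
    items = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a log entry
        level, message = match.group(3), match.group(4)
        levels[level] += 1
        item = ITEM_RE.search(message)
        if item:
            items[item.group(1)] += 1
    return levels, items


# Three entries copied from the excerpts above.
SAMPLE = [
    "2010-01-09, 17:23:48.8875942 : Warning : Hardware monitoring subsystem failed to initialize a virtual device.: QPI_FREQUENCY_MONITOR",
    "2010-01-09, 20:06:29.5478794 : Error : Unhandled exception detected while executing virtual device command response.: IdOfEvent: 153100 | IdOfItem: MCH_TEMPERATURE_MONITOR | Status: ErrorUnexpected | Return Value: NULL",
    "2010-01-09, 20:06:40.5175068 : Error : Unhandled exception detected while executing virtual device command response.: IdOfEvent: 153135 | IdOfItem: CPU_CORE_TEMPERATURE_MONITOR | Status: ErrorUnexpected | Return Value: NULL",
]

if __name__ == "__main__":
    levels, items = summarize(SAMPLE)
    print(dict(levels))
    print(dict(items))
```

A tally like this only shows which monitors the software complains about; it says nothing by itself about whether the hardware behind them is faulty.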
Personally I would reinstall Win 7, but 32-bit, with only one HDD in the system and no RAID. Other than that I think you might need to get hold of another CPU or board so you can cross-test them....
yeah... that's what i am thinking too... unfortunately, i don't have an extra (SATA) hard drive around here I can experiment with this on, all I have are IDE's (which of course this board has no PATA support on-board...) and getting another CPU or board is not an option for me right now... as I don't have the money or availability
and the RAID itself is 500GB, quite filled (though I'll admit with ripped DVD's and stuff that I own, so I could probably delete them... but if it didn't fix the problem, i'd want to re-create my system how it is).. not sure how well disk imaging apps work on RAID volumes. I know Acronis will do it (used to use that at work), but I don't have a copy here, and it had a bad habit of creating images that were the size of the entire array, even if i only chose to image one partition on that array (say, my Windows partition is 50GB, and the rest of the array is 450GB. I chose to only backup C: -- it would actually create a file size that was 500GB).
though this may be my mistake, i could be recalling when you use the built-in function of converting native Acronis .TIB files -> .VHD files (new Windows backup image files, or also virtual hard drive images -- Microsoft's format).
but digressing from that -- I think my next step is to get a new hard drive, set it to just normal IDE or AHCI mode and try a re-install, and see what happens.
my only concern is this: remember in my first post (you may have easily read over it), i mentioned that the blue screen happened even when I booted from the Windows 7 rescue CD. In my opinion/idea, if it were booting from CD, it should be reading boot information, drivers, etc... from that CD, correct? That simple thing, in itself, makes me concerned... but at this point I don't have much choice, huh?
My main concern right now about simply leaving this in 2-core mode for the time is data corruption possibility. I continue working, then find out some weeks or months down the road that my data is corrupted. I have tested this out with MD5/SHA1 checksums; copying files across disks and/or downloading large files, then checking against an online known good checksum.. and they match... but I still have my concerns.
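The copy-and-compare check described above can be scripted so it is easy to repeat on many files. This is only a minimal Python sketch using the standard `hashlib` module; the function names `file_digest` and `copies_match` and the example paths are made up for illustration.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def copies_match(source_path, copy_path, algorithm="sha1"):
    """Return True when both files produce the same digest."""
    return file_digest(source_path, algorithm) == file_digest(copy_path, algorithm)


if __name__ == "__main__":
    # Hypothetical paths; substitute a real file and its copy on another disk.
    print(copies_match(r"C:\data\big.iso", r"D:\backup\big.iso", "md5"))
```

Matching digests show that this particular copy came through intact. They cannot rule out intermittent errors on other operations, so repeating the check under load is more telling than a single pass.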
Looks like I need to 1) get another hard drive (hate to do this, I'm out of work right now, and am trying to not spend any money) and 2) talk to Intel some more, as last I spoke to them, they recommended pulling the CMOS battery and letting that clear, then see if it fixed the problem, which it did not. Also back-to-BIOS switch did not work to fix the problem. Last thing I have not tried is re-flashing the BIOS, but i doubt this was corrupted somehow.. oh, and trying to rollback drivers on all of the devices I manually updated during my "chipset driver update".
Thanks for the response though... appreciate it. If you or anyone else gets any more ideas, please -- by all means -- post them for me, please.
p.s. the main reason I am using x64 version was that I planned to get another 2x2GB kit in the future (to bring my memory from 4GB to 8GB, but that may be held off for a while). From what I have read and heard, there is little difference or benefit for x64 OS right now, other than high-end apps that actually utilize it, like... CGI rendering apps, or CAD type stuff.. not your average user; actually, if anything it seems to be *more* of a problem due to incompatibility with legacy (or even current) stuff that doesn't play well with x64 OS's.
Please provide detailed specs of your machine (especially the memory and PSU). This motherboard does not like memory that requires above 1.6 volts. I would also state that you should be using MSM 8.8 or Intel Rapid Storage Technology drivers rather than the latest MSM 8.9.xxx version. Lots of issues with that version and RAID 0,1, and 5.
sure thing. here is what info i can give you:
motherboard model: intel brand DP55KG
processor: intel i7 860 (2.8ghz, turbo mode step up to 3.47ghz)
cpu voltage max it looks is set to 1.2v, standard 21x multiplier
turbo multipliers being:
current max (a) = 89
power max (w) = 95
Turbo mode: on
cutting these off then setting cores to ALL makes no diff.. still get blue screen.
memory: gskill ripjawz 2x2gb kit (4GB total, dual channel, both in blue slots)
memory is only using 1.5v; i specifically got this memory because of reading the voltage requirements
memory is running at 1333mhz speed, with 10x multiplier -- using 1.5v, timings 9-9-9-24, command rate T2
pci memory freq as said 1333mhz
pci express bus 100mhz
pci bus 33.33mhz
PCH voltage 1.03v
base clock freq: 133.33mhz (factory)
uncore voltage: 1.1v (factory)
cpu voltage using about 1.0v (dynamic, adjusted of course to cores and load... factory)
in fact, all voltages and bus speeds are at their factory defaults.
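As a sanity check on the numbers above, the listed speeds all follow from multiplying the 133.33mhz base clock by a multiplier. The Python sketch below just redoes that arithmetic. The 21x and 10x multipliers are the ones listed here, and 18 is the UNCORE_MULTIPLIER value from the event log in the first post; the 26x turbo figure is only inferred from the 3.47ghz rating and was not read from the BIOS.

```python
BASE_CLOCK_MHZ = 133.33  # factory base clock as listed above


def derived_mhz(multiplier, base_clock_mhz=BASE_CLOCK_MHZ):
    """Frequency of a clock domain that runs at multiplier x base clock."""
    return multiplier * base_clock_mhz


if __name__ == "__main__":
    print("cpu (21x):    %.0f mhz" % derived_mhz(21))
    print("memory (10x): %.0f mhz" % derived_mhz(10))
    print("uncore (18x): %.0f mhz" % derived_mhz(18))
    # Inferred, not measured: 3470 / 133.33 rounds to 26.
    print("turbo multiplier implied by 3.47ghz:", round(3470 / BASE_CLOCK_MHZ))
```

Since the results land on the stock 2.8ghz and 1333mhz figures, nothing in these settings points to an accidental overclock.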
power supply voltages all in the "green" on 12v, 5v and 3.3v rails, pretty spot on:
12v showing as using 12.4v right now, 5v pulling 5.04v, 3.3v pulling 3.31v
there are 3 case fans, and 1 processor fan of course... all running fine.
cpu core temp runs about 65c MARGIN -- so around 35c actual core temp, ambient showing 36c, and PCH temp showing as 33c. -- similar temps when all cores were running -- about 62c margin, and around 36c for the other sensors.
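For readers not used to "margin" readings: the conversion used above is just a subtraction from the point at which the chip throttles. The sketch below assumes a 100c reference, which is what the 65c margin -> 35c figure implies; the exact reference the Intel tool uses is not stated in this thread, so treat `ASSUMED_TJ_MAX_C` as a guess.

```python
# Assumption: margin is measured down from roughly 100c, as implied by
# the "65c margin, so around 35c actual" figure in the post above.
ASSUMED_TJ_MAX_C = 100


def core_temp_from_margin(margin_c, tj_max_c=ASSUMED_TJ_MAX_C):
    """Convert a thermal margin reading into an approximate core temperature."""
    return tj_max_c - margin_c


if __name__ == "__main__":
    for margin in (65, 62, 58):
        print("%dc margin -> about %dc core" % (margin, core_temp_from_margin(margin)))
```

A larger margin means a cooler core, so margins of 58c-65c leave plenty of headroom.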
i am indeed using MSM 184.108.40.2063, as i was a little concerned of upgrading to the rapid storage drivers (with a jump into the 9.x version numbering, i believe?). i know it is supposedly just a 'name change', but with the ROM being called matrix storage and being at 8.9, i felt it was best not to upgrade and cause more problems -- so this could be 'something to try' as long as it doesn't mess up my RAID -- which btw, is 2x250gb drives in RAID-0, then another two 1tb drives in regular mode (not in an array)
chipset drivers at 220.127.116.113... upgraded them to... i think it was .1023, but did a 'rollback driver' on them after i started having this problem.
and of course have all the latest drivers, including trying both, say, nvidia's latest and asus's latest driver (using driver sweeper in between the installs).
i'm working with intel still, but we've not reached a solution. i have upgraded to the latest bios (the one which kills bluetooth), which did not make any difference. have also blanked out CMOS, reset to factory defaults, done BIOS restore... i've done it all nearly it seems.
also, forgot to mention... lastly.... OS is Windows 7 x-64.
any ideas? i suppose you'll recommend upgrading to the latest storage drivers... any other thoughts? remember -- this happens only when i enable the second 2 cores on the processor. when i manually set it to use only 2 cores -- no problem.
any other information feel free to ask. the power supply is plenty... 700w rated, up to 830w i think, and all i have extra in here is a PCI-e video card (which I don't even game with), and of course the 4 hard drives.
my thoughts seem to be that the problem lies in one of those 2 extra cores or parts thereof, but just my personal thought. any help at this point is welcomed. thanks for helping out.
just to follow up on this, in case anyone else ever runs across this error in the future... the part that was bad was the CPU. I RMA'd the CPU, got the new one back, popped it in, and everything worked perfectly -- so it was absolutely a bad CPU/core causing this bluescreen.
hope this helps someone, i spent... countless hours troubleshooting every part, configuration, trying this and that to fix it and i was at my wit's end. i felt fairly confident that it was the CPU (second choice was gonna be the motherboard), so i had to just try it out and thankfully i was right.
If anyone is still following this post, please provide help:
Computer recently getting BSOD. Same error reported by Chinch. QPI_Frequency_Monitor.... error.
I ran MEMTEST on the memory and came back with Zero errors (ran it over 8 hours). Computer increasingly crashed after a few minutes of use.
I then turned OFF two of the four cores on this i5-750 chip in Windows 7. The computer was stable for over 20 minutes. I ran the Intel Extreme Stress Utility. It had ZERO errors with the CPU (5 minute test), but as soon as it ran the memory test it came back with a FAIL. (Not sure why, if Memtest had no errors?!?)
I am currently running Memtest again.
Need help. Is the CPU bad, if i can get it to run on 2-cores and not 4?
SPECS: Nothing Overclocked.
ASUS motherboard: P7P55D-E (bios: 1601)
Windows 7 64bit
PSU: Corsair 650TX
Memory: Corsair TW3XG1333C9DHX (4GB)
Intel i5-750 (not overclocked) (currently running on 2-cores)
Graphics: EVGA nVidia 9500GT
I ran Memtest again had No Errors.
Fixed voltage settings on memory according to ASUS motherboard specifications.
Ran Intel Test again (with only 2-cores) Had no Errors with CPU or Memory.
Turned all cores on (4-cores) Ran Intel Test again for 15 minutes on CPU and 5 on memory. CPU was fine but Memory had errors before the 2 minute mark.
So, I ask again, does this mean there is something wrong with the CPU. I'm currently running a longer CPU test and will see if it freezes or crashes.