[CRITICAL] memory data corruption with GM965 on Dell machines

7oby
New Contributor II
15,236 Views
current workaround (if no bugfix BIOS available):

use Intel display driver 7.14.10.1272 (v15.2.6) dated 05/11/2007 or older

bugfix BIOS (so far released only for the following systems):

Dell Inspiron 1420 : Update BIOS to >= A09 (BIOS A09 here)
Dell Inspiron 1520 : Update BIOS to >= A09 (BIOS A09 here)
Dell Inspiron 1525 : Update BIOS to >= A09. Verify VBIOS >= 1566
Dell Inspiron 1720 : Update BIOS to >= A09 (BIOS A09 here)
Dell Latitude D530 : Update BIOS to >= A07 (BIOS A07 here)
Dell Latitude D630 : Update BIOS to >= A12 (BIOS A12 here)
Dell Latitude D830 : Update BIOS to >= A13 (BIOS A13 here)
Dell XPS M1330 : Update BIOS to >= A12 (BIOS A12 here)
Dell Vostro 1400 : Update BIOS to >= A09 (BIOS A09 here)
Dell OptiPlex 330 : Update BIOS to >= A05 (BIOS A05 here)
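
For anyone who wants to check a machine against the list above, here is a minimal sketch. It assumes BIOS revisions always have the form "A" plus a number (as in the list), and the function names are made up for illustration. It does not check the VBIOS number, which the Inspiron 1525 entry asks you to verify separately.

```python
# Minimum BIOS revisions copied from the list above.
MIN_FIXED_BIOS = {
    "Inspiron 1420": "A09",
    "Inspiron 1520": "A09",
    "Inspiron 1525": "A09",  # additionally verify VBIOS >= 1566 by hand
    "Inspiron 1720": "A09",
    "Latitude D530": "A07",
    "Latitude D630": "A12",
    "Latitude D830": "A13",
    "XPS M1330": "A12",
    "Vostro 1400": "A09",
    "OptiPlex 330": "A05",
}


def revision_number(bios):
    """Turn a revision such as 'A09' into the integer 9 (assumed format)."""
    text = bios.strip().upper()
    if not text.startswith("A") or not text[1:].isdigit():
        raise ValueError("unexpected BIOS revision format: %r" % bios)
    return int(text[1:])


def has_fixed_bios(model, installed):
    """Return True/False for listed models, None if the model is not listed."""
    minimum = MIN_FIXED_BIOS.get(model)
    if minimum is None:
        return None
    return revision_number(installed) >= revision_number(minimum)
```

A result of None only means the model is missing from the list, not that it is unaffected.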

still waiting for fixes for at least these notebooks:

Dell Vostro 1500

I don't know whether these notebooks are affected as well, nor whether a BIOS update is being prepared:

Dell Vostro 1510, 1710, 1700
Dell Latitude ATG

List of notebooks NOT affected:

Dell Vostro 1310

Bottom Line:

I have sufficient reason to believe there is a memory corruption bug in the X3100 drivers v15.8 and v15.7.3. I do not think it affects your hardware, but I have the problem. There is also a chance that my GM965 chip is faulty. In addition, there is a chance that my bug is not related to the gfx drivers at all, but merely surfaces with those two particular gfx drivers and is caused by some other faulty driver or device.

I filed a bug report with Intel, with the outcome that they have not been able to reproduce anything like it. Since I found people with bugs that sound very similar (see below), I want to raise awareness in this forum and help other people find it via Google (if they encounter the same problem). I will update this post if I get more information or if, by some magic, it turns out not to be the driver's fault.

My workaround is to use the slightly outdated version 7.14.10.1253 of the GM965 gfx drivers. The bug does not show up in this version.

Selected Details:

I've got a fancy Dell M1330 notebook with Intel X3100 graphics integrated into the GM965 chipset. Dell ships this Core 2 Duo notebook with version 7.14.10.1253 of the graphics driver, running Vista Business 32-bit with 2 GB of memory. It runs perfectly, except that sometimes it doesn't recognize projectors and big monitors attached via HDMI.

Thus I tried:
7.14.10.1437 (v15.8)
7.14.10.1409 (v15.7.3)

Both fix the external HDMI attached device bugs.

However I encounter a severe bug: Memory corruption!
Unfortunately it occurs very infrequently, and my test setup for detecting memory corruption in this case requires reading big files from the hard drive in a loop and checking their md5sum. In general I have to read ~50 - 1000 GB of data until I encounter a single md5sum error.
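
The post does not include the test script itself, so the following is only a rough sketch of such a read-and-compare loop. The function names and the chunk size are assumptions. Note two limits: the baseline hash is trusted as correct, and files smaller than free RAM may be served from the OS file cache instead of being re-read from disk.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(paths, passes):
    """Re-read every file `passes` times and report hashes that differ.

    The first read of each file is taken as the baseline (an assumption:
    a corrupted first read would make later, correct reads look wrong).
    """
    baseline = {path: md5_of_file(path) for path in paths}
    mismatches = []
    for pass_no in range(1, passes + 1):
        for path in paths:
            current = md5_of_file(path)
            if current != baseline[path]:
                mismatches.append((pass_no, str(path), current))
    return mismatches
```

With an error rate of one bad hash per 50 - 1000 GB read, such a loop has to run for many passes over large files before an empty result means much.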

The whole course of isolating this bug and attributing it to the gfx system is not that interesting here. Let me just mention that I ran all the memory tests (incl. the brand new MemTest86+ V2.01, which supports the GM965), swapped memory modules, ran CPU stress tests at every single P-state, and checked the hard drive (incl. different drivers such as IASTOR.SYS 7.8.0.1012, 7.0.0.1020, MSAHCI.SYS 6.0.6000.20765). I do not encounter these bugs running Ubuntu 7.10 or Mac OS X 10.5.1 on the same hardware reading the same files. It took me every evening of the last week and some nights to do a thorough test. The bug does not seem to be file- or hard-drive-related; my test setup is just built this way because it gives me an easy way to validate some of the main memory content.

I am extremely careful when it comes to blaming hardware or software for a given bug, and I try to consider every possible explanation.

Of course, once I had isolated the bug, I did not change anything else while switching back and forth between the drivers above. At that point I tried to find evidence that I'm not the only person affected. And I found:

Posting #161 (X3100 on MacBook):
"Our assessment of the error is that a bug in the graphics driver code is causing memory corruption within the data used by WoW under some circumstances, leading to a crash."
http://forums.worldofwarcraft.com/thread.html?topicId=3686740801&sid=1&pageNo=9

This is written by a WoW/Blizzard employee. Well, but it's 3D and it's Mac OS X!

There is code sharing among different Intel drivers (Linux, Windows, Mac OS X). But the same crash also occurred using WINDOWS by means of Boot Camp on the same hardware! Posting #182:
"heres my windows crash report on the same macbook"
http://forums.worldofwarcraft.com/thread.html?topicId=3686740801&sid=1&pageNo=10

Regarding 3D: I'm using Aero, but the errors occur with Aero both enabled and disabled.
0 Kudos
70 Replies
levicki
Valued Contributor I
3,765 Views
taoshen1983:
Nobody should be running less than 4GB of ram on their new laptops and 64bit OS today (even for 32bit backward compatibility.)

taoshen1983, are those your words? I added emphasis to the part I am having an issue with — what you said basically means that when buying a new laptop people should shell out for 4GB of RAM whether they need it or not.

taoshen1983:
First of all, your point that 800 dollars is someone's 2 months worth of salary is irrelevant since we are not talking about the cost of the laptop, but the cost of adding 4GB of memory.

On the contrary, it is very relevant: $80 is 10% of a laptop's cost, as you say, but it is also 20% of someone's $400 monthly salary. Besides, do you really think that people would not buy a laptop with 4GB installed to begin with if they had enough money to spare? Please use your brain cells.

taoshen1983:
Once you hit the swap in Vmware or MySQL, you are screwed. The fact that you say i am self-centered for recommending a 4GB configuration is baseless.

Wrong again. 90% of people do not use MySQL and VMWare, so you are self-centered for taking those as an example of a typical laptop workload. Also, your tone ("Nobody should be running...") sounds demanding rather than recommending. Perhaps it is the language barrier?

taoshen1983:
Lastly, if I am too soon to jump to conclusions, you are an Intel employee located outside of United States.

You are jumping indeed... I am not an Intel employee, just a regular software engineer dealing with code optimization who by sheer misfortune lives in a poor country with a lot of poor people around — that allows me to also see things from a perspective different than your own.

taoshen1983:
And Bill Gates did say 640K is enough. You can google it. Igor

So can you, but seems you didn't — here, look under Misattributed.

Getting back to the issue at hand — if I were you, I would at least try the development branch of Compiz to see if it improves anything and make sure they are at least aware of the issue if it doesn't. As for the Dell drivers, nobody is forcing you to use those — go here, and get the latest Intel drivers for GMA965.

0 Kudos
jenzd
Beginner
3,765 Views
Yesterday, Dell released a new BIOS version (A11) for the XPS M1330 notebooks. Has anybody with such a machine already tested whether it fixes the problem there? Unfortunately, the changelog mentions only an "Added enhancement for thermal control", which does not sound very promising.
Concerning my own machine (Latitude D830), there have been no updates since 5/22/2008. By now, I am really getting impatient ...

Regards, jenz

0 Kudos
7oby
New Contributor II
3,765 Views
jenzd:
Yesterday, Dell released a new BIOS version (A11) for the XPS M1330 notebooks. Has anybody with such a machine already tested whether it fixes the problem there?

Unfortunately the A11 BIOS does NOT contain the fix. I tested it (and I knew by reading the changelog).

I have a small script that shows me the new BIOS versions every morning:

ftp -A -s:get_dir.txt ftp.dell.com >updates.txt
diff baseline.txt updates.txt

That way I'm watching D530/630/830, Vostro 1400 etc.

I was told the fix had entered the release pipeline, but as we can see, it either didn't or got delayed. It has been more than 4 weeks since a fix became known.
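
The comparison step of that script can be sketched in Python as well. This is not the original script: it only covers the "what is new compared to the baseline" part, works on listings that have already been downloaded, and the function name is made up. D630_A11.EXE in the usage example is a hypothetical file name used only for illustration.

```python
def new_entries(baseline_text, updates_text):
    """Return lines present in the new listing but not in the baseline.

    Order follows the new listing; blank lines and duplicates are skipped.
    """
    known = {line.strip() for line in baseline_text.splitlines() if line.strip()}
    fresh = []
    for line in updates_text.splitlines():
        entry = line.strip()
        if entry and entry not in known:
            fresh.append(entry)
            known.add(entry)  # avoid reporting the same entry twice
    return fresh


if __name__ == "__main__":
    # Hypothetical listings; real ones would come from baseline.txt and updates.txt.
    baseline = "D630_A11.EXE\n"
    updates = "D630_A11.EXE\nD630_A12.EXE\n"
    for entry in new_entries(baseline, updates):
        print("new:", entry)
```

Unlike diff, this ignores entries that disappeared from the listing, which is enough when the goal is only to spot newly published BIOS files.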

Thus I got back in touch with Dell and they will update me on the expected release date.
0 Kudos
7oby
New Contributor II
3,765 Views
BIOS update A12 for Dell Latitude D630 is supposed to fix memory corruption!

Please test and report! Report the Video BIOS version please (before and after). The Video BIOS is a four digit number displayed inside the BIOS Menu in one of the upper sections. You can also access the number through igfxcfg (System Control / Intel GMA Driver for Mobile) selecting "Information" or "i".

Fixes/Enhancements
------------------
1. Updated Intel Video BIOS.
2. Enhancement for thermal control.


Direct link:
http://ftp.us.dell.com/bios/D630_A12.EXE
ftp://ftp.dell.com/bios/D630_A12.EXE


0 Kudos
yuriylsh
Beginner
3,765 Views
Do you have any idea when this BIOS update will be available for XPS M1330?
0 Kudos
7oby
New Contributor II
3,765 Views
yuriylsh:
Do you have any idea when this BIOS update will be available for XPS M1330?


Please read this posting (especially the note 06/23/2008):
http://softwarecommunity.intel.com/isn/Community/en-US/forums/permalink/30257484/30255221/ShowThread.aspx#30255221

Dell is definitely working on it and I expect a release very soon. I received a test BIOS, which already has this bug fixed. I'm not allowed to share it, though. Instead I'm pushing Dell to release a public fix. It looks like they have received more customer complaints about this issue, which might speed up the process.
0 Kudos
yuriylsh
Beginner
3,765 Views

Great!

Thank you so much for your effort to speed up the process!

0 Kudos
thefunks67
Beginner
3,765 Views
I assume there is no Vista 64 bit version of these for a 1330:

7.14.10.1272 (v15.2.6)
7.14.10.1268 (v15.4.1)
7.14.10.1255 (v15.4)
7.14.10.1253 (v15.2.4)

-Funk

0 Kudos
7oby
New Contributor II
3,765 Views
thefunks67:
I assume there is no Vista 64 bit version of these for a 1330:
7.14.10.1272 (v15.2.6)
7.14.10.1268 (v15.4.1)
7.14.10.1255 (v15.4)
7.14.10.1253 (v15.2.4)


I found these:

7.14.10.1255 (v15.4) for Vista64
7.14.10.1253 (v15.2.4) for Vista64

Basically what you can do to get more updated drivers is to search the support homepages of HP, ThinkPad etc. for their professional products and use their drivers. Only the professional product line will feature Vista64 drivers.
0 Kudos
haidu
Beginner
3,765 Views
I think that there are some serious problems at the Dell support department. They posted a "new" GM965 driver on their site (7.14.10.1409), which of course has the bug. I wasted another 15 min testing the driver. It is unbelievable that they do nothing to fix the bug and instead aggravate it.
0 Kudos
7oby
New Contributor II
3,765 Views
haidu:
They posted a "new" GM965 driver on their site (7.14.10.1409), which of course has the bug. I wasted another 15 min testing the driver. It is unbelievable that they do nothing to fix the bug and instead aggravate it.


You're right: Dell's updated driver is just 7.14.10.1409 (v15.7.3) dated 01/11/2008, which has this bug.

It has now been ~7 weeks since I received a bugfix from Dell, and they still haven't released it to the public. Over the last weeks I asked for a status update several times:
http://softwarecommunity.intel.com/isn/Community/en-US/forums/permalink/30258498/30255221/ShowThread.aspx#30255221

For some reason the Dell processes didn't work the way they are supposed to in this case. Recognizing that, I just wrote an e-mail to Dell support asking for an escalation strategy on this issue. If I don't receive an adequate answer shortly, I'll use other channels.

reply from Dell received 08/07/2008:
Dell:

Mr. XXX is currently out of the office until July XXth.
I see the issue has been escalated to our product/engineering group, which is responsible for implementing updates. They already are working on a solution for M1330 and D830 too.

0 Kudos
7oby
New Contributor II
3,765 Views
Here you go! Install this BIOS and data corruption with recent Intel drivers is gone:

BIOS A12 for M1330:
http://ftp.us.dell.com/bios/M1330A12.EXE

This is the important fix from the release log:
"Update GM965 Graphics VBIOS from 1466 to 1588"
0 Kudos
7oby
New Contributor II
3,765 Views
Final fix for Vostro 1400 released:
http://ftp.us.dell.com/bios/1400_A09.EXE

Includes VBIOS 1588 as well. Now waiting for D830, ...
0 Kudos
phaedurs
Beginner
3,765 Views
And the Vostro 1500 :(
0 Kudos
7oby
New Contributor II
3,765 Views
phaedurs:
And the Vostro 1500 :(


I included a list of notebooks, which are still waiting for a BIOS update in the top posting. There are some notebooks, whose status I can't tell e.g. whether those are affected by this bug or not. If you have any information regarding those, please post.

If you want to speed up the process of making a bugfix BIOS available, please file a bug report with Dell technical service. They will most likely ask you to run Dell diagnostics and suggest all kinds of hardware exchanges (motherboard, memory, hard drive). Run Dell diagnostics, but don't agree to (pointless) hardware exchanges. Emphasize how critical this bug is (data loss, data corruption, ...) and that this BIOS fix resolved it on other Dell notebooks:

"Update GM965 Graphics VBIOS from 1466 to 1588"

Then a couple of things will happen:
. Dell will agree that this is a problem and that there is a solution. If they don't, ask them to reproduce the memtest scenario and see it fail on their machines.
. Dell will become aware of this bug and/or change the priority of this bugfix.
. If you're lucky, Dell will inform you about the ETA.

The priority Dell gives to this bugfix will most likely depend on the quality of your bug report and the total number of bug reports they receive regarding this bug. The more, the better, the faster.
0 Kudos
levicki
Valued Contributor I
3,765 Views

7oby,

I have just noticed that Dell Inspiron 1525 latest BIOS is A13. VBIOS however, is still 1566 just like in A09 and A11.

Do you have any information on whether 1566 is an OK VBIOS version, or should we also pursue 1588? In addition, does 1566+ work with any drivers, or should I look for a particular version?

0 Kudos
7oby
New Contributor II
3,765 Views
Inspiron 1525 works regarding this issue with BIOS >= A09. Yes, A09 - A13 do contain VBIOS 1566, but this is sufficiently recent.

Windows XP isn't affected anyway (even with older BIOS versions with older VBIOS). And regarding Vista, you're fine with any BIOS >= A09. I'm using the most recent Vista driver (v15.9) and this should work for the Inspiron 1525 as well. You just have to point manually to Graphicskit13330.inf in Device Manager; otherwise the generic Intel driver won't install on an OEM notebook.

Just to give complete information: If you have a notebook for which no BIOS fix is available and old drivers have issues on your notebook, you may also install Windows XP drivers into Vista. This is supposed to work as well.
0 Kudos
levicki
Valued Contributor I
3,765 Views

Thanks for the detailed answer 7oby. I already knew that Windows XP is fine with any VBIOS / driver combo since that is what I am currently running, I was just wondering what is safe for those who might want to install Vista on it.

I am pretty much annoyed that we the users have been turned into free hardware beta testers these days. I mean, beta testing software and even drivers is ok: you read the EULA, click on that "I agree" button and risk it if you want to.

What is not ok in my opinion is making hardware products which weren't sufficiently tested and which have buggy BIOS or firmware. You pay for the final product and it is not what you are getting. For beta software and drivers you expect failures, but for the hardware you dearly paid for such as laptop you don't, and since you are not prepared you always end up with some loss, be it data, money, time, nerves, etc.

In my opinion industry should be reminded that in the days of PC/XT BIOS chip was soldered to the mainboard and it wasn't programmable, yet those computers worked just fine.

0 Kudos
7oby
New Contributor II
3,748 Views
I know some people are subscribed to this thread and are waiting for this one. Here you go:

Latitude D830 BIOS fix:
http://ftp.us.dell.com/bios/D830_A13.EXE

1. Updated Intel Video BIOS.
2. Updated Nvidia Video BIOS.
3. Improved support for 4GB memory.
4. Improved PXE support.

0 Kudos
taoshen1983
Beginner
3,748 Views
IgorLevicki:

Thanks for the detailed answer 7oby. I already knew that Windows XP is fine with any VBIOS / driver combo since that is what I am currently running, I was just wondering what is safe for those who might want to install Vista on it.


I am pretty much annoyed that we the users have been turned into free hardware beta testers these days. I mean, beta testing software and even drivers is ok: you read the EULA, click on that "I agree" button and risk it if you want to.


What is not ok in my opinion is making hardware products which weren't sufficiently tested and which have buggy BIOS or firmware. You pay for the final product and it is not what you are getting. For beta software and drivers you expect failures, but for the hardware you dearly paid for such as laptop you don't, and since you are not prepared you always end up with some loss, be it data, money, time, nerves, etc.


In my opinion industry should be reminded that in the days of PC/XT BIOS chip was soldered to the mainboard and it wasn't programmable, yet those computers worked just fine.



Good Job, Dell/Intel Engineers. A09 bios is the fix. Thanks

As for you Igor:
It is obvious that the hardware is working, and you fail to see that even working hardware can have buggy firmware, which is software. Today's definition of hardware is becoming blurred against software, since all hardware must be written in an HDL (hardware description language) first. Before the ICs are fabricated, hardware IS software. So yes, we are all beta testers for all the hardware we buy now, just like software. I am surprised to see you, a software engineer, arguing that beta testing is not acceptable.

As for the PC/XT BIOS chips and non-programmable firmware, we have to remember that the complexity of hardware is growing at the rate of Moore's Law, meaning that the transistor count of today's hardware is more than 1000 times that of 90s hardware. However, engineers haven't gained 1000 times worth of brain matter. Instead of having a brain 1000 times as complex, we spend time on the internet arguing that 4GB of ram is too much for a laptop. So I am not sure the industry should be reminded of the old days. They are trying as hard as they can. This one is a matter of the size of the regression.

For this case, Dell finally pulled it off after like a year. So congrats, Dell, and Intel. Lesson learned: don't stop bugging the people who are responsible until they take notice.
0 Kudos
7oby
New Contributor II
3,748 Views
@taoshen1983

This thread is not meant to be cluttered by useless and false information.

I can assure you that we engineers today have tools which are more than 1000 times more productive than the tools available a couple of years ago. That applies to hardware and software and for each of them to both the research as well as the practical side.

On the hardware side, great advances have been made in model checking techniques. Theoretical achievements as well as the processor power available today allow us to validate designs that nobody would have thought possible before. For designs that are still too complex to model check, another Moore (namely J Strother Moore) used a theorem prover to verify the correctness of the AMD5k86 FPU (*).

Similar tools have been brought over to the software side, such as Microsoft Static Driver Verifier. But the main technique for coping with all this complexity has been to separate concerns and use abstractions. And to maintain abstractions, we have never had refactoring tools as powerful as today's.

Here we are talking about bug tracking and fixing. And again: we have never had tools as powerful as today's. To give you an example: today I used bisecting to find the change in the code that introduced a bug in the just released 2.4.0 xf86-video-intel Linux driver.
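
For readers who have not used bisecting: the idea is a binary search over an ordered history. The following is a minimal sketch of that idea only, not the tool or commands used for the driver mentioned above. It assumes the revisions are ordered oldest to newest, the first one is known good, the last one is known bad, and the bug stays present once introduced. The names are made up.

```python
def first_bad_revision(revisions, is_bad):
    """Find the first bad revision with a binary search.

    Assumes revisions[0] is good, revisions[-1] is bad, and every revision
    after the first bad one is also bad. `is_bad` is a caller-supplied test
    (for example: build that revision and run the failing scenario).
    """
    if len(revisions) < 2:
        raise ValueError("need at least one good and one bad revision")
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revisions[mid]):
            bad = mid   # the bug was introduced at mid or earlier
        else:
            good = mid  # the bug was introduced after mid
    return revisions[bad]
```

Each step halves the remaining range, so the number of builds to test grows only with the logarithm of the number of revisions.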

Everything related to the bugs described in this thread is a complete failure on the management side and NOT on the engineering side. A failure at one single stage (e.g. development or testing) would have been caught by the next stage (obviously at additional expense, e.g. customers filing bugs). But all those stages, on Intel's side as well as on Dell's side, are not carried out sufficiently well.

Intel Software Network is one thing that improves matters, but unfortunately only a single piece in the puzzle and I haven't seen the right people observing the issues here.

(*) I don't know whether you see the impact here. That's actually proving the absence of the Intel Pentium FDIV bug in AMD chips.
0 Kudos