Bug in wbinvd instruction for P4 based processors

Deleted_U_Intel
Has anybody besides myself seen memory corruption resulting from use of the wbinvd instruction in Pentium 4s and/or Xeons (based on P4)? Intel claims to have never seen this problem, and it has taken them well over a year to ask for a test program to demonstrate this issue.

Here is the scenario: I have a DOS-based application that can run multiple threads (one per "cpu"; HyperThreading is enabled) and tests memory as it comes off the production line. At least one of the tests requires immediate reads after writes, so I use wbinvd after the writes to ensure the data is flushed to physical memory and that I am not just testing the processor caches. Within moments of starting my tests with multiple threads running, I begin to see memory compare errors, even with ECC registered modules. You might ask, "Why isn't the chipset reporting ECC errors instead?" The answer is that the memory I read back was previously written by the processor when it executed the wbinvd instruction, so the bad data reached memory as an ordinary write. Examining all processor registers with my Arium ECM-50, I found that the DI part of EDI in one of the other "cpus" is in the same area as the location being reported as failing data compares. Specifically, where I found a run of corrupt bytes, the DI portion of EDI in another processor holds the same value as the next memory location beyond the corrupt data.
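For readers who want to picture the test, here is a minimal C sketch of the write, flush, read-back sequence described above. It is not the original DOS code: the function names, the data pattern, and the `flush_fn` hook are all made up for illustration. On real hardware the hook would execute WBINVD, which is privileged (ring 0 or real mode), so the sketch passes it in as a parameter and supplies a no-op that lets the logic run anywhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook. On the real tester this would execute WBINVD, e.g.
 * with GCC-style inline asm: __asm__ volatile("wbinvd" ::: "memory");
 * WBINVD is privileged, so it is kept out of the portable sketch. */
typedef void (*flush_fn)(void);

/* Stand-in flush that does nothing, so the sketch runs in user mode. */
static void flush_noop(void) {}

/* Write a pattern, flush the caches, then read back and compare.
 * Returns the number of mismatching words. If there is at least one
 * mismatch and first_bad is not NULL, *first_bad gets the index of
 * the first bad word. */
static size_t write_flush_compare(volatile uint32_t *buf, size_t words,
                                  uint32_t seed, flush_fn flush,
                                  size_t *first_bad)
{
    size_t i, errors = 0;

    for (i = 0; i < words; i++)
        buf[i] = seed ^ (uint32_t)i;      /* address-dependent pattern */

    flush();                              /* WBINVD on the real system */

    for (i = 0; i < words; i++) {
        if (buf[i] != (seed ^ (uint32_t)i)) {
            if (errors == 0 && first_bad != NULL)
                *first_bad = i;
            errors++;
        }
    }
    return errors;
}
```

With `flush_noop` this only exercises the compare logic; the failures described in this post appear only when the real instruction runs with several threads active.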

In another scenario, the same wbinvd bug causes memory corruption that then leads to exception interrupt 13 (0Dh). This case occurs less frequently than the one above because the instruction is used far less often there. When it does affect me, it shows up when my code on an alternate processor is incorrectly executing 16-bit code in 32-bit mode and attempts to write to a memory location with a CS: override. I have not been able to pinpoint the exact memory corruption that led to executing code from the wrong area; I have only been able to trap the exception interrupt and examine each processor's state for the invalid instruction.

At the present time, I see this problem occurring with Dempsey and Nocona processors using Blackford family chipsets. I also see similar problems with the wbinvd instruction when running on an IBM server using dual Xeons with the Lindenhurst chipset. The IBM server is running a pair of processors with an ID value of F43.
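For anyone comparing their own parts against the ID values quoted in this thread, the sketch below decodes a processor signature into family, model, and stepping. It assumes the "ID value" is the signature returned in EAX by CPUID leaf 1, which the post does not state explicitly; the struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical container for a decoded CPUID leaf 1 EAX signature. */
struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode a signature using the standard bit layout:
 * stepping = bits 3:0, model = bits 7:4, family = bits 11:8,
 * extended model = bits 19:16, extended family = bits 27:20. */
static struct cpu_sig decode_sig(uint32_t eax)
{
    struct cpu_sig s;
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;

    s.stepping = eax & 0xF;
    s.family   = base_family;
    s.model    = base_model;

    /* Extended family is added only when the base family is 0xF. */
    if (base_family == 0xF)
        s.family += (eax >> 20) & 0xFF;

    /* Extended model applies to base families 6 and 0xF. */
    if (base_family == 0x6 || base_family == 0xF)
        s.model += ((eax >> 16) & 0xF) << 4;

    return s;
}
```

Under that assumption, `decode_sig(0xF43)` gives family 15, model 4, stepping 3, and an ID of F41 differs only in the stepping.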

The wbinvd problem was first reported to Intel back in December 2004 when I found memory intended for the 4 GB address was actually being written to address zero, overwriting the interrupt vector table. This was with the Lakeport CRB and an early BIOS that supported more than 4 GB. The processor at the time had an ID value of F41.
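As a purely arithmetic illustration of that symptom, and not a diagnosis of what the hardware actually did, the sketch below shows that a physical address of exactly 4 GB becomes zero if only its low 32 bits survive. The function name is made up for illustration.

```c
#include <stdint.h>

/* Keep only the low 32 bits of a physical address, as would happen
 * if the upper address bits were dropped somewhere along the way.
 * This is an assumed failure model, used only to show the aliasing. */
static uint32_t truncate_to_32(uint64_t phys)
{
    return (uint32_t)phys;
}
```

Here `truncate_to_32(0x100000000ULL)` is 0, which in real mode is where the interrupt vector table lives.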

I have available some F3x and F2x processors, but not the inclination to identify at what stage Intel introduced a bug in the Pentium 4 series.

Message Edited by CHyde on 05-23-2006 04:34 PM

Intel_Software_Netw1

Hello,

Thank you for posting your question on the Intel Software Network forum. Our engineering staff is taking a look at your question more closely and I will post their input as soon as I receive it.

Best regards,
Jim A
Intel Software Network Support
http://www.intel.com/software