Embedded Server

Stale byte read back after 64-bit write on Xeon CPU

matt_schafer

I'm using a Xeon® Gold 6226R and occasionally see a 64-bit read that doesn't match the value most recently written to that address. The first byte is stale from the previous write, and the remaining 7 bytes hold the new, expected value.

The address for the write/read is always 8-byte aligned and in regular DDR memory. This seems to happen only at addresses that fall on the start of 4 KB increments. It also happens on multiple hardware units.
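
When logging these mismatches, it can help to record exactly which bytes differ and whether they match the previous value. The following is a minimal C sketch of such a check, not code from the affected system. The helper names (is_8_byte_aligned, is_page_start, mismatch_mask, mismatch_is_stale) and the 4 KiB constant are assumptions for illustration. On little-endian x86-64, byte 0 is the byte at the lowest address, which corresponds to the "first byte" described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed granularity, matching the "4 KB increments" observation. */
#define PAGE_SIZE_BYTES 4096u

/* True if addr is a multiple of 8. */
static bool is_8_byte_aligned(uintptr_t addr)
{
    return (addr & 7u) == 0;
}

/* True if addr is the first byte of a 4 KiB block. */
static bool is_page_start(uintptr_t addr)
{
    return (addr & (PAGE_SIZE_BYTES - 1u)) == 0;
}

/* Bit i of the result is set if byte i of observed differs from expected.
 * Byte 0 is the least significant byte (lowest address on x86-64). */
static unsigned mismatch_mask(uint64_t observed, uint64_t expected)
{
    uint64_t diff = observed ^ expected;
    unsigned mask = 0;
    for (unsigned i = 0; i < 8; i++) {
        if ((diff >> (8 * i)) & 0xFFu) {
            mask |= 1u << i;
        }
    }
    return mask;
}

/* True if at least one byte mismatches and every mismatching byte
 * equals the corresponding byte of the previously written value. */
static bool mismatch_is_stale(uint64_t observed, uint64_t expected,
                              uint64_t previous)
{
    unsigned mask = mismatch_mask(observed, expected);
    if (mask == 0) {
        return false;
    }
    for (unsigned i = 0; i < 8; i++) {
        uint8_t obs = (uint8_t)(observed >> (8 * i));
        uint8_t prev = (uint8_t)(previous >> (8 * i));
        if ((mask & (1u << i)) && obs != prev) {
            return false;
        }
    }
    return true;
}
```

For example, with a previous value of 0x1111111111111111, an expected value of 0x2222222222222222, and an observed value of 0x2222222222222211, mismatch_mask returns 0x01 and mismatch_is_stale returns true.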

The write is the mov in the object code below, immediately before a locked cmpxchg. If I fiddle with the C code or build without optimization, the problem seems to go away.

3337c: 48 89 02                      mov %rax,(%rdx)
3337f: f0 48 0f b1 91 18 0b 00 00    lock cmpxchg %rdx,0xb18(%rcx)
33388: 75 f2                         jne 3337c
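
For readers following the disassembly: the mov stores %rax through %rdx, the lock cmpxchg compares %rax with the 8 bytes at 0xb18(%rcx) and stores %rdx there on a match (otherwise it reloads %rax from memory), and the jne retries from the mov. That shape is consistent with a lock-free list push, although the actual C source is not shown in this post. A sketch under that assumption, in C11, with struct node and push as hypothetical names:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node layout: the link is the first 8 bytes of the block. */
struct node {
    struct node *next;
};

/* Push n onto the list whose head pointer is *head.
 * Assumed correspondence with the disassembly:
 *   n->next = old            ~ mov %rax,(%rdx)
 *   compare-exchange on head ~ lock cmpxchg %rdx,0xb18(%rcx)
 *   loop on failure          ~ jne 3337c */
static void push(_Atomic(struct node *) *head, struct node *n)
{
    struct node *old = atomic_load_explicit(head, memory_order_relaxed);
    do {
        /* Plain 64-bit store into the node being pushed. */
        n->next = old;
        /* On failure, old is refreshed with the current head and we retry. */
    } while (!atomic_compare_exchange_strong(head, &old, n));
}
```

If this reading is right, the plain store targets the first 8 bytes of the block being pushed, which would line up with the free path writing data into the freed memory.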

I haven't been able to reproduce this issue in a smaller test case yet. We're running on a 64-bit SMP RTOS, and all virtual memory is mapped directly to RAM. The scenario that seems to trigger this is one thread/core writing to memory from its heap and queuing it for a second thread/core, which processes and frees the pointer. The free writes some data to the memory, which is later read back by the original thread/core, and that read sees an invalid byte. There's a mutex with a locked instruction on the initial handoff, which should drain the first core's write buffer. The second core also runs a thread that polls a memory value when the previous thread is not running.

Also, if I add a test-pattern write during the free (on the second core), the test pattern becomes the stale byte value, even when read on the second core. The stale byte is still observed even with an mfence after the write.

I thought this might be related to processor erratum CLX32, "Processor May Behave Unpredictably on Complex Sequence of Conditions Which Involve Branches That Cross 64 Byte Boundaries", but I still see the issue when running with the latest OS-loaded microcode (5003707). Our BIOS ships with the original microcode (500002b), which doesn't have the workaround for this erratum. The microcode Git repo (releasenote.md) mentions: "An OEM may receive microcode update packages that are a superset of what is contained in this package for inclusion in a BIOS."

Q: I was wondering if there is a case where our board vendor may have access to a microcode superset for this CPU?

Q: Are there any performance counters I could watch for changes when this happens, to get a better clue about what's going on?

CarlosAM_INTEL
Moderator

Hello, @matt_schafer

Thank you for contacting Intel Embedded Support.

We have received your request, but we need answers to the following questions to understand it:

Could you please clarify if this request is related to a design developed by you or by a third-party company?

Could you please let us know the name of the manufacturer, the part number, and where we can find the information if this request is related to a third-party design?

If it is your design, please let us know the place of purchase of the cited Intel processor and its part number.

We are waiting for your reply.

Best regards,

@CarlosAM_INTEL.

matt_schafer

We're using an Advantech ASMB-815 server board.

matt_schafer

We're using commercial server hardware from Advantech (ASMB-815).
