Recently I ran into a bug that occurs only rarely.
**Environment:**
1. The address of a pointer variable (called pMemory) is misaligned.
2. Two threads access pMemory simultaneously.
3. Our program runs on a server with 8 CPUs.
4. The original value of pMemory is 0xFFFF FFFF.
**Operation Sequence:**
1. One thread reads the value of pMemory while the other thread modifies it.
Both the read and the write are plain MOV instructions.
2. The first thread reads the lower half of pMemory, which is 0xFFFF.
3. The second thread changes pMemory from 0xFFFF FFFF to 0x02de 2c68.
4. The first thread then reads the upper half of pMemory, which is now 0x02de.
As a result, the first thread sees pMemory as 0x02de ffff, which is an invalid pointer.
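The failure in step 4 is a torn read: the reader's load is split into two halves and the writer's store lands between them. A tiny model of that arithmetic (the helper name `torn_read` is hypothetical, and the 16-bit split is taken from the sequence above) shows how the old low half and the new high half combine into the invalid value:

```cpp
#include <cstdint>

// Models the interleaving described above: the reader's 32-bit load is
// performed as two 16-bit halves, and the writer's store lands in between.
//   low_source  = value of pMemory when the low half was read (old value)
//   high_source = value of pMemory when the high half was read (new value)
constexpr std::uint32_t torn_read(std::uint32_t low_source,
                                  std::uint32_t high_source) {
    return (high_source & 0xFFFF0000u) | (low_source & 0x0000FFFFu);
}

// Old low half 0xFFFF + new high half 0x02DE -> 0x02DEFFFF (invalid pointer).
static_assert(torn_read(0xFFFFFFFFu, 0x02DE2C68u) == 0x02DEFFFFu,
              "reproduces the value observed in the bug report");
```

If no store intervenes, both halves come from the same value and the result is simply that value; the bug needs the store to fall exactly between the two half-reads, which is why it shows up so rarely.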
We are currently discussing how to solve the problem.
Do you have any suggestions?
We are short on time, so I would appreciate a reply as soon as possible.
BTW, our program is a network program, so structures are packed on 1-byte boundaries using compiler options such as /Zp1.
Because of the risk and the amount of work involved, switching from /Zp1 to natural alignment is not an option for us.
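To illustrate how 1-byte packing produces a misaligned pointer, here is a sketch with a hypothetical header struct (`PacketHeader` and its `type` field are invented for this example, not taken from the poster's program). `#pragma pack(push, 1)` is used as a stand-in for /Zp1; it is accepted by MSVC, GCC and Clang and affects only the structs it encloses.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical wire-format header packed to 1 byte, similar in effect to /Zp1.
#pragma pack(push, 1)
struct PacketHeader {
    unsigned char type;  // 1 byte
    void* pMemory;       // starts at offset 1, so it is not naturally aligned
};
#pragma pack(pop)

static_assert(offsetof(PacketHeader, pMemory) == 1,
              "no padding between type and pMemory");
static_assert(sizeof(PacketHeader) == 1 + sizeof(void*),
              "no trailing padding either");

// True if an object of `size` bytes (a power of two) at address `addr`
// is naturally aligned.
inline bool is_naturally_aligned(std::uintptr_t addr, std::size_t size) {
    return addr % size == 0;
}
```

Because the packed struct itself only has 1-byte alignment, pMemory's real address depends on where each header lands in the buffer, so its alignment can differ from one message to the next.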
From the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1, section 8.1.1, "Guaranteed Atomic Operations":
"The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon,
and P6 family processors provide bus control signals that permit external memory
subsystems to make split accesses atomic;
however, nonaligned data accesses will seriously impact the performance of the
processor and should be avoided."
Could you please explain in more detail what this means:
"provide bus control signals that permit external memory subsystems to make split accesses atomic"?
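The same SDM section also lists, for P6 family and later processors, unaligned 16-, 32- and 64-bit accesses to cached memory that fit within a single cache line as guaranteed atomic. An access that is split across two cache lines is not covered by that guarantee; for such split accesses only LOCK-prefixed read-modify-write instructions (for example XCHG with a memory operand) are made atomic. A sketch of a check for whether a given field falls in the guaranteed case, assuming a 64-byte cache line (an assumption, not queried from the hardware; `straddles_cache_line` is a hypothetical helper):

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; 64 bytes on current Intel cores, but not queried here.
constexpr std::size_t kCacheLine = 64;

// True if the `size`-byte object at `addr` spans two cache lines, i.e. a plain
// MOV to or from it is a split access with no atomicity guarantee.
constexpr bool straddles_cache_line(std::uintptr_t addr, std::size_t size,
                                    std::size_t line = kCacheLine) {
    return addr / line != (addr + size - 1) / line;
}
```

For example, a 4-byte pointer at offset 60 of a line (bytes 60..63) stays inside one line, while one at offset 62 (bytes 62..65) crosses into the next line.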
Q6600, 32-bit pointers (multipliers are relative to Test1):
Test1 (simple store of pointer) total = 16400.1
Test2 (with LOCK) total = 374715, 22.8484x
Test3 (multi-write) total = 28346.4, 1.72844x
Q6600, 64-bit pointers:
Test1 (simple store of pointer) total = 16927.5
Test2 (with LOCK) total = 674978, 39.8746x
Test3 (multi-write) total = 40752.6, 2.40748x
On the 32-bit system, Test3 performs:
store of a short (low 2 bytes)
store of a char (3rd byte)
store of a char (4th byte), so the byte holding 0xFF is overwritten last
On the 64-bit system, Test3 performs:
store of a long (low 4 bytes)
store of a short (bytes 5 and 6)
store of a char (byte 7)
store of a char (8th byte), so the byte holding 0xFF is overwritten last
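The Test3 code itself was not posted. Below is a minimal sketch of the ordering idea behind it, under stated assumptions: the value is written exactly once, starting from the 0xFFFFFFFF sentinel; a valid value never has 0xFF in its top byte (assumed, not guaranteed for every address layout); and the reader is changed as well, so that it checks the last-written byte first. Unlike Test3, which used short and char stores on the packed field itself, this sketch keeps each byte in a separate `std::atomic<unsigned char>` so that it is well-defined standard C++. The class name `PublishedValue32` is hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical one-shot publication cell for a 32-bit value whose "empty"
// state is the sentinel 0xFFFFFFFF. Every byte is its own atomic object, so
// each individual store and load is atomic regardless of alignment.
class PublishedValue32 {
public:
    PublishedValue32() {
        for (auto& b : bytes_) b.store(0xFF, std::memory_order_relaxed);
    }

    // Writer: bytes 0..2 first, then byte 3 last with release ordering, so a
    // reader that observes the new byte 3 also observes bytes 0..2.
    // Assumes publish() is called once and (v >> 24) != 0xFF.
    void publish(std::uint32_t v) {
        for (int i = 0; i < 3; ++i)
            bytes_[i].store(static_cast<unsigned char>(v >> (8 * i)),
                            std::memory_order_relaxed);
        bytes_[3].store(static_cast<unsigned char>(v >> 24),
                        std::memory_order_release);
    }

    // Reader: load the last-written byte first; 0xFF means "not published yet".
    bool try_read(std::uint32_t& out) const {
        unsigned char hi = bytes_[3].load(std::memory_order_acquire);
        if (hi == 0xFF) return false;
        std::uint32_t v = static_cast<std::uint32_t>(hi) << 24;
        for (int i = 0; i < 3; ++i)
            v |= static_cast<std::uint32_t>(
                     bytes_[i].load(std::memory_order_relaxed)) << (8 * i);
        out = v;
        return true;
    }

private:
    std::atomic<unsigned char> bytes_[4];
};
```

On x86, a release store and an acquire load of a single byte normally compile to plain MOV instructions with no LOCK prefix, which is consistent with the gap between Test2 and Test3. The scheme does not handle a second publish: a reader could then combine the top byte of one value with lower bytes of the next.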
On 32-bit, the multi-store technique shows a 13.22x improvement over the LOCKed technique.
On 64-bit, the multi-store technique shows a 16.56x improvement over the LOCKed technique.
Note: the test code was all C++ (no assembly), so it was completely portable.
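Assuming the totals are elapsed times (lower is faster), the improvement factors are the ratio of the Test2 total to the Test3 total:

$$\frac{374715}{28346.4} \approx 13.22 \ \text{(32-bit)}, \qquad \frac{674978}{40752.6} \approx 16.56 \ \text{(64-bit)}$$

The same figures follow from the multipliers relative to Test1, e.g. 22.8484 / 1.72844 ≈ 13.22.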
Jim Dempsey