Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

atomic read and write with 2 CPUs (one reads, another writes the variable)

bitmaker
Hello everyone,

My question is rather simple, but it is one for which I have not been able to get a definitive AND authoritative answer. It concerns atomic reads and atomic writes of a variable that is accessed by two threads on different CPUs (one thread only writes the variable and the other thread only reads it), BUT the variable is multibyte (i.e. 4 bytes in size) AND is not aligned on a 4-byte memory boundary.

Consider the following C++ code:


Initialisation, shared code:

/*
for now we can presume "new" returns at least 4 byte aligned mem.
*/
char *c = new char[5];
/*
misaligned variable (its address is not aligned on 4 byte boundary)
*/
long *l = ( (long *) (c + 1));
/*
initialise prior to any threads being started
*/
*l = 0;

Then start the threads...

Thread A (on CPU 1) has something like:
*l = 102637;

Thread B (on CPU 2) has:
long x = *l;

The question is whether, in thread A, the code:
*l = 102637;
is an atomic write or not.

Also, in thread B, is the code:
long x = *l;
an atomic read or not?

In other words, will there be a possibility for thread B to read the "half-written" value (where some of the variable's bytes are old and some are new from the modification performed by thread A)...

My understanding is that the above code will not have atomic reads and writes, because the variable in question is not aligned on a 4-byte boundary and the hardware will make no attempt to make reads or writes of non-aligned variables atomic.

I am basing my thoughts on the following excerpt from "CHAPTER 7 MULTIPLE-PROCESSOR MANAGEMENT" of "Intel Architecture Software Developer's Manual Volume 3: System Programming":

"The Intel386™, Intel486™, Pentium®, and P6 family processors guarantee
that the following basic memory operations will always be carried out
atomically:

- Reading or writing a byte.
- Reading or writing a word aligned on a 16-bit boundary.
- Reading or writing a doubleword aligned on a 32-bit boundary."

In other words, if a doubleword is NOT aligned on a 4-byte boundary (as in the previous code example), then the access will not be atomic, and the underlying hardware will not try to ensure that any reads or writes of such a doubleword are atomic...
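
To make the alignment condition concrete, here is a minimal sketch of how the address of such a variable could be checked. The helper names `is_aligned` and `crosses_boundary` are hypothetical (they do not come from the manual), and the boundary size is a parameter because the granularity that matters depends on the processor. It only classifies an address; it does not by itself make any access atomic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if `addr` is a multiple of `alignment`.
// `alignment` is assumed to be a power of two.
inline bool is_aligned(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

// Hypothetical helper: true if the byte range [addr, addr + size) spans
// a multiple of `boundary`, i.e. its first and last byte fall into
// different boundary-sized blocks. `boundary` is assumed to be a power of two.
inline bool crosses_boundary(std::uintptr_t addr, std::size_t size,
                             std::size_t boundary) {
    assert(size > 0);
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (addr / boundary) != ((addr + size - 1) / boundary);
}
```

For the pointer in the example above, something like `is_aligned(reinterpret_cast<std::uintptr_t>(l), 4)` would report false whenever `c` itself is 4-byte aligned, so the access falls outside the three guaranteed cases quoted from the manual.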

Am I correct in this line of thinking?

Kind regards,
Leon
jim_dempsey

Most (if not all) multiprocessor-capable IA-32 processors use a 128-bit data path to memory. If only one processor is writing the data and your 32-bit dword does not span a 128-bit boundary, then the write will be atomic. If you cannot align the data, then you will have to use a flagging mechanism to ensure consistent reads.

For example, suppose you are passing a volatile pointer. Typically a pointer might always have the sign bit 0 and point to data that is aligned on an even byte boundary. When modifying the pointer you can write twice: once with the sign and odd bits set, then immediately afterwards with the sign and odd bits cleared. Then, on the processor that reads the data, AND the value obtained with the sign and odd bits; if the result is zero, the value obtained is valid, otherwise re-read and test again. Make sure the read reads from memory and not cache.
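
A rough sketch of the flag-bit check described above, modeled on plain 32-bit values instead of real shared memory. All names here (`kFlagMask`, `mark`, `looks_complete`, `torn`) are hypothetical, and `torn` only simulates a read that is split at one byte position while a single store is in flight. It is an illustration of the idea under those assumptions, not a proof that the scheme is safe: for instance, it does not model a reader whose two halves straddle both writes, and real code would still need volatile or atomic accesses so the compiler does not merge or reorder the two stores.

```cpp
#include <cassert>
#include <cstdint>

// Assumed flag bits: the sign bit (in the highest byte) and the odd bit
// (in the lowest byte). Valid payloads must have both bits clear.
constexpr std::uint32_t kFlagMask = 0x80000001u;

// Value the writer stores first: the payload with both flag bits set.
// The writer then stores the bare payload as the second write.
inline std::uint32_t mark(std::uint32_t payload) {
    assert((payload & kFlagMask) == 0);
    return payload | kFlagMask;
}

// Reader-side test: accept the value only if both flag bits are clear,
// otherwise the reader is expected to re-read and test again.
inline bool looks_complete(std::uint32_t observed) {
    return (observed & kFlagMask) == 0;
}

// Simulated torn read: the lowest `low_bytes_from_after` bytes come from
// `after`, the remaining high bytes come from `before`.
inline std::uint32_t torn(std::uint32_t before, std::uint32_t after,
                          int low_bytes_from_after) {
    assert(low_bytes_from_after >= 0 && low_bytes_from_after <= 4);
    if (low_bytes_from_after == 0) return before;
    if (low_bytes_from_after == 4) return after;
    const std::uint32_t low_mask = (1u << (8 * low_bytes_from_after)) - 1u;
    return (after & low_mask) | (before & ~low_mask);
}
```

Because one flag sits in the lowest byte and the other in the highest, any single-split mix of an old value and the marked new value carries at least one set flag, which is why the reader rejects it in this model.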

You could use spin-locks and such but from your description a spinlock would be an overkill.

Jim Dempsey

daradder


jim_dempsey@ameritech.net wrote:

...
For example, suppose you are passing a volatile pointer. Typically a pointer might always have the sign bit 0 and point to data that is aligned on an even byte boundary. When modifying the pointer you can write twice: once with the sign and odd bits set, then immediately afterwards with the sign and odd bits cleared. Then, on the processor that reads the data, AND the value obtained with the sign and odd bits; if the result is zero, the value obtained is valid, otherwise re-read and test again. Make sure the read reads from memory and not cache.

...


Ingenious. But back to the OP's question. I have been investigating the same topic, only with respect to 64-bit accesses. I read the following in the same document that the OP quoted:

"Accesses to cacheable memory that are split
across bus widths, cache lines, and page boundaries
are not guaranteed to be atomic by the Pentium 4,
Intel Xeon, P6 family, Pentium and Intel486 processors.
The Pentium 4, Intel Xeon, and P6 family processors
provide bus control signals that permit external
memory subsystems to make split accesses atomic;
however nonaligned data accesses will seriously
impact the performance of the processor and should
be avoided."

I took that to mean that, with one of the modern processors and a reputable chipset, unaligned accesses would be atomic. Apparently I am wrong, because I have tested for this on a multiprocessor machine and the threads occasionally conflict with their reads and writes.

So what are these "bus control signals"; at which processor pins do they appear?

Regards, DarAdder
