Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Differences between xchg and cmpxchg for implementing a lock

I've read the article "Implementing Scalable Atomic Locks for Multi-Core Intel EM64T and IA32 Architectures".
It says that cmpxchg is just as expensive as the xchg instruction. Some test data were shown, but no reason was given.

Can anybody explain why those two instructions have the same cost when used to implement a lock? Does this cost equivalence hold only in certain scenarios?

Thanks a lot!
1 Reply

Why would they not have similar costs? xchg with a memory operand is always locked, and cmpxchg, when locked, achieves atomicity by operating within a read-modify-write cycle, so both are dominated by that read-modify-write cycle. xchg always swaps its argument with the destination, while cmpxchg sneaks a compare into the memory cycle. As the Software Developer's Manual says about the latter instruction:

This [cmpxchg] instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor's bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)

The memory cycle dominates the time required to use these as lock instructions, so at least I would expect them to take about the same amount of time.
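
To make the comparison concrete, here is a minimal sketch of the two ways to try to acquire a simple lock, written with C11 atomics. The names demo_lock_t, try_lock_xchg, try_lock_cmpxchg and demo_unlock are made up for illustration and do not come from the article. The assumption is that an x86 compiler typically turns the exchange into xchg and the compare-exchange into lock cmpxchg; check the generated assembly for your own compiler. Either way, each attempt performs one locked read-modify-write on the lock word, whether or not the lock is acquired.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct { atomic_int state; } demo_lock_t;

/* Unconditional swap: always writes 1 and returns the old value.
   Assumed to compile to XCHG on x86, which is implicitly locked. */
static bool try_lock_xchg(demo_lock_t *l) {
    return atomic_exchange_explicit(&l->state, 1,
                                    memory_order_acquire) == 0;
}

/* Compare-and-swap: stores 1 only if the current value is 0.
   Assumed to compile to LOCK CMPXCHG on x86. Per the manual text
   quoted above, the destination still receives a write cycle even
   when the comparison fails. */
static bool try_lock_cmpxchg(demo_lock_t *l) {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->state, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Release the lock with a plain release store. */
static void demo_unlock(demo_lock_t *l) {
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

A spin loop would call either try-lock function repeatedly until it returns true. The only functional difference is that the xchg version rewrites the lock word with 1 even when the lock is already held, while the cmpxchg version rewrites the old value in that case.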

