Differences between xchg and cmpxchg for implementing locks

judeyang · 05-21-2007

I've read the article "Implementing Scalable Atomic Locks for Multi-Core Intel EM64T and IA32 Architectures" at http://www3.intel.com/cd/ids/developer/asmo-na/eng/dc/threading/333935.htm. It says that the cmpxchg instruction is just as expensive as the xchg instruction. Some test data were shown, but no reason was given.

Can anybody explain why these two instructions cost the same when used to implement a lock? Does this equivalence hold only in certain scenarios?

Thanks a lot!

robert-reed · 06-06-2007

Why would they not have similar costs? The xchg instruction is always locked when it has a memory operand, and cmpxchg, when locked, achieves atomicity by operating within a read-modify-write cycle. Both are dominated by that read-modify-write cycle. xchg always replaces the destination with its argument, while cmpxchg sneaks a compare into the memory cycle. As the Software Developer's Manual says about the latter instruction:

This [cmpxchg] instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor's bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)
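To make the quoted behavior concrete, here is a minimal sketch in Rust that models the two instructions as plain functions on ordinary variables. The names `cmpxchg_model` and `xchg_model` are hypothetical, and the code only illustrates the read and write pattern described in the manual text above. It is not how the hardware is implemented and it is not atomic.

```rust
/// Hypothetical model of `cmpxchg`: compare the accumulator with the
/// destination and write the destination either way.
/// Returns true when the comparison succeeded (ZF set on real hardware).
pub fn cmpxchg_model(dest: &mut u32, accumulator: &mut u32, src: u32) -> bool {
    let old = *dest; // the read half of the read-modify-write cycle
    if old == *accumulator {
        *dest = src; // comparison succeeded: the source is written
        true
    } else {
        *accumulator = old; // comparison failed: accumulator receives the old value
        *dest = old; // the destination is still written, with its own value
        false
    }
}

/// Hypothetical model of `xchg`: unconditionally swap in the new value
/// and return the previous contents of the destination.
pub fn xchg_model(dest: &mut u32, src: u32) -> u32 {
    std::mem::replace(dest, src)
}
```

In both models every call performs one read and one write of the destination, which is the point of the manual's remark that a locked read is never produced without a locked write.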

The memory cycle dominates the time required to use these as lock instructions, so at least I would expect them to take about the same amount of time.
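For readers who want to compare the two approaches themselves, the following is a rough sketch of a spinlock built each way, using Rust's standard atomics. It is not the code from the Intel article. The type names `XchgLock` and `CasLock`, the trait `RawLock`, and the helpers `hammer` and `count_with_locks` are all made up for illustration. On x86 a compiler would typically lower `swap` to `xchg` and `compare_exchange` to `lock cmpxchg`, but that is an assumption about code generation, not a guarantee.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

/// Minimal lock interface used by both variants (hypothetical name).
pub trait RawLock: Sync {
    fn lock(&self);
    fn unlock(&self);
}

/// Spinlock acquired with an unconditional atomic swap.
pub struct XchgLock(AtomicBool);

impl XchgLock {
    pub const fn new() -> Self {
        XchgLock(AtomicBool::new(false))
    }
}

impl RawLock for XchgLock {
    fn lock(&self) {
        // swap always writes `true`; the lock is ours if the old value was `false`.
        while self.0.swap(true, Ordering::Acquire) {
            std::hint::spin_loop();
        }
    }
    fn unlock(&self) {
        self.0.store(false, Ordering::Release);
    }
}

/// Spinlock acquired with an atomic compare-and-exchange.
pub struct CasLock(AtomicBool);

impl CasLock {
    pub const fn new() -> Self {
        CasLock(AtomicBool::new(false))
    }
}

impl RawLock for CasLock {
    fn lock(&self) {
        // Succeeds only when the flag was `false`; otherwise keep spinning.
        while self
            .0
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }
    fn unlock(&self) {
        self.0.store(false, Ordering::Release);
    }
}

/// Runs `threads` threads that each increment a shared counter `iters` times
/// under `lock`, then returns the final count.
pub fn hammer<L: RawLock>(lock: &L, threads: usize, iters: u64) -> u64 {
    let counter = AtomicU64::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..iters {
                    lock.lock();
                    // Deliberately split load and store: the total is only
                    // correct if the lock provides mutual exclusion.
                    let v = counter.load(Ordering::Relaxed);
                    counter.store(v + 1, Ordering::Relaxed);
                    lock.unlock();
                }
            });
        }
    });
    counter.load(Ordering::Relaxed)
}

/// Exercises both lock variants and returns (xchg count, cmpxchg count).
pub fn count_with_locks(threads: usize, iters: u64) -> (u64, u64) {
    (
        hammer(&XchgLock::new(), threads, iters),
        hammer(&CasLock::new(), threads, iters),
    )
}
```

Calling `count_with_locks(4, 1000)` should return 4000 for both variants if the locks are correct. Wrapping each `hammer` call in `std::time::Instant` measurements would give a crude timing comparison, though the results will depend heavily on the machine and the level of contention.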

--Robert.