Why would they not have similar costs? Both the xchg instruction, which is always locked, and cmpxchg which operates within a read-modify-write cycle in order to achieve atomicity when locked would be dominated by that read-modify-write cycle, The xchg always substitutes its argument with the destination while the cmpxchg sneaks a compare into the memory cycle. As the Software DevelopersManual says about the latter instruction,
This [cmpxchg]instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processors bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)
The memory cycle dominates the time required to use these as lock instructions, so at least I would expect them to take about the same amount of time.