I believe the sample code in the referenced MSDN material is poorly written, and the lack of comments does it no good either.
Multiple processors can concurrently enter the ComputeValue() section of CacheComputedValue. This may be by design but it seems odd to me to have several processors concurrently executing ComputeValue().
I think in FetchComputedValue the
is not there to protect iValue but to protect the writing of piResult. A comment in the sample code might have cleared things up.
Using a mutex, spinlock, or other event-like object is (or may be) recommended when an operation cannot be performed with a single Interlocked... operation. In reality the mutex, spinlock, ... now internally use Interlocked... operations in addition to the O/S calls that may occur if the operation fails. Step into a SpinLock or Mutex and watch the code in the Disassembly window.
You can implement your own mutex operations using InterlockedExchange or other Interlocked... operations. One of the major reasons for using the O/S supplied routines is to reduce unnecessary processing overhead when a resource is in use and the thread owning the resource is suspended.
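As a rough illustration of building your own lock from an exchange operation, here is a minimal sketch. It assumes std::atomic<long>::exchange as a portable stand-in for InterlockedExchange, and the class name SimpleSpinLock is made up for this example. Note that it only yields while waiting; it does not suspend the waiter the way the O/S supplied routines can.

```cpp
#include <atomic>
#include <thread>

// Hypothetical minimal spin lock. std::atomic<long>::exchange stands in for
// InterlockedExchange (an assumption made so the sketch is portable).
class SimpleSpinLock {
public:
    // Swap in 1. If the previous value was 0 the lock was free and is now ours.
    bool try_lock() {
        return flag_.exchange(1, std::memory_order_acquire) == 0;
    }

    // Keep trying until the exchange reports the lock was free.
    void lock() {
        while (!try_lock()) {
            // Give up the time slice instead of burning the CPU. An O/S mutex
            // would suspend the waiting thread here instead.
            std::this_thread::yield();
        }
    }

    // Release the lock so that the next exchange sees 0.
    void unlock() {
        flag_.store(0, std::memory_order_release);
    }

private:
    std::atomic<long> flag_{0};  // 0 = free, 1 = held
};
```

A caller would bracket the guarded code with lock() and unlock(). If the owning thread is suspended while holding the lock, every waiter keeps spinning, which is exactly the overhead the O/S routines are meant to avoid.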
It used to be that to increment a shared variable you had to acquire a SpinLock guarding the variable, increment the variable, (flush), then release the SpinLock. Interlocked instructions can now be used to accomplish the same without the use of a SpinLock (or fast mutex).
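The lock-free increment can be sketched as follows. This assumes std::atomic<long>::fetch_add as a portable stand-in for InterlockedIncrement, and the helper name count_lock_free is invented for the example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: starts `threads` workers that each increment a shared
// counter `iterations` times, with no SpinLock or mutex around the counter.
long count_lock_free(int threads, int iterations) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                // One atomic read-modify-write; plays the role of InterlockedIncrement.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.load();
}
```

Because each increment is a single atomic operation, no update is lost regardless of how the threads interleave, so the result is threads * iterations.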
In the case of the MSDN article they had a flag and a variable. Depending on the range of values the variable can take, a bit in the variable can be used as the flag. Then with the use of InterlockedCompareExchange you can conditionally update the variable and flag together.
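One way to fold the flag into the variable might look like this. It is only a sketch under assumptions: the value is assumed to fit in 31 bits, the top bit is used as the "has been computed" flag, std::atomic::compare_exchange_strong stands in for InterlockedCompareExchange, and all names here are hypothetical rather than taken from the MSDN sample.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical layout: top bit = "value has been computed", low 31 bits = value.
constexpr std::uint32_t kComputedFlag = 0x80000000u;
constexpr std::uint32_t kValueMask    = 0x7FFFFFFFu;

std::atomic<std::uint32_t> g_cache{0};  // 0 means "nothing published yet"

// Publishes value (assumed to fit in 31 bits) only if nothing was published
// before. Returns true if this call installed the value.
bool cache_computed_value(std::uint32_t value) {
    std::uint32_t expected = 0;
    const std::uint32_t desired = kComputedFlag | (value & kValueMask);
    // Flag and value change in one atomic step, so a reader can never see
    // the flag set without the matching value.
    return g_cache.compare_exchange_strong(expected, desired,
                                           std::memory_order_release,
                                           std::memory_order_relaxed);
}

// Copies the cached value to *result and returns true if one was published.
bool fetch_computed_value(std::uint32_t* result) {
    const std::uint32_t snapshot = g_cache.load(std::memory_order_acquire);
    if ((snapshot & kComputedFlag) == 0) {
        return false;
    }
    *result = snapshot & kValueMask;
    return true;
}
```

With this layout several threads may still compute the value concurrently, but only the first compare-exchange succeeds and later callers simply see the published result.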
Do a web search on Lock Free Programming and Lock-Free/Wait-Free Programming; there are some good reference articles on this subject. MSDN does have some good reference material related to this, but unfortunately it has some bad material too.
The Interlocked instructions are processor dependent and have data size and alignment requirements. They also tend to be restricted to integers (32-bit or 64-bit); 128-bit is not commonly available. There are some interesting ways to perform a lock-free add of floating point numbers by using a local union of float and integer whose integer part is conditionally swapped with InterlockedCompareExchange.
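The floating point add can be sketched roughly as below. Two substitutions are assumed here: std::memcpy is used in place of the union, since reading the inactive member of a union is not defined behavior in C++, and std::atomic::compare_exchange_weak stands in for InterlockedCompareExchange. The function names are invented for the example, and a 32-bit float is assumed.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Reinterpret a float as its raw 32-bit pattern and back (memcpy instead of a union).
std::uint32_t float_to_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Adds delta to the float whose bit pattern is stored in target.
// Returns the sum that this call installed.
float atomic_add_float(std::atomic<std::uint32_t>& target, float delta) {
    std::uint32_t old_bits = target.load(std::memory_order_relaxed);
    for (;;) {
        const float new_value = bits_to_float(old_bits) + delta;
        // Install the new bits only if target still holds old_bits. On failure
        // old_bits is refreshed with the current contents and the add is retried.
        if (target.compare_exchange_weak(old_bits, float_to_bits(new_value),
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed)) {
            return new_value;
        }
    }
}
```

The shared location is declared as an integer and only ever interpreted as a float locally, which is the same idea as the union: the compare-and-swap works on the integer bit pattern, while the arithmetic is done on the float.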
To answer your question, if you code your Interlocked... calls correctly then you do not need the mutexes, SpinLock, ...