Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Volatile: Worthless or Not?

chris
Beginner
868 Views
Over at the blog linked below, Arch Robison claims that volatile is almost useless for multithreaded programming. I argue that volatile is necessary for memory fences to work in lock-free programming. Can anyone clarify this?

http://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/
9 Replies
Dmitry_Vyukov
Valued Contributor I
Quoting - chris
Over at the blog linked below, Arch Robison claims that volatile is almost useless for multithreaded programming. I argue that volatile is necessary for memory fences to work in lock-free programming. Can anyone clarify this?

http://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/

Arch is referring to ISO C++'s volatile, which has nothing to do with multithreading, memory fences, or lock-free programming.
You are probably referring to Microsoft Visual C++'s volatile, which that compiler effectively promotes to the rank of a multithreading synchronization primitive.

RafSchietekat
Valued Contributor III

I think that this is about flushing out the writes somehow, so that the other thread gets to see the new Ready value, which makes at least some sense to me. But I'm also in favour of using real atomics, and you can write those without a single "volatile" (I did, anyway), because it would be redundant with the inline-assembler store instructions. Still, what prevents the optimiser from reordering even those with later code, perhaps code that waits for a value that cannot appear before the store is seen by another thread, leading to deadlock? I checked my code again: I thought I had at least a compiler fence at both ends to prevent just that, but apparently not, and I now consider that an oversight. I'm aware that language-level atomics can do other kinds of things, if only the specification were readable...

(Added) "I did, anyway": or not yet... they're still there in tbb_machine.h.

jimdempseyatthecove
Honored Contributor III

atomic (from my understanding) has operators operating on (naturally aligned) volatile-qualified variables (look at the primitives in TBB's atomic.h).

volatile, though, does not assure that the variable is aligned on a natural boundary
(naturally aligned variables can be atomically read, written, or LOCK RMW'd).

volatile does assure that the compiler will always generate code to reference memory.

atomic-typed variables are aligned on a natural boundary.

Therefore atomic-typed variables, when used internally with volatile, are assured to atomically read, write, or LOCK RMW.

volatile-only typed variables are not assured to be aligned on a natural boundary, and therefore are NOT assured to atomically read, write, or LOCK RMW...

...UNLESS the programmer takes care to enforce alignment rules on such declared variables.

atomic-typed variables are permitted to have some optimizations performed on them, and some of these optimizations may interfere with multi-threaded programming.

atomic<int> line = 0;
...
line = 1;
line = 2;

The compiler is permitted to remove the first store; therefore, if another thread is monitoring progress, it may never see line == 1.

Be cautious in assuming atomic variables will perform as you intend. If you require some probability of observing every (programmed) write, then do not use atomic; use volatile.

volatile int line = 0;
...
line = 1;
line = 2;

The compiler is NOT permitted to remove the first store;
however, it is not required to align line on a natural boundary.

A stack-local int variable will tend to be naturally aligned.
A structure member variable is not assured to be naturally aligned.
As a programmer, you are required to assure alignment (when atomicity is required).

Jim Dempsey



RafSchietekat
Valued Contributor III
"Therefore atomic-typed variables, when used internally with volatile, are assured to atomically read, write, or LOCK RMW."
The RMW operations require specific assembler instructions, but the existing TBB implementation does not bother with those for ordinary loads and stores on x86 (except for 8-byte data) or x64; that's true. I don't agree with that, though, even if it happens to work. I also think that some existing code would break if volatile were taken out, so that decision is not so easy to make.
jimdempseyatthecove
Honored Contributor III

>>The RMW operations require specific assembler instructions

Correct - these (RMW) are provided by compiler intrinsics (or inline assembler).

The atomic class uses these compiler intrinsics (or inline assembler),
so atomic will "hide" the nasties.

However, use of atomic will at times require

if (var.read_the_variable_right_now_no_matter_what() == whatnot) ...

or whatever the member function ends up being called
(or, in lieu of a member function, some new keyword/directive as in C++0x),

where volatile can use

if (var == whatnot) ...

provided you are also careful to correctly align the volatile variable.

atomic variables are good and thread-safe,

BUT the behavior may not necessarily produce the desired result.

The usage cautions are not clearly documented, at least not to the point of presenting a clear picture up front.


The same issue arises with volatile with respect to alignment/atomicity.

In both cases the usage examples should include a "***WARNING***": when programming xxx, use yyy technique.


Jim Dempsey

RafSchietekat
Valued Contributor III
"or in lieu of member function some new keyword/directive as in C++0x"
I didn't spot that yet?

"BUT the behavior may not necessarily produce the desired result."
Specifically? The compiler's coalescing optimisation you mentioned (let's discount statistics), or something else (too)?

"The same issue is involved with volatile with respect to alignment/atomnicity"
Such use seems dubious (not portable), even ignoring memory semantics.
chris
Beginner
Quoting - Raf Schietekat

Still, what prevents the optimiser from reordering even those with later code, perhaps code that waits for a value that cannot appear before the store is seen by another thread, leading to deadlock?

For volatile to work the way I intended, there has to be some way to insert a barrier that at least prevents the compiler from moving certain types of instructions across it. Of the three compilers (Intel, Visual Studio, and gcc), do any of them have such a feature?
jimdempseyatthecove
Honored Contributor III
Quoting - Raf Schietekat
"or in lieu of member function some new keyword/directive as in C++0x"
I didn't spot that yet?

"BUT the behavior may not necessarily produce the desired result."
Specifically? The compiler's coalescing optimisation you mentioned (let's discount statistics), or something else (too)?

"The same issue is involved with volatile with respect to alignment/atomnicity"
Such use seems dubious (not portable), even ignoring memory semantics.

Specifically the coalescing optimization, which may result in coalescing across a lengthy loop, and in that case can introduce unintended interlocks (assuming you forgot to use the designated member function for atomic, including the correct std::memory_order_...).

atomic appears to be written from the perspective of, and with a bias towards, the thread issuing the statements, as opposed to the threads observing the results; whereas volatile appears impartial.

Coalescing optimizations are not always good

Assume you are on a processor with HT capability, and assume the hardware PREFETCHn instruction is either not implemented or not doing what you want it to do.
You can split the thread processing in two: one thread does the work, and a second monitors the progress of a volatile (non-coalescing) variable or variables. This second thread can be "brainless", so to speak, and simply performs memory moves to a register from the addresses pointed to by each of the volatile variables should they change (and includes _mm_pause).

Essentially the second HT thread does no work except assuring that soon-to-be-accessed data is fresh and ready in the L1 cache. This will work across page boundaries as well as through page faults.

In the above scenario, coalescing optimizations will thwart the intentions of the programmer.

The above is also a simple example of synchronization through a memory mailbox; coalescing optimizations interfere with such synchronization. The two (or more) threads cannot keep in lock-step if some of the steps cannot be observed.

Jim Dempsey
RafSchietekat
Valued Contributor III
"For volatile to work the way I intended, there has to be some way to insert a barrier that at least limits the compiler from moving some types of instructions across it. Out of the three compilers Intel, Visual Studio, and gcc, do any of them have such a feature?"
Unless you come up against a fiercely optimising compiler that keeps optimising even after linking, a portable compiler fence requires nothing more than a call to an external empty function. In g++, you can avoid the hassle of another source file, however trivial, and the overhead of making that call, however small, by using an empty inline-assembler statement that "clobbers" all memory. At least, these are commonly understood to provide the functionality you seek, between operations on data that is not just locally visible, unless the g++ option even frees you from that restriction.