Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Memory Semantics and C++0x

RafSchietekat
Valued Contributor III
"Formal definitions of the memory model were rejected as unreadable by the vast majority of programmers." - Pthreads standard, as quoted in article mentioned here

I would like to review some issues related to the upcoming revision of the C++ standard, by looking at some of the relevant texts. A good first choice seems to be "Threads Cannot Be Implemented As a Library", a must-read seminal article (or in my case must-reread, because it is an important reference in another article that I'm reading now for the first time). For brevity, I will try to limit my comments to what immediately concerns me now: a potential middle ground between the naivety exposed in this article, and unreadably rigorous formality.

In "4.1 Concurrent modification", a speculative code transformation is outlined to show that the concept of "race" should be defined. However, when used with C++0x's relaxed atomic operations, formally there is no "race" (by its definition in C++0x), but nothing has really changed, and relaxed atomic operations are still considered problematic. In my proposed reimplementation of the TBB atomics library, I have used the qualifier "volatile" specifically to disallow speculative writes, relying on the current specification that describes the behaviour of a program partly in terms of how it treats "volatile" variables (details not repeated here). So, doesn't that solve the problem (and without any language change), or can it still reoccur on the hardware side? If so, what hardware would that be, and, e.g., would such hardware not already be problematic for "volatile" accesses that have nothing to do with multithreading?
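To make the transformation in 4.1 concrete, here is a minimal sketch, assuming the example is along the lines of `if (x == 1) ++y;`. The names `State`, `guarded_increment`, `speculative_increment` and `final_y` are hypothetical and chosen only for illustration. Both variants give the same final state in a single thread; the difference is the transient write to `y` that another thread could observe.

```cpp
// Hypothetical illustration; not taken verbatim from the article.
struct State { int x; int y; };

// As written: y is modified only when x == 1.
void guarded_increment(State& s) {
    if (s.x == 1) ++s.y;
}

// What a compiler reasoning about a single thread might emit:
// write y speculatively, then undo the write if the guess was wrong.
void speculative_increment(State& s) {
    ++s.y;                 // transient value, visible to a concurrent reader
    if (s.x != 1) --s.y;   // undo when the condition does not hold
}

// Helper: run one variant on a fresh state and return the final y.
int final_y(void (*variant)(State&), int x) {
    State s{x, 0};
    variant(s);
    return s.y;
}
```

Declaring `y` volatile, or atomic, is one way to tell the compiler that the extra write is itself observable behaviour.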

In "4.2 Rewriting of Adjacent Data", a clear argument is made to define "memory locations", an unavoidable change.
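As a rough illustration of what "memory locations" buy the programmer, the sketch below assumes the rule as it was eventually adopted in C++11: two distinct non-bit-field members are separate memory locations, so two threads may write them concurrently without a race, even if they share a machine word. `Pair` and `write_both` are hypothetical names.

```cpp
#include <thread>

struct Pair {
    char a;  // its own memory location
    char b;  // also its own memory location, though probably in the same word as a
};

// Two threads each write only "their" member; neither write may clobber the other.
Pair write_both() {
    Pair p{0, 0};
    std::thread t1([&p] { for (int i = 0; i < 100000; ++i) p.a = 'A'; });
    std::thread t2([&p] { for (int i = 0; i < 100000; ++i) p.b = 'B'; });
    t1.join();
    t2.join();
    return p;
}
```

Adjacent bit-fields are the exception: they share one memory location, so the same pattern applied to them would be a race.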

In "4.3 Register promotion", it is made clear that such promotion has to be restricted, but I see nothing that should worry a programmer: determining which promotions are still safe seems strictly a compiler writer's problem.
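For readers who have not seen the 4.3 example, this is a hand-written sketch of the kind of promotion at issue, assuming a loop that takes a lock only when a flag `mt` is set. `mtx`, `shared_x`, `locked_loop` and `promoted_loop` are hypothetical names. The two functions agree in a single thread, but the promoted version reads and writes `shared_x` outside the lock.

```cpp
#include <mutex>

std::mutex mtx;
int shared_x = 0;

// As written: shared_x is touched only while the lock is held (when mt is true).
void locked_loop(bool mt, int n) {
    for (int i = 0; i < n; ++i) {
        if (mt) mtx.lock();
        shared_x += i;
        if (mt) mtx.unlock();
    }
}

// After register promotion: lock()/unlock() are treated as opaque calls,
// so the register is flushed before and reloaded after each of them.
void promoted_loop(bool mt, int n) {
    int r = shared_x;                                   // read without the lock
    for (int i = 0; i < n; ++i) {
        if (mt) { shared_x = r; mtx.lock(); r = shared_x; }
        r += i;
        if (mt) { shared_x = r; mtx.unlock(); r = shared_x; }
    }
    shared_x = r;                                       // write without the lock
}
```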

In "5.1 Expensive Synchronization: An Example", a plea is made... to allow races!

So, enough to make you worry about using multithreading now, but only requiring simple changes, I would say, and not a hint yet of threatening that the behaviour of programs containing races is undefined (quite to the contrary).

Dmitry_Vyukov
Valued Contributor I
Will it really make significant sense if it will be 29.1 and not 1.10? Somewhere semantics of atomics and fences must be described, right? Since we are talking about industrial language backed by ISO. Or you want to leave semantics of atomics and fences formally undefined as it is now with TBB atomics? What is your goal?
I think you may ask your question over comp.programming.threads, there are some people directly related to C++0x memory model hanging around...

RafSchietekat
Valued Contributor III
"I want the truth!" "You can't handle the truth!" - Lt. Daniel Kaffee and Col. Nathan R. Jessep

"Will it really make significant sense if it will be 29.1 and not 1.10?"

Normative or non-normative: big difference.

"Somewhere semantics of atomics and fences must be described, right?"
I don't know yet. Does it also describe what happens if you compile legacy code that uses POSIX threads? Does it also describe what happens if you link with assembler code, with FORTRAN, or use it as Objective-C++? Aren't we supposed to be able to take responsibility for that based on less totalitarian rulings, that are actually intelligible, and that don't break down so easily?

"Since we are talking about industrial language backed by ISO. Or you want to leave semantics of atomics and fences formally undefined as it is now with TBB atomics?"
It seems better not to speak of it than to legislate that "delete_all_files(); abort();" is a legitimate execution of the sieve example in the article mentioned above, or of maybe most multithreading code currently in use, including many programs based on TBB, because that's what "undefined" means. Things should be robust enough without any need to prove the absence of the proposed standard's specific definition of data races. Using a library like TBB should not cause undefined behaviour as long as it is not completely reimplemented on top of C++0x primitives. It is not "leaving semantics undefined" if some things don't require a revision, and a revision that blows its own horn without recognising that is suspect to me.

"What is your goal?"
I don't like what I'm reading, and if this emperor is not wearing clothes somebody should say something, somebody should be devil's advocate before canonisation can proceed. I think that the current version is not fit for ratification, because it contains an unwarranted and significant regression regarding innocuous races, as much as the standard decrees that there is no such thing. I also particularly resent the condescending lie about spurious failure of try_lock() being boasted about in the preparatory paper I'm currently reading, although I see that N2800 contains a potentially valid excuse for that.

"I think you may ask your question over comp.programming.threads, there are some people directly related to C++0x memory model hanging around..."
I may do that later.

(Correction 2009-03-24) See in the text.

robert_jay_gould
Beginner
Quoting - Raf Schietekat
"I want the truth!" "You can't handle the truth!" - Lt. Daniel Kaffee and Col. Nathan R. Jessep

My fear number one is that we'll never see a C++0x... we have just 9 months left...

My fear number two is since we have only 9 months left, C++0x is going to be rushed out and cause more harm than good with its ever more unintelligible wording, and undetermined behavior cases.

My fear number three is a lousy C++0x will make C++ even more arcane, and compilers even more nitpicky (talking to you GCC), and drive everyone nuts, thus sending C++ to hell, so we'll never see a C++1x or C++2x.

Honestly, every day that passes makes programmer-friendly languages more and more attractive. I'm really beginning to pray that D, or even C# or Objective-C, replaces C++. I mean, writing correct platform-independent C++ that "actually does something" is already about as hard as programming can get nowadays; writing straight C or assembly is the more sensible option most of the time.

shachris23
New Contributor I

My fear number one is that we'll never see a C++0x... we have just 9 months left...

My fear number two is since we have only 9 months left, C++0x is going to be rushed out and cause more harm than good with its ever more unintelligible wording, and undetermined behavior cases.

My fear number three is a lousy C++0x will make C++ even more arcane, and compilers even more nitpicky (talking to you GCC), and drive everyone nuts, thus sending C++ to hell, so we'll never see a C++1x or C++2x.

Honestly, every day that passes makes programmer-friendly languages more and more attractive. I'm really beginning to pray that D, or even C# or Objective-C, replaces C++. I mean, writing correct platform-independent C++ that "actually does something" is already about as hard as programming can get nowadays; writing straight C or assembly is the more sensible option most of the time.

+1 to this post. I wonder when the C++ community is going to focus on developers productivity and ease of usage.

Dmitry_Vyukov
Valued Contributor I
Quoting - shachris23
+1 to this post. I wonder when the C++ community is going to focus on developers productivity and ease of usage.

The same day when Java/C#/Haskell/Erlang communities will start focusing on high performance, access to the underlying hardware, access to the underlying OS features, systems programming, etc.
There are just different languages for different purposes...

RafSchietekat
Valued Contributor III
I revisited "Threads Cannot Be Implemented As a Library" (see higher) because of a reference to it in the very first paragraph of "Foundations of the C++ Concurrency Memory Model". I have not yet read it in full, but here are some comments already.

One thing that immediately stands out even in the abstract is the complete turnaround with regard to data races: "race-free", "We give no semantics to programs with data races.". Not that I think that data races are an essential part of multithreading, far from it, but it seems rather disturbing that the behaviour of a program should become undefined because of a technique as described in the previous article, let alone an accidental slip-up against the unintelligible rules that define what is a "race", or something that happens in external code not subject to the C++ standard. Literally "undefined" means that the program might decide to start deleting all your files, clearly not an acceptable situation. In the abstract are some inviting phrases like "intuitive race definition" and "simple model", but I've already had a look at N2800 (the latest working draft for C++0x), and I beg to differ.

Is there really a machine that would allow r1=r2=1 in Figure 1 if, by declaring X and Y as even relaxed atomic (implemented to leverage the semantics of "volatile"), at least the compiler would refrain from otherwise entirely appropriate speculative writes? The example does identify a race if no atomics are used, but it would be nice to be assured at this point that this is exclusively due to compiler optimisation. I would like to think that hardware speculation is done in a multiverse, where each speculative universe is isolated from all others, and choices are only committed when tests are resolved based on non-speculative data. This would preclude r1=r2=1, because that would require inter-universe infection across the coherent cache or across a context switch. Is this a valid assumption, or do some machines really behave that way?
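The question can be phrased as a small program. This is a sketch of Figure 1 as described above, written with the atomics interface that was eventually standardised (`std::atomic`, `std::memory_order_relaxed`); `run_figure1_once` is a hypothetical name. Because X and Y are atomic there is formally no data race, and the only assumption made here is that mainstream compilers and hardware do not conjure up the value 1 out of thin air.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// One run of the Figure 1 litmus test with relaxed atomics.
// Returns the pair (r1, r2) observed by the two threads.
std::pair<int, int> run_figure1_once() {
    std::atomic<int> X(0), Y(0);
    int r1 = 0, r2 = 0;
    std::thread t1([&] {
        r1 = X.load(std::memory_order_relaxed);
        if (r1 == 1) Y.store(1, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        r2 = Y.load(std::memory_order_relaxed);
        if (r2 == 1) X.store(1, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return {r1, r2};
}
```

Each store is guarded by a load that would first have to see a 1, so an outcome of r1=r2=1 would need a write that justifies itself.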

On pp. 2-3, the article presents "atomic" as a synonym for "in a single total order" or implicitly "sequentially consistent" unless explicitly qualified as "low-level". It designates as an "unfortunate overload" its use "in the sense that no other individual thread can see a partially updated value". Actually, atomic literally means "indivisible" (atoms were conceived in ancient Greece as the elementary building blocks of matter), and this of course refers to the behaviour of a read-modify-write sequence (no other operation can come between them), whether this is encoded in a single machine instruction, or as a sequence of separate instructions that succeed or fail as a whole, like the A(tomic) in an ACID transaction. I myself fail to see how "indivisibility" could possibly have anything to do with the relative order in which various observers may see the outcomes of a set of single and/or atomic operations, so I find the newspeak in this article confusing and annoying.
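The distinction between indivisibility and ordering can be shown with a counter. In this sketch, which assumes the `std::atomic` interface as eventually standardised, every increment is an indivisible read-modify-write even though it requests no ordering at all beyond that. `count_with_relaxed_rmw` is a hypothetical name.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Several threads increment one counter using relaxed read-modify-write operations.
// No increment is lost, because each fetch_add is indivisible.
int count_with_relaxed_rmw(int threads, int per_thread) {
    std::atomic<int> counter(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();   // join makes the final value visible here
    return counter.load(std::memory_order_relaxed);
}
```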

In "3. Making Trylock Efficient" on p. 4, trylock() is said to potentially spuriously fail as a "simple solution" to avoid "introducing the complexity of different types of synchronization or a happens-before relationship to define a data race". My comments about this in an earlier posting should suffice.

(Clarification 2009-03-25) I do also see its use "in the sense that no other individual thread can see a partially updated value" as part of the real meaning of "atomic" (together with indivisible operation sequences), even though on modern hardware with its wide-enough data buses this typically only requires attention to proper alignment. Note that, strictly speaking, this describes isolation instead of atomicity, but atomics provide both (entangled) properties.

(Second clarification 2009-03-25) The "sequence of separate instructions that succeed or fail as a whole" (on architectures that work this way) that I likened to "the A(tomic) in an ACID transaction" is of course only superficially related, because in the case of atomics only the last instruction tries to write anything (a write+commit combo that may succeed, or fail, in which case the user has to start all over).

(Added 2009-03-28) More comments to be expected (interesting article).

robert_jay_gould
Beginner
Quoting - Dmitriy Vyukov

The same day when Java/C#/Haskell/Erlang communities will start focusing on high performance, access to the underlying hardware, access to the underlying OS features, systems programming, etc.
There are just different languages for different purposes...


high performance: 90% of the high performance libraries for anything from threading, to databases, to text processing, to hardware graphics, are written in C. I actually think it's more like 98% but I'll leave it at 90% just in case.

hardware: I've worked on ordinary computers, their hardware is accessed best through C or assembly, same goes for mobile devices, embedded devices, game machines, etc... I've worked on over a dozen different computing devices, and have yet to see one that provides hardware interfaces in C++, that are not a wrapper of their C API.

OS features: You mean Windows? I think you actually have easier less error prone access through C#, or again C. Macs, they provide a nice Objective-C interface over their C interface, the C++ interface? They literally threw it away. Or Unix/Posix? that's C too. Can't think of an OS that relies mostly on C++, can you?

Systems Programming: Ok, as sad as it is, about half of the systems that keep our world in order (in this day and age) are written in COBOL... sigh... the other half is mostly C and Java.

My verdict is if people want real control and/or performance they use C, and not C++. C++ is too crazy, hard, and platform dependent for real performance. You're basically left betting on the compilers at that point, since the same code's performance is compiler dependent. If you want easy, you use Java, C#, or Objective-C.

If you want both you should use C++, because it was supposed to be nicer than programming in C, but with just as good a performance, but that promise is being broken, it's getting ever crazier and undefined. And now all of a sudden they are saying that threaded code/constructs people have been using for decades are broken? I mean how can they say Posix threads is broken, even Windows threading API works fine!

Ok, so the committee wants to drive us nuts by saying we can no longer rely on our code because some imaginary hardware, and overzealous compiler writers (GCC as usual comes to mind here) might screw us over by over optimizing illogical/irrational code constructs?

Now the real issue is they may not be saying this at all. But for myself, and I probably speak for many developers in the trenches, their wording is unintelligible, they SHOULD write specs in a way that doesn't confuse their audience, or they will lose their audience, C++ was never meant to be an academic language, but they keep trying to make it so. If I wanted academic I'd use Haskell, it's about as fast as C++ anyways.

RafSchietekat
Valued Contributor III
#3 "My fear number one is that we'll never see a C++0x... we have just 9 months left..."
How do you figure that? We have 6 years, 8 months and 1 week until the end of '0F (as the joke goes)!

#3 "My fear number two is since we have only 9 months left, C++0x is going to be rushed out and cause more harm than good with its ever more unintelligible wording, and undetermined behavior cases."
Was anything learnt at all from "The Rise and Fall of CORBA", with its terribly clever C++ mapping (emphasis on terribly), see last page?

#3 "My fear number three is a lousy C++0x will make C++ even more arcane, and compilers even more nitpicky (talking to you GCC), and drive everyone nuts, thus sending C++ to hell, so we'll never see a C++1x or C++2x."
I hear this a lot, that C++ is going to go away, and it disturbs me. I want C++ to be there "for the ages", on top of C, on top of the hardware. It's languages like COBOL and Pascal that may come and go, not C++ (warts and all, just not too many, please). But I'll leave it at that as far as C++ vs. everything else goes.

#7 "And now all of a sudden they are saying that threaded code/constructs people have been using for decades are broken? I mean how can they say Posix threads is broken, even Windows threading API works fine!"
I must point out that the problem is not quite imaginary, I'm only questioning the form the proposed solution is taking.

robert_jay_gould
Beginner
"Implementations appear to have converged on characteristics that make it possible to write correct multi-threaded applications, though largely, we believe, based on painful experiences rather than strict adherence to standards."

This single line of the paper on Threading Libraries is where the real issue is IMHO. What's wrong with learning from experience and THEN standardizing that? Why do people think standards need to come first? I mean a "standard" literally means the "standard way people do stuff", so the committee should be basing the standard on the hard-earned knowledge of hardware makers, library makers and programmers, not ignoring what works and has worked, to replace it with some imaginary scenario where we have to become paranoid of every single line of code that was perfectly valid, but might no longer be because implementations are now undefined behavior. They should make successful implementations into defined behavior.

Dmitry_Vyukov
Valued Contributor I

high performance: 90% of the high performance libraries for anything from threading, to databases, to text processing, to hardware graphics, are written in C. I actually think it's more like 98% but I'll leave it at 90% just in case.

hardware: I've worked on ordinary computers, their hardware is accessed best through C or assembly, same goes for mobile devices, embedded devices, game machines, etc... I've worked on over a dozen different computing devices, and have yet to see one that provides hardware interfaces in C++, that are not a wrapper of their C API.

OS features: You mean Windows? I think you actually have easier less error prone access through C#, or again C. Macs, they provide a nice Objective-C interface over their C interface, the C++ interface? They literally threw it away. Or Unix/Posix? that's C too. Can't think of an OS that relies mostly on C++, can you?

Systems Programming: Ok, as sad as it is, about half of the systems that keep our world in order (in this day and age) are written in COBOL... sigh... the other half is mostly C and Java.

My verdict is if people want real control and/or performance they use C, and not C++. C++ is too crazy, hard, and platform dependent for real performance. You're basically left betting on the compilers at that point, since the same code's performance is compiler dependent. If you want easy, you use Java, C#, or Objective-C.

If you want both you should use C++, because it was supposed to be nicer than programming in C, but with just as good a performance, but that promise is being broken, it's getting ever crazier and undefined. And now all of a sudden they are saying that threaded code/constructs people have been using for decades are broken? I mean how can they say Posix threads is broken, even Windows threading API works fine!

I meant "C/C++", sorry for that; usually I try to clearly distinguish "C", "C++" and "C/C++".
C and C++ are basically equal in this context. As far as I understand, the C++09 memory model, atomics and threading API will be directly adopted by the next C standard (AFAIK people from the C group were specifically invited to the C++ memory model working group, and that's why we see all those atomic_exchange_explicit() functions).

Btw, Symbian is an OS with a C++ interface, and Android is an OS with a Java interface.

Dmitry_Vyukov
Valued Contributor I
Now the real issue is they may not be saying this at all. But for myself, and I probably speak for many developers in the trenches, their wording is unintelligible, they SHOULD write specs in a way that doesn't confuse their audience, or they will lose their audience, C++ was never meant to be an academic language, but they keep trying to make it so. If I wanted academic I'd use Haskell, it's about as fast as C++ anyways.


Ok, C++09 specification aside for a moment. How many of your colleagues are able to read, understand and write synchronization which relies on low-level atomics with fine-grained memory ordering constraints? How many C++ developers overall do you think are able to do this? How many developers are able to write from scratch at least primitive double-checked initialization, clearly explain why they write it this way, and why it is guaranteed to work?
I can't understand why you and Raf are blaming the specification. It's not the specification that is complicated, difficult, unintelligible, non-intuitive, etc-etc. It's the domain itself (thread interleavings, absence of global order, mutual ordering, reorderings, control-dependencies, data-dependencies, atomicity, etc-etc).
One can make a complicated specification for a simple thing (bad). One can make a simple specification for a simple thing (good). One can make a complicated specification for a complicated thing (good). But one can NOT make a simple specification for a thing that is complicated in itself (impossible, and basically what you and Raf are proposing).
Relaxed memory models are somewhat similar to the theory of relativity (time is a property of location). Are there simple descriptions of the theory of relativity that are accessible to housewives?
The C++ memory model working group put enormous effort into making the specification at least not overcomplicated (beyond the domain itself), into making it self-consistent and integral, and into providing a high-level ("what") specification rather than a low-level ("how") one.
Making a specification low-level ("how") is generally a bad thing. For example, if it is based on "fences" (which define possible local reorderings around them), then what to do with an implementation which does not provide correct ordering but then has some post-mortem means to recover from inconsistency (a trick actually used on Alpha)? The user will not be able to look at the assembly code and figure out what is happening anyway. Or what to do about the absence of a global order? Without a global order, the model does not reduce to the simple "fences + sequential consistency" model anyway.
In any case, if you do not need, do not want, or are not able to use the low-level memory model, C++09 also specifies a simple subset: sequentially consistent operations or even just mutexes (basically POSIX). You are always allowed to stay on that level.
Blaming is easy. Specifications for relaxed memory models that are orders of magnitude simpler (as well as for the theory of relativity and quantum physics) are welcome.

Dmitry_Vyukov
Valued Contributor I
Quoting - Raf Schietekat
Literally "undefined" means that the program might decide to start deleting all your files, clearly not an acceptable situation.

Ok, and what happens when you call a virtual method of an already destroyed object? LOL. Defining the results of data races as undefined neither adds anything to nor removes anything from current C++. If you care about your files you may program in sandboxed JavaScript.

RafSchietekat
Valued Contributor III
#10 "I meant "C/C++", sorry for that; usually I try to clearly distinguish "C", "C++" and "C/C++".
C and C++ are basically equal in this context. As far as I understand, the C++09 memory model, atomics and threading API will be directly adopted by the next C standard (AFAIK people from the C group were specifically invited to the C++ memory model working group, and that's why we see all those atomic_exchange_explicit() functions)."

Time to have a look at N2800's "29 Atomic operations library [atomics]".

Alarm bells immediately go off when I look at the synopsis: where is the header file? "17.6.2.3 Headers" only mentions in "Table 14 - C++ headers for C library facilities", and there is not an (or another <(std)atomic(s)> variant) to be found in the entire document. Where is the C++ template going to fit? Is it an afterthought, maybe? I don't see contents inside , for a good reason, and I expect the same for atomics: any C functions should only be available if a specific header is included, and vice versa. Why should C++ be a slave to C's need to use names like "memory_order_release" because C does not have namespaces (if not as a ploy to impose the default memory_order_seq_cst)?

What might possibly be the difference between flag and bool? We're not deemed worthy of a rationale (only of a silly joke about atomic objects being "neither active nor radioactive"). Should we guess, or just blindly accept? I never saw a need to differentiate. On what machine is providing fewer operations necessary and sufficient to give access to a lock-free implementation?

"// For each of the integral types:" forgets that bool is also an integral type.

How does addition for address types take into account the size of the referent: the template can do that, but why should it publicly inherit from atomic_address? Will the operations available on atomic_address have their original meaning, or will they be overloaded to suddenly know about the referent size? In what way isn't that confusing? What happens for void*, which does not have a specific template specialisation?
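For comparison, here is how the pointer specialisation behaves in the interface that was eventually standardised as `std::atomic<T*>` (an assumption about the later standard, not a statement about N2800): `fetch_add` counts in elements of the referent type, like ordinary pointer arithmetic. `advance_by_two` is a hypothetical name.

```cpp
#include <atomic>
#include <cstddef>

// Advance an atomic int* by 2 and report how far it moved, measured in ints.
std::ptrdiff_t advance_by_two() {
    static int cells[4] = {0, 1, 2, 3};
    std::atomic<int*> p(cells);
    int* before = p.fetch_add(2);   // steps over two ints, not two bytes
    return p.load() - before;       // pointer difference in elements
}
```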

Where are floating-point atomics?

In "29.1 Order and Consistency [atomics.order]", what might "regular (non-atomic)" possibly mean? Is it another instance of equating "atomic" with "sequentially consistent"?

I still don't know what a "consume operation" is supposed to mean.

"Implementations shall not move an atomic operation out of an unbounded loop." Why should implementations be able to move an atomic out of any loop? Are they allowed to do that with a volatile variable? If not, what would be the use of allowing things to be done to an atomic that would not be allowed with a volatile?

In "29.3.1 Integral Types [atomics.types.integral]", why are there only variants of compare_exchange, which really means compare_store? Why aren't all the other combinations available as well?

Why are there no atomic operations that do not return the previous value, which can easily be implemented as locked instructions on x86?

I have not combed through all the details yet, though.

I still have the same objection against memory_order as a non-constant parameter, where only a constant value makes sense: this can be enforced as a template parameter, but apparently C++ is not to benefit from features that are not also available in C. Conversely, weak/strong can easily, depending on the underlying architecture, either be ignored, or be used naturally in the loop that synthesises strong from weak, so whyever should it be an affix where memory order is not even required to be constant?
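Both points can be sketched on top of the interface as it was eventually standardised. The wrappers below are hypothetical and only illustrate the idea: `load_with` and `store_with` take the memory order as a template parameter, so it is necessarily a compile-time constant, and `cas_strong_from_weak` shows the loop that synthesises a strong compare-exchange from a weak one.

```cpp
#include <atomic>

// Hypothetical wrappers: the ordering is fixed at compile time.
template <std::memory_order Order, typename T>
T load_with(const std::atomic<T>& a) {
    return a.load(Order);
}

template <std::memory_order Order, typename T>
void store_with(std::atomic<T>& a, T value) {
    a.store(value, Order);
}

// Hypothetical helper: strong compare-exchange built from the weak form.
template <typename T>
bool cas_strong_from_weak(std::atomic<T>& a, T expected, T desired) {
    T seen = expected;
    while (!a.compare_exchange_weak(seen, desired)) {
        if (seen != expected) return false;  // genuine mismatch: give up
        // otherwise the failure was spurious, so try again
    }
    return true;
}
```

A call such as `load_with<std::memory_order_acquire>(a)` cannot be given a run-time ordering by accident, which is the property argued for above.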

And of course I also object against sequential consistency as the only possible default (a reflection of the draft's imposition of sequentially consistent semantics throughout), instead of being able to configure an atomic based on its usage, or even to always require explicit semantics.

So it's safe to say that I'm not entirely happy with the current draft, and I think that C++ users would benefit from a revision before this becomes the standard.

RafSchietekat
Valued Contributor III
#11 "Ok, C++09 specification aside for a moment. How many of your colleagues are able to read, understand and write synchronization which relies on low-level atomics with fine-grained memory ordering constraints?"
"What's an atomic?" :-)

"How many C++ developers overall do you think are able to do this?"
Once anybody gets going in the world of atomics, I don't see why simple release/acquire should be such a challenge. Or even relaxed atomics. Just pick the right tool for the job, and it takes a very particular job to actually require what C++0x is planning to make the default.

"How many developers are able to write from scratch at least primitive double-checked initialization, clearly explain why they write it this way, and why it is guaranteed to work?"
How about the following (I failed to come up with a reason why I would need anything beyond release/acquire):

[cpp]static mutex m;
static atomic<T*> s_apT; // atomic pointer to the singleton, initially NULL

T* getInstance() {
  T* l_pT;
  if(NULL==(l_pT = s_apT.load())) { // match store()
    mutex::lock l(m); // only one thread should create
    if(NULL==(l_pT = s_apT.load())) { // match store()
      s_apT.store(l_pT = new T()); // don't publish until construction complete
    }
  }
  return l_pT; // object data acquired or locally created
}
[/cpp]
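For reference, the same double-checked pattern can be written against the interface that was eventually standardised. This is only a sketch under that assumption, with the release/acquire orderings spelled out explicitly; `Widget` and `get_instance` are hypothetical names, and the object is deliberately never deleted.

```cpp
#include <atomic>
#include <mutex>

struct Widget { int value = 42; };   // hypothetical payload type

Widget* get_instance() {
    static std::mutex m;
    static std::atomic<Widget*> instance(nullptr);

    Widget* p = instance.load(std::memory_order_acquire);      // pairs with the store below
    if (p == nullptr) {
        std::lock_guard<std::mutex> guard(m);                   // only one thread creates
        p = instance.load(std::memory_order_acquire);           // re-check under the lock
        if (p == nullptr) {
            p = new Widget();                                   // construct first
            instance.store(p, std::memory_order_release);       // then publish
        }
    }
    return p;
}
```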

"I can't understand why you and Raf are blaming the specification. It's not the specification that is complicated, difficult, unintelligible, non-intuitive, etc-etc. It's the domain itself (thread interleavings, absence of global order, mutual ordering, reorderings, control-dependencies, data-dependencies, atomicity, etc-etc)."
Both, I guess. But it's like, e.g., lift over an airfoil: you don't really need Navier-Stokes and circulation theories to make sense of it, that's just gobbledygook to impress people, without imparting any useful insight like the causal relationship mainly from lower pressure to higher airspeed instead of the other way around (mainly, because they reinforce each other).

"One can make complicated specification for simple thing (bad). One can make simple specification for simple thing (good). One can make complicated specification for complicated thing (good). But one can NOT make simple specification for complicated in itself thing (impossible and basically what you and Raf are proposing)."
If you write a program in assembler using fences, would you still use these theories? That's your prerogative, of course, but why impose them? Why shouldn't anybody be allowed to construct a cantilever-spar cable-stayed bridge if he so prefers, based on general physics, instead of being forced into a suspended-deck suspension bridge because the theorists decided that all bridges should be like that, so surely the behaviour of any other type of bridge must be "undefined"?

"Relaxed memory models are somehow similar to the theory of relativity (time is property of location). Is there simple descriptions of theory of relativity which are accessible to housewifes?"
A symptom of a derogatory attitude, perhaps?

"C++ memory working group put ennormous effort into making specification at least not over complicated (further than the domain itself), into making it self-consistent and integral, into providing high-level ("what") specification and not low-level ("how") specification."
I don't want effort, I want results. And why should it even be high-level: maybe I do want access to all that basic physics can offer, instead of getting a CAD system that can only design suspended-deck suspension bridges and is clueless about any other design. (I'm big on metaphores.)

"Making specification on low-level ("how") is generally a bad thing. For example, if it will be based on "fences" (which define possible local reorderings around them), then what to do with implementation which does not provide correct ordering but then has some post-mortem means to recover from inconsistency (trick actually used on Alpha)? User anyway will not be able to look at the assembly code and figure out what is happening."
I may have created confusion by using the word "fence" where I meant conceptual fence, not necessarily identifiable as such on a particular architecture, i.e., I would consider "release" a basic operation. In my defence, I have to point out that it is rare for any text to discuss what a fence actually is, or more to the point, what varieties exist. I think it should be possible to reason in terms of conceptual fences, which can then be conservatively mapped to whatever the machine provides, even if the opposite direction is too hard, like a disassembler is never able to fully reconstruct the original program even if the code was not deliberately obfuscated.

"Or what do you do about the absence of a global order? Without a global order, the model does not reduce to the simple "fences + sequential consistency" model anyway."
Well, then it would simply be "fences + something else". Sequential consistency may be desirable, but it is not a panacea for instant insight.

"In any case, if you do not need, do not want, or are not able to use the low-level memory model, C++09 also specifies a simple subset: sequentially consistent operations, or even just mutexes (basically POSIX). You are always allowed to stay on that level."
I don't think that is what this discussion is about.

"Blaming is easy. Specifications for relaxed memory models that are orders of magnitude simpler (as well as for the theory of relativity and quantum physics) are welcome."
What is so terrible about simply using what already exists: compiler fences (opaque function calls or special assembler code) and an atomics library that generates some conservative approximations? How much is to be gained (literally: how much performance can the compiler create) by forcing the programmer to buy into a model that can do some things, but that also decrees that whatever else he wants to do, and previously successfully did, is now undefined? It's the difference between "here is the physics, and if you want you can buy a CAD system of your choice to help you figure it all out" and "we won't tell you about the physics, you'll just have to buy into our system and choose something within its range of solutions".

RafSchietekat
Valued Contributor III
1,610 Views
#12 "OK, and what happens when you call a virtual method of an already destroyed object? LOL. Defining the results of data races as undefined neither adds anything to nor removes anything from current C++. If you care about your files, you may program in sandboxed JavaScript."
There was a time that only doing dangerous things produced undefined program behaviour (I may be romanticising), but now it's everything outside the arbitrary subset that was chosen. You don't need that totalitarian viewpoint to avoid an elementary mistake like calling a method on an object that is being or has already been destroyed. The current draft is panicky and hysterical, because, e.g., when two threads can write to the same int at the same time, only the value of the int should be undefined, not the behaviour of the program.
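
For comparison, a minimal sketch of how the model as eventually standardized in C++11 treats this case when the int is declared atomic: two concurrent relaxed stores are not a data race, the program's behaviour stays defined, and only the final value is unspecified among the values written. With a plain int the same program would have undefined behaviour under that standard. The names shared_value and run_racing_writers are made up for illustration.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> shared_value{0};

// Two threads store different values with no ordering constraints.
// Returns the final value: either 1 or 2, and which one is unspecified.
int run_racing_writers() {
    shared_value.store(0, std::memory_order_relaxed);
    std::thread a([] { shared_value.store(1, std::memory_order_relaxed); });
    std::thread b([] { shared_value.store(2, std::memory_order_relaxed); });
    a.join();
    b.join();
    return shared_value.load(std::memory_order_relaxed);
}
```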
robert_jay_gould
Beginner
1,610 Views
Quoting - Dmitriy Vyukov
How many of your colleagues are able to read, understand and write synchronization that relies on low-level atomics with fine-grained memory ordering constraints? How many C++ developers overall do you think are able to do this? How many developers are able to write from scratch at least a primitive double-checked initialization, clearly explain why they wrote it this way, and explain why it is guaranteed to work?

This is THE problem. I'd say about a third of my co-workers have been using C/C++ for over a decade (myself included), and another third for half a decade, and none of us could get this right in our first 20 attempts without feeling that the compiler or hardware might betray us, so we're all paranoid. A language that makes its users paranoid is fundamentally doing something wrong.
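
For reference, this is roughly what a "primitive double-checked initialization" looks like with the atomics as eventually standardized in C++11. It is only a sketch: Config, g_config, g_config_mutex and get_config are hypothetical names, and the object is deliberately never freed to keep the example short.

```cpp
#include <atomic>
#include <mutex>

struct Config { int value = 7; };        // hypothetical payload type

std::atomic<Config*> g_config{nullptr};  // published pointer
std::mutex g_config_mutex;               // serializes the slow path

Config& get_config() {
    // First check, without the lock. Acquire pairs with the release store
    // below, so a non-null pointer implies a fully constructed Config.
    Config* p = g_config.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(g_config_mutex);
        // Second check, under the lock. The mutex already orders this load
        // against the store made by whoever held the lock before us.
        p = g_config.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Config;  // intentionally leaked in this sketch
            g_config.store(p, std::memory_order_release);
        }
    }
    return *p;
}
```

The part people tend to get wrong is the first load: without acquire (or a stronger ordering), a thread may see the pointer but not yet the contents of the object it points to.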

Relaxed memory models are somewhat similar to the theory of relativity (time is a property of location). Are there simple descriptions of the theory of relativity that are accessible to housewives?

How about that model where you put a driver in a "super-duper-ultra-turbo" fast car, and when he stops, his watch shows a different time from the coach's watch? Or the model where an astronaut (or cosmonaut) travels to another star at the speed of light, and when they come back everyone they knew is old or dead? Everyone knows these examples, and they are more than enough for everyone to understand that such a phenomenon exists, and more or less what its consequences are. Of course that is not enough to do the math, but our atomics model should be like this too. The average programmer with 5 years of C/C++ under their belt (even requiring this much experience is crazy) should have absolutely no problem using this sort of stuff, but that's not the case.
Dmitry_Vyukov
Valued Contributor I
1,610 Views
Quoting - Raf Schietekat
#12 "OK, and what happens when you call a virtual method of an already destroyed object? LOL. Defining the results of data races as undefined neither adds anything to nor removes anything from current C++. If you care about your files, you may program in sandboxed JavaScript."
There was a time that only doing dangerous things produced undefined program behaviour (I may be romanticising), but now it's everything outside the arbitrary subset that was chosen. You don't need that totalitarian viewpoint to avoid an elementary mistake like calling a method on an object that is being or has already been destroyed. The current draft is panicky and hysterical, because, e.g., when two threads can write to the same int at the same time, only the value of the int should be undefined, not the behaviour of the program.

OK, consider the following code:

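#include <new> // placement new

char placeholder[sizeof(object)]; // alignment aside for a moment
object* g_obj; // = 0

// thread 1
g_obj = new (placeholder) object; // plain store, no synchronization

// thread 2
if (g_obj) // plain load, no synchronization
{
    // cast needed: &placeholder has type char (*)[sizeof(object)], not object*
    if (g_obj == reinterpret_cast<object*>(placeholder))
        g_obj->virtual_func();
}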

How would you define the behavior of this program? Provide a formal specification that completely defines its behavior.

Note that the loaded value is completely defined here: we know that it's the *right* pointer. The problem is deeper; it's not only about separate undefined values. Anyway, if a program contains some undefined values, does it make sense to reason about its behavior any further? An undefined value will be used in some way in the program, so basically the uncertainty will propagate through the program like the plague.

Well, yes, the specification might provide a 100-page description of all possible combinations of thread interactions and the races involved, and for some of them define the behavior, for some of them define several possible behaviors, and for some of them still just say UB (see the example above). Hmmm... weren't you saying that the specification must not be overloaded?
And what do you do about virtual functions, etc.? How would you define behavior that involves virtual functions, given that their implementation is implementation-defined?

If you load an int and then pass it to printf, then we may say that the output value will be undefined. But what if we load a char*? You must be insane if you want to cope with this in the specification. OK, probably not; show your variant of the spec. What I see to date is that, IMHO, you want to underspecify the significant points, overspecify the insignificant ones, and bind the abstract specification to some particular implementation...
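
For contrast with the racy example above, here is a sketch of the same publication with defined behaviour, using the atomics as eventually standardized in C++11: the pointer is stored with release and loaded with acquire, so a thread that sees the pointer also sees the constructed object, including whatever the implementation uses for virtual dispatch. The object type and run_publication_demo are hypothetical stand-ins, not anything from the draft.

```cpp
#include <atomic>
#include <new>
#include <thread>

struct object {  // hypothetical stand-in for `object` in the example above
    virtual ~object() = default;
    virtual int virtual_func() { return 1; }
};

alignas(object) unsigned char placeholder[sizeof(object)];
std::atomic<object*> g_obj{nullptr};

// Runs the two threads once and returns what thread 2 got from the call.
int run_publication_demo() {
    int result = 0;
    std::thread t2([&] {
        object* p;
        // Acquire load: pairs with the release store in thread 1.
        while ((p = g_obj.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        result = p->virtual_func();
    });
    std::thread t1([] {
        // Construct first, then publish with a release store.
        g_obj.store(new (placeholder) object, std::memory_order_release);
    });
    t1.join();
    t2.join();
    // Clean up so that the storage can be reused.
    object* p = g_obj.exchange(nullptr);
    p->~object();
    return result;
}
```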

Dmitry_Vyukov
Valued Contributor I
1,610 Views

This is THE problem. I'd say about a third of my co-workers have been using C/C++ for over a decade (myself included), and another third for half a decade, and none of us could get this right in our first 20 attempts without feeling that the compiler or hardware might betray us, so we're all paranoid. A language that makes its users paranoid is fundamentally doing something wrong.



I am not sure I get you. So, since the matter is too complicated in itself, you are proposing just to eliminate it from the specification and provide a simple POSIX-like mutex-based model, right?


Dmitry_Vyukov
Valued Contributor I
1,610 Views
How about that model where you put a driver in a "super-duper-ultra-turbo" fast car, and when he stops, his watch shows a different time from the coach's watch? Or the model where an astronaut (or cosmonaut) travels to another star at the speed of light, and when they come back everyone they knew is old or dead? Everyone knows these examples, and they are more than enough for everyone to understand that such a phenomenon exists, and more or less what its consequences are. Of course that is not enough to do the math, but our atomics model should be like this too. The average programmer with 5 years of C/C++ under their belt (even requiring this much experience is crazy) should have absolutely no problem using this sort of stuff, but that's not the case.

Ah, I see. So by formal specification you mean just some examples, which indeed show that the matter itself is brain-damagingly complicated and totally at odds with prior knowledge. Catch the formal specification:
"Different threads may see actions in different orders. If several threads write to the variable then it may contain garbage. Program order is not respected as observed by other threads."


robert_jay_gould
Beginner
1,579 Views
Quoting - Dmitriy Vyukov

Ah, I see. So by formal specification you mean just some examples, which indeed show that the matter itself is brain-damagingly complicated and totally at odds with prior knowledge. Catch the formal specification:
"Different threads may see actions in different orders. If several threads write to the variable then it may contain garbage. Program order is not respected as observed by other threads."



There we go, see, it can be explained in simple terms. I'm not saying that this is enough for a standard, or for formal documentation of any type, just that it can be explained in terms that even a layperson can understand.
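
The "program order is not respected as observed by other threads" part can also be shown with the classic store-buffering test, sketched here with relaxed atomics as eventually standardized in C++11. Each thread stores to one variable and then loads the other; the model permits both loads to return 0, which no interleaving of the two program orders can explain. Whether you ever observe that outcome depends on the compiler and the hardware. The names sb_x, sb_y and run_store_buffering are made up for illustration.

```cpp
#include <atomic>
#include <thread>
#include <utility>

std::atomic<int> sb_x{0};
std::atomic<int> sb_y{0};

// Runs the test once and returns (r1, r2). Each is 0 or 1, and the
// outcome (0, 0) is permitted because relaxed operations impose no
// ordering between different variables.
std::pair<int, int> run_store_buffering() {
    constexpr auto relaxed = std::memory_order_relaxed;
    sb_x.store(0, relaxed);
    sb_y.store(0, relaxed);
    int r1 = -1;
    int r2 = -1;
    std::thread a([&] { sb_x.store(1, relaxed); r1 = sb_y.load(relaxed); });
    std::thread b([&] { sb_y.store(1, relaxed); r2 = sb_x.load(relaxed); });
    a.join();
    b.join();
    return {r1, r2};
}
```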