Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Spin-Locks vs. Atomic Instructions

AJ13
New Contributor I
3,978 Views
Hello all,

I have been implementing some algorithms from "The Art of Multiprocessor Programming" in TBB. I have found in this book, however, that it relies sometimes upon the Java volatile keyword. The best way to mimick this functionality in TBB, it seems, is to have a spin-lock wrapped around the variable that would have been volatile in Java.

I have found that the limited size of atomic variables is a problem with even simple algorithms. For instance, if I want to atomically alter two variables I need to pack them into a single atomic. What I have done so far, is use spin-locks around the variables that I would normally want to alter atomically. How well would this scale as the number of threads/cores increases?

Is software transactional memory worth looking at, performance wise? Can it really offer any better performance than spin-locks?

Do I have other alternatives to atomic/spin-locks in writing fast parallel algorithms?

Thanks,

AJ
0 Kudos
35 Replies
RafSchietekat
Valued Contributor III
978 Views
Well, I shouldn't have referred to that discussion without first reading it through, sorry, I just thought that maybe you just forgot that you really agree with me. :-)
0 Kudos
RafSchietekat
Valued Contributor III
978 Views
I don't recall ever having that much trouble with quantum physics...

On the one hand you disagree with "a normal load before a Java volatile load or a normal store after a Java volatile store can be reordered" because of the reasons you give in #17, and on the other hand those exact same reorderings in the Cookbook "do not affect formal semantics".

I used to call that a direct contradiction, before I entered the twilight zone of memory semantics.
0 Kudos
Dmitry_Vyukov
Valued Contributor I
978 Views
Quoting - Raf Schietekat
I don't recall ever having that much trouble with quantum physics...

On the one hand you disagree with "a normal load before a Java volatile load or a normal store after a Java volatile store can be reordered" because of the reasons you give in #17, and on the other hand those exact same reorderings in the Cookbook "do not affect formal semantics".

I used to call that a direct contradiction, before I entered the twilight zone of memory semantics.

Quantum physics relates to real-world so it has to be consistent. Formal memory models do not.

Some reorderings may happen on machine instruction level, but you are not able to detect whether they happen or not. So, on one hand, yes, they happen. On the other hand, you can think that they do not happen. Is it contradiction? I don't know.

0 Kudos
RafSchietekat
Valued Contributor III
978 Views
So formal memory models do not have to be consistent, and you can be an expert without being able to spot a contradiction? To me that sounds more like astrology than computer science, and I refuse to accept that.

Here's another one, from a FAQ for us dummies: "Writing to a volatile field has the same memory effect as a monitor release, and reading from a volatile field has the same memory effect as a monitor acquire." Nice and consistent with my observation from the Cookbook "that a normal load before a Java volatile load or a normal store after a Java volatile store can be reordered" (as I interpreted it).

Yet you specifically "do not agree" (#17). So who is mistaken here? You can't all be right, or can you?
0 Kudos
Dmitry_Vyukov
Valued Contributor I
978 Views
Quoting - Raf Schietekat
So formal memory models do not have to be consistent, and you can be an expert without being able to spot a contradiction? To me that sounds more like astrology than computer science, and I refuse to accept that.

Here's another one, from a FAQ for us dummies: "Writing to a volatile field has the same memory effect as a monitor release, and reading from a volatile field has the same memory effect as a monitor acquire." Nice and consistent with my observation from the Cookbook "that a normal load before a Java volatile load or a normal store after a Java volatile store can be reordered" (as I interpreted it).

Yet you specifically "do not agree" (#17). So who is mistaken here? You can't all be right, or can you?

Ok, let's narrow it down one more time: you are interested in machine instruction level (you are compiler writer)? Or you are interested in program semantics (you are end user, programmer)? The answer will vary depending on your choice. If you are compiler writer forget about specification, use cookbook. If you are programmer forget about cookbook, use specification. But then do not try to extrapolate rules of one world to another.



0 Kudos
RafSchietekat
Valued Contributor III
978 Views
That's ridiculous: why would these points of view not be in complete agreement? Well, they could both be providing a partial view depending on what hat you're wearing, but then there should also be a description from which both can be deduced. In this case I am still assuming a priori that a Cookbook can be deduced from the Java Language Specification, and that Doug Lea's Cookbook is a correct deduction.

I think that in #17 you were misinterpreting happens-before, maybe because it has such a horribly deceptive name: even within section 17.4.5 it is made clear that this relationship is only a building block for the specification, not a final description of what reorderings are allowed, and it is not even true in general that (as you write) "you are not able to detect whether they happen or not".

0 Kudos
Dmitry_Vyukov
Valued Contributor I
978 Views
Quoting - Raf Schietekat
That's ridiculous: why would these points of view not be in complete agreement? Well, they could both be providing a partial view depending on what hat you're wearing, but then there should also be a description from which both can be deduced. In this case I am still assuming a priori that a Cookbook can be deduced from the Java Language Specification, and that Doug Lea's Cookbook is a correct deduction.

I think that in #17 you were misinterpreting happens-before, maybe because it has such a horribly deceptive name: even within section 17.4.5 it is made clear that this relationship is only a building block for the specification, not a final description of what reorderings are allowed, and it is not even true in general that (as you write) "you are not able to detect whether they happen or not".



Ok, another ridiculous example: single-threaded instruction reordering by optimizing compiler (hmmm... probably it's actually the same example, but I hope the point is more clear here). Do they happen? The answer depends on who you asking. If you will ask compiler writer, then you will get "Yes, of course! We are doing tons of reordering in optimizing mode". If you will ask programmer, then you will get "No, definitely no! How it's possible that I am writing one program and compiler reorders some operations?! The execution of my single-threaded program is always sequentially consistent, so NO reordering is allowed!".
So you get 2 opposite answers. Is it ridiculous? Is is contradiction?

0 Kudos
RafSchietekat
Valued Contributor III
978 Views
The compiler writer should know better than to frighten the innocent programmer with incomplete information. It's even worse if he doesn't provide a more specific answer when prompted. It's a failure to communicate effectively.

Actually, the programmer is also wrong: only the observed order of sequence points will be preserved.
0 Kudos
Dmitry_Vyukov
Valued Contributor I
978 Views
Quoting - Raf Schietekat
The compiler writer should know better than to frighten the innocent programmer with incomplete information. It's even worse if he doesn't provide a more specific answer when prompted. It's a failure to communicate effectively.

As you can see, this does not cause any problems in real life over tens of years with millions of programmers. So I believe here is no problem at all. And the glue here is "documented/observable behavior", the task of programmer is to rely on documented/observable behavior, the task of compiler writer is to preserve documented/observable behavior, nothing more and nothing less.
Java specification says nothing about reordering of machine instructions in machine code, it only declares visible sequential consistency. And you are trying to combine low-level reordering on machine instruction level with high-level observable behavior.

0 Kudos
RafSchietekat
Valued Contributor III
978 Views

"Ne supra crepidam sutor iudicaret." - Plinius

Apparently you want the cobbler to stick to his last: a programmer should stay in his cubicle and is not supposed to be thinking for himself other than within the boundaries set for him by his betters. But the serial-code comparison is flawed (it does not address the reorderings that are visible, let alone the possible problems in the context of multithreaded code), and you draw the wrong conclusions (I say that the compiler writer should have given more complete information, and you say that "as can see" it doesn't matter), so what is the relevance, other than creating a distraction?

Of course I am trying to combine both viewpoints! Why shouldn't a user be allowed to consider what might happen if a normal store follows a volatile store, and why, and what the corresponding machine code will be? I do not subscribe to the idea that specifications should be obscure, and that knowledge should trickle down a hierarchy from high priesthood to simpleton programmers. A significant number of the latter category should be able to go directly to the specification to resolve issues, and this specification should be as transparent as it can be (I have my doubts). I see contradictions between what you say and what even a FAQ says (I referred to a FAQ for programmers as well as to the Cookbook supposedly for compiler writers), and I should be able to say that you can't both be right (meant as a euphemism). You also misinterpreted the "high-level" specification, because you confused "happens-before" with observable behaviour by invoking this part of the specification for disagreeing with my example.

(To the administrators: can we sell BBB PPV tickets? :-) )

0 Kudos
Dmitry_Vyukov
Valued Contributor I
978 Views

Quoting - Raf Schietekat

"Ne supra crepidam sutor iudicaret." - Plinius

Apparently you want the cobbler to stick to his last: a programmer should stay in his cubicle and is not supposed to be thinking for himself other than within the boundaries set for him by his betters. But the serial-code comparison is flawed (it does not address the reorderings that are visible, let alone the possible problems in the context of multithreaded code), and you draw the wrong conclusions (I say that the compiler writer should have given more complete information, and you say that "as can see" it doesn't matter), so what is the relevance, other than creating a distraction?

Of course I am trying to combine both viewpoints! Why shouldn't a user be allowed to consider what might happen if a normal store follows a volatile store, and why, and what the corresponding machine code will be? I do not subscribe to the idea that specifications should be obscure, and that knowledge should trickle down a hierarchy from high priesthood to simpleton programmers. A significant number of the latter category should be able to go directly to the specification to resolve issues, and this specification should be as transparent as it can be (I have my doubts). I see contradictions between what you say and what even a FAQ says (I referred to a FAQ for programmers as well as to the Cookbook supposedly for compiler writers), and I should be able to say that you can't both be right (meant as a euphemism). You also misinterpreted the "high-level" specification, because you confused "happens-before" with observable behaviour by invoking this part of the specification for disagreeing with my example.

(To the administrators: can we sell BBB PPV tickets? :-) )


Sorry, I can't read Latin.

I don't understand what you want me to say :) I can try to make some more statements, probably I will guess:

Programmer is allowed to look at the generated machine code.

The relevance of the example with single-threaded reordering: some reorderings are visible nowhere except machine code. So of little interest to anybody except compiler developers.

Normal store may hoist above volatile store in the machine code. Normal load may sink below volatile load in the machine code.

Above fact is of little interest for programmer.

Programmer may work on a compiler and hardware platform which do not do above mentioned reordering for years. And them move to the compiler and/of hardware which do. Nothing will change for programmer.

Java volatiles do provide sequential consistency for all memory accesses (including surrounding normal memory accesses).

Happens-before is the most significant determinant of observable behaviour regarding multithreading. Because it determines "allowed to observe" ("visible side effect" or "visible sequence of side-effects") and presence/absence of data races (i.e. basically everything what memory model defines regarding multithreading).

0 Kudos
RafSchietekat
Valued Contributor III
978 Views
I took some time to read section 17 of the Java Language Specification. Jolly! Well, not quite. I still have the impression that many of the formal specifications are just magical incantations for their own sake, where more mundane descriptions, perhaps even in terms of fences, could suffice (an implementer would still be allowed to move those fences around and even eliminate some, of course, but that's just because he wants to lure customers, who can still write histheir programs against a more straightforward contract, unaware of what really happens underneath for the better performance to happen). I'm sure that the authors are very likeable when you meet them in person, but why did they feel the need to impose those theoretical formalisms on us non-academics?

One statement I found particularly dubious: "This is an extremely strong guarantee for programmers. Programmers do not need to reason about reorderings to determine that their code contains data races." The guarantee is about "A program is correctly synchronized if and only if all sequentially consistent executions are free of data races.", so this means that the programmer does not need to reason about reorderings if he just checks... all possible sequentially consistent executions!

I would certainly like to see some evidence about whether it is really necessary to describe the behaviour this way, that it's not just showing off. Now I'm left wondering if this emperor is wearing clothes at all.

"Sorry, I can't read Latin."
I was just trying to find out the English equivalent of a Dutch expression, found a literal onetranslation (just after the Latin), got suspicious, and traced it back to its origins.

"I don't understand what you want me to say"
For example, that the distinction between formal semantics and possible execution was artificial?

"Programmer is allowed to look at the generated machine code."
Not just machine code: I consider the conceptual fences just another way of representing program semantics; a library has to make conservative corresponding choices, but a compiler might be able to perform much more significant optimisations.

"Normal store may hoist above volatile store in the machine code. Normal load may sink below volatile load in the machine code."
Now you say that, but previously you insisted about volatiles that: "They order all memory accesses." They order only some (those in correctly synchronised code).

"Above fact is of little interest for programmer."
It's just one way to reason about this. You shouldn't prescribe how people should think if one formalism is as good as the other.

"Programmer may work on a compiler and hardware platform which do not do above mentioned reordering for years. And them move to the compiler and/of hardware which do. Nothing will change for programmer."
That still applies if you specify things in terms of conceptual fences etc.

"Java volatiles do provide sequential consistency for all memory accesses (including surrounding normal memory accesses)."
They help, but you still have to "synchronise correctly". By itself, the compiler is not smart enough to diagnose incorrectly synchronised code, so the specification does not guarantee sequential consistency, only something like happens-before consistency.

"Happens-before is the most significant determinant of observable behaviour regarding multithreading. Because it determines "allowed to observe" ("visible side effect" or "visible sequence of side-effects") and presence/absence of data races (i.e. basically everything what memory model defines regarding multithreading)."
But you forget that for "happens-before" to have real meaning, the observer must share a happens-before relationship with the other actions, otherwise it may observe that the normal write moves up across the volatile write. Maybe that's an assumption you made, but I'm not telepathic, I can only go by what you write.

Note that we're talking about Java here, where the semantics are less flexible and much of the specification still moderately human, but in C++ the situation seems more problematic, without good reason: unlike for Java, a perfectly good specification can be made in terms of mostly pre-existing concepts like "volatile" side effects (or is it necessary to be able to coalesce atomic accesses?), external-function-like compiler fences (or an equivalent like an empty GNU asm statement that clobbers all memory), and some low-level details about what fences can do. It doesn't need so much new stuff at the language level (one can write an atomics library with pre-C++0x facilities), so this feels more like assassination by committee than good design to me. It will likely get accepted, though, because the voters will either have drunk the Kool-Aid or don't want to admit that they don't understand it. Convincingly arguing against it seems a lot more difficult (and I don't feel inclined to go bang my head against the wall).
0 Kudos
Dmitry_Vyukov
Valued Contributor I
978 Views
"I would certainly like to see some evidence about whether it is really necessary to describe the behaviour this way, that it's not just showing off. Now I'm left wondering if this emperor is wearing clothes at all."

Propose your simpler specification.


""I don't understand what you want me to say"
For example, that the distinction between formal semantics and possible execution was artificial?"

It's not artificial. Example with single-threaded optimizations: formal semantics is than NO reordering possible (further than allowed by partial order of expressions), possible executions mock at it (just look at the compiled code).


""Programmer is allowed to look at the generated machine code."
Not just machine code: I consider the conceptual fences just another way of representing program semantics; a library has to make conservative corresponding choices, but a compiler might be able to perform much more significant optimisations."

Ok, is there ANY reason to include into specification the statement that normal stores may hoist above volatile store? Other than additional complication.


""Normal store may hoist above volatile store in the machine code. Normal load may sink below volatile load in the machine code."
Now you say that, but previously you insisted about volatiles that: "They order all memory accesses." They order only some (those in correctly synchronised code)."

These 2 statements are not conflicting. Notice "in the machine code" part.



""Above fact is of little interest for programmer."
It's just one way to reason about this. You shouldn't prescribe how people should think if one formalism is as good as the other."

I don't think that the "formalism" with fences (described in the cookbook) is as good as the specification. What comes to mind: fences say nothing about global order, Sparc RMO model is not the weakest so reasoning based on is not suitable for weaker machines (your program will crash on weaker machines), Sparc membars are not the bast way to describe semantics (for example, #LoadLoad|#LoadStore is actually much stronger than required load-acquire - it's terrible for the abstract memory model to force stronger ordering), there are more modern ISAs based on acquire/release primitives (ld.acq does NOT implements '#LoadLoad|#LoadStore' - use stronger fence!!!). I believe there are much more reasons.




""Java volatiles do provide sequential consistency for all memory accesses (including surrounding normal memory accesses)."
They help, but you still have to "synchronise correctly". By itself, the compiler is not smart enough to diagnose incorrectly synchronised code, so the specification does not guarantee sequential consistency, only something like happens-before consistency."

If there is something "not correct" in the program, then who cares? Java just had to provide semantics for "not correct" programs, but the reason is not that one must write not correct programs, but just because one don't want "non correct" applet downloaded from web format his harddrive. C++09 does not cares about "non correct" programs at all - it's just UB.



""Happens-before is the most significant determinant of observable behaviour regarding multithreading. Because it determines "allowed to observe" ("visible side effect" or "visible sequence of side-effects") and presence/absence of data races (i.e. basically everything what memory model defines regarding multithreading)."
But you forget that for "happens-before" to have real meaning, the observer must share a happens-before relationship with the other actions, otherwise it may observe that the normal write moves up across the volatile write. Maybe that's an assumption you made, but I'm not telepathic, I can only go by what you write."

And how then you interpreted it? If program contains single volatile access somewhere than it's totally sequentially consistent?


"Note that we're talking about Java here, where the semantics are less flexible and much of the specification still moderately human, but in C++ the situation seems more problematic, without good reason: unlike for Java, a perfectly good specification can be made in terms of mostly pre-existing concepts like "volatile" side effects (or is it necessary to be able to coalesce atomic accesses?), external-function-like compiler fences (or an equivalent like an empty GNU asm statement that clobbers all memory), and some low-level details about what fences can do. It doesn't need so much new stuff at the language level (one can write an atomics library with pre-C++0x facilities), so this feels more like assassination by committee than good design to me. It will likely get accepted, though, because the voters will either have drunk the Kool-Aid or don't want to admit that they don't understand it. Convincingly arguing against it seems a lot more difficult (and I don't feel inclined to go bang my head against the wall)."

Ok, propose your full specification. Then we will see which is simplier.
C++09 exactly learns from "simplier" specifications, one of which is constantly patched, and the other everybody complains too simple to be useful.

0 Kudos
RafSchietekat
Valued Contributor III
978 Views
"Propose your simpler specification."
I think that hardly anything at all is needed at the language level (KISS), just some essential details about memory locations, starting and stopping a program, etc. Sugar for thread-local storage is also nice to have, although all this still perpetuates the myth that threads are the user's way to concurrency, not just the underlying infrastructure. Most of "1.10 Multi-threaded execution" should be removed though, or relegated to a non-normative appendix. A supporting library can easily use existing properties, like compiler fences (the equivalent of calling a function whose implementation is unknown, mandating the compiler to make everything externally visible, so that inter-thread actions have access to all reachable state and cannot be erroneously optimised away, although maybe a volatile qualification in an atomic might help as well), to make this a hardware problem, and then it can use hardware instructions to tell that hardware how to behave. It has already been demonstrated that way, and it works well with the idea of C/C++ as a language close to the hardware. I think that the burden is on whoever wants to make those big changes to prove that they are required (and consistent with the possibility of code compiled or assembled from other languages, not subject to the same formal specifications, existing in the same program).

"It's not artificial. Example with single-threaded optimizations: formal semantics is than NO reordering possible (further than allowed by partial order of expressions), possible executions mock at it (just look at the compiled code)."
It hardly appears "formal" to require that execution should appear, as far as side effects are concerned, to occur in program order between sequence points. BTW, I am not advocating to specify the allowed compiled code, or exact placement of fences, just a straightforward specification instead of one that appears to be floating in the air, one that is written for programmers instead of predigested for an arguably far more limited audience. Did I miss anything in the pre-multithreading specification about code transformation into data-flow graphs and a formal specification of the set of well-formed executions? I don't think so. Some people need to reason about the code in those terms, but they should probably also be capable of reasoning in terms of an integration of the programmer's view and the compiler writer's view. Compiler vendors can still form their own A-teams to do all kinds of magic behind the scenes, based on all manner of elaborate graph theories and theorem provers, to shave off some more time here and there, but I as a user don't want to be bothered with those theories, and probably neither does a debug build.

"Ok, is there ANY reason to include into specification the statement that normal stores may hoist above volatile store? Other than additional complication."
Not specifically, but you made a big point of (erroneously) saying that they can't. It should be straightforward to shift between reasoning in terms of happens-before and in terms of reordering.

"These 2 statements are not conflicting. Notice "in the machine code" part."
There is no essential difference between source code and machine code in this regard.

"I don't think that the "formalism" with fences (described in the cookbook) is as good as the specification. What comes to mind: fences say nothing about global order, Sparc RMO model is not the weakest so reasoning based on is not suitable for weaker machines (your program will crash on weaker machines), Sparc membars are not the bast way to describe semantics (for example, #LoadLoad|#LoadStore is actually much stronger than required load-acquire - it's terrible for the abstract memory model to force stronger ordering), there are more modern ISAs based on acquire/release primitives (ld.acq does NOT implements '#LoadLoad|#LoadStore' - use stronger fence!!!). I believe there are much more reasons."
How do you mean that "fences say nothing about global order" (surely the language has to be executed at some point)? Maybe the Cookbook is oriented toward a particular machine model, but is there any reason why that objection could not (easily) be met by substituting more specific abstract fences, e.g., let's call it "anchored acquire" (where the acquire specifies the variable on which the corresponding release must have occurred), and giving a similar list of conservative approximations? But what situation (in Java!) would be able to make use of something less powerful than #LoadLoad|#LoadStore, if #LoadLoad is apparently already implicitly used as anchored (ld.acq, cheaper than an unanchored #LoadLoad, right?): at first sight Itanium will be actually more expensive because #LoadLoad|#LoadStore seems to require adding mf (semi-rhetorical question)?

"If there is something "not correct" in the program, then who cares?"
As long as the compiler does not reject programs that are not correctly synchronized by its own definition, I would like to know that the behaviour is still defined, so that I can reason about what it will do. As a human, I may even be able to rule out possible races where even elaborate data flow analysis normally associated with optimising builds would not suffice (I concede that it may be more likely that a tool will find a human error than that a false positive occurs, but still...). Maybe the observing code does not participate in the main logic and can be cheaper without correct synchronisation (pure guess)? In short, I don't want to have to use a tool to prove correctness and hold my hand just to write some basic multithreaded code, even though I still want to have the option to use one like your own if I find myself in deep water.

"Java just had to provide semantics for "not correct" programs, but the reason is not that one must write not correct programs, but just because one don't want "non correct" applet downloaded from web format his harddrive. C++09 does not cares about "non correct" programs at all - it's just UB."
Sounds worrisome... What is "UB"?

"And how then you interpreted it? If program contains single volatile access somewhere than it's totally sequentially consistent?"
No, but I don't want to have to prove that the program is correctly synchronised just to create the illusion that its entire execution is sequentially consistent. Or at least I don't want a recommendation to correctly synchronise to also be a gag order about non-compliant code.

"Ok, propose your full specification. Then we will see which is simplier.
C++09 exactly learns from "simplier" specifications, one of which is constantly patched, and the other everybody complains too simple to be useful."
I gather that if fate had wanted me to play a role in the C++0x memory model, somebody would have told me to put down James Reinders' book (that's just for dramatic effect: I don't even remember whether I had it yet) and scoot over to where this was being voted into the working draft (that you can take literally). Also, a year and a half has passed since then, so I don't hold much hope for anything to come out of this, other than maybe some mental exercise and the opportunity to vent. So here goes:

Has anybody demonstrated any significant performance benefit of the present convoluted proposal relative to a simple one that uses the already existing full compiler fences (opaque calls to empty functions)? If not, what is the point, other than enabling the compiler vendors to say, like Rene, when he was caught in a compromising situation with Yvette, in the delightful "'Allo 'Allo!": "You stupid programmer! Don't you see that evaluation A is not dependency-ordered before evaluation B, because, even though A performs a release operation on M, and B performs a consume operation on M, it does not read any value written by any side effect in the release sequence headed by A, and neither is there an evaluation X that carries a dependency to B with A dependency-ordered before X? So how could you possibly think that A inter-thread happens before B?"? What is the point of making contracts this difficult (while none of STL is officially thread safe yet, in N2800 anyway, not even evaluating vector::operator[]), if not to hide behind them? Who benefits, other than the lawyers? Why should I not be bored by now with semantics (for all the right reasons), and only concerned with data structures and algorithms?

But if somebody can promise me a fair chance at influencing the outcome, I'll write up a counterproposal in full detail. Or is there something that already documents why it would not work, after all?
0 Kudos
RafSchietekat
Valued Contributor III
978 Views
Here's a flipped situation that N2800 curiously provides itself (in 29.1 Order and Consistency):

[ Note: The requirements do allow r1 == r2 == 42 in the following example, with x and y initially zero:

// Thread 1:
r1 = x.load(memory_order_relaxed);
if (r1 == 42) y.store(r1, memory_order_relaxed);

// Thread 2:
r2 = y.load(memory_order_relaxed);
if (r2 == 42) x.store(42, memory_order_relaxed);

However, implementations should not allow such behavior. - end note ]

Even disregarding the matter of that exhortation's appropriateness (how would you know that your code will survive in the wild if you let it grow up in a pampered environment?), it is probably entirely superfluous, or should be: why would any implementation even accidentally arrive at this result? This seems like just a manifestation of the craziness of the rules (and I don't mean crazy like a fox). In my book, writing to an atomic, even a relaxed one, would be like accessing a volatile, which would disallow speculation and hence out-of-thin-air outcomes (well, I hope?). That would rule out any other outcome than r1 == r2 == 0, and I don't understand by what twists and turns of perverted logic one was posited: how would two threads possibly interact (through the coherent cache and/or context switches!) for speculative execution in the multiverse to come up with this self-fulfilling prophecy and commit to its outcome? I can imagine speculative local writes, with support to prevent circular reasoning, but surely these writes would not be set free in an environment where such support is absent? Does anybody know a hardware company that might have somebody to enlighten us?

At some point I'll have to tune it down again, of course, but right now I'm saving on therapy sessions, and I'm hoping it's less boring this way. If not, please just tell me. Meanwhile, I have of course condemned myself to go deeper into this, anyway, and the scientific method dictates that I should try and find good reasons for this state of affairs, so I shall spend some more time reading, including some papers that were referred to me (or was I referred to them?). Hmm, hopefully at the end of all this I won't have to come in twice a week instead...
0 Kudos
Reply