According to the Intel TBB Reference manual, "acquire" means that operations after the atomic operation never move over it, and "release" means that operations before the atomic operation never move over it.
What is the real meaning of "acquire" and "release"? For example,
P1:
msg = 14;
ready = true; // store with release

P2:
while (!ready) ; // read with acquire
int a = msg;
Does it mean "msg = 14" has been committed to memory before "ready = true"?
What happens if P1 and P2 have a shared cache? Can we use the shared cache as a communication channel?
Things will appear to be so between P1 and P2, even if a P3 might beg to differ. Forget about "memory": it might all be smoke and mirrors, an illusion carefully crafted by coherent-cache logic, but only for the participants who follow the rules.
"What happens if P1 and P2 have a shared cache? Can we use the shared cache as a communication channel?"
If you really want to write nonportable code, in addition to knowing that there would not be several noncoherent caches you would at least also need to know that stores are not mutually reordered in P1's processor's write buffer, and that reads are not mutually reordered due to prefetching or so in P2's processor. If you have such an architecture, acquire and release will probably not be more expensive, because everything already is anyway, so you might as well just use them and not assume anything about the environment.
"Things will appear to be so between P1 and P2, even if a P3 might beg to differ. Forget about "memory": ..."
So how "operations after/before the atomic operation never move over it" is achieved depends on the system? The system just has to make things appear to be so, no matter how that is done? Is that what you meant?
"If you really want to write nonportable code, in addition to knowing that there would not be several noncoherent caches..."
Do you mean that if an architecture has non-coherent caches and does not reorder read/write operations, then we can use the shared cache as a communication channel?
Sorry, I don't totally understand what you said. Could you give me more explanation? Thanks.
Or maybe I should ask in another way, is it possible that we can ensure correctness with atomic
"Or maybe I should ask in another way, is it possible that we can ensure correctness with atomic
No, why? I meant that if on a particular architecture you're already paying extra during normal operation by foregoing possible optimisations related to reordering instructions, atomic release-store and load-acquire are not going to add more overhead (even if other kinds of atomic operations still might), and they will save the day when you move to any more adventurous architecture, even Intel's own Itanium, so I don't see any reason not to use them.
(Added 2010-07-30) Whoops, sorry, I understood you the other way around, as you noticed in #8! Allow me to rephrase: yes. :-) Well, sort of, because "the cache" would be an illusion created by the cache coherence logic.
Yes, "msg=14" should reach cache (memory subsystem) earlier than "ready=true" (release semantics for 'ready').
And accordingly on the consumer side request for loading 'ready' should reach cache (memory subsystem) earlier than request for loading 'msg' (acquire semantics for 'ready').
There may be some user-invisible speculations on an implementation level, though.
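To make the ordering above concrete, here is a minimal sketch of the same P1/P2 handoff. It uses C++11 std::atomic instead of tbb::atomic, which is a substitution on my part so that the example is self-contained; the function names producer, consumer and run_handoff are hypothetical and only there for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int msg = 0;                     // plain (non-atomic) payload
std::atomic<bool> ready(false);  // flag that publishes the payload

// P1: write the payload, then publish it with a release store.
// The write to msg may not be moved below the store to ready.
void producer() {
    msg = 14;
    ready.store(true, std::memory_order_release);
}

// P2: spin until the flag is observed with an acquire load, then read msg.
// The read of msg may not be moved above the load of ready.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return msg;
}

// Hypothetical driver: runs P1 and P2 on two threads, returns what P2 read.
int run_handoff() {
    msg = 0;
    ready.store(false, std::memory_order_relaxed);
    int a = -1;
    std::thread p2([&a] { a = consumer(); });
    std::thread p1(producer);
    p1.join();
    p2.join();
    assert(a == 14);  // guaranteed by the release/acquire pairing on ready
    return a;
}
```

Calling run_handoff() from main should return 14 on any conforming implementation, whether or not P1 and P2 happen to share a cache; the guarantee comes from the release/acquire pairing, not from where the data physically sits.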
There is an opportunity for the shared cache to be a communication channel between cores. This means "msg" can be transmitted through the shared cache.
But if my understanding of Tip #5 in the above article is right, it is possible that "msg" written by P1 does not reach the shared cache in time for P2 to pick it up (i.e., a cache miss). Am I right?
In general, yes, communication via shared cache can be efficient or "not so efficient". The details are involved: it depends on whether a system has an exclusive, inclusive or hybrid cache, whether a system has an Owner cache state, etc. I'm unable to go that deep; perhaps you will get more definitive answers if you ask over at comp.arch.