A question about acquire/release semantics

azru0512 · ‎07-28-2010

I have a question about acquire/release semantics.

According to the Intel TBB Reference manual, "acquire" means that operations after the atomic operation never move over it, and "release" means that operations before the atomic operation never move over it.

What is the real meaning of "acquire" and "release"? For example,

atomic<bool> ready;
int msg;

P1                                   P2

msg = 14;                            while (!ready) ;  // read with acquire
ready = true; // store with release  int a = msg;

Does it mean "msg = 14" has been committed to memory before "ready = true"?

What happens if P1 and P2 have a shared cache? Can we use the shared cache as a communication channel?

Thanks.

RafSchietekat · 07-28-2010

"Does it mean "msg = 14" has been committed to memory before "ready = true"?"
Things will appear to be so between P1 and P2, even if a P3 might beg to differ. Forget about "memory": it might all be smoke and mirrors, an illusion carefully crafted by coherent-cache logic, but only for the participants who follow the rules.

"What happens if P1 and P2 have a shared cache? Can we use the shared cache as a communication channel?"
If you really want to write nonportable code, in addition to knowing that there would not be several noncoherent caches you would at least also need to know that stores are not mutually reordered in P1's processor's write buffer, and that reads are not mutually reordered due to prefetching or so in P2's processor. If you have such an architecture, acquire and release will probably not be more expensive, because everything already is anyway, so you might as well just use them and not assume anything about the environment.
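
For reference, the portable form of the P1/P2 example from the question might look like the sketch below. It assumes C++11 `std::atomic` in place of `tbb::atomic`, and the helper name `run_message_passing` is hypothetical, introduced only for illustration. The point is that the release store and acquire load are written explicitly, so nothing is assumed about caches or write buffers.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the thread): runs P1 and P2 as two threads
// and returns the value that P2 read from msg.
int run_message_passing() {
    std::atomic<bool> ready(false);  // the flag from the question
    int msg = 0;                     // plain, non-atomic payload
    int a = 0;                       // what P2 ends up reading

    std::thread p2([&] {
        // Load with acquire: the read of msg below cannot move above it.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        a = msg;
    });

    std::thread p1([&] {
        msg = 14;
        // Store with release: the write to msg above cannot move below it.
        ready.store(true, std::memory_order_release);
    });

    p1.join();
    p2.join();
    assert(a == 14);  // guaranteed by the release/acquire pairing on ready
    return a;
}
```

Calling `run_message_passing()` from `main` should return 14 on any conforming implementation, whether or not the two threads happen to share a cache.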

azru0512 · ‎07-29-2010

"Things will appear to be so between P1 and P2, even if a P3 might beg to differ. Forget about "memory": ..."

So how "operations after/before the atomic operation never move over it" is achieved depends on the system? It just has to make things appear to be so, no matter how that is done? Is that what you meant?

"If you really want to write nonportable code, in addition to knowing that there would not be several noncoherent caches..."

Do you mean that if an architecture has non-coherent caches and does not reorder read/write operations, then we can use the shared cache as a communication channel?

Sorry, I don't totally understand what you said. Could you give me more explanation? Thanks.

Or maybe I should ask in another way: is it possible to ensure correctness with atomic and enjoy the benefits brought by the shared cache at the same time?

Thanks again.

Dmitry_Vyukov · ‎07-29-2010

Quoting azru0512

Or maybe I should ask in another way: is it possible to ensure correctness with atomic and enjoy the benefits brought by the shared cache at the same time?

Shared cache if present is always used. All stores (plain, atomic or whatever) always go to cache on write-back memory type.

RafSchietekat · 07-29-2010

"Or maybe I should ask in another way: is it possible to ensure correctness with atomic and enjoy the benefits brought by the shared cache at the same time?"
Intel Architecture keeps writes mutually ordered, reads mutually ordered, and has a coherent cache. The implementation of atomic has nothing else to do (for these specific operations anyway!) than preventing the compiler from being too smart for the situation (the generated machine code will not contain anything that looks any different from serial code, it just won't be optimised so much that it won't do what you want anymore), so you have no disincentive from writing portable code using atomic there. (Don't use volatile: that's compiler-specific.)

azru0512 · ‎07-29-2010

So in the above example, "msg = 14" reaches the cache earlier than "ready = true". Is that what you meant?

RafSchietekat · 07-29-2010

"So in the above example, "msg = 14" reaches the cache earlier than "ready = true". Is that what you meant?"
No, why? I meant that if on a particular architecture you're already paying extra during normal operation by foregoing possible optimisations related to reordering instructions, atomic release-store and load-acquire are not going to add more overhead (even if other kinds of atomic operations still might), and they will save the day when you move to any more adventurous architecture, even Intel's own Itanium, so I don't see any reason not to use them.

(Added 2010-07-30) Whoops, sorry, I understood you the other way around, as you noticed in #8! Allow me to rephrase: yes. :-) Well, sort of, because "the cache" would be an illusion created by the cache coherence logic.

Dmitry_Vyukov · ‎07-29-2010

Quoting azru0512

So in the above example, "msg = 14" reaches the cache earlier than "ready = true".

Indeed. It's cache where "memory subsystem" begins from the point of view of an execution core in a modern architecture.

azru0512 · ‎07-30-2010

Maybe you misunderstood what I said.

I mean, in the above example (ready is an ordered atomic variable), does "msg = 14" reach the cache earlier than "ready = true"?

Dmitry_Vyukov · ‎07-30-2010

Quoting azru0512

I mean, in the above example (ready is an ordered atomic variable), does "msg = 14" reach the cache earlier than "ready = true"?

Yes, "msg=14" should reach cache (memory subsystem) earlier than "ready=true" (release semantics for 'ready').

And accordingly on the consumer side request for loading 'ready' should reach cache (memory subsystem) earlier than request for loading 'msg' (acquire semantics for 'ready').

There may be some user-invisible speculations on an implementation level, though.
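
The two ordering requirements described above can also be written out as explicit fences, which makes it visible where each constraint sits. This is only a sketch under assumptions: it uses C++11 `std::atomic_thread_fence` with relaxed accesses to the flag rather than anything from TBB, and the helper name `run_with_fences` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the thread): same P1/P2 exchange, but the
// ordering is imposed by standalone fences instead of by the flag accesses.
int run_with_fences() {
    std::atomic<bool> ready(false);
    int msg = 0;  // plain, non-atomic payload
    int a = 0;    // what the consumer ends up reading

    std::thread consumer([&] {
        // The flag itself is read with relaxed ordering...
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // ...and the acquire fence keeps the load of msg after the load of ready.
        std::atomic_thread_fence(std::memory_order_acquire);
        a = msg;
    });

    std::thread producer([&] {
        msg = 14;
        // The release fence keeps the store to msg before the store to ready.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    producer.join();
    consumer.join();
    assert(a == 14);  // the release fence synchronizes with the acquire fence
    return a;
}
```

In everyday code the release store and acquire load on the flag itself are usually the simpler choice; the fence form is shown only because it mirrors the "producer side" and "consumer side" wording of the explanation.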

azru0512 · ‎07-30-2010

As mentioned by Tian in the following article,

http://www.drdobbs.com/high-performance-computing/196902836

there is an opportunity for the shared cache to act as a communication channel between cores. This means "msg" can be transmitted through the shared cache.

But if my understanding of Tip #5 in the above article is right, it is possible that "msg" written by P1 does not reach the shared cache in time for P2 to pick it up (i.e., a cache miss). Am I right?

Thanks.

Dmitry_Vyukov · ‎07-30-2010

Quoting azru0512

But if my understanding of Tip #5 in the above article is right, it is possible that "msg" written by P1 does not reach the shared cache in time for P2 to pick it up (i.e., a cache miss). Am I right?

In general, yes: communication via a shared cache can be efficient or "not so efficient". The details are involved; it depends on whether the system has an exclusive, inclusive or hybrid cache, whether it has an Owner cache state, etc. I'm unable to go that deep; perhaps you will get more definitive answers if you ask on comp.arch.

azru0512 · ‎07-30-2010

Thanks anyway. : )

RafSchietekat · ‎07-30-2010

I just added this to #6: "(Added 2010-07-30) Whoops, sorry, I understood you the other way around, as you noticed in #8! Allow me to rephrase: yes. :-) Well, sort of, because "the cache" would be an illusion created by the cache coherence logic."