Beginner
4 Views

A question about acquire/release semantic

I have a question about the acquire/release semantic.

According to the Intel TBB Reference Manual, "acquire" means that operations after the atomic operation never move over it, and "release" means that operations before the atomic operation never move over it.

What is the real meaning of "acquire" and "release"? For example,


atomic<bool> ready;
int msg;

// P1:
msg = 14;
ready = true;      // store with release

// P2:
while (!ready) ;   // read with acquire
int a = msg;
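In C++11 std::atomic terms the same pattern can be sketched as below (an assumption of mine for illustration: the thread's example uses TBB's atomic, whose default operations carry the same acquire/release semantics; the helper run_once is hypothetical):

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> ready{false};
int msg = 0;

// Returns the value P2 reads from msg after it observes ready == true.
int run_once() {
    std::thread p1([] {
        msg = 14;                                      // ordinary store
        ready.store(true, std::memory_order_release);  // store with release
    });
    int a = 0;
    std::thread p2([&a] {
        while (!ready.load(std::memory_order_acquire)) // read with acquire
            ;                                          // spin until published
        a = msg;  // release/acquire pairing guarantees this sees 14
    });
    p1.join();
    p2.join();
    return a;
}
```

The release store "publishes" every earlier write, and any acquire load that sees ready == true synchronizes with it, so a == 14 is guaranteed by the memory model rather than by any particular cache layout.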

Does it mean that "msg = 14" has been committed to memory before "ready = true"?

What happens if P1 and P2 have a shared cache? Can we use the shared cache as a communication channel?

Thanks.
0 Kudos
13 Replies
Black Belt

"Does it mean that "msg = 14" has been committed to memory before "ready = true"?"
Things will appear to be so between P1 and P2, even if a P3 might beg to differ. Forget about "memory": it might all be smoke and mirrors, an illusion carefully crafted by coherent-cache logic, but only for the participants who follow the rules.

"What happened if P1 and P2 have a shared cache? Can we use the shared cache as a communication channel?"
If you really want to write nonportable code, then in addition to knowing that there are no noncoherent caches, you would at least also need to know that stores are not mutually reordered in P1's processor's write buffer, and that reads are not mutually reordered (due to prefetching or the like) in P2's processor. If you have such an architecture, acquire and release will probably not cost any extra, because everything already does anyway, so you might as well just use them and assume nothing about the environment.
Beginner

"Things will appear to be so between P1 and P2, even if a P3 might beg to differ. Forget about "memory": ..."


So whether "operations after/before the atomic operation never move over it" is achieved depends on the system? Things are just made to appear so, no matter how that is accomplished? Is that what you meant?


"If you really want to write nonportable code, in addition to knowing that there would not be several noncoherent caches..."

Do you mean that if an architecture has no noncoherent caches and does not reorder read/write operations, then we can use the shared cache as a communication channel?

Sorry, I don't totally understand what you said. Could you explain a bit more? Thanks.

Or maybe I should ask in another way: is it possible to ensure correctness with atomic and enjoy the benefits brought by the shared cache at the same time?

Thanks again.
Valued Contributor I

Quoting azru0512
Or maybe I should ask in another way, is it possible that we can ensure correctness with atomic and enjoy the benefits brought by shared cache at the same time?

The shared cache, if present, is always used. All stores (plain, atomic, or otherwise) go to the cache on write-back memory types.

Black Belt

"Or maybe I should ask in another way, is it possible that we can ensure correctness with atomic and enjoy the benefits brought by shared cache at the same time?"
Intel Architecture keeps writes mutually ordered, keeps reads mutually ordered, and has a coherent cache. The implementation of atomic therefore has nothing else to do (for these specific operations, anyway!) than prevent the compiler from being too smart for the situation: the generated machine code looks no different from serial code, it just won't be optimised so aggressively that it no longer does what you want. So you have no disincentive to writing portable code using atomic there. (Don't use volatile: its meaning is compiler-specific.)

Beginner

So in the above example, "msg = 14" reaches the cache earlier than "ready = true". Is that what you meant?
Black Belt

"So in the above example, "msg = 14" reaches the cache earlier than "ready = true". Is that what you meant?"
No, why? I meant that if, on a particular architecture, you are already paying extra during normal operation by foregoing possible instruction-reordering optimisations, then atomic release-store and load-acquire are not going to add more overhead (even if other kinds of atomic operations still might), and they will save the day when you move to any more adventurous architecture, even Intel's own Itanium, so I don't see any reason not to use them.

(Added 2010-07-30) Whoops, sorry, I understood you the other way around, as you noticed in #8! Allow me to rephrase: yes. :-) Well, sort of, because "the cache" would be an illusion created by the cache-coherence logic.
Valued Contributor I

Quoting azru0512
So in the above example, "msg = 14" reaches the cache earlier than "ready = true".

Indeed. The cache is where the "memory subsystem" begins, from the point of view of an execution core on a modern architecture.

Beginner

Maybe you misunderstood what I said.

I meant: in the above example (where ready is an ordered atomic variable), does "msg = 14" reach the cache earlier than "ready = true"?
Valued Contributor I

Quoting azru0512
I mean in the above example (ready is an ordered atomic variable), does "msg = 14" reach the cache earlier than "ready = true"?

Yes, "msg = 14" should reach the cache (memory subsystem) earlier than "ready = true" (release semantics for 'ready').

And accordingly, on the consumer side, the request to load 'ready' should reach the cache (memory subsystem) earlier than the request to load 'msg' (acquire semantics for 'ready').

There may be some user-invisible speculation at the implementation level, though.
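The producer/consumer ordering described above can also be expressed with standalone fences and relaxed atomic accesses. A minimal C++11 sketch (std::atomic and the helper run_once are my assumptions for illustration, not the thread's original TBB code):

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> ready{false};
int msg = 0;

int run_once() {
    std::thread p1([] {  // producer
        msg = 14;
        // Release fence: orders the msg store before the ready store.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });
    int a = 0;
    std::thread p2([&a] {  // consumer
        while (!ready.load(std::memory_order_relaxed))
            ;  // spin until the flag is observed
        // Acquire fence: orders the ready load before the msg load.
        std::atomic_thread_fence(std::memory_order_acquire);
        a = msg;
    });
    p1.join();
    p2.join();
    return a;
}
```

The fences enforce exactly the request ordering described in the reply above: the store to msg is made visible no later than the store to ready, and the load of ready completes before the load of msg.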

Beginner

As mentioned by Tian in the following article,

http://www.drdobbs.com/high-performance-computing/196902836

there is an opportunity for the shared cache to act as a communication channel between cores. And this means "msg" can be transmitted through the shared cache.

But if my understanding of Tip #5 in the above article is right, it is possible that the "msg" written by P1 does not reach the shared cache in time for P2 to catch it (i.e., P2 takes a cache miss). Am I right?

Thanks.
Valued Contributor I

Quoting azru0512
But if my understanding of Tip #5 in the above article is right, it is possible that the "msg" written by P1 does not reach the shared cache in time for P2 to catch it (i.e., a cache miss). Am I right?

In general, yes: communication via a shared cache can be efficient or "not so efficient". The details are involved; it depends on whether the system has an exclusive, inclusive, or hybrid cache, whether it has an Owner cache state, etc. I'm unable to go that deep; perhaps you will get more definitive answers if you ask on comp.arch.

Beginner

Thanks anyway. : )
Black Belt

I just added this to #6: "(Added 2010-07-30) Whoops, sorry, I understood you the other way around, as you noticed in #8! Allow me to rephrase: yes. :-) Well, sort of, because "the cache" would be an illusion created by the cache coherence logic."