Intel® oneAPI Threading Building Blocks

A question about acquire/release semantic

azru0512
Beginner
I have a question about the acquire/release semantic.

According to the Intel TBB Reference Manual, "acquire" means that operations after the atomic operation never move over it, and "release" means that operations before the atomic operation never move over it.

What is the real meaning of "acquire" and "release"? For example:


atomic<bool> ready;
int msg;

// P1
msg = 14;
ready = true;       // store with release

// P2
while (!ready) ;    // read with acquire
int a = msg;
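
(For concreteness, here is a compilable version of this sketch. It assumes classic tbb::atomic from tbb/atomic.h, for which the Reference Manual gives a plain read acquire semantics and a plain write release semantics; the p1/p2 thread scaffolding is just for illustration.)

#include "tbb/atomic.h"
#include <cassert>
#include <thread>

tbb::atomic<bool> ready;   // namespace-scope atomics are zero-initialized (false)
int msg;

void p1() {                // producer
    msg = 14;              // plain store, ordered before the release below
    ready = true;          // write to tbb::atomic: release semantics by default
}

void p2() {                // consumer
    while (!ready)         // read of tbb::atomic: acquire semantics by default
        ;                  // spin until the flag is observed
    int a = msg;           // guaranteed to see 14
    assert(a == 14);
}

int main() {
    std::thread consumer(p2), producer(p1);
    producer.join();
    consumer.join();
}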

Does it mean that "msg = 14" has been committed to memory before "ready = true"?

What happens if P1 and P2 have a shared cache? Can we use the shared cache as a communication channel?

Thanks.
RafSchietekat
Valued Contributor III
"Does it mean "msg = 14" have committed to the memory before "ready = true"?"
Things will appear to be so between P1 and P2, even if a P3 might beg to differ. Forget about "memory": it might all be smoke and mirrors, an illusion carefully crafted by coherent-cache logic, but only for the participants who follow the rules.

"What happened if P1 and P2 have a shared cache? Can we use the shared cache as a communication channel?"
If you really want to write nonportable code, in addition to knowing that there would not be several noncoherent caches you would at least also need to know that stores are not mutually reordered in P1's processor's write buffer, and that reads are not mutually reordered due to prefetching or so in P2's processor. If you have such an architecture, acquire and release will probably not be more expensive, because everything already is anyway, so you might as well just use them and not assume anything about the environment.
azru0512
Beginner

"Things will appear to be so between P1 and P2, even if a P3 might beg to differ. Forget about "memory": ..."


So how "operations after/before the atomic operation never move over it" is achieved depends on the system? Things are just made to appear that way, no matter how it is actually done? Is that what you meant?


"If you really want to write nonportable code, in addition to knowing that there would not be several noncoherent caches..."

Do you mean that if an architecture has no non-coherent caches and does not reorder read/write operations, then we can use the shared cache as a communication channel?

Sorry, I don't totally understand what you said. Could you explain a bit more? Thanks.

Or maybe I should ask in another way: is it possible to ensure correctness with atomic and enjoy the benefits brought by the shared cache at the same time?

Thanks again.
Dmitry_Vyukov
Valued Contributor I
Quoting azru0512
Or maybe I should ask in another way: is it possible to ensure correctness with atomic and enjoy the benefits brought by the shared cache at the same time?

A shared cache, if present, is always used. All stores (plain, atomic, or whatever) always go to the cache for write-back memory types.

RafSchietekat
Valued Contributor III

"Or maybe I should ask in another way, is it possible that we can ensure correctness with atomic and enjoy the benifits brought by shared cache at the same time?"
Intel Architecture keeps writes mutually ordered, reads mutually ordered, and has a coherent cache. The implementation of atomic has nothing else to do (for these specific operations anyway!) than preventing the compiler from being too smart for the situation (the generated machine code will not contain anything that looks any different from serial code, it just won't be optimised so much that it won't do what you want anymore), so you have no disincentive from writing portable code using atomic there. (Don't use volatile: that's compiler-specific.)
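
To illustrate (a sketch, not from the thread: it assumes a GCC/Clang-style compiler and an x86-like target, and the helper names are mine), a load-acquire and a store-release on such a machine need no fence instruction at all, only a barrier that restrains the compiler:

// Hypothetical minimal acquire/release for an x86-like target, where the
// hardware already keeps loads ordered with loads and stores with stores,
// so only the compiler has to be restrained from reordering.

static inline int load_acquire(const volatile int* p) {
    int v = *p;                               // plain MOV on x86
    __asm__ __volatile__("" ::: "memory");    // compiler-only barrier: emits no instruction
    return v;
}

static inline void store_release(volatile int* p, int v) {
    __asm__ __volatile__("" ::: "memory");    // keep earlier accesses above this point
    *p = v;                                   // plain MOV on x86
}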

azru0512
Beginner
So in the above example, "msg = 14" reaches the cache earlier than "ready = true". Is that what you meant?
RafSchietekat
Valued Contributor III
"So in above example, "msg = 14" reach cache eariler than "ready = true". Is that what you meaned?"
No, why? I meant that if on a particular architecture you're already paying extra during normal operation by foregoing possible optimisations related to reordering instructions, atomic release-store and load-acquire are not going to add more overhead (even if other kinds of atomic operations still might), and they will save the day when you move to any more adventurous architecture, even Intel's own Itanium, so I don't see any reason not to use them.

(Added 2010-07-30) Whoops, sorry, I understood you the other way around, as you noticed in #8! Allow me to rephrase: yes. :-) Well, sort of, because "the cache" would be an illusion created by the cache coherence logic.
Dmitry_Vyukov
Valued Contributor I
Quoting azru0512
So in the above example, "msg = 14" reaches the cache earlier than "ready = true".

Indeed. The cache is where the "memory subsystem" begins, from the point of view of an execution core in a modern architecture.

azru0512
Beginner
Maybe you misunderstood what I said.

I mean, in the above example (where ready is an ordered atomic variable), whether "msg = 14" reaches the cache earlier than "ready = true".
Dmitry_Vyukov
Valued Contributor I
Quoting azru0512
I mean, in the above example (where ready is an ordered atomic variable), whether "msg = 14" reaches the cache earlier than "ready = true".

Yes, "msg=14" should reach cache (memory subsystem) earlier than "ready=true" (release semantics for 'ready').

And accordingly on the consumer side request for loading 'ready' should reach cache (memory subsystem) earlier than request for loading 'msg' (acquire semantics for 'ready').

There may be some user-invisible speculations on an implementation level, though.
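
Written out with explicit ordering tags (a sketch using C++11 std::atomic, which postdates this thread but expresses exactly this producer/consumer pairing):

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> ready(false);
int msg = 0;

int main() {
    std::thread producer([] {
        msg = 14;                                        // must reach the memory subsystem first
        ready.store(true, std::memory_order_release);    // earlier stores may not sink below this
    });
    std::thread consumer([] {
        while (!ready.load(std::memory_order_acquire))   // later loads may not hoist above this
            ;                                            // spin until the flag is observed
        assert(msg == 14);                               // guaranteed by the acquire/release pair
    });
    producer.join();
    consumer.join();
}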

azru0512
Beginner
As mentioned by Tian in the following article,

http://www.drdobbs.com/high-performance-computing/196902836

there is an opportunity for the shared cache to act as a communication channel between cores. And this means "msg" can be transmitted through the shared cache.

But if my understanding of Tip #5 in the above article is right, it is possible that "msg" written by P1 does not reach the shared cache in time for P2 to pick it up (i.e., a cache miss). Am I right?

Thanks.
Dmitry_Vyukov
Valued Contributor I
Quoting azru0512
But if my understanding of Tip #5 in the above article is right, it is possible that "msg" written by P1 does not reach the shared cache in time for P2 to pick it up (i.e., a cache miss). Am I right?

In general, yes, communication via a shared cache can be efficient or "not so efficient". The details are involved: it depends on whether the system has an exclusive, inclusive, or hybrid cache, whether it has an Owner cache state, and so on. I'm unable to go that deep; perhaps you will get more definitive answers if you ask on comp.arch.

azru0512
Beginner
Thanks anyway. : )
RafSchietekat
Valued Contributor III
I just added this to #6: "(Added 2010-07-30) Whoops, sorry, I understood you the other way around, as you noticed in #8! Allow me to rephrase: yes. :-) Well, sort of, because "the cache" would be an illusion created by the cache coherence logic."