On VC8, I see that __TBB_store_with_release() and __TBB_load_with_acquire() are both implemented with _ReadWriteBarrier(). Having just learned about memory barriers and such, I have a question about this: can __TBB_store_with_release() use _WriteBarrier() instead, and similarly _ReadBarrier() for __TBB_load_with_acquire()?
Thanks!
1 Solution
Quoting - e4lam
On VC8, I see that __TBB_store_with_release() and __TBB_load_with_acquire() are both implemented with _ReadWriteBarrier(). Having just learned about memory barriers and such, I have a question about this: can __TBB_store_with_release() use _WriteBarrier() instead, and similarly _ReadBarrier() for __TBB_load_with_acquire()?
No, they can't.
A read barrier is more or less orthogonal to an acquire barrier. An acquire barrier prevents all memory accesses (i.e. both reads and writes) from being hoisted above the load, whereas a read barrier only prevents reads on one side of the barrier from intermixing with reads on the other side. The same goes for a write barrier, which only orders writes.
However, IMHO, fine-grained compiler fences are mostly useless: they affect only the compiler, so they have basically zero run-time cost. So IMHO it's OK to put the strongest full compiler fence everywhere.
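As a rough illustration of the approach described in the question (a full compiler fence on both helpers), here is a minimal sketch. It is not TBB's actual source: the names `SKETCH_COMPILER_FENCE`, `sketch_store_with_release` and `sketch_load_with_acquire` are hypothetical, and the GCC/Clang branch (an empty `asm` with a `"memory"` clobber) is an assumed counterpart to `_ReadWriteBarrier()`. The fence only restrains the compiler, so the sketch assumes a target such as x86 where the hardware already orders ordinary loads and stores strongly enough.

```cpp
// Full compiler-only fence: stops the compiler from moving memory accesses
// across this point, but emits no machine instruction.
#if defined(_MSC_VER)
#include <intrin.h>
#define SKETCH_COMPILER_FENCE() _ReadWriteBarrier()
#else
// Assumption: GCC/Clang-style inline asm as the counterpart on other compilers.
#define SKETCH_COMPILER_FENCE() __asm__ __volatile__("" ::: "memory")
#endif

// Hypothetical helper: accesses written before the call may not be moved
// below the store by the compiler.
template <typename T>
inline void sketch_store_with_release(volatile T& location, T value) {
    SKETCH_COMPILER_FENCE();
    location = value;
}

// Hypothetical helper: accesses written after the call (reads AND writes)
// may not be moved above the load by the compiler.
template <typename T>
inline T sketch_load_with_acquire(const volatile T& location) {
    T value = location;
    SKETCH_COMPILER_FENCE();
    return value;
}
```

Swapping the fence in the load helper for a read-only barrier would leave the compiler free to move a later write above the load, which is the point made in the answer above.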
25 Replies
"No, they can't."
I would say that the answer is yes, but maybe you know something that I don't (or that I have forgotten again)?
"Read barrier is a kind of orthogonal to acquire barrier. While acquire barrier prevents all memory accesses (i.e. both reads and writes) to hoist above the load, read barrier prevents reads on one side of the barrier to intermix with reads on the other side of the barrier. The same for write barrier."
Can you quote the specification for these functions (maybe _ReadBarrier(), _WriteBarrier() and _ReadWriteBarrier() are all just compiler fences?), and clarify what you mean exactly with "hoist" and "intermix" (maybe "hoist" for C++ vs. execution and "intermix" for C++ vs. machine code?)?
"However, IMHO, fine-grained precise compiler fences are mostly useless, because they affect only compiler, so have basically zero run-time cost. So IMHO it's Ok to put the strongest full compiler fence everywhere."
Even if they only affect the compiler without causing any specific instruction to be emitted (on a specific architecture, notably x86!), their cost and/or effect may not be zero, because they could, at least conceivably, be preventing an optimisation reordering that would otherwise corrupt the program, so I wouldn't call them "useless" (that may be clear to you, but you have to keep your audience in mind when you write such things). By the same logic, perhaps a weaker compiler fence might allow a "partial optimisation" to still occur (subject to testing), so indiscriminately putting the strongest compiler fence everywhere might not be appropriate, even if it would be a conservative approximation (conserving correctness, I mean).
Quoting - Raf Schietekat
"No, they can't."
I would say that the answer is yes, but maybe you know something that I don't (or that I have forgotten again)?
"Read barrier is a kind of orthogonal to acquire barrier. While acquire barrier prevents all memory accesses (i.e. both reads and writes) to hoist above the load, read barrier prevents reads on one side of the barrier to intermix with reads on the other side of the barrier. The same for write barrier."
Can you quote the specification for these functions (maybe _ReadBarrier(), _WriteBarrier() and _ReadWriteBarrier() are all just compiler fences?), and clarify what you mean exactly with "hoist" and "intermix" (maybe "hoist" for C++ vs. execution and "intermix" for C++ vs. machine code?)?
Of course:
http://www.google.com/search?q="_readbarrier"+"_writebarrier"
Since that link leads to the official documentation, please ignore my "hoist" and "intermix" for now.
Quoting - Raf Schietekat
"However, IMHO, fine-grained precise compiler fences are mostly useless, because they affect only compiler, so have basically zero run-time cost. So IMHO it's Ok to put the strongest full compiler fence everywhere."
Even if they only affect the compiler without causing any specific instruction to be emitted (on a specific architecture, notably x86!), their cost and/or effect may not be zero, because they could, at least conceivably, be preventing an optimisation reordering that would otherwise corrupt the program, so I wouldn't call them "useless" (that may be clear to you, but you have to keep your audience in mind when you write such things). By the same logic, perhaps a weaker compiler fence might allow a "partial optimisation" to still occur (subject to testing), so indiscriminately putting the strongest compiler fence everywhere might not be appropriate, even if it would be a conservative approximation (conserving correctness, I mean).
I am quite skeptical about their practical usefulness. It would be interesting to see some (at least synthetic) showcase where a finer-grained compiler fence makes a significant difference over a coarser-grained one. Could you construct one?
The specification from Microsoft is quite unsatisfactory (so is it a compiler fence, or isn't it? and will _ReadWriteBarrier() keep a write before a read?), but the mention of specific hardware architectures at least seems to imply that any necessary machine instructions will be issued on those architectures.
I have no ambition to demonstrate any real difference, let alone a significant one, but how are you going to prove a negative...
Quoting - Raf Schietekat
The specification from Microsoft is quite unsatisfactory (so is it a compiler fence, or isn't it? and will _ReadWriteBarrier() keep a write before a read?), but the mentioning of specific hardware architectures at least seems to imply that on specific architectures any necessary machine instructions will be issued.
I have no ambition to demonstrate any real difference, let alone a significant one, but how are you going to prove a negative...
Yes, the documentation is unsatisfactory.
_ReadWriteBarrier() will keep a write before a read.
_Read/_Write/_ReadWriteBarrier() are compiler-only fences (see http://msdn.microsoft.com/en-us/library/ms684208%28VS.85%29.aspx).
I can't prove the opposite. Proving a negative is usually more problematic, because I must test ALL cases, while you need to find just one...
"_ReadWriteBarrier() will keep a write before a read."
How could that possibly be useful without a hardware fence?
"_Read/_Write/_ReadWriteBarrier() are compiler only fences (see http://msdn.microsoft.com/en-us/library/ms684208%28VS.85%29.aspx)."
Ah, look: "The _ReadBarrier, _WriteBarrier, and _ReadWriteBarrier compiler intrinsics prevent compiler re-ordering only." Obviously in the documentation about these functions/intrinsics themselves you don't have such a statement... assuming this one is correct, of course. So, here we have the heaviest fence of all, with just the generic name MemoryBarrier() for your confusion, to be avoided if at all possible, but the documentation doesn't tell you that, and there's no reference in sight to a cheaper alternative for use where needed... Not very nice at all. So how should one implement __TBB_store_with_release() and __TBB_load_with_acquire() so that it doesn't break down on other architectures than x86/x64?
"I can't prove the opposite. Proving negative things are usually more problematic because I must test ALL cases, and you must find just one..."
If you think there's no cost anyway, then that's all the more reason to be conservative instead of avoiding the use of those functions/intrinsics.
Quoting - Raf Schietekat
"_ReadWriteBarrier() will keep a write before a read."
How could that possibly be useful without a hardware fence?
I am aware of at least 3 practical use cases:
1. Interaction between a thread and a UNIX signal handler.
2. Interaction between threads running on the same processor.
3. Interaction between arbitrary threads when hardware fences are provided by other means.
Quoting - Raf Schietekat
"_Read/_Write/_ReadWriteBarrier() are compiler only fences (see http://msdn.microsoft.com/en-us/library/ms684208%28VS.85%29.aspx)."
Ah, look: "The _ReadBarrier, _WriteBarrier, and _ReadWriteBarrier compiler intrinsics prevent compiler re-ordering only." Obviously in the documentation about these functions/intrinsics themselves you don't have such a statement... assuming this one is correct, of course. So, here we have the heaviest fence of all, with just the generic name MemoryBarrier() for your confusion, to be avoided if at all possible, but the documentation doesn't tell you that, and there's no reference in sight to a cheaper alternative for use where needed... Not very nice at all. So how should one implement __TBB_store_with_release() and __TBB_load_with_acquire() so that it doesn't break down on other architectures than x86/x64?
Just mark the variable as volatile. That's all.
Quoting - Dmitriy Vyukov
I am aware of at least 3 practical use cases:
1. Interaction between a thread and a UNIX signal handler.
2. Interaction between threads running on the same processor.
3. Interaction between arbitrary threads when hardware fences are provided by other means.
1. Maybe, but I don't know what the issues are here.
2. Can probably be disregarded because obsolete.
3. You wouldn't be able to meaningfully combine them with _ReadWriteBarrier(), is what I'm saying.
Quoting - Dmitriy Vyukov
Just mark the variable as volatile. That's all.
(Added) Literally: don't do that unless it's well encapsulated and won't infect the rest of the program with Microsoft-onliness.
(Added) And why would the compiler add machine instructions without applying the accompanying compiler fence? That makes no sense at all.
Quoting - Raf Schietekat
Really?
1. Maybe, but I don't know what the issues are here.
2. Can probably be disregarded because obsolete.
3. You wouldn't be able to meaningfully combine them with _ReadWriteBarrier(), is what I'm saying.
Well, what can I say... I am a bit confused... I can go into deep detail on each point... however, Raf, aren't you trolling here?
Quoting - Dmitriy Vyukov
Well, what can I say... I am a bit confused... I can go in deep details regarding each point... however, Raf, don't you trolling on this?
Quoting - Raf Schietekat
That's an unfair assumption, but of course you're not obliged to continue this.
In short:
1. Only compiler fences are needed here; basically you need to 'strip' the hardware part from a fence. Since the thread part and the signal part execute on a single OS/hardware thread, there is no hardware ordering issue.
2. It's not obsolete. You can bind two or more threads to a single processor, which is fairly reasonable for low-level parallelism support libraries like TBB. Then, too, you need only the compiler part of the fences.
3. I, and not only I, am indeed able to combine them in a meaningful way. Check out Joe Seigh's SMR+RCU:
http://lkml.indiana.edu/hypermail/linux/kernel/0505.1/0252.html
or David Dice et al.'s Asymmetric Dekker Synchronization:
http://home.comcast.net/~pjbishop/Dave/Asymmetric-Dekker-Synchronization.txt
or my Asymmetric Reader-Writer Mutex:
http://groups.google.com/group/lock-free/browse_frm/thread/1efdc652571c6137
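Point 1 (a thread talking to its own signal handler) can be sketched with the compiler-only fence that later became part of standard C++, `std::atomic_signal_fence`. This is an illustration under assumptions, not code from any of the linked posts: all `sketch_` names are hypothetical, `SIGINT` is chosen only because it is available everywhere, and the flag is assumed to be a lock-free atomic so that it may be touched from a handler.

```cpp
#include <atomic>
#include <csignal>

int sketch_payload = 0;               // plain data handed to the handler
std::atomic<int> sketch_ready{0};     // publication flag, relaxed accesses only
std::atomic<int> sketch_seen{-1};     // what the handler observed

// Signal handler: runs on the same thread that raised the signal, so only
// compiler reordering has to be prevented.
extern "C" void sketch_on_signal(int) {
    if (sketch_ready.load(std::memory_order_relaxed) == 1) {
        // Compiler-only acquire: the payload read may not move above the flag read.
        std::atomic_signal_fence(std::memory_order_acquire);
        sketch_seen.store(sketch_payload, std::memory_order_relaxed);
    }
}

// Publishes a value, raises the signal, and returns what the handler saw.
int sketch_publish_and_raise(int value) {
    std::signal(SIGINT, sketch_on_signal);
    sketch_payload = value;
    // Compiler-only release: the payload write may not move below the flag write.
    std::atomic_signal_fence(std::memory_order_release);
    sketch_ready.store(1, std::memory_order_relaxed);
    std::raise(SIGINT);               // handler runs before raise() returns
    std::signal(SIGINT, SIG_DFL);
    return sketch_seen.load(std::memory_order_relaxed);
}
```

On typical compilers these fences generate no instructions, which matches the idea of stripping the hardware part from a fence.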
Quoting - Raf Schietekat
I'll pretend I didn't see that.
(Added) Literally: don't do that unless it's well encapsulated and won't infect the rest of the program with Microsoft-onliness.
(Added) And why would the compiler add machine instructions without applying the accompanying compiler fence? That makes no sense at all.
Since you are considering MS volatiles as a replacement for MS _ReadWriteBarrier(), MS-onliness is not an issue at all. Anyway, for now (until C++0x) you have to fall back to a platform-specific level on every platform, so I do not see how you could do better than that.
MS volatiles provide both compiler and hardware ordering. Hardware-only fences would make no sense. The MS guys understand this.
See, how else would I have obtained those specific links without trawling the whole Internet? :-) Thanks, I'll do some reading tonight, and maybe tomorrow some more trolling.
Thanks for the replies!
Dmitriy, sorry for the delayed response.
#13 No, I don't see it. Or maybe it's a misunderstanding. I'm not aware of any bidirectional machine-level memory fences, so why would there be compiler-level ones? Isn't the real meat in the atomic operation, sided by necessarily asymmetric fences, on one side or both? That would go for 1 and 2. I couldn't find any mention of "compiler fence" in the first two references for 3, and in your own example the uses of _ReadWriteBarrier() are even commented as either acquire or release, so why not use _ReadBarrier() and _WriteBarrier() instead?
#14 As for my reaction to the MS-specific treatment of "volatile", that's just because it's so much easier to infect code by changing the meaning of an existing keyword than by the use of a new construct that would cause a compilation error elsewhere.
I still haven't found an accessible discussion about how those operations actually work. For example, if one thread does a release-write, why would that be more costly than just a compiler fence if the read-acquire happens to be on the same core even if that wasn't known before it so happened? Well, that's just out of curiosity at this point...
Quoting - Raf Schietekat
Dmitriy, sorry for the delayed response.
#13 No, I don't see it. Or maybe it's a misunderstanding. I'm not aware of any bidirectional machine-level memory fences, so why would there be compiler-level ones?
As for bidirectional hardware fences, check out membar #LoadLoad and membar #StoreStore on SPARC RMO, and SFENCE and LFENCE on x86.
I believe they are just as useful and just as widespread as unidirectional fences.
Quoting - Raf Schietekat
Isn't the real meat in the atomic operation, sided by necessarily asymmetric fences, on one side or both? That would go for 1 and 2. I couldn't find any mention of "compiler fence" in the first two references for 3, and in your own example the uses of _ReadWriteBarrier() are even commented as either acquire or release, so why not use _ReadBarrier() and _WriteBarrier() instead?
The compiler barrier must go in the same place where you would normally put a #StoreLoad fence. In my asymmetric mutex you can find that place by the "no explicit #StoreLoad" comment.
The barriers commented as acquire and release are different barriers; they are not relevant to this discussion.
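For readers unsure where a #StoreLoad fence "normally" goes, here is a minimal sketch of the ordinary, symmetric Dekker-style handshake, written with standard C++ fences rather than the asymmetric scheme from the linked posts. The function name `sketch_both_missed` is hypothetical. Each thread stores its own flag, issues a full fence, then loads the other thread's flag; the fence sits between the store and the load.

```cpp
#include <atomic>
#include <thread>

// Runs one handshake and reports whether BOTH threads read 0 from the
// other's flag. With the two full fences in place this must not happen.
bool sketch_both_missed() {
    std::atomic<int> flag_a{0};
    std::atomic<int> flag_b{0};
    int seen_by_a = -1;
    int seen_by_b = -1;

    std::thread a([&] {
        flag_a.store(1, std::memory_order_relaxed);
        // #StoreLoad position: keeps the store above from being
        // reordered with the load below.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        seen_by_a = flag_b.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        flag_b.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        seen_by_b = flag_a.load(std::memory_order_relaxed);
    });
    a.join();
    b.join();
    return seen_by_a == 0 && seen_by_b == 0;
}
```

In the asymmetric variants discussed above, one side keeps only a compiler barrier at this position and relies on the hardware ordering being supplied by other means.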
Quoting - Raf Schietekat
#14 As for my reaction to the MS-specific treatment of "volatile", that's just because it's so much easier to infect code by changing the meaning of an existing keyword than by the use of a new construct that would cause a compilation error elsewhere.
Agree.
It may make porting of MSVC code to other platforms quite problematic.
The better way would be to finally implement something along the lines of std::atomic<>.
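For comparison, this is roughly what the store-with-release / load-with-acquire pair looks like with `std::atomic<>` as it was eventually standardized in C++11. It is a sketch, not a proposed TBB implementation; `SketchMailbox` and the `sketch_` functions are hypothetical names.

```cpp
#include <atomic>
#include <thread>

// Hypothetical one-slot mailbox: a plain payload guarded by an atomic flag.
struct SketchMailbox {
    int payload = 0;
    std::atomic<bool> ready{false};
};

// Producer side: the release store keeps the payload write before the flag write.
void sketch_publish(SketchMailbox& box, int value) {
    box.payload = value;
    box.ready.store(true, std::memory_order_release);
}

// Consumer side: the acquire load keeps the payload read after the flag read.
int sketch_consume(SketchMailbox& box) {
    while (!box.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return box.payload;
}

// Sends one value from the calling thread to a helper thread and returns
// what the helper received.
int sketch_round_trip(int value) {
    SketchMailbox box;
    int received = 0;
    std::thread consumer([&] { received = sketch_consume(box); });
    sketch_publish(box, value);
    consumer.join();
    return received;
}
```

The memory-order arguments cover both the compiler and the hardware side, so the same source stays correct on architectures other than x86/x64.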