Hi all,
I have recently been trying to understand false sharing misses in multithreaded programs (e.g. SPLASH-2).
There was an earlier thread (2004) in the forums discussing the "Memory Order Machine Clear" event.
I also read the "Reduce False Sharing in .NET*" article and the help document in VTune.
However, I couldn't figure out the meaning of "Memory Order Machine Clear". :(
False sharing is when independent data in the same cache line are used by many threads.
I only know that false sharing is caused by cache coherence.
Why do we say that "Memory Order Machine Clear" is related to false sharing?
My questions in brief:
1. Is "Memory Order Machine Clear" equal to the false sharing miss count?
2. What is the meaning of "Memory Order Machine Clear"?
Could anyone answer my questions?
Thank you very much :)
Regards
Dennise
The articles on Memory Order Machine Clear refer to the Intel NetBurst CPU architecture. Back then, this was a recommended VTune event for verification of performance problems related to false sharing.
More generally, if a thread updates a cached copy of a cache line (typically, a 64-byte section of the virtual memory address space), all other cached copies of that cache line become invalid. This impacts performance when it results in unexpected cache misses. So, you would be looking for cache misses that hurt the performance of repeated accesses to the same cache line, particularly those associated with what Intel calls HITM events (cache hits on a modified cache line).
We have had two generations of Intel CPUs since NetBurst, each attempting to improve on the cache coherency schemes. MOMC might have been an inefficient way to implement coherency; Core i7 doesn't necessarily require taking the time to wipe out the cache physically, as it allows for seeing that there is another, more up-to-date copy. Recent efforts in diagnosing false sharing seem to center on analyzing memory locality, which is supported by PTU for Core i7 (see the WhatIf forum).
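To make the cache-line invalidation described above concrete, here is a minimal sketch that only does the address arithmetic; it does not measure anything on real hardware. It assumes a 64-byte line (the size the post calls typical) and a buffer that starts on a line boundary. The helper names `cache_line_index`, `counter_offsets`, and `lines_shared_by_threads` are made up for illustration.

```python
CACHE_LINE_BYTES = 64  # assumed line size; the post above calls 64 bytes typical


def cache_line_index(byte_offset, line_bytes=CACHE_LINE_BYTES):
    """Return the index of the cache line that contains byte_offset."""
    return byte_offset // line_bytes


def counter_offsets(num_threads, stride_bytes):
    """Byte offset of each thread's private counter inside one shared buffer."""
    return [t * stride_bytes for t in range(num_threads)]


def lines_shared_by_threads(offsets, line_bytes=CACHE_LINE_BYTES):
    """Map each cache line used by more than one thread to the thread ids on it."""
    by_line = {}
    for thread_id, offset in enumerate(offsets):
        by_line.setdefault(cache_line_index(offset, line_bytes), []).append(thread_id)
    return {line: ids for line, ids in by_line.items() if len(ids) > 1}


# Four 8-byte counters packed back to back versus one counter per line.
packed = counter_offsets(num_threads=4, stride_bytes=8)
padded = counter_offsets(num_threads=4, stride_bytes=CACHE_LINE_BYTES)

print(lines_shared_by_threads(packed))  # all four counters fall on line 0
print(lines_shared_by_threads(padded))  # empty: no line holds two counters
```

In the packed layout every write by one thread invalidates the line that holds the other three counters, even though no thread ever reads another thread's counter. Padding each counter to its own line trades memory for the absence of that traffic.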
So, is "Memory Order Machine Clear" related to both false sharing misses and true sharing misses?
Memory Order Machine Clear = False Sharing Miss + True Sharing Miss?
If that is correct, then I could roughly get the false sharing misses in the benchmark if I know the true sharing misses.
Daniel
The term "false sharing" is in most cases a wrong and misleading term. Processors have no means to distinguish false sharing from true sharing, so they cannot penalize the former and spare the latter. In most cases false sharing and true sharing have exactly the same consequences. In some cases it is even difficult to say whether something is false sharing or true sharing. Consider an array of elements where some threads iterate over the whole array and some threads update individual elements. If we place several elements in the same cache line, is that false sharing?
So what you must try to eliminate is simply sharing (not at the source code level, but at the physical level).
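The array case above can be sketched with a little index arithmetic. This is only an illustration under stated assumptions: a 64-byte line, an array that starts on a line boundary, and an element size that divides the line size evenly. The function name `elements_on_same_line` is hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumed line size


def elements_on_same_line(index, element_bytes, array_len,
                          line_bytes=CACHE_LINE_BYTES):
    """Indices of all elements that live on the same cache line as `index`.

    Assumes the array starts on a line boundary and that element_bytes
    divides line_bytes evenly.
    """
    per_line = line_bytes // element_bytes
    first = (index // per_line) * per_line
    return list(range(first, min(first + per_line, array_len)))


# A writer updating element 5 of a 10-element array of 16-byte elements
# also invalidates the cached copies of its neighbours on that line.
print(elements_on_same_line(5, element_bytes=16, array_len=10))

# With one element per line, the write touches only element 5.
print(elements_on_same_line(5, element_bytes=64, array_len=10))
```

A thread scanning the whole array reads every one of those neighbours, so whether the resulting misses count as "false" or "true" sharing depends on how you look at it, which is the point made above.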
I am curious how you are going to measure true sharing misses.
I was actually wondering the same thing myself; care to comment on this?
- Matt
The memory analysis in PTU also should assist diagnosis of multiple threads hitting the same address or the same cache line.
I don't mean race detection. I mean true sharing, i.e. sharing of data, but not false sharing.
Why only read-only? Isn't plain mutex read-only true sharing?
Just to make it explicit, I am only curious how it is possible to estimate false sharing as false_sharing = total_sharing - true_sharing. I can imagine how to measure total_sharing, for example as the total number of cache line transfers between cores. But how do you measure true_sharing?
You've provided a hook: the memory analysis in PTU may show that threads access the same cache line but different addresses. But I don't think that is a correct gauge of false sharing. Consider a data structure protected by a mutex, both contained in the same cache line. While the current owner of the mutex works with the data structure, other threads access the mutex. That is definitely not false sharing.
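The objection above can be illustrated with a toy classifier over an access trace. This is a hypothetical sketch, not how PTU or VTune work internally: the trace format `(thread_id, byte_address, is_write)`, the function `classify_lines`, and the 64-byte line size are all assumptions made for the example.

```python
from collections import defaultdict

CACHE_LINE_BYTES = 64  # assumed line size


def classify_lines(accesses, line_bytes=CACHE_LINE_BYTES):
    """Label each contended cache line by an address-only heuristic.

    accesses: iterable of (thread_id, byte_address, is_write).
    Only lines touched by two or more threads with at least one write are
    reported. A line is labelled "same address" when some written address
    is also accessed by another thread, otherwise "different addresses".
    """
    per_line = defaultdict(list)
    for thread_id, address, is_write in accesses:
        per_line[address // line_bytes].append((thread_id, address, is_write))

    labels = {}
    for line, items in per_line.items():
        threads = {t for t, _, _ in items}
        if len(threads) < 2 or not any(w for _, _, w in items):
            continue  # private or read-only line: no invalidations expected
        users = defaultdict(set)    # address -> threads that access it
        writers = defaultdict(set)  # address -> threads that write it
        for t, a, w in items:
            users[a].add(t)
            if w:
                writers[a].add(t)
        same_address = any(users[a] - {t} for a, ts in writers.items() for t in ts)
        labels[line] = "same address" if same_address else "different addresses"
    return labels


# Mutex at byte 0 and its protected data at byte 8, both on line 0:
# the owner (thread 0) writes the data while threads 1 and 2 poll the mutex.
mutex_trace = [(0, 8, True), (1, 0, False), (2, 0, False)]
# Two unrelated per-thread counters that happen to sit on line 2.
counter_trace = [(0, 128, True), (1, 136, True)]
# One variable at byte 256 written by thread 0 and read by thread 1.
shared_trace = [(0, 256, True), (1, 256, False)]

print(classify_lines(mutex_trace))
print(classify_lines(counter_trace))
print(classify_lines(shared_trace))
```

The mutex trace and the unrelated-counters trace receive the same "different addresses" label, so an address-only view cannot separate the logically shared mutex-plus-data object from genuinely independent data. That is why subtracting a measured "true sharing" count from total line transfers is harder than it looks.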