I am seeing some performance degradation when using a multi-processor Xeon with HTT. An MSDN article points to what it calls "cache sloshing" as a possible culprit. What it describes sounds an awful lot like false cache line sharing to me. Can someone explain the difference between false sharing and cache sloshing (if there is one!)?
MSDN article: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndllpro/html/msdn_scalabil.asp
That article is over 10 years old; the terminology of the time probably wasn't solidified yet. By the definition in the article, sloshing sounds like what I've heard more often referred to as ping-ponging, though that too is probably not a formal term.
Using the article's terminology: false sharing is a special case of sloshing. The article goes so far as to mention this (though without using the term false sharing). False sharing means that while separate processors may be working on disjoint data, that data occupies the same cache line. It can usually be addressed by padding the disjoint data so that it cannot share a cache line.
But sloshing, in general, will also occur when the data is truly shared. Reducing the impact in this scenario requires changing the behavior of your program to reduce its use of shared data structures and to minimize how frequently data is shared.
Richard.
It is hard to know what the author of the article meant by sloshing. One could assume (possibly incorrectly) that cache sloshing means data migrates back and forth between processor caches. False sharing is where one processor's cache falsely evicts the other processor's cached data. One scenario for sloshing is when a thread's affinity includes more than one processor and the O/S context-switches the thread between processors, in which case the cache needs to be repopulated. If your O/S and application permit finer tuning of affinity, you might experiment with various settings. VTune and CodeAnalyst (for AMD) can be set up to track cache hit/miss statistics. You can use those statistics for tuning, though the real proof is in the wall-clock time.
Jim Dempsey.