Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

False Cache Line Sharing vs "Cache Sloshing"

bklopfer
Beginner
1,240 Views

I am seeing some performance degradation when using a multi-processor Xeon with HTT. An MSDN article points out what it calls "cache sloshing" as a possible culprit. What they are describing sounds an awful lot like false cache line sharing to me. Can someone explain the difference between false sharing and cache sloshing (if there is one!)?

MSDN article: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndllpro/html/msdn_scalabil.asp

0 Kudos
2 Replies
Intel_C_Intel
Employee
1,240 Views

That article is over 10 years old; the terminology of the time probably wasn't solidified yet. By the definition in the article, sloshing sounds like what I've heard more often referred to as ping-ponging, though that too is probably not a formal term.

Using the article's terminology: False sharing is a special case of sloshing. The article goes as far as mentioning this (though without using the term false sharing). False sharing means that while separate processors might be working on disjoint data, the data occupies the same cache line. This can usually be addressed by padding disjoint data so that it cannot share cache lines.

But sloshing, in general, will also occur when the data is truly shared. Reducing the impact in this scenario requires changing the behavior of your program to reduce the use of shared data structures and to minimize how frequently data is shared.

Richard.

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,240 Views

It is hard to know what the author of the article meant by sloshing. One could assume (possibly incorrectly) that cache sloshing means data migrates back and forth between processor caches. False sharing is where one processor's cache falsely evicts the other processor's cached data.

One scenario for sloshing is when a thread's affinity includes more than one processor and the O/S context-switches the thread between multiple processors, in which case the cache needs to be repopulated. If your O/S and application permit finer tuning of affinity, you might experiment with various settings. VTune and CodeAnalyst (for AMD) can be set up to track cache hit/miss statistics. You can use those statistics for tuning, though the real proof is in the wall clock time.

Jim Dempsey.

0 Kudos
Reply