I am reading about performance events in chapter 19 of the manual.
It says:
24H 03H L2_RQSTS.ALL_DEMAND_DATA_RD Counts any demand and L1 HW prefetch data load requests to L2.
24H 0CH L2_RQSTS.ALL_RFO Counts all L2 store RFO requests.
What is the difference between RFO (read for ownership) requests and demand data load requests? Both read or acquire data from L2, mostly on an L1 miss (ignoring prefetch requests for now). When the CPU executes a LOAD instruction, it only needs to read the data, so it does not require exclusive permission. When the CPU issues a store instruction, it needs exclusive access, so it issues an RFO, and that request is counted as an RFO.
As I understand it, L2_RQSTS.ALL_RFO is a subset of L2_RQSTS.ALL_DEMAND_DATA_RD. Am I correct? If not, please correct my understanding of these events.
Thank you
Both RFO and DemandDataRead result in data being loaded into the requesting core's cache, but they are different transactions because they need to be treated differently by the other caches in the system. These events count the two transaction types independently -- there is no overlap.
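To count the two events side by side, a tool needs their raw encoding. The following is a minimal sketch, assuming the usual Intel layout in which the event select occupies bits 0-7 and the unit mask bits 8-15, and assuming the Linux perf raw-event syntax (`r` followed by the hex value). The helper name `raw_event` is hypothetical, and the event codes are the ones quoted from the manual in the question; they are model-specific, so check them against the event table for the processor in use.

```python
# Pack an Intel event-select code and unit mask into the 16-bit value
# that Linux perf accepts as a raw event (r<umask><event> in hex).
# Assumed layout: event select in bits 0-7, unit mask in bits 8-15.

def raw_event(event_select: int, umask: int) -> str:
    """Return the perf-style raw event string for one event/umask pair."""
    if not (0 <= event_select <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event select and umask must each fit in 8 bits")
    return f"r{(umask << 8) | event_select:04x}"


# Event codes as quoted from the manual in the question (model-specific).
EVENTS = {
    "L2_RQSTS.ALL_DEMAND_DATA_RD": (0x24, 0x03),
    "L2_RQSTS.ALL_RFO": (0x24, 0x0C),
}

for name, (event_select, umask) in EVENTS.items():
    print(f"{name}: {raw_event(event_select, umask)}")
```

On a Linux system where these encodings apply, the resulting strings could be passed to something like `perf stat -e r0324,r0c24 <program>` to read both counters in one run and compare them.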
Other events may differentiate in other ways. On Haswell processors, Event 0xF1 "L2_LINES_IN" counts lines coming into the L2 cache by their coherence state, not by the transaction type that initiated the transfer. Demand reads, for example, can bring lines into the L2 in S state or E state, depending on whether the line is present in a clean state in any other cache.
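The difference in how other caches must treat the two transactions can be illustrated with a toy model. This is only a sketch of textbook MESI, not Intel's actual protocol (which has additional states and implementation details that are not public); the function names `demand_read` and `rfo` are hypothetical, and each cache's copy of a single line is represented by one letter.

```python
# Toy textbook-MESI model for ONE cache line (an assumption for
# illustration; it does not describe Intel's real implementation).
# States: "M" modified, "E" exclusive, "S" shared, "I" invalid.

def demand_read(other_states):
    """Demand data read: returns (requester_state, new_other_states)."""
    if all(state == "I" for state in other_states):
        # No other cache holds the line, so it can arrive Exclusive.
        return "E", list(other_states)
    # Some other cache holds a valid copy: everyone ends up Shared
    # (in textbook MESI a Modified copy is written back first).
    return "S", ["I" if state == "I" else "S" for state in other_states]


def rfo(other_states):
    """Read for ownership: every other copy must be invalidated."""
    # The requester gains ownership and, once the store completes,
    # holds the line in Modified state.
    return "M", ["I"] * len(other_states)


print(demand_read(["I", "I"]))  # line held nowhere else
print(demand_read(["E", "I"]))  # clean copy in another cache
print(rfo(["S", "S"]))          # store to a line shared elsewhere
```

The point of the sketch is that a read leaves other caches' copies valid, while an RFO forces them to be invalidated, which is why the two are distinct transaction types and are counted separately.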
Thanks for your answer, John. Can you point me to any resource that explains clearly what these events actually count? The two-line explanations in the manual are not enough for beginners like me. How can I find the subtle differences among them, like the one you pointed out between LINES_IN and DEMAND_READS?
Thank you
I don't know of any resources that describe these in detail for any modern processor, though Intel has published a number of recommendations for performance counter analysis for specific processors that contain useful guidance. Overall, however, the implementations are too complex for simple explanations, and complex explanations are likely to reveal "trade secrets" or can be used by "patent trolls" to support claims of patent infringement.
Intel has model-specific write-ups of performance analysis with Amplifier XE (aka VTune) at https://software.intel.com/en-us/articles/processor-specific-performance-analysis-papers
One of the more detailed white papers on the topic is "Performance Analysis Guide for Intel® Core™ i7 Processor and Intel® Xeon™ 5500 processors" by David Levinthal. This is for the Nehalem processors, so the details are somewhat out of date, but the principles and nomenclature are useful. (There is a link to this paper at the bottom of the page referenced above.)
Another very useful reference is Appendix B of the "Intel 64 and IA-32 Architectures Optimization Reference Manual" (document 248966).
There is lots of good information in academic sources, but it can be very challenging to bridge the nomenclature -- that is why I recommend working with Intel reference material when possible. One exception is the book "A Primer on Memory Consistency and Cache Coherence" by Sorin, Hill, and Wood (ISBN 9781608455645), which is an indispensable (advanced) reference on cache coherence with a good discussion of (the authors' interpretation of) Intel's cache coherence protocol and memory consistency model.
Thank you John, for these useful links. I will look at them.