Solved: I don't know of any resources

anthony_b_ · ‎05-11-2016

I am reading about performance events in Chapter 19 of the manual.

It says

24H 03H L2_RQSTS.ALL_DEMAND_DATA_RD Counts any demand and L1 HW prefetch data load requests to L2.

24H 0CH L2_RQSTS.ALL_RFO Counts all L2 store RFO requests.

What is the difference between RFO (read for ownership) requests and demand data load requests? Both try to read/acquire data from L2, mostly on an L1 miss (ignore prefetch requests for now). When the CPU executes a LOAD instruction, it needs to read data, and it does not require exclusive permission on that data because it is just a load (read). If the CPU issues a store instruction, it needs exclusive access (hence the RFO), and the request is counted as an RFO.

According to my thinking, L2_RQSTS.ALL_RFO is a subset of L2_RQSTS.ALL_DEMAND_DATA_RD. Am I correct in my understanding? If not, please correct my understanding of these events.

Thank you

McCalpinJohn · ‎05-12-2016

Both RFO and DemandDataRead result in data being loaded into the requesting core's cache, but they are different transactions because they need to be treated differently by the other caches in the system. These events count the two transaction types independently -- there is no overlap.
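To see the two counts side by side, both events can be programmed at once. The sketch below is only an illustration and assumes Linux perf (not mentioned in this thread) and its x86 raw-event convention, where the unit mask goes in bits 15:8 and the event select in bits 7:0. The codes are the ones quoted in the question (24H/03H and 24H/0CH); they differ between microarchitectures, so check the event table for your processor. The helper names are hypothetical.

```python
# Sketch: build perf raw-event strings from (event select, umask) pairs.
# Assumes the Linux perf x86 raw format: config = (umask << 8) | event_select.

def raw_event(event_select: int, umask: int) -> str:
    """Return a perf-style raw event string such as 'r0324'."""
    if not (0 <= event_select <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event select and umask must each fit in one byte")
    return f"r{(umask << 8) | event_select:04x}"

# Codes as quoted from the manual in the question above; they are
# model-specific and may not match your processor.
EVENTS = {
    "L2_RQSTS.ALL_DEMAND_DATA_RD": raw_event(0x24, 0x03),
    "L2_RQSTS.ALL_RFO": raw_event(0x24, 0x0C),
}

def perf_command(workload: str) -> str:
    """Assemble a 'perf stat' command line that counts both events."""
    return "perf stat -e " + ",".join(EVENTS.values()) + " -- " + workload

if __name__ == "__main__":
    print(perf_command("./a.out"))
```

Because the two events count different transaction types, neither number should be expected to bound the other.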

Other events may differentiate in other ways. On Haswell processors, Event 0xF1 "L2_LINES_IN" counts lines coming into the L2 cache by their coherence state, not by the transaction type that initiated the transfer. Demand reads, for example, can bring lines into the L2 in S state or E state, depending on whether the line is present in a clean state in any other cache.
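The S-versus-E point, and the reason other caches must treat an RFO differently, can be sketched with a toy model. This is textbook MESI under simplifying assumptions, not Intel's actual protocol; the function names and the dictionary representation are hypothetical.

```python
# Toy textbook-MESI sketch (NOT Intel's real implementation).
# `others` maps a core id to the state ("M", "E", "S" or "I") that the
# line currently has in that core's cache.

def demand_read(others: dict) -> str:
    """Demand data read: return the state the requester ends up with."""
    holders = [core for core, state in others.items() if state != "I"]
    for core in holders:
        # Other copies survive but drop to Shared
        # (a Modified copy would be written back first).
        others[core] = "S"
    # Shared if anyone else holds the line, Exclusive otherwise.
    return "S" if holders else "E"

def rfo(others: dict) -> str:
    """Read for ownership: every other copy must be invalidated."""
    for core in others:
        others[core] = "I"
    # The requester now owns the line; the store then moves it to M.
    return "E"

if __name__ == "__main__":
    caches = {1: "S", 2: "I"}
    print(demand_read(caches), caches)
    print(rfo(caches), caches)
```

In this model a demand read leaves other copies valid, while an RFO removes them, which is why the two cannot be counted as one transaction type.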

anthony_b_ · ‎05-14-2016

Thanks for your answer, John. But can you point me to any resource that explains clearly what these events actually count? The two-line explanations in the manual are not enough for beginners like me. How can I find the subtle differences among these events, such as the one you pointed out between L2_LINES_IN and demand reads?

Thank you

McCalpinJohn · ‎05-16-2016

I don't know of any resources that describe these in detail for any modern processor, though Intel has published a number of recommendations for performance counter analysis for specific processors that contain useful guidance. Overall, however, the implementations are too complex for simple explanations, and complex explanations are likely to reveal "trade secrets" or can be used by "patent trolls" to support claims of patent infringement.

Intel has model-specific write-ups of performance analysis with Amplifier XE (aka VTune) at https://software.intel.com/en-us/articles/processor-specific-performance-analysis-papers

One of the more detailed white papers on the topic is "Performance Analysis Guide for Intel® Core™ i7 Processor and Intel® Xeon™ 5500 processors" by David Levinthal. This is for the Nehalem processors, so the details are somewhat out of date, but the principles and nomenclature are useful. (There is a link to this paper at the bottom of the page referenced above.)

Another very useful reference is Appendix B of the "Intel 64 and IA-32 Architectures Optimization Reference Manual" (document 248966).

There is lots of good information in academic sources, but it can be very challenging to bridge the nomenclature -- that is why I recommend working with Intel reference material when possible. One exception is the book "A Primer on Memory Consistency and Cache Coherence" by Sorin, Hill, and Wood (ISBN 9781608455645), which is an indispensable (advanced) reference on cache coherence with a good discussion of (the authors' interpretation of) Intel's cache coherence protocol and memory consistency model.

anthony_b_ · ‎05-16-2016

Thank you John, for these useful links. I will look at them.

Difference between RFO requests and data read requests to L2?