events

thouraya87 · ‎02-15-2010

Hello,
I want to know how to test my code with VTune, and which events correspond to the following:
number of L1 cache misses
cost of an L1 cache miss
number of L2 cache misses
cost of an L2 cache miss

number of write accesses
cost of a write access
number of read accesses
cost of a read access
Thank you

Vladimir_T_Intel · ‎02-15-2010

Please check out this topic. You will find links to good papers there.

thouraya87 · ‎02-17-2010

Hello,

Thank you for the link.

Do you have another link?

I want to know whether VTune can produce a curve showing how the program's behavior changes, with the data size on the x-axis (abscissa) and a cost metric that I enter (e.g. the number of L2 cache misses) on the y-axis (ordinate).

Thank you very much

Vladimir_T_Intel · ‎02-17-2010

Hello,

VTune does not support data analysis. Intel Performance Tuning Utility does, but I'm not sure it does so in the way you described.

thouraya87 · ‎02-17-2010

Hello,
I want to know the number of read and write accesses in my program, and their cost. Is this possible with VTune, and which events should I use? I have not been able to find them.

Thank you very much

Vladimir_T_Intel · ‎02-18-2010

I don't know why you need to calculate the cost of all reads and writes for the program. The general methodology is to estimate the program's performance and find the hotspots (using the default sampling configuration). Once you've identified the hotspots, you can configure the sampling activity by adding countable events to identify the performance issues and estimate their cost using the data provided in the white paper. Appropriate events are listed there as well.

thouraya87 · ‎02-18-2010

Hello,
I need the cost of read and write accesses, as well as the number of L1 and L2 cache misses and their costs for my program, so that I can calculate the program's total run time from a model: an equation built from these terms.
I want to know whether VTune gives me these values, or whether it only provides the analysis and identification of hotspots.
Thank you
Best regards

robert-reed · ‎02-18-2010

VTune analyzer does provide the means to sample the number of cache misses at the various levels, but it would have to be a quite sophisticated equation to make any use of these numbers to estimate total time of a program run, which is behind the questions Vladimir asked. Modern processors use Out-Of-Order execution so the latency associated with cache misses can often overlap other work that the processor is completing. If you were just to string out the cache misses and their combined penalties, you would likely come up with an estimate that far exceeds the actual time the program uses, because some of the delays can be hidden while doing other work. The best way to estimate such costs is to time your program on a particular architecture of interest and measure the distribution of program execution times. VTune analysis of cache misses can be a valuable next step to the hot spot analysis Vladimir mentioned, particularly when those cache misses are occurring in code that is frequently called.
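
The suggestion to time the program and look at the distribution of run times can be sketched roughly as follows. This is only a minimal illustration in Python, not anything VTune provides: `time_runs`, `summarize`, and `example_workload` are hypothetical names, and the workload is a stand-in for the real program.

```python
import statistics
import time


def time_runs(workload, runs=30):
    """Return wall-clock durations (in seconds) of repeated calls to workload()."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Describe the distribution of run times instead of a single number."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "mean": statistics.mean(durations),
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        "max": max(durations),
    }


def example_workload(n=200_000):
    # Hypothetical stand-in for the program being studied.
    data = list(range(n))
    return sum(x * 2 for x in data)


if __name__ == "__main__":
    stats = summarize(time_runs(example_workload, runs=20))
    for name, value in stats.items():
        print(f"{name:>6}: {value * 1e3:.3f} ms")
```

Repeating the measurement for several data sizes and plotting the median against the size would give the kind of curve asked about earlier in the thread, using measured time rather than a predicted one.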

thouraya87 · ‎02-19-2010

Hello,

The model (equation) I am working with depends on some values estimated by VTune.
In fact, my model needs only the number and cost of L1 cache accesses, L2 cache accesses, writes, and reads.
It is a simple equation that does not depend on many parameters, and it calculates the total run time of the program.
For example:
Does MEM_LOAD_RETIRED.L1D_LINE_MISS give me the number of L1 cache misses in my program? Or does the number of L1 misses in my program depend on several other events?

Thank you

robert-reed · ‎02-19-2010

As I said earlier, it's complicated. MEM_LOAD_RETIRED.L1D_LINE_MISS does NOT give the number of L1 cache misses in your program. Here is the description of this Intel Core i7 event:

This event counts the number of load operations that miss the L1 data cache and send a request to the L2 cache to fetch the missing cache line. That is the missing cache line fetching has not yet started. This event count is equal to the number of cache lines fetched from the L2 cache by retired loads. The event might not be counted if the load is blocked (see LOAD_BLOCK events).

So to start, it's not counting "cache misses" but "cache line misses" and not all of them. There are L1 data cache misses that are not being counted here (if they are to an L1 cache line that's already been missed--there's a separate event to count all those) and there are various load blocking conditions (which are also described in the VTune reference manual) that keep this event from being counted.

Also consider this: MEM_LOAD_RETIRED.L1D_LINE_MISS tells you the number of cache line requests that went from L1 data to L2; it doesn't tell you how many of those events were for cache lines that were already in L2 versus those that required a subsequent last level cache request or ultimately a memory fetch to satisfy an L2 miss. So I end as I started, by saying that these mechanisms are complicated and formulaic approaches to prediction are likely to be frustrated by the stochastic nature of the machine internals.
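
To make the point concrete, here is a small sketch of the kind of additive model being discussed. The latencies and the hit/miss splits are made-up placeholders, not measured values for any real processor, and `additive_stall_cycles` is a hypothetical helper. It shows two runs with the same L1D line miss count whose summed penalties differ by more than a factor of ten, depending on where the lines were finally found.

```python
# All latencies are hypothetical placeholders, NOT figures for a real CPU.
L2_HIT_CYCLES = 10     # assumed cost when the line is already in L2
LLC_HIT_CYCLES = 40    # assumed cost when the last level cache supplies it
MEMORY_CYCLES = 200    # assumed cost when the line comes from memory


def additive_stall_cycles(l2_hits, llc_hits, memory_fetches):
    """Sum every miss penalty as if none of them overlapped other work.

    Because out-of-order execution hides part of this latency, the result
    is at best an upper bound, not a prediction of the real stall time.
    """
    return (
        l2_hits * L2_HIT_CYCLES
        + llc_hits * LLC_HIT_CYCLES
        + memory_fetches * MEMORY_CYCLES
    )


# Two hypothetical runs, each with 1,000,000 L1D line misses in total.
mostly_l2 = additive_stall_cycles(900_000, 90_000, 10_000)
mostly_memory = additive_stall_cycles(100_000, 100_000, 800_000)

if __name__ == "__main__":
    print("mostly served by L2:    ", mostly_l2, "cycles")
    print("mostly served by memory:", mostly_memory, "cycles")
```

An equation that takes only the L1 miss count as input cannot tell these two cases apart, and even with the full breakdown the sum ignores whatever latency the processor managed to overlap.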

robert-reed · ‎02-19-2010

P.S. Getting back to the methodology for which Vladimir provided references: if you found a hot spot in your code that simultaneously was exhibiting a lot of MEM_LOAD_RETIRED.L1D_LINE_MISS events, and (since this event can be made PRECISE) the precise location was just after a load operation, you might well conclude that something about the algorithm or data organization is befuddling the hardware prefetchers such that frequently the needed cache line is not preloaded.