I'm trying to use VTune to understand the performance of my test program by measuring L3 cache misses. I'm using a Xeon E5-4620. The Intel Software Developer's Manual lists all the performance events supported by the E5 family, but I can't find one that measures L3 cache misses. If I understand correctly, OFFCORE_REQUESTS and OFFCORE_RESPONSE measure, respectively, the number of requests sent to the uncore and the number of responses from the uncore. I also tried MEM_LOAD_UOPS_RETIRED.LLC_MISS, but I'm not sure whether it is equivalent to the number of L3 cache misses.
My test program reads a 1 GB array sequentially, so there should be a lot of cache misses (I have turned off prefetching in the BIOS), but MEM_LOAD_UOPS_RETIRED.LLC_MISS shows very few cache misses. I wonder whether the problem is in my test program or whether MEM_LOAD_UOPS_RETIRED.LLC_MISS is the wrong event for L3 cache misses. Any comments?
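A minimal C sketch of a test program like the one described, assuming 8-byte elements and a 64-byte cache line; `sum_array`, `lines_touched` and `run_scan` are hypothetical names, not the poster's code. It also gives a rough figure to compare the counter against: a sequential scan of 1 GB touches 2^30 / 64 = 16,777,216 cache lines, so with prefetching off and an array far larger than the L3, roughly that many L3 misses would be expected. Only the first of the eight loads to each line can miss; the other seven typically find the line already in L1 or already in flight.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache-line size (64 bytes on this processor family). */
#define LINE_BYTES 64

/* Sum an array sequentially. Each iteration loads one 8-byte element;
 * at -O2 this typically compiles to an add with a memory operand. */
uint64_t sum_array(const uint64_t *a, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Number of distinct cache lines a sequential scan of `bytes` touches:
 * roughly the expected L3 miss count when nothing is cached or prefetched. */
size_t lines_touched(size_t bytes)
{
    return (bytes + LINE_BYTES - 1) / LINE_BYTES;
}

/* Hypothetical driver: allocate and initialise `bytes` of data, then scan it. */
uint64_t run_scan(size_t bytes)
{
    size_t n = bytes / sizeof(uint64_t);
    uint64_t *a = malloc(n * sizeof *a);
    assert(a != NULL); /* sketch only: no real error handling */
    for (size_t i = 0; i < n; i++)
        a[i] = i; /* write every element so all pages are really backed */
    uint64_t s = sum_array(a, n);
    free(a);
    return s;
}
```

For example, `run_scan((size_t)1 << 30)` scans 1 GB. Compiling at -O2 rather than -O3 typically keeps the loop scalar, because -O3 may vectorize it and change how many load instructions retire per cache line.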
Thanks, Peter. The predefined memory access analysis also monitors MEM_LOAD_UOPS_RETIRED.LLC_MISS.
iliyapolak, does that mean MEM_LOAD_UOPS_RETIRED.LLC_MISS counts exactly the L3 cache misses?
I monitored my test program with MEM_LOAD_UOPS_RETIRED.LLC_MISS and some other events: MEM_LOAD_UOPS_RETIRED.L2_HIT, MEM_LOAD_UOPS_RETIRED.LLC_HIT and OFFCORE_REQUESTS.DEMAND_DATA_RD. Here are the results of these events from one run:
If I understand correctly, MEM_LOAD_UOPS_RETIRED.LLC_HIT + MEM_LOAD_UOPS_RETIRED.LLC_MISS should equal OFFCORE_REQUESTS.DEMAND_DATA_RD, but apparently that is not the case here. It is unlikely that the loads bypass the cache, because my test program uses an add instruction that loads the data from memory and adds it to a register. I suppose there are other events that count L3 cache hits and misses.
I don't know what you mean by immediate values. But my test program reads 1 GB of data, so instruction cache misses should be negligible.
One thing that concerns me is that MEM_LOAD_UOPS_RETIRED.LLC_MISS excludes unknown data sources, as stated in the developer's manual. I don't know what counts as an unknown data source.
Immediate data is encoded in the instruction itself, as in mov eax,1000h; a non-immediate value is stored in memory, but that is not your case. Yes, I agree that instruction cache misses should be negligible, because the executed code is probably dominated by a rep movs instruction (or some small loop) that repeats many times.
The question is whether the unknown data source is related to the hardware thread currently being profiled.