
Performance events for L3 cache miss

zhengda1936
Beginner
953 Views

Hello,

I'm trying to use VTune to understand the performance of my test program by measuring L3 cache misses. I use a Xeon E5-4620. The Intel Software Developer's Manual lists all performance events supported by the E5 family, but I can't find one that measures L3 cache misses. If I understand correctly, OFFCORE_REQUESTS and OFFCORE_RESPONSE measure the number of requests sent to the uncore and the number of responses from the uncore, respectively. I also tried MEM_LOAD_UOPS_RETIRED.LLC_MISS, but I'm not sure whether it's equivalent to the number of L3 cache misses.

My test program reads a 1GB array sequentially, so there should be a lot of cache misses (I have turned off prefetching in the BIOS), but MEM_LOAD_UOPS_RETIRED.LLC_MISS shows very few cache misses. I wonder whether the problem is in my test program or whether MEM_LOAD_UOPS_RETIRED.LLC_MISS is the wrong event for L3 cache misses. Any comments?
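
For reference, here is a back-of-the-envelope estimate of how many misses one sequential pass should cause. It is only a sketch: it assumes that 1GB means 2**30 bytes and that a cache line is 64 bytes, and the function name is made up for illustration.

```python
# Rough estimate of LLC misses for one sequential pass over a 1GB array.
# Assumptions (not measured): 1GB = 2**30 bytes, cache line = 64 bytes,
# prefetching off, and the array is not already resident in any cache.
ARRAY_BYTES = 2**30
CACHE_LINE_BYTES = 64


def expected_line_misses(array_bytes, line_bytes=CACHE_LINE_BYTES):
    """Number of distinct cache lines touched by one sequential pass."""
    return -(-array_bytes // line_bytes)  # ceiling division


print(expected_line_misses(ARRAY_BYTES))  # 16777216
```

Under these assumptions, a single pass should miss on roughly 16.8 million cache lines, which is what I mean by "a lot".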

Thank you,
Da 

8 Replies
Peter_W_Intel
Employee
953 Views

As a quick answer: you can use the predefined "memory access" analysis type, which includes all L1/L2/LLC cache metrics.

Bernard
Valued Contributor I
953 Views

From its description, MEM_LOAD_UOPS_RETIRED.LLC_MISS appears to count retired memory load uops whose data source missed in the LLC.

zhengda1936
Beginner
953 Views

Thanks, Peter. The predefined memory access analysis also monitors MEM_LOAD_UOPS_RETIRED.LLC_MISS.

iliyapolak, does it mean MEM_LOAD_UOPS_RETIRED.LLC_MISS is the same as the number of L3 cache misses?

I monitored my test program with MEM_LOAD_UOPS_RETIRED.LLC_MISS and some other events: MEM_LOAD_UOPS_RETIRED.L2_HIT, MEM_LOAD_UOPS_RETIRED.LLC_HIT, and OFFCORE_REQUESTS.DEMAND_DATA_RD. Here are the results from one run:
MEM_LOAD_UOPS_RETIRED.L2_HIT=0
MEM_LOAD_UOPS_RETIRED.LLC_HIT=0
MEM_LOAD_UOPS_RETIRED.LLC_MISS=280,000
OFFCORE_REQUESTS.DEMAND_DATA_RD=134,400,000

If I understand correctly, MEM_LOAD_UOPS_RETIRED.LLC_HIT + MEM_LOAD_UOPS_RETIRED.LLC_MISS should equal OFFCORE_REQUESTS.DEMAND_DATA_RD, but that is clearly not the case here. It's unlikely that the loads bypass the cache, because my test program uses the add instruction to load data and add it to a register. I suppose there are some other events that count L3 cache hits and misses.
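
To put the gap in numbers, here is a small sketch that uses only the four counts reported above. The dictionary layout and variable names are my own, and taking 1GB as 2**30 bytes is an assumption.

```python
# Counts copied from the run reported above.
counts = {
    "MEM_LOAD_UOPS_RETIRED.L2_HIT": 0,
    "MEM_LOAD_UOPS_RETIRED.LLC_HIT": 0,
    "MEM_LOAD_UOPS_RETIRED.LLC_MISS": 280_000,
    "OFFCORE_REQUESTS.DEMAND_DATA_RD": 134_400_000,
}

# What I expected to match OFFCORE_REQUESTS.DEMAND_DATA_RD.
llc_hit_plus_miss = (
    counts["MEM_LOAD_UOPS_RETIRED.LLC_HIT"]
    + counts["MEM_LOAD_UOPS_RETIRED.LLC_MISS"]
)
offcore_reads = counts["OFFCORE_REQUESTS.DEMAND_DATA_RD"]

# How far apart the two figures are.
ratio = offcore_reads / llc_hit_plus_miss
print(ratio)  # 480.0

# Array bytes per offcore demand read, assuming 1GB = 2**30 bytes.
bytes_per_offcore_read = 2**30 / offcore_reads
print(round(bytes_per_offcore_read, 2))
```

The two figures differ by a factor of 480, so this is not a small accounting difference between the events.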

Bernard
Valued Contributor I
953 Views
I think that MEM_LOAD_UOPS_RETIRED.LLC_MISS counts the number of load uops whose data was not present in the LLC. Unfortunately, I cannot find any description of OFFCORE_REQUESTS.DEMAND_DATA_RD in the VTune manual. Can you point me to the source of information?
zhengda1936
Beginner
953 Views

All events for the E5 family are listed in the Software Developer's Manual, Volume 3, section 19.4.

Bernard
Valued Contributor I
953 Views

Regarding my previous post, I would like to add that the total L3 misses could also include instruction cache misses. Does your code operate on immediate values only?

zhengda1936
Beginner
953 Views

I don't know what you mean by immediate values. But my test program reads 1GB of data, so instruction cache misses should be negligible.

One thing that concerns me is that MEM_LOAD_UOPS_RETIRED.LLC_MISS excludes unknown data sources, as stated in the developer's manual. I don't know what is considered an unknown data source.

Bernard
Valued Contributor I
953 Views

An immediate value is one encoded in the instruction itself, as in mov eax,1000h; a non-immediate value is stored in memory. But this is not your case. Yes, I agree regarding instruction cache misses, because the code is probably dominated by a rep movss instruction that repeats very frequently (some kind of loop).

The question is whether the unknown data source is related to the hardware thread that is currently being profiled.
