
How do I measure how many L2 cache lines are written back to memory?

icoming
Beginner
677 Views
Hello,
I'm using Atom processors. Is it possible for me to measure the number of L2 cache lines written back to the memory? L2_LINES_OUT counts all cache lines evicted from L2, but not all evicted cache lines are written back to memory. Does VTune on Atom processors provide this metric?
Thanks,
Da
6 Replies
icoming
Beginner
Hello,
It seems no one has answered my original question; maybe it was confusing. What I am really interested in is measuring how much data is read into the cache and written back to memory. I tried to measure how many cache lines are allocated and how many dirty cache lines are evicted in L2 because I can't find any other metrics that would help me estimate the amount of data accessed by the CPU.
If there is a way to measure directly how much data the CPU accesses, that would be better.
Thanks,
Da
icoming
Beginner
I tried L2_M_LINES_OUT too. I thought the number of modified cache lines evicted from L2 would equal the amount of data written back to memory. It seems not. I ran a program that just copies memory and measured the counts of L2_M_LINES_OUT and L2_LINES_IN, and L2_M_LINES_OUT is really insignificant.

Now I'm confused, and when I think about it carefully, I realize I don't really understand how the store instruction works. In a memory copy, data is loaded into the cache and then copied to another address, i.e., written to another cache line. I suppose the new cache line isn't considered modified. When the destination address isn't in the cache, what should happen? I guess the CPU just allocates a new cache line and writes the copied data to it. If that is the case, L2_LINES_IN should also count the new cache lines allocated for the destination address, but it seems L2_LINES_IN doesn't.
Could anyone help clear this up? How does a memory copy work in the cache?
Thanks a lot,
Da
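The reasoning in the post above can be explored with a toy model. This is a minimal sketch, assuming a fully associative, LRU, write-back, write-allocate cache; it is not a model of the Atom L2, and real hardware adds set associativity, prefetching, and non-temporal stores that can change the counts. The names `ToyCache`, `copy_bytes`, `lines_in`, and `m_lines_out` are hypothetical and only loosely mirror the L2_LINES_IN and L2_M_LINES_OUT events.

```python
from collections import OrderedDict

LINE_BYTES = 64  # cache line size assumed for this toy model


class ToyCache:
    """Fully associative, LRU, write-back, write-allocate cache (toy model only)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # line number -> dirty flag, oldest first
        self.lines_in = 0           # lines allocated on a miss
        self.m_lines_out = 0        # dirty (modified) lines evicted

    def _access(self, addr, is_write):
        line = addr // LINE_BYTES
        if line in self.lines:
            self.lines.move_to_end(line)  # hit: mark as most recently used
        else:
            if len(self.lines) >= self.num_lines:
                _, dirty = self.lines.popitem(last=False)  # evict LRU line
                if dirty:
                    self.m_lines_out += 1
            self.lines[line] = False  # allocate on both load and store misses
            self.lines_in += 1
        if is_write:
            self.lines[line] = True   # a store marks the line modified

    def load(self, addr):
        self._access(addr, False)

    def store(self, addr):
        self._access(addr, True)


def copy_bytes(cache, src, dst, nbytes, step=8):
    """Copy nbytes from src to dst, one step-byte load and store at a time."""
    for offset in range(0, nbytes, step):
        cache.load(src + offset)
        cache.store(dst + offset)


if __name__ == "__main__":
    cache = ToyCache(num_lines=1024)           # hypothetical cache size
    copy_bytes(cache, 0, 1 << 30, 1 << 20)     # copy 1 MiB between disjoint buffers
    print("lines_in:", cache.lines_in)
    print("m_lines_out:", cache.m_lines_out)
```

In this model the destination lines are allocated on store misses and marked modified, so `lines_in` covers both source and destination lines, and almost every destination line is eventually evicted dirty. If the measured counts look very different, the copy routine or the hardware is probably not behaving like this simple model.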
Kirill_R_Intel
Employee

Hello Da,

Maybe the following events can be helpful for your needs:

BUS_TRANS_WB - bus transactions due to dirty line evictions.

MEM_LOAD_RETIRED.L2_LINE_MISS - L2 misses for memory loads (see thread http://software.intel.com/en-us/forums/showthread.php?t=81229).

L2_DBUS_BUSY_RD - cycles during which the L2 transfers data to the core.

Regards,
Kirill

icoming
Beginner
Thanks for your information.
MEM_LOAD_RETIRED.L2_LINE_MISS and L2_DBUS_BUSY_RD don't work for me, because I want to measure how much data is read from main memory and written back to main memory, and data prefetched by the CPU also counts.

I guess I should monitor the bus instead of the cache, so BUS_TRANS_WB and related events are the ones to watch. BUS_TRANS_WB alone isn't enough because it only counts transactions due to dirty line evictions, and some instructions such as MOVNTI can bypass the CPU cache. I guess BUS_TRANS_MEM or BUS_TRANS_BURST might be more relevant.
However, one question is how to translate the number of bus transactions into the amount of data transferred on the bus. Does one bus transaction operate on one cache line of data? The second question is what BUS_TRANS_MEM.ALL_AGENTS is. The description says "Counts activity initiated by any agent on the bus", but what does that mean? The processor I'm monitoring is an Atom 330, which has two cores with Hyper-Threading enabled, so there are 4 logical cores. Does "all agents" mean the 4 logical cores? VTune shows BUS_TRANS_MEM.ALL_AGENTS is about 4 times as big as BUS_TRANS_MEM.SELF, which is much larger than I would expect.
That raises another question. Do L2_LINES_IN.SELF and L2_M_LINES_OUT.SELF count the activity on one core? If so, which core? VTune doesn't provide L2_LINES_IN.BOTH_CORES and L2_M_LINES_OUT.BOTH_CORES. How can I count activity on all cores?

Kirill_R_Intel
Employee
Regarding bus transactions: 64 bytes are transferred with each transaction. I will check on the other questions.
Kirill_R_Intel
Employee
Da,

You're right that it's better to monitor the bus with BUS_TRANS_MEM. One bus transaction operates on one 64-byte cache line. BUS_TRANS_MEM.ALL_AGENTS counts events for all processors; only physical cores are counted.
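The conversion from transaction counts to data volume can be sketched as follows. This is a rough illustration, assuming every counted transaction moves one full 64-byte line as stated above; the function names and the sample count are hypothetical and not taken from any VTune output.

```python
CACHE_LINE_BYTES = 64  # one bus transaction moves one 64-byte cache line


def bus_traffic_bytes(transactions, line_bytes=CACHE_LINE_BYTES):
    """Estimate bytes moved on the bus from a transaction count."""
    if transactions < 0:
        raise ValueError("transaction count must be non-negative")
    return transactions * line_bytes


def bandwidth_mb_per_s(transactions, elapsed_seconds, line_bytes=CACHE_LINE_BYTES):
    """Estimate average bus bandwidth in MB/s (1 MB = 10**6 bytes)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return bus_traffic_bytes(transactions, line_bytes) / elapsed_seconds / 1e6


if __name__ == "__main__":
    sample_count = 1_000_000  # hypothetical BUS_TRANS_MEM count, not a measurement
    print(bus_traffic_bytes(sample_count), "bytes")
    print(bandwidth_mb_per_s(sample_count, 2.0), "MB/s")
```

Event counts collected by sampling are estimates, so the resulting byte totals should be treated as approximate.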

L2_LINES_IN.SELF counts the activity on one core. Intel VTune Amplifier XE can show each event per CPU. Refer to the image below. Press the >> button next to the event name. Then an entry for package 0 appears with the same >> sign; press it and you'll see the event counts for each CPU.

Intel VTune Amplifier XE 2011 Update 2 has the events L2_LINES_IN.BOTH_CORES and L2_M_LINES_OUT.BOTH_CORES that you're looking for. Which version are you running?

Regards,
Kirill