
Record count and data address for memory accesses programmatically

kundnani__harsh
Beginner

I am trying to programmatically record the count and data address of memory accesses for my project. I do not want counts across the whole address space, only the count and address for the heap region. I used the Intel Pin tool and obtained the desired information, but Pin instruments every instruction, which introduces high overhead. Is there a way to use Pin to check only those instructions that miss in the last-level cache and record their count and data address? I don't need the exact count; statistical sampling would be fine.

Also, I read that I can use hardware performance counters to get the last-level cache miss count, with less overhead than Pin. But I need the data address too, not just the count. I read that the PEBS record contains the Data Linear Address, which I think is the address I am looking for. I used PAPI's statistical profiling to record the last-level cache miss count, but I don't know how to read the PEBS record using PAPI. Is it even possible? If not, how can I read the PEBS record programmatically?

Please let me know if I should provide any additional information.

Travis_D_
New Contributor II

Yes, the PEBS memory latency sampling stuff on newer chips is probably exactly what you want.

You can set a threshold in cycles, and memory accesses that take longer than this will be randomly sampled. The PEBS record includes both the instruction address and the accessed memory location, so you can filter the results however you want after the fact (e.g., by restricting them to heap accesses).
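As a rough illustration of that after-the-fact filtering, here is a minimal C++ sketch. It assumes the samples have already been collected into memory; MemSample, AddressRange and count_samples_in_range are hypothetical names made up for this example, not part of perf, PAPI or Pin.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical in-memory form of one sampled access: the instruction
// address and the data address taken from the sample record.
struct MemSample {
    std::uint64_t ip;
    std::uint64_t addr;
};

// Half-open address range [begin, end).
struct AddressRange {
    std::uint64_t begin;
    std::uint64_t end;
};

// Count how many samples hit each data address inside `heap`,
// ignoring every sample whose data address falls outside it.
std::map<std::uint64_t, std::uint64_t>
count_samples_in_range(const std::vector<MemSample>& samples,
                       AddressRange heap) {
    assert(heap.begin <= heap.end);
    std::map<std::uint64_t, std::uint64_t> counts;
    for (const MemSample& s : samples) {
        if (s.addr >= heap.begin && s.addr < heap.end) {
            ++counts[s.addr];
        }
    }
    return counts;
}
```

Where the heap bounds come from is left open here. One option on Linux is the [heap] line of /proc/self/maps, but keep in mind that large malloc allocations are often served by mmap and land outside that region, so you may need more than one range.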

If you're on Linux, you can access this directly with perf mem. Honestly, this tool is a bit unloved, so if you run into bugs or limitations, you can always use perf record directly: internally, perf mem calls perf record to do the actual recording, so you can do the same thing yourself using the perf mem source as a guide.

kundnani__harsh
Beginner

Thank you for your answer Travis.

I require a library like PAPI which provides certain APIs since I want to get the count and address inside my application code.

So my application code is in C++ and at runtime I would like to collect the count and data address. perf mem cannot be used inside my code. 

Travis_D_
New Contributor II
I'm not sure if PAPI supports this directly. It usually lags a bit behind on cutting-edge features, especially platform-specific ones, since after all it tries to offer a generic interface that abstracts the underlying PMU on different hardware. That said, you can definitely do what you want, even using perf directly with this trick: https://stackoverflow.com/a/51689586 or, if you don't want to use that "hack", keep in mind that perf mem and perf record are themselves built on the perf_event_open API, so you can use that directly within your program to capture the events you want. None of this is particularly easy, especially when PEBS is involved, but it's all doable. You might check out pmu-tools by Andi Kleen on GitHub for several examples, including recording PEBS events.
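To give an idea of what using perf_event_open directly might look like from C++, here is a minimal sketch, not a complete or definitive implementation. It rests on several assumptions: the raw encoding 0x01CD (event 0xCD, umask 0x01, the load latency event on many Intel Core parts) and the latency threshold living in config1 must both be checked against your CPU and kernel; the helper names and the LoadSample struct are made up for this example; and decode_sample assumes the record sits contiguously in memory, which a real reader of the mmap'd ring buffer has to ensure when a record wraps around.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Assumption: raw encoding of the load latency event (event 0xCD,
// umask 0x01). Verify this against the event list for your CPU.
constexpr std::uint64_t kLoadLatencyRawEvent = 0x01CD;

// Fill in attributes for sampling loads slower than `threshold_cycles`,
// taking one sample every `sample_period` qualifying events.
perf_event_attr make_load_latency_attr(std::uint64_t sample_period,
                                       std::uint64_t threshold_cycles) {
    assert(sample_period > 0);
    perf_event_attr attr{};            // zero-initialize every field
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_RAW;
    attr.config = kLoadLatencyRawEvent;
    attr.config1 = threshold_cycles;   // assumption: threshold is in config1
    attr.sample_period = sample_period;
    // Ask for instruction address, data address and weight per sample.
    attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR | PERF_SAMPLE_WEIGHT;
    attr.precise_ip = 2;               // request precise (PEBS) sampling
    attr.disabled = 1;                 // enable later via ioctl
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    return attr;
}

// glibc provides no wrapper for perf_event_open, so go through syscall().
// pid = 0 and cpu = -1 mean "the calling thread, on any CPU".
int open_for_calling_thread(perf_event_attr& attr) {
    return static_cast<int>(
        syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL));
}

// Body of a PERF_RECORD_SAMPLE for the sample_type chosen above:
// the kernel emits the requested fields in the order ip, addr, weight.
struct LoadSample {
    std::uint64_t ip;
    std::uint64_t addr;
    std::uint64_t weight;
};

// Decode one record that is contiguous in memory. Returns false for
// anything that is not a large-enough PERF_RECORD_SAMPLE.
bool decode_sample(const void* record, LoadSample& out) {
    perf_event_header hdr;
    std::memcpy(&hdr, record, sizeof(hdr));
    if (hdr.type != PERF_RECORD_SAMPLE ||
        hdr.size < sizeof(hdr) + sizeof(LoadSample)) {
        return false;
    }
    std::memcpy(&out, static_cast<const char*>(record) + sizeof(hdr),
                sizeof(out));
    return true;
}
```

The remaining steps (mmap the returned file descriptor, enable the event with ioctl, then walk the ring buffer between data_tail and data_head) are described in the perf_event_open man page. The call may fail with a permissions error depending on the perf_event_paranoid setting, and with other errors on CPUs that do not support this event.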
kundnani__harsh
Beginner

Thank you for the link. I will check out the pmu-tools by Andi Kleen as you suggested and also the perf_event_open API.
 
