Greetings,
I am trying to figure out the amount of memory bus traffic in
an application.
From http://assets.devx.com/goparallel/18027.pdf I thought that
BUS_TRAN_BURST.SELF (multiplied by 64, the cache line size in bytes) would be a good measure.
I also expected that this number would be within
2x of MEM_LOAD_RETIRED.L2_LINE_MISS (there are no RFOs
etc.). However, I see that BUS_TRAN_BURST.SELF is ~4x to 5x
MEM_LOAD_RETIRED.L2_LINE_MISS. I have been trying to figure
out where the difference comes from but I have not found a reasonable
explanation yet.
I also measured L2_LD.SELF.DEMAND.MESI and L2_LD.SELF.ANY.MESI
and found that L2_LD.SELF.DEMAND.MESI is about half of
L2_LD.SELF.ANY.MESI and that L2_LD.SELF.DEMAND.MESI is about
double BUS_TRAN_BURST.SELF.
The number of L2_M_LINES_OUT.SELF.ANY events is about 1.5x
the number of MEM_LOAD_RETIRED.L2_LINE_MISS events.
Any help would be greatly appreciated.
Best regards,
Carlos
Carlos,
So do the prefetchers kick in for your application? What does L2_LD.SELF.PREFETCH.MESI report? Have you tried the experiments with both L2 prefetchers disabled?
Kind regards
Thomas
Thank you very much for your suggestion.
I measured L2_LD.SELF.PREFETCH.MESI and L2_LD.SELF.PREFETCH.I_STATE.
Here is a table with the event counts (in GEvents, i.e. 10^9 events):
Event                            Count
BUS_TRAN_BURST.SELF              11.6
MEM_LOAD_RETIRED.L2_LINE_MISS     2.0
BUS_TRAN_WB.SELF                  2.8
BUS_TRAN_IFETCH.SELF              0.7
L2_LD.PREFETCH.MESI              21.2
L2_LD.PREFETCH.I_STATE            6.0
So it does seem that prefetch is quite active. Does it make sense to say that
the L2_LD.PREFETCH.I_STATE events cause a similar number of cache
I cannot easily disable the prefetchers in the BIOS as this is running on
a remote server. Is there a way to programmatically disable prefetching?
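
As a rough sanity check, the counts in the table above can be turned into a byte estimate and a simple accounting of the burst transactions. The sketch below is only arithmetic on the posted numbers. It assumes a 64-byte cache line and, as a working hypothesis rather than a documented property of the counters, that each demand L2 miss, writeback, instruction fetch, and L2_LD.PREFETCH.I_STATE event corresponds to one burst transaction. The function names are made up for illustration.

```python
# Event counts from the table, in GEvents (10^9 events).
CACHE_LINE_BYTES = 64  # assumption: 64-byte cache lines

counts_gevents = {
    "BUS_TRAN_BURST.SELF": 11.6,
    "MEM_LOAD_RETIRED.L2_LINE_MISS": 2.0,
    "BUS_TRAN_WB.SELF": 2.8,
    "BUS_TRAN_IFETCH.SELF": 0.7,
    "L2_LD.PREFETCH.MESI": 21.2,
    "L2_LD.PREFETCH.I_STATE": 6.0,
}


def burst_traffic_bytes(counts):
    """Estimated bus traffic in bytes: burst transactions times line size."""
    return counts["BUS_TRAN_BURST.SELF"] * 1e9 * CACHE_LINE_BYTES


def ratio_to_demand_misses(counts, event):
    """How many times larger an event count is than the demand L2 load misses."""
    return counts[event] / counts["MEM_LOAD_RETIRED.L2_LINE_MISS"]


def unexplained_bursts(counts):
    """Burst transactions (GEvents) left over after subtracting the events
    hypothesized to generate one burst each. This is a hypothesis to test,
    not a definition of the counters."""
    explained = (
        counts["MEM_LOAD_RETIRED.L2_LINE_MISS"]
        + counts["BUS_TRAN_WB.SELF"]
        + counts["BUS_TRAN_IFETCH.SELF"]
        + counts["L2_LD.PREFETCH.I_STATE"]
    )
    return counts["BUS_TRAN_BURST.SELF"] - explained


print(f"bus traffic: {burst_traffic_bytes(counts_gevents) / 1e9:.1f} GB")
print(f"burst / demand miss: "
      f"{ratio_to_demand_misses(counts_gevents, 'BUS_TRAN_BURST.SELF'):.1f}x")
print(f"unexplained bursts: {unexplained_bursts(counts_gevents):.1f} GEvents")
```

If the hypothesis holds, the leftover should be small compared with the 11.6 GEvents total; a large leftover would point to some other source of burst transactions.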
Best regards,
Carlos
Thank you very much for your post. I will try using http://etallen.com/msr.html to set the proper
MSR bits and will report the profile values after I repeat the experiments. I am not looking to
disable the prefetchers for performance but merely to see how much of the prefetching is wasted.
The application I am profiling is quite large and has some parts where the access patterns are very "random",
and those parts are causing huge numbers of cache misses and bus traffic, even with only one thread.
Gathering this data should help push for a change and will help in our optimization efforts.
Best regards,
Carlos
Unfortunately, running the msr tool from http://etallen.com/msr.html is not working
on the machines I have access to. I get:
[root@...]# ./msr IA32_MISC_ENABLE.aclp_dis=1
msr: info: IA32_MISC_ENABLE.aclp_dis=1: fell back to numeric interpretation
msr: unable to write msr file at offset 0x000001a0; errno = 9 (Bad file descriptor)
(BTW the MSR module is compiled into the kernel).
I also tried just using wrmsr from asm/msr.h in a user-space program, but that just segfaults.
I then tried wrmsr from a stap script ... and killed the machine, ugh.
Can you point me to the appropriate way of manipulating MSRs?
Best regards,
Carlos
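
For reference, one common way to access MSRs from user space on Linux is the msr character device, /dev/cpu/N/msr, where the file offset is the MSR address and each access is 8 bytes. The wrmsr instruction itself is privileged, so executing it directly in a user-space process faults. The following is a minimal sketch, not a tested recipe for this machine: the helper names are hypothetical, and the two bit positions (9 for the hardware prefetcher, 19 for the adjacent cache line prefetcher) are assumptions that should be checked against the Intel SDM entry for IA32_MISC_ENABLE on the exact CPU model before writing anything.

```python
import os
import struct

IA32_MISC_ENABLE = 0x1A0  # MSR address, as in the msr tool output above

# ASSUMPTION: bit positions for Core 2 class CPUs; verify in the Intel SDM.
HW_PREFETCH_DISABLE_BIT = 9    # hardware (streaming) prefetcher disable
ACL_PREFETCH_DISABLE_BIT = 19  # adjacent cache line prefetch disable


def msr_path(cpu):
    """Path of the Linux msr device for one logical CPU."""
    return f"/dev/cpu/{cpu}/msr"


def read_msr(path, reg):
    """Read a 64-bit MSR value; the file offset selects the register."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, 8, reg)
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError(f"short read from {path} at offset {reg:#x}")
    return struct.unpack("<Q", data)[0]


def write_msr(path, reg, value):
    """Write a 64-bit MSR value; the device must be opened for writing."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), reg)
    finally:
        os.close(fd)


def with_bits(value, bits, on):
    """Return value with the given bit positions set (on) or cleared."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return (value | mask) if on else (value & ~mask)


def set_prefetch_disable(path, disable):
    """Read-modify-write IA32_MISC_ENABLE; returns the previous value
    so the caller can restore it after the experiment."""
    old = read_msr(path, IA32_MISC_ENABLE)
    new = with_bits(
        old, (HW_PREFETCH_DISABLE_BIT, ACL_PREFETCH_DISABLE_BIT), disable
    )
    write_msr(path, IA32_MISC_ENABLE, new)
    return old
```

A call such as `set_prefetch_disable(msr_path(0), True)` would need to run as root and be repeated for every logical CPU the application runs on. Keeping the returned value and writing it back afterwards restores the original setting.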