I'm using these counters via RDPMC between two code sites: I call RDPMC, store the value, and then call RDPMC again at the second site.
It turns out that across multiple iterations of the code, this count (INST_RETIRED.ANY) differs. Is this because of speculation? I thought the retired-instructions counter was not affected by speculation.
But then again, I read somewhere (maybe here?) that branch mispredicts can inflate this number, or something along those lines.
My next question is about memory allocations. I'm trying to figure out whether I can use BUS_TRANS_MEM to somehow get a feel for them. Would a ratio of INST_RETIRED_LOADS/INST_RETIRED_STORES be helpful as well? The L1/L2 cache miss events don't seem too suitable for this.
Just a reminder: this is all between two code sites, and I don't use sampling. I'm going to turn on 7 counters (3 fixed and 4 programmable) and always run it on our public-facing production service.
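For reference, here is a minimal sketch of this kind of between-two-sites measurement with RDPMC. It assumes the kernel has set CR4.PCE so RDPMC is permitted in user mode, that fixed counter 0 (INST_RETIRED.ANY) is already enabled, and that the thread stays pinned to one logical CPU between the two reads; the helper name is illustrative, not from any particular library.

```c
#include <stdint.h>
#include <stdio.h>

/* Read a performance counter with RDPMC.  For the fixed-function counters,
 * bit 30 of ECX is set and the low bits select the counter; fixed counter 0
 * is INST_RETIRED.ANY. */
static inline uint64_t rdpmc(uint32_t counter)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdpmc" : "=a"(lo), "=d"(hi) : "c"(counter));
    return ((uint64_t)hi << 32) | lo;
}

#define FIXED_CTR0 ((1u << 30) | 0)   /* INST_RETIRED.ANY */

int main(void)
{
    uint64_t start = rdpmc(FIXED_CTR0);   /* first code site */

    /* ... region of interest ... */

    uint64_t end = rdpmc(FIXED_CTR0);     /* second code site */
    printf("instructions retired: %llu\n", (unsigned long long)(end - start));
    return 0;
}
```

If the thread migrates or is context-switched between the two reads and the counter is not saved and restored per thread, the delta also includes whatever else retired on that core, which by itself can explain some run-to-run variation.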
If your code has any polling or spin loops, or maybe contention for locks, then the amount of work (instructions) done can vary; so can a different load on the CPU (such as on servers), network traffic, etc.
Or, if your code gets swapped out and something else runs.
I don't think any event will tell you about memory allocations... assuming you actually are trying to measure when memory is allocated (as opposed to just using memory).
OS events (like Windows ETW tracing or Linux ftrace data) are probably the appropriate source for memory allocation monitoring. Or just instrument your code.
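As an illustration of the "instrument your code" option, here is a minimal sketch of a counting allocator wrapper. The names counted_malloc/counted_free and the two counters are made up for the example; real code would hook whatever allocation API the service already uses, and the counters could be snapshotted at the same two code sites as the hardware counters.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counters: bytes currently live and total bytes requested. */
static atomic_size_t live_bytes;
static atomic_size_t total_bytes;

/* Reserve a max_align_t-sized header to remember each block's size while
 * keeping the returned pointer suitably aligned. */
#define HDR sizeof(max_align_t)

void *counted_malloc(size_t size)
{
    unsigned char *p = malloc(size + HDR);
    if (!p)
        return NULL;
    memcpy(p, &size, sizeof size);
    atomic_fetch_add(&live_bytes, size);
    atomic_fetch_add(&total_bytes, size);
    return p + HDR;
}

void counted_free(void *ptr)
{
    if (!ptr)
        return;
    unsigned char *p = (unsigned char *)ptr - HDR;
    size_t size;
    memcpy(&size, p, sizeof size);
    atomic_fetch_sub(&live_bytes, size);
    free(p);
}
```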
Although you are not using sampling, it sounds like sampling would be useful to verify, when you have 2 runs with different numbers of instructions retired, that you really are doing the same work (same code paths, etc.). You wouldn't have to run sampling all the time, but it would be helpful to check that the assumptions you are making (about your code's behavior) are correct.
Pat
Is there more to read about contention/spin loops, etc.? I didn't get the point about different load on the CPU or network traffic.
Every time my code initiates network calls, it switches to the OS, and at every context switch I save the number of instructions retired.
I was hoping instructions retired would handle exactly the swapped-out case. Will it still differ even if I save the counter?
Hmmm
It's not wildly different, and it is certainly better than RDTSC, which varies much more. I'm guessing that's because RDTSC measures time rather than work done?
Also, I did mean memory usage, and not necessarily allocations, my bad. Knowing that, what would you recommend?
For network traffic, if your box is hooked up to the network, it may have to deal with more network traffic than you are expecting (such as network shares).
And I forgot to mention things like virus protection which can lead to varying instructions retired.
Unless you are actually hooking into the OS context switch logic, I doubt you are accurately seeing 'every context switch'. The OS can swap you out because some other higher-priority process needs to run, or because your quantum of time is used up.
To see EVERY context switch you need to use OS tracing (as I mentioned above), but this generates tons of data (100s of MB) and is probably more than you want to know.
But all this is just guesswork... you really need some data (such as from VTune) to verify your assumptions.
Plus you could use VTune to select memory usage events and see which ones are useful to your needs.
Pat
I made the same observation when counting instructions in user space between two syscalls. I repeated the experiment twice, the first time with my pages read-only (so page faults expected) and the second time with my pages RW (so no page faults), and I noticed a variation between the two counts. I use MSR_PERF_FIXED_CTR0 to count instructions.
Is it true that page faults may lead to extra instructions counted when using INST_RETIRED.ANY?
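For what it's worth, here is a minimal sketch of this kind of measurement using Linux's /dev/cpu/N/msr interface rather than RDPMC. It assumes the msr module is loaded, root privileges, the process pinned to CPU 0, and that IA32_FIXED_CTR0 (MSR 0x309, the counter behind INST_RETIRED.ANY) has already been enabled via IA32_FIXED_CTR_CTRL and IA32_PERF_GLOBAL_CTRL, e.g. by a profiling tool.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define IA32_FIXED_CTR0 0x309   /* fixed counter 0: INST_RETIRED.ANY */

/* The msr driver exposes each MSR at its address as a file offset. */
static uint64_t read_msr(int fd, uint32_t reg)
{
    uint64_t value = 0;
    if (pread(fd, &value, sizeof(value), reg) != sizeof(value))
        perror("pread");
    return value;
}

int main(void)
{
    /* Pin to CPU 0 so both reads hit the same core's counter. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    sched_setaffinity(0, sizeof(set), &set);

    int fd = open("/dev/cpu/0/msr", O_RDONLY);
    if (fd < 0) {
        perror("open /dev/cpu/0/msr");
        return 1;
    }

    uint64_t before = read_msr(fd, IA32_FIXED_CTR0);

    /* ... region of interest, e.g. touch the freshly protected pages ... */

    uint64_t after = read_msr(fd, IA32_FIXED_CTR0);
    printf("delta INST_RETIRED.ANY: %llu\n",
           (unsigned long long)(after - before));

    close(fd);
    return 0;
}
```

Note that the delta also includes the instructions retired inside the pread() syscalls themselves, and if the fixed counter is configured to count at ring 0 as well, the kernel's page-fault handling shows up in the count too.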
