query hardware counters from code

eisenlohr__john · ‎04-29-2015

Is there an Intel API for getting hardware counters from code? I'm talking about something like PAPI where you can start counters at the beginning of a function then stop the counters at the end and read them.

Thanks.

Vitaly_S_Intel · ‎04-30-2015

Hi John,

Can you provide some details? What do you need that functionality for?

David_A_Intel1 · ‎04-30-2015

Hi John:

Yes! It's called Intel® Performance Counter Monitor, and is included in the VTune Amplifier XE package! See the "contrib" sub-directory after installation.

David_A_Intel1 · ‎04-30-2015

BTW, @John E., I think what Vitaly was getting at is that, if you help us understand your need, it may turn out that VTune Amplifier can already address it. Or it may be something we will consider for a future release. So, we would appreciate your comments and wish you well with whatever tool you decide to use. :)

eisenlohr__john · ‎05-01-2015

Thank you Vitaly and MrAnderson for your responses. I asked this question because I was told by two people (both with much more experience than I have) that this functionality exists in VTune, but I was not able to discover how to use it. We are trying to understand memory usage in a micro benchmark, and it seems to me that querying counters would be simpler, less intrusive, and more accurate than the sampling approach. Maybe I’m wrong about that. Are there disadvantages to the PCM-type approach besides the fact that it requires modifying code?

I have compiled and linked my executable with the PCM object files, but it seems I need permissions to execute it. I am running on a shared Linux benchmarking machine. Do I need to talk to the administrator, or is there another way to do this?

Thanks again.

Vitaly_S_Intel · ‎05-01-2015

Hi John,

By "memory usage", do you mean how much memory your workload allocates, or do you want to analyze memory bandwidth? For the latter, you can create a custom analysis type based on Advanced Hotspots and select the "Analyze memory bandwidth" option. You should then be able to see memory read/write bandwidth over time on the timeline.

eisenlohr__john · ‎05-01-2015

Thanks, Vitaly. We want to analyze memory bandwidth and understand what could be causing slowdowns. I am currently using

$ amplxe-cl -collect snb-general-exploration

to gather counter information. I have also been using

$ amplxe-cl -collect-with runsa -knob event-config=....

to get data. This is on an Ivy Bridge machine with icc 2013.5.192.

We want to understand which stalls are affecting performance as problem size increases, so the results from snb-general-exploration are helpful, but I thought getting the counters directly instead of by sampling might give a more accurate picture. I know I can increase the sample after value in collect-with runsa. Can I do this with snb-general-exploration?

Thanks again.

David_A_Intel1 · ‎05-01-2015

If you create a custom analysis type based on the General Exploration type, you can modify any and all of the sample after values. However, there is an easier way, which is to modify the "sampling interval" for the GE type. But, note, increasing the sampling rate is going to introduce more overhead and can therefore cause your results to be less accurate. There is a fine line and you need to walk it carefully when trying to get "more accurate" results.
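To make the trade-off concrete, here is a rough model of one counter under event-based sampling. It is only a sketch: it assumes the counter interrupts once per "sample after value" events, so each sample stands for that many events. The function name and the event totals are made up for illustration and are not part of VTune Amplifier.

```python
def sampling_summary(total_events: int, sample_after_value: int) -> dict:
    """Model one counter under event-based sampling (illustrative only).

    One sample (interrupt) is taken each time the counter has seen
    `sample_after_value` events, so each sample represents that many events.
    """
    if sample_after_value <= 0:
        raise ValueError("sample_after_value must be positive")
    samples = total_events // sample_after_value
    estimate = samples * sample_after_value
    return {
        "samples": samples,                    # interrupts taken, a proxy for overhead
        "estimate": estimate,                  # event count reconstructed from samples
        "uncounted": total_events - estimate,  # events seen after the last overflow
    }


if __name__ == "__main__":
    # Hypothetical workload that generates 1,050,000 events in total.
    for sav in (200_000, 20_000, 2_000):
        print(sav, sampling_summary(1_050_000, sav))
```

In this model, a smaller sample after value shrinks the uncounted remainder but multiplies the number of interrupts, which is the overhead mentioned above.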

The real difference between PCM and VTune Amplifier's EBS is that PCM does not give you samples of *where* the events are occurring. You just get counter values. That can be good or bad, depending on what you want to do with the data. If what you want to measure is the cache misses for a loop, using PCM is probably a good idea. It will have lower overhead (although VTune Amplifier's EBS overhead is low) and you can focus on a specific region of code. VTune Amplifier's EBS will help you narrow your focus to potential problem areas by showing you where, in your application, you are experiencing the most cache misses (for example).
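The difference can be sketched with a toy simulation. This is not PCM or VTune Amplifier code: the event trace, the site names (loop_a, loop_b), and both functions are hypothetical, and real hardware attributes samples less precisely than this model does. Counting mode returns a single total, while sampling mode records the location of every N-th event and scales it up.

```python
from collections import Counter
from typing import Iterable


def count_events(event_sites: Iterable[str]) -> int:
    """Counting mode: only a total comes back, with no location information."""
    return sum(1 for _ in event_sites)


def sample_events(event_sites: Iterable[str], sample_after_value: int) -> Counter:
    """Sampling mode: note where every N-th event occurred and scale by N."""
    if sample_after_value <= 0:
        raise ValueError("sample_after_value must be positive")
    estimate = Counter()
    for i, site in enumerate(event_sites, start=1):
        if i % sample_after_value == 0:
            # One sample stands for `sample_after_value` events at this site.
            estimate[site] += sample_after_value
    return estimate


if __name__ == "__main__":
    # Hypothetical trace: where each cache miss happened, in order.
    trace = ["loop_a"] * 900 + ["loop_b"] * 100
    print("counting:", count_events(trace))
    print("sampling:", dict(sample_events(trace, 50)))
```

The counting result is exact for the region measured but says nothing about which loop is responsible. The sampling result is an estimate, but it is broken down by location.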