Overhead for getSystemCounterState

Gaurav_B_Intel · ‎08-07-2017

Hi,

I have recently started using PCM and absolutely love the kind of information it provides. I am using the C++ APIs to instrument my code.

I am particularly interested in memory-related metrics, such as total achieved memory bandwidth and cache misses.

I use the getSystemCounterState API to capture before and after states, and then use the API to get the total bytes read from and written to the memory controllers (MCs).

I am, however, seeing quite a large overhead for each getSystemCounterState call.

I see:

Around 1 ms on a 10-core Broadwell desktop

Around 6 ms on a 20-core Skylake Xeon

Around 64 ms on a 68-core KNL!

Is this expected? The overhead is noticeably distorting the numbers I am seeing. Is there any way to avoid it?

Thanks,

Gaurav.
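For anyone who wants to reproduce per-call figures like the ones above, here is a minimal timing sketch. It is not part of PCM: averageCallMicros is a hypothetical helper name, and the callable passed to it is a stand-in for whichever snapshot call is being timed. It averages the wall-clock cost over several repetitions using std::chrono::steady_clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper (not a PCM function): returns the average wall-clock
// cost of one call to `fn`, in microseconds, measured over `reps` calls.
inline double averageCallMicros(const std::function<void()>& fn, std::size_t reps) {
    assert(reps > 0 && "need at least one repetition");
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals

    const auto start = clock::now();
    for (std::size_t i = 0; i < reps; ++i) {
        fn();  // the call whose overhead is being measured
    }
    const auto stop = clock::now();

    // Convert the total elapsed time to microseconds and average it.
    const std::chrono::duration<double, std::micro> total = stop - start;
    return total.count() / static_cast<double>(reps);
}
```

With PCM already programmed, the callable would simply wrap the snapshot, for example a lambda that calls getSystemCounterState() on the PCM instance and discards the result. Averaging over a few dozen repetitions smooths out one-off effects such as a cold first call.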

Richard_Nutman · ‎08-09-2017

getSystemCounterState returns the counters for the entire system, meaning all sockets and all logical cores within those sockets. Therefore, the more cores you have, the longer this function will take.

The question is whether you need the counters for the entire system, or whether you can rework your code to query only specific cores, which would be a lot quicker.

Gaurav_B_Intel · ‎08-09-2017

Thanks for your response.

I need to query metrics such as BytesReadFromMCs, LLCMisses, etc. These are not per-core metrics.

Can I still get this information by querying a specific core?

Richard_Nutman · ‎08-09-2017

Try and see if you can use getUncoreCounterStates.

It retrieves uncore information for system and sockets, but no core information.

Roman_D_Intel · ‎08-14-2017

Unfortunately, getSystemCounterState is not parallelized yet, so it takes linear time to read all core counters. As suggested, you can try using getUncoreCounterStates or getAllCounterStates. The latter parallelizes the most time-consuming part, which is reading the core counters. Make sure you use the latest PCM from GitHub (https://github.com/opcm/pcm/).

Thanks,

Roman