Hi,
I recently started using PCM and love the information it provides. I am using the C++ API to instrument my code.
I am particularly interested in memory-related metrics such as total achieved memory bandwidth and cache misses.
I call getSystemCounterState to capture before and after states, then use the API to get the total bytes read from and written to the memory controllers (MCs).
However, I am seeing quite a large overhead for each getSystemCounterState call.
I see:
Around 1 ms on a 10-core Broadwell desktop
Around 6 ms on a 20-core Skylake Xeon
Around 64 ms on a 68-core KNL!
Is this expected? The overhead is noticeably distorting the numbers I measure. Is there any way to avoid it?
Thanks,
Gaurav.
getSystemCounterState returns the counters for the entire system, meaning all sockets and all logical cores within those sockets. Therefore, the more cores you have, the longer this function will take.
The question is whether you need the counters for the entire system, or whether you can rework your code to query only specific cores, which would be a lot quicker.
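To see how the call cost scales on a given machine, it can help to time the snapshot call in isolation. The following is a minimal sketch, not part of PCM: `averageCallCostMs` and `fakeSnapshot` are hypothetical names, and the stand-in simply sleeps so the example compiles without the PCM headers.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Returns the average wall-clock cost, in milliseconds, of calling
// `snapshot` `iterations` times back to back.
template <typename Snapshot>
double averageCallCostMs(Snapshot&& snapshot, int iterations)
{
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        snapshot();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}

// Hypothetical stand-in for a counter snapshot (assumption: not a PCM call).
// It sleeps for about 2 ms to imitate a slow system-wide read.
inline void fakeSnapshot()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(2));
}
```

With PCM initialized and programmed, the same helper could presumably be given a lambda that calls getSystemCounterState and discards the result, which would give a per-machine figure comparable to the ones quoted in the question.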
Thanks for your response.
I need to query metrics such as BytesReadFromMCs, LLCMisses, etc. These are not Core metrics.
Can I still get this information by querying a specific core?
See whether you can use getUncoreCounterStates.
It retrieves uncore information for the system and the sockets, but no core information.
Unfortunately, getSystemCounterState is not parallelized yet, so reading all core counters takes time linear in the number of cores. As suggested, you can try getUncoreCounterStates or getAllCounterStates. The latter parallelizes the most time-consuming part, which is reading the core counters. Make sure you use the latest PCM from GitHub (https://github.com/opcm/pcm/).
Thanks,
Roman
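Whichever snapshot function is used, one way to keep its cost out of the elapsed time is to take the snapshots outside the timed window. The sketch below assumes that approach; `measureRegion` and `RegionResult` are hypothetical names, and the snapshot and delta are passed in as callables so the example does not depend on the PCM headers.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

struct RegionResult {
    std::uint64_t bytes;  // counter delta across the region
    double seconds;       // wall-clock time of the work only
    double gbPerSec;      // bytes / seconds / 1e9, or 0 if seconds is 0
};

// Takes a snapshot, times only `work`, takes a second snapshot, and
// converts the counter delta into a bandwidth figure.
template <typename Snapshot, typename Delta, typename Work>
RegionResult measureRegion(Snapshot&& snapshot, Delta&& delta, Work&& work)
{
    using Clock = std::chrono::steady_clock;
    const auto before = snapshot();  // slow call, outside the timed window
    const auto t0 = Clock::now();
    work();
    const auto t1 = Clock::now();
    const auto after = snapshot();   // slow call, outside the timed window

    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    assert(seconds >= 0.0);          // steady_clock never goes backwards
    const std::uint64_t bytes = delta(before, after);
    const double gbPerSec =
        seconds > 0.0 ? static_cast<double>(bytes) / seconds / 1e9 : 0.0;
    return {bytes, seconds, gbPerSec};
}
```

With PCM, `snapshot` could wrap getUncoreCounterStates and return the SystemCounterState it fills in, and `delta` could add getBytesReadFromMC(before, after) to getBytesWrittenToMC(before, after); this wiring is an assumption and has not been checked against a specific PCM version. Note that the byte counts still include any memory traffic generated while the snapshots themselves run, so this only removes the overhead from the time measurement.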