In my VTune analysis, there are two counters that seem to show something interesting:
- OFFCORE_RESPONSE.ALL_DEMAND_MLC_PREF_READS.LLC_MISS.ANY_RESPONSE_1
- OFFCORE_RESPONSE.ALL_DEMAND_MLC_PREF_READS.LLC_MISS.LOCAL_DRAM_0
But I have no idea what they mean, and the VTune help has no information about them. In fact, many counters in VTune have no description in the help.
Does anyone know what these counters mean?
Thanks.
7 Replies
Going by the event name, the first event counts the total number of LLC misses caused by read accesses, both demand reads and reads issued by the MLC (L2) prefetcher, whatever the source of the response.
The second event counts the subset of these requests that were satisfied from local DRAM. Other sources may include another core's L2 cache or another socket (the REMOTE_xxx events).
Is this a 1- or 2-socket machine?
What are the interesting values that you are seeing?
The machine has two 8-core processors. Is that what you mean by "2 socket"? Is the memory split between them, as in a NUMA machine?
By interesting, I mean that the values are completely different in two versions of the same loop. The code is in Fortran. One version uses a traditional Fortran array; the other uses a Fortran array placed in a region of memory obtained from an allocator written in C and linked as a dynamic library. I think the difference is caused by where the allocator places that region of memory (I'm using mmap).
Yes, by 2 sockets I meant two processors.
This machine uses a NUMA architecture, so you should expect significant performance differences if the allocation is not NUMA-aware.
There should be another set of events whose names end with .REMOTE_DRAM. If that count is high, or even close to the LOCAL_DRAM count, then you have memory locality issues.
Your suspicion is probably correct in this regard.
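The comparison suggested above can be sketched as a small calculation. This is only an illustration: the function name `remote_dram_share` and every count below are hypothetical placeholders, not values from this thread, and the real numbers would come from the LOCAL_DRAM and REMOTE_DRAM event counts that VTune reports for each version of the loop.

```python
def remote_dram_share(local_dram: int, remote_dram: int) -> float:
    """Return the fraction of DRAM-satisfied LLC misses served by remote DRAM."""
    total = local_dram + remote_dram
    if total == 0:
        # No DRAM-satisfied misses were counted, so there is nothing to compare.
        return 0.0
    return remote_dram / total


# Hypothetical event counts for the two loop versions (made up for illustration).
versions = {
    "plain Fortran array": {"local": 900_000, "remote": 100_000},
    "mmap-backed array": {"local": 500_000, "remote": 500_000},
}

for name, counts in versions.items():
    share = remote_dram_share(counts["local"], counts["remote"])
    print(f"{name}: {share:.0%} of DRAM fills came from the remote socket")
```

A remote share that approaches one half means the REMOTE_DRAM count is close to the LOCAL_DRAM count, which is the situation described above as a sign of memory locality problems.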
Thanks, I will investigate more to be sure.
Where can I find a complete description of the counters that VTune shows for this processor? The VTune help describes only a few of them, and sometimes it doesn't help much.
Quoting Rafael Silva
Where can I find a complete description of the counters that VTune shows for this processor? The VTune help describes only a few of them, and sometimes it doesn't help much.
Sanath
Good, there is a lot of information in this manual, and it helped. But for the specific counter I posted here, for example, a search for ALL_DEMAND_MLC_PREF_READS in the PDF finds nothing.