In my VTune analysis, there are two counters that seem to show something interesting:
- OFFCORE_RESPONSE.ALL_DEMAND_MLC_PREF_READS.LLC_MISS.ANY_RESPONSE_1
- OFFCORE_RESPONSE.ALL_DEMAND_MLC_PREF_READS.LLC_MISS.LOCAL_DRAM_0
But I have no idea what they mean, and the VTune help has no information about them. In fact, there are a lot of counters in VTune with no information in the help.
Does anyone know what these counters mean?
Thanks.
7 Replies
The first event counts the total number of LLC misses caused by demand reads and MLC (mid-level cache, i.e. L2) prefetch reads.
The second event counts the subset of these requests that were satisfied from local DRAM. Other sources may include another core's L2 cache or another socket (REMOTE_xxx).
Is this a 1 or 2 socket machine?
What are the interesting values that you are seeing?
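To make the naming easier to read, here is a small sketch that splits an event name of this form into its dot-separated fields. The field labels (`family`, `request`, `supplier`, `response`, `instance`) are my own and not Intel terminology, and treating the trailing `_0`/`_1` as an instance index is an assumption rather than something documented in this thread.

```python
def split_offcore_event(name):
    """Split an OFFCORE_RESPONSE-style event name into labelled fields.

    The labels are illustrative only; they are not taken from VTune.
    """
    parts = name.strip().split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated fields: " + name)
    family, request, supplier, response = parts

    # Peel off a trailing numeric suffix such as "_0" or "_1", if present.
    base, sep, suffix = response.rpartition("_")
    if sep and suffix.isdigit():
        response, instance = base, int(suffix)
    else:
        instance = None

    return {
        "family": family,      # e.g. OFFCORE_RESPONSE
        "request": request,    # which requests are counted
        "supplier": supplier,  # e.g. LLC_MISS
        "response": response,  # where the data came from
        "instance": instance,  # assumed meaning of the numeric suffix
    }


if __name__ == "__main__":
    for event in (
        "OFFCORE_RESPONSE.ALL_DEMAND_MLC_PREF_READS.LLC_MISS.ANY_RESPONSE_1",
        "OFFCORE_RESPONSE.ALL_DEMAND_MLC_PREF_READS.LLC_MISS.LOCAL_DRAM_0",
    ):
        print(split_offcore_event(event))
```

Read this way, both counters share the same request and supplier fields and differ only in the response field, which is why the second is a subset of the first.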
The machine has two 8-core processors. Is that what you mean by "2 socket"? Is the memory split between them, like on a NUMA machine?
When I say interesting, I mean that the values are completely different in two versions of the same loop. The code is in Fortran: one version uses a traditional Fortran array, and the other uses a Fortran array placed in a region of memory allocated by an allocator written in C and linked as a dynamic library. I think the place where the allocator puts the region of memory (I'm using mmap) causes this difference.
Yes, by 2 sockets I meant two processors.
This machine uses a NUMA architecture, so you should see significant performance differences if the allocation is not NUMA-aware.
There should be another set of events that end with .REMOTE_DRAM. If that count is high, or even close to the LOCAL_DRAM count, then you have memory locality issues.
Your suspicion is probably correct in this regard.
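As a rough illustration of that comparison, the sketch below computes the share of DRAM-served requests that came from the remote socket. The counts are made-up placeholders, not measurements from this thread, and the 0.4 cut-off is an arbitrary value chosen for the example, not a VTune recommendation.

```python
def remote_dram_fraction(local_dram, remote_dram):
    """Return the fraction of DRAM-served requests that came from remote DRAM."""
    total = local_dram + remote_dram
    if total == 0:
        return 0.0
    return remote_dram / total


def looks_numa_unfriendly(local_dram, remote_dram, threshold=0.4):
    """Flag a run whose remote share reaches an (arbitrary) threshold."""
    return remote_dram_fraction(local_dram, remote_dram) >= threshold


if __name__ == "__main__":
    # Hypothetical counts for the two versions of the loop.
    runs = {
        "plain Fortran array": (900_000, 100_000),
        "custom mmap allocator": (520_000, 480_000),
    }
    for label, (local, remote) in runs.items():
        frac = remote_dram_fraction(local, remote)
        flag = looks_numa_unfriendly(local, remote)
        print(f"{label}: remote share = {frac:.2f}, suspicious = {flag}")
```

Substituting the real LOCAL_DRAM and REMOTE_DRAM counts for each version of the loop would show whether the allocator's placement is the likely cause.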
Thanks, I will investigate more to be sure.
Where can I find a complete description of the counters VTune shows when it runs on this processor? The VTune help covers only a few of them and often doesn't help much.
Quoting Rafael Silva
Where can I find a complete description of the counters shown in Vtune when we run it with this processor? The Vtune HELP shows just a few of them, sometimes, doesn't help much.
Sanath
Good, there is a lot of information in this manual, and it helped. But, for example, if you search the PDF for ALL_DEMAND_MLC_PREF_READS, the specific counter I posted here, nothing is found.
