I want to measure the number of remote DRAM read and write requests issued by each core.
I tried to use the Cbox for this, but I could not find a suitable base event.
I then found that pmu-tools (https://github.com/andikleen/pmu-tools) has the event "OFFCORE_RESPONSE.ALL_READS.LLC_MISS.REMOTE_DRAM", which can measure this count per core.
I looked up the definition of this event: the event code is "0xB7, 0xBB" and the umask is "0x01". However, it does not say which box it belongs to.
I checked the Intel® Xeon® Processor E5 and E7 v3 Family Uncore Performance Monitoring Reference Manual but could not find any event related to it.
How can I implement this event using the PMU to measure the remote read/write count of each core?
Hi,
the OFFCORE_RESPONSE events are core-local events and are therefore not related to any 'box'. They are counted by the core counters (MSR_PMC0-7 with config registers MSR_PERFEVTSEL0-7). You can program them through the MSR interface.
Depending on which of the two event codes you use (0xB7 or 0xBB), you have to program the filter bitmask (many tools use human-readable names that specify a bitmask) in either register 0x1A6 or 0x1A7 (event 0xB7 -> register 0x1A6, event 0xBB -> register 0x1A7). Which bits can be set and what they mean is architecture specific; this is documented in the SDM and in the matrix_bit_definitions files at https://download.01.org/perfmon . The matrix_bit_definitions files use the same names as pmu-tools, so it should be easy to retrieve the bitmask that pmu-tools uses.
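As a rough illustration of the programming sequence described above, here is a minimal Python sketch that only computes which MSRs would be written and with which values. The function names (`perfevtsel`, `offcore_msr_writes`, `write_msr`) are made up for this example, and the filter bitmask is a placeholder: the real value for OFFCORE_RESPONSE.ALL_READS.LLC_MISS.REMOTE_DRAM has to be taken from the SDM or the matrix_bit_definitions file for your architecture. The `write_msr` helper assumes Linux with the `msr` kernel module loaded and root privileges.

```python
import os
import struct

# Architectural MSR addresses from the Intel SDM.
IA32_PERFEVTSEL0 = 0x186   # config register of core counter 0
IA32_PMC0 = 0x0C1          # counter register of core counter 0

# Offcore response filter register used by each event code.
OFFCORE_RSP_MSR = {0xB7: 0x1A6, 0xBB: 0x1A7}


def perfevtsel(event, umask, usr=True, kernel=True, enable=True):
    """Build an IA32_PERFEVTSELx value from event code and umask."""
    value = (event & 0xFF) | ((umask & 0xFF) << 8)
    if usr:
        value |= 1 << 16   # USR: count in user mode
    if kernel:
        value |= 1 << 17   # OS: count in kernel mode
    if enable:
        value |= 1 << 22   # EN: enable this counter
    return value


def offcore_msr_writes(event, umask, filter_mask, counter=0):
    """Return the (msr, value) pairs to write, in order, for one counter."""
    if event not in OFFCORE_RSP_MSR:
        raise ValueError("offcore response events use code 0xB7 or 0xBB")
    return [
        (OFFCORE_RSP_MSR[event], filter_mask),                   # filter bitmask
        (IA32_PMC0 + counter, 0),                                # reset the counter
        (IA32_PERFEVTSEL0 + counter, perfevtsel(event, umask)),  # start counting
    ]


def write_msr(cpu, msr, value):
    """Write one MSR on one CPU (assumes Linux, msr module, root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), msr)
    finally:
        os.close(fd)


if __name__ == "__main__":
    PLACEHOLDER_FILTER = 0x0  # assumption: replace with the real bitmask
    for msr, value in offcore_msr_writes(0xB7, 0x01, PLACEHOLDER_FILTER):
        print(f"wrmsr 0x{msr:X} 0x{value:X}")
```

To count per core, the same writes would be repeated with `write_msr` on every CPU of interest, and the counter also has to be enabled in IA32_PERF_GLOBAL_CTRL (0x38F) if it is not already. Treat this as a sketch under the stated assumptions rather than a tested measurement setup.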
You will probably have a problem with 'remote DRAM write requests of each core'. The measurement facility is located between a core's private L2 and the 'ring' interconnect of the Uncore (L3, DRAM, QPI, ...). At this location, it is not possible to attribute cache lines evicted from L3 to DRAM to a single CPU core. The 'COREWB' bit (WriteBack) in 0x1A6 or 0x1A7 counts dirty cache lines evicted from L2 to L3, not from L3 to DRAM, so you cannot use it for memory bandwidth in general. (This paragraph is loosely cited from an email by Dr. Bandwidth to the PAPI mailing list)
Thanks for your reply.
Let's put aside the problem of 'remote DRAM write requests of each core'. Can I use the PMU (maybe the Cbox) to count 'remote DRAM read requests of each core'?
My previous projects are built on the PMU implementation, but I am not familiar with core-local events and do not know how to adapt my existing project to the MSR interface. Do you have any documentation on this that would help me understand and use it quickly?
Thanks a lot,
I already use the memory controllers (on the remote socket) to approximate this in the situation where the local socket has only one active core (the others are shut down).
But when I extend this to the multi-core case, the iMC cannot tell which core a given read or write came from.
I'm sorry that I did not explain the MSR / core-local issue clearly. I have already implemented a complete set of MSR interfaces.
What I lack is a reference sheet for the core-local (MSR) events; I must admit I know only a little about them.
Can you point me to a reference for the MSR or core-local events?
Thanks for your help.
