Solved: Core To Memory Request Identifier

CPati2 · ‎11-30-2017

Hi All,

Is there a way to use performance counters to determine which of the 8 MCDRAM controllers and 2 DDR controllers served the memory requests from a specific core?

For example, are requests coming out of core 0 being fulfilled by the MCDRAM at the top left, the bottom right, etc.?

Thanks.

McCalpinJohn · ‎11-30-2017

The mapping of physical addresses to MCDRAM controllers depends on the mode of operation of the chip.

If I recall correctly, when running in "All-to-All" mode, consecutive cache lines are interleaved round-robin across the 8 MCDRAM controllers. In "Quadrant" mode, the mapping is much more complex because each address must map to a CHA unit in the same quadrant as the corresponding MCDRAM controller. This could be done either by changing the mapping of physical addresses to CHA units or by changing the mapping of physical addresses to MCDRAM units. My experiments suggest that the latter approach was chosen -- presumably to keep the address-to-CHA hash the same. In "SNC-4" mode, consecutive cache lines are interleaved between the two MCDRAM controllers assigned to each quadrant, with the address space block-distributed across the four "sub-NUMA clusters".
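To make the two simpler layouts concrete, here is a toy Python model of the interleaving as described above. It is only a sketch under stated assumptions, not Intel's actual address decoding: it assumes 64-byte cache lines, treats the address as an offset into the MCDRAM range rather than a raw physical address, assumes the four SNC-4 blocks are equal in size, and uses a made-up controller numbering (`2 * cluster + 0/1`). The function names are hypothetical. Quadrant mode is left out because its mapping is hash-based and not disclosed.

```python
CACHE_LINE_BYTES = 64   # assumed cache line size
NUM_MCDRAM = 8          # 8 MCDRAM controllers, as in the question
NUM_CLUSTERS = 4        # four sub-NUMA clusters in SNC-4 mode


def all_to_all_controller(mcdram_offset):
    """Round-robin: consecutive cache lines go to consecutive controllers."""
    line = mcdram_offset // CACHE_LINE_BYTES
    return line % NUM_MCDRAM


def snc4_controller(mcdram_offset, mcdram_bytes):
    """Block-distribute the range over 4 clusters, then alternate cache
    lines between the 2 controllers of that cluster.

    The numbering 2 * cluster + (0 or 1) is an illustrative assumption;
    the real controller numbering is not disclosed.
    """
    cluster_bytes = mcdram_bytes // NUM_CLUSTERS
    cluster = mcdram_offset // cluster_bytes
    if not 0 <= cluster < NUM_CLUSTERS:
        raise ValueError("offset is outside the MCDRAM range")
    line_in_cluster = (mcdram_offset % cluster_bytes) // CACHE_LINE_BYTES
    return 2 * cluster + line_in_cluster % 2
```

For example, under this model `all_to_all_controller` cycles through controllers 0 to 7 every 512 bytes, while `snc4_controller` keeps the first quarter of the range on controllers 0 and 1 only.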

The performance counters in the MCDRAM controllers have some peculiarities, but are generally reliable. They are certainly reliable enough to determine the mapping of physical addresses to MCDRAM channels.
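One way such an experiment could be structured: snapshot the per-controller access counts, repeatedly force a single cache line to be fetched from memory, snapshot again, and see which controller's count grew. The Python sketch below covers only the last step. How the counters are read is deliberately not shown, and the function name, the list-of-counts input format, and the 90% threshold are all assumptions made for illustration.

```python
def dominant_controller(before, after, min_share=0.9):
    """Return the index of the controller that absorbed most of the traffic
    between two counter snapshots, or None if no single controller stands out.

    `before` and `after` are equal-length lists of per-controller access
    counts (hypothetical input format). `min_share` is an arbitrary
    threshold that tolerates some background traffic on other controllers.
    """
    if len(before) != len(after):
        raise ValueError("snapshots must cover the same controllers")
    deltas = [a - b for b, a in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return None  # no traffic was observed at all
    best = max(range(len(deltas)), key=lambda i: deltas[i])
    if deltas[best] / total >= min_share:
        return best
    return None
```

Repeating this for a series of addresses would give an address-to-controller table. It identifies controllers only by their counter numbering, not by their position on the chip.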

Note that Intel has chosen not to disclose the mapping between processor X2APIC IDs and locations on the chip, nor the mapping between MCDRAM controller numbers (as seen by the performance counters) and locations on the chip. The mapping between DDR4 controller numbering and locations on the chip is also undisclosed, but it is fairly easy to reverse engineer (because the DIMMs are removable).
