Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Xeon E5 Family: Integrated Memory Controller

sarkar__saptarshi

Hi,

   I have the following queries:

     1) There is no mention regarding the E5 family Integrated Memory Controller's:

           a) Address mapping scheme, i.e. the DRAM address translation and decoding technique. I am interested in knowing how the IMC maps host address bits to the row and column pins of the DRAM devices, and where to infer this from. I have found a detailed description only for the x965 Express chipset family (Section 10.2.1.3), but for the IMC there is no such explanation.

            b) Secondly, how to determine from the datasheets what type of row buffer management policy a memory controller uses.

    Thank you

Saptarshi

  

Patrick_F_Intel1
Employee

Hello Saptarshi,

I spoke to an expert at Intel, and he said the info you are looking for depends on many parameters, is quite proprietary, and is not public. I assume you've looked at the data sheets and not found the information.

You might be able to figure some of the questions out from the performance monitoring events of the E5 uncore IMC. See http://www.intel.com/content/dam/www/public/us/en/documents/design-guides/xeon-e5-2600-uncore-guide.pdf . You would need to find interesting events and then build test cases to 1) prove that each event is working properly and 2) show whatever it is that you think the event will tell you. This road will be a lot of work.
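As a concrete starting point, below is a minimal sketch of reading one IMC counter through the Linux perf_event interface. It assumes a kernel that exposes the uncore IMC PMU as uncore_imc_0 under /sys/bus/event_source/devices/, and the CAS_COUNT.RD encoding (event 0x04, umask 0x03) is my assumption based on the uncore guide linked above; verify both against your system and your copy of the guide before trusting the numbers.

/*
 * Sketch: read one Xeon E5 uncore IMC counter (assumed CAS_COUNT.RD)
 * via the Linux perf_event interface.  The PMU name "uncore_imc_0"
 * and the event/umask encoding are assumptions to be checked against
 * sysfs and the uncore guide.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
    /* The uncore IMC PMU gets a dynamic type id; read it from sysfs. */
    FILE *f = fopen("/sys/bus/event_source/devices/uncore_imc_0/type", "r");
    if (!f) { perror("uncore_imc_0/type"); return 1; }
    int pmu_type;
    if (fscanf(f, "%d", &pmu_type) != 1) { fclose(f); return 1; }
    fclose(f);

    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type   = pmu_type;
    attr.size   = sizeof(attr);
    /* event in config[7:0], umask in config[15:8] (see .../format/). */
    attr.config = 0x04 | (0x03 << 8);   /* assumed CAS_COUNT.RD encoding */

    /* Uncore events are per-socket: pid = -1, pick one CPU on the socket. */
    int fd = perf_event_open(&attr, -1, 0, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    uint64_t before = 0, after = 0;
    if (read(fd, &before, sizeof(before)) != sizeof(before)) return 1;

    /* ... run the memory access pattern you want to characterize ... */
    sleep(1);

    if (read(fd, &after, sizeof(after)) != sizeof(after)) return 1;
    printf("read CAS delta: %llu\n", (unsigned long long)(after - before));
    close(fd);
    return 0;
}

Compile with gcc and run with sufficient privileges (perf_event_paranoid restrictions apply); counting reads across a controlled access pattern is one way to start validating what an event actually measures.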

Sorry to not be more help,

Pat

Bernard
Valued Contributor I
@Saptarshi I have found this link about row buffer management policy (not related to Intel technology): http://www.cs.utah.edu/~rajeev/pubs/pact11b.pdf
McCalpinJohn
Honored Contributor III

My experiments with the Integrated Memory Controller performance counters in the Xeon E5-26xx uncore suggest that at least the counts for page activation (ACT_COUNT), page closes (PREC_COUNT), and CAS events (CAS_COUNT) appear to be correct (or at least close enough that I can't tell the difference).

The default configuration for uniformly configured systems appears to be (at least close to) the obvious one for open page access (a hypothetical decode sketch follows the list below):

  • one cache line maps to one channel
  • consecutive cache lines within the same DRAM page are mapped (round-robin) across the four channels 
  • consecutive page-sized blocks are mapped (round-robin) to consecutive banks within the same rank
  • after all the banks in a rank are mapped, the process repeats in the next rank (round-robin)
  • after all ranks have been accessed, the mapping wraps around to the beginning
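
A hypothetical decode of a physical address under this round-robin scheme might look like the following. The geometry values (64 B lines, 4 channels, 8 banks per rank, 2 ranks, 4 KiB DRAM page per channel) are illustrative assumptions only, not parameters taken from any Intel document.

/*
 * Hypothetical address decode under the open-page mapping described above.
 * Geometry constants are assumed; the real controller may also XOR
 * higher-order address bits into the bank/rank selection.
 */
#include <stdio.h>
#include <stdint.h>

#define LINE_BYTES   64u
#define NUM_CHANNELS 4u
#define PAGE_BYTES   4096u   /* DRAM page (row) size per channel: assumed */
#define NUM_BANKS    8u
#define NUM_RANKS    2u

struct dram_loc { unsigned channel, column_line, bank, rank; uint64_t row; };

static struct dram_loc decode(uint64_t paddr)
{
    struct dram_loc d;
    uint64_t line = paddr / LINE_BYTES;              /* cache-line index        */
    d.channel     = line % NUM_CHANNELS;             /* line -> channel         */
    uint64_t per_chan_line  = line / NUM_CHANNELS;   /* line index within chan  */
    uint64_t lines_per_page = PAGE_BYTES / LINE_BYTES;
    d.column_line = per_chan_line % lines_per_page;  /* line within DRAM page   */
    uint64_t page = per_chan_line / lines_per_page;  /* page-sized block index  */
    d.bank        = page % NUM_BANKS;                /* pages -> banks          */
    d.rank        = (page / NUM_BANKS) % NUM_RANKS;  /* then ranks              */
    d.row         = page / (NUM_BANKS * NUM_RANKS);  /* then wrap to next row   */
    return d;
}

int main(void)
{
    uint64_t paddr = 0x123456780ULL;                 /* example physical addr   */
    struct dram_loc d = decode(paddr);
    printf("paddr 0x%llx -> ch %u, bank %u, rank %u, row %llu, line-in-page %u\n",
           (unsigned long long)paddr, d.channel, d.bank, d.rank,
           (unsigned long long)d.row, d.column_line);
    return 0;
}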

There are probably variations on this using XOR of higher-order address bits to minimize pathological bank conflict cases.  To look for these you have to have the full physical address -- large pages only control the bottom 21 bits and that might not be enough to catch all the bits used in a (hypothetical) XOR swizzle.  Although it is not particularly user-friendly, versions of Linux since 2.6.25 have a "/proc/<pid>/pagemap" interface file that can be used to convert from virtual to physical address.
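
For reference, a minimal sketch of that pagemap lookup is below; note that kernels from 4.0 onward zero the frame-number field for unprivileged readers, so it needs root or CAP_SYS_ADMIN.

/*
 * Minimal sketch of virtual-to-physical translation via /proc/self/pagemap
 * (Linux >= 2.6.25).  Each 8-byte entry holds the page frame number in
 * bits 0-54 and a "present" flag in bit 63.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

/* Returns the physical address backing 'vaddr', or 0 on failure. */
static uint64_t virt_to_phys(const void *vaddr)
{
    long page_size = sysconf(_SC_PAGESIZE);
    uint64_t vpn = (uint64_t)vaddr / page_size;

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) return 0;

    uint64_t entry = 0;
    if (pread(fd, &entry, sizeof(entry), vpn * sizeof(entry)) != sizeof(entry)) {
        close(fd);
        return 0;
    }
    close(fd);

    if (!(entry & (1ULL << 63)))                 /* bit 63: page present in RAM */
        return 0;
    uint64_t pfn = entry & ((1ULL << 55) - 1);   /* bits 0-54: frame number     */
    return pfn * page_size + (uint64_t)vaddr % page_size;
}

int main(void)
{
    char *buf = malloc(4096);
    if (!buf) return 1;
    buf[0] = 1;                                  /* touch the page so it is mapped */
    printf("virtual %p -> physical 0x%llx\n",
           (void *)buf, (unsigned long long)virt_to_phys(buf));
    free(buf);
    return 0;
}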

Most of the other events are much more difficult to test via microbenchmarks, but the access counts (plus the ability to translate from virtual addresses to physical addresses) should be enough to determine the mappings.
