
Core and L3 numbering vs physical layout on Xeon Platinum 8160 (and KNL)

McCalpinJohn

Intel server processors since Sandy Bridge have had "ring traffic counters" (now "mesh traffic counters") that can be used to monitor traffic on the four buses.  These are described with varying levels of detail in the Uncore Performance Monitoring Guide for each generation, but at a high level the four buses are:

  • AD (Address) - carries read/write requests from cores to L3 and/or coherence agents, and carries snoop requests from some agents to cores.
  • AK (Acknowledge) - carries ACKs between some agents and carries snoop responses from cores to L3 and/or coherence agents.
  • BL (Block Data) - carries data (usually cache lines).
  • IV (Invalidate) - carries snoop requests from L3 and/or coherence agents to cores.

I like the idea of monitoring traffic on the mesh, but I quickly realized that Intel does not document the mapping between the core or L3 numbers and the locations of these units on the physical die.  Without that mapping, it is impossible to turn all this lovely data into pictures that correspond to the flow of traffic on the die.

A few years ago, I used performance counters and microbenchmarks to determine the layout of the Xeon E5-2690 v3 (Haswell EP, 12-core) processor, but never published anything.  More recently, I extended these approaches to determine the numbering of the cores and tiles on the Xeon Platinum 8160 (Skylake Xeon, 24-core) and the Xeon Phi 7250 (Knights Landing, 68-core), and this time I did get around to doing a presentation on the topic.

You can find the slides and the video from my April 12, 2018 presentation at https://www.ixpug.org/working-groups (look for "McCalpin" in the page and the links will be in the right-hand column).

I am still working on cleaning up the codes that I used for these measurements so that they can be distributed and potentially used for mapping other Skylake Xeon processor models.
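As a toy illustration of the kind of inference described above (this is not the author's code, and it reads no real hardware counters), the sketch below simulates per-tile "vertical" and "horizontal" traffic counts on a small hypothetical grid and then groups tile numbers into columns using only those counts. The grid, the tile numbering, the function names, and the vertical-then-horizontal routing rule are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int]  # (row, column) on a hypothetical grid


def route_yx(src: Coord, dst: Coord) -> Tuple[List[Coord], List[Coord]]:
    """Tiles touched vertically, then horizontally (assumed routing rule)."""
    (r0, c0), (r1, c1) = src, dst
    step_r = 1 if r1 >= r0 else -1
    step_c = 1 if c1 >= c0 else -1
    vertical = [(r, c0) for r in range(r0, r1 + step_r, step_r)] if r0 != r1 else []
    horizontal = [(r1, c) for c in range(c0, c1 + step_c, step_c)] if c0 != c1 else []
    return vertical, horizontal


def simulate_counts(layout: Dict[int, Coord], src: int, dst: int) -> Dict[int, Dict[str, int]]:
    """Stand-in for a measurement: per-tile counts for one src -> dst transfer."""
    pos_to_id = {pos: tid for tid, pos in layout.items()}
    vertical, horizontal = route_yx(layout[src], layout[dst])
    counts = {tid: {"vertical": 0, "horizontal": 0} for tid in layout}
    for pos in vertical:
        if pos in pos_to_id:  # skip grid positions with no tile
            counts[pos_to_id[pos]]["vertical"] += 1
    for pos in horizontal:
        if pos in pos_to_id:
            counts[pos_to_id[pos]]["horizontal"] += 1
    return counts


def infer_columns(layout: Dict[int, Coord]) -> List[Set[int]]:
    """Group tile numbers that see vertical traffic together.

    Only the simulated counts are used for grouping; the hidden layout
    is consulted solely to generate those counts.
    """
    ids = sorted(layout)
    group: Dict[int, Set[int]] = {tid: {tid} for tid in ids}
    for src in ids:
        for dst in ids:
            if src == dst:
                continue
            counts = simulate_counts(layout, src, dst)
            busy = [t for t in ids if counts[t]["vertical"] > 0]
            merged: Set[int] = set()
            for t in busy:
                merged |= group[t]
            for t in merged:
                group[t] = merged
    unique = {frozenset(g) for g in group.values()}
    return sorted((set(g) for g in unique), key=min)


# Hypothetical 2x3 grid with deliberately scrambled tile numbers.
layout = {0: (0, 0), 1: (1, 2), 2: (0, 1), 3: (1, 0), 4: (0, 2), 5: (1, 1)}
columns = infer_columns(layout)
```

On real hardware the counts would come from the uncore mesh counters rather than a simulation, and disabled cores or tiles leave gaps in the grid that this toy model does not try to resolve.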

Igual__Francisco

Dear John,

Would it be possible to access the codes you mention in your post for determining the physical location of cores? I am especially interested in trying them on a Xeon Gold 6138.

Thanks in advance,

Francisco

McCalpinJohn

I am gearing up for the SuperComputing 2018 conference next week, and don't expect that I will have time to look at this before December.

Agrawal__Mohit

Hello John,

Could you please explain how to determine the physical location of cores on processors with different core topologies? If you could share the code, that would be really helpful.

 

Thanks!

Mohit

McCalpinJohn

The only details I have published so far are at https://www.ixpug.org/documents/1524216121knl_skx_topology_coherence_2018-03-23.pptx -- see slides 12-24 (with some extra material in slides 38-44).   It is not clear to me whether this process can be completely automated -- I needed results from multiple chips with different patterns of disabled cores/tiles to fully disambiguate the pattern.

Writing this up is back on my "to do" list now that Frontera is in production, but I have not decided where this report will land in the priority ranking....

Agrawal__Mohit

Hello John,

The work seems very interesting. Thanks for sharing the details.

The answer to this question is extremely important to me, so please share the details whenever you get to work on it.

 

Thanks!

Mohit
