Intel server processors since Sandy Bridge have had "ring traffic counters" (now "mesh traffic counters") that can be used to monitor traffic on the four buses. These are described with varying levels of detail in the Uncore Performance Monitoring Guide for each generation, but at a high level
I like the idea of monitoring traffic on the mesh, but I quickly realized that Intel does not document the mapping between the core or L3 numbers and the locations of these units on the physical die. Without that mapping, it is impossible to turn all this lovely data into pictures that correspond to the flow of traffic on the die.
A few years ago, I used performance counters and microbenchmarks to determine the layout of the Xeon E5-2690 v3 (Haswell EP, 12-core) processor, but never published anything. More recently, I extended these approaches to determining the numbering of the cores and tiles on the Xeon Platinum 8160 (Skylake Xeon, 24-core) and the Xeon Phi 7250 (Knights Landing, 68-core), and this time I did get around to doing a presentation on the topic.
You can find the slides and the video from my April 12, 2018 presentation at https://www.ixpug.org/working-groups (look for "McCalpin" in the page and the links will be in the right-hand column).
I am still working on cleaning up the codes that I used for these measurements so that they can be distributed and potentially used for mapping other Skylake Xeon processor models.
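To illustrate why traffic counters can reveal physical placement at all, here is a toy model in Python. It is not the measurement code mentioned above, and it reads no hardware counters. It assumes a small rectangular grid of tiles with scrambled ID numbers, and it assumes packets travel vertically first and then horizontally. The `layout` dictionary and all function names are hypothetical and exist only for this sketch.

```python
from collections import Counter

def route_y_then_x(src, dst):
    """List the tiles a packet arrives at, tagged by direction of travel.

    ASSUMPTION: the packet moves vertically until it reaches the destination
    row, then horizontally. This is a toy routing rule for illustration.
    """
    (r, c), (r2, c2) = src, dst
    hops = []
    step = 1 if r2 > r else -1
    while r != r2:
        r += step
        hops.append(((r, c), "vertical"))
    step = 1 if c2 > c else -1
    while c != c2:
        c += step
        hops.append(((r, c), "horizontal"))
    return hops

def vertical_counts(layout, src_id, dst_id, packets=1000):
    """Simulated vertical-traffic count per tile ID for one src/dst pair."""
    pos = {tile: rc for rc, tile in layout.items()}
    counts = Counter()
    for where, direction in route_y_then_x(pos[src_id], pos[dst_id]):
        if direction == "vertical":
            counts[layout[where]] += packets
    return counts

def same_column_candidates(layout, src_id):
    """Tile IDs that ever see vertical traffic originating from src_id.

    Under the vertical-first assumption, these tiles share src_id's column.
    """
    seen = set()
    for dst_id in layout.values():
        if dst_id != src_id:
            seen |= set(vertical_counts(layout, src_id, dst_id))
    return seen

# Hypothetical 3x3 die: (row, column) -> tile ID, deliberately scrambled.
layout = {
    (0, 0): 4, (0, 1): 7, (0, 2): 1,
    (1, 0): 0, (1, 1): 5, (1, 2): 8,
    (2, 0): 3, (2, 1): 2, (2, 2): 6,
}

print(sorted(same_column_candidates(layout, 0)))  # → [3, 4]
```

In this toy model, sending traffic from tile 0 to every other tile lights up vertical counters only on tiles 3 and 4, which are exactly the other tiles in its column. A real die is messier: the simulation has no disabled tiles and knows the routing rule in advance, so it should be read as a picture of the idea, not as a recipe.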
Would it be possible to access the codes you mention in your post for determining the physical location of cores? (I am especially interested in trying them on a Xeon Gold 6138.)
Thanks in advance,
Could you please enlighten me on how to determine the physical location of cores in processors with different core topologies? If you could share the code, that would be really helpful.
The only details I have published so far are at https://www.ixpug.org/documents/1524216121knl_skx_topology_coherence_2018-03-23.pptx -- see slides 12-24 (with some extra material in slides 38-44). It is not clear to me whether this process can be completely automated -- I needed results from multiple chips with different patterns of disabled cores/tiles to fully disambiguate the pattern.
Writing this up is back on my "to do" list now that Frontera is in production, but I have not decided where this report will land in the priority ranking....
The work seems very interesting. Thanks for sharing the details.
The answer to this question is extremely important to me, so please share the details whenever you get to work on it.