I have been working on Skylake Server CPUs (Xeon Platinum 8180). Along with the switch to a mesh topology, the cache architecture changed to a directory-based coherence protocol. I am trying to understand how it works, but I cannot find much official documentation.

Some descriptions of directory-based cache coherence protocols say that a directory structure is kept in memory and that the caching agent caches it. That would mean there might be two memory accesses if the requested data is not in the caches and the corresponding directory information is not in the directory cache. However, I am not sure whether the directory is actually implemented this way.

WikiChip mentions that NCID (non-inclusive cache, inclusive directory architecture) might be used in the Skylake Server CPUs, but there is not much information about it. If NCID is used, is there a directory entry for every set and way of the L2 and LLC of each core? If the inclusive directory entries do not map one-to-one onto the cache lines, then I might not be able to use all the cache lines: reading data into the cache would evict other cache lines not because I am out of cache lines, but because I am out of directory storage, right?

I am asking because a recent paper (http://iacoma.cs.uiuc.edu/iacoma-papers/ssp19.pdf) argues that such a problem can occur, since some directory entries are shared between L2 and LLC lines. I provide below a figure that shows the reverse-engineered directory structure.
May I kindly ask whether their reverse engineering is accurate? Can I use all cache lines, or am I limited by the number of directory entries?
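
To make the concern concrete, here is a minimal toy model of what I mean. It is my own sketch, not taken from the paper or from Intel documentation. It assumes a single cache set with LRU replacement, an inclusive directory in which every cached line must own a directory entry, and back-invalidation of the cache line when its directory entry is evicted. The function name `resident_lines` and the way counts are hypothetical and are not the real Skylake parameters.

```python
from collections import OrderedDict


def resident_lines(num_lines, cache_ways, directory_ways):
    """Insert distinct lines into one cache set; return how many stay resident.

    Toy model (assumption): every cached line needs its own directory entry,
    and both the cache set and the directory set use LRU replacement.
    """
    cache = OrderedDict()      # line address -> None, oldest entry first
    directory = OrderedDict()  # line address -> None, oldest entry first

    for addr in range(num_lines):
        if len(directory) >= directory_ways:
            # Directory set is full: evict the oldest directory entry and
            # back-invalidate the cache line it was tracking.
            victim, _ = directory.popitem(last=False)
            cache.pop(victim, None)
        if len(cache) >= cache_ways:
            # Ordinary capacity eviction: the line leaves the cache and
            # frees its directory entry.
            victim, _ = cache.popitem(last=False)
            directory.pop(victim, None)
        cache[addr] = None
        directory[addr] = None

    return len(cache)


# Hypothetical way counts, chosen only to show the effect.
print(resident_lines(16, cache_ways=16, directory_ways=12))  # 12
print(resident_lines(16, cache_ways=16, directory_ways=16))  # 16
```

In this model, with fewer directory entries than cache ways, only 12 of the 16 ways ever hold data at once, which is the behavior I am asking about.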