I am trying to measure the bandwidth between L2/L3 and main memory on Intel Skylake SP platforms. I tried Memory Bandwidth Monitoring, but it shows inaccuracies (Intel has reported one erratum for this feature that covers these inaccuracies, SKX4).
So I am looking for an alternative way to measure the bandwidth between L2/L3 and main memory per core. I tried aggregating several core counters such as L2_LINES_OUT.NON_SILENT, L2_LINES_OUT.SILENT, IDI_MISC.WB_DOWNGRADE… But there are several kinds of lines going to or coming from main memory that I am not able to count (L3 write-backs to main memory, hardware prefetching…). Are there any hardware counters that I can use to get a good approximation of the bandwidth between L2/L3 and main memory?
If that is not possible, is there any way to measure the inaccuracy of Memory Bandwidth Monitoring?
If I recall correctly, I never found a core counter on SKX that tracked L2 WriteBacks to L3, but I can't find the corresponding notes right now.... (I think I need to take a vacation and write a bunch of new importers for Apple's Spotlight search facility....)
Thanks for the reply.
The only core counter I found on SKX that tracks L2-to-L3 traffic (both clean and non-clean blocks) is IDI_MISC.WB_UPGRADE.
I also found that Intel Memory Bandwidth Monitoring can track the bandwidth between L2/L3 and main memory, but it has some bugs (SKX4).
You can use the counter IDI_MISC.WB_UPGRADE to get the cache lines moved from L2 to L3. There is no corresponding event for lines moving from L3 back into L2. For cache lines dropped at L2, there is the IDI_MISC.WB_DOWNGRADE event. I started a page about it (https://github.com/RRZE-HPC/likwid/wiki/L2-L3-MEM-traffic-on-Intel-Skylake-SP-CascadeLake-SP), but it is not finished yet.
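As a rough illustration of how such counts could be turned into a bandwidth figure, here is a minimal Python sketch. It assumes the interpretation given in this thread (IDI_MISC.WB_UPGRADE counts lines moved from L2 to L3, IDI_MISC.WB_DOWNGRADE counts lines dropped at L2) and a 64-byte cache line. The function names and the sample counter values are hypothetical, and the counter deltas are assumed to have been collected over a known interval with whatever tool you use. It covers only the L2-to-L3 direction, not traffic to main memory.

```python
CACHE_LINE_BYTES = 64  # assumed cache line size on Skylake SP


def l2_to_l3_bytes_per_second(wb_upgrade_delta, interval_seconds):
    """Estimate L2 -> L3 eviction bandwidth from an IDI_MISC.WB_UPGRADE delta.

    Assumes each counted event corresponds to one full cache line
    moved from L2 to L3 during the measurement interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return wb_upgrade_delta * CACHE_LINE_BYTES / interval_seconds


def fraction_moved_to_l3(wb_upgrade_delta, wb_downgrade_delta):
    """Share of L2 evictions that were moved to L3 rather than dropped.

    Assumes IDI_MISC.WB_DOWNGRADE counts lines dropped at L2, so dropped
    lines generate no L2 -> L3 data traffic.
    """
    total = wb_upgrade_delta + wb_downgrade_delta
    if total == 0:
        return 0.0
    return wb_upgrade_delta / total


if __name__ == "__main__":
    # Hypothetical counter deltas for one core over a 0.5 s interval.
    upgrade, downgrade, seconds = 2_000_000, 500_000, 0.5
    bw = l2_to_l3_bytes_per_second(upgrade, seconds)
    print(f"L2 -> L3 bandwidth: {bw / 1e6:.1f} MB/s")
    print(f"Fraction of evictions moved to L3: {fraction_moved_to_l3(upgrade, downgrade):.2f}")
```

Summing the per-core results would give a socket-level estimate, under the same assumptions.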