We are trying to measure memory bandwidth on a Skylake-based client part.
We can get the IA DDR BW from SoC Watch, and we want to trace the software APIs that generate this bandwidth, so we are using the LONGEST_LAT_CACHE.MISS event.
Is IA DDR BW (read + write) = LONGEST_LAT_CACHE.MISS * 64 B (cache line size) / time + uncached memory BW?
It seems LONGEST_LAT_CACHE.MISS counts both demand data/code reads and prefetches. Does it also count the misses caused by writes to memory?
Is there a better formula to correlate these?
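For concreteness, the estimate proposed above can be sketched as a small calculation. The counter value and interval below are made-up numbers, and the 64-byte line size is an assumption about the platform:

```python
# Sketch of the proposed estimate:
#   read BW ~= LONGEST_LAT_CACHE.MISS * 64 B / elapsed time
# The counter and time values used below are hypothetical, not measured.

CACHE_LINE_BYTES = 64  # assumed cache line size on Skylake client

def estimate_bw_gbs(l3_misses, elapsed_s):
    """Approximate DRAM read traffic implied by L3 misses, in GB/s."""
    return l3_misses * CACHE_LINE_BYTES / elapsed_s / 1e9

# Example: 100 million L3 misses over a 1-second interval
print(estimate_bw_gbs(100_000_000, 1.0))  # -> 6.4 (GB/s)
```

Note this only approximates the read side; whether it captures write traffic is exactly the question above.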
Intel's event listings and descriptions at 01.org are the most comprehensive. The Skylake client file at https://download.01.org/perfmon/SKL/skylake_core_v46.json says:
"Counts core-originated cacheable requests that miss the L3 cache (Longest Latency cache). Requests include data and code reads, Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches from L1 and L2. It does not include all misses to the L3."
I have not tested this event on a Skylake client, but the "Read-for-Ownership" (RFO) transactions are the memory reads generated by stores that miss in the caches.
To get more specific about transaction types, the performance counter event called OFFCORE_RESPONSE allows the most control. It requires programming an auxiliary MSR with information about the transaction type, the snoop response, and the data provider. This event has been buggy in the past, but I don't see any mention of related errata in the "6th Generation Intel Processor Family Specification Update" (document 332689-023, November 2019). There are over 250 specific combinations of criteria defined for this event in the perfmon web page above.
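As a sketch of how this two-part programming looks under Linux perf: the core PMU event select for the first offcore-response counter is 0xB7 with umask 0x01, and the transaction-type/snoop/supplier mask goes in the auxiliary MSR (MSR_OFFCORE_RSP_0, which perf exposes as the raw event's config1 field). The mask value below is a hypothetical placeholder; real bit definitions come from the perfmon event list:

```python
# How Linux perf encodes OFFCORE_RESPONSE as a raw event:
# 'config' carries the event select and umask; 'config1' carries the
# value written to the auxiliary offcore-response MSR.

EVENT_SEL = 0xB7   # OFFCORE_RESPONSE_0 event code
UMASK     = 0x01

config  = (UMASK << 8) | EVENT_SEL  # perf raw 'config' field
config1 = 0x0                       # request/supplier/snoop mask (placeholder)

print(hex(config))  # -> 0x1b7
```

The point of the split encoding is that a single counter event code fans out to the hundreds of transaction-type combinations via the auxiliary MSR value.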
None of the core performance counter events will measure the DRAM writes caused by the writeback of dirty cache lines. These have to be measured using performance counter events in the uncore.
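As a sketch of how the pieces combine once the uncore memory controller counts are available (each CAS transaction moves one 64-byte line), the read and write sides can be converted to bandwidth separately. The counts below are illustrative, not measured:

```python
# Hypothetical uncore IMC CAS counts (one CAS = one 64-byte line).
# Reads cover demand/prefetch line fills; writes cover dirty-line
# writebacks, which no core performance counter event captures.

CACHE_LINE_BYTES = 64

def dram_bw_gbs(cas_reads, cas_writes, elapsed_s):
    """Total DRAM bandwidth (read, write, read + write) from uncore CAS counts."""
    read_bw  = cas_reads  * CACHE_LINE_BYTES / elapsed_s / 1e9
    write_bw = cas_writes * CACHE_LINE_BYTES / elapsed_s / 1e9
    return read_bw, write_bw, read_bw + write_bw

# Example: 100M read CAS and 40M write CAS over a 1-second interval
r, w, total = dram_bw_gbs(100_000_000, 40_000_000, 1.0)
```

The gap between the write-side number and any core-event estimate is the writeback traffic that only the uncore can see.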