What does MEM_UNCORE_RETIRED.OTHER_LLC_MISS measure?
I am studying the effect of compute node sharing on our old Westmere HPC cluster. This metric correlates strongly with job run time, so I would really like to find out what it measures.
The Intel architecture development manual describes MEM_UNCORE_RETIRED.OTHER_LLC_MISS as “Load instructions retired other LLC miss”, “Applicable to two sockets only”.
It looks like this performance event exists only on Westmere, and probably because it is so specific there is not much information about it.
In our measurements, MEM_UNCORE_RETIRED_OTHER_LLC_MISS.core0 correlates with UNC_LLC_LINES_OUT_ANY.cpu1, UNC_LLC_LINES_IN_ANY.cpu1, UNC_LLC_MISS_READ.cpu1 and UNC_LLC_MISS_PROBE.cpu0 (Pearson correlation of ~0.48). So it probably corresponds to reads from remote RAM. But in that case it should also correlate with MEM_UNCORE_RETIRED_REMOTE_DRAM, and it doesn't.
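For reference, the correlation quoted above is an ordinary Pearson coefficient computed over per-job counter totals. A minimal sketch of that calculation in Python follows; the function name `pearson` and the sample values are made up for illustration and are not our measurements.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-job totals, one entry per job (illustrative numbers only).
other_llc_miss_core0 = [120, 340, 90, 410, 275]
llc_miss_read_cpu1 = [800, 1500, 950, 1400, 1100]

r = pearson(other_llc_miss_core0, llc_miss_read_cpu1)
```

The same function can be applied to any pair of counters, for example the OTHER_LLC_MISS and REMOTE_DRAM totals, to check whether they move together.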
So I am lost. Can someone help me with that?
I think I found the answer.
MEM_UNCORE_RETIRED.OTHER_LLC_MISS on Westmere is the same as MEM_UNCORE_RETIRED.LOCAL_DRAM on later architectures; at least the event number and umask values match.
According to https://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf, when an LLC miss happens and the data resides in local DRAM, a request to local DRAM and a snoop request to the other CPU's LLC are sent at the same time. If the other LLC does not contain the requested cache line, the data from local DRAM is used. So MEM_UNCORE_RETIRED.OTHER_LLC_MISS and MEM_UNCORE_RETIRED.LOCAL_DRAM are the same thing. I think the "local DRAM" name is more helpful.
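To make the reasoning concrete, here is a toy model in Python of the flow described above. It is only a sketch under simplifying assumptions: the function name `serve_local_home_miss`, the set-based cache and the returned labels are all hypothetical, and the real coherence protocol has more cases (for example, the state of the line in the other cache) than this shows.

```python
def serve_local_home_miss(line_addr, other_socket_llc):
    """Toy model: where is a load served from after a local LLC miss,
    when the line's home is local DRAM on a two-socket system?

    other_socket_llc is a set of cache-line addresses assumed to be
    present in the other socket's LLC.
    """
    # The local DRAM read and the snoop to the other socket's LLC are
    # issued together; here they are modeled as one membership check.
    snoop_hit = line_addr in other_socket_llc
    if snoop_hit:
        # Assumption: on a snoop hit the other socket's copy is used.
        return "other_llc"
    # Snoop missed in the other LLC, so the local DRAM data is used.
    # This is the case that "other LLC miss" / "local DRAM" would count.
    return "local_dram"


# Hypothetical cache contents and load addresses, for illustration only.
other_llc = {0x1000, 0x2040}
sources = [serve_local_home_miss(a, other_llc) for a in (0x1000, 0x3000, 0x2040)]
local_dram_loads = sources.count("local_dram")
```

Read this way, the two names describe the same outcome from different sides: the snoop to the other LLC missed, and the load was satisfied from local DRAM.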