I found this thread (What set of events to use to profile the intra-processor and inter-processor NUMA cache coherence ov...) with some suggestions for NUMA cache coherence towards the bottom. That is helpful, but I am also looking for general memory information.
Also, some of the suggestions in the thread are dated and don't appear to exist in Update 2 (REMOTE_CACHE_LOCAL_HOME_HIT), or perhaps just not on my processor.
I am relatively new to VTune so I hope this doesn't appear overly naive :-) Any help you can give would be greatly appreciated.
First of all, I recommend this article for your reference.
If NUMA is enabled in the BIOS, the associated performance event counts can be used:
The above indicate memory accesses for all offcore cacheline traffic. Similar events can also be used:
Additionally, the article provides latency (penalty) figures for offcore memory accesses.
To evaluate the Data Latency Analysis Ratio caused by "Remote DRAM", the formula is:
"LLC Load Driven Misses - Remote DRAM" = 275 * MEM_UNCORE_RETIRED.REMOTE_DRAM / CPU_CLK_UNHALTED.THREAD
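As a rough illustration of how the formula above could be applied to counts exported from a collection, here is a minimal Python sketch. The function name and the sample counts are made up for the example; only the weight of 275 and the two event names come from the formula, and I am reading 275 as an approximate per-access penalty in cycles.

```python
# Weight taken from the formula above; read here as an approximate
# per-access penalty in core cycles (an assumption, not a measured value).
REMOTE_DRAM_PENALTY_CYCLES = 275


def remote_dram_ratio(remote_dram_events, unhalted_thread_cycles,
                      penalty_cycles=REMOTE_DRAM_PENALTY_CYCLES):
    """Estimate the fraction of cycles spent on loads served by remote DRAM.

    remote_dram_events     -- count of MEM_UNCORE_RETIRED.REMOTE_DRAM
    unhalted_thread_cycles -- count of CPU_CLK_UNHALTED.THREAD
    """
    if unhalted_thread_cycles <= 0:
        raise ValueError("CPU_CLK_UNHALTED.THREAD count must be positive")
    return penalty_cycles * remote_dram_events / unhalted_thread_cycles


if __name__ == "__main__":
    # Hypothetical counts, purely for illustration.
    ratio = remote_dram_ratio(remote_dram_events=2_000_000,
                              unhalted_thread_cycles=5_500_000_000)
    print(f"LLC Load Driven Misses - Remote DRAM: {ratio:.1%}")
```

With these made-up counts the ratio comes out to 0.1, i.e. roughly 10% of unhalted cycles would be attributed to remote DRAM loads. A larger value suggests that remote memory placement deserves a closer look.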
For using performance event counts directly from the command line with VTune AmplifierXE 2011 Update, please refer to this article.