In most cases, developers want to know how many times data is accessed. The hardware event name differs by CPU microarchitecture:
1. Sandy Bridge: MEM_LOAD_UOPS_RETIRED
2. Nehalem: MEM_LOAD_RETIRED
3. Core 2 Duo: MEM_LOAD_RETIRED.MISS + MEM_LOAD_RETIRED.L1D_MISS + L1D_ALL_REF
Further reading:
http://software.intel.com/en-us/articles/using-intel-vtune-amplifier-xe-to-tune-software-on-the-2nd-generation-intel-core-processor-family/?wapkw=(Using+Intel+VTune+Amplifier+XE+%0bto+Tune+Software+on+the++%0b2nd+generation+Intel+Core+processor+family)
http://software.intel.com/en-us/articles/using-intel-vtune-performance-analyzer-to-optimize-software-on-intel-core-i7-processors/?wapkw=(Using+Intel+VTune+Performance+%0bAnalyzer+to+Optimize+Software+on+%0bIntel+Core+i7+Processors)
http://software.intel.com/sites/products/collateral/hpc/vtune/cycle_accounting_analysis.pdf?wapkw=(Cycle+Accounting+Analysis+on+Intel+Core2+Processors)
>>software access the RAM)...
On a Windows platform you could also look at:
- the Windows Management Instrumentation (WMI) interfaces;
- the Platform SDK utility Pstat.exe;
- the PerfTool example, with source code, located at:
folder.
Best regards,
Sergey
"Memory access counting" can mean two things:
1) Counting all memory accesses, which includes loads from memory (local or remote) and loads from cache (local or remote). This is the logic your algorithm uses.
2) Counting only loads from memory. This count is lower than the one your algorithm produces.
You said "From another hand, it could be nice if I can also measure the number of LLC cache misses in Total", which is situation 2 (loads that miss the LLC, from local and remote). For that, simply use the event MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS.
If my assumption is wrong and you meant situation 1:
Use MEM_UOPS_RETIRED.ALL_LOADS_PS and MEM_UOPS_RETIRED.ALL_STORES_PS.
(Please note that hardware events overlap in what they count: these two events cover cache misses as well as memory loads and stores.)
Regards, Peter
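To illustrate the two counting approaches described in the post above, here is a minimal Python sketch. It assumes the raw event counts have already been collected (for example, exported from a profiler run) into a plain dictionary. The dictionary layout, the function names, and the sample numbers are hypothetical and serve only as an illustration; only the event names come from the post.

```python
# Event names as given in the post above (Sandy Bridge).
ALL_LOADS = "MEM_UOPS_RETIRED.ALL_LOADS_PS"
ALL_STORES = "MEM_UOPS_RETIRED.ALL_STORES_PS"
LLC_MISS = "MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS"


def total_memory_accesses(counts):
    """Situation 1: all retired loads and stores, wherever the data came from."""
    return counts[ALL_LOADS] + counts[ALL_STORES]


def loads_from_memory(counts):
    """Situation 2: only loads that missed the last-level cache."""
    return counts[LLC_MISS]


# Hypothetical counts for illustration, not measured data.
sample_counts = {
    ALL_LOADS: 1_000_000,
    ALL_STORES: 400_000,
    LLC_MISS: 25_000,
}

print("all accesses:", total_memory_accesses(sample_counts))
print("loads from memory:", loads_from_memory(sample_counts))
```

The second figure is a subset of the first, which is why situation 2 always yields the smaller count.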
For Nehalem processors, you can use:
MEM_INST_RETIRED.LOADS ; retired instructions that contain a load
MEM_INST_RETIRED.STORES ; retired instructions that contain a store
MEM_LOAD_RETIRED.LLC_MISS on Nehalem is equivalent to MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS on Sandy Bridge.
Regards, Peter
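The event names mentioned so far in this thread can be kept in a small lookup table keyed by microarchitecture, so that analysis scripts do not hard-code them. The sketch below is a hypothetical helper: the names `LLC_MISS_EVENT`, `LOAD_STORE_EVENTS`, and `llc_miss_event` are made up for illustration and are not part of any Intel tool. It records only the events and the equivalence stated in the posts above.

```python
# LLC-miss load events, as stated in the thread.
LLC_MISS_EVENT = {
    "Nehalem": "MEM_LOAD_RETIRED.LLC_MISS",
    "Sandy Bridge": "MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS",
}

# (load event, store event) pairs for counting all accesses.
LOAD_STORE_EVENTS = {
    "Nehalem": ("MEM_INST_RETIRED.LOADS", "MEM_INST_RETIRED.STORES"),
    "Sandy Bridge": (
        "MEM_UOPS_RETIRED.ALL_LOADS_PS",
        "MEM_UOPS_RETIRED.ALL_STORES_PS",
    ),
}


def llc_miss_event(microarchitecture):
    """Return the LLC-miss event name recorded for a microarchitecture."""
    try:
        return LLC_MISS_EVENT[microarchitecture]
    except KeyError:
        # Fail loudly rather than guess an event name for an unknown CPU.
        raise ValueError(
            f"no LLC-miss event recorded for {microarchitecture!r}"
        ) from None


print(llc_miss_event("Nehalem"))
```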
I suppose that you have to use this for Nehalem: BUS_TRANS_MEM.ALL_AGENTS
Regards, Peter
DRAM accesses include both local DRAM and remote DRAM; use the events MEM_UNCORE_RETIRED.LOCAL_DRAM and MEM_UNCORE_RETIRED.REMOTE_DRAM. Alternatively, you can simply use MEM_LOAD_RETIRED.LLC_MISS, which counts the miss whether it was served from local or remote memory.
The DRAM access counts above do not include memory accesses whose data is already in the L1, L2, or L3 cache.
To measure FLOPS, use the event FP_COMP_OPS_EXE.x87. Do you mean to divide it by INST_RETIRED.ANY, to get the average FLOPS per instruction?
Regards, Peter
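The arithmetic in the post above can be sketched in a few lines of Python. The function names and the sample counts are hypothetical; the sketch assumes the four event counts (MEM_UNCORE_RETIRED.LOCAL_DRAM, MEM_UNCORE_RETIRED.REMOTE_DRAM, FP_COMP_OPS_EXE.x87, INST_RETIRED.ANY) were collected beforehand for the same run.

```python
def dram_accesses(local_dram, remote_dram):
    """Total DRAM accesses: LOCAL_DRAM plus REMOTE_DRAM counts."""
    return local_dram + remote_dram


def x87_ops_per_instruction(fp_comp_ops_exe_x87, inst_retired_any):
    """Average x87 floating-point operations per retired instruction."""
    if inst_retired_any == 0:
        raise ValueError("INST_RETIRED.ANY is zero; the ratio is undefined")
    return fp_comp_ops_exe_x87 / inst_retired_any


# Hypothetical counts for illustration, not measured data.
total_dram = dram_accesses(local_dram=18_000, remote_dram=7_000)
ratio = x87_ops_per_instruction(
    fp_comp_ops_exe_x87=250_000,
    inst_retired_any=1_000_000,
)

print("DRAM accesses:", total_dram)
print("x87 ops per instruction:", ratio)
```

Note that this ratio only reflects x87 operations; any other floating-point work in the program is outside what FP_COMP_OPS_EXE.x87 is named for.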
 
					
				
				
			
		
