There are two meanings of memory access counting:
1) Count all memory accesses. This includes loading data from memory (local or remote) and loading data from cache (local or remote). This is the same logic as in your algorithm.
2) Count only loads of data from memory. This count is lower than the one in your algorithm.
You said "From another hand, it could be nice if I can also measure the number of LLC cache misses in Total", which is situation 2 (loads that missed the LLC, from local and remote). So simply use the event MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS.
If my assumption is wrong (you meant situation 1):
Use MEM_UOPS_RETIRED.ALL_LOADS_PS and MEM_UOPS_RETIRED.ALL_STORES_PS
(Please understand that hardware events overlap in what they count. The two events above cover both cache misses and memory loads / stores.)
For Nehalem processors, you can use:
MEM_INST_RETIRED.LOADS ;Instructions retired which contain a load
MEM_INST_RETIRED.STORES ;Instructions retired which contain a store
MEM_LOAD_RETIRED.LLC_MISS on Nehalem is equivalent to MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS on Sandy Bridge.
DRAM access includes local DRAM and remote DRAM; use the events MEM_UNCORE_RETIRED.LOCAL_DRAM and MEM_UNCORE_RETIRED.REMOTE_DRAM. You can also simply use MEM_LOAD_RETIRED.LLC_MISS instead, regardless of whether the access was local or remote.
The DRAM access counts above do not include memory accesses whose data is already in the L1, L2, or L3 cache.
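If you want the local and remote DRAM numbers combined after collection, here is a minimal Python sketch. It assumes you have already read the two raw event counts out of your profiler; the function name and the numbers are made up for illustration and are not part of any tool.

```python
def dram_access_summary(local_dram, remote_dram):
    """Combine local and remote DRAM load counts.

    local_dram  -- count for MEM_UNCORE_RETIRED.LOCAL_DRAM
    remote_dram -- count for MEM_UNCORE_RETIRED.REMOTE_DRAM
    Returns (total DRAM loads, fraction served by remote DRAM).
    """
    total = local_dram + remote_dram
    # Guard against an empty sample to avoid dividing by zero.
    remote_share = remote_dram / total if total else 0.0
    return total, remote_share


# Hypothetical counts, for illustration only.
total, remote_share = dram_access_summary(local_dram=750_000, remote_dram=250_000)
print(total, remote_share)
```

The remote share is just a convenience for seeing how much of the DRAM traffic crosses sockets.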
To know FLOPS you have to use the event FP_COMP_OPS_EXE.x87. Do you want to divide it by INST_RETIRED.ANY, to know the average FLOPS per instruction?
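If the per-instruction ratio is what you are after, it is a single division of the two counts. A small Python sketch, assuming you already have both raw counts; the function name and the numbers are hypothetical:

```python
def fp_ops_per_instruction(fp_comp_ops_x87, inst_retired):
    """Average x87 FP operations per retired instruction.

    fp_comp_ops_x87 -- count for FP_COMP_OPS_EXE.x87
    inst_retired    -- count for INST_RETIRED.ANY
    """
    if inst_retired == 0:
        # No instructions were counted, so the ratio is undefined; report 0.0.
        return 0.0
    return fp_comp_ops_x87 / inst_retired


# Hypothetical counts, for illustration only.
ratio = fp_ops_per_instruction(fp_comp_ops_x87=500_000, inst_retired=2_000_000)
print(ratio)
```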