Identifying Data Block Access Latencies and Ping-Pong Activity across

Can one measure, on a per-thread basis, the actual cost of memory block accesses in multi-threaded code on a Nehalem-EP platform using VTune? For instance, can one measure the percentage of memory accesses served from the different memories per thread? Can the same information be broken down per level of the cache hierarchy?

Can one measure how often the same cache block ping-pongs among caches when threads running on different sockets compete for write access to it?

Can one measure the cache miss rates per level, per thread, using VTune?

Thanks.
