I am trying to use VTune to measure the performance of an Atom processor, and I am a little confused by the hardware events. For bus events, it provides separate events for a single core and for all agents on the bus, for example BUS_TRANS_MEM.ALL_AGENTS and BUS_TRANS_MEM.SELF. For other events, it provides only a single-core variant, such as L2_M_LINES_OUT.SELF.
However, it seems to me that events such as L2_M_LINES_OUT.SELF and BUS_TRANS_MEM.SELF count events for the entire processor. For example, when I run my program, which uses two cores, the result fits BUS_TRANS_MEM.SELF much better. BUS_TRANS_MEM.ALL_AGENTS is four times as large as BUS_TRANS_MEM.SELF. (My Atom processor has two cores with hyperthreading enabled, so there are 4 logical processors.)
My question is: do all logical cores of an Atom processor share one set of performance counters? If I want to measure the throughput of the memory bus, should I look at BUS_TRANS_MEM.SELF or BUS_TRANS_MEM.ALL_AGENTS?
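
For reference, this is roughly how I would turn either count into a throughput figure. It is only a sketch: I am assuming each counted transaction transfers one full 64-byte cache line (partial transactions would move less, so this is closer to an upper bound), and the counts and elapsed time below are made-up placeholders, not my measurements.

```python
# Assumption: every counted bus transaction moves one full 64-byte cache line.
CACHE_LINE_BYTES = 64


def bus_throughput_mb_per_s(transactions, elapsed_seconds,
                            bytes_per_transaction=CACHE_LINE_BYTES):
    """Estimate memory bus throughput in MB/s from a transaction count."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return transactions * bytes_per_transaction / elapsed_seconds / 1e6


# Hypothetical numbers for illustration only.
self_count = 1_000_000              # stands in for BUS_TRANS_MEM.SELF
all_agents_count = 4 * self_count   # stands in for BUS_TRANS_MEM.ALL_AGENTS
elapsed = 2.0                       # seconds of sampling

print("SELF:      ", bus_throughput_mb_per_s(self_count, elapsed), "MB/s")
print("ALL_AGENTS:", bus_throughput_mb_per_s(all_agents_count, elapsed), "MB/s")
```

The calculation is the same for both events, so the 4x gap between the counts carries straight through to the throughput estimate, which is why I need to know which event is the right one.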