PMCs between Core microarchitecture and Sandy Bridge

Chenjie_Y_

I am working on a power-modeling project for the E5-4603 chip (Sandy Bridge). I have read several papers that introduce a set of activity ratios (the inputs of the model) based on performance monitoring counters (PMCs). Unfortunately, these papers target Core microarchitecture chips.

So my first job is to find the counterparts of these Core microarchitecture PMC-based activity ratios on our Sandy Bridge chip.

For example, on the Core microarchitecture we can use UOPS_RETIRED.ANY / CPU_CLK_UNHALTED.CORE_P as the activity ratio of the whole in-order engine (i.e. the front end, including components such as the fetch unit and the decode unit), except for the BPU, which can be characterized by a standalone activity ratio, BR_INST_DECODED / CPU_CLK_UNHALTED.CORE_P.

For Sandy Bridge, I found a similar counterpart for the front end (again without the BPU) among its PMC events:

UOPS_RETIRED.ALL / CPU_CLK_UNHALTED.THREAD_P (Question 1: is that correct?)
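
To make the ratio concrete, here is a minimal Python sketch. It assumes the two raw counts have already been collected over the same interval with some PMU tool; the function name `activity_ratio` and the sample numbers are made up for illustration and are not measurements.

```python
def activity_ratio(event_count: int, unhalted_cycles: int) -> float:
    """Return events per unhalted cycle, or 0.0 if no cycles were counted."""
    if unhalted_cycles <= 0:
        return 0.0
    return event_count / unhalted_cycles


# Hypothetical sample values, not real measurements.
uops_retired_all = 1_500_000      # UOPS_RETIRED.ALL
clk_unhalted_thread = 1_000_000   # CPU_CLK_UNHALTED.THREAD_P

front_end_ratio = activity_ratio(uops_retired_all, clk_unhalted_thread)
print(f"front-end activity ratio: {front_end_ratio:.2f}")
```

The same helper would work for the BPU ratio once a suitable branch event is chosen as the numerator.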

For the BPU, however, I am less certain, because I could not find an event in Sandy Bridge that has exactly the same meaning as BR_INST_DECODED in the Core microarchitecture.

I can only find a loosely similar event: BR_INST_RETIRED.ALL_BRANCHES. By definition, BR_INST_DECODED and BR_INST_RETIRED.ALL_BRANCHES count different things, and the difference normally cannot be ignored. (Question 2) Is there a better (more similar) substitute in Sandy Bridge for BR_INST_DECODED in the Core microarchitecture?

 

On the Core microarchitecture, one can use the following formula for the activity ratio of integer computing operations (only ordinary integer operations are considered here; SIMD integer operations are excluded):

(RS_UOPS_DISPATCHED_CYCLES.PORT_0 + RS_UOPS_DISPATCHED_CYCLES.PORT_1 + RS_UOPS_DISPATCHED_CYCLES.PORT_5 - FP_COMP_OPS_EXE - SIMD_UOPS_EXEC - BR_INST_RETIRED.ANY) / CPU_CLK_UNHALTED.CORE_P

This ratio is easy to obtain because all computational operations execute on Port 0, Port 1 and Port 5 in the out-of-order engine of the Core microarchitecture, so one can get the integer operation count simply by subtracting the floating-point, SIMD and branch operations from the total.
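
The subtraction described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the counts are taken as already collected over the same interval, the helper name `integer_activity_ratio` and the dictionary layout are hypothetical, and the sample values are invented rather than measured.

```python
# Events dispatched on the three computational ports (names as used in the post).
PORT_EVENTS = (
    "RS_UOPS_DISPATCHED_CYCLES.PORT_0",
    "RS_UOPS_DISPATCHED_CYCLES.PORT_1",
    "RS_UOPS_DISPATCHED_CYCLES.PORT_5",
)
# Events subtracted to leave only ordinary integer operations.
NON_INTEGER_EVENTS = (
    "FP_COMP_OPS_EXE",
    "SIMD_UOPS_EXEC",
    "BR_INST_RETIRED.ANY",
)


def integer_activity_ratio(counts: dict, unhalted_cycles: int) -> float:
    """(port 0 + port 1 + port 5 - FP - SIMD - branches) / unhalted cycles."""
    if unhalted_cycles <= 0:
        return 0.0
    dispatched = sum(counts[name] for name in PORT_EVENTS)
    excluded = sum(counts[name] for name in NON_INTEGER_EVENTS)
    return (dispatched - excluded) / unhalted_cycles


# Hypothetical sample values, not real measurements.
sample_counts = {
    "RS_UOPS_DISPATCHED_CYCLES.PORT_0": 400_000,
    "RS_UOPS_DISPATCHED_CYCLES.PORT_1": 300_000,
    "RS_UOPS_DISPATCHED_CYCLES.PORT_5": 200_000,
    "FP_COMP_OPS_EXE": 100_000,
    "SIMD_UOPS_EXEC": 50_000,
    "BR_INST_RETIRED.ANY": 150_000,
}
ratio = integer_activity_ratio(sample_counts, unhalted_cycles=1_000_000)
print(f"integer activity ratio: {ratio:.2f}")
```

The result is deliberately not clamped at zero: a negative value would indicate that the counters were not collected consistently, which is worth noticing rather than hiding.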

(Question 3) Is there an event (or a group of events) in Sandy Bridge that can play the role of SIMD_UOPS_EXEC in the Core microarchitecture? Or, the other way around and more straightforwardly, is there an event that directly gives the number of integer operations (again excluding SIMD integer operations)?

(Question 4) Is there any offcore (uncore) event for FSB (front-side bus) monitoring in Sandy Bridge?

 Many thanks!

The paper is added as an attachment.
