Performance Counters on SandyBridge

Yunqi_Z_
Beginner
568 Views

Hi all,

I've just started doing some profiling work on Sandy Bridge recently, so the following questions might be stupid.

I've checked the Intel SDM and found that CYCLE_ACTIVITY should be very useful for my work. But when I actually tried to use that counter, it seemed that only Ivy Bridge has it. Is that right?

In other words, my goal is to find how many cycles a given application spends stalled on data. How can I do that on a Sandy Bridge (or Ivy Bridge) machine?

Thanks a lot!

12 Replies
Bernard
Valued Contributor I

You can use VTune to gather CPU-related activity (count of retired uops, etc.).

Yunqi_Z_
Beginner

I was wondering if there is support for CYCLE_ACTIVITY on Sandy Bridge, because the optimization manual says so but I couldn't find it in the Software Developer's Manual. Thanks!

Bernard
Valued Contributor I

Yunqi Z. wrote:

I was wondering if there is support for CYCLE_ACTIVITY on Sandy Bridge, because the optimization manual says so but I couldn't find it in the Software Developer's Manual. Thanks!

Where in the SDM did you look for it? You need to refer to Volume 3 (System Programming Guide), chapters 18 and 19.

Yunqi_Z_
Beginner

The Intel 64 and IA-32 Architectures Optimization Reference Manual, Appendix B.3.2.3, suggests using CYCLE_ACTIVITY.STALLS_LDM_PENDING and other CYCLE_ACTIVITY counters to characterize the memory subsystem. But SDM Volume 3, Section 19.4, for Sandy Bridge does not list these counters.
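For anyone who wants to try such an event before it shows up in a tool's named event list, here is a minimal Python sketch of packing an event select, unit mask, and counter mask into the raw value that Linux `perf` accepts as `r<hex>`. The bit positions follow the architectural IA32_PERFEVTSELx layout (event in bits 0-7, umask in bits 8-15, edge in bit 18, invert in bit 23, cmask in bits 24-31). The helper name `raw_event_config` is made up for illustration, and the values 0xA3 / 0x06 / cmask 6 for CYCLE_ACTIVITY.STALLS_LDM_PENDING are an assumption taken from Ivy Bridge event listings; check them against the SDM table for your exact processor, since this thread does not confirm them for Sandy Bridge.

```python
def raw_event_config(event, umask, cmask=0, inv=False, edge=False):
    """Pack PMU event fields using the IA32_PERFEVTSELx bit layout.

    Only the event-describing fields are set here; the USR/OS/EN bits
    are left clear because a tool such as perf manages those itself.
    """
    for name, value in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {value:#x}")
    config = event | (umask << 8) | (cmask << 24)
    if edge:
        config |= 1 << 18  # edge-detect bit
    if inv:
        config |= 1 << 23  # invert the counter-mask comparison
    return config


# ASSUMPTION: event/umask/cmask as listed for Ivy Bridge; verify in the SDM.
STALLS_LDM_PENDING = raw_event_config(0xA3, 0x06, cmask=6)

# Build a perf command line for a hypothetical binary ./app.
print(f"perf stat -e r{STALLS_LDM_PENDING:x} -- ./app")
```

If the processor does not implement the event, the counter will typically just read zero or count something else, so a sanity check against a known memory-bound kernel is worthwhile.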

And I also found a link (http://software.intel.com/en-us/forums/topic/277820) saying there should be these counters on Sandy Bridge.

Thanks a lot!

Yunqi_Z_
Beginner

The Intel 64 and IA-32 Architectures Optimization Reference Manual, Appendix B.3.2.3, suggests using CYCLE_ACTIVITY.STALLS_LDM_PENDING and other CYCLE_ACTIVITY counters to characterize the memory subsystem. But SDM Volume 3, Section 19.4, for Sandy Bridge does not list these counters.

And I also found a link (the title is "Ivy Bridge performance monitoring events CYCLE_ACTIVITY.*?") saying there should be these counters on Sandy Bridge.

Thanks a lot!

Bernard
Valued Contributor I

Are you referring to this link: http://software.intel.com/en-us/forums/topic/277820

I went through all the posts in that thread, and one of the Intel engineers clearly stated that future editions of the SDM will include information about these counters on Sandy Bridge.

What SDM revision do you use?

Bernard
Valued Contributor I

By the way, you have a nice avatar. IIRC this is J. B. Fourier.

Yunqi_Z_
Beginner

Aha, that's right! Thanks

Bernard
Valued Contributor I

>>>And I also found a link (the title is "Ivy Bridge performance monitoring events CYCLE_ACTIVITY.*?") saying there should be these counters on Sandy Bridge>>>

Did you mean this link: http://software.intel.com/en-us/forums/topic/277820

There is a response there from one of the Intel engineers, who clearly states that a future revision of the SDM will include the counters you mentioned.

perfwise
Beginner

On Intel or any architecture, I would propose looking at the front end's uops per clock (upc) while the front end is busy. Count the clocks in which the front end is actually doing something (that includes the DSB, MS, and ILD), and then compare that with the execution core's upc. If the front end's upc while it's busy equals that of the execution core, then you might be front-end limited. I only mention this since you're focusing on activity and might suspect some limitation in the front of the machine. In my inspections of many applications, Intel is rarely limited in the front end of its pipeline, and the DSB provides much greater throughput than the execution core can chew. You might also want to generate a distribution of the throughput of the various front-end and execution resources to see how often nothing is done; it's a large % of the time.
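The comparison described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `upc_report`, its parameter names, the 5% tolerance used to decide that two rates are "equal", and the choice to compute the execution core's upc over all core cycles are not from any Intel tool, and the sample counts are made-up numbers rather than measurements. Which hardware events feed each argument depends on the processor and should be taken from the SDM.

```python
def upc_report(fe_uops, fe_busy_cycles, exec_uops, total_cycles, tolerance=0.05):
    """Compare front-end uops/clock (while busy) with execution-core uops/clock.

    fe_uops        -- uops delivered by the front end (DSB + MS + legacy decode)
    fe_busy_cycles -- cycles in which the front end delivered at least one uop
    exec_uops      -- uops executed by the core
    total_cycles   -- unhalted core cycles (ASSUMPTION: upc over all cycles)
    """
    if fe_busy_cycles <= 0 or total_cycles <= 0:
        raise ValueError("cycle counts must be positive")
    fe_upc = fe_uops / fe_busy_cycles
    exec_upc = exec_uops / total_cycles
    # Heuristic from the post: roughly equal rates hint at a front-end limit.
    maybe_fe_limited = abs(fe_upc - exec_upc) <= tolerance * exec_upc
    return fe_upc, exec_upc, maybe_fe_limited


# Hypothetical counter readings, for illustration only.
fe_upc, exec_upc, limited = upc_report(
    fe_uops=4_000_000, fe_busy_cycles=1_000_000,
    exec_uops=2_000_000, total_cycles=1_000_000,
)
print(f"front-end upc while busy: {fe_upc:.2f}")
print(f"execution-core upc:       {exec_upc:.2f}")
print("possibly front-end limited" if limited else "front end is not the limiter")
```

A front-end rate well above the execution rate, as in the sample numbers, points away from the front end and back toward execution or memory stalls.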

perfwise

Yunqi_Z_
Beginner

Thanks a lot perfwise. :)
