Is there a way to use performance counters to get a breakdown of stall cycles on Intel Pentium processors? In other words, I want to find out what percentage of the total execution time was spent unstalled, stalled on data cache misses, stalled on branch mispredictions, and so on. I am analyzing an application and want to find out which stalls are most common so that I can focus my optimization efforts on them.
I know that the Itanium family has many counters that give a detailed breakdown, e.g. data access stall cycles, scoreboard dependency stalls, branch misprediction stalls, unstalled cycles, I-cache stalls, and so on.
Specifically, I am interested in the Pentium III (P3). Is there a document that explains how to derive the stall-cycle contributions from performance counter data?