As anyone who cares about performance knows, the number of branch mispredicts and the latency to get back on the good path are important drivers of performance. Some of my applications have a fair number of mispredicts, so I decided to explore what this latency is on the chips I have on hand, a Sandy Bridge (SB) and an Ivy Bridge (IB). I was very surprised by how low the "minimum" redirect latency is when the redirect comes from the uop$, and I was also surprised by the increase when it comes from the IF/DE path. Using a list of indirect jmps, I determined that the minimum redirect latency is 15 clks on both SB and IB from the uop$, and I'm observing 22 (SB) and 23 (IB) clks when redirecting from the IF/DE, which doesn't happen often due to the high hit rate in the uop$.
My question is twofold:
* Are my estimates of the minimum redirect latency from the IF/DE correct? It would be good to know whether I understand my findings above.
* If they are correct, why is it 7-8 clks longer? I know I may not get a response to this, and it doesn't affect my work; I'm asking purely out of curiosity.
Overall, a very interesting endeavor. Thanks in advance for any pointers or help in understanding.
IF = Instruction Fetch and DE = Decode, essentially what the ILD in the Intel Optimization Guide does. I just wonder why the redirect takes 7 more clocks from there than it does from the uop$.
Yes... but the PMCs and MSRs are poorly documented.
Me too. Unfortunately, for now I only have a Core i3 CPU, so I cannot use the Xeon Uncore PMU functionality. If you are interested, I can share my work with you.
Do you use inline assembly in your code?
My code is a combination of C and assembly; I don't use inline asm. I'd rather not share my work, but I use this forum to relay my experiences.
I mostly use inline assembly where I need to access MSRs for setting and reading counter and register values. How do you display the gathered data? For this purpose I use the DbgPrint or KdPrint functions, with DbgView to intercept and print those values.