Minimum Redirect Latency from IF/DE and Uop$

perfwise · ‎02-12-2013

Hi,

As anyone who cares about performance knows, the number of branch mispredicts and the latency to get back on the correct path are important drivers of performance. Some of my applications have a fair number of mispredicts, so I decided to explore what this latency is on the chips I have on hand, an SB (Sandy Bridge) and an IB (Ivy Bridge). I was very surprised by the reduction in the "minimum" redirect latency from the uop$, and I was also surprised by the increase when coming from the IF/DE. Using a list of indirect jmps, I determined that the minimum redirect latency from the uop$ is 15 clks on both SB and IB, and I'm observing 22 (SB) and 23 (IB) when redirecting from the IF/DE, which doesn't happen often due to the high hit rate in the uop$.

My question is twofold:

* Are my estimates of the minimum redirect latency from the IF/DE correct? It would be good to know whether I understand my findings above.

* If they are correct, why is it 7-8 clks longer than from the uop$? I know I may not get a response to this, and it doesn't affect me; I'm just asking out of curiosity.

Overall, a very interesting endeavor. Thanks in advance for any pointers or help in understanding.

perfwise

Bernard · ‎02-12-2013

>>>IF/DE>>> What does it stand for?

perfwise · ‎02-12-2013

IF = Instruction Fetch and DE = Decode, essentially what the ILD (instruction length decoder) in the Intel Opt Guide is doing. I just wonder why the redirect takes 7 more clocks from there than it does from the uop$.

Perfwise

Bernard · ‎02-12-2013

Thanks for the explanation.

How did you get your measurements? Was that VTune or another Intel monitoring tool?

perfwise · ‎02-13-2013

I wrote a directed test to measure this, which I often do to understand the drivers of my code's performance.
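
For anyone wondering what a directed test of this kind can look like, here is a minimal sketch in C. It is not the test used for the numbers above, and every name in it (`targets`, `xorshift32`, `ns_per_call`) is made up for illustration. It times a loop of indirect calls twice: once with a fixed target, which the predictor should learn, and once with a pseudo-randomly chosen target, which should mispredict often. It uses `clock()`, so the result is in nanoseconds rather than clks, and it includes call/return overhead, so the difference between the two runs is only a rough upper-level estimate of the redirect cost.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define N_TARGETS 4u /* must be a power of two so (r & mask) is a valid index */

/* Four distinct indirect-branch targets (hypothetical, for illustration). */
static uint64_t t0(uint64_t x) { return x + 1; }
static uint64_t t1(uint64_t x) { return x + 2; }
static uint64_t t2(uint64_t x) { return x + 3; }
static uint64_t t3(uint64_t x) { return x + 4; }

typedef uint64_t (*target_fn)(uint64_t);

/* volatile keeps the compiler from turning the indirect call into a direct one. */
static target_fn volatile targets[N_TARGETS] = { t0, t1, t2, t3 };

/* Small xorshift PRNG; the same work is done in both variants. */
static uint32_t xorshift32(uint32_t *s)
{
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

/* mask == 0             -> always calls targets[0] (predictable).
 * mask == N_TARGETS - 1 -> pseudo-random target (frequently mispredicted).
 * Returns average nanoseconds of CPU time per indirect call. */
double ns_per_call(uint32_t mask, uint32_t n_calls, uint64_t *sink)
{
    assert(n_calls > 0 && mask < N_TARGETS);
    uint32_t seed = 12345u;
    uint64_t acc = 0;
    clock_t start = clock();
    for (uint32_t i = 0; i < n_calls; i++) {
        uint32_t r = xorshift32(&seed);
        acc = targets[r & mask](acc);
    }
    clock_t end = clock();
    *sink = acc; /* keep the result live so the loop is not optimized away */
    return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / (double)n_calls;
}
```

Calling `ns_per_call(0, n, &sink)` and `ns_per_call(N_TARGETS - 1, n, &sink)` with a large `n` and subtracting the two gives an average extra cost per call. To turn that into a minimum redirect latency in clks you would still need to divide by the actual mispredict rate and convert with the core clock, or read the cycle and mispredict counters directly.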

Perfwise

Bernard · ‎02-13-2013

Are you writing a kernel-mode driver and using it to get access to the performance counter control and counter MSRs?

perfwise · ‎02-14-2013

Yes, but the PMCs and MSRs are poorly documented.

Bernard · ‎02-14-2013

perfwise wrote:

Yes, but the PMCs and MSRs are poorly documented.

Me too. Unfortunately, for now I only have a Core i3 CPU, so I cannot use the functionality of the Xeon uncore PMU. If you are interested, I can share my work with you.

Do you use inline assembly in your code?

perfwise · ‎02-21-2013

My code is a combination of C and assembly, I don't use inline asm. I'd rather not share my work but I use this forum to relay my experiences.

Perfwise

Bernard · ‎02-21-2013

perfwise wrote:

My code is a combination of C and assembly, I don't use inline asm. I'd rather not share my work but I use this forum to relay my experiences.

Perfwise

Thanks perfwise

I mostly use inline assembly where I need to access MSRs to set and read counter and register values. How do you display the gathered data? For this purpose I use the DbgPrint or KdPrint functions, with DebugView to intercept and print those values.
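
Since the MSR side came up, here is a rough sketch in C of the part that can be shown without a driver: building the value written to an IA32_PERFEVTSELx register and computing a counter delta. The helper names `make_evtsel` and `pmc_delta` are made up for this example. The bit positions, the MSR addresses, and the two event codes are the architectural ones as I understand them from the Intel SDM; the counter width is passed in as a parameter because it is an assumption here (it is reported by CPUID leaf 0xA and should be checked on the actual part).

```c
#include <assert.h>
#include <stdint.h>

/* Architectural MSR addresses. */
#define IA32_PMC0        0x0C1u
#define IA32_PERFEVTSEL0 0x186u

/* Architectural events (event select, unit mask 0x00). */
#define EVT_UNHALTED_CORE_CYCLES 0x3Cu
#define EVT_BRANCH_MISS_RETIRED  0xC5u

/* IA32_PERFEVTSELx control bits. */
#define EVTSEL_USR (1ull << 16) /* count in user mode (ring 3)   */
#define EVTSEL_OS  (1ull << 17) /* count in kernel mode (ring 0) */
#define EVTSEL_EN  (1ull << 22) /* enable the counter            */

/* Build the value for IA32_PERFEVTSELx: event select in bits 0-7,
 * unit mask in bits 8-15, plus the privilege and enable bits. */
uint64_t make_evtsel(uint8_t event, uint8_t umask, int user, int kernel)
{
    uint64_t v = (uint64_t)event | ((uint64_t)umask << 8) | EVTSEL_EN;
    if (user)   v |= EVTSEL_USR;
    if (kernel) v |= EVTSEL_OS;
    return v;
}

/* Difference between two raw counter reads, tolerating one wrap of a
 * counter that is width_bits wide (hypothetical helper). */
uint64_t pmc_delta(uint64_t start, uint64_t end, unsigned width_bits)
{
    assert(width_bits > 0 && width_bits < 64);
    uint64_t mask = (1ull << width_bits) - 1;
    return (end - start) & mask;
}
```

In a driver, the value from `make_evtsel(EVT_BRANCH_MISS_RETIRED, 0x00, 1, 1)` (0x4300C5) would be written to IA32_PERFEVTSEL0 with wrmsr, and IA32_PMC0 read with rdmsr before and after the code under test. Both instructions are privileged, which is why this has to run in kernel mode.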