Minimum Redirect Latency from IF/DE and Uop$

perfwise
Beginner

Hi,

    As anyone who cares about performance knows, the number of branch mispredicts and the latency to get back on the good path are important drivers of performance.  Some of my applications have a fair number of mispredicts, so I decided to measure this latency on the chips I have in hand, an SB (Sandy Bridge) and an IB (Ivy Bridge).  I was very surprised by the reduction in the "minimum" redirect latency when redirecting from the uop$, and also by the increase when redirecting from the IF/DE.  Using a list of indirect jmps, I determined that the minimum redirect latency from the uop$ is 15 clks on both SB and IB, and I'm observing 22 clks (SB) and 23 clks (IB) when redirecting from the IF/DE, which doesn't happen often due to the high hit rate in the uop$.

    My question is twofold:

   * Are my estimates of the minimum redirect latency from the IF/DE correct?  It would be good to know whether I'm understanding my findings above.

   * If they are correct, why is it 7-8 clks longer than from the uop$?  I know I may not get a response to this, and it doesn't affect me; I'm asking purely out of curiosity.

Overall.. a very interesting endeavor.  Thanks in advance for any pointers or help in understanding..

perfwise

Bernard
Valued Contributor I
>>>IF/DE>>> What does it stand for?
perfwise
Beginner

IF = Instruction Fetch and DE = Decode, essentially what ILD in the Intel Opt Guide is doing.  I just wonder why it takes 7 more clocks for the redirect than it does from the uop$.

Perfwise

Bernard
Valued Contributor I

Thanks for the explanation.

How did you get your measurements? Was that VTune or another Intel monitoring tool?

perfwise
Beginner

I wrote a directed test to measure this, which I often do to understand the drivers of my code's performance.
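
The thread does not describe the test itself, so the following is only a sketch of the arithmetic such a directed test might use, written in Python for illustration. It assumes two runs of the same indirect-jmp chain, one where every jmp is predicted and one where every jmp mispredicts, and divides the extra clocks by the number of mispredicts. The function name `redirect_latency` and the counter readings are made up; only the 15/22/23 clk figures come from the posts above.

```python
def redirect_latency(cycles_mispredicting, cycles_predicted, mispredicts):
    """Estimate clocks lost per mispredict from two runs of the same chain."""
    if mispredicts <= 0:
        raise ValueError("need at least one mispredict")
    return (cycles_mispredicting - cycles_predicted) / mispredicts


# Made-up counter readings, for illustration only (not data from this thread).
jumps = 1_000_000
predicted_run = 2_000_000       # clocks when every jmp is predicted
mispredicting_run = 17_000_000  # clocks when every jmp mispredicts
print(redirect_latency(mispredicting_run, predicted_run, jumps))  # 15.0

# Figures reported in the thread, in clks.
reported = {
    "SB": {"uop$": 15, "IF/DE": 22},
    "IB": {"uop$": 15, "IF/DE": 23},
}
# Extra clocks when the redirect is served by IF/DE instead of the uop$.
gap = {chip: clks["IF/DE"] - clks["uop$"] for chip, clks in reported.items()}
print(gap)  # {'SB': 7, 'IB': 8}
```

The second part just restates the 7-8 clk gap asked about in the original question.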

Perfwise

Bernard
Valued Contributor I
Are you writing a kernel-mode driver and using it to get access to the performance counter control and counter MSRs?
perfwise
Beginner

Yes... but the PMCs and MSRs are poorly documented.

Bernard
Valued Contributor I

perfwise wrote:

Yes... but the PMCs and MSRs are poorly documented.

Me too. Unfortunately, for now I have only a Core i3 CPU, so I cannot use the functionality of the Xeon uncore PMU. If you are interested, I can share my work with you.

Do you use inline assembly in your code?

perfwise
Beginner

My code is a combination of C and assembly, I don't use inline asm. I'd rather not share my work but I use this forum to relay my experiences.  

Perfwise

Bernard
Valued Contributor I

perfwise wrote:

My code is a combination of C and assembly, I don't use inline asm. I'd rather not share my work but I use this forum to relay my experiences.  

Perfwise

Thanks perfwise

I mostly use inline assembly where I need to access MSRs to set and read counter and register values. How do you display the gathered data? For this I use the DbgPrint or KdPrint functions, with DbgView to intercept and print those values.
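
For readers following along, here is a minimal sketch of the value a driver would typically write to an event-select MSR before reading the matching counter. It only does the bit encoding (in Python, for illustration) and performs no MSR access, which still requires WRMSR/RDMSR in ring 0. The helper name `perfevtsel` is hypothetical and nobody in this thread has said they use this exact setup. The bit layout and the event 0xC5 / umask 0x00 pair ("branch misses retired") follow the architectural performance monitoring definition.

```python
# MSR addresses of the first architectural event-select/counter pair.
IA32_PERFEVTSEL0 = 0x186
IA32_PMC0 = 0x0C1


def perfevtsel(event, umask, usr=True, os=True, enable=True, cmask=0):
    """Build an IA32_PERFEVTSELx value (hypothetical helper, encoding only)."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event               # bits 0-7: event select
    value |= umask << 8         # bits 8-15: unit mask
    value |= int(usr) << 16     # bit 16: count in user mode (CPL > 0)
    value |= int(os) << 17      # bit 17: count in kernel mode (CPL 0)
    value |= int(enable) << 22  # bit 22: enable the counter
    value |= cmask << 24        # bits 24-31: counter mask
    return value


# Architectural "branch misses retired" event: event 0xC5, umask 0x00.
BR_MISP = perfevtsel(0xC5, 0x00)
print(hex(BR_MISP))  # 0x4300c5
```

A driver would write this value to `IA32_PERFEVTSEL0` and then read the count back from `IA32_PMC0`; the edge, interrupt, any-thread, and invert bits are left at zero in this sketch.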
