Software Tuning, Performance Optimization & Platform Monitoring



I've been exploring branch prediction performance lately on my Sandy Bridge (SB) and Ivy Bridge (IB) parts. The redirect latencies from the uop$ and from the ILD differ greatly: I observe 15 clks vs. 22-23 clks (uop$ vs. ILD), and I'm still waiting on some explanation of that difference. I measured these latencies using indirect jumps.

I then started exploring JCC prediction behavior, since branches are important drivers of performance both in my own code and in the high-performance code I analyze. I observed that SB and IB never mispredict a series of continually taken branches, which surprised me. Wondering how this could be the case, I looked at the BACLEAR and BPUCLEAR stats (PMC events 0xE6 and 0xE8), and I do see these occurring. Can you explain, if possible, what these are? The System and Optimization guides don't do their explanation justice.
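
For anyone who wants to count the same two events, here is a minimal sketch of how the event codes above could be turned into event strings, assuming Linux perf and its `cpu/event=...,umask=.../` syntax (the post does not say which tool was used). The helper name `perf_raw_event` and the umask value are placeholders of mine; the real unit masks have to come from the event tables for the specific part.

```python
def perf_raw_event(event_select, umask, name=None):
    """Build a perf event string of the form cpu/event=0x..,umask=0x../.

    event_select and umask are both 8-bit fields. The umask is NOT given
    in the post; look it up in the event tables for your CPU.
    """
    if not (0 <= event_select <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event_select and umask must each fit in 8 bits")
    spec = f"cpu/event={event_select:#04x},umask={umask:#04x}"
    if name:
        spec += f",name={name}"
    return spec + "/"


# Placeholder unit mask, purely illustrative (an assumption, not a real value).
UMASK_PLACEHOLDER = 0x01

if __name__ == "__main__":
    # 0xE6 and 0xE8 are the event codes mentioned in the post.
    for code, label in ((0xE6, "baclears"), (0xE8, "bpu_clears")):
        print(perf_raw_event(code, UMASK_PLACEHOLDER, label))
```

Each printed string could then be passed to `perf stat -e ...` around the test binary, once the placeholder umask is replaced with the documented one.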

For a series of always-taken branches, 3000 to possibly 10K of them, my observations and questions are:

  • I see no BACLEARs until I start missing branches in your BTB, which is what I was interested in observing. It just surprised me that across 10K branches you never mispredict if they are always taken.
  • You never mispredict according to your PMC stats for branch prediction (again, these are likely based upon executed branches).
  • I would presume you initially take the not-taken path.
  • However, I'm observing a fair number of "late BPU clears" as well as "BACLEARs".
  • I presume you don't know a branch is a branch until it gets to your instruction decoder, since until then you have no record of that branch in your BTB. Question: is this what a "late BPU clear" is? You find there's a branch, it's a JCC, and you redirect your fetch to the not-taken path?
  • Once you determine it's a branch and you're redirected to the not-taken path, which is incorrect, you then have a "gizmo" of sorts which sees that all the JCCs are taken and redirects the fetch again, but this time to the taken path. Question: is a BACLEAR this second redirect, in which the said "gizmo" in the BP redirects fetch in the front end?
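
To make the setup concrete, here is a minimal sketch of a generator for the kind of always-taken JCC chain described above. It is an illustration only, not the code used for the measurements: the function name `make_taken_jcc_chain`, the label names, the NASM syntax, and the `pad_nops` spacing between branches are all assumptions.

```python
def make_taken_jcc_chain(n_branches, pad_nops=1):
    """Return NASM-syntax x86-64 assembly for n_branches always-taken JCCs.

    `xor eax, eax` sets ZF=1, and neither JZ nor NOP modifies the flags,
    so every `jz` in the chain is taken. The NOPs sit on the fall-through
    (not-taken) path and are skipped when the branch is taken.
    """
    if n_branches < 1:
        raise ValueError("need at least one branch")
    lines = [
        "bits 64",
        "global taken_chain",
        "section .text",
        "taken_chain:",
        "    xor eax, eax        ; ZF=1 for the whole chain",
    ]
    for i in range(n_branches):
        lines.append(f"    jz .t{i}")
        if pad_nops:
            lines.append(f"    times {pad_nops} nop")
        lines.append(f".t{i}:")
    lines.append("    ret")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 3000 branches is the low end of the range mentioned in the post.
    print(make_taken_jcc_chain(3000), end="")
```

The output can be assembled and called in a loop from a small harness while the counters are read. Raising `n_branches` grows the branch footprint, which is the knob for finding where BTB misses begin; note that a large `pad_nops` also grows the code footprint and may add instruction-cache effects to the picture.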

Again, thanks for any pointers or insight. Also, if someone could answer my earlier post about IF redirect in uop$ vs. ILD, I'd very much appreciate it.

