I am an undergrad working on a performance profiling project. Specifically, I am measuring the impact of branch misses on a section of code using the VTune Amplifier XE 2013 suite. I have found where the highest branch miss rates occur.
My current goal is to find some kind of confirmation that this is indeed where the misses are happening. My section of code contains 27 condition-based branch statements (if, else if). I have found a way to change these conditional branches into indirect jumps.
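The two versions of the code can be sketched in C as below. This is a hypothetical four-way mapping, not the original 27-branch code. It also assumes the conversion uses GCC's "labels as values" (computed goto) extension, which is one way to get a `jmpq *%reg`. The optimizer may transform either version, so the disassembly remains the real check.

```c
#include <assert.h>

/* Version 1: an if / else if chain. Each test compiles to a conditional
 * branch whose direction (taken / not taken) must be predicted. */
int classify_cond(int x)
{
    if (x == 0)
        return 10;
    else if (x == 1)
        return 20;
    else if (x == 2)
        return 30;
    else
        return 40;
}

/* Version 2: the same mapping with GCC's labels-as-values extension.
 * The destination is loaded from a table indexed by x, and the jump is
 * usually emitted as an indirect jump such as "jmpq *%rax": there is no
 * condition to predict, only a target address. */
int classify_goto(int x)
{
    static void *const targets[] = { &&case0, &&case1, &&case2, &&other };

    assert(x >= 0 && x <= 3); /* precondition assumed for this sketch */
    goto *targets[x];

case0:
    return 10;
case1:
    return 20;
case2:
    return 30;
other:
    return 40;
}
```

Both functions return the same value for inputs 0 to 3. The difference is only in which prediction mechanism the hardware has to rely on.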
We are running this on Core i5-2400 (Sandy Bridge) chips under Ubuntu 12.10. My understanding is that Intel branch prediction breaks down into two parts: the predictor and the branch target buffer (BTB). The predictor does just what its name says. It predicts taken or not taken for conditional jumps, using the branch history table and other means. The target buffer predicts WHERE the jump is going, based on where it went the last time that specific jump was encountered.
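The "predictor" half of this description can be illustrated with the classic textbook model, a 2-bit saturating counter per branch. This is a toy model only. Sandy Bridge's actual direction predictor is not publicly documented and uses branch history, so the names and behavior below are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy direction predictor: one 2-bit saturating counter.
 * States 0 and 1 predict "not taken"; states 2 and 3 predict "taken". */
typedef struct {
    unsigned state;
} two_bit_counter;

bool predict_taken(const two_bit_counter *c)
{
    return c->state >= 2;
}

/* Move one step toward the actual outcome, saturating at 0 and 3. */
void train(two_bit_counter *c, bool taken)
{
    if (taken && c->state < 3)
        c->state++;
    if (!taken && c->state > 0)
        c->state--;
}

/* Count direction mispredictions over a sequence of actual outcomes. */
size_t count_direction_misses(const bool *outcomes, size_t n)
{
    assert(outcomes != NULL || n == 0);
    two_bit_counter c = { 1 }; /* start in "weakly not taken" */
    size_t misses = 0;

    for (size_t i = 0; i < n; i++) {
        if (predict_taken(&c) != outcomes[i])
            misses++;
        train(&c, outcomes[i]);
    }
    return misses;
}
```

In this model, a branch that is always taken costs one miss while the counter warms up. A branch that strictly alternates taken/not taken from this starting state is mispredicted every time.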
With my modified code, I have bypassed the branch predictor (there is no condition to test for the jump). I confirmed this by disassembly: I see a jmpq *(register). However, my indirect jumps are still subject to misses in the branch target buffer. I have also noticed that my runtime is actually WORSE after switching my conditional branches to indirect jumps. This makes sense. Because I am calculating the jump address (hence the indirection), the BTB has a much harder time predicting it (there is a large number of possible addresses I'm jumping to).
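The reasoning about the BTB can be made concrete with a "last target" model. For a single indirect jump, the predicted destination is simply wherever it went last time. This is a deliberate simplification. Sandy Bridge's indirect-target prediction reportedly also uses branch history and can do better than this model on repeating patterns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy BTB entry for one indirect jump: predict the last actual target.
 * Returns how many of the n jumps would have been mispredicted. */
size_t count_target_misses(const uintptr_t *targets, size_t n)
{
    assert(targets != NULL || n == 0);
    bool have_entry = false; /* cold start: no BTB entry yet */
    uintptr_t predicted = 0;
    size_t misses = 0;

    for (size_t i = 0; i < n; i++) {
        if (!have_entry || predicted != targets[i])
            misses++;            /* no entry, or wrong destination */
        predicted = targets[i];  /* remember the last actual target */
        have_entry = true;
    }
    return misses;
}
```

A jump that always goes to the same address costs only the cold miss. A jump whose computed target keeps changing, for example cycling through three addresses, misses on every execution in this model. That matches the observed slowdown.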
My question: is there a way in VTune to profile branch misses more directly? I want to know WHERE I'm missing (branch predictor or BTB). For a basic conditional there are 4 outcomes. The branch predictor guesses taken or not taken, and each of those has a hit or a miss in the BTB. I could accomplish my goal by showing that BTB misses increase and branch-predictor misses decrease as I convert conditionals to indirect jumps.
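The distinction being asked about can be written down as a small classifier for one dynamic branch. This is a hypothetical helper for reasoning only; no hardware counter reports these categories in this form. A wrong taken/not-taken guess is a direction miss. A correct "taken" guess with the wrong destination is a target miss.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of a single executed branch. */
typedef enum {
    PRED_CORRECT,        /* right direction and, if taken, right target */
    PRED_DIRECTION_MISS, /* the taken / not-taken guess was wrong */
    PRED_TARGET_MISS     /* guessed "taken" correctly, wrong destination */
} pred_result;

pred_result classify_prediction(bool pred_taken, bool actual_taken,
                                uintptr_t pred_target, uintptr_t actual_target)
{
    if (pred_taken != actual_taken)
        return PRED_DIRECTION_MISS;
    /* A not-taken branch falls through, so its target is irrelevant. */
    if (actual_taken && pred_target != actual_target)
        return PRED_TARGET_MISS;
    return PRED_CORRECT;
}
```

For an unconditional indirect jump, both the predicted and the actual direction are always "taken". Only PRED_CORRECT or PRED_TARGET_MISS can occur, which is exactly the shift the comparison is meant to show.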
Any input is appreciated. Please let me know if I have left anything out. And thank you!
The easiest approach is to use the "General Exploration" analysis type, or to profile your program directly with the event BR_MISP_RETIRED.ALL_BRANCHES_PS, to find where your code has a high misprediction rate. Open the bottom-up report, then open the hot function to enter source view. You should see the Branch Misprediction event count on your source lines, probably on the "if-else if" statements.
To avoid a high branch misprediction rate in your code, you can do one of the following:
1. Adjust the "if-else if" statement so that the most likely conditions come first. That is, try to place the jump instruction and the condition-test instruction in the same cache line (64 bytes).
2. Use the Intel C++ compiler with its advanced profile-guided optimization (PGO) option. Run the program several times; when you rebuild, the compiler uses the profiling results to adjust the code sequence. This way you don't need to modify your code to avoid a high branch misprediction rate.
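Advice 1 can be sketched with a hypothetical dispatch function. It assumes profiling showed that `code == 'A'` is by far the most common input. Because this code builds only with GCC, the sketch also uses `__builtin_expect`, a real GCC (and Clang) builtin. It tells the compiler which outcome is likely so the hot path can be laid out as the fall-through. It does not program the hardware predictor directly.

```c
/* Hypothetical example: 'A' is assumed to be the dominant input. */
int dispatch_code(char code)
{
    if (__builtin_expect(code == 'A', 1)) /* hot case tested first */
        return 1;
    else if (code == 'B')
        return 2;
    else if (code == 'C')
        return 3;
    return 0; /* rare default */
}
```

GCC also supports profile-guided optimization on its own, through `-fprofile-generate` (instrumented build, then run) and `-fprofile-use` (rebuild with the collected profile). This gives an alternative to advice 2 when the Intel compiler is not an option.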
Of course, you can then profile the new program with VTune(TM) Amplifier XE and compare it with the previous one.
Thanks for the reply. Is there any way to distinguish where the branch misses are occurring (in the predictor or in the target buffer)? I have profiled the branch misses as a whole, but I am trying to determine the real-life impact of these misses. Multiplying the number of misses by 20 gets us close (according to the developer's manual, the average miss penalty is 14-20 cycles).
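The "misses × penalty" estimate can be written out explicitly, with the 14-20 cycle range as lower and upper bounds. The helper names are hypothetical. The total-cycles input is assumed to come from an event such as CPU_CLK_UNHALTED.THREAD. This arithmetic ignores overlap with other stalls, so the result is an estimate, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Lower and upper bound on cycles lost to mispredictions. */
typedef struct {
    uint64_t low, high;
} cycle_range;

cycle_range mispredict_cost(uint64_t misses, unsigned min_penalty,
                            unsigned max_penalty)
{
    assert(min_penalty <= max_penalty);
    cycle_range r = { misses * min_penalty, misses * max_penalty };
    return r;
}

/* Fraction of all unhalted cycles attributed to misprediction penalties. */
double mispredict_fraction(uint64_t misses, unsigned penalty,
                           uint64_t total_cycles)
{
    assert(total_cycles > 0);
    return (double)(misses * penalty) / (double)total_cycles;
}
```

For example, 1,000,000 misses at 14-20 cycles each come to roughly 14-20 million cycles. Comparing that fraction between the conditional and indirect versions shows how much of each runtime difference mispredictions could plausibly explain.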
Great idea about the Intel C++ compiler. I would be doing that, but unfortunately the code only compiles with GCC; the Intel compiler fails with the infamous "backend signals" error.
Is your concern that no BTB entry is available for a branch when your code runs? If so, that is a misprediction issue: the hardware will simply fetch instructions from the normal sequential code path past the branch.
If not, the hardware always fetches instructions using the BTB. The event I mentioned in my last post counts all mispredicted instructions that are already in the BTB. How would you distinguish a "misprediction" from a BTB miss? Maybe I don't fully understand your concern.
I have a limited understanding, but this is my current knowledge. When you have a branch "miss", there are two places the miss can occur. The first is in the predictor, which simply guesses wrong (it's an educated guess). For example, it guesses taken and the branch is not taken. This would be seen with a conditional branch (something like if (x > y)).
The second is in the BTB, and this is where my understanding is limited. From what I've gathered, the BTB guesses WHERE a branch is jumping to, which is an address. An example is an indirect jump. In my particular case, I calculate an address and store it in register %rax, and then I see a "jmpq *%rax". Here there is no taken/not-taken prediction to make because the jump isn't conditional. However, it can still be mispredicted. When the jmpq instruction is fetched, the value in %rax may not have been computed yet, so the target is guessed via the BTB.
I'm trying to determine where the mispredict is occurring: in the prediction (taken vs. not taken) or in the BTB (correct vs. incorrect address). I have 2 versions of the code, one with conditional branches and one with indirect jumps. I'm simply looking to verify that even though I can eliminate the condition, the miss still occurs in the BTB.
Again, I'm no expert, I'm an undergrad. Any advice is appreciated.
Event BR_MISP_RETIRED.ALL_BRANCHES_PS includes taken and not-taken cases.
You might use the events below to distinguish them:
The events above are not part of the predefined analysis types. You can collect data with these events directly from the command line, as described in this article.
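There is a distinction only between taken and not-taken branches. The MSR_BPU_ESCR0 register (counter) can count four types of branches, selected by bits [24:9]: branch not taken predicted, branch not taken mispredicted, branch taken predicted, and branch taken mispredicted. (Note that MSR_BPU_ESCR0 is one of the Pentium 4 (NetBurst) performance-monitoring MSRs, so check the Sandy Bridge event list in the Intel SDM before relying on it.)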
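Given the four counts listed above, per-direction misprediction rates follow directly. This sketch assumes that "taken"/"not taken" refers to the branch's actual outcome; the struct and function names are hypothetical. Note that these counts still do not separate direction misses from target misses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the four branch counts. */
typedef struct {
    uint64_t nt_predicted, nt_mispredicted; /* actually not taken */
    uint64_t t_predicted, t_mispredicted;   /* actually taken */
} branch_counts;

/* Share of not-taken branches that were mispredicted. */
double not_taken_miss_rate(const branch_counts *c)
{
    uint64_t total = c->nt_predicted + c->nt_mispredicted;
    assert(total > 0);
    return (double)c->nt_mispredicted / (double)total;
}

/* Share of taken branches that were mispredicted. */
double taken_miss_rate(const branch_counts *c)
{
    uint64_t total = c->t_predicted + c->t_mispredicted;
    assert(total > 0);
    return (double)c->t_mispredicted / (double)total;
}
```

Collecting these rates for both the conditional and the indirect version of the code gives two comparable numbers per build. Indirect jumps are always taken, so they should show up only in the taken counts.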