I have a pair of microbenchmark binaries, A and B, built from the same source tree, using different versions of the same compiler.
When I run these binaries under 'perf stat -e branches,branch-misses', I observe the following:
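For reference, the invocations look like the following (binary names and the repeat count are placeholders; `-r` averages over multiple runs to reduce noise):

```shell
# Hypothetical binary names; adjust to your actual benchmark executables.
perf stat -e branches,branch-misses -r 10 ./bench_a
perf stat -e branches,branch-misses -r 10 ./bench_b
```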
On Skylake, the branch mis-prediction rate of binary A is twice as high as binary B.
On Broadwell, the same discrepancy exists, but the ratio is reversed: the mis-prediction rate of binary B is twice as high as that of binary A.
The total number of retired branches is the same in all four cases.
What are the possible causes of the effect I am seeing?
Can you suggest a method for investigating this further?