I was experimenting with Intel Advisor 2020, in particular with its roofline model. What I can't quite understand is why the peak scalar integer performance (intop/cycle) differs from the theoretical value I would expect, especially since all the other metrics (vector integer performance, floating point, ...) match more or less.
Specifically, according to Intel Advisor the peak performance (for add/mul) is around 2.3 integer operations per cycle, while the theoretical value I would expect is 4 intop/cycle, since there are 4 integer ALUs on 4 different ports.
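To make the gap concrete, here is the back-of-the-envelope arithmetic as a small Python sketch. It assumes each of the 4 integer ALU ports can retire one scalar integer operation per cycle, which is my assumption and not something Advisor reports; the function names are hypothetical and only for illustration.

```python
def theoretical_peak(alu_ports: int, ops_per_port_per_cycle: float = 1.0) -> float:
    """Peak scalar integer throughput in intop/cycle.

    Assumes every ALU port can issue ops_per_port_per_cycle operations
    each cycle, with no other bottleneck (front end, dependencies, ...).
    """
    return alu_ports * ops_per_port_per_cycle


def fraction_of_peak(measured: float, peak: float) -> float:
    """Share of the theoretical peak that the measured value reaches."""
    return measured / peak


# Values from the post: 4 integer ALU ports, about 2.3 intop/cycle reported.
peak = theoretical_peak(alu_ports=4)
share = fraction_of_peak(measured=2.3, peak=peak)
print(f"theoretical peak: {peak:.1f} intop/cycle")
print(f"reported value reaches {share:.1%} of that peak")
```

So the reported roof is a bit under 60% of what I would expect, which seems too large a gap to be measurement noise.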
Am I missing something?
Thanks for noticing this problem! We will investigate the issue. There are no obvious extra hardware limits for scalar integer ops, so our benchmark may be reporting a suboptimal value.