Runtime models based on L1-TLB metrics

HodB · 12-28-2022

We are building a model to predict CPU runtime on a Haswell Intel Xeon E5-x600 v3 processor. We collect counters with the `perf` tool from the `linux-tools` package while running various benchmarks, and the model uses TLB events, including dTLB and sTLB, as predictors. However, the results were unexpected: a single-feature model over the same benchmark (and several others) showed a trend opposite to what we anticipated.

We would expect a higher number of TLB hits to lead to a decrease in CPU cycles.
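For readers who want to reproduce the check, here is a minimal sketch of what a single-feature model could look like, assuming an ordinary least-squares fit of cycles against one counter. The function name `fit_single_feature` is hypothetical and this is not necessarily the model we used; the point is only that the sign of the slope is what we compare against the expectation above.

```python
def fit_single_feature(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept.

    x: per-run values of one counter (for example, TLB hits)
    y: per-run CPU cycles
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two or more paired samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of the feature
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("feature has zero variance")
    # Sum of cross deviations between feature and target
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A negative slope would match the expectation that more hits mean fewer cycles; what we observed corresponds to a positive slope.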

Upon further investigation, we realized that we had not considered the out-of-order execution (OOOE) characteristics of the Haswell processor, whose events include both retired uops and speculative ones. As our goal is to compare "apples to apples," we want to ensure that we are comparing either retired events to retired events or speculative events to speculative events. We are now looking to include speculative dTLB accesses and retired sTLB hits in the model so that the predictors are consistent.

Speculative dTLB accesses: We tried the `L1-dcache-loads` and `L1-dcache-stores` counters as potential predictors for speculative dTLB accesses, but we later learned that these counters refer to retired events rather than speculative ones.

Retired sTLB hits: We have been unable to identify a suitable counter for retired sTLB hits among the events shown by `perf list`.

Perhaps there is another way to calculate these features indirectly, using other counters we have not considered.
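To make the "indirect" idea concrete, here is a rough sketch of the kind of calculation we have in mind: derive hits as accesses minus misses from two counters of the same kind (both retired or both speculative). It assumes the CSV layout of `perf stat -x,` is value, unit, event name, then further fields, which should be checked against the installed `perf` version. The helper names `parse_perf_csv` and `derived_hits` are hypothetical, and which event pair to pass in is exactly the open question.

```python
import csv
import io


def parse_perf_csv(text):
    """Map event name -> count from `perf stat -x,` output.

    Assumed field order: value, unit, event name, ...
    """
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines, comments, and rows too short to hold an event
        if len(row) < 3 or row[0].startswith("#"):
            continue
        value, event = row[0].strip(), row[2].strip()
        try:
            counts[event] = float(value)
        except ValueError:
            # Non-numeric values such as "<not supported>" are skipped
            continue
    return counts


def derived_hits(counts, access_event, miss_event):
    """Return accesses minus misses, or None if either counter is missing."""
    if access_event not in counts or miss_event not in counts:
        return None
    return counts[access_event] - counts[miss_event]
```

For example, `derived_hits(counts, "dTLB-loads", "dTLB-load-misses")` would give a hit estimate for dTLB loads, provided both events are counted on the same basis.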