Automatic Software Performance Defect Detection with Zero-Positive Learning

MaryT_Intel · 12-20-2019

Machine programming is the field of research concerned with automating the development and maintenance of software (and, as a byproduct, hardware). Using the nomenclature defined in the “Three Pillars of Machine Programming”, this work falls principally in the adaptation pillar, where the automation techniques for software are focused on its adaptation to be, for example, more correct, performant, and secure.

A cornerstone of adaptation is the software update. Generally speaking, such updates are intended to enrich a software program’s feature set or improve the program’s performance. However, changes in code can sometimes adversely affect the software in a variety of unexpected ways. One type of adverse change is known as a performance regression, where the performance of the software is unintentionally degraded. Identifying regressions, performance or otherwise, is an important part of the development process, but has, historically, been a manual and tedious process.

Manual regression testing is slow, laborious, and, because it relies on human judgement, error-prone. It is often incomplete or insufficient; developers performing manual regression testing may not have the expertise necessary to perform it exhaustively or soundly. Testing and benchmarking tools that detect performance regressions are available, but often fail to find the root cause of a problem due to various limitations, such as introducing a probe effect that may inadvertently hide the performance regression. At the 2019 NeurIPS Conference, we will present AutoPerf, a novel approach to performance regression testing, which automates this historically manual process by using zero-positive learning, autoencoders, and hardware telemetry.

Zero-Positive Learning Meets Hardware Telemetry

Zero-positive learning (ZPL) is a semi-supervised machine learning (ML) technique developed for anomaly detection (AD). In AD terminology, anomalous data are represented as positives and normal data are represented as negatives. ZPL trains only on the negative (i.e., nominal) space, hence the name: zero positives exist in the training data. In this work, performance defects are the anomalies, because they deviate from the expected behavior of a given piece of software. When paired with the right ML modeling technique (here, autoencoders), ZPL can act as a pragmatic solution to software update issues, because in many real-world cases users possess only nominal data in their dataset even though they know anomalies exist.
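
To make the idea concrete, here is a minimal sketch of zero-positive training, not AutoPerf's actual model. It assumes a linear stand-in for the autoencoder: samples are projected onto their top principal direction and back, which is what an optimal linear autoencoder with a one-unit bottleneck learns. The function names, the synthetic data, and the "mean plus k standard deviations" threshold are all illustrative assumptions.

```python
import math
import random


def reconstruction_error(x, mean, direction):
    """Encode x to one number, decode it, and return the distance to the original."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    code = sum(ci * di for ci, di in zip(centered, direction))  # encode
    decoded = [code * di for di in direction]                   # decode
    return math.sqrt(sum((ci - hi) ** 2 for ci, hi in zip(centered, decoded)))


def fit_on_nominal(nominal, k=3.0, iters=100):
    """Fit using nominal (negative) samples only; no anomalies are needed."""
    n, dim = len(nominal), len(nominal[0])
    mean = [sum(x[j] for x in nominal) / n for j in range(dim)]
    centered = [[x[j] - mean[j] for j in range(dim)] for x in nominal]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(dim)]
           for a in range(dim)]
    # Power iteration finds the top principal direction of the nominal data.
    direction = [1.0] * dim
    for _ in range(iters):
        w = [sum(cov[a][b] * direction[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(t * t for t in w))
        direction = [t / norm for t in w]
    # The threshold is derived from errors on nominal data alone (assumed rule).
    errors = [reconstruction_error(x, mean, direction) for x in nominal]
    mu = sum(errors) / n
    sigma = math.sqrt(sum((e - mu) ** 2 for e in errors) / n)
    return mean, direction, mu + k * sigma


def is_anomaly(x, model):
    mean, direction, threshold = model
    return reconstruction_error(x, mean, direction) > threshold


# Synthetic nominal data: two correlated measurements with a little noise.
rng = random.Random(0)
nominal = []
for _ in range(200):
    t = rng.uniform(0.0, 10.0)
    nominal.append((t, 2.0 * t + rng.gauss(0.0, 0.05)))

model = fit_on_nominal(nominal)
print(is_anomaly((4.0, 8.0), model))   # follows the nominal pattern
print(is_anomaly((5.0, 14.0), model))  # breaks the nominal pattern
```

The point of the sketch is that the model never sees a positive example: anything it cannot reconstruct well is treated as anomalous.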

AutoPerf identifies changes in software performance by examining hardware telemetry in the form of hardware performance counters (HWPCs). HWPCs are purpose-built registers in CPUs that store counts of activities such as cycles elapsed, cache hits and misses, branch predictions, and instructions executed. Current processors have hundreds of HWPCs, and future processors will provide even more. These HWPCs give us a lightweight mechanism to collect information on software activity without modifying source code and, perhaps more importantly, without introducing performance-impacting overhead.
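
As a rough illustration of how raw counts become model inputs, the sketch below turns one set of counter readings into two standard derived metrics: instructions per cycle and cache miss ratio. It does not read real hardware counters (that requires a profiling tool), and the `CounterSample` fields and the choice of features are hypothetical, not AutoPerf's actual feature set.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw counter readings for one profiled run (hypothetical field names)."""
    cycles: int
    instructions: int
    cache_hits: int
    cache_misses: int


def to_features(sample: CounterSample) -> list[float]:
    """Convert raw counts into rates that are comparable across runs."""
    accesses = sample.cache_hits + sample.cache_misses
    ipc = sample.instructions / sample.cycles if sample.cycles else 0.0
    miss_ratio = sample.cache_misses / accesses if accesses else 0.0
    return [ipc, miss_ratio]


# Made-up readings for the same function before and after a code change.
before = CounterSample(cycles=1_000_000, instructions=2_000_000,
                       cache_hits=90_000, cache_misses=10_000)
after = CounterSample(cycles=2_000_000, instructions=2_000_000,
                      cache_hits=60_000, cache_misses=40_000)

print(to_features(before))
print(to_features(after))
```

In this made-up example the same amount of work takes twice as many cycles and misses the cache four times as often, which is the kind of shift a detector trained on nominal profiles could pick up.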

How AutoPerf Identifies Performance Regression Bugs

Using a ZPL-trained autoencoder and HWPCs, AutoPerf looks for performance degradations in the newly checked-in version of source code. If a performance anomaly is detected, it is flagged at the function level and the system alerts the user. In our experiments thus far, AutoPerf has emitted no false negatives. That is, it has not missed a single performance anomaly by classifying it as normal, which we believe is critically important for the practical use of automated performance regression systems on production-quality code.
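
The function-level flagging step might look something like the sketch below. It assumes each function already has a reconstruction-error threshold learned from the previous version, and it flags a function when at least half of its new samples exceed that threshold. The `flag_regressions` name, the `min_fraction` rule, and the sample values are illustrative assumptions, not AutoPerf's actual decision procedure.

```python
def flag_regressions(thresholds, new_errors, min_fraction=0.5):
    """Return the functions whose new-version samples look anomalous.

    thresholds: function name -> error threshold learned on the old version.
    new_errors: function name -> reconstruction errors from the new version.
    """
    flagged = []
    for func, errors in new_errors.items():
        limit = thresholds.get(func)
        if limit is None or not errors:
            continue  # no nominal model for this function, or no samples
        anomalous = sum(1 for e in errors if e > limit)
        if anomalous / len(errors) >= min_fraction:
            flagged.append(func)
    return sorted(flagged)


# Hypothetical function names and error values.
thresholds = {"parse_input": 0.05, "update_cache": 0.08}
new_errors = {
    "parse_input": [0.01, 0.02, 0.03],
    "update_cache": [0.30, 0.25, 0.07],
    "new_helper": [0.90],  # added in the new version, so it has no baseline
}

print(flag_regressions(thresholds, new_errors))
```

Functions that did not exist in the previous version are skipped here because they have no nominal baseline to compare against; a real system would need its own policy for that case.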

The core technology used in AutoPerf is not constrained to performance anomalies. ZPL and autoencoders point toward a possible future that requires far less labeled training data; such data is often a bottleneck in building new systems, and manual labeling is error-prone. HWPCs also provide data on any detectable event, regardless of type, in a scalable fashion. This enables fine-grained analysis even in the face of large-scale data growth.

Key Takeaways

AutoPerf is a novel, automatic performance regression detection system that fuses three key elements: zero-positive learning, autoencoders, and hardware telemetry. In our NeurIPS paper, we illustrate AutoPerf’s utility across three types of parallel performance regressions in ten real-world performance bugs across seven benchmark and open source programs. We have found that, in general, AutoPerf can detect complex software performance defects, such as those hidden in parallel programs, with accuracy that is better than the prior state of the art. Moreover, in all of our experiments thus far, AutoPerf has yet to produce any false negatives (i.e., it has not missed a single performance defect, including ones that expert programmers have missed). We believe this is an essential metric for any automated performance regression tool that is to be used reliably on production-quality code.

For more on this research, read the entire paper, look for us at the 2019 NeurIPS conference, and stay tuned to @IntelAIResearch on Twitter.