Using Machine Learning Towards Early Flagging of Potentially Buggy Software Commits

April 2022

Abstract

Detecting and flagging software bugs is, in general, a difficult problem, especially in collaboratively developed software managed via large and dynamic software development systems. The problem can be tackled at various levels, such as analysis of source code or of binaries. Here, we explore a different approach: tackling it at the level of metadata in software repositories. We provide a problem formulation suitable for solution using machine learning, build an implementation of the solution using multiple classifiers, and verify the feasibility of our approach on metadata from synthetic datasets modeled on a characterization of a few large software repositories and developer profiles. The results show that, with a sufficient amount of past metadata for training, it appears possible to flag new software commits to repositories as potentially buggy or not. Good accuracy is observed when inferring with suitable classifiers trained on the metadata, although the accuracy shows some saturation as data sizes increase. To further increase effectiveness, additional metadata (such as inter-entity linkages) and more complex machine learning techniques (such as graph neural networks) may be warranted to tap more of the latent information in software evolution processes. Nevertheless, the metadata-based learning approach appears promising as an automatable service for early flagging of potentially buggy commits in software repositories. Such flags could augment the services offered by software hosting companies or by institutions with large software bases.
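
The formulation above treats bug flagging as binary classification over commit metadata. As a minimal sketch of that idea, assuming three hypothetical metadata features (lines changed, files touched, and the author's prior commit count) and a made-up synthetic labeling rule, the Python below trains a single logistic-regression classifier and flags a new commit when its estimated risk crosses a threshold. The names `CommitMeta`, `synth_commit`, and `CommitFlagger`, the feature set, and the risk model are all illustrative assumptions, not the paper's datasets, features, or classifiers.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class CommitMeta:
    # Hypothetical metadata fields chosen for illustration only.
    lines_changed: int
    files_touched: int
    author_prior_commits: int


def raw_features(c: CommitMeta) -> list[float]:
    # log1p compresses heavy-tailed counts so one huge commit does not dominate.
    return [math.log1p(c.lines_changed), math.log1p(c.files_touched),
            math.log1p(c.author_prior_commits)]


def synth_commit(rng: random.Random) -> tuple[CommitMeta, int]:
    """Draw one synthetic commit and a 0/1 'buggy' label from an assumed risk model."""
    c = CommitMeta(rng.randint(1, 2000), rng.randint(1, 50), rng.randint(0, 500))
    x = raw_features(c)
    # Assumed ground truth: large, broad changes by less experienced authors are riskier.
    risk = 0.8 * x[0] + 0.5 * x[1] - 0.9 * x[2] - 2.0 + rng.gauss(0.0, 0.5)
    return c, int(risk > 0.0)


def sigmoid(t: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


class CommitFlagger:
    """Logistic-regression classifier trained by full-batch gradient descent."""

    def fit(self, commits: list[CommitMeta], labels: list[int],
            lr: float = 0.5, epochs: int = 200) -> "CommitFlagger":
        xs = [raw_features(c) for c in commits]
        n, d = len(xs), len(xs[0])
        # Standardize each feature using the training-set mean and standard deviation.
        self.mean = [sum(x[j] for x in xs) / n for j in range(d)]
        self.std = [math.sqrt(sum((x[j] - self.mean[j]) ** 2 for x in xs) / n) or 1.0
                    for j in range(d)]
        zs = [self._scale(x) for x in xs]
        self.w = [0.0] * (d + 1)  # w[0] is the bias term
        for _ in range(epochs):
            grad = [0.0] * (d + 1)
            for z, y in zip(zs, labels):
                err = sigmoid(self._logit(z)) - y  # d(log-loss)/d(logit)
                grad[0] += err
                for j in range(d):
                    grad[j + 1] += err * z[j]
            self.w = [w - lr * g / n for w, g in zip(self.w, grad)]
        return self

    def _scale(self, x: list[float]) -> list[float]:
        return [(v - m) / s for v, m, s in zip(x, self.mean, self.std)]

    def _logit(self, z: list[float]) -> float:
        return self.w[0] + sum(w * v for w, v in zip(self.w[1:], z))

    def risk(self, c: CommitMeta) -> float:
        """Estimated probability that a new commit is buggy."""
        return sigmoid(self._logit(self._scale(raw_features(c))))

    def flag(self, c: CommitMeta, threshold: float = 0.5) -> bool:
        return self.risk(c) >= threshold


# Usage: train on "past" synthetic commits, then flag held-out "new" ones.
rng = random.Random(0)
data = [synth_commit(rng) for _ in range(2000)]
train, held_out = data[:1500], data[1500:]
model = CommitFlagger().fit([c for c, _ in train], [y for _, y in train])
acc = sum(model.flag(c) == bool(y) for c, y in held_out) / len(held_out)
print(f"held-out accuracy: {acc:.2f}")
```

Standardizing the log-scaled features before gradient descent keeps the optimization well conditioned. Under these assumptions, any other classifier could be swapped in behind the same `fit`/`flag` interface, and real repository metadata would replace the synthetic generator.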

Type

Conference paper

Cybersecurity AI ML Graph Binary Classifier Source code Software Repositories Commit

Kalyan Perumalla

As a Federal Program Manager in Advanced Scientific Computing Research at the U.S. Dept. of Energy, Office of Science, Kalyan Perumalla manages a $100-million R&D portfolio covering AI, HPC, Quantum, SciDAC, and Basic Computer Science. Over 25 years of R&D leadership, he previously spent 17 years leading advanced R&D as a Distinguished Research Staff Member at the Oak Ridge National Laboratory (ORNL), developing scalable software and applications on the world’s largest supercomputers, including as a line manager and a founding group leader. He has held senior faculty and adjunct appointments at UTK, GT, and UNL, and was an IAS Fellow at Durham University.
