Using Machine Learning Towards Early Flagging of Potentially Buggy Software Commits

Abstract

Detecting and flagging software bugs is, in general, a difficult problem, especially in collaboratively developed software managed via large and dynamic software development systems. The problem can be tackled at various levels, such as analysis of source code or binaries. Here, we explore a different approach: tackling it at the level of metadata in software repositories. We provide a problem formulation suitable for solution using machine learning, build an implementation of the solution using multiple classifiers, and verify the feasibility of our approach on metadata from synthetic datasets modeled on a characterization of a few large software repositories and developer profiles. The results show that, with a sufficient amount of past metadata for training, it is conceivable to flag new software commits to repositories as potentially buggy or not. Good accuracy is observed when inferring with suitable classifiers trained on the metadata, although the accuracy shows some saturation as data sizes increase. To further improve effectiveness, additional metadata (such as inter-entity linkages) and more complex machine learning techniques (such as graph neural networks) may be warranted to tap more of the latent information in software evolution processes. Nevertheless, the metadata-based learning approach appears promising as an automatable service for early flagging of potentially buggy commits in software repositories. Such flags could conceivably augment the features offered as services by software hosting companies or by institutions with large software bases.
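The formulation in the abstract (represent each past commit by its metadata, label it buggy or not, train a classifier on that history, then flag new commits with the trained model) can be illustrated with a small sketch. This is a minimal sketch under stated assumptions, not the author's implementation: the metadata fields (`author_prior_commits`, `files_touched`, `lines_changed`, `author_prior_bug_rate`), the `CommitFlagger` class, and the `synthetic_history` generator are all hypothetical, and plain logistic regression stands in for whichever of the multiple classifiers the study used.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class CommitMeta:
    # Hypothetical per-commit metadata fields (not taken from the paper).
    author_prior_commits: int     # commits by the same author before this one
    files_touched: int
    lines_changed: int
    author_prior_bug_rate: float  # share of the author's past commits later found buggy


def raw_features(c: CommitMeta) -> list[float]:
    # Log-scale the counts so very large commits do not dominate.
    return [math.log1p(c.author_prior_commits), math.log1p(c.files_touched),
            math.log1p(c.lines_changed), c.author_prior_bug_rate]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class CommitFlagger:
    """Logistic-regression flagger trained on past commit metadata (illustrative only)."""

    def fit(self, commits, labels, lr=0.5, iters=300):
        rows = [raw_features(c) for c in commits]
        n, d = len(rows), len(rows[0])
        # Standardize each feature with the training-set mean and standard deviation.
        self.mean = [sum(r[j] for r in rows) / n for j in range(d)]
        self.std = [math.sqrt(sum((r[j] - self.mean[j]) ** 2 for r in rows) / n) or 1.0
                    for j in range(d)]
        xs = [self._scale(r) for r in rows]
        self.w = [0.0] * (d + 1)  # w[0] is the bias term
        # Full-batch gradient descent on the mean log-loss.
        for _ in range(iters):
            grad = [0.0] * (d + 1)
            for x, y in zip(xs, labels):
                err = self._prob(x) - y
                grad[0] += err
                for j, xj in enumerate(x):
                    grad[j + 1] += err * xj
            self.w = [wj - lr * g / n for wj, g in zip(self.w, grad)]
        return self

    def _scale(self, r):
        return [(v - m) / s for v, m, s in zip(r, self.mean, self.std)]

    def _prob(self, x):
        return sigmoid(self.w[0] + sum(wj * xj for wj, xj in zip(self.w[1:], x)))

    def probability(self, c: CommitMeta) -> float:
        return self._prob(self._scale(raw_features(c)))

    def flag(self, c: CommitMeta, threshold: float = 0.5) -> bool:
        # True means "potentially buggy": route the commit to closer review.
        return self.probability(c) >= threshold


def synthetic_history(n: int, seed: int = 0):
    # Hypothetical generator; NOT the paper's synthetic-data model.
    rng = random.Random(seed)
    commits, labels = [], []
    for _ in range(n):
        c = CommitMeta(rng.randint(0, 500), rng.randint(1, 40),
                       rng.randint(1, 2000), rng.random() * 0.5)
        # Assumed ground truth: large commits by less experienced, bug-prone
        # authors are more likely to be buggy.
        risk = (-3.0 - 0.6 * math.log1p(c.author_prior_commits)
                + 0.5 * math.log1p(c.files_touched)
                + 0.4 * math.log1p(c.lines_changed)
                + 4.0 * c.author_prior_bug_rate)
        labels.append(1 if rng.random() < sigmoid(risk) else 0)
        commits.append(c)
    return commits, labels


if __name__ == "__main__":
    commits, labels = synthetic_history(1000, seed=1)
    model = CommitFlagger().fit(commits, labels)
    new_commit = CommitMeta(author_prior_commits=3, files_touched=25,
                            lines_changed=1500, author_prior_bug_rate=0.4)
    print(f"P(buggy) = {model.probability(new_commit):.2f}, "
          f"flagged = {model.flag(new_commit)}")
```

Swapping in a different classifier only changes `fit` and `probability`; the `threshold` parameter trades missed buggy commits against false flags, which matters when flags feed a review queue.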

Kalyan Perumalla

Kalyan Perumalla is Founder and President of Discrete Computing, Inc. He led advanced research and development at ORNL and holds senior faculty appointments at UTK, GT, and UNL.
