Modern software development is built on software repositories and the changes committed to them. However, there is insufficient insight into the nature of the changes committed to repositories of different sizes. A data-driven characterization of commit activity in large software hubs contributes to a better understanding of software development and can inform the detection of bugs in the earliest phases of development. Here, we present preliminary results from characterizing the distribution of 452 million commits in a metadata listing of GitHub repositories. By fitting multiple candidate distributions, we identify the best and second-best fits across different ranges of the data. The characterization is aimed at generating synthetic repositories suitable for use in simulation and machine learning.
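One common way to rank candidate distributions for a sample of per-repository commit counts is to fit each by maximum likelihood and compare them with the Akaike information criterion (AIC = 2k − 2 ln L, lower is better). The sketch below is an illustration under stated assumptions, not the procedure used in this work. The candidate set (exponential, lognormal, Pareto), the helper names `fit_exponential`, `fit_lognormal`, `fit_pareto` and `rank_fits`, and the choice of AIC are all hypothetical. It treats integer commit counts as continuous positive values and requires at least two distinct values. Ranking within a range of the data could reuse the same function on that subset, although a careful version would use densities truncated to the range, which this sketch omits.

```python
import math
import random

# Hypothetical helpers: each fits one candidate distribution by maximum
# likelihood and returns (name, number_of_parameters, log_likelihood).
# Assumes all values are positive and not all identical.

def fit_exponential(xs):
    lam = len(xs) / sum(xs)  # MLE of the rate is 1 / sample mean
    loglik = sum(math.log(lam) - lam * x for x in xs)
    return ("exponential", 1, loglik)

def fit_lognormal(xs):
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n  # MLE variance (divides by n)
    # log pdf = -ln x - 0.5 ln(2*pi*var) - (ln x - mu)^2 / (2*var)
    loglik = sum(
        -v - 0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        for v in logs
    )
    return ("lognormal", 2, loglik)

def fit_pareto(xs):
    xmin = min(xs)  # scale fixed at the smallest observation
    n = len(xs)
    alpha = n / sum(math.log(x / xmin) for x in xs)  # MLE of the shape
    loglik = sum(
        math.log(alpha) + alpha * math.log(xmin) - (alpha + 1) * math.log(x)
        for x in xs
    )
    return ("pareto", 2, loglik)  # xmin counted as a fitted parameter

CANDIDATES = (fit_exponential, fit_lognormal, fit_pareto)

def rank_fits(xs):
    """Return candidate names sorted by AIC (best fit first)."""
    scored = []
    for fit in CANDIDATES:
        name, k, loglik = fit(xs)
        scored.append((2 * k - 2 * loglik, name))
    scored.sort()
    return [name for _, name in scored]

if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in data; no real commit counts are used here.
    sample = [random.lognormvariate(0.0, 1.0) for _ in range(20000)]
    best, second = rank_fits(sample)[:2]
    print("best:", best, "second best:", second)
```

AIC is used here only because it penalizes the extra parameters of the lognormal and Pareto candidates in a simple way; likelihood-ratio tests or goodness-of-fit statistics are equally plausible choices.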