Chapter 23: Graph Neural Networks in Software Mining

Collin McMillan, University of Notre Dame


Software mining encompasses a broad range of tasks involving software, such as finding the location of a bug in the source code of a program, generating natural language descriptions of software behavior, and detecting when two programs do basically the same thing. Software tends to have an extremely well-defined structure, due to the linguistic confines of source code and the need for programmers to maintain readability and compatibility when working on large teams. A tradition of graph-based representations of software has therefore proliferated. Meanwhile, advances in software repository maintenance have recently helped create very large datasets of source code. The result is fertile ground for Graph Neural Network (GNN) representations of software to facilitate a wide range of software mining tasks. This chapter provides a brief history of these representations, describes typical software mining tasks that benefit from GNNs, demonstrates one of these tasks in detail, and explains the benefits that GNNs can provide. Caveats and recommendations are also discussed.


  • Introduction
  • Software as a Graph
    • Macro versus Micro Representations
    • Combining the Macro- and Micro-level
  • Relevant Software Mining Tasks
  • Example Software Mining Task: Source Code Summarization
    • Primer GNN-based Code Summarization
    • Directions for Improvement
  • Summary


author = "McMillan, Collin",
editor = "Wu, Lingfei and Cui, Peng and Pei, Jian and Zhao, Liang",
title = "Graph Neural Networks in Software Mining",
booktitle = "Graph Neural Networks: Foundations, Frontiers, and Applications",
year = "2022",
publisher = "Springer Singapore",
address = "Singapore",
pages = "499--516",

C. McMillan, “Graph neural networks in software mining,” in Graph Neural Networks: Foundations, Frontiers, and Applications, L. Wu, P. Cui, J. Pei, and L. Zhao, Eds. Singapore: Springer Singapore, 2022, pp. 499–516.