Chapter 20: Graph Neural Networks in Computer Vision

Siliang Tang, Zhejiang University
Wenqiao Zhang, Zhejiang University
Zongshen Mu, Zhejiang University
Kai Shen, Zhejiang University
Juncheng Li, Zhejiang University
Jiacheng Li, Zhejiang University
Lingfei Wu, JD.COM Silicon Valley Research Center


Graph Neural Networks (GNNs) have recently been incorporated into many Computer Vision (CV) models. They not only improve performance on many CV-related tasks but also provide a more explainable decomposition of these CV models. This chapter provides a comprehensive overview of how GNNs are applied to various CV tasks, ranging from single-image classification to cross-media understanding. It also discusses this rapidly growing field from a frontier perspective.


  • Introduction
  • Representing Visions as Graphs
    • Visual Node Representation
    • Visual Edge Representation
  • Case Study 1: Image
    • Object Detection
    • Image Classification
  • Case Study 2: Video
    • Video Action Recognition
    • Temporal Action Localization
  • Other Related Work: Cross-media
    • Vision Caption
    • Visual Question Answering
    • Cross-Media Retrieval
  • Frontiers for GNNs on Computer Vision
    • Advanced GNN Modeling Methods for Computer Vision
    • Broader Area of GNNs on Computer Vision
  • Summary


author = "Tang, Siliang and Zhang, Wenqiao and Mu, Zongshen and Shen, Kai and Li, Juncheng and Li, Jiacheng and Wu, Lingfei",
editor = "Wu, Lingfei and Cui, Peng and Pei, Jian and Zhao, Liang",
title = "Graph Neural Networks in Computer Vision",
booktitle = "Graph Neural Networks: Foundations, Frontiers, and Applications",
year = "2022",
publisher = "Springer Singapore",
address = "Singapore",
pages = "447--462",

S. Tang, W. Zhang, Z. Mu, K. Shen, J. Li, J. Li, and L. Wu, “Graph neural networks in computer vision,” in Graph Neural Networks: Foundations, Frontiers, and Applications, L. Wu, P. Cui, J. Pei, and L. Zhao, Eds. Singapore: Springer Singapore, 2022, pp. 447–462.