Graph AI

Graph Mining, Graph Machine Learning, and Graph Neural Networks

Xuhao Chen

Why Graph AI?

The recent success of deep learning has boosted research in almost every area of computer science. Machine learning tasks such as object detection, machine translation, and speech recognition have been given new life by end-to-end deep learning paradigms such as CNNs, RNNs, and transformers.

Deep learning is good at capturing hidden patterns in Euclidean data (images, text, videos). But what about applications where the data comes from non-Euclidean domains and is represented as graphs with complex relationships and interdependencies between objects?

That’s where Graph AI and Graph ML come in, which we’ll explore in this article.

What is Graph AI and Graph ML?

Conventional AI learns from independent data samples, such as images. In many application scenarios, however, data objects are not isolated: like people in a social network, they are inherently connected or correlated. In other words, the objects and the relationships between them form a graph. By modeling these relationships explicitly as a graph, Graph AI can leverage the relational information in the data and potentially learn better.

Figure 1: Comparing Conventional AI (left) with Graph AI (right).

Generally speaking, Graph AI refers to any intelligent algorithm that operates on graph data. Here we focus on data mining and machine learning on graphs, which are referred to as Graph Mining and Graph Machine Learning respectively. Graph Mining and Graph ML can be thought of as two different approaches to extracting information from graph data. They are not orthogonal, though; the two have long influenced each other.

Graph Mining is the process of applying data mining techniques to find patterns or relationships in a real-world collection of graphs. Mining a graph can identify frequent substructures and relationships, which helps in clustering sets of graphs, finding relationships between them, or discriminating between and characterizing graphs. In the literature, Graph Mining includes many subtopics: Graph Pattern Mining, Graph Classification, Graph Compression, Graph Dynamics, Graph Clustering, Link Analysis, Graph Summarization and Visualization, Graph Modeling and Statistics, etc.

Graph Machine Learning applies machine learning techniques to graph data and leverages the relationships between entities for predictive, modeling, and analytics tasks. Graph ML covers many subtopics as well: Graph Embedding, Graph Neural Networks, Knowledge Graphs, Influence Maximization, Disease Outbreak Detection, Social Network Analysis, etc. With the rapid growth of ML, Graph ML has been shown to outperform traditional graph mining approaches in many predictive and analytics tasks, so we will focus on Graph ML in this article.

Next, we’ll start with graph theory and graph algorithms, move on to graph ML and GNN, and finish with some applications of GNN.

What is a Graph?

In computer science, a graph is a data structure consisting of two components: nodes (vertices) and edges. A graph G can be defined as G = (V, E), where V is the set of nodes and E is the set of edges between them. If there are directional dependencies between nodes, the edges are directed. If not, the edges are undirected.
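To make the definition G = (V, E) concrete, here is a minimal sketch in plain Python. The helper name `build_adjacency` and the four-node toy graph are assumptions made up for illustration, not part of any library. The sketch stores each node's neighbors in a set and shows how the same edge list yields different neighborhoods depending on whether the edges are treated as directed.

```python
def build_adjacency(nodes, edges, directed=False):
    """Map each node in V to the set of nodes it has an edge to."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)          # store the edge u -> v
        if not directed:
            adj[v].add(u)      # undirected: the edge also runs v -> u
    return adj

# Hypothetical toy graph: V has 4 nodes, E has 4 edges.
V = ["a", "b", "c", "d"]
E = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

undirected = build_adjacency(V, E, directed=False)
directed = build_adjacency(V, E, directed=True)

print(sorted(undirected["c"]))  # neighbors of c when edges have no direction
print(sorted(directed["c"]))    # only outgoing edges of c are kept
```

In the undirected case node c is adjacent to a, b, and d, while in the directed case only its outgoing edge to d is recorded.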

Directed Graph | Source

A graph can represent things like social media networks, protein networks, molecules, financial transactions, etc. For example, in a social network, the nodes are people, and each edge represents a relationship between two people, such as friendship.

A graph is often represented by an adjacency matrix A. If a graph has n nodes, A has dimensions (n × n), and the entry in row i, column j indicates whether there is an edge from node i to node j. Sometimes the nodes also have a set of features (for example, a user profile). If each node has f features, then the node feature matrix X has dimensions (n × f).
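The two matrices can be sketched in plain Python with nested lists. The function name `adjacency_matrix`, the toy edge list, and the feature values are all hypothetical, chosen only to illustrate the shapes (n × n) and (n × f). A real system would more likely use a numerical library and a sparse format.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build an n x n matrix A with A[i][j] = 1 if there is an edge i -> j."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        if not directed:
            A[j][i] = 1        # an undirected graph gives a symmetric A
    return A

# Hypothetical toy graph with n = 4 nodes, indexed 0..3.
n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
A = adjacency_matrix(n, edges)

# Hypothetical node feature matrix X with f = 2 features per node.
X = [
    [25.0, 1.0],
    [31.0, 0.0],
    [47.0, 1.0],
    [19.0, 0.0],
]

# In an undirected graph, the row sums of A are the node degrees.
degrees = [sum(row) for row in A]
print(degrees)
```

Note that A grows quadratically with the number of nodes even though most entries are zero for typical graphs, which is one reason adjacency lists or sparse formats are often preferred at scale.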

Why is it hard to analyze a graph?

Graph data is complex enough to pose significant challenges for existing machine learning (ML) algorithms and systems. Conventional ML tools specialize in simple data types, such as images with the same structure and size, which we can think of as fixed-size grid graphs. Graphs, by contrast, are irregular: they have a variable number of unordered nodes, and each node can have a different number of neighbors. This irregularity is unfriendly to modern computer systems, particularly GPUs, which are optimized for regular data and algorithms. Moreover, many existing ML algorithms assume that instances are independent of each other. This assumption does not hold for graph data, where each node is related to others by links of various types.

Graph Processing Systems

Graph Neural Network

Graph Neural Networks (GNNs) are a class of ML methods designed for graphs. GNNs borrow ideas from neural networks (NNs) and apply them to graphs, which provides an easy way to perform node-level, edge-level, and graph-level prediction tasks. GNNs can handle what Convolutional Neural Networks (CNNs) cannot (see here for an explanation). Representative GNN designs include the Graph Convolutional Network (GCN) and GraphSAGE.
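As a rough illustration of how a GNN combines graph structure with node features, the following sketch implements a single GCN-style layer in plain Python. It assumes the commonly cited propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) X W), where D is the degree matrix of A + I. The function names `matmul` and `gcn_layer`, the toy graph, and the weight values are hypothetical. In practice the weights are learned and a tensor library would be used instead of nested lists.

```python
import math

def matmul(P, Q):
    """Multiply two matrices stored as nested lists."""
    return [
        [sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
        for i in range(len(P))
    ]

def gcn_layer(A, X, W):
    """One GCN-style layer: normalize A with self-loops, aggregate, transform, ReLU."""
    n = len(A)
    # Add self-loops so each node keeps its own features when aggregating.
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    # Symmetric normalization: divide entry (i, j) by sqrt(deg_i * deg_j).
    A_norm = [
        [A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]
    H = matmul(matmul(A_norm, X), W)   # aggregate neighbors, then apply weights
    return [[max(0.0, h) for h in row] for row in H]  # ReLU

# Hypothetical path graph 0 - 1 - 2 with f = 2 input features per node.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[0.5, -0.5], [0.25, 0.75]]        # illustrative, not learned, weights

H = gcn_layer(A, X, W)
for row in H:
    print(row)
```

Each output row is a new representation of one node that mixes its own features with those of its neighbors. Stacking several such layers lets information propagate across multiple hops.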

Applications of GNNs

Although proposed relatively recently, GNNs have been widely used to boost performance in numerous applications, including product recommendation, fraud detection, social influence prediction, drug discovery, electronic health records modeling, brain networks, and adversarial attack prevention. In addition, there have been attempts to apply GNNs to a variety of computer systems problems, such as program verification and program reasoning.

Here is a list of GNN applications. Let’s go through some applications across domains where GNNs can address various challenges.

GNNs in Fraud Detection

Papers on GNN-based fraud detection

A Talk on GNN-based fraud detection

Resources