Convolutional neural networks (CNNs) have proven very effective at tasks such as facial recognition, video analysis, anomaly detection and semantic parsing. An image can be considered a graph with a very regular structure (pixels are nodes, with edges connecting adjacent pixels). A natural idea is to extend convolutions to general graphs for tasks such as graph, node or edge classification. However, it is not immediately clear how to extend the concept of convolutions from CNNs to graphs.
In the Kipf and Welling paper, Semi-Supervised Classification with Graph Convolutional Networks, the authors propose a method to approximate the spectral graph convolution via truncated Chebyshev polynomials, resulting in an efficient and scalable way to train graph neural networks. In this post, we walk through the graph convolutional network (GCN) propagation model, which we then implement in PyTorch Geometric. Thomas Kipf’s original PyTorch implementation is available here.
GCN Propagation Model
The GCN forward pass is presented as:

$$
H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right)
$$

Where:

- $A$ is the $N \times N$ adjacency matrix of the graph.
- $I_N$ is the $N \times N$ identity matrix.
- $\tilde{A} = A + I_N$ is the adjacency matrix with added self-connections.
- $\tilde{D}$ is the diagonal degree matrix of $\tilde{A}$, with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$.
- $\hat{A} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ is the symmetrically normalized adjacency matrix with self-connections.
- $H^{(l)}$ is the matrix of node activations in the $l$-th layer, with $H^{(0)} = X$, the matrix of input node features.
- $W^{(l)}$ is the trainable weight matrix for layer $l$.
- $\sigma(\cdot)$ is a pointwise non-linearity such as ReLU.
- $N$ is the number of nodes in the graph.
This becomes clear with an example. We create an undirected graph with 5 nodes:
import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 5)]
G.add_edges_from(edges)

fig, ax = plt.subplots(1, 1, figsize=(4, 4), tight_layout=True)
pos = nx.spring_layout(G, seed=508)
nx.draw_networkx_nodes(G, pos, node_size=125, node_color="blue")
nx.draw_networkx_edges(G, pos, edgelist=edges, width=1, alpha=0.5, edge_color="black", style="solid")
nx.draw_networkx_labels(G, pos, font_size=9, font_color="white", font_weight="bold", font_family="sans-serif")
plt.axis("off")
plt.show()
The elements of the adjacency matrix indicate whether pairs of vertices are connected in the graph. For the graph shown above, the adjacency matrix is given by:

$$
A = \begin{bmatrix}
0 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0
\end{bmatrix}
$$

The first row indicates the nodes adjacent to node 1. Since node 1 is connected to all other nodes, all but the first cell are set to 1 (there are no self-connections in $A$, so the diagonal entries are zero).
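As a quick check (this snippet is added here for illustration, it is not part of the original walkthrough), the adjacency matrix can be computed directly from the networkx graph:

import numpy as np
import networkx as nx

G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3), (1, 4), (1, 5), (2, 5)])

# Keep rows/columns in node order 1..5 so the matrix matches the one above.
A = nx.to_numpy_array(G, nodelist=[1, 2, 3, 4, 5])
print(A.astype(int))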
After adding self-connections, the graph becomes:
G = nx.Graph()
edges = [
    (1, 2), (1, 3), (1, 4), (1, 5), (2, 5),
    (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)
]
G.add_edges_from(edges)

fig, ax = plt.subplots(1, 1, figsize=(4, 4), tight_layout=True)
pos = nx.spring_layout(G, seed=508)
nx.draw_networkx_nodes(G, pos, node_size=125, node_color="blue")
nx.draw_networkx_edges(G, pos, edgelist=edges, width=1, alpha=0.5, edge_color="black", style="solid")
nx.draw_networkx_labels(G, pos, font_size=9, font_color="white", font_weight="bold", font_family="sans-serif")
plt.axis("off")
plt.show()
To get to $\tilde{A}$, we add the identity matrix to the adjacency matrix: $\tilde{A} = A + I_N$. The degree matrix $\tilde{D}$ is then the diagonal matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, i.e., each diagonal entry holds the number of connections for the corresponding node (including its self-connection). Pre- and post-multiplying by $\tilde{D}^{-1/2}$ symmetrically normalizes $\tilde{A}$.
Note that during training, $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ is fixed and can be pre-computed; only the weight matrices $W^{(l)}$ are learned.
Each node has an associated feature vector of length $C$, the number of input features per node. Stacking these row-wise gives the feature matrix $X \in \mathbb{R}^{N \times C}$, which serves as $H^{(0)}$ in the propagation model.
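To tie these pieces together, here is a minimal NumPy sketch of a single propagation step on the 5-node example graph. The feature matrix X and weight matrix W below are random, chosen purely for illustration:

import numpy as np
import networkx as nx

G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3), (1, 4), (1, 5), (2, 5)])
A = nx.to_numpy_array(G, nodelist=[1, 2, 3, 4, 5])

# Normalized adjacency with self-connections: A_hat = D_tilde^{-1/2} A_tilde D_tilde^{-1/2}.
A_tilde = A + np.eye(A.shape[0])
d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Illustrative inputs: N = 5 nodes, C = 3 features per node, 4 hidden units.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # H^(0) = X
W = rng.normal(size=(3, 4))   # W^(0)

# One propagation step: H^(1) = ReLU(A_hat @ H^(0) @ W^(0)).
H1 = np.maximum(A_hat @ X @ W, 0)
print(H1.shape)               # (5, 4)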
For a 2-layer GCN, the full propagation model for a node classification task is:

$$
Z = \mathrm{softmax}\!\left(\hat{A}\,\mathrm{ReLU}\!\left(\hat{A} X W^{(0)}\right) W^{(1)}\right)
$$

where $\hat{A} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ is computed once as a pre-processing step, $W^{(0)} \in \mathbb{R}^{C \times H}$ maps the input features to a hidden representation of size $H$, $W^{(1)} \in \mathbb{R}^{H \times F}$ maps the hidden representation to the $F$ output classes, and the softmax is applied row-wise.
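Continuing the sketch above (reusing A_hat, X and rng from the previous snippet), a compact 2-layer forward pass looks like this; the hidden size and number of classes are arbitrary choices for illustration:

def softmax(z):
    """Row-wise softmax."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Arbitrary layer sizes: C = 3 input features -> H = 8 hidden units -> F = 2 classes.
W0 = rng.normal(size=(3, 8))
W1 = rng.normal(size=(8, 2))

# Z = softmax(A_hat ReLU(A_hat X W0) W1): one row of class probabilities per node.
Z = softmax(A_hat @ np.maximum(A_hat @ X @ W0, 0) @ W1)
print(Z.round(3))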
Derivation of Propagation Model
The authors begin with an expression to perform spectral convolutions on graphs:

$$
g_\theta \star x = U g_\theta U^\top x
$$
where $x \in \mathbb{R}^N$ is a signal on the nodes of the graph, $g_\theta = \mathrm{diag}(\theta)$ is a filter parameterized by $\theta \in \mathbb{R}^N$ in the Fourier domain, and $U$ is the matrix of eigenvectors of the normalized graph Laplacian $L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^\top$, with $\Lambda$ the diagonal matrix of its eigenvalues. $U^\top x$ is the graph Fourier transform of $x$. Evaluating this expression is expensive: multiplication by $U$ is $\mathcal{O}(N^2)$, and computing the eigendecomposition of $L$ in the first place may be prohibitive for large graphs.
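To make the spectral form concrete, here is a small NumPy sketch (added for illustration, not from the original post) that builds the normalized Laplacian of the 5-node example graph, computes its eigendecomposition, and filters a random signal in the spectral domain; the filter values are arbitrary:

import numpy as np
import networkx as nx

G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3), (1, 4), (1, 5), (2, 5)])
A = nx.to_numpy_array(G, nodelist=[1, 2, 3, 4, 5])

# Normalized graph Laplacian: L = I - D^{-1/2} A D^{-1/2}.
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
L = np.eye(5) - A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Eigendecomposition L = U diag(lam) U^T (L is symmetric, so use eigh).
lam, U = np.linalg.eigh(L)

rng = np.random.default_rng(0)
x = rng.normal(size=5)        # signal on the nodes
theta = rng.normal(size=5)    # arbitrary spectral filter coefficients

# Spectral convolution: transform to the spectral domain, filter, transform back.
x_filtered = U @ (theta * (U.T @ x))
print(x_filtered.round(3))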
Citing a paper by Hammond et al., they propose an approximation to $g_\theta(\Lambda)$ by a truncated expansion in terms of Chebyshev polynomials $T_k(x)$ up to $K$-th order:

$$
g_{\theta'}(\Lambda) \approx \sum_{k=0}^{K} \theta'_k T_k(\tilde{\Lambda})
$$
where $\tilde{\Lambda} = \frac{2}{\lambda_{\max}} \Lambda - I_N$ is the rescaled eigenvalue matrix, $\lambda_{\max}$ is the largest eigenvalue of $L$, and $\theta' \in \mathbb{R}^{K+1}$ is a vector of Chebyshev coefficients. The Chebyshev polynomials are defined by the recurrence $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$, with $T_0(x) = 1$ and $T_1(x) = x$.
By leveraging the fact that $(U \Lambda U^\top)^k = U \Lambda^k U^\top$, the convolution of a signal $x$ with the filter $g_{\theta'}$ can be written entirely in terms of the rescaled Laplacian, with no explicit eigendecomposition required:

$$
g_{\theta'} \star x \approx \sum_{k=0}^{K} \theta'_k T_k(\tilde{L})\, x, \qquad \tilde{L} = \frac{2}{\lambda_{\max}} L - I_N
$$

This expression is $K$-localized (it depends only on nodes at most $K$ steps away) and its cost is linear in the number of edges, since each term can be evaluated with sparse matrix-vector products.
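A short NumPy sketch of this Chebyshev filtering on the 5-node example graph, with arbitrary coefficients and K = 2, using only matrix-vector products with $\tilde{L}$ (we compute $\lambda_{\max}$ exactly here because the graph is tiny; in practice it would be approximated):

import numpy as np
import networkx as nx

G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3), (1, 4), (1, 5), (2, 5)])
A = nx.to_numpy_array(G, nodelist=[1, 2, 3, 4, 5])

# Normalized Laplacian and its rescaled version L_tilde = (2 / lambda_max) L - I.
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
L = np.eye(5) - A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
lam_max = np.linalg.eigvalsh(L).max()
L_tilde = (2.0 / lam_max) * L - np.eye(5)

rng = np.random.default_rng(0)
x = rng.normal(size=5)
theta = rng.normal(size=3)    # K = 2 -> three Chebyshev coefficients

# Recurrence applied to the signal: T_0(L~)x = x, T_1(L~)x = L~x,
# T_k(L~)x = 2 L~ T_{k-1}(L~)x - T_{k-2}(L~)x.
T_prev, T_curr = x, L_tilde @ x
out = theta[0] * T_prev + theta[1] * T_curr
for k in range(2, len(theta)):
    T_prev, T_curr = T_curr, 2 * (L_tilde @ T_curr) - T_prev
    out = out + theta[k] * T_curr
print(out.round(3))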
From this point, the authors make three simplifications/assumptions which result in the propagation model introduced earlier.
The first two simplifications are to limit the expansion to $K = 1$, so that the filter is a linear function of $\tilde{L}$, and to approximate $\lambda_{\max} \approx 2$, anticipating that the network weights can adapt to this change of scale during training. Under these assumptions the convolution reduces to:

$$
g_{\theta'} \star x \approx \theta'_0 x + \theta'_1\left(L - I_N\right) x = \theta'_0 x - \theta'_1 D^{-1/2} A D^{-1/2} x
$$
The third simplification sets $\theta = \theta'_0 = -\theta'_1$, leaving a single free parameter per filter and helping to limit overfitting:

$$
g_\theta \star x \approx \theta\left(I_N + D^{-1/2} A D^{-1/2}\right) x
$$

Note that $I_N + D^{-1/2} A D^{-1/2}$ has eigenvalues in the range $[0, 2]$, so repeatedly applying this operator in a deep model can become numerically unstable.
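As a quick numerical check (not in the original post), the sketch below verifies on the 5-node graph that, with $K = 1$, $\lambda_{\max}$ set to 2, and $\theta = \theta'_0 = -\theta'_1$, the Chebyshev expansion collapses to $\theta(I_N + D^{-1/2} A D^{-1/2})x$:

import numpy as np
import networkx as nx

G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3), (1, 4), (1, 5), (2, 5)])
A = nx.to_numpy_array(G, nodelist=[1, 2, 3, 4, 5])

d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_norm = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
L = np.eye(5) - A_norm                                    # normalized Laplacian

rng = np.random.default_rng(0)
x = rng.normal(size=5)
theta = 0.7                                               # arbitrary filter parameter

# K = 1 Chebyshev expansion with lambda_max = 2, so L_tilde = L - I:
# theta'_0 x + theta'_1 T_1(L_tilde) x, with theta'_0 = theta and theta'_1 = -theta.
cheb = theta * x + (-theta) * ((L - np.eye(5)) @ x)
simplified = theta * (np.eye(5) + A_norm) @ x

print(np.allclose(cheb, simplified))                      # True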
To address the issue of exploding/vanishing gradients during training, they apply a renormalization trick, essentially swapping $I_N + D^{-1/2} A D^{-1/2}$ for $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, where $\tilde{A} = A + I_N$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. We can compare the eigenvalue ranges of the two operators on the 5-node example graph:
import numpy as np
import networkx as nx

np.set_printoptions(suppress=True, precision=5)

G = nx.Graph()
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 5)]
G.add_edges_from(edges)

# Convert to a dense ndarray so np.sum / np.diag behave as expected.
A = np.asarray(nx.adjacency_matrix(G).todense())
I = np.identity(5)
D = np.diag(np.sum(A, axis=1))

# Check range of eigenvalues without renormalization.
Dp = np.linalg.inv(np.power(D, .5))
evals0, _ = np.linalg.eig(I + Dp @ A @ Dp)

# Check range of eigenvalues after renormalization trick.
Atilde = A + I
Dtilde = np.diag(np.sum(Atilde, axis=1))
Dtildep = np.linalg.inv(np.power(Dtilde, .5))
evals1, _ = np.linalg.eig(Dtildep @ Atilde @ Dtildep)

print(f"\neigenvalues without renormalization: {evals0}")
print(f"eigenvalues with renormalization : {evals1}")
eigenvalues without renormalization: [0.19098 2. 1.30902 0.5 1. ]
eigenvalues with renormalization : [-0.22526 1. 0.59192 0.5 0. ]
By replacing $I_N + D^{-1/2} A D^{-1/2}$ with the renormalized $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ and generalizing from a single signal $x$ to a feature matrix $X \in \mathbb{R}^{N \times C}$ with $C$ input channels and $F$ filters, the graph convolution becomes:

$$
Z = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta
$$
where $\Theta \in \mathbb{R}^{C \times F}$ is a matrix of filter parameters and $Z \in \mathbb{R}^{N \times F}$ is the convolved signal matrix. Stacking these layers, each followed by a pointwise non-linearity, yields the layer-wise propagation model introduced at the beginning of the post.
The final propagation model has nothing to do with the spectral decomposition: no eigenvectors or eigenvalues are actually computed. The approach was motivated by spectral convolutions on graphs, but it is not a spectral method itself.
PyTorch Geometric GCN Implementation
In what follows, we demonstrate how to set up a 2-layer GCN for node classification using PyTorch Geometric and the Cora citation network.
import random

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
import torch
import torch_geometric.transforms as T
from torch_geometric.datasets import Planetoid
from torch_geometric.utils import to_networkx

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

dataset = Planetoid(root='/tmp/Cora', name='Cora')
split = T.RandomNodeSplit(num_val=0.15, num_test=0.10)
graph = split(dataset[0])
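As a quick look at what we just loaded (a check added here for illustration, not in the original post), we can print the basic dimensions of the graph and the sizes of the random split:

# Basic statistics of the Cora graph and the train/val/test split.
print(f"nodes       : {graph.num_nodes}")
print(f"edges       : {graph.num_edges}")
print(f"features    : {graph.num_node_features}")
print(f"classes     : {graph.y.max().item() + 1}")
print(f"train nodes : {graph.train_mask.sum().item()}")
print(f"val nodes   : {graph.val_mask.sum().item()}")
print(f"test nodes  : {graph.test_mask.sum().item()}")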
Distribution of classes in Cora
The Cora dataset is a citation network, where each node represents a paper belonging to one of the following 7 classes. Our goal is to predict the labels assigned to each node in the network:
dclasses = {
    0: "Theory",
    1: "Reinforcement_Learning",
    2: "Genetic_Algorithms",
    3: "Neural_Networks",
    4: "Probabilistic_Methods",
    5: "Case_Based",
    6: "Rule_Learning"
}

df = (
    pd.DataFrame(graph.y.tolist())
    .groupby(0, as_index=False).size()
    .rename({0: "id", "size": "n"}, axis=1)
)
df["desc"] = df["id"].map(dclasses)
df["prop"] = df["n"] / df["n"].sum()
df[["id", "desc", "n", "prop"]].head(7)
| id | desc | n | prop |
|---|---|---|---|
| 0 | Theory | 351 | 0.129616 |
| 1 | Reinforcement_Learning | 217 | 0.080133 |
| 2 | Genetic_Algorithms | 418 | 0.154357 |
| 3 | Neural_Networks | 818 | 0.302068 |
| 4 | Probabilistic_Methods | 426 | 0.157312 |
| 5 | Case_Based | 298 | 0.110044 |
| 6 | Rule_Learning | 180 | 0.066470 |
GCN Model
Next we implement GCN using PyTorch Geometric. The original PyTorch implementation is available here.
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class GCN(nn.Module):
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.conv1 = GCNConv(num_features, 64)
        self.conv2 = GCNConv(64, num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        output = self.conv2(x, edge_index)
        return output


def train_classifier(model, graph, optimizer, criterion, n_epochs=100):
    """
    Training loop.
    """
    tm, vm = graph.train_mask, graph.val_mask

    for epoch in range(n_epochs):
        model.train()
        optimizer.zero_grad()
        output = model(graph)
        loss = criterion(output[tm], graph.y[tm])
        loss.backward()
        optimizer.step()

        # Validate.
        model.eval()
        pred = model(graph).argmax(dim=1)
        correct = (pred[vm] == graph.y[vm]).sum()
        acc = correct / vm.sum()

        if epoch % 10 == 0:
            print(f"Epoch {epoch}: train loss={loss:.3f}, val. acc={acc:.3f}.")

    return model
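As a quick sanity check (not part of the original post), we can instantiate the model with Cora's dimensions and count its trainable parameters before training:

# Instantiate the model and report the number of trainable parameters.
model = GCN(num_features=1433, num_classes=7)
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(model)
print(f"Trainable parameters: {n_params:,}")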
Training our network:
# Capture activations from last hidden layer to re-create plot from paper.
activations = {}

def get_activation(name):
    def hook(gcn, input, output):
        activations[name] = output.detach()
    return hook

# Put graph on GPU if available.
graph = graph.to(device)

gcn = GCN(num_features=1433, num_classes=7).to(device)
gcn.conv2.register_forward_hook(get_activation("conv2"))
optimizer = torch.optim.Adam(gcn.parameters(), lr=0.01, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

gcn = train_classifier(gcn, graph, optimizer, criterion)

# Compute test accuracy.
gcn.eval()
pred = gcn(graph).argmax(dim=1)
correct = (pred[graph.test_mask] == graph.y[graph.test_mask]).sum()
test_acc = correct / graph.test_mask.sum()
print(f"\nTest accuracy: {test_acc:.3f}\n")
Epoch 0: train loss=1.932, val. acc=0.303.
Epoch 10: train loss=0.377, val. acc=0.862.
Epoch 20: train loss=0.181, val. acc=0.867.
Epoch 30: train loss=0.119, val. acc=0.857.
Epoch 40: train loss=0.088, val. acc=0.852.
Epoch 50: train loss=0.069, val. acc=0.845.
Epoch 60: train loss=0.058, val. acc=0.847.
Epoch 70: train loss=0.050, val. acc=0.850.
Epoch 80: train loss=0.044, val. acc=0.855.
Epoch 90: train loss=0.040, val. acc=0.855.

Test accuracy: 0.893
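The forward hook registered above stores the output of the last GCNConv layer in activations["conv2"], but the plot from the paper is not reproduced here. Below is a minimal sketch of how those activations could be projected to 2-D and colored by class; it assumes scikit-learn is installed and reuses graph and activations from the training snippet:

from sklearn.manifold import TSNE

# Project the captured last-layer activations to 2-D with t-SNE.
emb = activations["conv2"].cpu().numpy()
xy = TSNE(n_components=2, random_state=0).fit_transform(emb)

# Color each node by its true class label.
labels = graph.y.cpu().numpy()
fig, ax = plt.subplots(figsize=(6, 6), tight_layout=True)
scatter = ax.scatter(xy[:, 0], xy[:, 1], c=labels, s=8, cmap="tab10", alpha=0.75)
ax.legend(*scatter.legend_elements(), title="class", loc="best", fontsize=8)
ax.axis("off")
plt.show()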