
You're reading from Hands-On Graph Neural Networks Using Python

Product type: Book
Published in: Apr 2023
Publisher: Packt
ISBN-13: 9781804617526
Edition: 1st
Author: Maxime Labonne

Maxime Labonne is currently a senior applied researcher at Airbus. He received an M.Sc. degree in computer science from INSA CVL and a Ph.D. in machine learning and cybersecurity from the Polytechnic Institute of Paris. During his career, he worked on computer networks and the problem of representation learning, which led him to explore graph neural networks. He has applied this knowledge to various industrial projects, including intrusion detection, satellite communications, quantum networks, and AI-powered aircraft. He is now an active graph neural network evangelist on Twitter and his personal blog.

Creating Node Representations with DeepWalk

DeepWalk is one of the first major successful applications of machine learning (ML) techniques to graph data. It introduces important concepts such as embeddings that are at the core of GNNs. Unlike traditional neural networks, the goal of this architecture is to produce representations that are then fed to other models, which perform downstream tasks (for example, node classification).

In this chapter, we will learn about the DeepWalk architecture and its two major components: Word2Vec and random walks. We’ll explain how the Word2Vec architecture works, with a particular focus on the skip-gram model. We will implement this model with the popular gensim library on a natural language processing (NLP) example to understand how it is supposed to be used.

Then, we will focus on the DeepWalk algorithm and see how performance can be improved using hierarchical softmax (H-Softmax). This powerful optimization of the softmax function...

Technical requirements

All the code examples from this chapter can be found on GitHub at https://github.com/PacktPublishing/Hands-On-Graph-Neural-Networks-Using-Python/tree/main/Chapter03. Installation steps required to run the code on your local machine can be found in the Preface section of this book.

Introducing Word2Vec

The first step to comprehending the DeepWalk algorithm is to understand its major component: Word2Vec.

Word2Vec has been one of the most influential deep learning techniques in NLP. Published in 2013 by Tomas Mikolov et al. (Google) in two papers, it proposed a new technique to translate words into vectors (also known as embeddings) using large text datasets. These representations can then be used in downstream tasks, such as sentiment classification. It is also one of the rare examples of an ML architecture that is both patented and widely popular.

Here are a few examples of how Word2Vec can transform words into vectors:

We can see in this example that, in terms of the Euclidean distance, the word vectors for king and queen are closer than the ones for king and woman (4.37 versus 8.47). In general, other metrics, such as the popular cosine similarity, are used to measure...

DeepWalk and random walks

Proposed in 2014 by Perozzi et al., DeepWalk quickly became extremely popular among graph researchers. Inspired by recent advances in NLP, it consistently outperformed other methods on several datasets. While more performant architectures have been proposed since then, DeepWalk is a simple and reliable baseline that can be quickly implemented to solve a lot of problems.

The goal of DeepWalk is to produce high-quality feature representations of nodes in an unsupervised way. This architecture is heavily inspired by Word2Vec in NLP. However, instead of words, our dataset is composed of nodes. This is why we use random walks to generate meaningful sequences of nodes that act like sentences. The following diagram illustrates the connection between sentences and graphs:

Figure 3.4 – Sentences can be represented as graphs

Random walks are sequences of nodes produced by randomly choosing a neighboring node at every step. Thus...
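A uniform random walk can be sketched in a few lines of plain Python. The adjacency list below is a made-up toy graph for illustration:

```python
import random

def random_walk(graph, start, length):
    """Walk `length` steps by hopping to a uniformly random neighbor."""
    walk = [start]
    for _ in range(length):
        neighbors = graph[walk[-1]]
        walk.append(random.choice(neighbors))
    return walk

# Tiny undirected graph stored as an adjacency list
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

random.seed(0)
print(random_walk(graph, start=0, length=5))  # 6 node IDs, starting at 0
```

DeepWalk generates many such walks per node and treats each one as a "sentence" to be fed to Word2Vec.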

Implementing DeepWalk

Now that we have a good understanding of every component in this architecture, let’s use it to solve an ML problem.

The dataset we will use is Zachary’s Karate Club. It simply represents the relationships within a karate club studied by Wayne W. Zachary in the 1970s. It is a kind of social network where every node is a member, and members who interact outside the club are connected.

In this example, the club is divided into two groups: we would like to assign the right group to every member (node classification) just by looking at their connections:

  1. Let’s import networkx and load the dataset using nx.karate_club_graph():
    import networkx as nx
    G = nx.karate_club_graph()
  2. Next, we need to convert string class labels into numerical values (Mr. Hi = 0, Officer = 1):
    labels = []
    for node in G.nodes:
        label = G.nodes[node]['club']
        labels.append(1 if label == 'Officer' else 0)
  3. Let’s plot this graph...

Summary

In this chapter, we learned about the DeepWalk architecture and its major components. Then, we transformed graph data into sequences of nodes using random walks so that we could apply the powerful Word2Vec algorithm. The resulting embeddings can be used to find similarities between nodes or as input to other algorithms. In particular, we solved a node classification problem by training a supervised classifier on these embeddings.

In Chapter 4, Improving Embeddings with Biased Random Walks in Node2Vec, we will introduce a second algorithm based on Word2Vec. The difference with DeepWalk is that the random walks can be biased towards more or less exploration, which directly impacts the embeddings that are produced. We will implement this algorithm on a new example and compare its representations with those obtained using DeepWalk.

Further reading
