You're reading from Hands-On Graph Analytics with Neo4j

Product type: Book
Published in: Aug 2020
Publisher: Packt
ISBN-13: 9781839212611
Edition: 1st Edition
Author: Estelle Scifo

Estelle Scifo possesses over 7 years' experience as a data scientist, having received her PhD from the Laboratoire de l'Accélérateur Linéaire in Orsay (affiliated with CERN in Geneva). As a Neo4j certified professional, she uses graph databases on a daily basis and takes full advantage of their features to build efficient machine learning models from this data. She is also a data science mentor, guiding newcomers into the field. Her domain expertise and deep insight into beginners' needs make her an excellent teacher.

Graph Embedding - from Graphs to Matrices

In this chapter, we will continue to explore the topic of graph analytics and address the last piece of the puzzle: learning features from graphs via embedding. Embedding techniques became popular thanks to word embedding in Natural Language Processing (NLP). We will first address why embedding is important and learn about the different types of analysis covered by the term graph embedding. Following that, we will begin with a family of embedding algorithms that work by reducing the size of the graph adjacency matrix.

Later on, we will continue our journey by discovering how neural networks can help with embedding. Starting with the example of word embedding, we will learn about the skip-gram model and draw parallels with graphs with the DeepWalk algorithm. Finally, in the last section, we will...

Technical requirements

Why do we need embedding?

Machine learning models are based on matrix calculations: our observations are organised into rows in a table, while the features are columns or vectors. Representing complex objects such as text or graphs as matrices of a reasonable size can be a challenge. This is the issue that embedding techniques are designed to address.

Why is embedding needed?

In Chapter 8, Using Graph-Based Features in Machine Learning, we drew the following schema:

The Feature engineering step involves extracting features from our dataset. When this dataset consists of observations that already have numerical or categorical characteristics, it is easy to imagine how to build features from these characteristics.

However, some datasets do not have that tabular structure. In such cases, we need to create that structure before feeding the dataset into a machine learning model.

Take a text, such as a book, for example, that contains thousands of words. Now imagine that your task is to predict...

Adjacency-based embedding

Graphs can be represented as large matrices pretty easily. The first technique we are going to study that can reduce the size of this matrix is called matrix factorization.

The adjacency matrix and graph Laplacian

Similar to text analysis, graphs can be represented by a very large matrix encoding the relationships between nodes. We have already used such a matrix in the preceding chapters – the adjacency matrix, named M in the following diagram:

Other algorithms rely on the graph Laplacian matrix L = D - M, where D is the diagonal matrix containing the degree of each node, but the principles remain unchanged.
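Concretely, for a small toy graph, both matrices can be built in a few lines. The following is an illustrative pure-Python sketch (the edge list is made up for the example, not taken from the book):

```python
# Toy undirected graph: a triangle 0-1-2 with a pendant node 3 attached to node 2
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n = 4

# Adjacency matrix M: M[i][j] = 1 if nodes i and j are connected
M = [[0] * n for _ in range(n)]
for i, j in edges:
    M[i][j] = 1
    M[j][i] = 1

# Degree matrix D: diagonal matrix holding each node's degree
D = [[sum(M[i]) if i == j else 0 for j in range(n)] for i in range(n)]

# Graph Laplacian L = D - M
L = [[D[i][j] - M[i][j] for j in range(n)] for i in range(n)]
```

Note that every row of L sums to zero, a defining property of the Laplacian: the diagonal entry (the degree) exactly cancels the -1 entries for the node's neighbors.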

Eigenvectors embedding

One simple way of reducing the size of the matrix is to decompose it into eigenvectors and use only a small number of these vectors as the embedding.
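To make this idea concrete, here is an illustrative pure-Python sketch (not the book's code) that approximates the dominant eigenvector of an adjacency matrix with power iteration; in practice you would use an optimized eigensolver such as the ones in NumPy or SciPy, and keep the top k eigenvectors to obtain a k-dimensional embedding:

```python
def power_iteration(M, steps=100):
    """Approximate the dominant eigenvector of a square matrix M
    by repeated multiplication and normalization."""
    n = len(M)
    v = [1.0 / n] * n  # arbitrary positive starting vector
    for _ in range(steps):
        # Multiply: w = M v
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Adjacency matrix of a toy graph: triangle 0-1-2 plus a pendant node 3 on node 2
M = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
v = power_iteration(M)
# Each node's coordinate in v is a one-dimensional embedding of that node;
# structurally similar nodes (here, nodes 0 and 1) get identical coordinates.
```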

An example of such graph representation can be seen when using graph positioning. Indeed, drawing a graph on a two-dimensional plane is a type of embedding. One of the positioning...

Extracting embeddings from artificial neural networks

Neural networks are the new gold standard of machine learning models. Thanks to them, impressive progress has been made in fields ranging from image analysis to speech recognition, and computers are now able to perform increasingly complex tasks. One surprising application of neural networks is their ability to represent complex objects, such as images, text, or audio recordings, with fewer dimensions, while still preserving some aspects of the original data (shapes in an image, frequencies in an audio recording, and so on). In this section, following a quick general review of neural networks, we will focus on an architecture called skip-gram, which was first used in the context of word embedding but can be extended to graphs as well.
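To make the parallel with graphs concrete, the following illustrative pure-Python sketch (not the book's code; the toy graph is made up) shows the core idea behind DeepWalk: random walks over the graph play the role of sentences, and skip-gram (target, context) pairs are extracted from them exactly as they would be from text:

```python
import random

def random_walk(adj, start, length, rng):
    """Generate one random walk, i.e. one 'sentence' of node ids."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk

def skip_gram_pairs(walk, window):
    """(target, context) pairs, as fed to a skip-gram model."""
    pairs = []
    for i, target in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((target, walk[j]))
    return pairs

# Adjacency list of a toy graph (illustrative)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rng = random.Random(42)
walk = random_walk(adj, start=0, length=5, rng=rng)
pairs = skip_gram_pairs(walk, window=2)
```

The skip-gram model is then trained to predict the context node from the target node; the learned hidden-layer weights become the node embeddings.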

Artificial neural networks in a nutshell

Artificial neural networks were inspired by the human brain, where millions of neurons are connected to each other through synapses. The human brain is clearly adept at learning...

Graph neural networks

GNNs were introduced in 2005 and have received a lot of attention during the last 5 years or so. The key concept behind them is to try to generalize the ideas behind CNNs and RNNs to apply them to any type of dataset, including graphs. This section is only a short introduction to GNNs, since we would require an entire book to fully explore the topic. As usual, more references are given in the Further reading section if you would like to gain a deeper understanding of this topic.

Extending the principles of CNNs and RNNs to build GNNs

CNNs and RNNs both involve aggregating information from a neighborhood in a specific context. For RNNs, the context is a sequence of inputs (words, for instance), and a sequence is nothing more than a special type of graph. The same applies to CNNs, which are used to analyze images, or pixel grids: these are also a special type of graph, where each pixel is connected to its adjacent pixels. It is therefore logical to try and use neural...
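The neighborhood-aggregation idea at the heart of GNNs can be sketched in a highly simplified form. The following illustrative pure-Python example (not the book's code) performs one message-passing step where each node's new feature is the average of its own feature and its neighbors' features; real GNN layers additionally apply learned weight matrices and non-linearities:

```python
def aggregate_step(adj, features):
    """One simplified message-passing step: each node's new feature is
    the average of its own feature and its neighbors' features.
    (Real GNN layers add learned weights and non-linearities.)"""
    new_features = {}
    for node, neighbors in adj.items():
        values = [features[node]] + [features[v] for v in neighbors]
        new_features[node] = sum(values) / len(values)
    return new_features

# Toy path graph 0-1-2 with scalar node features (illustrative)
adj = {0: [1], 1: [0, 2], 2: [1]}
features = {0: 0.0, 1: 1.0, 2: 2.0}
features = aggregate_step(adj, features)
```

Stacking several such steps lets information flow from increasingly distant nodes, which is how GNNs capture multi-hop graph structure.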

Going further with graph algorithms

Graphs and graph algorithms are hot research topics, and new papers are published every week proposing new approaches to community detection, dynamic graph evolution, anomaly detection in networks, and so on. In this section, we will detail several ways to keep learning about graph algorithms and learn about the latest progress that makes them even more powerful.

State-of-the-art graph algorithms

Published papers about graphs can be found in dedicated journals such as the Journal of Graph Algorithms and Applications (http://jgaa.info). Papers with code also do amazing work collecting papers where the code is publicly available. The graph section (https://paperswithcode.com/area/graphs) provides a nice overview of the top current research topics regarding graphs.

However, if you can't afford to read multiple papers a week, you can still extend your knowledge about graphs and stay up to date with the latest advances by regularly checking packages...

Summary

This chapter provided an overview of graph embedding algorithms. Starting with adjacency-based methods using similarity metrics, we moved on to neural network-based approaches. After gaining an understanding of the skip-gram model, using word embedding as an example, we drew a parallel with graphs, using DeepWalk to generate the equivalent of sentences. We also studied a variant of DeepWalk called node2vec, where the traversal is configured by two parameters to emphasize local or global graph structures. The following table provides a short summary of the assumption about the graph structure made by each of the algorithms studied:

Algorithm          Hypothesis
Adjacency matrix   The higher the weight of the edge between nodes i and j, the more similar nodes i and j are.
LLE                A node's embedding is a linear combination of its neighbors' embeddings.
HOPE               Similarity between nodes in the graph can be measured by a metric such as the Adamic-Adar score.
DeepWalk           The similarity between two nodes is given...
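The Adamic-Adar score mentioned for HOPE is straightforward to compute from an adjacency list: it sums, over all common neighbors of two nodes, the inverse logarithm of each common neighbor's degree, so that rare shared neighbors count more than ubiquitous ones. An illustrative pure-Python sketch (the toy graph is made up, not from the book):

```python
import math

def adamic_adar(adj, i, j):
    """Adamic-Adar similarity: sum over common neighbors u of 1 / log(deg(u)).
    Degree-1 neighbors are skipped to avoid dividing by log(1) = 0."""
    common = set(adj[i]) & set(adj[j])
    return sum(1.0 / math.log(len(adj[u])) for u in common if len(adj[u]) > 1)

# Toy graph (illustrative): nodes 0 and 1 share neighbors 2 (degree 2) and 3 (degree 3)
adj = {0: [2, 3], 1: [2, 3], 2: [0, 1], 3: [0, 1, 4], 4: [3]}
score = adamic_adar(adj, 0, 1)  # = 1/log(2) + 1/log(3)
```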

Questions

  • Use the karateclub package to generate a node embedding with the DeepWalk algorithm.
  • Run a clustering algorithm such as K-means on the resulting embedding. What do you think of the results?

Further reading

