You're reading from Hands-On Graph Analytics with Neo4j

Product type: Book
Published in: Aug 2020
Publisher: Packt
ISBN-13: 9781839212611
Edition: 1st Edition
Author: Estelle Scifo

Estelle Scifo possesses over 7 years' experience as a data scientist, having received her PhD from the Laboratoire de l'Accélérateur Linéaire in Orsay (affiliated with CERN in Geneva). As a Neo4j certified professional, she uses graph databases on a daily basis and takes full advantage of their features to build efficient machine learning models from this data. She is also a data science mentor, guiding newcomers into the field. Her domain expertise and deep insight into beginners' needs make her an excellent teacher.

Graph Embedding - from Graphs to Matrices

In this chapter, we will continue to explore the topic of graph analytics and address the last piece of the puzzle: learning features from graphs via embedding. Embedding techniques became popular thanks to word embedding in Natural Language Processing (NLP). We will first address why embedding is important and learn about the different types of analysis covered by the term graph embedding. Following that, we will begin with a family of embedding algorithms that work by reducing the size of the graph adjacency matrix.

Later on, we will continue our journey by discovering how neural networks can help with embedding. Starting with the example of word embedding, we will learn about the skip-gram model and draw parallels with graphs with the DeepWalk algorithm. Finally, in the last section, we will...

Technical requirements

Why do we need embedding?

Machine learning models are based on matrix calculations: our observations are organised into rows in a table, while the features are columns or vectors. Representing complex objects such as text or graphs as matrices of a reasonable size can be a challenge. This is the issue that embedding techniques are designed to address.

Why is embedding needed?

In Chapter 8, Using Graph-Based Features in Machine Learning, we drew the following schema:

The Feature engineering step involves extracting features from our dataset. When this dataset consists of observations that already have numerical or categorical characteristics, it is easy to imagine how to build features from these characteristics.

However, some datasets do not have that tabular structure. In such cases, we need to create that structure before feeding the dataset into a machine learning model.

Take a text, such as a book, for example, that contains thousands of words. Now imagine that your task is to predict...

Adjacency-based embedding

Graphs can be represented as large matrices pretty easily. The first technique we are going to study that can reduce the size of this matrix is called matrix factorization.

The adjacency matrix and graph Laplacian

Similar to text analysis, graphs can be represented by a very large matrix encoding the relationships between nodes. We have already used such a matrix in the preceding chapters – the adjacency matrix, named M in the following diagram:

Other algorithms rely on the graph Laplacian matrix L = D - M, where D is the diagonal matrix containing the degree of each node, but the principles remain unchanged.
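Concretely, for a small toy graph, both matrices can be built in a few lines. The following is an illustrative pure-Python sketch (the edge list is made up for the example, not taken from the book):

```python
# Toy undirected graph: a triangle 0-1-2 with a pendant node 3 attached to node 2
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n = 4

# Adjacency matrix M: M[i][j] = 1 if nodes i and j are connected
M = [[0] * n for _ in range(n)]
for i, j in edges:
    M[i][j] = 1
    M[j][i] = 1

# Degree matrix D: diagonal matrix holding each node's degree
D = [[sum(M[i]) if i == j else 0 for j in range(n)] for i in range(n)]

# Graph Laplacian L = D - M
L = [[D[i][j] - M[i][j] for j in range(n)] for i in range(n)]
```

Note that every row of L sums to zero, a defining property of the Laplacian: the diagonal entry (the degree) exactly cancels the -1 entries for the node's neighbors.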

Eigenvectors embedding

One simple way of reducing the size of the matrix is to decompose it into eigenvectors and use only a small number of these vectors as the embedding.
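To make this idea concrete, here is an illustrative pure-Python sketch (not the book's code) that approximates the dominant eigenvector of an adjacency matrix with power iteration; in practice you would use an optimized eigensolver such as the ones in NumPy or SciPy, and keep the top k eigenvectors to obtain a k-dimensional embedding:

```python
def power_iteration(M, steps=100):
    """Approximate the dominant eigenvector of a square matrix M
    by repeated multiplication and normalization."""
    n = len(M)
    v = [1.0 / n] * n  # arbitrary positive starting vector
    for _ in range(steps):
        # Multiply: w = M v
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Adjacency matrix of a toy graph: triangle 0-1-2 plus a pendant node 3 on node 2
M = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
v = power_iteration(M)
# Each node's coordinate in v is a one-dimensional embedding of that node;
# structurally similar nodes (here, nodes 0 and 1) get identical coordinates.
```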

An example of such graph representation can be seen when using graph positioning. Indeed, drawing a graph on a two-dimensional plane is a type of embedding. One of the positioning...

Extracting embeddings from artificial neural networks

Neural networks are the new gold standard of machine learning models. Thanks to them, impressive progress has been made in fields ranging from image analysis to speech recognition, and computers are now able to perform increasingly complex tasks. One surprising application of neural networks is their ability to represent complex objects, such as images, text, or audio recordings, with fewer dimensions, while still preserving some aspects of the original data (shapes in an image, frequencies in an audio recording, and so on). In this section, following a quick general review of neural networks, we will focus on an architecture called skip-gram, which was first used in the context of word embedding but can be extended to graphs as well.
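To make the parallel with graphs concrete, the following illustrative pure-Python sketch (not the book's code; the toy graph is made up) shows the core idea behind DeepWalk: random walks over the graph play the role of sentences, and skip-gram (target, context) pairs are extracted from them exactly as they would be from text:

```python
import random

def random_walk(adj, start, length, rng):
    """Generate one random walk, i.e. one 'sentence' of node ids."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk

def skip_gram_pairs(walk, window):
    """(target, context) pairs, as fed to a skip-gram model."""
    pairs = []
    for i, target in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((target, walk[j]))
    return pairs

# Adjacency list of a toy graph (illustrative)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rng = random.Random(42)
walk = random_walk(adj, start=0, length=5, rng=rng)
pairs = skip_gram_pairs(walk, window=2)
```

The skip-gram model is then trained to predict the context node from the target node; the learned hidden-layer weights become the node embeddings.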

Artificial neural networks in a nutshell

Artificial neural networks were inspired by the human brain, where millions of neurons are connected to each other through synapses. The human brain is clearly adept at learning...

Graph neural networks

GNNs were introduced in 2005 and have received a lot of attention during the last 5 years or so. The key concept behind them is to try to generalize the ideas behind CNNs and RNNs to apply them to any type of dataset, including graphs. This section is only a short introduction to GNNs, since we would require an entire book to fully explore the topic. As usual, more references are given in the Further reading section if you would like to gain a deeper understanding of this topic.

Extending the principles of CNNs and RNNs to build GNNs

CNNs and RNNs both involve aggregating information from a neighborhood in a specific context. For RNNs, the context is a sequence of inputs (words, for instance), and a sequence is nothing more than a special type of graph. The same applies to CNNs, which are used to analyze images, or pixel grids: these are also a special type of graph, where each pixel is connected to its adjacent pixels. It is therefore logical to try and use neural...
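The neighborhood-aggregation idea at the heart of GNNs can be sketched in a highly simplified form. The following illustrative pure-Python example (not the book's code) performs one message-passing step where each node's new feature is the average of its own feature and its neighbors' features; real GNN layers additionally apply learned weight matrices and non-linearities:

```python
def aggregate_step(adj, features):
    """One simplified message-passing step: each node's new feature is
    the average of its own feature and its neighbors' features.
    (Real GNN layers add learned weights and non-linearities.)"""
    new_features = {}
    for node, neighbors in adj.items():
        values = [features[node]] + [features[v] for v in neighbors]
        new_features[node] = sum(values) / len(values)
    return new_features

# Toy path graph 0-1-2 with scalar node features (illustrative)
adj = {0: [1], 1: [0, 2], 2: [1]}
features = {0: 0.0, 1: 1.0, 2: 2.0}
features = aggregate_step(adj, features)
```

Stacking several such steps lets information flow from increasingly distant nodes, which is how GNNs capture multi-hop graph structure.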

Going further with graph algorithms

Graphs and graph algorithms are hot research topics, and new papers are published every week proposing new approaches to community detection, dynamic graph evolution, anomaly detection in networks, and so on. In this section, we will detail several ways to keep learning about graph algorithms and learn about the latest progress that makes them even more powerful.

State-of-the-art graph algorithms

Published papers about graphs can be found in dedicated journals such as the Journal of Graph Algorithms and Applications (http://jgaa.info). Papers with code also do amazing work collecting papers where the code is publicly available. The graph section (https://paperswithcode.com/area/graphs) provides a nice overview of the top current research topics regarding graphs.

However, if you can't afford to read multiple papers a week, you can still extend your knowledge about graphs and stay up to date with the latest advances by regularly checking packages...

Summary

This chapter provided an overview of graph embedding algorithms. Starting with adjacency-based methods using similarity metrics, we moved on to neural network-based approaches. After gaining an understanding of the skip-gram model, using word embedding as an example, we drew a parallel with graphs, using DeepWalk to generate the equivalent of sentences. We also studied a variant of DeepWalk called node2vec, where the traversal is configured by two parameters to emphasize local or global graph structures. The following table provides a short summary of the assumption about the graph structure made by each of the algorithms studied:

Algorithm          Hypothesis
Adjacency matrix   The higher the weight of the edge between nodes i and j, the more similar nodes i and j are.
LLE                A node's embedding is a linear combination of its neighbors' embeddings.
HOPE               Similarity between nodes in the graph can be measured by a metric such as the Adamic-Adar score.
DeepWalk           The similarity between two nodes is given...
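The Adamic-Adar score mentioned for HOPE is straightforward to compute from an adjacency list: it sums, over all common neighbors of two nodes, the inverse logarithm of each common neighbor's degree, so that rare shared neighbors count more than ubiquitous ones. An illustrative pure-Python sketch (the toy graph is made up, not from the book):

```python
import math

def adamic_adar(adj, i, j):
    """Adamic-Adar similarity: sum over common neighbors u of 1 / log(deg(u)).
    Degree-1 neighbors are skipped to avoid dividing by log(1) = 0."""
    common = set(adj[i]) & set(adj[j])
    return sum(1.0 / math.log(len(adj[u])) for u in common if len(adj[u]) > 1)

# Toy graph (illustrative): nodes 0 and 1 share neighbors 2 (degree 2) and 3 (degree 3)
adj = {0: [2, 3], 1: [2, 3], 2: [0, 1], 3: [0, 1, 4], 4: [3]}
score = adamic_adar(adj, 0, 1)  # = 1/log(2) + 1/log(3)
```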

Questions

  • Use the karateclub package to generate a node embedding with the DeepWalk algorithm.
  • Run a clustering algorithm such as K-means on the resulting embedding. What do you think of the results?

Further reading

