Graph Data Science with Neo4j

Author: Estelle Scifo
Publisher: Packt
Published: Jan 2023
Edition: 1st Edition
ISBN-13: 9781804612743
Reading level: Intermediate
Language: Python
Tools: Neo4j

Predicting Future Edges

Link prediction (LP) is a key topic in Graph Data Science (GDS) because it is a problem specific to graphs. Classification can be applied to many kinds of datasets, not only graphs, but LP can only be performed when there are links, meaning when the data is a graph. Its applications are nonetheless wide-ranging: from understanding the dynamics of social networks to product recommendations to criminal network analysis.

This chapter gives a short introduction to the LP problem. We will define what observations are and how to build the initial dataset. We will also discuss the metrics that can be used to infer the presence of a hidden or future link, and compute them with the GDS library. Finally, we will use a GDS pipeline to build a simple LP model, fit it on data stored in Neo4j, and make predictions.

In this chapter, we’re going to cover the following main topics:

Introducing the LP problem
LP features...

Technical requirements

In order to be able to reproduce the examples given in this chapter, you’ll need the following tools:

Neo4j 5.x installed on your computer (see the installation instructions in Chapter 1, Introducing and Installing Neo4j)
- The GDS plugin (version >= 2.2)
A Python environment with Jupyter to run notebooks
Any code listed in the book is available in the associated GitHub repository, https://github.com/PacktPublishing/Graph-Data-Science-with-Neo4j, in the corresponding chapter folder.

Code samples

Unless otherwise indicated, all code snippets in this chapter and the following ones use the GDS Python client. Library import and client initialization are omitted in this chapter for brevity, but a detailed explanation can be found in Chapter 6, Building a Machine Learning Model with Graph Features, in the Introducing the GDS Python client section. Also note that the code in the code bundle provided with the book is fully runnable and...

Introducing the LP problem

Let’s pause for a minute and understand what exactly LP is and how we can formulate this kind of problem using machine learning (ML) vocabulary.
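In ML vocabulary, one common way to frame LP is as binary classification: each observation is a pair of nodes, and the target says whether a link exists between them. The following minimal sketch in plain Python (the toy names and the `build_observations` helper are hypothetical, not part of the GDS API) enumerates every unordered pair of a small undirected graph and labels it.

```python
from itertools import combinations


def build_observations(nodes, edges):
    """Label every unordered node pair: 1 if linked, 0 otherwise (hypothetical helper)."""
    # Store edges as frozensets so (a, b) and (b, a) count as the same undirected link
    linked = {frozenset(e) for e in edges}
    return [
        ((a, b), int(frozenset((a, b)) in linked))
        for a, b in combinations(sorted(nodes), 2)
    ]


nodes = ["Alice", "Bob", "Carol", "Dave"]
edges = [("Alice", "Bob"), ("Bob", "Carol")]
observations = build_observations(nodes, edges)
positives = sum(label for _, label in observations)
print(len(observations), positives)  # 6 2
```

With n nodes there are n(n-1)/2 candidate pairs, so positive observations quickly become a tiny minority; this is why, on real graphs, negative pairs are usually sampled rather than exhaustively enumerated.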

LP examples

In order to understand what LP is, let’s see some real-life scenarios where these problems can be and are used:

Social networks: In a social network of people who have certain relationships with each other, we can try to predict who will meet or collaborate on a project next. Relevant relationship types include the following, among many others:
- Social media (know, follow)
- Communication network (phone call)
- Research paper authors: co-authorship of a research paper (research collaboration)
Criminal networks: By nature, a criminal network is not fully known to the people analyzing it (the police authorities). LP helps identify unknown links between people and better predict criminal behavior.
Entity resolution: Sometimes...

LP features

Here, we'll describe the characteristics that can be attached to a pair of nodes and used as predictors for an LP model. We'll start with topological features, which are built by analyzing both nodes' neighborhoods. Then, we'll explore how to take each node's own features and combine them into a feature vector for the pair.
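Node-level vectors (properties or embeddings) have to be merged into a single vector per pair. GDS link prediction pipelines, for instance, offer Hadamard, cosine, and L2 combiners. The sketch below is a plain-Python illustration of those three operations under that assumption, not GDS code; the function names and toy vectors are hypothetical.

```python
import math


def hadamard(a, b):
    """Element-wise product: one value per feature dimension."""
    return [x * y for x, y in zip(a, b)]


def l2(a, b):
    """Element-wise squared difference: one value per feature dimension."""
    return [(x - y) ** 2 for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity: a single scalar describing the pair (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


node_a = [1.0, 2.0, 0.0]
node_b = [3.0, 1.0, 4.0]
print(hadamard(node_a, node_b))          # [3.0, 2.0, 0.0]
print(l2(node_a, node_b))                # [4.0, 1.0, 16.0]
print(round(cosine(node_a, node_b), 3))  # 0.439
```

All three combiners are symmetric in their arguments, so the pair (A, B) gets the same features as (B, A), which matters when links are treated as undirected.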

Topological features

Topological features rely on nodes’ neighborhoods and graph topology to infer new links. We can, for instance, use the following:

Common neighbors: Given two nodes, A and B, count the number of common neighbors between A and B. This metric assumes that the more common neighbors A and B have, the more likely they are to be connected.
Adamic-Adar: A variation of the common neighbors approach, the Adamic-Adar metric incorporates the fact that nodes with fewer connections give more information than nodes with many links. In a web page linking hundreds of other pages, the relevance of each...
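Both metrics only need each node's set of neighbors. The following is a minimal sketch on a hypothetical toy graph stored as an adjacency dictionary, using the usual Adamic-Adar definition: the sum of 1 / log(degree(u)) over every common neighbor u of A and B.

```python
import math

# Toy undirected graph as an adjacency dict (hypothetical data)
graph = {
    "A": {"C", "D"},
    "B": {"C", "D", "E"},
    "C": {"A", "B"},
    "D": {"A", "B", "E", "F"},
    "E": {"B", "D"},
    "F": {"D"},
}


def common_neighbors(g, a, b):
    """Number of nodes adjacent to both a and b."""
    return len(g[a] & g[b])


def adamic_adar(g, a, b):
    """Sum of 1 / log(degree) over the common neighbors of a and b.

    In a simple graph with a != b, every common neighbor has degree >= 2,
    so log(degree) > 0 and the division is safe.
    """
    return sum(1.0 / math.log(len(g[u])) for u in g[a] & g[b])


print(common_neighbors(graph, "A", "B"))       # 2
print(round(adamic_adar(graph, "A", "B"), 3))  # 2.164
```

A and B share C and D, but C (two links) contributes about 1.443 while D (four links) contributes only about 0.721: the less-connected neighbor carries more weight, as the Adamic-Adar intuition above describes.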

Building an LP pipeline with the GDS

Our task will be to predict future collaborations between actors and directors, using the homogeneous graph made of Person nodes and KNOWS relationships. We will only use the persons belonging to the main component found by the connected components algorithm; these nodes are identified by the MainComponent label.
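To make the "main component" idea concrete, here is a plain-Python sketch using union-find that returns the nodes of the largest connected component, ignoring edge direction (as a weakly connected components algorithm such as GDS's WCC does). It is an illustration only, not the GDS implementation; the node IDs and function names are hypothetical.

```python
from collections import Counter


def find(parent, x):
    """Return the root of x's set, shortening the path along the way (path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def largest_component(nodes, edges):
    """Return the node set of the largest connected component (edge direction ignored)."""
    parent = {n: n for n in nodes}
    for a, b in edges:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two sets
    roots = {n: find(parent, n) for n in nodes}
    main_root, _ = Counter(roots.values()).most_common(1)[0]
    return {n for n, r in roots.items() if r == main_root}


nodes = ["p1", "p2", "p3", "p4", "p5", "p6"]
edges = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
print(sorted(largest_component(nodes, edges)))  # ['p1', 'p2', 'p3']
```

One reason such a restriction can help: two nodes in different components share no path at all, so neighborhood-based features such as common neighbors are always zero for them.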

Creating and configuring the pipeline

The process of creating, training, and making predictions with a GDS pipeline is very similar to the node classification case. We will detail the steps in the following subsections.
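One step the pipeline automates is preparing the data: existing relationships are split between training and test sets, and negative examples (unconnected pairs) are sampled for each split. The sketch below is a deliberately simplified plain-Python illustration of that idea, not the GDS algorithm (which also handles validation folds); `split_with_negatives`, `test_fraction`, and `negative_ratio` are hypothetical names, loosely mirroring the kind of split settings GDS exposes (such as a negative sampling ratio; check the documentation for your version).

```python
import random
from itertools import combinations


def split_with_negatives(nodes, edges, test_fraction=0.25, negative_ratio=1.0, seed=42):
    """Hold out a fraction of edges for testing and sample negative pairs for both splits."""
    rng = random.Random(seed)
    # Canonical (sorted) tuples so each undirected edge appears once
    positives = sorted({tuple(sorted(e)) for e in edges})
    rng.shuffle(positives)
    n_test = int(len(positives) * test_fraction)
    test_pos, train_pos = positives[:n_test], positives[n_test:]

    # Negatives come only from pairs unlinked in the WHOLE graph,
    # so a held-out test link is never mislabeled as a training negative
    linked = set(positives)
    non_links = [p for p in combinations(sorted(nodes), 2) if p not in linked]
    rng.shuffle(non_links)

    n_test_neg = int(len(test_pos) * negative_ratio)
    n_train_neg = int(len(train_pos) * negative_ratio)
    test_neg = non_links[:n_test_neg]
    train_neg = non_links[n_test_neg:n_test_neg + n_train_neg]

    train = [(p, 1) for p in train_pos] + [(p, 0) for p in train_neg]
    test = [(p, 1) for p in test_pos] + [(p, 0) for p in test_neg]
    return train, test


nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
train, test = split_with_negatives(nodes, edges)
print(len(train), len(test))  # 6 2
```

A ratio of 1.0 produces balanced classes; raising it adds more negatives per positive, which brings the training distribution closer to the heavily imbalanced reality of a sparse graph.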

Building the projected graph

First, we are going to create a projected graph, as follows:

projected_graph_object = create_projected_graph(
    gds,
    graph_name="graph-lp-collab",
    node_spec={
        "Person": {
            "label": "...

Summary

In this chapter, you have learned about the LP problem, an ML technique that’s only possible with graph data. It can be used in many contexts to predict future or unknown links between any type of nodes, as long as we have some example or context data. You have learned how to build an LP pipeline with Neo4j’s GDS, which takes care of negative observation sampling, model training, and storage for us.

This chapter is the last one where we will talk about predictions and ML. Overall, we have studied several use cases for ML on graphs, including node classification and future/unknown LP. You have learned how to extract graph-based features or embeddings to feed an ML model in your preferred library (we’ve used scikit-learn). You have also learned that the whole ML pipeline can be managed within Neo4j and its GDS library thanks to built-in pipelines and models.

GDS contains many interesting tools, but it is generally still young compared to other ML tools...

Further reading

An LP analysis with topological metrics and scikit-learn is presented in the book Graph Algorithms by M. Needham and A. Hodler, O’Reilly (Chapter 8, Using Graph Algorithms to Enhance Machine Learning).
A paper introducing LP problems: Link Prediction in Complex Networks: A Survey by L. Lu and T. Zhou: https://arxiv.org/abs/1010.0725.
Some more complex LP examples:
- LP on heterogeneous graphs:
  - Using PyG: Link Prediction on Heterogeneous Graphs with PyG by J. Eric Lenssen and M. Fey: https://medium.com/@pytorch_geometric/link-prediction-on-heterogeneous-graphs-with-pyg-6d5c29677c70
  - Using GraphSAGE for recommendations in heterogeneous graphs: Graph Neural Networks: Link Prediction (Part II) by L. Faik: https://medium.com/data-from-the-trenches/graphical-neural-networks-link-prediction-part-ii-c60f6d97fd97
- Multi-class link prediction...


Author

Estelle Scifo has over 7 years of experience as a data scientist, following her PhD from the Laboratoire de l'Accélérateur Linéaire, Orsay (affiliated with CERN in Geneva). As a Neo4j certified professional, she uses graph databases daily and takes full advantage of their features to build efficient machine learning models from this data. She is also a data science mentor who guides newcomers into the field. Her domain expertise and deep insight into beginners' needs make her an excellent teacher.
