Graph Data Science with Neo4j

Product type: Book
Published in: Jan 2023
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781804612743
Edition: 1st Edition
Languages: Python
Tools: Neo4j
Concepts: Mobile Application Development
Author: Estelle Scifo

Building a GDS Pipeline for Node Classification Model Training

Classifying observations into categories is a classical machine learning (ML) task. As we learned in the preceding chapters, we can use existing ML models such as decision trees to classify a graph’s nodes, using the graph structure to find extra features that bring more knowledge into the model. In this chapter, we will discover another key feature of the Neo4j GDS library: pipelines. Pipelines let you configure and train an ML model, and then use it to make predictions on unseen nodes. You can do all of this from within Neo4j, without adding another library such as scikit-learn to your tech stack.

We are going to work with the Netflix dataset we created earlier in this book (the code is available on GitHub if you don’t have it yet). We will make predictions by building a node classification pipeline, focusing on the how rather than the why.

In this chapter, we’re going to cover the...

Technical requirements

To reproduce the examples given in this chapter, you’ll need the following tools:

- Neo4j 5.x installed on your computer (see the installation instructions in Chapter 1, Introducing and Installing Neo4j), with:
  - The Graph Data Science plugin (version >= 2.2)
- A Python environment with the following:
  - Jupyter to run the notebooks
  - scikit-learn

All the code listed in the book is available in the associated GitHub repository (https://github.com/PacktPublishing/Graph-Data-Science-with-Neo4j), in the corresponding chapter folder.

Code samples

Unless otherwise indicated, all code snippets in this chapter and the following ones use the GDS Python client. Library import and client initialization are omitted in this chapter for brevity, but a detailed explanation can be found in the Introducing the GDS Python client section of Chapter 6, Building a Machine Learning Model with Graph Features. Also, note that the code in the code...

The GDS pipelines

This section introduces GDS pipelines: we explain the purpose of this feature, illustrate its intended use, and show the basics of the pipeline catalog.

What is a pipeline?

As data scientists, we run data pipelines every day. Any logical sequence of actions is, in a sense, a pipeline, and running a Jupyter notebook already executes one. Here, however, we refer to explicitly defined workflows made of sequential tasks, such as those we can build with scikit-learn. Let’s take a look at this library’s Pipeline object before focusing on GDS pipelines, to understand their similarities and differences.

scikit-learn pipeline

Often, we think of ML as finding the best model for a given problem, but as data professionals, we know that finding the right model is only a small part of the problem. Before we can even think about fitting a model, many preliminary steps are required, from data gathering to feature extraction. Some...
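To make the idea concrete, here is a minimal, dependency-free sketch of what a scikit-learn-style pipeline does: each intermediate step is fitted on the training data and then transforms it, and the last step is the model. The ToyScaler, ThresholdClassifier, and ToyPipeline classes are hypothetical stand-ins written for illustration, not scikit-learn's implementation; in practice, you would use sklearn.pipeline.Pipeline with real transformers and estimators.

```python
from statistics import mean, pstdev


class ToyScaler:
    """Hypothetical transformer: standardizes a single numeric feature."""

    def fit(self, X, y=None):
        self.mean_ = mean(X)
        self.std_ = pstdev(X) or 1.0  # avoid division by zero on constant input
        return self

    def transform(self, X):
        return [(x - self.mean_) / self.std_ for x in X]


class ThresholdClassifier:
    """Hypothetical model: predicts 1 when the (scaled) feature is >= threshold."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y):
        return self  # nothing to learn in this toy model

    def predict(self, X):
        return [int(x >= self.threshold) for x in X]


class ToyPipeline:
    """Chains (name, step) pairs; every step except the last must be a transformer."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        # Fit each transformer on the current data, then pass its output forward.
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        # Reuse the statistics learned during fit; nothing is re-fitted here.
        for _, step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1][1].predict(X)
```

For example, fitting ToyPipeline([("scale", ToyScaler()), ("clf", ThresholdClassifier(0.0))]) on the features [1, 2, 3, 4] with labels [0, 0, 1, 1] and then predicting on [1, 4] returns [0, 1]. The important design point is that the scaling statistics learned during fit are reused unchanged at prediction time, so the prediction data does not leak into the preprocessing.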

Building and training a pipeline

Similarly to models, a pipeline has to be trained before it can be used: training produces a model that is stored in the model catalog. Pipeline training requires several steps:

1. Create and name the pipeline object.
2. Optionally, compute features with other GDS algorithms (such as graph algorithms, embeddings, or pre-processing).
3. Define the feature set from the features added in the previous step and/or any node property included in the projected graph.
4. Select the candidate ML models and their hyperparameters: pipeline training runs all of them and selects the best one.
5. Finally, train the model.
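Step 4 deserves a closer look: several candidate configurations are compared, and only the best one is kept. The following sketch shows the general idea with k-fold cross-validation on a toy one-dimensional dataset. The KNN1D model, the helper functions, and the data are all hypothetical illustrations; GDS performs its own comparable selection internally over the models added to the pipeline, with its own splitting and metrics, which this sketch does not reproduce.

```python
from statistics import mean


class KNN1D:
    """Hypothetical toy model: k-nearest neighbors on a single numeric feature."""

    def __init__(self, n_neighbors):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Training indices ordered by distance to x (stable sort keeps ties in order).
            nearest = sorted(range(len(self.X_)), key=lambda i: abs(self.X_[i] - x))
            votes = [self.y_[i] for i in nearest[: self.n_neighbors]]
            # Majority vote; ties go to the smallest label.
            preds.append(max(sorted(set(votes)), key=votes.count))
        return preds


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, to keep the sketch simple)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_val_score(params, X, y, k):
    """Mean validation accuracy of KNN1D(**params) over k folds."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        X_train = [x for i, x in enumerate(X) if i not in held_out]
        y_train = [t for i, t in enumerate(y) if i not in held_out]
        model = KNN1D(**params).fit(X_train, y_train)
        y_pred = model.predict([X[i] for i in fold])
        scores.append(accuracy([y[i] for i in fold], y_pred))
    return mean(scores)


def select_best(candidate_params, X, y, k=3):
    """Score every candidate and return (best_params, [(params, score), ...])."""
    results = [(params, cross_val_score(params, X, y, k)) for params in candidate_params]
    best_params, _ = max(results, key=lambda r: r[1])  # the first candidate wins ties
    return best_params, results
```

On the features [1, 9, 2, 8, 3, 7] with labels [0, 1, 0, 1, 0, 1] and three folds, the candidates with 1 and 3 neighbors both reach a mean accuracy of 1.0, while 5 neighbors (which, with only four training points per fold, always votes on a tie) drops to 0.5; the first of the tied candidates, 1 neighbor, is selected.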

The following subsections detail each of these steps. The supporting notebook, Pipeline_Train_Predict, is in the Chapter08 folder of the code bundle that comes with this book.

Creating the pipeline and choosing the features

In GDS, we can create three kinds of pipelines:

Node classification: Each node gets assigned to one target...

Making predictions

To make predictions, we are going to use the same projected graph, which already contains the test nodes.

With this projected graph, and the model object returned by the pipeline training, we can now predict the class of new nodes:

predictions = model.predict_stream(
     projected_graph_object,
     targetNodeLabels=["Test", "Train"],
)

Note that the model object also exposes a predict_mutate method, which stores the predictions in the projected graph as a node property. This will be useful when we deal with embedding features in the last section of this chapter.

In the preceding code block, we include both the Test and Train nodes so that the Louvain communities are computed on the whole graph. When evaluating the model's performance, we will filter out the predictions for the train nodes.
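Why does the whole graph matter? Community assignments depend on connections that may pass through train nodes. Louvain is too involved for a short example, so the sketch below swaps in a simpler community-like algorithm, connected components (computed with union-find), on a made-up five-node graph in which nodes 2 and 4 play the role of train nodes. The function and the data are hypothetical illustrations, not GDS code.

```python
def connected_components(nodes, edges):
    """Label every node in `nodes` with the smallest node ID of its component.

    Edges with an endpoint outside `nodes` are ignored, which mimics running the
    algorithm on a subgraph that only contains those nodes.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the root, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        if a in parent and b in parent:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Attach the larger root under the smaller one, so roots stay minimal.
                parent[max(root_a, root_b)] = min(root_a, root_b)
    return {n: find(n) for n in nodes}
```

With the edges (1, 2), (2, 3), and (4, 5), running on all five nodes puts test nodes 1 and 3 in the same component through train node 2. Running on the test nodes 1, 3, and 5 alone leaves every node isolated, so any feature derived from the community would differ for nodes 1 and 3.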

For instance, in order to evaluate our model, we can compute the confusion matrix using our...
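As a dependency-free sketch of that evaluation step, the function below keeps only the predictions for test nodes and counts (actual, predicted) pairs. It assumes the streamed predictions have been turned into (nodeId, predictedClass) pairs and that the true classes are available in a dictionary keyed by node ID; the function name and the sample data are hypothetical. With pandas and scikit-learn, the equivalent would be filtering the predictions DataFrame and calling sklearn.metrics.confusion_matrix.

```python
from collections import Counter


def confusion_matrix_for_test_nodes(predictions, true_labels, test_node_ids):
    """Confusion matrix restricted to test nodes.

    predictions: iterable of (node_id, predicted_class) pairs.
    true_labels: dict mapping node_id -> actual class.
    test_node_ids: set of node IDs that were held out from training.
    Returns (classes, matrix), where matrix[i][j] counts test nodes whose actual
    class is classes[i] and whose predicted class is classes[j].
    """
    # Keep only test nodes: train nodes were seen during training and would inflate the scores.
    test_pairs = [
        (true_labels[node_id], predicted)
        for node_id, predicted in predictions
        if node_id in test_node_ids
    ]
    counts = Counter(test_pairs)
    classes = sorted({c for pair in test_pairs for c in pair})
    matrix = [[counts[(actual, pred)] for pred in classes] for actual in classes]
    return classes, matrix
```

For instance, with predictions [(1, 0), (2, 1), (3, 1), (4, 1), (5, 0)], true classes {1: 0, 2: 1, 3: 0, 4: 1, 5: 0}, and test nodes {1, 3, 5}, the function returns the classes [0, 1] and the matrix [[2, 1], [0, 0]]: two test nodes correctly predicted as class 0 and one misclassified as class 1.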

Using embedding features

This analysis is equivalent to the one performed with scikit-learn in Chapter 6, Building a Machine Learning Model with Graph Features, except that here there is no need for an additional package to train the model: everything is handled by GDS.

However, in the preceding chapter, we learned about another way to find node features, by learning them from the graph structure itself: node embeddings. In this section, we will use node embeddings as features for our classification task.

Choosing the graph embedding algorithm to use

In Chapter 7, Automatically Extracting Features with Graph Embeddings for Machine Learning, we talked about two graph embedding algorithms included in GDS: Node2Vec and GraphSAGE. One of their differences is the kind of information they tend to encode. While Node2Vec tends to model the node positions in the graph (nodes close to each other in the graph will have close embeddings), GraphSAGE...
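"Close embeddings" are commonly compared with cosine similarity, the cosine of the angle between two vectors. Here is a minimal sketch; the vectors mentioned below are made up and do not come from a real Node2Vec or GraphSAGE run.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)
```

For instance, cosine_similarity([1.0, 0.0], [0.0, 1.0]) is 0.0 (orthogonal vectors), while [1.0, 2.0] and [2.0, 4.0] score 1.0, up to floating-point rounding, because only their direction matters, not their length.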

Summary

In this chapter, you learned how to use GDS pipelines to simplify the process of training an ML model involving graph-based features. GDS pipelines can be configured to run graph algorithms such as Louvain and use their results as features in a classification or regression model. These models are part of GDS, so we do not have to explicitly extract data from Neo4j and use another ML library. Everything runs on the projected graph; pipelines and trained models are stored in the pipeline and model catalogs, and the trained models can be used to make predictions on unseen nodes. This lets us use a single tool to compute graph features, train different models, and make predictions, without explicitly moving data in and out of the database.

We also experimented with the embedding algorithms included in GDS and began to surface their advantages and disadvantages.

In the next chapter, we will use another type of pipeline from the GDS to solve another kind...

Exercise

1. Use a Cypher projection to build the projected graph we used in the first section. It must include nodes with the MainTrain label and the nbMovies and isUSCitizen properties, along with relationships of the KNOWS type.
2. Create the graph represented in the following figure (the same as Figure 7.4) in Neo4j. Then, run the Node2Vec algorithm with different values of the p and q parameters, and try to understand their effect:

Figure 8.4 – An example graph
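As a starting point for the exercise, here is a sketch of how p and q bias a second-order Node2Vec random walk, following the rule described in the original node2vec paper: when the walk has just moved from node prev to node current, each neighbor of current gets an unnormalized weight of 1/p if it is prev itself, 1 if it is also a neighbor of prev, and 1/q otherwise. The graph and the function name are hypothetical, edge weights are ignored, and this is not GDS's implementation; in GDS, the two parameters appear in the Node2Vec configuration as returnFactor (p) and inOutFactor (q).

```python
def transition_probabilities(graph, prev, current, p, q):
    """Probabilities of each next step from `current`, having arrived from `prev`.

    graph: adjacency dict mapping each node to the set of its neighbors (unweighted).
    """
    weights = {}
    for candidate in graph[current]:
        if candidate == prev:              # distance 0 from prev: step back
            weights[candidate] = 1.0 / p
        elif candidate in graph[prev]:     # distance 1 from prev: stay nearby
            weights[candidate] = 1.0
        else:                              # distance 2 from prev: move outward
            weights[candidate] = 1.0 / q
    total = sum(weights.values())
    return {candidate: w / total for candidate, w in weights.items()}
```

On the small graph {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}, with p = 1 and q = 0.5, a walk that just went from A to B moves on to D (away from A) with probability 0.5, and back to A or across to C with probability 0.25 each. Raising q above 1 reverses this tendency and keeps the walk local, while lowering p makes it more likely to step back.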


About the author

Estelle Scifo has over 7 years of experience as a data scientist, after receiving her PhD from the Laboratoire de l'Accélérateur Linéaire, Orsay (affiliated with CERN in Geneva). As a Neo4j certified professional, she uses graph databases on a daily basis and takes full advantage of their features to build efficient machine learning models out of this data. She is also a data science mentor who guides newcomers into the field. Her domain expertise and deep insight into beginners' needs make her an excellent teacher.
