You're reading from Hands-On Graph Analytics with Neo4j

Product type: Book

Published in: Aug 2020

Publisher: Packt

ISBN-13: 9781839212611

Edition: 1st Edition

Tools: Neo4j

Concepts: Database Programming

Author: Estelle Scifo

Using Graph-based Features in Machine Learning

In this chapter, we will take what you have learned about graphs, graph databases, and the different types of information that can be extracted from graph structures (node importance, communities, and node similarity) and learn how to integrate this knowledge into a machine learning pipeline to make predictions out of data. We will start by using a classical CSV file, containing information from a questionnaire, and recap the different steps of a data science project using this data as the central theme. We will then explore how to transform this data into a graph and how to characterize this graph using graph algorithms. Finally, we will learn how to automate graph processing using Python and the Neo4j Python driver.

The following topics will be covered in this chapter:

Building a data science pipeline
The steps toward graph machine...

Technical requirements

The following tools will be used throughout this chapter:

Neo4j with the Graph Data Science plugin
Python (recommended ≥ 3.6) with the following requirements:
  neo4j, the official Neo4j Python driver (≥ 4.0.2)
  networkx for graph management in Python (optional)
  matplotlib and seaborn for data visualization
  pandas
  scikit-learn
Jupyter to run notebooks (optional)

If you are using Neo4j < 4.0, then the latest compatible version of the GDS plugin is 1.1, whereas if you are using Neo4j ≥ 4.0, then the first compatible version of the GDS plugin is 1.2.

Building a data science project

Machine learning can be defined as the process by which an algorithm learns from data in order to extract information that is useful for some business or research interest.

Even though all data science projects are different, a certain number of common steps can still be identified:

Problem definition
Data collection and cleaning
Feature engineering
Model building and evaluation
Deployment

Even if these steps follow a logical order, the process is never linear: it involves going back and forth between them. For example, it can be useful to revisit the problem definition after the data collection phase, or to return to the feature engineering and model evaluation phases as many times as required to reach the desired outcome. The following diagram illustrates this idea of moving back and forth between the different steps of a project:

This project structure also applies when analyzing graph data, which...

The steps toward graph machine learning

Neo4j is primarily a database and can be used as such to fetch data. However, a change of perspective is needed to express the data as a graph, as well as to exploit this graph structure by using graph algorithms and formulating the problem as a graph problem.

Building a (knowledge) graph

When beginning to build a graph out of a dataset, the main question to ask is: what relationships exist in this data? Taken alone, the CSV file we studied in the previous section does not contain much information about relationships, since it only holds aggregated data, such as the number of followers per user.

To learn more about the relationships in the data, we have to enrich this dataset. This can be done in two ways: we can either use an external data source, as we did in Chapter 3, Empowering Your Business with Pure Cypher, or change the way we look at our relational data.
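For the first option, relationships obtained from an external source (such as follower lists) typically arrive as (follower, followed) pairs. The following is a minimal sketch of how such pairs could be loaded in batches; the `login` property, the batch size, and the helper names are assumptions made for illustration, not the book's code.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical import query: identifying users by a `login` property is an assumption.
IMPORT_FOLLOWS_QUERY = """
UNWIND $pairs AS pair
MERGE (a:User {login: pair.follower})
MERGE (b:User {login: pair.followed})
MERGE (a)-[:FOLLOWS]->(b)
"""


def build_follow_batches(
    pairs: Iterable[Tuple[str, str]], batch_size: int = 1000
) -> List[Dict[str, List[Dict[str, str]]]]:
    """Turn (follower, followed) tuples into parameter maps for IMPORT_FOLLOWS_QUERY."""
    seen = set()
    rows = []
    for follower, followed in pairs:
        # Skip self-follows and exact duplicates, keeping the first occurrence.
        if follower == followed or (follower, followed) in seen:
            continue
        seen.add((follower, followed))
        rows.append({"follower": follower, "followed": followed})
    # One parameter map per batch keeps each transaction reasonably small.
    return [{"pairs": rows[i:i + batch_size]} for i in range(0, len(rows), batch_size)]


# Usage with an open driver session (not executed here):
# for params in build_follow_batches(pairs):
#     session.run(IMPORT_FOLLOWS_QUERY, params)
```

Using `MERGE` rather than `CREATE` makes the import idempotent: re-running a batch does not duplicate users or relationships.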

Creating relationships from existing data

Data can come...

Using graph-based features with pandas and scikit-learn

In the previous section, we created a graph model connecting our users. We have also run some graph algorithms to understand the graph structure. We are now going to take full advantage of the GDS to extract graph-based features.
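Whatever way the graph-based features are computed, they eventually have to be joined back onto the tabular features from the CSV file. A minimal sketch with pandas, assuming both tables share a user identifier (the `login` and `pagerank` column names are hypothetical):

```python
from typing import Any, Dict, List

import pandas as pd


def add_graph_features(
    features: pd.DataFrame, graph_rows: List[Dict[str, Any]], key: str = "login"
) -> pd.DataFrame:
    """Left-join graph-based features (for example PageRank scores) onto a feature table."""
    graph_df = pd.DataFrame(graph_rows)
    # how="left" keeps every row of the original table; users absent from the
    # graph get NaN in the new columns, to be imputed or dropped explicitly later.
    return features.merge(graph_df, on=key, how="left")


features = pd.DataFrame({"login": ["a", "b", "c"], "followers": [10, 3, 0]})
graph_rows = [{"login": "a", "pagerank": 1.2}, {"login": "b", "pagerank": 0.4}]
enriched = add_graph_features(features, graph_rows)
```

A left join is chosen so that no labeled sample is silently lost when a user is missing from the projected graph.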

Extracting graph-based features from Neo4j Browser

In a prototyping phase, it is always good to be able to run single queries manually and extract the data from there. In the following subsections, we are going to review how to run graph algorithms from the GDS in Neo4j Browser and how to extract the data into a format usable by our data science tools – namely, CSV.
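Neo4j Browser can export a result table to CSV directly. When the same rows are available as a list of dictionaries (the shape produced by the Python driver's `Record.data()`), the standard library is enough to write an equivalent file. A sketch; the helper name and column names are illustrative:

```python
import csv
from typing import Any, Dict, List


def write_rows_to_csv(rows: List[Dict[str, Any]], path: str) -> int:
    """Write result rows (dicts sharing the same keys) to a CSV file.

    Returns the number of data rows written.
    """
    if not rows:
        raise ValueError("no rows to write")
    # Column order follows the keys of the first row, i.e. the RETURN clause order.
    fieldnames = list(rows[0].keys())
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```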

Creating the projected graph

We could create a named projected graph using the same parameters as in the previous section:

nodeProjection: "User",
relationshipProjection: {
FOLLOWS: {
type: "FOLLOWS",
orientation: "UNDIRECTED",
aggregation: "SINGLE"
}
}

However, we know that our graph contains several disconnected...
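The projection parameters listed above can also be sent from code, passed as query parameters. This sketch assumes GDS 1.x, where named native projections are created with `gds.graph.create` (GDS 2.x renamed the procedure `gds.graph.project`); the graph name and helper names are hypothetical.

```python
from typing import Any, Dict

# Projection settings from the text: User nodes and undirected FOLLOWS relationships,
# with parallel relationships between the same pair of users collapsed into one.
NODE_PROJECTION = "User"
RELATIONSHIP_PROJECTION = {
    "FOLLOWS": {
        "type": "FOLLOWS",
        "orientation": "UNDIRECTED",
        "aggregation": "SINGLE",
    }
}

CREATE_GRAPH_QUERY = (
    "CALL gds.graph.create($graphName, $nodeProjection, $relationshipProjection) "
    "YIELD graphName, nodeCount, relationshipCount "
    "RETURN graphName, nodeCount, relationshipCount"
)


def create_graph_params(graph_name: str) -> Dict[str, Any]:
    """Build the parameter map expected by CREATE_GRAPH_QUERY."""
    return {
        "graphName": graph_name,
        "nodeProjection": NODE_PROJECTION,
        "relationshipProjection": RELATIONSHIP_PROJECTION,
    }


def create_projected_graph(tx, graph_name: str):
    """Run the projection inside a transaction and return the single summary record."""
    return tx.run(CREATE_GRAPH_QUERY, create_graph_params(graph_name)).single()
```

Passing the maps as parameters, rather than formatting them into the query string, keeps the query text constant and avoids quoting mistakes.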

Automating graph-based feature creation with the Neo4j Python driver

Using Cypher to create our features is fine for testing, but once we reach the production phase, performing such operations manually is not manageable. Fortunately, Neo4j officially provides drivers for several languages, including Java, .NET, JavaScript, Go, and Python. In this book, we use Python, so we will learn about the Python driver in the following section.
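In practice, automating a graph-based feature means wrapping the Cypher call in a function that the pipeline can invoke. A sketch, assuming GDS 1.x procedure names, an already existing named projected graph, and a hypothetical `login` property on `User` nodes:

```python
from typing import Dict

# Stream PageRank scores for every node of a named projected graph (GDS 1.x syntax)
# and map internal node ids back to a user identifier.
PAGERANK_QUERY = (
    "CALL gds.pageRank.stream($graphName) "
    "YIELD nodeId, score "
    "RETURN gds.util.asNode(nodeId).login AS login, score"
)


def pagerank_feature(tx, graph_name: str) -> Dict[str, float]:
    """Return a {login: PageRank score} mapping usable as a model feature."""
    result = tx.run(PAGERANK_QUERY, graphName=graph_name)
    return {record["login"]: record["score"] for record in result}


# Usage with the driver (not executed here):
# with driver.session() as session:
#     scores = session.read_transaction(pagerank_feature, "users")
```

The result is consumed inside the function, so the transaction function returns plain Python data rather than a lazily evaluated result tied to the session.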

Discovering the Neo4j Python driver

Python is officially supported by Neo4j, which provides a driver to connect to a Neo4j graph from Python at https://github.com/neo4j/neo4j-python-driver.

It can be installed through the pip Python package manager or with conda:

pip install neo4j
# or
conda install -c conda-forge neo4j

The code for this section is available in a Jupyter notebook: Neo4j_Python_Driver.ipynb.

In order to use this database, the first step is the connection definition, which requires the active graph URI and the authentication parameters. bolt is a client-server communication...
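A minimal connection sketch, assuming a local database listening on the default Bolt port 7687 and placeholder credentials; the `login` property and the helper names are hypothetical. The driver import is kept inside `open_driver` so the query helper can be tried without a running server:

```python
from typing import Any, Dict, List, Tuple

# Assumed connection settings for a local installation; replace the password placeholder.
NEO4J_URI = "bolt://localhost:7687"
NEO4J_AUTH: Tuple[str, str] = ("neo4j", "<password>")


def open_driver(uri: str = NEO4J_URI, auth: Tuple[str, str] = NEO4J_AUTH):
    """Create a driver for the given URI and credentials (requires `pip install neo4j`)."""
    from neo4j import GraphDatabase  # imported lazily so this module loads without the driver

    return GraphDatabase.driver(uri, auth=auth)


def fetch_users(tx, limit: int = 10) -> List[Dict[str, Any]]:
    """Return up to `limit` users as plain dictionaries, one per record."""
    result = tx.run("MATCH (u:User) RETURN u.login AS login LIMIT $limit", limit=limit)
    return [record.data() for record in result]


# Usage against a running database (not executed here):
# driver = open_driver()
# with driver.session() as session:
#     rows = session.read_transaction(fetch_users, 5)
# driver.close()
```

Because `fetch_users` returns a list of dictionaries, its output can be passed straight to `pandas.DataFrame(rows)`.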

Summary

This chapter gave an overview of classical data science pipelines and how to integrate graph data into them. Thanks to the Neo4j Python driver, you are now able to import Neo4j data into a pandas DataFrame, which can then be used as usual in any other application, such as model training with scikit-learn. You have also learned how to programmatically run a graph algorithm from the GDS and use the result as a new type of feature for your model.

In the following chapters, we will continue our journey through graph analytics. In this chapter, we stuck to classical machine learning methods such as decision trees. We will now go on to learn how the graph structure can be used to answer different kinds of questions, starting with the link prediction problem, which we are going to tackle in the next chapter.

Questions

Here are a couple of exercises that you can try on your own to get more confident with the concepts covered in this chapter:

Projected graph creation with Python: Modify the code studied in this chapter to create a Cypher projected graph.
PageRank score distribution: Can you explain the shape of the PageRank score distribution for users not contributing to Neo4j (label = False)?

You are also encouraged to create a graph out of your own data and to include graph-based features in your own pipeline.

Estelle Scifo has over 7 years of experience as a data scientist, after receiving her PhD from the Laboratoire de l'Accélérateur Linéaire, Orsay (affiliated with CERN in Geneva). As a Neo4j certified professional, she uses graph databases on a daily basis and takes full advantage of their features to build efficient machine learning models out of this data. She is also a data science mentor, guiding newcomers into the field. Her domain expertise and deep insight into beginners' needs make her an excellent teacher.