You're reading from Graph Data Science with Neo4j

Product type: Book
Published in: Jan 2023
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781804612743
Edition: 1st Edition
Languages: Python
Tools: Neo4j
Concepts: Mobile Application Development
Author: Estelle Scifo

Characterizing a Graph Dataset

Two graphs can differ in many ways, for instance in their number of nodes or their types of edges. Many more metrics exist to characterize a graph, so that we can get an idea of its structure from a few numbers. Just as the mean and standard deviation help us understand the distribution of a numeric variable, graph metrics help us understand the graph topology: Is the graph highly connected? Are there isolated nodes?

In this chapter, we are going to learn about a few metrics for characterizing a graph. We will focus on the degree and the degree distribution, which will give us the opportunity to draw our first plot using the NeoDash graph application. We will also use the Neo4j Python driver to extract data from Neo4j into a DataFrame and perform some basic analysis of this data.

In this chapter, we’re going to cover the following main topics:

Characterizing a graph from its node and edge properties
Computing the graph degree...

Technical requirements

To be able to reproduce the examples provided in this chapter, you’ll need the following tools:

Neo4j installed on your computer (see the installation instructions in Chapter 1, Introducing and Installing Neo4j).
Python and Jupyter Notebook installed. We are not going to cover their installation in this book.
You’ll also need the following Python packages:
- matplotlib
- pandas
- neo4j
An internet connection to download the plugins and the dataset and to use the public API in the last section of this chapter.
Any code listed in the book will be available in the associated GitHub repository, https://github.com/PacktPublishing/Graph-Data-Science-with-Neo4j, in the corresponding chapter folder.

Characterizing a graph from its node and edge properties

There is no single type of graph. Each graph has specific characteristics, depending on the process it models. This section describes some of the characteristics you should examine when starting your journey with a new dataset.

Link direction

Links between nodes can be directed (and are then called arcs in graph theory) or undirected (and are called edges).

While graph theory distinguishes directed from undirected links by name, the graph database vocabulary usually doesn't: all links are called edges or relationships, regardless of whether they are considered directed. More generally, I'll stick to the wording used in the Neo4j Graph Data Science Library, even though it may sound inaccurate to graph theorists.
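To make the distinction concrete, here is a minimal sketch, under our own assumptions, that stores links as (source, target) pairs and builds an adjacency mapping from them. In the undirected case, each link is recorded in both directions, which is exactly the "if you are connected to X, X is also connected to you" behavior of the Facebook example. The helper name build_adjacency is hypothetical, and because it uses sets, parallel links between the same two nodes are collapsed into one.

```python
def build_adjacency(edges, directed=True):
    """Build a {node: set of neighbors} mapping from (source, target) pairs.

    Hypothetical helper for illustration only.
    """
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        # Make sure nodes without outgoing links still appear as keys
        adjacency.setdefault(target, set())
        if not directed:
            # An undirected edge can be followed from either endpoint
            adjacency[target].add(source)
    return adjacency


links = [("A", "B"), ("B", "C")]
print(build_adjacency(links, directed=True))   # arcs: A -> B, B -> C only
print(build_adjacency(links, directed=False))  # edges: usable both ways
```

With directed=True, C has no neighbors because no arc leaves it; with directed=False, C can reach B.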

Undirected graphs include the following:

Facebook social network: If you are connected to X, X is also connected to you.
Co...

Computing the graph degree distribution

After the number of nodes and edges, the node degree is one of the first metrics to compute when studying a new graph. Its distribution tells us whether the edges are evenly spread across nodes or whether a few nodes monopolize almost all connections, leaving the others disconnected. In this section, we will define the node degree, then learn how to compute it with Cypher and draw its distribution using the NeoDash graph application.

Definition of a node’s degree

The degree of a node is the number of links connected to this node. For undirected graphs, there is only one degree, since we just count all the edges connected to a given node. For directed graphs, we can compute the node’s degree in three different ways:

Incoming degree: We count only the edges pointing toward the node
Outgoing degree: We count only the edges pointing away from the node
Total degree: We count all edges attached to a node, regardless...
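The three definitions can be illustrated with a short Python sketch over a directed edge list of (source, target) pairs. The function name degrees is hypothetical and this is not the chapter's Cypher approach; since it only sees nodes that appear in at least one link, looking up an isolated node in the returned Counter objects simply yields 0.

```python
from collections import Counter


def degrees(edges):
    """Return (in_degree, out_degree, total_degree) for directed (source, target) pairs."""
    in_degree = Counter()
    out_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1  # the link points away from source
        in_degree[target] += 1   # the link points toward target
    # Total degree ignores orientation: every link attached to the node counts
    total_degree = in_degree + out_degree
    return in_degree, out_degree, total_degree


edges = [("A", "B"), ("A", "C"), ("C", "B")]
in_degree, out_degree, total_degree = degrees(edges)
print(in_degree["B"], out_degree["A"], total_degree["C"])  # 2 2 2
```

For an undirected graph, the total degree is the only meaningful value.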

Installing and using the Neo4j Python driver

We can use the Neo4j Python driver to fetch data from Neo4j and analyze it from Python. In this section, we are going to plot the degree distribution using Python visualization packages.
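As a preview of what this section builds toward, the following sketch shows one way, under our own assumptions, to turn an edge list into a degree distribution (number of nodes for each degree value) and draw it as a bar chart with matplotlib. The helper names are hypothetical; degrees are counted as total degree, ignoring direction, and isolated nodes do not appear because they have no edge. matplotlib is imported inside the plotting function so that the counting part works on its own.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree value k to the number of nodes having degree k."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    # Count how many nodes share each degree value, sorted by degree
    return dict(sorted(Counter(degree.values()).items()))


def plot_degree_distribution(distribution):
    """Draw the distribution as a bar chart and return the matplotlib figure."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(distribution.keys()), list(distribution.values()))
    ax.set_xlabel("Degree")
    ax.set_ylabel("Number of nodes")
    ax.set_title("Degree distribution")
    return fig
```

In a notebook using the inline backend, fig = plot_degree_distribution(degree_distribution(edges)) should render the chart, assuming matplotlib is installed as listed in the technical requirements.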

Counting node labels and relationship types in Python

Let’s open the Neo4j_Driver notebook (https://github.com/PacktPublishing/Graph-Data-Science-with-Neo4J/blob/main/Chapter03/notebooks/Neo4j_Driver.ipynb). To install the Neo4j driver, run the following code in the first cell:

!pip install neo4j

Let’s instantiate the driver and fetch our first bit of data from Neo4j:

First, import the required objects:

from collections import defaultdict

from neo4j import GraphDatabase

Then, instantiate a driver object, providing it with the connection parameters:

driver = GraphDatabase.driver(
    "bolt://localhost:7687",  # Bolt URI of the local Neo4j instance
    auth=("neo4j", "<PASSWORD>"),  # replace <PASSWORD> with your database password
)

With...
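The rest of the notebook code is not shown here, so the following is only a minimal sketch of how label and relationship-type counts could be fetched with the driver created above; it is not the notebook's actual code. The query strings and the helper names LABEL_QUERY, REL_TYPE_QUERY, count_per_label, and fetch_counts are hypothetical; only driver.session() and session.run() come from the neo4j package. Because labels(n) returns a list and a node can carry several labels, the sketch sums counts per individual label with a defaultdict.

```python
from collections import defaultdict

# Hypothetical queries: one row per label combination / relationship type
LABEL_QUERY = "MATCH (n) RETURN labels(n) AS labels, count(n) AS count"
REL_TYPE_QUERY = "MATCH ()-[r]->() RETURN type(r) AS type, count(r) AS count"


def count_per_label(records):
    """Sum node counts per individual label from (labels, count) records."""
    counts = defaultdict(int)
    for record in records:
        for label in record["labels"]:
            counts[label] += record["count"]
    return dict(counts)


def fetch_counts(driver):
    """Run both queries and return (label counts, relationship-type counts)."""
    with driver.session() as session:
        # Materialize the results before the session closes
        label_records = list(session.run(LABEL_QUERY))
        rel_records = list(session.run(REL_TYPE_QUERY))
    label_counts = count_per_label(label_records)
    rel_counts = {record["type"]: record["count"] for record in rel_records}
    return label_counts, rel_counts
```

Usage, assuming the driver defined above: label_counts, rel_counts = fetch_counts(driver).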

Learning about other characterizing metrics

The degree is not the only metric that can be computed to characterize a graph. Let’s look at a graph detail page on the Network Repository Project (for instance, https://networkrepository.com/socfb-UVA16.php). It contains data about the number of nodes, edges, degrees, and other metrics, such as the number of triangles and clustering coefficient.
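A few of the simplest figures shown on such a page can be reproduced from an edge list. The sketch below is our own illustration, not the Network Repository's computation; graph_summary is a hypothetical name, the graph is assumed undirected, and isolated nodes are invisible because only nodes appearing in an edge are counted. The average degree uses the fact that every edge adds 1 to the degree of each of its two endpoints, so the degrees sum to twice the number of edges.

```python
from collections import Counter


def graph_summary(edges):
    """Node/edge counts and degree statistics for an undirected edge list."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    n_nodes = len(degree)  # only nodes with at least one edge are seen
    n_edges = len(edges)
    if n_nodes == 0:
        return {"nodes": 0, "edges": 0, "min_degree": 0, "max_degree": 0, "avg_degree": 0.0}
    return {
        "nodes": n_nodes,
        "edges": n_edges,
        "min_degree": min(degree.values()),
        "max_degree": max(degree.values()),
        # Sum of degrees = 2 * number of edges
        "avg_degree": 2 * n_edges / n_nodes,
    }


print(graph_summary([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]))
```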

In the rest of this section, we will provide definitions for some of the metrics listed in the preceding Figure 3.11. We will refer to this section in the next few chapters when we use graph-based metrics to build a machine learning model.

Triangle count

As the name suggests, a triangle is a set of three nodes that are all connected to each other. In a directed graph, edge orientation needs to be taken into account.

For a given node n, its triangle count is the number of pairs of neighbors of n that are also connected to each other. Look at the following undirected graph:

...
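The per-node definition translates directly into code. This is a minimal Python sketch assuming an undirected simple graph given as (source, target) pairs with hashable node IDs; triangle_count is a hypothetical name, and this is not the Neo4j Graph Data Science implementation.

```python
from itertools import combinations


def triangle_count(edges):
    """Count, for every node, the triangles it belongs to (undirected graph)."""
    neighbors = {}
    for source, target in edges:
        if source == target:
            continue  # a self-loop cannot be part of a triangle
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)
    counts = {}
    for node, node_neighbors in neighbors.items():
        # Each pair of neighbors that are themselves linked closes one triangle
        counts[node] = sum(
            1 for u, v in combinations(node_neighbors, 2) if v in neighbors[u]
        )
    return counts


print(triangle_count([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]))
```

Summing the per-node counts and dividing by 3 gives the number of distinct triangles, since each triangle is counted once at each of its three nodes.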

Summary

This chapter taught you some aspects of graph statistics. You now know a few metrics to compute when you first start analyzing a new graph, from the number of nodes and edges and the node and edge types to degree-related metrics and the degree distribution.

You also installed the Neo4j Python driver and learned how to extract data from Neo4j into Python and build a DataFrame from it.

In the next chapter, we will dig deeper into graph analytics by using unsupervised graph algorithms to learn even more about graph topology. We will learn how to find clusters or communities of nodes in the graph. Along the way, we will install and learn about the basic principles of the Neo4j Graph Data Science Library, the plugin we will use extensively in the rest of this book.

Exercises

Challenge yourself with the following exercises related to the content covered in this chapter:

Can you imagine an example of a tripartite graph?
Create the RELATED_TO relationship between movies that share at least one person (as actor or director).

Update the Cypher query we used to compute the degree distribution to obtain the normalized degree (divide by the total number of nodes in the graph).

Can you draw the weighted degree distribution (total)?

Hint: The weighted total degree is the sum of all weights of relationships attached to a given node.

Advanced: Can you write a Cypher query to compute the triangle count for each node?

Here is the Cypher code to create, in Neo4j, the small graph we used as an example:

CREATE (A:Label {id: "A"})
CREATE (B:Label {id: "B"})
CREATE (C:Label {id: "C"})
CREATE (D:Label {id: "D"})
CREATE (E:Label {id: "E"})
CREATE (A)-[:REL]->(B)
CREATE...


Estelle Scifo has over 7 years of experience as a data scientist, after receiving her PhD from the Laboratoire de l'Accélérateur Linéaire, Orsay (affiliated with CERN in Geneva). As a Neo4j certified professional, she uses graph databases on a daily basis and takes full advantage of their features to build efficient machine learning models out of this data. In addition, she is a data science mentor who guides newcomers into the field. Her domain expertise and deep insight into beginners' needs make her an excellent teacher.
