Graph Data Science with Neo4j

Product type: Book
Published: Jan 2023
Publisher: Packt
ISBN-13: 9781804612743
Pages: 288
Edition: 1st Edition
Author: Estelle Scifo

Table of Contents (16 chapters)

Preface
Part 1 – Creating Graph Data in Neo4j
  Chapter 1: Introducing and Installing Neo4j
  Chapter 2: Importing Data into Neo4j to Build a Knowledge Graph
Part 2 – Exploring and Characterizing Graph Data with Neo4j
  Chapter 3: Characterizing a Graph Dataset
  Chapter 4: Using Graph Algorithms to Characterize a Graph Dataset
  Chapter 5: Visualizing Graph Data
Part 3 – Making Predictions on a Graph
  Chapter 6: Building a Machine Learning Model with Graph Features
  Chapter 7: Automatically Extracting Features with Graph Embeddings for Machine Learning
  Chapter 8: Building a GDS Pipeline for Node Classification Model Training
  Chapter 9: Predicting Future Edges
  Chapter 10: Writing Your Custom Graph Algorithms with the Pregel API in Java
Index
Other Books You May Enjoy

Using Graph Algorithms to Characterize a Graph Dataset

So far, you have been able to distinguish between different types and topologies of graphs using simple observations and metrics, such as degree distribution. But we can extract more information from a graph structure. In this chapter, we will learn how to find clusters of nodes, or communities, in a network, based only on the nodes and edges in a graph. We will also learn about node importance algorithms, such as PageRank. To do so, we will install the Neo4j Graph Data Science (GDS) library and learn its principles. This library allows us to run both unsupervised and supervised graph algorithms.

This is a key chapter: many of the concepts explored here are used throughout the rest of the book, so you are encouraged to stay focused until the end.

In this chapter, we’re going to cover the following main topics:

  • Digging into the Neo4j GDS library
  • Projecting a graph for use by GDS
  • Computing...

Technical requirements

To reproduce the examples given in this chapter, you’ll need the following:

  • Neo4j 5.x installed on your computer (see the installation instructions in Chapter 1, Introducing and Installing Neo4j)
  • GDS plugin (version >= 2.2)
  • An internet connection to download the plugins and the datasets
  • Any code listed in the book will be available in the associated GitHub repository (https://github.com/PacktPublishing/Graph-Data-Science-with-Neo4j) in the corresponding chapter folder

Digging into the Neo4j GDS library

The GDS library was first released in 2020 as the successor to the Graph Algorithm plugin, which first appeared in 2019. Since then, many improvements have been made in terms of performance and standardization, and many new features have been added, both in algorithm parametrization and in new kinds of algorithms. The following subsections give an overview of its content and working principles.

GDS content

As the name suggests, the GDS library contains tools to be used in a data science project using data stored in Neo4j. This includes the following:

  • Path-related algorithms
  • Graph algorithms
  • Machine learning (ML) models and pipelines
  • Python client

Let’s talk in a bit more detail about each of these aspects, to understand when and where they are useful.

Path-related algorithms

In graph theory, traversing a graph to find specific paths from one node to another (typically the...

Projecting a graph for use by GDS

GDS doesn’t operate directly on the data stored in Neo4j. Tuned for optimal performance, it uses its own data structure, which can be configured to contain only the entities an algorithm needs, keeping memory usage low. While your Neo4j graph may contain dozens of node labels, each with multiple properties, some algorithms only use a single node label (for example, User) and no properties. The GDS library lets you create a projected graph containing only these nodes. A projected graph can be created using two different procedures:

  • gds.graph.project: For native projection
  • gds.graph.project.cypher: For Cypher projection

We are going to detail both of these procedures in the following sections.
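Before looking at the real procedures, the idea of a projection can be illustrated in plain Python. The sketch below is not the GDS data structure and makes no GDS call; it only shows what "keep a single label and no properties" means on a toy graph. The `Node` class, the `project` function, and the `Product`, `FOLLOWS`, and `BOUGHT` names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy stand-in for a stored node (hypothetical, not a Neo4j type)."""
    node_id: int
    labels: set
    properties: dict = field(default_factory=dict)


def project(nodes, relationships, node_label, rel_type):
    """Keep only nodes carrying `node_label` and the `rel_type`
    relationships between them. Properties are dropped entirely."""
    kept_ids = {n.node_id for n in nodes if node_label in n.labels}
    kept_rels = [
        (src, dst)
        for (src, typ, dst) in relationships
        if typ == rel_type and src in kept_ids and dst in kept_ids
    ]
    return kept_ids, kept_rels


# Toy "stored" graph: two users and one product (illustrative data).
nodes = [
    Node(1, {"User"}, {"name": "Ada", "age": 36}),
    Node(2, {"User"}, {"name": "Bob"}),
    Node(3, {"Product"}, {"title": "Coffee"}),
]
relationships = [(1, "FOLLOWS", 2), (1, "BOUGHT", 3), (2, "FOLLOWS", 1)]

projected_nodes, projected_rels = project(nodes, relationships, "User", "FOLLOWS")
print(projected_nodes)  # node IDs only, no properties
print(projected_rels)   # only FOLLOWS relationships between User nodes
```

The projected graph is smaller than the stored one because the product node, the purchase relationship, and every property have been left out, which is the memory saving the text describes.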

Backward compatibility

In GDS versions prior to 2.0, these procedures were called gds.graph.create and gds.graph.create.cypher, respectively.

Native projections

In a native projection...

Computing a node’s degree with GDS

We studied the node degree metric and its distribution in the preceding chapter, Chapter 3, Characterizing a Graph Dataset. At that time, we computed each node’s degree using a Cypher query. GDS provides a procedure to perform the same computation on a projected graph. Because its results are already familiar, we are going to use this procedure to understand the different algorithm modes and configuration options.
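As a reminder of what is being computed, here is a minimal plain-Python sketch of node degree on a list of directed relationships. It is an illustration only, not the GDS implementation. The `orientation` values mirror the names GDS uses for its relationship orientation option (NATURAL for outgoing, REVERSE for incoming, UNDIRECTED for both); the `degree` function and the sample data are hypothetical.

```python
def degree(relationships, orientation="NATURAL"):
    """Count relationships per node.

    NATURAL    -> outgoing relationships (out-degree)
    REVERSE    -> incoming relationships (in-degree)
    UNDIRECTED -> both directions
    """
    if orientation not in {"NATURAL", "REVERSE", "UNDIRECTED"}:
        raise ValueError(f"unknown orientation: {orientation}")
    counts = {}
    for src, dst in relationships:
        # Make sure both endpoints appear, even with a degree of 0.
        counts.setdefault(src, 0)
        counts.setdefault(dst, 0)
        if orientation in ("NATURAL", "UNDIRECTED"):
            counts[src] += 1
        if orientation in ("REVERSE", "UNDIRECTED"):
            counts[dst] += 1
    return counts


# Three directed relationships: a->b, a->c, b->c (illustrative data).
rels = [("a", "b"), ("a", "c"), ("b", "c")]

out_degree = degree(rels, "NATURAL")
in_degree = degree(rels, "REVERSE")
total_degree = degree(rels, "UNDIRECTED")
print(out_degree, in_degree, total_degree)
```

The same three relationships give three different answers depending on the orientation, which is why this option matters when configuring the algorithm.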

All algorithm procedures from GDS use the same syntax:

gds.<algoName>.<executionMode>(<graphName>, <algoConfiguration>)

Here, the following applies:

  • algoName is the name of the algorithm. Note that some algorithms are included in an alpha or beta version, in which case they are accessible via gds.alpha.<algoName> or gds.beta.<algoName>.
  • executionMode is one of stream, write, mutate, estimate or stats, as defined in the GDS project workflow section.
  • graphName...
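The naming pattern above can be made concrete with a small Python helper that assembles a procedure name from its parts. This is only a string-building sketch under the rules stated in the list; it does not connect to Neo4j, and the `procedure_name` function and the `someAlgo` algorithm name are hypothetical.

```python
# Execution modes as listed in the text.
EXECUTION_MODES = {"stream", "write", "mutate", "estimate", "stats"}
# None means a regular algorithm; alpha and beta add a prefix.
TIERS = {None, "alpha", "beta"}


def procedure_name(algo_name, execution_mode, tier=None):
    """Build gds.<algoName>.<executionMode>, with an optional
    alpha or beta segment placed before the algorithm name."""
    if execution_mode not in EXECUTION_MODES:
        raise ValueError(f"unknown execution mode: {execution_mode}")
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier}")
    parts = ["gds"]
    if tier is not None:
        parts.append(tier)
    parts.extend([algo_name, execution_mode])
    return ".".join(parts)


degree_stream = procedure_name("degree", "stream")
alpha_write = procedure_name("someAlgo", "write", tier="alpha")  # hypothetical algorithm
print(degree_stream)
print(alpha_write)
```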

Understanding a graph’s structure by looking for communities

In a graph, the distribution of edges is often a key characteristic. Indeed, many algorithms use graph traversal to propagate values from one node to its neighbors until some equilibrium is reached. Knowing in advance that some groups of nodes are totally isolated from, or share very few links with, the rest of the graph is key to understanding the results of such algorithms. Beyond those technical details, knowing that some nodes are more densely connected to each other than to the rest of the graph, thereby forming a community, can also serve as an input feature for an ML model. For instance, you could look for communities in your user base according to the products they frequently buy, and identify a group of coffee aficionados, distinct from a group of tea lovers, who will get different recommendations.
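The simplest case, groups of nodes that are totally isolated from the rest of the graph, can be sketched in plain Python with a union-find structure. This is an illustration of the concept on toy data, ignoring edge direction; it is not how GDS implements its algorithms, and the `connected_components` function and the names in the sample data are hypothetical.

```python
def connected_components(nodes, edges):
    """Group nodes into components, ignoring edge direction."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links up to the root, shortening the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    components = {}
    for n in nodes:
        components.setdefault(find(n), set()).add(n)
    return list(components.values())


# Two isolated groups: {ann, bob, cat} and {dan, eve} (illustrative data).
nodes = ["ann", "bob", "cat", "dan", "eve"]
edges = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]

components = connected_components(nodes, edges)
print(len(components), components)
```

A value propagated from ann can reach bob and cat but never dan or eve, which is exactly the kind of structural fact that helps interpret the results of traversal-based algorithms.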

Number of components

The next goal of this analysis is to...

Summary

In this chapter, you learned the basic principles of version 2.x of the Neo4j GDS library. You created projected graphs, configuring the included nodes, relationships, and properties with native projection, and you learned how to generate properties or relationships on the fly using Cypher projections. You then ran your first GDS algorithm (the degree algorithm) and became familiar with the stream, write, and mutate execution modes. You also saw how algorithms are configured, especially regarding relationship orientation.

Once GDS held no more secrets for you, we started using another type of algorithm, namely community detection algorithms. We studied a few of them and learned how they differ and what they can teach us about our graph.

In the next chapter, we will learn how to use another powerful tool of the Neo4j universe: Neo4j Bloom, yet another graph application. Bloom is designed...

Further reading

To investigate the topics covered in this chapter further, you can check the following resources:

  • The GDS manual: https://neo4j.com/docs/graph-data-science/current/.
  • Hands-On Graph Analytics with Neo4j, my previous book, gives much more detail about each centrality and community detection algorithm, including example implementations to better understand what they are doing.
  • NEuler – the Graph Data Science Playground: An application, similar to Neo4j Desktop or neodash, to run graph algorithms from GDS. At the time of writing, it was not yet updated to work with Neo4j 5, but it’s worth keeping an eye on it since it can be very useful for investigations: https://github.com/neo4j-devtools/neuler.