Reader small image

You're reading from  Hands-On Graph Analytics with Neo4j

Product typeBook
Published inAug 2020
PublisherPackt
ISBN-139781839212611
Edition1st Edition
Tools
Right arrow
Author (1)
Estelle Scifo
Estelle Scifo
author image
Estelle Scifo

Estelle Scifo possesses over 7 years experience as a data scientist, after receiving her PhD from the Laboratoire de lAcclrateur Linaire, Orsay (affiliated to CERN in Geneva). As a Neo4j certified professional, she uses graph databases on a daily basis and takes full advantage of its features to build efficient machine learning models out of this data. In addition, she is also a data science mentor to guide newcomers into the field. Her domain expertise and deep insight into the perspective of the beginners needs make her an excellent teacher.
Read more about Estelle Scifo

Right arrow
The Graph Data Science Library and Path Finding

In this chapter, we will use the Graph Data Science (GDS) library for the first time, which is the successor of the Graph Algorithm library for Neo4j. After an introduction to the main principles of the library, we will learn about the pathfinding algorithms. Following that, we will use implementations in Python and Java to understand how they work. We will then learn how to use the optimized version of these algorithms, implemented in the GDS plugin. We will cover the Dijkstra and A* shortest path algorithms, alongside other path-related methods such as the traveling-salesman problem and minimum spanning trees.

The following topics will be covered in this chapter:

  • Introducing the Graph Data Science plugin
  • Understanding the importance of shortest path through its applications
  • Going through Dijkstra's shortest path algorithm...

Technical requirements

The following tools will be used throughout this chapter:

If you are using Neo4j < 4.0, then the latest compatible version of the GDS plugin is 1.1, whereas if you are using Neo4j ≥ 4.0, then the first compatible version of the GDS plugin is 1.2.

Introducing the Graph Data Science plugin

We'll start by introducing the GDS plugin. Provided by Neo4j, it extends the capabilities of its graph database for analytics purposes. In this section, we will go through naming conventions and introduce the very important concept of graph projection, which we will use intensively in the rest of this book.

The first implementation of this plugin was done in the Graph Algorithms library, which was first released in June 2017. In 2020, it was replaced by the GDS plugin. The GDS plugin includes performance optimization for the most used algorithms so that they can run on huge graphs (several billions of nodes). Even though I will be highlighting the optimized algorithms in this book, I would suggest you refer to the latest documentation to ensure you get the most up-to-date information (https://neo4j.com/docs/graph-data-science/current/).

The full code of the GDS plugin is open source and available on GitHub at: https://github.com/neo4j/graph...

Understanding the importance of shortest path algorithms through their applications

When trying to find applications for shortest pathfinders on a graph, we think of car navigation via GPS, but there are many more use cases. This section gives an overview of the different applications of pathfinding. We will talk about networks and video games, and give an introduction to the traveling-salesman problem.

Routing within a network

Routing often refers to GPS navigation, but some more surprising applications are also possible.

GPS

The name GPS is actually used for two different technologies:

  • The Global Positioning System (GPS) itself is a way of finding your precise location on Earth. It is made possible by a constellation of satellites orbiting around the planet and sending continuous signals. Depending on which signals your device receives, an algorithm based on triangulation methods can determine your position.
The satellites used by the GPS system all belong to the USA. Equivalent systems...

Dijkstra's shortest paths algorithm

Dijkstra's algorithm was developed by the Dutch computer scientist E. W. Dijkstra in the 1950s. Its purpose is to find the shortest path between two nodes in a graph. The first subsection will guide you through how the algorithm works. The second subsection will be dedicated to the use of Dijkstra's algorithm within Neo4j and the GDS plugin.

Understanding the algorithm

Dijkstra's algorithm is probably the most famous path finding algorithm. It is a greedy algorithm that will traverse the graph breadth first (see the following figure), starting from a given node (the start node) and trying to make the optimal choice regarding the shortest path at each step:

Graph traversal (reminder from Chapter 1, Graph Databases)

In order to understand the algorithm, let's run it on a simple graph.

Running Dijkstra's algorithm on a simple graph

As an example, we will use the following undirected weighted graph:

We are looking for the...

Finding the shortest path with the A* algorithm and its heuristics

Developed in 1968 by P. Hart, N. Nilsson and B. Raphael, the A* algorithm (pronounced A-star) is an extension of Dijkstra's algorithm. It tries to optimize searches by guessing the traversal direction thanks to heuristics. Thanks to this approach, it is known to be faster than Dijkstra's algorithm, especially for large graphs.

Algorithm principles

In Dijkstra's algorithm, all possible paths are explored. This can be very time-consuming, especially on large graphs. The A* algorithm tries to overcome this problem, with the idea that it can guess which paths to follow and which path expansions are less likely to be the shortest ones. This is achieved by modifying the criterion for choosing the next start node at each iteration. Instead of using only the cost of the path from the start to the current node, the A* algorithm adds another component: the estimated cost of going from the current node to the end node...

Optimizing processes using graphs

An optimization problem's objective is to find an optimal solution among a large set of candidates. The shape of your favorite soda can was derived from an optimization problem, trying to minimize the amount of material to use (the surface) for a given volume (33 cl). In this case, the surface, the quantity to minimize, is also called the objective function.

Optimization problems often come with some constraints on the variables. The fact that a length has to be positive is already a constraint, mathematically speaking. But constraints can be expressed in many different forms.

The simpler form of an optimization problem is so-called linear optimization, where both the objective function and the constraints are linear.

Graph optimization problems are also part of mathematical optimization problems. The most famous of them is the traveling-salesman problem (TSP). We are going to talk a bit more about this particular problem in the following section...

Summary

This chapter was a long one, as it was our introduction to the GDS plugin. It is important to understand how to define the projected graph and the different entities to be included in it. We will see more examples in the following chapters, as we are going to use this library in all remaining chapters of the book.

The following table summarizes the different algorithms we have studied in this chapter, with some important characteristics to keep in mind:

Algorithm Description Stream/Write Negative weights
shortestPath The shortest path between two nodes using Dijkstra's algorithm Both No
shortestPath.astar The shortest path between two nodes using the A* algorithm and great circle heuristics (requires nodes with latitude and longitude properties) Stream No
kShortestPath The k-shortest paths between two nodes using Yen's algorithm Both Yes
shortestPath.deltaStepping Single source shortest path: the shortest path between a node and all other nodes in the graph...

Questions

In order to test your understanding, try to answer the following questions. The answers are provided in the Assessment part at the end of this book:

  1. The GDS plugin and projected graphs:
  • Why does the GDS plugin use projected graphs?
  • Where are these projected graphs stored?
  • What are the differences between named and anonymous projected graphs?
  • Create a projected graph containing:
  • Nodes: label Node
  • Relationships: types REL1 and REL2
  • Create a projected graph with:
  • Nodes: labels Node1 and Node2
  • Relationships: type REL1 and properties prop1 and prop2
  • How do you consume the results of a graph algorithm from the GDS plugin?
  1. Pathfinding:
  • Which algorithms are based on Dijkstra's algorithm?
  • What is the important restriction regarding an edge's weight for these algorithms?
  • What information is needed to use the A* algorithm?

Further reading

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Hands-On Graph Analytics with Neo4j
Published in: Aug 2020Publisher: PacktISBN-13: 9781839212611
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Estelle Scifo

Estelle Scifo possesses over 7 years experience as a data scientist, after receiving her PhD from the Laboratoire de lAcclrateur Linaire, Orsay (affiliated to CERN in Geneva). As a Neo4j certified professional, she uses graph databases on a daily basis and takes full advantage of its features to build efficient machine learning models out of this data. In addition, she is also a data science mentor to guide newcomers into the field. Her domain expertise and deep insight into the perspective of the beginners needs make her an excellent teacher.
Read more about Estelle Scifo