You're reading from Hands-On Graph Analytics with Neo4j

Product type: Book

Published in: Aug 2020

Publisher: Packt

ISBN-13: 9781839212611

Edition: 1st Edition

Tools

Neo4j

Concepts

Database Programming

Author (1)

Estelle Scifo

Graph definition and examples

The question you may ask at this point is "Why should I care about graphs? After all, my company/business/interest is not about graphs or networks of any kind. I know my data model, well arranged into SQL tables or NoSQL documents, and I can retrieve the information I want when I want." This book will teach you how to empower your data by looking at it in a different way. Surprisingly enough, graphs can be used to model a lot of processes, from the more obvious ones such as road networks, to less intuitive use cases such as video games or credit card fraud detection, among many others.

Graph theory

Let's start from the beginning and answer the question "What is a graph?"

A bit of history: the Seven Bridges of Königsberg problem

Graph theory dates back to Leonhard Euler, a prolific Swiss mathematician who lived in the eighteenth century. In 1735, he presented a paper proposing a solution to the Seven Bridges of Königsberg problem. The problem is the following:

Given the city whose geography is depicted in the following image, is there a way to walk across each of the seven bridges of the city once and only once, and return to our starting point?

As you can see, this city is crossed by a river that splits the city into two banks, A and B. The river meander additionally creates two islands, C and D, also part of the city. Those two banks and two islands are connected by a total of seven bridges: two bridges between A and C, two other bridges between C and B, one between C and D, one between B and D, and a last one between D and A:

Euler's reasoning was to reduce this complex geography to the simplest possible drawing, like the one on the right of the previous image, since the route taken within each bank or island is not relevant. Each land mass then becomes a single point, or node (also called a vertex), connected to the others by one or several links, or edges, representing the bridges.

With this simple visualization, the mathematician was able to solve the initial problem by noting that, if you arrive at a land mass (vertex) via one bridge, you will need to leave it using another bridge. For a walk that returns to its starting point, this means every vertex must be connected to an even number of edges; even if the start and end points were allowed to differ, at most those two vertices could have an odd number. This is not the case in the Königsberg graph, since we have the following:

A: 3 connections (to C twice, and to D once)
B: 3 connections (to C twice, and to D once)
C: 5 connections (to A twice, to B twice and to D once)
D: 3 connections (to A once, to B once and to C once)

This kind of walk, where each edge is used once and only once and which returns to its starting point, is called an Eulerian cycle. A connected graph has an Eulerian cycle if and only if all of its vertices have even degrees.

The number of connections for a node is called the degree of the node.
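The degree argument can be checked with a few lines of Python. This is only a minimal sketch: the names bridges, degrees, and odd_degree_nodes are illustrative choices, not something defined by the book, and the edge list simply restates the seven bridges described above.

```python
from collections import Counter

# One (land mass, land mass) pair per bridge, as described in the text.
bridges = [
    ("A", "C"), ("A", "C"),
    ("C", "B"), ("C", "B"),
    ("A", "D"), ("C", "D"), ("B", "D"),
]


def degrees(edges):
    """Count how many edge endpoints touch each node."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return dict(counts)


def odd_degree_nodes(edges):
    """Return the sorted list of nodes whose degree is odd."""
    return sorted(n for n, d in degrees(edges).items() if d % 2 == 1)


konigsberg_degrees = degrees(bridges)
odd_nodes = odd_degree_nodes(bridges)

print(konigsberg_degrees)
print(odd_nodes)
```

All four nodes end up with an odd degree, so the even-degree condition fails and no walk can cross each bridge exactly once and return to its starting point.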

Graph definition

This leads us to the mathematical definition of a graph:

A graph G = (V, E) is a pair of:

V, a set of nodes or vertices: the islands in our previous example
E, a set of edges connecting nodes: the bridges

The Königsberg graph illustrated on the right of the preceding image can then be defined as follows:

V = [A, B, C, D]
E = [ 
      (A, C), 
      (A, C),
      (C, B),
      (C, B),
      (A, D),
      (C, D),
      (B, D)
]
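One possible way to hold this definition in memory is an adjacency list, which maps each node to the neighbors it can reach over one edge. The sketch below is an assumption about representation, not the book's own code; build_adjacency is a hypothetical helper name. Repeated neighbors are kept on purpose so that the doubled bridges stay visible.

```python
# The Königsberg graph exactly as defined above.
V = ["A", "B", "C", "D"]
E = [
    ("A", "C"), ("A", "C"),
    ("C", "B"), ("C", "B"),
    ("A", "D"), ("C", "D"), ("B", "D"),
]


def build_adjacency(nodes, edges):
    """Map every node to the list of its neighbors (undirected edges)."""
    adjacency = {node: [] for node in nodes}
    for u, v in edges:
        # An undirected edge is reachable from both of its endpoints.
        adjacency[u].append(v)
        adjacency[v].append(u)
    return adjacency


adjacency = build_adjacency(V, E)
for node in V:
    print(node, "->", adjacency[node])
```

The length of each neighbor list is the degree of the corresponding node, so the same structure supports both traversal and degree counting.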

Graphs, like many mathematical objects, are well defined. Some mathematical objects are hard to visualize at all; graphs have the opposite problem, since there is an almost infinite number of ways to draw them.

Visualization

Apart from very special cases, there is no single way to draw a graph and visualize it. Indeed, graphs are most often an abstract representation of reality. For instance, all four graphs depicted in the following image represent the exact same set of nodes and edges, so, by definition, the same mathematical graph:

We cannot rely only on our eyes to find patterns within graphs. For instance, looking only at the lower-right plot, it would be impossible to see the pattern that is visible in the upper-right plot. That's where graph algorithms come into play. They will be discussed in more detail in Chapter 6, Node Importance, and Chapter 7, Community Detection and Similarity Measures.

Examples of graphs

Now that we have a better idea of what a graph is, it's time to look at some more examples to understand what graphs can be useful for.

Networks

With the graph definition in mind (a set of nodes connected to each other via edges) and the bridges example from the last section, we can easily imagine how all kinds of networks can be seen as graphs, including road networks, computer networks, or even social networks.

Road networks

Road networks are a perfect example of graphs. In such networks, the nodes are the road intersections, and edges are the roads themselves, as you can see in the following image:

This image shows the road network around Central Park in New York City, where junctions are the nodes and the streets between them are the edges.

With road networks, many questions can be answered with graph analysis, such as the following:

What is the shortest path between two points (nodes)?
How long is this shortest path?
Are there alternative routes?
How can you visit all nodes within a list in a minimal amount of time?

This last question is especially important for parcel delivery, where minimizing the number of miles driven helps maximize the number of delivered parcels and satisfied customers.
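To make the first two questions concrete, here is a minimal sketch of Dijkstra's algorithm, one classic way to find a shortest path when every road has a non-negative cost. The road network, its junction names, and the travel times are invented for illustration, and shortest_path is a hypothetical helper, not an API of Neo4j or of the Graph Data Science Library.

```python
import heapq

# Hypothetical road network: junction -> {neighboring junction: minutes}.
roads = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 4, "C": 1, "D": 5},
    "C": {"A": 2, "B": 1, "D": 8},
    "D": {"B": 5, "C": 8},
}


def shortest_path(graph, source, target):
    """Return (path, total cost) from source to target using Dijkstra."""
    best = {source: 0}      # cheapest known cost to reach each node
    previous = {}           # node -> node it was reached from
    queue = [(0, source)]   # priority queue ordered by cost
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue        # stale entry, a cheaper route was already found
        for neighbor, weight in graph[node].items():
            candidate = cost + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if target not in best:
        return None, float("inf")
    # Walk back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[target]


path, minutes = shortest_path(roads, "A", "D")
print(path, minutes)
```

In this toy network the direct-looking route through B is not the cheapest one: going through C first saves a minute, which is the kind of result that is hard to spot by eye on a larger map.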

We will go into more detail about this topic in Chapter 4, The Graph Data Science Library and Path Finding.

Computer networks

In a computer network, each computer/router is a node and the cables between them are the edges. The following image illustrates some possible topologies used for a computer network (credit: https://commons.wikimedia.org/wiki/File:NetworkTopologies.svg):

You can now draw the parallel with the graph definition we discovered in the last section. Here again, the graph structure helps in answering some common questions you may ask yourself about your network:

How fast will this information be transferred from A to B? This sounds like a shortest path issue.
Which of my nodes is the most critical one? By critical, we mean that if this node stops working for some reason, the whole network will be impacted. Not all nodes have the same impact on the network. That's where centrality algorithms come into play (see Chapter 6, Node Importance).
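One simple reading of "critical" is a node whose failure splits the network into disconnected pieces. The sketch below tests that by removing each node in turn and checking whether the remaining nodes can still reach each other. It is a deliberately naive stand-in, not one of the centrality algorithms from Chapter 6, and the network, device names, and function names are all hypothetical.

```python
# Hypothetical network: device -> set of directly cabled devices.
network = {
    "pc1": {"switch"},
    "pc2": {"switch"},
    "switch": {"pc1", "pc2", "router"},
    "router": {"switch", "server"},
    "server": {"router"},
}


def is_connected(graph, removed=None):
    """Check whether all nodes except `removed` can still reach each other."""
    nodes = [n for n in graph if n != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for neighbor in graph[current]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(nodes)


def critical_nodes(graph):
    """Return the nodes whose removal disconnects the rest of the graph."""
    return sorted(n for n in graph if not is_connected(graph, removed=n))


critical = critical_nodes(network)
print(critical)
```

Removing one node at a time costs a full traversal per node, which is fine for a sketch but would be replaced by a dedicated algorithm on a large network.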

Social networks

Facebook, LinkedIn, and all of our favorite social networks use graphs to model their users and interactions. In the most basic example of a social graph, nodes represent people, and edges the friendship or professional relationship between them, as illustrated in the following image:

Here again, graphs allow us to see the data from a different perspective. For instance, you may have seen this kind of information when looking at someone’s profile on LinkedIn:

In that case, it tells us that the connected user (me) is separated from Clark Kent by just two intermediate people. In other words, one person in my network is already connected to a person who is connected to Clark Kent. The following image illustrates this more clearly, in terms of degrees of separation:

You've probably heard about the Six Degrees of Separation theory. In 1929, the Hungarian journalist Frigyes Karinthy proposed a theory according to which each person on Earth is at most six connections away from any other person. In other words, if you want to talk to one person, say Barack Obama, a friend of yours has a friend whose friend has a friend... who knows Barack Obama and can introduce you to him. According to Karinthy, this connection chain contains at most six connections, or seven people in total, including you and Barack Obama.

Given that there are more than 7 billion human beings on Earth, that's a surprisingly small number! With the large datasets that are available nowadays, such as the friendship connections from Facebook or instant-messaging exchanges from Microsoft, researchers have tried to test the preceding statement. From the Microsoft Messenger dataset, for instance, it was shown in 2008 that the average degree of separation between 180 billion distinct pairs of people was around 6.6. But this is just an average, and the number of hops needed to connect two people could go up to 29 with that dataset.
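The number of hops between two people can be computed with a breadth-first search, which explores the graph one level of friends at a time. The following is a minimal sketch on an invented friendship graph; the people, the friends dictionary, and the degrees_of_separation function are assumptions made for illustration.

```python
from collections import deque

# Hypothetical, symmetric friendship graph.
friends = {
    "me": ["alice", "bob"],
    "alice": ["me", "carol"],
    "bob": ["me"],
    "carol": ["alice", "clark"],
    "clark": ["carol"],
    "dave": [],
}


def degrees_of_separation(graph, start, goal):
    """Return the fewest hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph[person]:
            if friend not in distance:
                distance[friend] = distance[person] + 1
                if friend == goal:
                    return distance[friend]
                queue.append(friend)
    return None


hops_to_clark = degrees_of_separation(friends, "me", "clark")
hops_to_dave = degrees_of_separation(friends, "me", "dave")
print(hops_to_clark, hops_to_dave)
```

Because breadth-first search visits people in order of increasing distance, the first time it reaches the goal is guaranteed to be along a shortest chain. Averaging this value over many pairs gives the kind of figure quoted above.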

Many other kinds of analyses can be performed over social graphs:

Node importance: Again, it might be very useful to have an idea of which nodes (persons) are the most important. However, the definition of importance here differs from the one used for a computer network, since it is very unlikely that a single person leaving social media will make the whole network collapse. Influencers, on the other hand, are of particular interest to marketing experts.
Community detection: Also called clustering, this is a way to find groups of nodes sharing some characteristics. For instance, finding users who share the same interests, or visit the same places, can be used to recommend products to them.
Link prediction: With a graph, you can build models that predict whether two entities are likely to be connected in the future. Here again, recommendation engines are one possible application of such a tool.
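As a small illustration of link prediction, one of the simplest heuristics scores each pair of unconnected people by the number of friends they have in common: the more shared friends, the more plausible a future connection. This is a sketch of that common-neighbors heuristic only, not a description of how any real social network does it, and the friendships data and function name are invented.

```python
from itertools import combinations

# Hypothetical, symmetric friendship graph: person -> set of friends.
friendships = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "eve"},
    "dan": {"ann", "eve"},
    "eve": {"cat", "dan"},
}


def common_neighbor_scores(graph):
    """Score each unconnected pair by its number of shared neighbors."""
    scores = {}
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already connected, nothing to predict
        shared = graph[u] & graph[v]
        if shared:
            scores[(u, v)] = len(shared)
    return scores


scores = common_neighbor_scores(friendships)
# Show the most likely new links first.
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, score)
```

The same score can drive a basic recommendation: suggest to each user the non-friend with whom they share the most friends.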

You can find more information about the Facebook graph as an example at https://developers.facebook.com/docs/graph-api.

As you can see, networks of all kinds are very well suited to graph databases. But we can go far beyond that view and imagine all kinds of data as a graph, which will open up a lot of new perspectives.

Your data is also a graph

You may have noticed that in the previous image of a social graph, the edges have names. Indeed, some people are friends, while some others have a father/son relationship. Now, let's imagine we can have any kind of relationship, meaning we can start connecting different kinds of entities. For instance, a person lives in a particular country, so they are connected to that country with a relationship of type LIVES_IN. Are you beginning to see the point? With that kind of reasoning, the world itself is a graph and your business is a subpart of it.
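The idea of typed relationships can be sketched in plain Python as a list of (start, type, end) triples. Only LIVES_IN comes from the text; the people, the country, the FRIEND_OF type, and the helper functions are hypothetical, and this is not how Neo4j stores or queries data.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    """A directed, typed edge between two named entities."""
    start: str
    rel_type: str
    end: str


# Hypothetical data mixing two kinds of entities: people and countries.
relationships = [
    Relationship("Alice", "LIVES_IN", "France"),
    Relationship("Bob", "LIVES_IN", "France"),
    Relationship("Alice", "FRIEND_OF", "Bob"),
]


def targets(rels, start, rel_type):
    """Entities reached from `start` over relationships of `rel_type`."""
    return [r.end for r in rels if r.start == start and r.rel_type == rel_type]


def sources(rels, rel_type, end):
    """Entities that point to `end` over relationships of `rel_type`."""
    return [r.start for r in rels if r.rel_type == rel_type and r.end == end]


where_alice_lives = targets(relationships, "Alice", "LIVES_IN")
who_lives_in_france = sources(relationships, "LIVES_IN", "France")
print(where_alice_lives, who_lives_in_france)
```

Once the relationship type is part of the data, the same structure can answer questions about friendships and about geography without any change to its shape.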

Graphs are about relationships, and the world is connected, meaning there are relationships everywhere. We’ll talk about this in more detail in Chapter 3, Empowering Your Business with Pure Cypher, which is dedicated to knowledge graphs.

Graph databases allow you to model the data in that way: nodes, connected by relationships of some type. Let's see how to migrate data stored in relational databases to graph databases.


Author (1)

Estelle Scifo

Estelle Scifo has over 7 years of experience as a data scientist, after receiving her PhD from the Laboratoire de l'Accélérateur Linéaire, Orsay (affiliated to CERN in Geneva). As a Neo4j certified professional, she uses graph databases on a daily basis and takes full advantage of their features to build efficient machine learning models out of this data. In addition, she is a data science mentor who guides newcomers into the field. Her domain expertise and deep insight into beginners' needs make her an excellent teacher.