
You're reading from Graph Data Science with Neo4j

Product type: Book
Published in: Jan 2023
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781804612743
Edition: 1st Edition
Author: Estelle Scifo

Estelle Scifo has over 7 years of experience as a data scientist, after receiving her PhD from the Laboratoire de l'Accélérateur Linéaire, Orsay (affiliated with CERN in Geneva). As a Neo4j certified professional, she uses graph databases on a daily basis and takes full advantage of their features to build efficient machine learning models from this data. She is also a data science mentor who guides newcomers into the field. Her domain expertise and deep insight into beginners' needs make her an excellent teacher.


Neo4j in the graph databases landscape

Even when restricting the scope to graph databases, there are still different ways to envision such data stores:

  • Resource Description Framework (RDF): Each record is a triple of the form Subject-Predicate-Object. A triple expresses a relationship of a certain type (the predicate) between a subject and an object; for instance:
    Alice(Subject) KNOWS(Predicate) Bob(Object)

Well-known knowledge bases such as DBpedia and Wikidata use the RDF format. We will talk about this a bit more in the next chapter (Chapter 2, Using Existing Data to Build a Knowledge Graph).

  • Labeled-property graph (LPG): A labeled-property graph contains nodes and relationships. Both of these entities can be labeled (for instance, Alice and Bob are nodes with the Person label, and the relationship between them has the KNOWS label, called a relationship type in Neo4j) and can have properties (people have names; an acquaintance relationship can store the date when both people first met as a property).
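To make the difference between the two models concrete, here is a minimal Python sketch that encodes the Alice KNOWS Bob example both ways using plain data structures. It is not how a real triple store or Neo4j stores data: the `Node` and `Relationship` classes, the `objects` helper, and the `since` value are hypothetical, and RDF identifiers are simplified to plain strings (real RDF uses IRIs and typed literals).

```python
from dataclasses import dataclass, field

# RDF view: every fact, including node attributes, is a (subject, predicate, object) triple.
# Identifiers are simplified to plain strings; real RDF uses IRIs and typed literals.
rdf_triples = [
    ("Alice", "type", "Person"),
    ("Bob", "type", "Person"),
    ("Alice", "name", "Alice"),
    ("Bob", "name", "Bob"),
    ("Alice", "KNOWS", "Bob"),
]


def objects(triples, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# LPG view: nodes and relationships are first-class records with their own properties.
@dataclass
class Node:
    id: int
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int  # id of the start node
    end: int    # id of the end node
    type: str   # relationship type, e.g. KNOWS
    properties: dict = field(default_factory=dict)


alice = Node(1, {"Person"}, {"name": "Alice"})
bob = Node(2, {"Person"}, {"name": "Bob"})
# Hypothetical "since" value: the property lives directly on the relationship.
knows = Relationship(alice.id, bob.id, "KNOWS", {"since": 2015})

print(objects(rdf_triples, "Alice", "KNOWS"))  # ['Bob']
print(knows.type, knows.properties)            # KNOWS {'since': 2015}
```

Attaching the `since` value to the KNOWS relationship in plain RDF would require extra modelling (for example, reification or RDF-star), which is one reason the LPG model often feels more natural for this kind of data.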

Neo4j is a labeled-property graph database. Even within this category, just as MySQL, PostgreSQL, and Microsoft SQL Server are all relational databases, you will find different vendors offering LPG databases. They differ in many aspects:

  • Whether they use a native graph engine or not: As we discussed earlier, it is possible to use a KV store or even a SQL database to store graph data. In this case, we talk about non-native storage engines, since the storage does not reflect the graph structure of the data.
  • The query language: Unlike SQL, the query language for graph data has not yet been standardized, although an ongoing effort is being led by the GQL group (see, for instance, https://gql.today/). Neo4j uses Cypher, a declarative query language developed by the company in 2011 and later open-sourced through the openCypher project, allowing other databases to use the same language (see, for instance, RedisGraph or Amazon Neptune). Other vendors have created their own languages (for instance, AQL for ArangoDB or GSQL for TigerGraph). To me, this is a key point to take into account, since the learning curve can be very different from one language to another. Cypher has the advantage of being very intuitive: a few minutes are enough to start writing your own queries.
  • Whether they offer integrated support for graph analytics and data science.
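To illustrate what a non-native storage engine means, the sketch below layers a tiny graph on top of a key-value store, simulated here by a Python dict holding JSON strings. The `KVGraph` class and its key scheme (`node:<id>`, `out:<id>`) are hypothetical. The store only sees opaque keys and values, so every hop in a traversal becomes one more key lookup and deserialization. The `friends_of_friends` method computes roughly what this Cypher query expresses in a single pattern: `MATCH (a:Person {name: 'Alice'})-[:KNOWS]->()-[:KNOWS]->(fof) WHERE fof <> a RETURN DISTINCT fof.name`.

```python
import json


class KVGraph:
    """Hypothetical graph layer built on a plain key-value store (a dict of strings)."""

    def __init__(self):
        self.kv = {}  # stands in for an external KV store: str -> str

    def add_node(self, node_id, **properties):
        self.kv[f"node:{node_id}"] = json.dumps(properties)
        self.kv.setdefault(f"out:{node_id}", json.dumps([]))

    def add_relationship(self, start, rel_type, end):
        # Read-modify-write of the serialized adjacency list stored under "out:<start>".
        key = f"out:{start}"
        edges = json.loads(self.kv.get(key, "[]"))
        edges.append([rel_type, end])
        self.kv[key] = json.dumps(edges)

    def neighbors(self, node_id, rel_type):
        # Each hop costs a key lookup plus JSON decoding; the store knows nothing about graphs.
        edges = json.loads(self.kv.get(f"out:{node_id}", "[]"))
        return [end for t, end in edges if t == rel_type]

    def friends_of_friends(self, node_id):
        result = set()
        for friend in self.neighbors(node_id, "KNOWS"):
            result.update(self.neighbors(friend, "KNOWS"))
        result.discard(node_id)  # mirrors WHERE fof <> a in the Cypher version
        return result


g = KVGraph()
for person in ("alice", "bob", "carol"):
    g.add_node(person, name=person.capitalize())
g.add_relationship("alice", "KNOWS", "bob")
g.add_relationship("bob", "KNOWS", "carol")
g.add_relationship("bob", "KNOWS", "alice")

print(g.friends_of_friends("alice"))  # {'carol'}
```

All the graph logic lives in application code here; a native engine stores relationships as part of its on-disk structure, so traversals do not need this extra translation layer.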

A note about performance

Almost every vendor claims to be the best one, at least in some respects. This book won’t add to that debate. If performance is crucial for your application, the best option is to test the candidates with a scenario close to your final goal in terms of data volume and types of queries or analyses.
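Following this advice usually means timing the same representative workload against each candidate. Below is a minimal, database-agnostic Python harness, assuming each candidate is wrapped in a zero-argument callable that runs your queries (for example, through the vendor's driver) against a realistic data volume. The `benchmark` function and the dummy workloads are hypothetical placeholders.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(candidates: Dict[str, Callable[[], object]],
              repeats: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Return the median wall-clock time (in seconds) of each candidate's workload."""
    results: Dict[str, float] = {}
    for name, run_workload in candidates.items():
        for _ in range(warmup):
            run_workload()  # warm caches and connections before measuring
        timings: List[float] = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_workload()
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results


# Dummy workloads standing in for "run my representative queries on database X".
timings = benchmark({
    "candidate_a": lambda: sum(range(100_000)),
    "candidate_b": lambda: sorted(range(100_000), reverse=True),
})
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds * 1000:.2f} ms")
```

Reporting the median over several runs after a warm-up reduces the influence of cold caches and occasional outliers; for a real comparison, the workloads should reflect your actual query mix and data volume.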

Neo4j ecosystem

The Neo4j database is already very helpful by itself, but the number of extensions, libraries, and applications built around it makes it the most complete solution. In addition, it has a very active community whose members are always keen to help each other, which is one more reason to choose it.

The core capabilities of the Neo4j database can be extended with plugins. Awesome Procedures on Cypher (APOC), a widely used Neo4j extension, provides procedures that extend both the database and Cypher capabilities. We will use it later in this book to load JSON data.
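As a preview of what calling an APOC procedure looks like, here is a hedged Python sketch that runs `apoc.load.json` through the official `neo4j` Python driver. It assumes APOC is installed, that the JSON document at `url` has the hypothetical shape `{"people": [{"name": ...}, ...]}`, and that you pass your own connection details; `LOAD_JSON_QUERY` and `load_people` are illustrative names, not part of APOC.

```python
# Cypher query calling the APOC procedure apoc.load.json (requires the APOC plugin).
# The "people" key is a hypothetical document shape: {"people": [{"name": "Alice"}, ...]}.
LOAD_JSON_QUERY = """
CALL apoc.load.json($url) YIELD value
UNWIND value.people AS person
MERGE (p:Person {name: person.name})
RETURN count(p) AS loaded
"""


def load_people(uri: str, user: str, password: str, url: str) -> int:
    """Merge Person nodes from a JSON document and return the number of rows processed."""
    # Imported lazily so the module can be read without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            record = session.run(LOAD_JSON_QUERY, url=url).single()
            return record["loaded"]
```

A call would look like `load_people("bolt://localhost:7687", "neo4j", "<your-password>", "<url-of-your-json-file>")`, with placeholders replaced by your own values.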

The main plugin we will explore in this book is the Graph Data Science (GDS) Library. Its predecessor, the Graph Algorithms Library, was first released in 2018 by the Neo4j Labs team. It was quickly replaced by the Graph Data Science Library, a fully production-ready plugin with improved performance. Algorithms are regularly improved and added. Version 2.0, released in 2021, takes graph data science even further, allowing us to train models and build analysis pipelines directly from the library. It also comes with a handy Python client that makes it easy to include graph algorithms in your usual machine learning workflows, whether you use scikit-learn or other machine learning libraries such as TensorFlow or PyTorch.
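To give an idea of the Python client, the sketch below uses the `graphdatascience` package to project an in-memory graph of hypothetical `Person` nodes and `KNOWS` relationships, stream PageRank scores into a pandas DataFrame, and drop the projection afterwards. The graph name `people-graph`, the labels, and the connection details are assumptions about your data and setup.

```python
def top_pagerank(uri: str, user: str, password: str, limit: int = 5):
    """Return the `limit` nodes with the highest PageRank score as a pandas DataFrame."""
    # Imported lazily so the sketch can be read without the client installed.
    from graphdatascience import GraphDataScience

    gds = GraphDataScience(uri, auth=(user, password))
    # Project Person nodes and KNOWS relationships into the in-memory graph catalog.
    G, _ = gds.graph.project("people-graph", "Person", "KNOWS")
    try:
        scores = gds.pageRank.stream(G)  # DataFrame with nodeId and score columns
        return scores.sort_values("score", ascending=False).head(limit)
    finally:
        G.drop()     # free the in-memory projection
        gds.close()  # close the underlying driver connection
```

Because the result is an ordinary DataFrame, it can be joined with other features and passed to scikit-learn or another machine learning library without any extra conversion step.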

Besides plugins, there are also many applications that help us work with Neo4j and explore the data it contains. The first application we will use is Neo4j Desktop, which lets us manage several Neo4j databases along with their installed plugins and applications. Continue reading to learn how to use it.

Applications installed in Neo4j Desktop are granted access to your active database. While reading this book, you will use the following:

  • Neo4j Browser: A simple but powerful application that lets you write Cypher queries and visualize the result as a graph, table, or JSON:

Figure 1.4 – Neo4j Browser

  • Neo4j Bloom: A graph visualization application in which you can customize node styles (size, color, and so on) based on their labels and/or properties:

Figure 1.5 – Neo4j Bloom

  • NeoDash: A dashboard application that lets you draw plots from the data stored in Neo4j without first extracting this data into a DataFrame. Plots can be organized into dashboards that can be shared with other users:

Figure 1.6 – NeoDash
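For comparison, here is the extract-then-plot route that NeoDash lets you skip: a hedged Python sketch that pulls query results into a pandas DataFrame with the official `neo4j` driver. It assumes driver version 5.x (where `Driver.execute_query` and `Result.to_df` are available) and an installed pandas; the query and the `people_to_dataframe` name are hypothetical.

```python
# Hypothetical query: number of acquaintances per person.
PEOPLE_QUERY = """
MATCH (p:Person)-[:KNOWS]->(friend)
RETURN p.name AS name, count(friend) AS friends
ORDER BY friends DESC
"""


def people_to_dataframe(uri: str, user: str, password: str):
    """Run PEOPLE_QUERY and return its rows as a pandas DataFrame."""
    import neo4j  # imported lazily; requires the neo4j driver (5.x) and pandas

    with neo4j.GraphDatabase.driver(uri, auth=(user, password)) as driver:
        # With result_transformer_, execute_query returns the transformer's output directly.
        return driver.execute_query(PEOPLE_QUERY, result_transformer_=neo4j.Result.to_df)
```

From there, a chart still needs a plotting call such as `df.plot.bar(x="name", y="friends")` (which requires matplotlib), whereas NeoDash builds the report straight from the Cypher query.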

This list of applications is non-exhaustive. You can find out more here: https://install.graphapp.io/.

Good to know

You can create your own graph application to run within Neo4j Desktop. This is why so many different applications exist, some of them developed by community members or Neo4j partners.

This section described Neo4j as a database and the various extensions that can be added to make it more powerful. Now it is time to start using it. In the following section, you are going to install Neo4j locally on your computer so that you can run the code examples provided in this book (which you are highly encouraged to do!).
