
You're reading from  Graph Data Science with Neo4j

Product type: Book
Published in: Jan 2023
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781804612743
Edition: 1st Edition
Author (1)
Estelle Scifo

Estelle Scifo has over 7 years of experience as a data scientist, after receiving her PhD from the Laboratoire de l'Accélérateur Linéaire, Orsay (affiliated to CERN in Geneva). As a Neo4j certified professional, she uses graph databases on a daily basis and takes full advantage of their features to build efficient machine learning models out of this data. In addition, she is a data science mentor who guides newcomers into the field. Her domain expertise and deep insight into beginners' needs make her an excellent teacher.

Preface

Data science is now a core component of many companies and organizations, which take advantage of its predictive power to improve their products or better understand their customers. It is an ever-evolving field, still undergoing intense research. One of the fastest-growing research areas is graph data science (GDS), which studies how representing data as a connected network can improve models.

Among the different tools on the market to work with graphs, Neo4j, a graph database, is popular among developers for its ability to build simple and evolving data models and query data easily with Cypher. For a few years now, it has also stood out as a leader in graph analytics, especially since the release of the first version of its GDS library, which allows you to run graph algorithms on data stored in Neo4j, even at a large scale.

This book is designed to guide you through the field of GDS, always using Neo4j and its GDS library as the main tool. By the end of this book, you will be able to run your own GDS model on a graph dataset you created. You will even be able to pass the Neo4j Data Science certification to prove your new skills to the world.

Who this book is for

This book is for people who are curious about graphs and how this data structure can be useful in data science. It can serve both data scientists who are learning about graphs and Neo4j developers who want to get into data science.

The book assumes minimal data science knowledge (classification, training sets, confusion matrices) and some experience with Python and its related data science toolkit (pandas, matplotlib, and scikit-learn).
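As a rough gauge of the assumed level, here is a minimal sketch of a binary confusion matrix computed by hand. It is not taken from the book: the function name and the labels are made up for illustration. If the code reads comfortably, the data science prerequisites should not be an obstacle.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) counts for a binary classifier."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted positive, actually positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Made-up labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
precision = tp / (tp + fp)  # share of positive predictions that are correct
recall = tp / (tp + fn)     # share of actual positives that are found
```

In practice, scikit-learn provides an equivalent in `sklearn.metrics.confusion_matrix`; the hand-written version is used here only to keep the sketch dependency-free.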

What this book covers

Chapter 1, Introducing and Installing Neo4j, introduces the basic principles of graph databases and gives instructions on how to set up Neo4j locally, create your first graph, and write your first Cypher queries.

Chapter 2, Using Existing Data to Build a Knowledge Graph, guides you through loading data into Neo4j from different formats (CSV, JSON, and an HTTP API). This is where you will build the dataset that will be used throughout this book.

Chapter 3, Characterizing a Graph Dataset, introduces some key metrics to differentiate one graph dataset from another.

Chapter 4, Using Graph Algorithms to Characterize a Graph Dataset, goes deeper into understanding a graph dataset by using graph algorithms. This is the chapter where you will start to use the Neo4j GDS plugin.

Chapter 5, Visualizing Graph Data, delves into graph data visualization by drawing nodes and edges, starting from static representations and moving on to dynamic ones.

Chapter 6, Building a Machine Learning Model with Graph Features, talks about machine learning model training using scikit-learn. This is where we will first use the GDS Python client.

Chapter 7, Automating Feature Extraction with Graph Embeddings for Machine Learning, introduces the concept of node embedding, with practical examples using the Neo4j GDS library.

Chapter 8, Building a GDS Pipeline for Node Classification Model Training, introduces the topic of node classification within GDS without involving a third-party tool.

Chapter 9, Predicting Future Edges, gives a short introduction to the topic of link prediction, a graph-specific machine learning task.

Chapter 10, Writing Your Custom Graph Algorithms with the Pregel API in Java, covers the exciting topic of building an extension for the GDS plugin.

To get the most out of this book

You will need access to a Neo4j instance. Options and installation instructions are given in Chapter 1, Introducing and Installing Neo4j. We will also make heavy use of Python and the following packages: pandas, scikit-learn, networkx, and graphdatascience. The code was tested with Python 3.10 but should work with newer versions, provided its dependencies introduce no breaking changes. The Python code is provided as Jupyter notebooks, so you’ll need a Jupyter server installed and running to go through it.
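Before opening the notebooks, it can help to confirm that the Python packages are installed. The following is a minimal sketch, not part of the book's code: it assumes the PyPI distribution names pandas, scikit-learn, networkx, and graphdatascience, and the helper name check_environment is purely illustrative.

```python
import sys
from importlib import metadata

# Distribution names as published on PyPI (an assumption, not taken from the book).
REQUIRED = ["pandas", "scikit-learn", "networkx", "graphdatascience"]


def check_environment(packages=REQUIRED):
    """Map each package name to its installed version, or None if it is missing."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    for name, version in check_environment().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can then be installed with pip before starting Jupyter.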

For the very last chapter, a Java JDK will also be required. The code was tested with OpenJDK 11.

Software/hardware covered in the book | Operating system requirements
Neo4j 5.x | Windows, macOS, or Linux
Python 3.10 | Windows, macOS, or Linux
Jupyter | Windows, macOS, or Linux
OpenJDK 11 | Windows, macOS, or Linux

You will also need to install two Neo4j plugins: APOC and GDS. Installation instructions for Neo4j Desktop are given in the relevant chapters. However, if you are not using a local Neo4j instance, please refer to the official APOC and GDS documentation for installation instructions, especially regarding version compatibility.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Graph-Data-Science-with-Neo4j. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.”

A block of code is set as follows:

CREATE (:Movie {
    id: line.show_id,
    title: line.title,
    releaseYear: line.release_year
})

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

LOAD CSV WITH HEADERS
FROM 'file:///netflix/netflix_titles.csv' AS line
WITH split(line.director, ",") as directors_list
UNWIND directors_list AS director_name
CREATE (:Person {name: trim(director_name)})

Any command-line input or output is written as follows:

$ mkdir css
$ cd css

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “Select System info from the Administration panel.”

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Graph Data Science with Neo4j, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere? Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application. 

The perks don’t stop there: you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

  1. Scan the QR code or visit the link below:

     https://packt.link/free-ebook/9781804612743

  2. Submit your proof of purchase
  3. That’s it! We’ll send your free PDF and other benefits to your email directly