
You're reading from Seven NoSQL Databases in a Week

Product type: Book
Published in: Mar 2018
Publisher: Packt
ISBN-13: 9781787288867
Edition: 1st Edition
Tools: MongoDB, Cassandra
Concepts: Database Programming
Authors (2): Sudarshan Kadambi, Xun (Brian) Wu


Chapter 3. Neo4j

Some application use cases or data models may place as much (or more) importance on the relationships between entities as the entities themselves. When this is the case, a graph database may be the optimal choice for data storage. In this chapter, we will look at Neo4j, one of the most commonly used graph databases.

Over the course of this chapter, we will discuss several aspects of Neo4j:

Useful features
Appropriate use cases
Anti-patterns and pitfalls
Ways of using Neo4j with languages such as:
- Cypher
- Python
- Java

Once you have completed this chapter, you will begin to understand the significance of graph databases. You will have worked through installing and configuring Neo4j as you build up your own server. You will have employed simple scripts and code to interact with and utilize Neo4j, allowing you to further explore ideas around modeling interconnected data.

We'll start with a quick introduction to Neo4j. From there, we will move on to the appropriate graph database use cases...

What is Neo4j?

Neo4j is an open source, distributed data store used to model graph problems. It was released in 2007 and is sponsored by Neo4j, Inc., which also offers enterprise licensing and support for Neo4j. It departs from the traditional nomenclature of database technologies: entities are stored in schema-less, entity-like structures called nodes. Nodes are connected to other nodes via relationships (also called edges), and can optionally be grouped together with structures called labels.

This relationship-centric approach to data modeling is known as the property graph model. Under the property graph model, both nodes and edges can have properties that store values. Neo4j embraces this approach: it is designed to store nodes and edges efficiently, so that nodes can share any number or type of relationships without sacrificing performance [8].
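To make the model concrete, here is a minimal Python sketch of a property graph held in memory. It is not Neo4j's storage format or API. The class and method names (`PropertyGraph`, `add_node`, `relate`, `outgoing`) are hypothetical and exist only to show labeled nodes and typed relationships, both of which carry properties. The usage at the end builds roughly what the Cypher statement `CREATE (a:Person {name: 'Alice'})-[:FRIEND {since: 2015}]->(b:Person {name: 'Bob'})` would create.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    node_id: int
    labels: set = field(default_factory=set)        # optional grouping, e.g. {"Person"}
    properties: dict = field(default_factory=dict)  # schema-less key/value pairs


@dataclass
class Relationship:
    rel_type: str   # e.g. "FRIEND"
    start: int      # id of the start node
    end: int        # id of the end node
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph (hypothetical; not Neo4j's API)."""

    def __init__(self) -> None:
        self.nodes: dict = {}
        self.relationships: list = []

    def add_node(self, *labels: str, **properties: Any) -> Node:
        node = Node(len(self.nodes), set(labels), dict(properties))
        self.nodes[node.node_id] = node
        return node

    def relate(self, start: Node, rel_type: str, end: Node, **properties: Any) -> Relationship:
        rel = Relationship(rel_type, start.node_id, end.node_id, dict(properties))
        self.relationships.append(rel)
        return rel

    def outgoing(self, node: Node, rel_type: str) -> list:
        # Follow relationships of the given type that start at `node`.
        return [self.nodes[r.end] for r in self.relationships
                if r.start == node.node_id and r.rel_type == rel_type]


graph = PropertyGraph()
alice = graph.add_node("Person", name="Alice")
bob = graph.add_node("Person", name="Bob")
graph.relate(alice, "FRIEND", bob, since=2015)
print([n.properties["name"] for n in graph.outgoing(alice, "FRIEND")])  # ['Bob']
```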

How does Neo4j work?

Neo4j stores nodes, edges, and properties on disk in stores specific to each type. For example, nodes are stored in the node store [5, s.11]. This data is also held in two types of cache: the file system (FS) cache and the node/relationship caches. The FS cache is divided into regions for each type of store, and data is evicted according to a least-frequently-used (LFU) policy.
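The eviction policy itself is easy to illustrate. The following Python sketch shows the least-frequently-used idea in isolation. It is not Neo4j's cache implementation, and the page names in the example are hypothetical.

```python
class LFUCache:
    """Minimal least-frequently-used cache (illustration only).

    When the cache is full, the entry with the lowest access count is evicted;
    ties go to the key that was inserted earliest. Both get() and put() count
    as accesses.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.values = {}
        self.counts = {}   # key -> number of accesses (dicts keep insertion order)

    def get(self, key):
        if key not in self.values:
            return None
        self.counts[key] += 1
        return self.values[key]

    def put(self, key, value) -> None:
        if key not in self.values and len(self.values) >= self.capacity:
            # min() returns the first minimal key in insertion order.
            victim = min(self.counts, key=self.counts.__getitem__)
            del self.values[victim]
            del self.counts[victim]
        self.values[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1


cache = LFUCache(capacity=2)
cache.put("node_page_0", b"...")
cache.put("node_page_1", b"...")
cache.get("node_page_0")           # node_page_0 has now been used twice
cache.put("node_page_2", b"...")   # evicts node_page_1, the least frequently used
print(sorted(cache.values))        # ['node_page_0', 'node_page_2']
```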

Data is written in transactions assembled from commands. The commands are sorted at the time of creation to obtain a predictable update order, with the aim of preserving consistency. Writes are added to the transaction log and then either marked as committed or rolled back (in the event of a failure). Committed changes are then applied, in sorted order, to the store files on disk.
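The write path described above can be sketched as a toy model in Python. Everything here is an assumption made for illustration. The `Command` fields, the store names, and the in-memory dictionary that stands in for the store files are hypothetical and do not reflect Neo4j's actual log or record formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Command:
    # Field order defines the sort order: by store, then by record id.
    store: str        # hypothetical store name, e.g. "node" or "relationship"
    record_id: int
    value: str


class ToyDatabase:
    def __init__(self) -> None:
        self.log = []     # append-only transaction log
        self.files = {}   # stands in for the store files on disk

    def write(self, tx_id: int, commands, fail: bool = False) -> bool:
        ordered = tuple(sorted(commands))            # predictable update order
        self.log.append(("BEGIN", tx_id, ordered))   # log first...
        if fail:
            self.log.append(("ROLLBACK", tx_id))     # ...failed: never applied
            return False
        self.log.append(("COMMIT", tx_id))
        for cmd in ordered:                          # ...then apply in sorted order
            self.files[(cmd.store, cmd.record_id)] = cmd.value
        return True


db = ToyDatabase()
db.write(1, [Command("relationship", 7, "FRIEND"),
             Command("node", 3, "Bob"),
             Command("node", 1, "Alice")])
db.write(2, [Command("node", 1, "Alicia")], fail=True)
print(db.files[("node", 1)])           # Alice (transaction 2 was rolled back)
print([entry[0] for entry in db.log])  # ['BEGIN', 'COMMIT', 'BEGIN', 'ROLLBACK']
```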

It is important to note that transactions in Neo4j record the resulting state rather than a change relative to the current state, and are therefore idempotent by nature [5, s.34]. They do not directly modify the data. Reapplying transactions for a recovery event simply replays the transactions as of...
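The following toy Python sketch shows why a log that records resulting state can be replayed safely during recovery, while a log of deltas cannot. The record layout is hypothetical and is not Neo4j's on-disk format.

```python
# Each entry records the resulting state of a record, not a change to it.
state_log = [
    ("node", 1, {"name": "Alice", "visits": 3}),
    ("node", 2, {"name": "Bob", "visits": 1}),
    ("node", 1, {"name": "Alice", "visits": 4}),
]


def replay(store: dict, entries) -> dict:
    for store_name, record_id, state in entries:
        store[(store_name, record_id)] = dict(state)   # overwrite with logged state
    return store


clean = replay({}, state_log)
partial = replay({}, state_log[:1])     # crash after the first entry was applied
recovered = replay(partial, state_log)  # recovery replays the whole log
print(recovered == clean)                                 # True
print(replay(replay({}, state_log), state_log) == clean)  # True: replaying twice is harmless


# By contrast, a delta log ("add 1 visit") is not idempotent.
def replay_deltas(visits: int, deltas) -> int:
    for delta in deltas:
        visits += delta
    return visits


print(replay_deltas(replay_deltas(0, [1, 1]), [1, 1]))  # 4, not 2
```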

Features of Neo4j

Aside from its support of the property graph model, Neo4j has several other features that make it a desirable data store. Here, we will examine some of those features and discuss how they can be utilized in a successful Neo4j cluster.

Clustering

The Enterprise Edition of Neo4j offers horizontal scaling through two types of clustering. The first is typical high-availability (HA) clustering, in which several slave servers process data overseen by an elected master. If the master fails, a new master is elected from the remaining instances.
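A master election can be illustrated with a short Python sketch. The election rule used here is an assumption chosen for illustration: prefer the live instance with the most recent committed transaction, then the lowest server ID. It is not a description of Neo4j's actual HA election protocol, and the `Instance` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    server_id: int
    last_tx_id: int       # most recent committed transaction on this instance
    alive: bool = True


def elect_master(instances) -> Instance:
    candidates = [i for i in instances if i.alive]
    if not candidates:
        raise RuntimeError("no live instances available")
    # Assumed rule: most up-to-date instance wins; lower server_id breaks ties.
    return max(candidates, key=lambda i: (i.last_tx_id, -i.server_id))


cluster = [Instance(1, 120), Instance(2, 118), Instance(3, 120)]
master = elect_master(cluster)
print(master.server_id)      # 1 (ties with server 3 on tx 120, lower id wins)
master.alive = False         # simulate the master failing
new_master = elect_master(cluster)
print(new_master.server_id)  # 3 (most up-to-date surviving instance)
```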

The second type of clustering is known as causal clustering. This option provides additional features, such as disposable read replicas and built-in load balancing, which help abstract the distributed nature of the clustered database from the developer. It also supports causal consistency, which aims to provide Atomicity, Consistency, Isolation, and Durability (ACID)-compliant consistency in use cases where eventual consistency becomes problematic. Essentially...
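Neo4j's drivers expose causal consistency through bookmarks. A bookmark is a token returned after a write, which a later read can pass along so that it only sees data at least that recent. The following Python sketch is a toy model of that idea, not driver code. The member names, the integer bookmark, and the fallback to the leader are simplifying assumptions; a real cluster member can instead wait until it has caught up.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    applied_tx: int   # highest transaction id this member has applied


class ToyCausalCluster:
    def __init__(self, leader: Member, replicas) -> None:
        self.leader = leader
        self.replicas = list(replicas)

    def write(self) -> int:
        # Writes go to the leader; the new transaction id acts as the bookmark.
        self.leader.applied_tx += 1
        return self.leader.applied_tx

    def catch_up(self, replica: Member) -> None:
        replica.applied_tx = self.leader.applied_tx

    def route_read(self, bookmark: int) -> Member:
        # Only a member that has applied the bookmarked transaction may serve the read.
        for replica in self.replicas:
            if replica.applied_tx >= bookmark:
                return replica
        return self.leader   # simplification: fall back instead of waiting


leader = Member("core-1", 10)
replica_a, replica_b = Member("replica-1", 10), Member("replica-2", 10)
cluster = ToyCausalCluster(leader, [replica_a, replica_b])

bookmark = cluster.write()                # transaction 11
print(cluster.route_read(bookmark).name)  # core-1: no replica has seen tx 11 yet
cluster.catch_up(replica_b)
print(cluster.route_read(bookmark).name)  # replica-2
```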

Evaluating your use case

Because of Neo4j's focus on node/edge traversal, it is a good fit for use cases requiring analysis and examination of relationships. The property graph model helps to define those relationships in meaningful ways, enabling the user to make informed decisions. Bearing that in mind, there are several use cases for Neo4j (and other graph databases) that seem to fit naturally.

Social networks

Social networks seem to be a natural fit for graph databases. Individuals have friends, attend events, check in to geographical locations, create posts, and send messages. All of these different aspects can be tracked and managed with a graph database such as Neo4j.

Who can see a certain person's posts? Friends? Friends of friends? Who will be attending a certain event? How is a person connected to others attending the same event? In small numbers, these problems could be solved with a number of data stores. But what about an event with several thousand people attending, where each...
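Questions such as "who can see a post shared with friends of friends?" are bounded traversals. In Cypher, this might be written as `MATCH (p:Person {name: 'Alice'})-[:FRIEND*1..2]-(fof:Person) WHERE fof <> p RETURN DISTINCT fof.name`. The Python sketch below performs the same kind of breadth-first traversal over a hypothetical in-memory adjacency map, to show the work the database does on your behalf.

```python
from collections import deque


def within_hops(friends: dict, start: str, max_hops: int) -> set:
    """Return everyone reachable from `start` in 1..max_hops FRIEND hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    result = set()
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_hops:
            continue                      # do not expand beyond the hop limit
        for friend in friends.get(person, set()):
            if friend not in seen:
                seen.add(friend)
                result.add(friend)
                frontier.append((friend, depth + 1))
    return result


friends = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice", "Dave"},
    "Carol": {"Alice", "Erin"},
    "Dave": {"Bob", "Frank"},
    "Erin": {"Carol"},
    "Frank": {"Dave"},
}
print(sorted(within_hops(friends, "Alice", 2)))  # ['Bob', 'Carol', 'Dave', 'Erin']
```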

Neo4j anti-patterns

Relative to other NoSQL databases, Neo4j does not have a lot of anti-patterns. However, there are some common troubles that seem to befall new users, and we will try to detail them here.

Applying relational modeling techniques in Neo4j

Using relational modeling techniques can lead to trouble with almost every NoSQL database, and Neo4j is no exception. As with other NoSQL databases, building efficient models in Neo4j involves modeling around the required queries. Relational modeling requires you to focus on how your data is stored, and not as much on how it is queried or returned.

Modeling for Neo4j, by contrast, requires you to focus on what your nodes are and how they are related to each other. Additionally, the relationships should be driven by the types of questions (queries) that your model will answer. Failing to give your data model the proper amount of focus can lead to performance and operational problems later.
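As a small illustration of modeling around the question being asked, consider "who works at the same company as Alice?". The Python sketch below contrasts two approaches. The first is a relational habit: a foreign-key-style property that must be matched by scanning every record. The second is a graph-style model in which `WORKS_AT` relationships are stored as adjacency, so the answer is a two-hop traversal. The data and names are hypothetical. In Cypher, the graph version might read `MATCH (p:Person {name: 'Alice'})-[:WORKS_AT]->(:Company)<-[:WORKS_AT]-(coworker) RETURN coworker.name`.

```python
# Relational habit: a company_id property, "joined" by scanning every person.
people = [
    {"name": "Alice", "company_id": 1},
    {"name": "Bob", "company_id": 1},
    {"name": "Carol", "company_id": 2},
]


def coworkers_by_scan(name: str) -> set:
    me = next(p for p in people if p["name"] == name)
    return {p["name"] for p in people
            if p["company_id"] == me["company_id"] and p["name"] != name}


# Graph habit: WORKS_AT relationships stored as adjacency in both directions,
# so the query touches only the nodes connected to the starting person.
works_at = {"Alice": "Acme", "Bob": "Acme", "Carol": "Globex"}
employees = {"Acme": {"Alice", "Bob"}, "Globex": {"Carol"}}


def coworkers_by_traversal(name: str) -> set:
    company = works_at[name]            # hop 1: Person -> Company
    return employees[company] - {name}  # hop 2: Company -> other Persons


print(coworkers_by_scan("Alice"), coworkers_by_traversal("Alice"))  # {'Bob'} {'Bob'}
```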

Using Neo4j for the first...

Neo4j hardware selection, installation, and configuration

Building your Neo4j instance(s) with the right hardware is essential to running a successful cluster. Neo4j runs best when there is plenty of RAM at its disposal.

Random access memory

One aspect to consider is that Neo4j runs on a Java virtual machine (JVM). This means that you need at least enough random-access memory (RAM) to hold the JVM heap, plus extra for other operating system processes. While Neo4j can be made to run on as little as 2 GB of RAM, 32 GB or more is recommended for production workloads. This allows you to configure your instances to map as much data into memory as possible, leading to optimal performance.
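The arithmetic behind that advice can be sketched as follows. Total RAM is divided between the JVM heap, Neo4j's page cache (the memory used to map store data), and a reserve for the operating system. The 8 GB heap and 2 GB OS reserve below are illustrative assumptions, not official recommendations. The setting names are those used in Neo4j 3.x's `neo4j.conf`.

```python
def plan_memory(total_gb: int, heap_gb: int, os_reserve_gb: int = 2) -> dict:
    """Split RAM into JVM heap, page cache, and OS reserve (illustrative only)."""
    page_cache_gb = total_gb - heap_gb - os_reserve_gb
    if page_cache_gb <= 0:
        raise ValueError("not enough RAM for the heap plus the OS reserve")
    return {
        # The same initial and max heap avoids resizing the heap at runtime.
        "dbms.memory.heap.initial_size": f"{heap_gb}g",
        "dbms.memory.heap.max_size": f"{heap_gb}g",
        # Whatever remains is used to map store data into memory.
        "dbms.memory.pagecache.size": f"{page_cache_gb}g",
    }


for setting, value in plan_memory(total_gb=32, heap_gb=8).items():
    print(f"{setting}={value}")
# dbms.memory.heap.initial_size=8g
# dbms.memory.heap.max_size=8g
# dbms.memory.pagecache.size=22g
```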

CPU

Neo4j supports both x86 and OpenPOWER architectures. It requires at least an Intel Core i3, while an Intel Core i7 or IBM POWER8 is recommended for production.

Disk

As with most data store technologies, disk I/O is a potential performance bottleneck. Therefore, it is recommended...

Using Neo4j

You should now be able to start your Neo4j server process in the foreground:

bin/neo4j console

This yields the following output:

Active database: graph.db
Directories in use:
  home:         /local/neo4j
  config:       /local/neo4j/conf
  logs:         /local/neo4j/logs
  plugins:      /local/neo4j/plugins
  import:       /local/neo4j/import
  data:         /local/neo4j/data
  certificates: /local/neo4j/certificates
  run:          /local/neo4j/run
Starting Neo4j.
2017-07-09 17:10:05.300+0000 INFO  ======== Neo4j 3.2.2 ========
2017-07-09 17:10:05.342+0000 INFO  Starting...
2017-07-09 17:10:06.464+0000 INFO  Bolt enabled on 192.168.0.100:7687.
2017-07-09 17:10:09.576+0000 INFO  Started.
2017-07-09 17:10:10.982+0000 INFO  Remote interface available at http://192.168.0.100:7474/
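When scripting around a freshly started server, it can be useful to extract the connection endpoints from this output. The Python sketch below does that with regular expressions. The message wording is copied from the Neo4j 3.2.2 output above and may differ in other versions. The function name `find_endpoints` is hypothetical.

```python
import re


def find_endpoints(console_output: str) -> dict:
    """Extract the Bolt address and HTTP URL from Neo4j startup output."""
    endpoints = {}
    bolt = re.search(r"Bolt enabled on ([\w.\-]+:\d+)", console_output)
    if bolt:
        endpoints["bolt"] = bolt.group(1)
    http = re.search(r"Remote interface available at (https?://\S+)", console_output)
    if http:
        endpoints["http"] = http.group(1)
    return endpoints


sample = """\
2017-07-09 17:10:06.464+0000 INFO  Bolt enabled on 192.168.0.100:7687.
2017-07-09 17:10:09.576+0000 INFO  Started.
2017-07-09 17:10:10.982+0000 INFO  Remote interface available at http://192.168.0.100:7474/
"""
print(find_endpoints(sample))
# {'bolt': '192.168.0.100:7687', 'http': 'http://192.168.0.100:7474/'}
```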

Alternatively, Neo4j can be started with the start command (instead of console) to run the process in the background. In that case, you can follow the current logs for the server process by tailing the logs/debug.log file:

tail -f...

Tips for success

Run Neo4j on Linux or BSD
Take advantage of the training options offered by Neo4j, Inc.
Talk to others in the community
Don't use Neo4j for the first time on something mission-critical
Recruit someone to your team who has graph database experience
Once in production, continue to monitor your instances' JVMs for GC performance and tune as necessary

As with all NoSQL data stores, it is important to remember that Neo4j is not a general-purpose database. It works well for specific use cases, usually those where the relationships are as important as (or more important than) the entities they connect. To that end, Neo4j is a great fit for things such as social networks, matchmaking sites, network management systems, and recommendation engines.

Knowing what Neo4j anti-patterns look like is just as important as applying Neo4j to a proper use case. Be sure to avoid using Neo4j with a full relational model (it is not an RDBMS). Try to avoid improper use of relationship types, as well as...

Summary

In this chapter, we have introduced the Neo4j database and how to use it with relationship-based modeling problems. One of the main advantages of Neo4j is the robust tutorial and help system that can be used with Neo4j Browser. It is the author's opinion that more databases should follow Neo4j's example, intrinsically providing intuitive examples and ways to get started. This can certainly improve both the adoption of the technology and proper use case selection.

One aspect of Neo4j that this chapter has spent some time discussing is the subtle differences between the Community and Enterprise Editions. The Community Edition may contain enough of a feature set to develop a prototype or demonstrate a use case. However, if features such as hot backups, security integration, and clustering for heavy operational workloads are required, the Enterprise Edition should be given serious consideration. Also, if your team is new to Neo4j or graph databases in general, an enterprise support contract...

References

[1] Armbruster S (San Francisco, CA, 2016) Welcome to the Dark Side: Neo4j Worst Practices (& How to Avoid Them). Neo4j Blog, originally presented at GraphConnect San Francisco. Retrieved on 20170723 from: https://neo4j.com/blog/dark-side-neo4j-worst-practices/
[2] Bachman M (London, 2013) GraphAware: Towards Online Analytical Processing in Graph Databases. Imperial College London, section 2.5, pp 13-14. Retrieved on 20170722 from: https://graphaware.com/assets/bachman-msc-thesis.pdf
[3] Brewer E, Fox A (Berkeley, CA, 1999) Harvest, Yield, and Scalable Tolerant Systems. University of California at Berkeley, DOI: 10.1.1.24.3690. Retrieved on 20170530 from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.3690&rep=rep1&type=pdf
[4] EMA (2015) Interview: Solve Network Management Problems with Neo4j. Enterprise Management Associates. Retrieved on 20170722 from: https://neo4j.com/blog/solve-network-management-problems-with-neo4j/
[5] Lindaaker T (2012) An overview of Neo4j Internals...
