You're reading from  Seven NoSQL Databases in a Week

Product type: Book
Published in: Mar 2018
Publisher: Packt
ISBN-13: 9781787288867
Edition: 1st Edition
Authors (2):
Sudarshan Kadambi

Sudarshan has a background in distributed systems and database design. He has been a user of and contributor to various NoSQL databases and is passionate about solving large-scale data management challenges.

Xun (Brian) Wu

Xun (Brian) Wu is a senior blockchain architect and consultant. With over 20 years of hands-on experience across various technologies, including Blockchain, big data, cloud, AI, systems, and infrastructure, Brian has worked on more than 50 projects in his career. He has authored nine books, which have been published by O'Reilly, Packt, and Apress, focusing on popular fields within the Blockchain industry. The titles of his books include: Learn Ethereum (First Edition), Learn Ethereum (Second Edition), Blockchain for Teens, Hands-On Smart Contract Development with Hyperledger Fabric V2, Hyperledger Cookbook, Blockchain Quick Start Guide, Security Tokens and Stablecoins Quick Start Guide, Blockchain by Example, and Seven NoSQL Databases in a Week.

Chapter 3. Neo4j

Some application use cases or data models may place as much importance (or more) on the relationships between entities as on the entities themselves. When this is the case, a graph database may be the optimal choice for data storage. In this chapter, we will look at Neo4j, one of the most commonly used graph databases.

Over the course of this chapter, we will discuss several aspects of Neo4j:

  • Useful features
  • Appropriate use cases
  • Anti-patterns and pitfalls
  • Ways of using Neo4j with languages such as:
    • Cypher
    • Python
    • Java

Once you have completed this chapter, you will begin to understand the significance of graph databases. You will have worked through installing and configuring Neo4j as you build up your own server. You will have employed simple scripts and code to interact with and utilize Neo4j, allowing you to further explore ideas around modeling interconnected data.

We'll start with a quick introduction to Neo4j. From there, we will move on to the appropriate graph database use cases...

What is Neo4j?


Neo4j is an open source, distributed data store used to model graph problems. It was released in 2007 and is sponsored by Neo4j, Inc., which also offers enterprise licensing and support for Neo4j. It departs from the traditional nomenclature of database technologies: entities are stored in schema-less, entity-like structures called nodes. Nodes are connected to other nodes via relationships, also called edges. Nodes can also be grouped together with optional structures called labels.

This relationship-centric approach to data modeling is known as the property graph model. Under the property graph model, both nodes and edges can have properties to store values. Neo4j embraces this approach. It is designed to ensure that nodes and edges are stored efficiently, and that nodes can share any number or type of relationships without sacrificing performance.[8]
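To make the property graph model more concrete, the following is a minimal in-memory sketch in Python. It does not reflect how Neo4j stores data; the class names (Node, Relationship, PropertyGraph) and the sample people and event are hypothetical, chosen only to show that nodes carry labels and properties and that edges carry a type and properties of their own.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set = field(default_factory=set)        # optional grouping, e.g. {"Person"}
    properties: dict = field(default_factory=dict)  # schema-less key/value pairs


@dataclass
class Relationship:
    start: int     # id of the node the edge leaves
    end: int       # id of the node the edge points to
    rel_type: str  # e.g. "FRIENDS_WITH"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: nodes and edges both carry properties."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, labels=(), **properties):
        self.nodes[node_id] = Node(node_id, set(labels), dict(properties))
        return self.nodes[node_id]

    def relate(self, start, end, rel_type, **properties):
        rel = Relationship(start, end, rel_type, dict(properties))
        self.relationships.append(rel)
        return rel

    def neighbors(self, node_id, rel_type=None):
        """Nodes reachable over one outgoing edge, optionally filtered by type."""
        return [
            self.nodes[r.end]
            for r in self.relationships
            if r.start == node_id and (rel_type is None or r.rel_type == rel_type)
        ]


graph = PropertyGraph()
graph.add_node(1, labels=["Person"], name="Alice")
graph.add_node(2, labels=["Person"], name="Bob")
graph.add_node(3, labels=["Event"], title="Graph meetup")
graph.relate(1, 2, "FRIENDS_WITH", since=2015)
graph.relate(1, 3, "ATTENDING")

friends = [n.properties["name"] for n in graph.neighbors(1, "FRIENDS_WITH")]
print(friends)  # ['Bob']
```

Note that the list scan in neighbors is only for brevity; the point of a native graph store is that following a relationship does not require scanning all edges.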

How does Neo4j work?


Neo4j stores nodes, edges, and properties on disk in stores that are specific to each type—for example, nodes are stored in the node store.[5, s.11] They are also stored in two types of caches—the file system (FS) and the node/relationship caches. The FS cache is divided into regions for each type of store, and data is evicted on a least-frequently-used (LFU) policy.
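As a rough illustration of what a least-frequently-used policy means, here is a small Python sketch. It demonstrates only the eviction rule, not Neo4j's actual cache; the LFUCache class, its capacity, and the page names are hypothetical.

```python
class LFUCache:
    """Toy cache that evicts the entry with the fewest recorded accesses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}  # key -> cached value
        self.hits = {}    # key -> number of accesses

    def get(self, key):
        if key not in self.values:
            return None
        self.hits[key] += 1
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            # Evict the least frequently used key. On a tie the oldest entry
            # is evicted, because dicts keep insertion order and min()
            # returns the first minimum it encounters.
            victim = min(self.hits, key=self.hits.get)
            del self.values[victim]
            del self.hits[victim]
        self.values[key] = value
        self.hits[key] = self.hits.get(key, 0) + 1


cache = LFUCache(capacity=2)
cache.put("node-page-1", "...")
cache.put("rel-page-1", "...")
cache.get("node-page-1")         # node-page-1 is now the more frequently used
cache.put("prop-page-1", "...")  # evicts rel-page-1
print(sorted(cache.values))      # ['node-page-1', 'prop-page-1']
```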

Data is written in transactions assembled from commands and sorted to obtain a predictable update order. Commands are sorted at the time of creation, with the aim of preserving consistency. Writes are added to the transaction log and either marked as committed or rolled back (in the event of a failure). Changes are then applied (in sorted order) to the store files on disk.

It is important to note that transactions in Neo4j dictate the state and are therefore idempotent by nature.[5, s.34] They do not directly modify the data. Reapplying transactions for a recovery event simply replays the transactions as of...
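The write path described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not Neo4j's implementation: the command tuples, the sort key, and the TransactionLog class are all hypothetical. What it tries to show is that a command records the resulting state of a record rather than a change to it, so replaying a committed transaction leaves the store as it was.

```python
class TransactionLog:
    """Toy write-ahead log: commands are sorted, logged, then applied."""

    def __init__(self):
        self.entries = []  # list of {"commands": [...], "committed": bool}
        self.store = {}    # stands in for the store files on disk

    def commit(self, commands):
        # Each command is (record_type, record_id, new_state). Sorting gives
        # every transaction the same predictable update order.
        ordered = sorted(commands, key=lambda c: (c[0], c[1]))
        entry = {"commands": ordered, "committed": False}
        self.entries.append(entry)  # write to the log first
        entry["committed"] = True   # mark as committed
        self._apply(entry)          # then update the store
        return entry

    def _apply(self, entry):
        for record_type, record_id, new_state in entry["commands"]:
            # A command dictates the full new state of a record, so applying
            # it twice has the same effect as applying it once.
            self.store[(record_type, record_id)] = new_state

    def recover(self):
        """Replay every committed entry, as after a crash."""
        for entry in self.entries:
            if entry["committed"]:
                self._apply(entry)


log = TransactionLog()
log.commit([
    ("relationship", 7, {"type": "FRIENDS_WITH", "start": 1, "end": 2}),
    ("node", 2, {"name": "Bob"}),
    ("node", 1, {"name": "Alice"}),
])

before = dict(log.store)
log.recover()               # replaying changes nothing
print(log.store == before)  # True
```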

Features of Neo4j


Aside from its support of the property graph model, Neo4j has several other features that make it a desirable data store. Here, we will examine some of those features and discuss how they can be utilized in a successful Neo4j cluster.

Clustering

Enterprise Neo4j offers horizontal scaling through two types of clustering. The first is the typical high-availability clustering, in which several slave servers process data overseen by an elected master. In the event that one of the instances should fail, a new master is chosen.

The second type of clustering is known as causal clustering. This option provides additional features, such as disposable read replicas and built-in load balancing, that help abstract the distributed nature of the clustered database away from the developer. It also supports causal consistency, which aims to support Atomicity, Consistency, Isolation, and Durability (ACID)-compliant consistency in use cases where eventual consistency becomes problematic. Essentially...

Evaluating your use case


Because of Neo4j's focus on node/edge traversal, it is a good fit for use cases requiring analysis and examination of relationships. The property graph model helps to define those relationships in meaningful ways, enabling the user to make informed decisions. Bearing that in mind, there are several use cases for Neo4j (and other graph databases) that seem to fit naturally.

Social networks

Social networks seem to be a natural fit for graph databases. Individuals have friends, attend events, check in to geographical locations, create posts, and send messages. All of these different aspects can be tracked and managed with a graph database such as Neo4j.

Who can see a certain person's posts? Friends? Friends of friends? Who will be attending a certain event? How is a person connected to others attending the same event? In small numbers, these problems could be solved with a number of data stores. But what about an event with several thousand people attending, where each...
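Questions such as "friends of friends" are traversal problems. As a minimal sketch of the idea in plain Python, a breadth-first traversal bounded by a hop limit might look like the following. The names and friendship data are made up, and the helper functions are hypothetical; a real Neo4j deployment would express this as a graph query rather than application code.

```python
from collections import deque

# Hypothetical, undirected friendship data keyed by person.
FRIENDS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "erin"},
    "dave": {"bob"},
    "erin": {"carol", "frank"},
    "frank": {"erin"},
}


def within_hops(graph, start, max_hops):
    """Return {person: hops} for everyone reachable in at most max_hops steps."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distances[person] == max_hops:
            continue  # do not expand past the hop limit
        for friend in graph.get(person, ()):
            if friend not in distances:
                distances[friend] = distances[person] + 1
                queue.append(friend)
    del distances[start]
    return distances


def can_see_posts(graph, author, viewer, max_hops=2):
    """True if viewer is a friend, or a friend of a friend, of author."""
    return viewer in within_hops(graph, author, max_hops)


reach = within_hops(FRIENDS, "alice", 2)
print(sorted(reach.items()))
# [('bob', 1), ('carol', 1), ('dave', 2), ('erin', 2)]
print(can_see_posts(FRIENDS, "alice", "frank"))  # False
```

With a handful of people this is trivial in any data store; the cost of the same traversal over thousands of densely connected attendees is what motivates a store built around relationships.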

Neo4j anti-patterns


Relative to other NoSQL databases, Neo4j does not have a lot of anti-patterns. However, there are some common troubles that seem to befall new users, and we will try to detail them here.

Applying relational modeling techniques in Neo4j

Using relational modeling techniques can lead to trouble with almost every NoSQL database, and Neo4j is no exception to that rule. Similar to other NoSQL databases, building efficient models in Neo4j involves appropriately modeling the required queries. Relational modeling requires you to focus on how your data is stored, and not as much on how it is queried or returned.

Modeling for Neo4j, by contrast, requires you to focus on what your nodes are and how they are related to each other. Additionally, the relationships should depend on the types of questions (queries) that your model will be answering. Failing to give your data model the proper amount of attention can lead to performance and operational troubles later.

Using Neo4j for the first...

Neo4j hardware selection, installation, and configuration


Building your Neo4j instance(s) with the right hardware is essential to running a successful cluster. Neo4j runs best when there is plenty of RAM at its disposal.

Random access memory

One aspect to consider is that Neo4j runs on a Java virtual machine (JVM). This means that you need to have at least enough random-access memory (RAM) to hold the JVM heap, plus extra for other operating system processes. While Neo4j can be made to run on as little as 2 GB of RAM, a memory size of 32 GB of RAM (or more) is recommended for production workloads. This will allow you to configure your instances to map as much data into memory as possible, leading to optimal performance.
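The sizing reasoning above amounts to simple arithmetic: whatever RAM is left after the JVM heap and an operating system reserve can be used to map data into memory. The sketch below works through that calculation in Python. The 8 GB heap and 2 GB OS reserve are illustrative assumptions rather than recommendations from this chapter, and the function name is hypothetical. The setting names are assumed to be the heap and page cache settings used in the Neo4j 3.x configuration file; check them against the documentation for your version.

```python
def memory_budget(total_gb, heap_gb, os_reserve_gb):
    """Split total RAM into JVM heap, OS reserve, and memory left for mapped data."""
    page_cache_gb = total_gb - heap_gb - os_reserve_gb
    if page_cache_gb <= 0:
        raise ValueError("not enough RAM for the heap plus the OS reserve")
    return {
        "dbms.memory.heap.max_size": f"{heap_gb}g",
        "dbms.memory.pagecache.size": f"{page_cache_gb}g",
    }


# Assumed figures: a 32 GB server, an 8 GB heap, 2 GB kept back for the OS.
settings = memory_budget(total_gb=32, heap_gb=8, os_reserve_gb=2)
for name, value in settings.items():
    print(f"{name}={value}")
# dbms.memory.heap.max_size=8g
# dbms.memory.pagecache.size=22g
```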

CPU

Neo4j supports both x86 and OpenPOWER architectures. It requires at least an Intel Core i3, while an Intel Core i7 or IBM POWER8 is recommended for production.

Disk

As with most data store technologies, disk I/O is a potential performance bottleneck. Therefore, it is recommended...

Using Neo4j


You should now be able to start your Neo4j server process in the foreground:

bin/neo4j console

This yields the following output:

Active database: graph.db
Directories in use:
  home:         /local/neo4j
  config:       /local/neo4j/conf
  logs:         /local/neo4j/logs
  plugins:      /local/neo4j/plugins
  import:       /local/neo4j/import
  data:         /local/neo4j/data
  certificates: /local/neo4j/certificates
  run:          /local/neo4j/run
Starting Neo4j.
2017-07-09 17:10:05.300+0000 INFO  ======== Neo4j 3.2.2 ========
2017-07-09 17:10:05.342+0000 INFO  Starting...
2017-07-09 17:10:06.464+0000 INFO  Bolt enabled on 192.168.0.100:7687.
2017-07-09 17:10:09.576+0000 INFO  Started.
2017-07-09 17:10:10.982+0000 INFO  Remote interface available at http://192.168.0.100:7474/

Alternatively, Neo4j can be started with the start command (instead of console) to run the process in the background. In that case, current logs for the server process can be obtained by tailing the logs/debug.log file:

tail -f...

Tips for success


  • Run Neo4j on Linux or BSD
  • Take advantage of the training options offered by Neo4j, Inc.
  • Talk to others in the community
  • Don't use Neo4j for the first time on something mission-critical
  • Recruit someone to your team who has graph database experience
  • Once in production, continue to monitor your instances' JVMs for GC performance and tune as necessary

As with all NoSQL data stores, it is important to remember that Neo4j is not a general-purpose database. It works well with specific use cases, usually those in which the relationships are as important as (or more important than) the entities they connect. To that end, Neo4j is a great fit for things such as social networks, matchmaking sites, network management systems, and recommendation engines.

Just as important as applying Neo4j to a proper use case is knowing what Neo4j anti-patterns look like. Be sure to avoid using Neo4j with a full relational model (it is not an RDBMS). Try to avoid improper use of relationship types, as well as...

Summary


In this chapter, we have introduced the Neo4j database and how to use it with relationship-based modeling problems. One of the main advantages of Neo4j is the robust tutorial and help system that can be used with Neo4j Browser. It is the author's opinion that more databases should follow Neo4j's example, intrinsically providing intuitive examples and ways to get started. This can certainly improve both the adoption of the technology and proper use case selection.

One aspect of Neo4j that this chapter has spent some time discussing is the set of subtle differences between the Community and Enterprise Editions. The Community Edition may contain enough of a feature set to develop a prototype or demonstrate a use case. However, if features such as hot backups, security integration, and clustering for heavy operational workloads are required, the Enterprise Edition should be given serious consideration. Also, if your team is new to Neo4j or graph databases in general, an enterprise support contract...

References 


  1. Armbruster S (San Francisco, CA, 2016) Welcome to the Dark Side: Neo4j Worst Practices (& How to Avoid Them). Neo4j Blog, originally presented at GraphConnect San Francisco. Retrieved on 20170723 from: https://neo4j.com/blog/dark-side-neo4j-worst-practices/
  2. Bachman M (London, 2013) GraphAware: Towards Online Analytical Processing in Graph Databases. Imperial College London, section 2.5, pp 13-14. Retrieved on 20170722 from: https://graphaware.com/assets/bachman-msc-thesis.pdf
  3. Brewer E, Fox A (Berkeley, CA, 1999) Harvest, Yield, and Scalable Tolerant Systems. University of California at Berkeley, DOI: 10.1.1.24.3690. Retrieved on 20170530 from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.3690&rep=rep1&type=pdf
  4. EMA (2015) Interview: Solve Network Management Problems with Neo4j. Enterprise Management Associates, Retrieved on 20170722 from: https://neo4j.com/blog/solve-network-management-problems-with-neo4j/
  5. Lindaaker T (2012) An overview of Neo4j Internals...